Bug 1463221
| Summary: | cns-brick-multiplexing: brick process fails to restart after gluster pod failure | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | krishnaram Karthick <kramdoss> |
| Component: | core | Assignee: | Mohit Agrawal <moagrawa> |
| Status: | CLOSED ERRATA | QA Contact: | krishnaram Karthick <kramdoss> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | cns-3.6 | CC: | amukherj, hchiramm, kramdoss, nchilaka, pprakash, rcyriac, rhs-bugs, rtalur, storage-qa-internal, vinug |
| Target Milestone: | --- | Flags: | hchiramm: needinfo- |
| Target Release: | RHGS 3.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | 3.3.0-devel-freeze-exception, brick-multiplexing | | |
| Fixed In Version: | glusterfs-3.8.4-32 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1463613 1464072 (view as bug list) | Environment: | |
| Last Closed: | 2017-09-21 04:59:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1464072 | | |
| Bug Blocks: | 1417151, 1463613 | | |
**Description** (krishnaram Karthick, 2017-06-20 11:57:47 UTC)
Looking at the setup and the logs, here is my theory of what happened:

1. Brick processes were running with a given pid (say 1000) before the pod was deleted.
2. When the new pod came up, a non-gluster process picked up the same pid 1000 after the very first brick was brought up with pid 1001, and the subsequent brick start attempts tried to connect to the brick assuming it was still running with pid 1000. The following log sequence confirms that the brick went for a connect instead of an attach:

```
[2017-06-20 11:14:51.444692] I [glusterd-utils.c:5443:glusterd_brick_start] 0-management: discovered already-running brick /var/lib/heketi/mounts/vg_a5597e6aa7af855a39d96b64a47787ca/brick_15ef1d866be79abfea2446a7c887d192/brick
[2017-06-20 11:14:51.444731] I [MSGID: 106143] [glusterd-pmap.c:277:pmap_registry_bind] 0-pmap: adding brick /var/lib/heketi/mounts/vg_a5597e6aa7af855a39d96b64a47787ca/brick_15ef1d866be79abfea2446a7c887d192/brick on port 49152
[2017-06-20 11:14:51.444775] I [rpc-clnt.c:1060:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-06-20 11:14:51.444877] I [glusterd-utils.c:5443:glusterd_brick_start] 0-management: discovered already-running brick /var/lib/heketi/mounts/vg_a5597e6aa7af855a39d96b64a47787ca/brick_d523daa447130512618fd72da50cad72/brick
[2017-06-20 11:14:51.444901] I [MSGID: 106143] [glusterd-pmap.c:277:pmap_registry_bind] 0-pmap: adding brick /var/lib/heketi/mounts/vg_a5597e6aa7af855a39d96b64a47787ca/brick_d523daa447130512618fd72da50cad72/brick on port 49152
```
3. The theory becomes more concrete when we see that glusterd then gets disconnect notifications because of this:

```
[2017-06-20 11:14:51.507841] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-06-20 11:14:51.509668] I [MSGID: 106005] [glusterd-handler.c:5687:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.47.59:/var/lib/heketi/mounts/vg_61f1e5206bf507bc54aef6984e91157b/brick_08a4ad42cdf24e5e3b56bd940c6b243b/brick has disconnected from glusterd.
[2017-06-20 11:14:51.511046] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-06-20 11:14:51.512293] I [MSGID: 106005] [glusterd-handler.c:5687:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.47.59:/var/lib/heketi/mounts/vg_e863cbc857b5d64d58d31afd42383234/brick_1146fd5fda435fedf6241a0f3c84a3e6/brick has disconnected from glusterd.
[2017-06-20 11:14:51.513633] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-06-20 11:14:51.516157] I [MSGID: 106005] [glusterd-handler.c:5687:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.47.59:/var/lib/heketi/mounts/vg_e863cbc857b5d64d58d31afd42383234/brick_7e1cfecf225612a67464e891b391ebc5/brick has disconnected from glusterd.
```

There are really two problems here:

1. The pidfiles shouldn't live in /var/lib/glusterd, as these files should not persist across a node reboot.
2. is_gf_pid_running should be enhanced with a stricter check of whether the pid actually belongs to a gluster process.

Doing 2 would solve the problem for now. However, we should try to finish the ongoing work at https://review.gluster.org/#/c/13580/ .

---

I think we can resolve this without changing is_gf_pid_running. Below is the code path in glusterd_brick_start in which we try to connect to the brick if it is running.
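As an illustrative aside on the stricter check proposed in problem 2 above: a liveness check could verify not only that some process with the recorded pid exists, but that it is still the gluster daemon we started, for example by comparing /proc/&lt;pid&gt;/comm against the expected process name. This is only a sketch with made-up names, not the actual RHGS patch:

```c
/* Illustrative sketch (not the actual patch): a stricter liveness check
 * that, in addition to kill(pid, 0), verifies the process name recorded
 * in /proc/<pid>/comm matches the daemon we expect. The function name
 * is invented for this example. */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static int
pid_is_running_as (pid_t pid, const char *expected_comm)
{
        char path[64];
        char comm[64] = {0};
        FILE *fp;

        if (pid <= 0)
                return 0;

        /* Does any process with this pid exist at all? */
        if (kill (pid, 0) != 0 && errno != EPERM)
                return 0;

        /* The pid exists, but is it really the daemon we started
         * earlier, or an unrelated process that reused the pid? */
        snprintf (path, sizeof (path), "/proc/%ld/comm", (long) pid);
        fp = fopen (path, "r");
        if (!fp)
                return 0;
        if (!fgets (comm, sizeof (comm), fp)) {
                fclose (fp);
                return 0;
        }
        fclose (fp);
        comm[strcspn (comm, "\n")] = '\0';

        return strcmp (comm, expected_comm) == 0;
}
```

With a check along these lines, a stale pidfile whose pid has been recycled by a non-gluster process fails the comm comparison, so glusterd would start the brick instead of trying to connect to it.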
I think before calling glusterd_brick_connect here we can validate socketpath: if the pid (glusterfs) is not running, the socketpath value will not be valid, and we can move out (goto) of the scope instead of calling glusterd_brick_connect.

```c
        if (is_brick_mx_enabled ())
                glusterd_get_sock_from_brick_pid (pid, socketpath,
                                                  sizeof (socketpath));
        else
                glusterd_set_brick_socket_filepath (volinfo, brickinfo,
                                                    socketpath,
                                                    sizeof (socketpath));
        gf_log (this->name, GF_LOG_DEBUG,
                "Using %s as sockfile for brick %s of volume %s ",
                socketpath, brickinfo->path, volinfo->volname);
        (void) glusterd_brick_connect (volinfo, brickinfo, socketpath);
        }
        return 0;
}
```

Regards,
Mohit Agrawal

---

(In reply to Mohit Agrawal from comment #10)
> I think we can resolve the same without changing is_gf_pid_running.
> Below is the code path in glusterd_brick_start in which we are trying to
> connect with brick if it is running.
>
> I think before calling glusterd_brick_connect here we can validate
> socketpath; if the pid (glusterfs) is not running, the socketpath value
> will not be valid, and we can move out (goto) of the scope instead of
> calling glusterd_brick_connect.
>
> [code path quoted above]

We can't go to out, as we are actually supposed to start the brick; you'd basically need to come out of the if block and proceed further. Otherwise I like the idea.
I mean that control will come out of the gf_is_service_running scope and start executing the code from the point below:

```c
        ret = _mk_rundir_p (volinfo);
        if (ret)
                goto out;

        other_brick = find_compatible_brick (conf, volinfo, brickinfo,
                                             &other_vol);
        if (other_brick) {
                ret = attach_brick (this, brickinfo, other_brick,
                                    volinfo, other_vol);
                if (ret == 0) {
                        goto out;
                }
        }
```

Regards,
Mohit Agrawal

---

upstream patch: https://review.gluster.org/#/c/17601
downstream patch: https://code.engineering.redhat.com/gerrit/#/c/110184/

---

Verified this bug in CNS build cns-deploy-5.0.0-15.el7rhgs.x86_64 (glusterfs-3.8.4-40.el7rhgs.x86_64). Deleting a gluster pod with 100 volumes brings the gluster pod back up, all the bricks come up, and self-heal completes. Moving the bug to verified.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774