Description of problem:

In brick-mux mode, during volume stop, when glusterd sends a brick-detach message to the brick process for the last brick, the brick process responds to glusterd with an acknowledgment and then kills itself with SIGTERM. All this sounds fine. However, the response from the brick sometimes doesn't reach glusterd; instead, a socket disconnect notification reaches glusterd before the response. This causes glusterd to presume that something has gone wrong during volume stop, so it bails out of the call and fails the volume stop operation, causing the test to fail.

This race is reproducible by running the test tests/basic/distribute/rebal-all-nodes-migrate.t in brick-mux mode for my patch [1].

[1] https://review.gluster.org/19308

-----

The source code for glusterfs_handle_terminate() also has the following comment:

/*
 * This is terribly unsafe without quiescing or shutting
 * things down properly but it gets us to the point
 * where we can test other stuff.
 *
 * TBD: finish implementing this "detach" code properly
 */

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Enable brick multiplexing.
2. Run tests/basic/distribute/rebal-all-nodes-migrate.t.

Actual results:
Volume stop intermittently fails, causing the test to fail.

Expected results:
Volume stop succeeds.

Additional info:
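For context, here is a minimal sketch of the racy sequence on the brick side, assuming the detach handler only queues the reply on the transport before raising SIGTERM. All function and type names are illustrative, not the actual GlusterFS source:

#include <signal.h>
#include <stdbool.h>
#include <unistd.h>

struct rpc_req;                                    /* opaque, illustrative */
extern int  send_detach_reply(struct rpc_req *req, int op_ret);
extern bool detach_brick(const char *brick_path);  /* true if it was the last brick */

static int
handle_terminate_sketch(struct rpc_req *req, const char *brick_path)
{
        bool last_brick = detach_brick(brick_path);

        /* The reply is only queued on the transport here ... */
        send_detach_reply(req, 0);

        if (last_brick) {
                /*
                 * ... and the process kills itself right away.  If the
                 * socket disconnect reaches glusterd before the queued
                 * reply does, glusterd treats the volume stop as failed.
                 */
                kill(getpid(), SIGTERM);
        }
        return 0;
}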
Created attachment 1395812 [details] TRACE level logs of volume stop failing for tests/basic/distribute/rebal-all-nodes-migrate.t
The comment probably has nothing to do with this particular problem. The detach code is unsafe because failing to perform all cleanup actions could lead to memory/resource leaks or perhaps even crashes in the brick daemon. Those issues are already being addressed in a patch by Mohit, so perhaps you should talk to him.

https://review.gluster.org/#/c/19537/

*This* problem, on the other hand (like about a hundred similar problems that were fixed in multiplexing before anyone but me even saw it), mostly has to do with how glusterd interacts with brick processes. If we're trying to stop a brick, death of the process in which it lives is a perfectly adequate substitute for getting an RPC reply back, and the glusterd notify code should recognize that. That's how things worked before multiplexing, when there was no detach RPC and thus no response for one. IIRC, we just sent a SIGTERM to the brick process and didn't wait for anything (or clean up anything like the UNIX-domain socket). That was good enough then, so waiting for *any* signal that the brick is gone is an improvement and should be good enough now.

If you really wanted to, you could have the brick wait for glusterd to close its end of the socket before going away. What I suspect would happen, though, is that sooner or later we'd find a case where the brick daemon and glusterd are both sitting around waiting forever for the other to close first. Job security, I guess, but not good for our users. The way to build a robust distributed or multi-process system is to *reduce* dependencies on ordering of events.
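A hypothetical sketch of that suggestion, assuming glusterd tracks whether it asked the brick to stop; names and fields are illustrative, not the real glusterd notify code:

#include <stdbool.h>

enum brick_rpc_event { BRICK_RPC_CONNECT, BRICK_RPC_DISCONNECT };

struct brickinfo_sketch {
        bool stop_requested;  /* set when glusterd asks the brick to go away */
        bool started;
};

static void
brick_rpc_notify_sketch(struct brickinfo_sketch *brick,
                        enum brick_rpc_event event)
{
        if (event != BRICK_RPC_DISCONNECT)
                return;

        if (brick->stop_requested) {
                /*
                 * Death of the process is as good as a detach reply:
                 * mark the brick stopped and let the volume-stop
                 * transaction succeed.
                 */
                brick->started = false;
                return;
        }

        /* Unexpected disconnect: handle as a genuine brick failure. */
        brick->started = false;
        /* ... trigger whatever error handling applies ... */
}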
REVIEW: https://review.gluster.org/19607 (glusterd: handling brick termination in brick-mux) posted (#1) for review on master by Sanju Rakonde
REVIEW: https://review.gluster.org/19607 (glusterd: handling brick termination in brick-mux) posted (#12) for review on master by Sanju Rakonde
COMMIT: https://review.gluster.org/19607 committed in master by "Atin Mukherjee" <amukherj> with a commit message- glusterd: handling brick termination in brick-mux

Problem: There's a race between the last glusterfs_handle_terminate() response sent to glusterd and the kill that happens immediately if the terminated brick is the last brick.

Solution: When it is the last brick for the brick process, instead of glusterfsd killing itself, glusterd will kill the process in case of brick multiplexing. The gf_attach utility is also changed accordingly.

Change-Id: I386c19ca592536daa71294a13d9fc89a26d7e8c0
fixes: bz#1545048
BUG: 1545048
Signed-off-by: Sanju Rakonde <srakonde>
REVIEW: https://review.gluster.org/19794 (Revert "glusterd: handling brick termination in brick-mux") posted (#2) for review on master by Atin Mukherjee
COMMIT: https://review.gluster.org/19794 committed in master by "Atin Mukherjee" <amukherj> with a commit message- Revert "glusterd: handling brick termination in brick-mux"

This reverts commit a60fc2ddc03134fb23c5ed5c0bcb195e1649416b.

This commit was causing multiple tests to time out when brick multiplexing is enabled. With further debugging, it was found that even though the volume stop transaction is converted into mgmt_v3 to allow the remote nodes to follow the synctask framework to process the command, there are other callers of glusterd_brick_stop() which are not synctask based.

Change-Id: I7aee687abc6bfeaa70c7447031f55ed4ccd64693
updates: bz#1545048
REVIEW: https://review.gluster.org/19829 (glusterd: handling brick termination in brick-mux) posted (#1) for review on master by Sanju Rakonde
COMMIT: https://review.gluster.org/19829 committed in master by "Amar Tumballi" <amarts> with a commit message- glusterd: handling brick termination in brick-mux

Problem: There's a race between the glusterfs_handle_terminate() response sent to glusterd from the last brick of the process and the socket disconnect event that arrives after the brick process gets killed.

Solution: When it is the last brick for the brick process, instead of sending GLUSTERD_BRICK_TERMINATE to the brick process, glusterd will kill the process (same as we do in the case of non brick multiplexing).

The test case is added for https://bugzilla.redhat.com/show_bug.cgi?id=1549996

Change-Id: If94958cd7649ea48d09d6af7803a0f9437a85503
fixes: bz#1545048
Signed-off-by: Sanju Rakonde <srakonde>
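A rough sketch of the approach this commit describes, under the assumption that glusterd can tell how many bricks a process still hosts; the helper names are hypothetical, not the committed code:

#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

extern int count_bricks_in_process(pid_t pid);                        /* hypothetical */
extern int send_brick_terminate_rpc(pid_t pid, const char *brick_path); /* hypothetical */

static int
stop_brick_sketch(pid_t brick_pid, const char *brick_path, bool brick_mux)
{
        if (brick_mux && count_bricks_in_process(brick_pid) > 1) {
                /* Other bricks still live in this process: detach just this one. */
                return send_brick_terminate_rpc(brick_pid, brick_path);
        }

        /*
         * Last (or only) brick in the process: kill it directly, the same
         * way non-multiplexed bricks are stopped, and rely on the
         * disconnect/child-exit notification instead of an RPC reply.
         */
        return kill(brick_pid, SIGTERM);
}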
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/