Bug 1545048 - [brick-mux] process termination race while killing glusterfsd on last brick detach
Summary: [brick-mux] process termination race while killing glusterfsd on last brick detach
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-02-14 06:58 UTC by Milind Changire
Modified: 2018-10-23 15:06 UTC
CC List: 6 users

Fixed In Version: glusterfs-5.0
Clone Of:
Environment:
Last Closed: 2018-06-20 18:00:23 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
TRACE level logs of volume stop failing for tests/basic/distribute/rebal-all-nodes-migrate.t (8.75 MB, application/x-tar)
2018-02-14 09:02 UTC, Milind Changire

Description Milind Changire 2018-02-14 06:58:30 UTC
Description of problem:
In brick-mux mode, during volume stop, glusterd sends a brick-detach message to the brick process for the last brick; the brick process responds to glusterd with an acknowledgment and then kills itself with SIGTERM. That sequence is fine in principle. However, the response from the brick sometimes doesn't reach glusterd; instead, a socket disconnect notification reaches glusterd before the response. glusterd then presumes that something has gone wrong during volume stop, bails out of the call, and fails the volume stop operation, causing the test to fail.

This race is reproducible by running the test tests/basic/distribute/rebal-all-nodes-migrate.t in brick-mux mode against my patch [1].

[1] https://review.gluster.org/19308
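
For illustration, a minimal C sketch of the brick-side ordering described above follows; the function names are invented stand-ins, not the actual glusterfsd code paths. The point is only the ordering hazard: the detach reply is merely queued on the socket before the process raises SIGTERM, so the disconnect produced by process exit can overtake the reply.

    #include <signal.h>
    #include <stdio.h>

    /* Hypothetical stand-in for submitting the detach reply over RPC. The
     * reply is only queued for transmission here; glusterd has not
     * necessarily read it yet when this returns. */
    static int
    send_detach_reply(int sock)
    {
        (void)sock;
        printf("detach reply queued\n");
        return 0;
    }

    static void
    handle_last_brick_detach(int sock)
    {
        if (send_detach_reply(sock) == 0) {
            /* Exiting immediately closes every socket of the process. The
             * resulting disconnect notification can reach glusterd before
             * the queued reply is read -- the race reported here. */
            raise(SIGTERM);
        }
    }

    int
    main(void)
    {
        handle_last_brick_detach(0);
        return 0;
    }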

-----

The source code for glusterfs_handle_terminate() also has the following comment:
        /*
         * This is terribly unsafe without quiescing or shutting
         * things down properly but it gets us to the point
         * where we can test other stuff.
         *
         * TBD: finish implementing this "detach" code properly
         */


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Apply the patch at [1] above.
2. Run tests/basic/distribute/rebal-all-nodes-migrate.t with brick multiplexing enabled.
3. Observe that the volume stop operation fails and the test fails.

Actual results:


Expected results:


Additional info:

Comment 1 Milind Changire 2018-02-14 09:02:43 UTC
Created attachment 1395812 [details]
TRACE level logs of volume stop failing for tests/basic/distribute/rebal-all-nodes-migrate.t

Comment 2 Jeff Darcy 2018-02-14 15:14:09 UTC
The comment probably has nothing to do with this particular problem. The detach code is unsafe because failing to perform all cleanup actions could lead to memory/resource leaks or perhaps even crashes in the brick daemon. Those issues are already being addressed in a patch by Mohit, so perhaps you should talk to him.  https://review.gluster.org/#/c/19537/

*This* problem, on the other hand (and like about a hundred similar problems that were fixed in multiplexing before anyone but me even saw it), mostly has to do with how glusterd interacts with brick processes. If we're trying to stop a brick, death of the process in which it lives is a perfectly adequate substitute for getting an RPC reply back, and the glusterd notify code should recognize that. That's how things worked before multiplexing, when there was no detach RPC and thus no response for one. IIRC, we just sent a SIGTERM to the brick process and didn't wait for anything (or clean up anything like the UNIX-domain socket). That was good enough then, so waiting for *any* signal that the brick is gone is an improvement and should be good enough now.

If you really wanted to, you could have the brick wait for glusterd to close its end of the socket first before going away. Somehow, though, what I suspect would happen is that sooner or later we'd find a case where the brick daemon and glusterd are both sitting around waiting forever for the other to close first. Job security, I guess, but not good for our users. The way to build a robust distributed or multi-process system is to *reduce* dependencies on ordering of events.
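
To make the ordering-independence concrete, here is a small hypothetical sketch (the names are invented for illustration and are not the actual glusterd notify code): once glusterd has asked a brick to stop, a socket disconnect is treated as confirmation that the brick is gone rather than as a failure of the pending detach RPC.

    #include <stdbool.h>

    enum brick_event { BRICK_EVENT_RPC_REPLY, BRICK_EVENT_DISCONNECT };

    struct brick_ctx {
        bool stop_requested; /* set when glusterd asks the brick to stop */
    };

    /* Returns 0 if the brick can be considered stopped, -1 on error. */
    static int
    brick_notify(struct brick_ctx *ctx, enum brick_event ev)
    {
        switch (ev) {
        case BRICK_EVENT_RPC_REPLY:
            return 0; /* explicit acknowledgment from the brick */
        case BRICK_EVENT_DISCONNECT:
            if (ctx->stop_requested)
                return 0; /* death of the process is an equally valid
                             signal that the brick is gone */
            return -1; /* unexpected disconnect: a real error */
        }
        return -1;
    }

    int
    main(void)
    {
        struct brick_ctx ctx = { .stop_requested = true };
        /* A disconnect arriving before (or instead of) the RPC reply is
         * not an error once a stop has been requested. */
        return brick_notify(&ctx, BRICK_EVENT_DISCONNECT);
    }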

Comment 3 Worker Ant 2018-02-21 07:20:06 UTC
REVIEW: https://review.gluster.org/19607 (glusterd: handling brick termination in brick-mux) posted (#1) for review on master by Sanju Rakonde

Comment 4 Worker Ant 2018-03-26 08:07:11 UTC
REVIEW: https://review.gluster.org/19607 (glusterd: handling brick termination in brick-mux) posted (#12) for review on master by Sanju Rakonde

Comment 5 Worker Ant 2018-03-28 04:27:46 UTC
COMMIT: https://review.gluster.org/19607 committed in master by "Atin Mukherjee" <amukherj> with a commit message- glusterd: handling brick termination in brick-mux

Problem: There's a race between the glusterfs_handle_terminate()
response sent to glusterd and the self-kill that happens immediately
afterwards when the terminated brick is the last brick in the process.

Solution: When it is the last brick of the brick process, glusterd kills the
process itself in the brick-multiplexing case, instead of glusterfsd killing
itself. The gf_attach utility is changed accordingly.

Change-Id: I386c19ca592536daa71294a13d9fc89a26d7e8c0
fixes: bz#1545048
BUG: 1545048
Signed-off-by: Sanju Rakonde <srakonde>

Comment 6 Worker Ant 2018-03-29 14:36:36 UTC
REVIEW: https://review.gluster.org/19794 (Revert "glusterd: handling brick termination in brick-mux") posted (#2) for review on master by Atin Mukherjee

Comment 7 Worker Ant 2018-03-29 14:58:56 UTC
COMMIT: https://review.gluster.org/19794 committed in master by "Atin Mukherjee" <amukherj> with a commit message- Revert "glusterd: handling brick termination in brick-mux"

This reverts commit a60fc2ddc03134fb23c5ed5c0bcb195e1649416b.

This commit was causing multiple tests to time out when brick
multiplexing is enabled. With further debugging, it was found that even
though the volume stop transaction was converted into mgmt_v3 to allow
the remote nodes to follow the synctask framework to process the command,
there are other callers of glusterd_brick_stop() which are not synctask
based.
Change-Id: I7aee687abc6bfeaa70c7447031f55ed4ccd64693
updates: bz#1545048

Comment 8 Worker Ant 2018-04-05 20:44:45 UTC
REVIEW: https://review.gluster.org/19829 (glusterd: handling brick termination in brick-mux) posted (#1) for review on master by Sanju Rakonde

Comment 9 Worker Ant 2018-05-07 15:32:27 UTC
COMMIT: https://review.gluster.org/19829 committed in master by "Amar Tumballi" <amarts> with a commit message- glusterd: handling brick termination in brick-mux

Problem: There's a race between the glusterfs_handle_terminate()
response sent to glusterd from the last brick of the process and the
socket disconnect event that arrives after the brick process has
killed itself.

Solution: When it is the last brick of the brick process, glusterd kills
the process directly instead of sending GLUSTERD_BRICK_TERMINATE to the
brick process (the same as is done when brick multiplexing is disabled).

The test case is added for https://bugzilla.redhat.com/show_bug.cgi?id=1549996

Change-Id: If94958cd7649ea48d09d6af7803a0f9437a85503
fixes: bz#1545048
Signed-off-by: Sanju Rakonde <srakonde>
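
As a rough illustration of the direction this fix takes (the helper names below are hypothetical, not the real glusterd functions): for the last brick of a multiplexed process, glusterd skips the terminate RPC and signals the brick process directly, so there is no in-flight reply left to race against the socket disconnect.

    #include <signal.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical stub: in glusterd this would check whether any other
     * brick is still attached to the process with this pid. */
    static bool
    is_last_brick(pid_t brick_pid)
    {
        (void)brick_pid;
        return true;
    }

    /* Hypothetical stub standing in for sending GLUSTERD_BRICK_TERMINATE
     * over the brick's RPC connection. */
    static int
    send_terminate_rpc(pid_t brick_pid)
    {
        printf("detach RPC sent to pid %ld\n", (long)brick_pid);
        return 0;
    }

    static int
    stop_brick(pid_t brick_pid, bool brick_mux)
    {
        if (brick_mux && !is_last_brick(brick_pid))
            return send_terminate_rpc(brick_pid); /* detach one brick; the
                                                     process lives on */

        /* Last brick (or no multiplexing at all): glusterd signals the
         * brick process itself, so no terminate reply is left in flight
         * to race against the disconnect. */
        return kill(brick_pid, SIGTERM);
    }

    int
    main(void)
    {
        /* Demo: spawn a stand-in "brick process" and stop it from the
         * parent, the way glusterd now stops the process hosting the
         * last brick. */
        pid_t pid = fork();
        if (pid < 0)
            return 1;
        if (pid == 0) {
            pause(); /* child plays the multiplexed brick process */
            _exit(0);
        }
        stop_brick(pid, true);
        waitpid(pid, NULL, 0);
        return 0;
    }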

Comment 10 Shyamsundar 2018-06-20 18:00:23 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 11 Shyamsundar 2018-10-23 15:06:35 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/

