Bug 1560957
| Summary: | After performing remove-brick followed by add-brick operation, brick went to offline state | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Atin Mukherjee <amukherj> |
| Component: | glusterd | Assignee: | Atin Mukherjee <amukherj> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | mainline | CC: | bmekala, bugs, rhs-bugs, storage-qa-internal, vbellur |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | glusterfs-v4.1.0 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1560955 | Environment: | |
| Last Closed: | 2018-06-20 18:03:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1560955 | ||
Comment 1
Atin Mukherjee
2018-03-27 11:13:26 UTC
Revised steps to reproduce (an illustrative CLI sequence is sketched at the end of this comment):

1. Create and start a volume (with more than one brick).
2. Remove the first brick.
3. Add one more brick; this operation takes a significantly long time (because of this bug).
4. Check the volume status; all bricks except the newly added one report an N/A status.

REVIEW: https://review.gluster.org/19800 (glusterd: mark port_registered to true for all running bricks with brick mux) posted (#2) for review on master by Atin Mukherjee

COMMIT: https://review.gluster.org/19800 committed in master by "Atin Mukherjee" <amukherj> with a commit message:

glusterd: mark port_registered to true for all running bricks with brick mux

glusterd maintains a boolean flag 'port_registered' which is used to determine whether a brick has completed its portmap sign-in process. This flag is (re)set on pmap_signin and pmap_signout events. With brick multiplexing, this flag identifies whether the very first brick with which the process was spawned has completed its sign-in. However, on a glusterd restart, when a brick is already identified as running, glusterd does a pmap_registry_bind to ensure its portmap table is updated, but the flag is not set. That is fine without brick multiplexing, but it causes a problem if the very first brick that came up as part of the process is replaced: the subsequent brick attach then fails. One way to validate this is to create and start a volume, remove the first brick, and then add-brick a new one. The add-brick operation takes a very long time, and afterwards the volume status shows every brick other than the new one as down.

Solution: set brickinfo->port_registered to true for all the running bricks when brick multiplexing is enabled.

Change-Id: Ib0662d99d0fa66b1538947fd96b43f1cbc04e4ff
Fixes: bz#1560957
Signed-off-by: Atin Mukherjee <amukherj>

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/
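A minimal CLI sequence illustrating the reproduction steps above. This is a sketch under assumptions, not taken from the report: the volume name, hostname, and brick paths are placeholders, and it assumes a single-node setup with brick multiplexing enabled (the bug is specific to brick mux).

```sh
# Enable brick multiplexing cluster-wide (assumed precondition for this bug).
gluster volume set all cluster.brick-multiplex on

# 1. Create and start a volume with more than one brick (names are placeholders).
gluster volume create testvol host1:/bricks/b1 host1:/bricks/b2 force
gluster volume start testvol

# 2. Remove the first brick.
gluster volume remove-brick testvol host1:/bricks/b1 force

# 3. Add one more brick; with this bug, the operation takes a very long time.
gluster volume add-brick testvol host1:/bricks/b3 force

# 4. Check the volume status; every brick except the newly added one shows N/A.
gluster volume status testvol
```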
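For context, a rough sketch of the idea behind the fix described in the commit message above. This is not the actual patch (see https://review.gluster.org/19800 for that); apart from the brickinfo->port_registered field named in the commit message and common GlusterFS types (glusterd_brickinfo_t, gf_boolean_t, _gf_true), the helper name, its parameters, and its call site are illustrative assumptions.

```c
#include "glusterd.h"  /* glusterd internal header providing glusterd_brickinfo_t */

/* Hypothetical helper (not the real patch): when glusterd (re)starts and
 * finds a brick already running while brick multiplexing is enabled, record
 * that its portmap sign-in is complete so that a later attach of a newly
 * added brick is not rejected. */
static void
mark_running_brick_port_registered(glusterd_brickinfo_t *brickinfo,
                                   gf_boolean_t brick_mux_enabled,
                                   gf_boolean_t brick_is_running)
{
        if (brick_mux_enabled && brick_is_running)
                brickinfo->port_registered = _gf_true;
}
```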