Bug 1442787
| Field | Value | Field | Value |
|---|---|---|---|
| Summary: | Brick Multiplexing: During Remove brick when glusterd of a node is stopped, the brick process gets disconnected from glusterd purview and hence losing multiplexing feature | | |
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> |
| Component: | glusterd | Assignee: | Samikshan Bairagya <sbairagy> |
| Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.3 | CC: | amukherj, gyadav, moagrawa, nchilaka, rcyriac, rhinduja, rhs-bugs, storage-qa-internal, vbellur |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | brick-multiplexing | | |
| Fixed In Version: | glusterfs-3.8.4-26 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1451559 (view as bug list) | Environment: | |
| Last Closed: | 2017-09-21 04:37:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1451559 | | |
| Bug Blocks: | 1417151 | | |
Description
Nag Pavan Chilakam 2017-04-17 14:11:55 UTC
Logs are at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1442787/ ; refer to node 35.45 for the command logs.

Not able to access the log files; while accessing them we get the error "You don't have permission to access /sosreports/nchilaka/bug.1442787/log/glusterfs/glusterd.log on this server." Please update the permissions so that we can access the logs.

Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1443991#c6 for the initial analysis.

Upstream patch: https://review.gluster.org/#/c/17101/

Upstream patches: https://review.gluster.org/#/q/topic:bug-1444596

Downstream patches:
https://code.engineering.redhat.com/gerrit/#/c/105595/
https://code.engineering.redhat.com/gerrit/#/c/105596/

Still seeing the problem on 3.8.4-25:

    [root@dhcp35-45 ~]# for i in {1..10};do VOL=aus;echo $VOL;gluster v remove-brick $VOL-$i rep 3 10.70.35.45:/rhs/brick$i/$VOL-$i 10.70.35.130:/rhs/brick$i/$VOL-$i 10.70.35.122:/rhs/brick$i/$VOL-$i start;done
    aus
    volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
    aus
    volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
    aus
    volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
    aus
    volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
    aus
    volume remove-brick start: failed: Commit failed on 10.70.35.130. Please check log file for details.   ===============> it failed for volume "aus-5"
    aus
    volume remove-brick start: failed: Host node of the brick 10.70.35.130:/rhs/brick6/aus-6 is down
    aus
    volume remove-brick start: failed: Host node of the brick 10.70.35.130:/rhs/brick7/aus-7 is down
    aus
    volume remove-brick start: failed: Host node of the brick 10.70.35.130:/rhs/brick8/aus-8 is down
    aus
    volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
    aus
    volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
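Not part of the original comments: since the symptom being chased is that bricks stop sharing a single process, a quick way to check whether multiplexing is still in effect on a node is to count the distinct glusterfsd PIDs. The following is a minimal sketch; `cluster.brick-multiplex` is the standard cluster-wide toggle for brick multiplexing, and the rest of the snippet (messages, single-node check) is purely illustrative.

```bash
#!/bin/bash
# Minimal sketch (not from the bug report): verify that all bricks on the
# local node are multiplexed into a single glusterfsd process.

# Cluster-wide option that enables brick multiplexing.
gluster volume set all cluster.brick-multiplex on

# With multiplexing working, exactly one glusterfsd PID is expected here.
pids=$(pgrep -x glusterfsd)
echo "glusterfsd PIDs on this node: ${pids}"
if [ -n "${pids}" ] && [ "$(echo "${pids}" | wc -l)" -eq 1 ]; then
    echo "bricks are multiplexed into one process"
else
    echo "bricks are running in separate processes (multiplexing lost?)"
fi
```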
    [root@dhcp35-45 ~]# gluster v status

Created a new volume post glusterd start; it can be seen that a new pid is started for the brick process:

    [root@dhcp35-45 ~]# for i in 11;do VOL=aus;echo $VOL;gluster v create $VOL-$i rep 3 10.70.35.45:/rhs/brick$i/$VOL-$i 10.70.35.130:/rhs/brick$i/$VOL-$i 10.70.35.122:/rhs/brick$i/$VOL-$i;done
    aus
    volume create: aus-11: success: please start the volume to access data
    [root@dhcp35-45 ~]# gluster v start aus-11
    volume start: aus-11: success
    [root@dhcp35-45 ~]# gluster v status aus-11
    Status of volume: aus-11
    Gluster process                             TCP Port  RDMA Port  Online  Pid
    ------------------------------------------------------------------------------
    Brick 10.70.35.45:/rhs/brick11/aus-11       49152     0          Y       31813
    Brick 10.70.35.130:/rhs/brick11/aus-11      49153     0          Y       5451
    Brick 10.70.35.122:/rhs/brick11/aus-11      49152     0          Y       3650
    Self-heal Daemon on localhost               N/A       N/A        Y       1494
    Self-heal Daemon on 10.70.35.23             N/A       N/A        N       N/A
    Self-heal Daemon on 10.70.35.130            N/A       N/A        N       N/A
    Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       4741

    Task Status of Volume aus-11
    ------------------------------------------------------------------------------
    There are no active volume tasks

Old volumes:

    Status of volume: aus-9
    Gluster process                             TCP Port  RDMA Port  Online  Pid
    ------------------------------------------------------------------------------
    Brick 10.70.35.45:/rhs/brick9/aus-9         49152     0          Y       31813
    Brick 10.70.35.130:/rhs/brick9/aus-9        49152     0          Y       3907
    Brick 10.70.35.122:/rhs/brick9/aus-9        49152     0          Y       3650
    Self-heal Daemon on localhost               N/A       N/A        Y       382
    Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       32740
    Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       4458
    Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       5199

    Task Status of Volume aus-9
    ------------------------------------------------------------------------------
    There are no active volume tasks

    [root@dhcp35-45 ~]#

So our initial RCA was wrong, as we were suspecting this to be a pidfile issue.

I have an easier reproducer (a scripted sketch of these steps follows at the end of this report):
1. Enable brick mux.
2. Create two volumes and start them.
3. Restart glusterd.
4. Create a 3rd volume and start it. Here the brick for the 3rd volume will pick up a new pid, but following that any new bricks will continue to get attached to the brick process which was spawned post glusterd restart.

In short, after a glusterd restart the next volume that gets created gets a new pid for its brick, and only from then onwards does brick mux start working again. The issue is that post restart the flag (brickinfo->started_here), with which we were finding the compatible brick, is lost and is never set to true when glusterd only has to connect to a brick process that is already running.

Upstream patch: https://review.gluster.org/#/c/17307/
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/106803/

Validation on 3.8.4-27: retried the steps mentioned when this bug was raised (in the summary) and also the steps in comment#12. I don't see the problem anymore; hence moving to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
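As referenced in the reproducer above, here is a minimal shell sketch of those four steps. It is not taken from the bug report: the volume names (testvol1..testvol3), the brick paths under /bricks, and the single-brick plain-distribute layout are illustrative assumptions so the script can run on one node; only the gluster and systemctl commands themselves are standard.

```bash
#!/bin/bash
# Hedged sketch of the "easier reproducer" described above. Volume names,
# brick paths and the single-brick layout are illustrative assumptions;
# the bug comment only lists the steps.
set -e
HOST=$(hostname)

# 1. Enable brick multiplexing (cluster-wide option).
gluster volume set all cluster.brick-multiplex on

# 2. Create two volumes and start them.
for v in testvol1 testvol2; do
    mkdir -p /bricks/$v
    gluster volume create $v ${HOST}:/bricks/$v force
    gluster volume start $v
done
echo "brick PIDs before glusterd restart:"; pgrep -x glusterfsd

# 3. Restart glusterd (the bricks themselves keep running).
systemctl restart glusterd
sleep 5

# 4. Create a third volume and start it. Per the bug, its brick comes up
#    with a new PID instead of attaching to the already-running brick
#    process, and multiplexing only resumes for volumes created after it.
mkdir -p /bricks/testvol3
gluster volume create testvol3 ${HOST}:/bricks/testvol3 force
gluster volume start testvol3
echo "brick PIDs after creating testvol3:"; pgrep -x glusterfsd
```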