Bug 1530217
| Summary: | Brick multiplexing: glustershd fails to start on a volume force start after a brick is down | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> |
| Component: | glusterd | Assignee: | Atin Mukherjee <amukherj> |
| Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.3 | CC: | amukherj, mchangir, nchilaka, rcyriac, rhs-bugs, storage-qa-internal, vbellur |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | RHGS 3.3.1 Async | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.8.4-52.4 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| Cloned To: | 1530281, 1530325 (view as bug list) | Environment: | |
| Last Closed: | 2018-01-11 02:47:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1530281, 1530325, 1530448, 1530449, 1530450 | | |
Description
Nag Pavan Chilakam
2018-01-02 09:43:09 UTC
I have also seen this problem in the scenario below, too.

Brick mux: have many volumes on a brick-mux setup and reboot a node. I have seen a couple of times that shd does not start on one of the nodes at the end, e.g. on 10.70.47.44:

```
[root@dhcp47-44 glusterfs]# gluster v status vol_1-1
Status of volume: vol_1-1
Gluster process                                                  TCP Port  RDMA Port  Online  Pid
-------------------------------------------------------------------------------------------------
Brick dhcp46-8.lab.eng.blr.redhat.com:/gluster/brick1/vol_1-1    49152     0          Y       3025
Brick dhcp46-44.lab.eng.blr.redhat.com:/gluster/brick1/vol_1-1   49152     0          Y       8737
Brick dhcp46-53.lab.eng.blr.redhat.com:/gluster/brick1/vol_1-1   49152     0          Y       9846
Brick dhcp46-117.lab.eng.blr.redhat.com:/gluster/brick1/vol_1-1  49152     0          Y       8731
Brick dhcp47-44.lab.eng.blr.redhat.com:/gluster/brick1/vol_1-1   49152     0          Y       2930
Brick dhcp47-181.lab.eng.blr.redhat.com:/gluster/brick1/vol_1-1  49152     0          Y       9521
Self-heal Daemon on localhost                                    N/A       N/A        N       N/A
Self-heal Daemon on dhcp46-8.lab.eng.blr.redhat.com              N/A       N/A        Y       3016
Self-heal Daemon on 10.70.46.53                                  N/A       N/A        Y       26092
Self-heal Daemon on 10.70.46.44                                  N/A       N/A        Y       24993
Self-heal Daemon on 10.70.47.181                                 N/A       N/A        Y       25826
Self-heal Daemon on 10.70.46.117                                 N/A       N/A        Y       24138

Task Status of Volume vol_1-1
------------------------------------------------------------------------------
There are no active volume tasks
```

Also refer to bug 1530320 - Brick Multiplexing: brick still down in heal info context (glfs) even though brick is online.

Giving qa_ack as this would be hit by customers if not fixed, and it is critical.

On a fresh setup, I killed a brick and took a statedump; the pmap entries are cleaned up as expected.
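The shd check above can be scripted as a small sketch. This is not from the bug report; the parsing assumes the single-line "Self-heal Daemon on <host> <TCP> <RDMA> <Online> <Pid>" layout shown above, and real CLI output may wrap long hostnames across lines.

```shell
# Flag nodes whose self-heal daemon shows Online=N in `gluster v status` output.
shd_offline() {
    # The Online column is the next-to-last field; the host is field 4.
    awk '/^Self-heal Daemon on/ { if ($(NF-1) == "N") print $4 }'
}

# Typical use on a node (volume name is an example):
#   gluster v status vol_1-1 | shd_offline
```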
Also, I have tried the steps in the description 5 times and did not hit this issue, hence moving to verified on glusterfs-3.8.4-52.4.

Before brick kill:

```
glusterd.pmap_port=49152
glusterd.pmap[49152].type=4
glusterd.pmap[49152].brickname=/gluster/brick1/ecvol_1-1 /gluster/brick2/ecvol_1-2 /gluster/brick3/ecvol_1-3 /gluster/brick1/ecvol_2-1 /gluster/brick2/ecvol_2-2 /gluster/brick3/ecvol_2-3 /gluster/brick1/ecvol_3-1 /gluster/brick2/ecvol_3-2 /gluster/brick3/ecvol_3-3
glusterd.client1.identifier=10.70.46.44:1023
```

After brick kill:

```
glusterd.pmap_port=49152
glusterd.pmap[49152].type=0
glusterd.pmap[49152].brickname=(null)
```

However, there were stale entries and a few inconsistent behaviors with port mapping; I will raise new bugs accordingly, e.g. 1531452 - Brick Multiplexing: Stale glusterd portmapping entries to glusterfsd getting created sometimes on glusterd restart. Brick mux also breaks by spawning 2 glusterfsd processes instead of one, which I feel we can live with for now (and a bug is already reported for it).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0083
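The before/after statedump comparison above can be sketched as a small check. This is not from the bug report: the `glusterd.pmap[...]` key names follow the dump excerpts above, while the SIGUSR1 trigger and the `/var/run/gluster` dump path are assumptions about how the dump is produced.

```shell
# Decide from glusterd statedump text (on stdin) whether the portmap entry
# for a given brick port was cleaned up: prints "clean" if brickname was
# reset to (null), "stale" if it still lists bricks.
pmap_state() {  # usage: pmap_state <port>  < statedump
    awk -v p="$1" -F'=' '
        BEGIN { key = "glusterd.pmap[" p "].brickname" }
        $1 == key { print ($2 == "(null)" ? "clean" : "stale") }'
}

# Typical use (dump filename varies by pid/timestamp; assumed path):
#   kill -SIGUSR1 "$(pidof glusterd)"
#   pmap_state 49152 < /var/run/gluster/glusterdump.<pid>.dump.<timestamp>
```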