Bug 1468514
Summary: | Brick Mux Setup: brick processes(glusterfsd) crash after a restart of volume which was preceded with some actions | | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> |
Component: | core | Assignee: | Mohit Agrawal <moagrawa> |
Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | | |
Version: | rhgs-3.3 | CC: | amukherj, moagrawa, nchilaka, rcyriac, rhs-bugs, storage-qa-internal |
Target Milestone: | --- | | |
Target Release: | RHGS 3.3.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | brick-multiplexing | | |
Fixed In Version: | glusterfs-3.8.4-34 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | | |
: | 1470533 | Environment: | |
Last Closed: | 2017-09-21 05:02:13 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 1470533 | | |
Bug Blocks: | 1417151, 1459400 | | |
Attachments: | | | |
Description
Nag Pavan Chilakam
2017-07-07 10:03:35 UTC
note: b1 (where umount was done and remounted) and b3 couldn't come up as glusterfsd crashed; b2 was up

[root@dhcp35-45 ~]# gluster v info rep3_9
Volume Name: rep3_9
Type: Replicate
Volume ID: 0bb1db31-b6cb-4b51-ace6-c9de4f16adc3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.35.45:/rhs/brick9/rep3_9
Brick2: 10.70.35.130:/rhs/brick9/rep3_9
Brick3: 10.70.35.122:/rhs/brick9/rep3_9
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
cluster.brick-multiplex: on

[root@dhcp35-45 ~]# gluster v status rep3_9
Status of volume: rep3_9
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.45:/rhs/brick9/rep3_9        N/A       N/A        N       N/A
Brick 10.70.35.130:/rhs/brick9/rep3_9       49152     0          Y       9787
Brick 10.70.35.122:/rhs/brick9/rep3_9       N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       29373
Self-heal Daemon on 10.70.35.138            N/A       N/A        Y       24803
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       11239
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       1985
Self-heal Daemon on 10.70.35.112            N/A       N/A        Y       19752
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       30171

Task Status of Volume rep3_9
------------------------------------------------------------------------------
There are no active volume tasks

Created attachment 1295251 [details]
thread bt

logs and sosreports @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1468514
cores available in dedicated server directories

Created attachment 1295255 [details]
cli output
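
The `gluster v status` output in the description can be checked mechanically for bricks that failed to come up. The following is a minimal sketch, not part of the original report: the function name `offline_bricks` is hypothetical, and it assumes each brick row fits on one line with the columns in the order shown in the description (TCP Port, RDMA Port, Online, Pid). Long brick paths that wrap onto a second line would need extra handling.

```python
def offline_bricks(status_text):
    """Return the brick paths whose Online column is 'N'.

    Assumes one-line brick rows of the form:
    Brick <host:path> <tcp port> <rdma port> <online> <pid>
    """
    offline = []
    for line in status_text.splitlines():
        fields = line.split()
        # Only brick rows have exactly six whitespace-separated fields
        # starting with "Brick"; self-heal daemon rows are skipped.
        if len(fields) == 6 and fields[0] == "Brick" and fields[4] == "N":
            offline.append(fields[1])
    return offline


# Sample rows copied from the status output in the description.
SAMPLE = """\
Brick 10.70.35.45:/rhs/brick9/rep3_9 N/A N/A N N/A
Brick 10.70.35.130:/rhs/brick9/rep3_9 49152 0 Y 9787
Brick 10.70.35.122:/rhs/brick9/rep3_9 N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 29373
"""

print(offline_bricks(SAMPLE))
```

On the sample rows this reports the bricks on 10.70.35.45 and 10.70.35.122, matching the note that b1 and b3 did not come up.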
Hi Nag,

A test build is available at the link below:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13623735

Please try it and share the result if the issue still reproduces after applying the patch.

Regards,
Mohit Agrawal

upstream patch: https://review.gluster.org/17767

The downstream patch is at https://code.engineering.redhat.com/gerrit/112271

blocks on_qa validation of BZ#1468514; refer https://bugzilla.redhat.com/show_bug.cgi?id=1459400#c8

(In reply to nchilaka from comment #18)
> blocks on_qa validation of bZ# 1468514
> refer https://bugzilla.redhat.com/show_bug.cgi?id=1459400#c8

Sorry, I meant it blocks BZ#1459400; refer https://bugzilla.redhat.com/show_bug.cgi?id=1459400#c8

Hi Nag,

Thanks for sharing the core dump. This core dump is not similar to the previous one; this time the brick process is crashing in the changetimerecorder xlator. Please file a separate Bugzilla to fix this.

Kindly move this Bugzilla to the verified state as well.

Regards,
Mohit Agrawal

(In reply to Mohit Agrawal from comment #22)
> Hi Nag,
>
> Thanks for sharing the core dump,this core dump is not similar to previous
> core dump, this time brick process is getting crash in changetimerecorder
> xlator,please file a separate bugzilla to fix this.
>
> Kindly this bugzilla update to verified state also.
>
> Regards
> Mohit Agrawal

Raised a new bz#1472129

on_qa validation:
Re-ran the case mentioned in the description: crash not hit. Also re-ran volume restarts about 150 times and didn't hit this crash.
However, hit another crash (all details are in the previous comment, i.e. comment #23), hence moving to verified.

(In reply to nchilaka from comment #24)
> on_qa validation:
> re-ran the case mentioned in description===>crash not hit
> and also reran volume restarts for about 150 times , but didn't hit this
> crash.
> However hit another crash(all details mentioned in previous comment ie
> comment#23), hence moving to verified

test build: 3.8.4-34

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
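
The repeated volume restarts used during on_qa validation could be scripted roughly as follows. This is a sketch under stated assumptions, not the procedure actually used in this bug: the function name `restart_loop`, the `is_healthy` callback, and the `settle_seconds` grace period are all hypothetical, and the caller decides what counts as a healthy `gluster volume status` output. The `run` parameter defaults to `subprocess.run` and exists only so the loop can be exercised without a Gluster cluster.

```python
import subprocess
import time


def restart_loop(volume, iterations, is_healthy,
                 run=subprocess.run, settle_seconds=5):
    """Stop and start `volume` repeatedly.

    Returns the 1-based iteration at which is_healthy() first reported
    a problem, or None if every iteration passed.
    """
    for i in range(1, iterations + 1):
        # --mode=script suppresses the interactive confirmation
        # that "volume stop" otherwise asks for.
        run(["gluster", "--mode=script", "volume", "stop", volume], check=True)
        run(["gluster", "--mode=script", "volume", "start", volume], check=True)
        # Assumed grace period for brick processes to come up.
        time.sleep(settle_seconds)
        status = run(["gluster", "volume", "status", volume],
                     check=True, capture_output=True, text=True)
        if not is_healthy(status.stdout):
            return i
    return None


# Example call on a node of the cluster (not executed here):
#   failed_at = restart_loop("rep3_9", 150, lambda text: " N " not in text)
```

The example health check in the comment is deliberately crude; a stricter check would parse the Online column of each brick row.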