Bug 1593865 - shd crash on startup
Summary: shd crash on startup
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: rhgs-3.3
Hardware: x86_64
OS: Linux
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Ravishankar N
QA Contact: nchilaka
Duplicates: 1519105
Depends On: 1596513 1597229 1597230
Blocks: 1503137 1582526 1597663 1598340
Reported: 2018-06-21 17:27 UTC by John Strunk
Modified: 2018-10-22 06:04 UTC
CC List: 9 users

Fixed In Version: glusterfs-3.12.2-14
Doc Type: Bug Fix
Doc Text:
glusterd can send heal-related requests to the self-heal daemon before the latter's graph is fully initialized. In this case, the self-heal daemon used to crash when trying to access certain data structures. With the fix, if the self-heal daemon receives a request before its graph is initialized, it ignores the request.
Clone Of:
Last Closed: 2018-09-04 06:49:14 UTC
Target Upstream Version:


System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1595752 None CLOSED [GSS] Core dump getting created inside gluster pods 2019-04-09 06:16:46 UTC
Red Hat Product Errata RHSA-2018:2607 None None None 2018-09-04 06:50:42 UTC

Internal Links: 1595752

Description John Strunk 2018-06-21 17:27:20 UTC
Description of problem:
When gluster starts up after a reboot, the self-heal daemon sometimes crashes. The result is that volumes do not heal until shd is manually restarted.

Version-Release number of selected component (if applicable):
rhgs 3.3.1

$ rpm -aq | grep gluster

How reproducible:
Happens approximately 10% of the time on reboot

Steps to Reproduce:
1. Stop glusterd, bricks, and mounts as per admin guide
2. shutdown -r now
3. check gluster vol status post reboot

Actual results:
Approximately 10% of the time, the self-heal daemon will not be running, and its PID will show as N/A in gluster vol status.

Expected results:
shd should start up and run properly after reboot

Additional info:

Comment 5 Atin Mukherjee 2018-07-02 04:01:28 UTC
Upstream patch: https://review.gluster.org/20422

Comment 11 nchilaka 2018-07-25 15:00:15 UTC

TC#1 Polarion RHG3-13523 --> PASS
1. Create a replica 3 volume and start it.
2. Run `while true; do gluster volume heal <volname>; sleep 0.5; done` in one terminal.
3. In another terminal, keep running `service glusterd restart`.

I saw the crash frequently before the fix, but with the fix I did not see the problem after running the test for an hour.

Hence, moving to verified.

However, note that I hit other issues, for which bugs have been reported:
BZ#1608352 - brick (glusterfsd) crashed in pl_trace_flush
BZ#1607888 - backtrace seen in glusterd log when triggering glusterd restart on issuing of index heal (TC#RHG3-13523)

I also retried the steps in the description and did not hit the shd crash.

Comment 14 Srijita Mukherjee 2018-09-03 13:34:06 UTC
Doc text looks good to me.

Comment 15 errata-xmlrpc 2018-09-04 06:49:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Comment 16 Ravishankar N 2018-10-22 06:04:48 UTC
*** Bug 1519105 has been marked as a duplicate of this bug. ***
