Description of problem:
When gluster starts up after a reboot, sometimes self-heal daemon crashes. Result is that volumes don't heal until manual intervention to restart shd.

Version-Release number of selected component (if applicable):
rhgs 3.3.1

$ rpm -aq | grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-cli-3.8.4-54.10.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.10.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.10.el7rhgs.x86_64
glusterfs-api-3.8.4-54.10.el7rhgs.x86_64
python-gluster-3.8.4-54.10.el7rhgs.noarch
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
pcp-pmda-gluster-4.1.0-0.201805281909.git68ab4b18.el7.x86_64
glusterfs-libs-3.8.4-54.10.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.10.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.5.x86_64
glusterfs-3.8.4-54.10.el7rhgs.x86_64
glusterfs-server-3.8.4-54.10.el7rhgs.x86_64
glusterfs-rdma-3.8.4-54.10.el7rhgs.x86_64

How reproducible:
Happens approximately 10% of the time on reboot

Steps to Reproduce:
1. Stop glusterd, bricks, and mounts as per admin guide
2. shutdown -r now
3. check gluster vol status post reboot

Actual results:
Approx 10% of the time, self-heal daemon will not be running, and the pid will be NA in gluster vol status

Expected results:
shd should start up and run properly after reboot

Additional info:
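For anyone who wants to automate step 3, here is a minimal sketch of a post-reboot check. It assumes the usual tabular layout of `gluster volume status`, where each shd row starts with "Self-heal Daemon on <host>" and ends with the Online and Pid columns; that layout may differ between releases, so treat the parsing as an assumption. The function names `shd_down_nodes` and `check_volume` are made up for illustration.

```python
import subprocess


def shd_down_nodes(status_text):
    """Return the hosts whose Self-heal Daemon row looks down.

    Assumed row layout (not verified against every release):
    Self-heal Daemon on <host>  <TCP port>  <RDMA port>  <Online>  <Pid>
    """
    down = []
    for raw in status_text.splitlines():
        line = raw.strip()
        if not line.startswith("Self-heal Daemon on "):
            continue
        fields = line.split()
        if len(fields) < 6:
            # Row is shorter than the assumed layout; skip rather than guess.
            continue
        host, online, pid = fields[3], fields[-2], fields[-1]
        # A crashed shd shows up as Online != Y and/or a non-numeric pid.
        if online != "Y" or not pid.isdigit():
            down.append(host)
    return down


def check_volume(volname):
    """Run `gluster volume status <volname>` and report hosts with shd down."""
    result = subprocess.run(
        ["gluster", "volume", "status", volname],
        capture_output=True, text=True, check=True,
    )
    return shd_down_nodes(result.stdout)
```

Calling `check_volume("<volname>")` on a node after reboot returns an empty list when every shd row shows Online "Y" with a numeric pid; a non-empty list names the hosts that need shd restarted.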
Upstream patch: https://review.gluster.org/20422
Test version: 3.12.2-14

tc#1 polarion RHG3-13523 --> PASS
1. Create a replica 3 volume and start it.
2. Run `while true; do gluster volume heal <volname>; sleep 0.5; done` in one terminal.
3. In another terminal, keep running `service glusterd restart`.

I was seeing the crash frequently before the fix; with the fix, I did not see the problem after running the test for an hour, hence moving to verified.

However, note that I hit other issues, for which bugs have been reported:
BZ#1608352 - brick (glusterfsd) crashed at in pl_trace_flush
BZ#1607888 - backtrace seen in glusterd log when triggering glusterd restart on issuing of index heal (TC#RHG3-13523)

I also retried the steps in the description and didn't hit the shd crash.
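The two-terminal procedure above could also be driven from a single script. This is only a rough sketch: the function name `stress`, the injectable `run` callable, and the pause between glusterd restarts are assumptions added for illustration (the test above just says to keep restarting). The heal loop mirrors the 0.5 s sleep from step 2.

```python
import subprocess
import threading
import time


def _run(cmd):
    # Failures are expected while glusterd is restarting, so only the
    # return code is reported and no exception is raised.
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def stress(volname, duration_s, run=_run,
           heal_interval_s=0.5, restart_interval_s=5.0):
    """Issue index heals in a background thread while restarting glusterd."""
    stop = threading.Event()
    counts = {"heal": 0, "restart": 0}

    def heal_loop():
        # Equivalent of: while true; do gluster volume heal <volname>; sleep 0.5; done
        while not stop.is_set():
            run(["gluster", "volume", "heal", volname])
            counts["heal"] += 1
            stop.wait(heal_interval_s)

    healer = threading.Thread(target=heal_loop, daemon=True)
    healer.start()

    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        run(["service", "glusterd", "restart"])
        counts["restart"] += 1
        time.sleep(restart_interval_s)  # assumed pause, not from the test case

    stop.set()
    healer.join()
    return counts
```

`stress("<volname>", 3600)` would approximate the one-hour run; checking for an shd core or a missing shd pid afterwards is left to the tester.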
Doc text looks good to me.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607
*** Bug 1519105 has been marked as a duplicate of this bug. ***