Description of problem:
Currently, systemd is used to manage glusterd, but after the initial start, it does not ensure glusterd continues to run. Within limits, systemd should attempt to restart glusterd if it crashes in order to better handle transient failures.

Version-Release number of selected component (if applicable):
glusterfs-fuse-3.12.2-25.el7rhgs.x86_64
python2-gluster-3.12.2-25.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-libs-3.12.2-25.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-25.el7rhgs.x86_64
glusterfs-cli-3.12.2-25.el7rhgs.x86_64
glusterfs-api-3.12.2-25.el7rhgs.x86_64
glusterfs-3.12.2-25.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
glusterfs-server-3.12.2-25.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
pcp-pmda-gluster-4.3.0-0.201812061439.git24488c63.el7.x86_64
glusterfs-geo-replication-3.12.2-25.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
glusterfs-rdma-3.12.2-25.el7rhgs.x86_64

How reproducible:
100%. If glusterd crashes, it stays down.

Steps to Reproduce:
1. Encounter glusterd SEGV
2. Observe the lack of restart

Actual results:
glusterd is not automatically restarted on failure.

Expected results:
For occasional crashes, we should use systemd to restart glusterd.

Additional info:
This request comes from my experience maintaining openshift.io. We encounter periodic crashes of glusterd, usually due to monitoring operations. In order to have automatic recovery from these crashes, I have adjusted the unit file as follows.

In the [Service] section, I have added:

    StartLimitBurst=3
    StartLimitIntervalSec=3600
    StartLimitInterval=3600
    Restart=on-abnormal
    RestartSec=60

The above causes systemd to automatically restart glusterd if it crashes. It will restart up to 3 times over a 1 hour period. This has the effect of masking the occasional failure, but will leave the daemon down if failures exceed the threshold (at which point other monitoring will raise an alert).
We should consider incorporating the above (or a variant thereof) into the standard distribution.
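
To make the threshold behaviour described above concrete, here is a minimal Python sketch of a start rate limit with the same numbers (3 starts per 3600 seconds, 60 second restart delay). It is only a simplified model for illustration, not systemd's implementation: the names `StartLimiter` and `simulate` are hypothetical, the window is assumed to open at the first counted start, and the initial manual start of the daemon is not counted.

```python
class StartLimiter:
    """Simplified model: allow at most `burst` starts per `interval_sec` window.

    Assumption: the window opens at the first counted start and is reset
    once it has expired. This is an illustration, not systemd's code.
    """

    def __init__(self, burst=3, interval_sec=3600):
        self.burst = burst
        self.interval_sec = interval_sec
        self.window_start = None  # time of the first start in the current window
        self.count = 0            # starts counted in the current window

    def allow_start(self, now):
        # Open a fresh window on the first start or after the old one expired.
        if self.window_start is None or now - self.window_start > self.interval_sec:
            self.window_start = now
            self.count = 1
            return True
        # Still inside the window: allow only while under the burst limit.
        if self.count < self.burst:
            self.count += 1
            return True
        return False


def simulate(crash_times, burst=3, interval_sec=3600, restart_sec=60):
    """Return, per crash time (in seconds), whether a restart would be allowed."""
    limiter = StartLimiter(burst, interval_sec)
    # Each restart is attempted `restart_sec` seconds after the crash.
    return [limiter.allow_start(t + restart_sec) for t in crash_times]


if __name__ == "__main__":
    # Four crashes ten minutes apart: the fourth restart is refused.
    print(simulate([0, 600, 1200, 1800]))
    # Crashes more than an hour apart: every restart is allowed.
    print(simulate([0, 4000, 8000, 12000]))
```

In this model, a burst of crashes inside one hour leaves the daemon down after the third restart, while widely spaced crashes are always recovered, which matches the intent of masking occasional failures but surfacing repeated ones.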
REVIEW: https://review.gluster.org/23751 (glusterd: start glusterd automatically on abnormal shutdown) posted (#1) for review on master by Sanju Rakonde
REVIEW: https://review.gluster.org/23751 (glusterd: start glusterd automatically on abnormal shutdown) merged (#2) on master by MOHIT AGRAWAL