1663557 – RFE: systemd should restart glusterd on crash

Summary: RFE: systemd should restart glusterd on crash

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterd
Sub Component:
Version:	rhgs-3.4
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.5.z Batch Update 3
Assignee:	Srijan Sivakumar
QA Contact:	milind
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1776264

Reported:	2019-01-04 20:46 UTC by John Strunk
Modified:	2020-12-17 04:50 UTC
CC List:	10 users
Fixed In Version:	glusterfs-6.0-38
Doc Type:	Enhancement
Doc Text:	With this update, systemd automatically restarts the glusterd service when it crashes, up to a limit of 6 restarts per hour.
Clone Of:
Clones:	1776264
Environment:
Last Closed:	2020-12-17 04:50:16 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2020:5603	0	None	None	None	2020-12-17 04:50:47 UTC

Description John Strunk 2019-01-04 20:46:51 UTC

Description of problem:
Currently, systemd is used to manage glusterd, but after the initial start, it does not ensure glusterd continues to run. Within limits, systemd should attempt to restart glusterd if it crashes in order to better handle transient failures.

Version-Release number of selected component (if applicable):
glusterfs-fuse-3.12.2-25.el7rhgs.x86_64
python2-gluster-3.12.2-25.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-libs-3.12.2-25.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-25.el7rhgs.x86_64
glusterfs-cli-3.12.2-25.el7rhgs.x86_64
glusterfs-api-3.12.2-25.el7rhgs.x86_64
glusterfs-3.12.2-25.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
glusterfs-server-3.12.2-25.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
pcp-pmda-gluster-4.3.0-0.201812061439.git24488c63.el7.x86_64
glusterfs-geo-replication-3.12.2-25.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
glusterfs-rdma-3.12.2-25.el7rhgs.x86_64

How reproducible:
100%... if glusterd crashes, it stays down.

Steps to Reproduce:
1. Encounter glusterd SEGV
2. Observe the lack of restart

Actual results:
Glusterd is not automatically restarted on failure

Expected results:
For occasional crashes, we should use systemd to restart glusterd

Additional info:
This request comes from my experience maintaining openshift.io. We encounter periodic crashes of glusterd, usually due to monitoring operations. To recover automatically from these crashes, I have adjusted the unit file as follows...
In the [Service] section, I have added:

StartLimitBurst=3
StartLimitIntervalSec=3600
StartLimitInterval=3600
Restart=on-abnormal
RestartSec=60

The above causes systemd to automatically restart glusterd if it crashes. It will restart up to 3 times over a 1 hour period. This has the effect of masking the occasional failure, but will leave the daemon down if failures exceed the threshold (at which point other monitoring will raise an alert).

We should consider incorporating the above (or a variant thereof) into the standard distribution.
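
As a rough illustration of the override described above, the same settings could be carried in a systemd drop-in instead of editing the packaged unit file. This is only a sketch: the drop-in path and file name are assumptions, the values simply mirror the ones listed above, and it is not the patch that was eventually merged. Newer systemd releases document the start-limit settings under [Unit] rather than [Service], so check systemd.unit(5) for the version in use.

```ini
# Hypothetical drop-in: /etc/systemd/system/glusterd.service.d/restart.conf
# (path and file name are assumptions; values mirror the report above)
[Service]
# Restart only on abnormal exits (signals such as SEGV, timeouts, watchdog)
Restart=on-abnormal
# Wait 60 seconds before each restart attempt
RestartSec=60
# Allow at most 3 starts within the interval below
StartLimitBurst=3
# Both spellings of the 1-hour interval are kept, as in the report,
# presumably so that older and newer systemd versions each see one they know
StartLimitIntervalSec=3600
StartLimitInterval=3600
```

After adding or changing a drop-in, `systemctl daemon-reload` is needed for systemd to pick it up.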

Comment 2 Yaniv Kaul 2019-06-13 09:16:51 UTC

Sounds easy to implement. Any status update?

Comment 3 Yaniv Kaul 2019-07-01 06:21:30 UTC

Ping?

Comment 5 Sanju 2019-11-25 12:00:51 UTC

upstream patch: https://review.gluster.org/#/c/glusterfs/+/23751

Comment 6 Sanju 2020-01-07 10:09:58 UTC

https://review.gluster.org/#/c/glusterfs/+/23970/ changes the number of times glusterd restarts on its own after a crash. The patch mentioned in comment 5 set the original limit, but per https://bugzilla.redhat.com/show_bug.cgi?id=1782200#c6 we are changing it to 6.

Thanks,
Sanju

Comment 17 errata-xmlrpc 2020-12-17 04:50:16 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5603
