Bug 1092183 - GlusterFS and Systemd - start/stop is broken
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.5.0
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-04-28 23:00 UTC by Alexander Murashkin
Modified: 2016-06-17 15:57 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-17 15:57:47 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Possible glustershd.service file (666 bytes, text/plain)
2014-04-28 23:00 UTC, Alexander Murashkin

Description Alexander Murashkin 2014-04-28 23:00:37 UTC
Created attachment 890616 [details]
Possible glustershd.service file

Description of problem:

Specific problem - there is no systemd way to stop (and reload?) the glusterfs self-heal daemon and the NFS-related daemons.
General problem  - the gluster*.service units are broken.

Bug 1022542 is related.

Version-Release number of selected component (if applicable):

glusterfs-server-3.5.0-2.fc20.x86_64
glusterfs-3.5.0-2.fc20.x86_64

Even after all gluster (systemd) services are stopped, there are still gluster processes running (glustershd and rpc.statd). For glustershd, see the attached service file that could be added.
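The attachment itself is not reproduced here; below is a minimal sketch of what such a glustershd unit might look like. The ExecStart command line and pid-file path are taken from the process listing in the reproduction steps; everything else (unit name, dependencies, Type) is an assumption.

```ini
# Sketch of a possible glustershd.service - NOT the attached file
[Unit]
Description=GlusterFS self-heal daemon
Requires=glusterd.service
After=glusterd.service

[Service]
Type=forking
PIDFile=/var/lib/glusterd/glustershd/run/glustershd.pid
ExecStart=/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid

[Install]
WantedBy=multi-user.target
```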

In general, the gluster <-> systemd integration is broken:
- There is no way to stop all previously started processes.
- There are multiple systemd services, but they do not own the gluster processes.
- If a process is stopped, there is no way to start it again.

A long-term solution would be to implement the gluster <-> systemd integration in such a way that
- systemd services match and own the corresponding gluster process sets, so each process set can be stopped/started/reloaded individually
- gluster, possibly if configured, uses the systemd API instead of launching processes directly
- the gluster utility can start/stop/reload gluster processes.

See also: 

http://blog.nixpanic.net/2013/12/gluster-and-not-restarting-brick.html

How reproducible:

Steps to Reproduce (specific problem - not all processes are stopped):

1. Start GlusterFS server daemons on two servers

    systemctl start glusterd.service
    systemctl start glusterfsd.service

2. Check processes

# ps -ef | grep -E 'gluste[r]|stat[d]'
root     14168     1  0 00:07 ?        00:00:57 /usr/sbin/glusterd -p /run/glusterd.pid
root     14289     1  0 00:09 ?        00:00:00 /usr/sbin/glusterfsd -s xxx --volfile-id yyy ...
root     14307     1  0 00:09 ?        00:00:55 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid ...
rpcuser  32599     1  0 00:09 ?        00:00:00 /sbin/rpc.statd

3. Stop GlusterFS server daemons in this order

    systemctl stop glusterfsd.service
    systemctl stop glusterd.service

4. Check processes

# ps -ef | grep -E 'gluste[r]|stat[d]'
root     14307     1  0 00:09 ?        00:00:55 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid ...
rpcuser  32599     1  0 00:09 ?        00:00:00 /sbin/rpc.statd

5. Cleanup - kill glusterfs and statd processes

    killall --wait glusterfs
    killall --wait rpc.statd
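A small helper along these lines (a sketch; not shipped with gluster, and the function name is hypothetical) can confirm that the daemons are really gone after the cleanup:

```shell
# Poll until no process with the given name is left (or tries run out).
wait_gone() {
    name=$1
    tries=${2:-10}
    while pgrep -x "$name" >/dev/null 2>&1 && [ "$tries" -gt 0 ]; do
        sleep 1
        tries=$((tries - 1))
    done
    # succeed only when nothing matches any more
    ! pgrep -x "$name" >/dev/null 2>&1
}

# Usage:
# wait_gone glusterfs && wait_gone rpc.statd && echo "cleanup done"
```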

Actual results:

Even after all gluster (systemd) services are stopped, there are still gluster processes running (glustershd and rpc.statd).

Expected results:

No such processes.

Additional info:

*** There are multiple systemd services but they do not own gluster processes. 

Running systemctl status glusterd.service shows that glusterd.service owns all processes

  CGroup: /system.slice/glusterd.service
           ├─1294 /usr/sbin/glusterd -p /run/glusterd.pid
           ├─1314 /usr/sbin/glusterfsd -s xxx --volfile-id yyy...
           ├─1324 /usr/sbin/glusterfs -s localhost --volfile-id .../glustershd
           └─1325 /sbin/rpc.statd 
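One way to check which unit a stray process ended up under is to read its /proc/<pid>/cgroup entry. A small helper that extracts the unit name from a cgroup-v1 line of the kind used on these Fedora systems (a sketch; the exact line format is an assumption about the local cgroup layout):

```shell
# Extract the owning unit name from a cgroup line such as
# "1:name=systemd:/system.slice/glusterd.service".
unit_from_cgroup_line() {
    # prints the trailing *.service component, or nothing if there is none
    printf '%s\n' "$1" | sed -n 's#.*/\([^/]*\.service\)$#\1#p'
}

# Example: inspect the first cgroup line of a pid
# unit_from_cgroup_line "$(head -n 1 /proc/14307/cgroup)"
```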

After stopping glusterd.service, its systemd status shows everything is OK

 Main PID: 1294 (code=exited, status=0/SUCCESS)

but glusterfsd, glustershd, and rpc.statd are still running.

glusterfsd.service does not own any glusterfsd processes, and its status shows nothing useful

  Process: 1634 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 1634 (code=exited, status=0/SUCCESS)

but stopping glusterfsd.service will kill processes that it does not own.

*** Want to know where the glusterfsd-related systemd messages went? They are not in the glusterfsd.service status output.

*** Now that glusterfsd.service is stopped, how do we start it again?

A possible midterm solution to the glusterfs <-> systemd mismatch is to have
- only one systemd service that starts/stops/reloads all processes
- a control tool that can start/stop/reload individual processes (a script or an add-on to the gluster tool)

The control tool can be used, for example, during glusterfs software updates.
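As an illustration of what such a control tool could check, a pid-file-based status probe might look like this (a sketch; the function name is hypothetical, and the pid-file path in the example is the one from the process listing above):

```shell
# Report whether the daemon behind a pid file is alive.
gluster_daemon_status() {
    pidfile=$1
    # kill -0 only checks for existence, it sends no signal
    if [ -r "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo running
    else
        echo stopped
    fi
}

# Example:
# gluster_daemon_status /var/lib/glusterd/glustershd/run/glustershd.pid
```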

Comment 1 Niels de Vos 2014-09-17 21:40:21 UTC
(In reply to Alexander Murashkin from comment #0)
...
> Long term solution will be implementing gluster <-> systemd in such a way
> that
> - systemd services match and own correspondent gluster process set so each
> process set can be individually stopped/started/reloaded
> - gluster, possibly if configured, using systemd API instead of direct
> process launching
> - gluster utility can start/stop/reload gluster processes. 

Indeed, that should be a long-term goal. I do not think we can easily fix this otherwise.

Comment 2 Niels de Vos 2016-06-17 15:57:47 UTC
This bug is being closed because the 3.5 release is marked End-Of-Life. There will be no further updates for this version. If you are still facing this issue in a more current release, please open a new bug against a version that still receives bug fixes.

