Bug 1022542
| Summary: | glusterfsd stop command does not stop bricks | | |
| --- | --- | --- | --- |
| Product: | [Fedora] Fedora | Reporter: | Michael Cronenworth <mike> |
| Component: | glusterfs | Assignee: | Niels de Vos <ndevos> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 19 | CC: | barumuga, Bert.Deknuydt, bugzilla, cfeller, gluster-bugs, joe, jonathansteffan, kkeithle, mike, ndevos, pasik, silas |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.4.1-3.fc19 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-12-14 03:05:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Michael Cronenworth, 2013-10-23 13:40:44 UTC)

---

This is intentional behavior. Many of our users want to stop glusterd, e.g. for an upgrade, but leave the glusterfsd (brick server) processes running. If you have any questions, ping JoeJulian in #gluster (on freenode.net). He's a leading proponent of the current behavior.

---

Reopening per: https://lists.fedoraproject.org/pipermail/devel/2013-October/thread.html

`systemctl stop glusterfsd` does not stop bricks.

---

Just following up since I'm not on that mailing list: in my own configurations, I have 60 bricks per server. It is a severe issue to have a package update restart all 60 bricks simultaneously. This isn't a problem on init.d-based distros, since the glusterfsd stop script runs properly at shutdown, which is the only time you'd really want to stop the service in an automated way (IMHO).

---

I agree that glusterfsd.service needs to stop correctly at shutdown (or when stopped manually) under systemd, of course.

---

I think stopping the glusterfsd processes works correctly with this glusterfsd.service file:

```ini
# /etc/systemd/system/glusterfsd.service
[Unit]
Description=GlusterFS brick processes (stopping only)
After=glusterd.service

[Service]
Type=oneshot
RemainsAfterExit=yes
ExecStart=/bin/true
ExecStop=/bin/sh -c "/bin/killall --wait glusterfsd || /bin/true"
ExecReload=/bin/killall -HUP glusterfsd

[Install]
WantedBy=multi-user.target
```

Usage:
- save as /etc/systemd/system/glusterfsd.service
- update systemd with the new configuration: `# systemctl daemon-reload`
- stop the brick processes: `# systemctl stop glusterfsd.service`
- restart glusterd: `# systemctl restart glusterd.service`
- enable the glusterfsd.service for stopping again: `# systemctl start glusterfsd.service`

This should work over a reboot as well. Could someone else please verify that? If it works as expected, I'll file a patch for review and inclusion.

---

Good work, but there is one typo.
RemainsAfterExit=yes needs to be RemainAfterExit=yes

You also have to "start" the service first (to make it active) before "stop" will function.

---

(In reply to Michael Cronenworth from comment #5)
> Good work, but there is one typo.
>
> RemainsAfterExit=yes needs to be RemainAfterExit=yes

Ah, thanks! I was already wondering why I did not see a change in the 'systemctl status' output.

This is in fact a Fedora packaging bug. There is no glusterfsd.service in the upstream repository. I'll propose adding that later.

---

glusterfs-3.4.1-3.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/glusterfs-3.4.1-3.fc18

---

glusterfs-3.4.1-3.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/glusterfs-3.4.1-3.fc19

---

glusterfs-3.4.1-3.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/glusterfs-3.4.1-3.fc20

---

(In reply to Joe Julian from comment #3)
> Just following up since I'm not on that mailing list: In my own
> configurations, I have 60 bricks per server. It is a severe issue to have a
> package update restart all 60 bricks simultaneously.

I am not sure I agree with this. Not restarting the glusterfsd processes will keep the previous binaries running. Users who install the update because they want the new features and/or bugfixes will not be aware that they have not 'activated' the new binaries yet. The output of 'rpm -q glusterfs-server' would also not match the version of the binaries that are running... It is very dubious whether these disadvantages are less important than a delay while restarting the bricks.

I only just now noticed that the glusterfs.spec tries (and always did try) to restart the glusterfsd.service. Because the glusterfsd.service was severely broken in the previous releases, that never caused any harm.
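For context, the restart mentioned here comes from the package's RPM scriptlets. The pattern looks roughly like this; this is a sketch following the Fedora packaging guidelines linked below, not necessarily the exact contents of glusterfs.spec:

```
%postun
if [ $1 -ge 1 ] ; then
    # On upgrade (not on erase), restart running services to activate new binaries
    /bin/systemctl try-restart glusterfsd.service >/dev/null 2>&1 || :
    /bin/systemctl try-restart glusterd.service >/dev/null 2>&1 || :
fi
```

Note that `systemctl try-restart` only restarts a unit that is already active, so a stopped or disabled glusterfsd.service is left alone.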
---

Updating glusterfs-3.4.1-3 with a future version will execute this (forcing a restart of the glusterfsd processes):

```
/bin/systemctl try-restart glusterfsd.service
/bin/systemctl try-restart glusterd.service
```

If this is really a blocking issue, speak up again (provide negative karma) and we can pull 3.4.1-3 from the updates-testing repository and provide a 3.4.1-4 that does not do this. Note, however, that all/most daemons do restart themselves after an update, as all the examples of .service and init.d scripts show:
- https://fedoraproject.org/wiki/Packaging:ScriptletSnippets#Systemd
- https://fedoraproject.org/wiki/Packaging:SysVInitScript#Initscripts_in_spec_file_scriptlets

---

Package glusterfs-3.4.1-3.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.

Update it with:
`# su -c 'yum update --enablerepo=updates-testing glusterfs-3.4.1-3.fc20'`
as soon as you are able to. Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-20091/glusterfs-3.4.1-3.fc20
then log in and leave karma (feedback).

---

GlusterFS is different from every other package I can think of in that it provides a clustered service. Updating the glusterfsd binary is unsafe as a potentially automated process (i.e. puppet, yum-cron, etc.) because you cannot ensure that other servers aren't restarting their glusterfsd simultaneously and causing split-brain. Most (if not all) other clustered services are provided by the kernel, whose running binary is not replaced on upgrade (I don't think anyone would stand for a reboot in the rpm scripts). If it must be installed in such a fashion, the only safe way to do this would be to walk the volume list and stop each volume before stopping the service.

---

I think it is unfair to ask for support for upgrading running binaries without disrupting your services.
It certainly isn't something that Fedora has a policy for (the kernel is a special case and isn't an analogy for glusterfs). IMHO you need to adjust your update process at your data center (automatic updating is hazardous if you value your data integrity, as you seem to). Updates can and should be handled as a volatile process. This was the entire reason the Desktop received systemd support to install updates after a reboot, and before system startup, instead of inline.

---

*** Bug 989045 has been marked as a duplicate of this bug. ***

---

I do not think it's unfair, due to the fact that I'm usually the one offering said support. None of the rest of your reply addresses anything in comment 13.

---

Let's skip the fair/unfair rhetoric; it's not helping solve anything.

Gluster predates systemd. The behavior of glusterfsd.service originated with /etc/init.d/glusterfsd.

It's worth investigating the idea of the yum update scheduling the update to be installed during system startup. Perhaps you (Michael) could show us how that would work.

This bug may also be relevant, as it would provide a safer way to prevent split-brain: https://bugzilla.redhat.com/show_bug.cgi?id=872601

---

As discussed on IRC, another option may just be a sysconfig setting to affect the restart behavior.

---

(In reply to Joe Julian from comment #13)
> GlusterFS is different from every other package I can think of in that it
> provides a clustered service. Updating the glusterfsd binary is unsafe as a
> potentially automated process (i.e. puppet, yum-cron, etc.) as you cannot
> ensure that other servers aren't restarting their glusterfsd simultaneously
> and causing split-brain.
One way to solve this with the current systemd units that are available in glusterfs-3.4.1-3 is:

```
# systemctl disable glusterfsd.service
# cp /usr/lib/systemd/system/glusterfsd.service \
    /etc/systemd/system/multi-user.target.wants/glusterfsd-shutdown-only.service
# systemctl daemon-reload
```

If the glusterfsd.service is still active, stopping it will send a 'kill' to the bricks. Upon the next reboot, the default glusterfsd.service will not get started (and so not stopped or restarted on updates), but the copied glusterfsd-shutdown-only.service will be active, and will stop the brick processes on poweroff/reboot.

Joe, is this a solution you can accept and use? If so, please update the karma on https://admin.fedoraproject.org/updates/FEDORA-2013-20091/glusterfs-3.4.1-3.fc20. Maybe you or I should document this in a blog or on the Gluster Wiki?

---

*** Bug 1031640 has been marked as a duplicate of this bug. ***

---

(In reply to Niels de Vos from comment #19)
> Joe, is this a solution you can accept and use? If so, please update the
> karma on
> https://admin.fedoraproject.org/updates/FEDORA-2013-20091/glusterfs-3.4.1-3.fc20.
> Maybe you or I should document this in a blog or on the Gluster Wiki?

http://blog.nixpanic.net/2013/12/gluster-and-not-restarting-brick.html should contain a clear explanation of how to achieve not restarting brick processes on updating.

If there are no objections, I will push this update to stable soon.

---

glusterfs-3.4.1-3.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.

---

glusterfs-3.4.1-3.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.

---

glusterfs-3.4.1-3.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.
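
---

For reference, here is the stop-only unit file proposed earlier in this thread, with the RemainAfterExit typo from comment #5 corrected. This is a consolidated sketch of what was discussed; the unit actually shipped in glusterfs-3.4.1-3 may differ:

```ini
# /etc/systemd/system/glusterfsd.service
[Unit]
Description=GlusterFS brick processes (stopping only)
After=glusterd.service

[Service]
# oneshot + RemainAfterExit keeps the unit "active" after ExecStart exits,
# so that ExecStop runs at shutdown (or on a manual stop)
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/bin/sh -c "/bin/killall --wait glusterfsd || /bin/true"
ExecReload=/bin/killall -HUP glusterfsd

[Install]
WantedBy=multi-user.target
```

Remember that the service must be started once (to make it active) before stopping it will have any effect.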