Bug 1022542
| Summary: | glusterfsd stop command does not stop bricks | | |
| --- | --- | --- | --- |
| Product: | [Fedora] Fedora | Reporter: | Michael Cronenworth <mike> |
| Component: | glusterfs | Assignee: | Niels de Vos <ndevos> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 19 | CC: | barumuga, Bert.Deknuydt, bugzilla, cfeller, gluster-bugs, joe, jonathansteffan, kkeithle, mike, ndevos, pasik, silas |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.4.1-3.fc19 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-12-14 03:05:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Michael Cronenworth, 2013-10-23 13:40:44 UTC)

---

This is intentional behavior. Many of our users want to stop glusterd, e.g. for an upgrade, but leave the glusterfsd (brick server) processes running. If you have any questions, ping JoeJulian in #gluster (on freenode.net). He's a leading proponent of the current behavior.

---

Reopening per: https://lists.fedoraproject.org/pipermail/devel/2013-October/thread.html

`systemctl stop glusterfsd` does not stop bricks.

---

Just following up since I'm not on that mailing list: in my own configurations, I have 60 bricks per server. It is a severe issue to have a package update restart all 60 bricks simultaneously. This isn't a problem on init.d-based distros, since the glusterfsd stop script runs properly at shutdown, which is the only time you'd really want to stop the service in an automated way (IMHO).

---

I agree that glusterfsd.service needs to stop correctly at shutdown (or when stopped manually) under systemd, of course.

---

I think stopping the glusterfsd processes works correctly with this glusterfsd.service file:

```ini
# /etc/systemd/system/glusterfsd.service
[Unit]
Description=GlusterFS brick processes (stopping only)
After=glusterd.service

[Service]
Type=oneshot
RemainsAfterExit=yes
ExecStart=/bin/true
ExecStop=/bin/sh -c "/bin/killall --wait glusterfsd || /bin/true"
ExecReload=/bin/killall -HUP glusterfsd

[Install]
WantedBy=multi-user.target
```

Usage:
- save as /etc/systemd/system/glusterfsd.service
- update systemd with the new configuration: `# systemctl daemon-reload`
- stop the brick processes: `# systemctl stop glusterfsd.service`
- restart glusterd: `# systemctl restart glusterd.service`
- enable the glusterfsd.service for stopping again: `# systemctl start glusterfsd.service`

This should work over a reboot as well. Could someone else please verify that? If it works as expected, I'll file a patch for review and inclusion.

---

Good work, but there is one typo.
RemainsAfterExit=yes needs to be RemainAfterExit=yes

You also have to "start" the service first (to make it active) before "stop" will function.

---

(In reply to Michael Cronenworth from comment #5)
> Good work, but there is one typo.
>
> RemainsAfterExit=yes needs to be RemainAfterExit=yes

Ah, thanks! I was already wondering why I did not see a change in the 'systemctl status' output.

This is in fact a Fedora packaging bug. There is no glusterfsd.service in the upstream repository. I'll propose adding that later.

---

glusterfs-3.4.1-3.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/glusterfs-3.4.1-3.fc18

---

glusterfs-3.4.1-3.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/glusterfs-3.4.1-3.fc19

---

glusterfs-3.4.1-3.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/glusterfs-3.4.1-3.fc20

---

(In reply to Joe Julian from comment #3)
> Just following up since I'm not on that mailing list: In my own
> configurations, I have 60 bricks per server. It is a severe issue to have a
> package update restart all 60 bricks simultaneously.

I am not sure I agree with this. Not restarting the glusterfsd processes will keep the previous binaries running. Users who install the update because they want the new features and/or bugfixes will not be aware that they have not 'activated' the new binaries yet. The output of 'rpm -q glusterfs-server' would also not match the version of the binaries that are running... It is very dubious whether these disadvantages are less important than a delay while restarting the bricks.

I only just now noticed that the glusterfs.spec tries (and always did try) to restart the glusterfsd.service. Because the glusterfsd.service was severely broken in the previous releases, that never caused any harm.
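For context, the restart mentioned here comes from the package's RPM scriptlets. The pattern looks roughly like this; this is a sketch following the Fedora packaging guidelines linked below, not necessarily the exact contents of glusterfs.spec:

```
%postun
if [ $1 -ge 1 ] ; then
    # On upgrade (not on erase), restart running services to activate new binaries
    /bin/systemctl try-restart glusterfsd.service >/dev/null 2>&1 || :
    /bin/systemctl try-restart glusterd.service >/dev/null 2>&1 || :
fi
```

Note that `systemctl try-restart` only restarts a unit that is already active, so a stopped or disabled glusterfsd.service is left alone.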
---

Updating glusterfs-3.4.1-3 with a future version will execute this (forcing a restart of the glusterfsd processes):

```
/bin/systemctl try-restart glusterfsd.service
/bin/systemctl try-restart glusterd.service
```

If this is really a blocking issue, speak up again (provide negative karma) and we can pull 3.4.1-3 from the updates-testing repository and provide a 3.4.1-4 that does not do this. Note, however, that all/most daemons do restart themselves after an update, as all the examples of .service and init.d scripts show:
- https://fedoraproject.org/wiki/Packaging:ScriptletSnippets#Systemd
- https://fedoraproject.org/wiki/Packaging:SysVInitScript#Initscripts_in_spec_file_scriptlets

---

Package glusterfs-3.4.1-3.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.

Update it with:
`# su -c 'yum update --enablerepo=updates-testing glusterfs-3.4.1-3.fc20'`
as soon as you are able to. Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-20091/glusterfs-3.4.1-3.fc20
then log in and leave karma (feedback).

---

GlusterFS is different from every other package I can think of in that it provides a clustered service. Updating the glusterfsd binary is unsafe as a potentially automated process (i.e. puppet, yum-cron, etc.) because you cannot ensure that other servers aren't restarting their glusterfsd simultaneously and causing split-brain. Most (if not all) other clustered services are provided by the kernel, whose running binary is not replaced on upgrade (I don't think anyone would stand for a reboot in the rpm scripts). If it must be installed in such a fashion, the only safe way to do this would be to walk the volume list and stop each volume before stopping the service.

---

I think it is unfair to ask for support for upgrading running binaries without disrupting your services.
It certainly isn't something that Fedora has a policy for (the kernel is a special case and isn't an analogy for glusterfs). IMHO you need to adjust your update process at your data center (automatic updating is hazardous if you value your data integrity, as you seem to). Updates can and should be handled as a volatile process. This was the entire reason the Desktop received systemd support to install updates after a reboot, and before system startup, instead of inline.

---

*** Bug 989045 has been marked as a duplicate of this bug. ***

---

I do not think it's unfair, due to the fact that I'm usually the one offering said support. None of the rest of your reply addresses anything in comment 13.

---

Let's skip the fair/unfair rhetoric; it's not helping solve anything.

Gluster predates systemd. The behavior of glusterfsd.service originated with /etc/init.d/glusterfsd.

It's worth investigating the idea of the yum update scheduling the update to be installed during system startup. Perhaps you (Michael) could show us how that would work.

This bug may also be relevant, as it would provide a safer way to prevent split-brain: https://bugzilla.redhat.com/show_bug.cgi?id=872601

---

As discussed on IRC, another option may just be a sysconfig setting to affect the restart behavior.

---

(In reply to Joe Julian from comment #13)
> GlusterFS is different from every other package I can think of in that it
> provides a clustered service. Updating the glusterfsd binary is unsafe as a
> potentially automated process (i.e. puppet, yum-cron, etc.) as you cannot
> ensure that other servers aren't restarting their glusterfsd simultaneously
> and causing split-brain.
One way to solve this with the current systemd units that are available in glusterfs-3.4.1-3 is:

```
# systemctl disable glusterfsd.service
# cp /usr/lib/systemd/system/glusterfsd.service \
    /etc/systemd/system/multi-user.target.wants/glusterfsd-shutdown-only.service
# systemctl daemon-reload
```

If the glusterfsd.service is still active, stopping it will send a 'kill' to the bricks. Upon the next reboot, the default glusterfsd.service will not get started (and so not stopped or restarted on updates), but the copied glusterfsd-shutdown-only.service will be active, and will stop the brick processes on poweroff/reboot.

Joe, is this a solution you can accept and use? If so, please update the karma on https://admin.fedoraproject.org/updates/FEDORA-2013-20091/glusterfs-3.4.1-3.fc20. Maybe you or I should document this in a blog or on the Gluster Wiki?

---

*** Bug 1031640 has been marked as a duplicate of this bug. ***

---

(In reply to Niels de Vos from comment #19)
> Joe, is this a solution you can accept and use? If so, please update the
> karma on
> https://admin.fedoraproject.org/updates/FEDORA-2013-20091/glusterfs-3.4.1-3.fc20.
> Maybe you or I should document this in a blog or on the Gluster Wiki?

http://blog.nixpanic.net/2013/12/gluster-and-not-restarting-brick.html should contain a clear explanation of how to achieve not restarting brick processes on updating.

If there are no objections, I will push this update to stable soon.

---

glusterfs-3.4.1-3.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.

---

glusterfs-3.4.1-3.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.

---

glusterfs-3.4.1-3.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.
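
---

For reference, here is the stop-only unit file proposed earlier in this thread, with the RemainAfterExit typo from comment #5 corrected. This is a consolidated sketch of what was discussed; the unit actually shipped in glusterfs-3.4.1-3 may differ:

```ini
# /etc/systemd/system/glusterfsd.service
[Unit]
Description=GlusterFS brick processes (stopping only)
After=glusterd.service

[Service]
# oneshot + RemainAfterExit keeps the unit "active" after ExecStart exits,
# so that ExecStop runs at shutdown (or on a manual stop)
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/bin/sh -c "/bin/killall --wait glusterfsd || /bin/true"
ExecReload=/bin/killall -HUP glusterfsd

[Install]
WantedBy=multi-user.target
```

Remember that the service must be started once (to make it active) before stopping it will have any effect.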