Bug 690121
| Field | Value |
| --- | --- |
| Summary | dmeventd does not react to SIGTERM |
| Product | Fedora |
| Component | lvm2 |
| Version | rawhide |
| Hardware | Unspecified |
| OS | Unspecified |
| Status | CLOSED RAWHIDE |
| Severity | unspecified |
| Priority | unspecified |
| Reporter | Michael Young <m.a.young> |
| Assignee | LVM and device-mapper development team <lvm-team> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | agk, atu, bmarzins, bmr, circular, dwysocha, heinzm, johannbg, jonathan, lpoetter, lvm-team, mbroz, metherid, mschmidt, msnitzer, notting, plautrba, prajnoha, prockai |
| Doc Type | Bug Fix |
| Last Closed | 2011-11-07 14:51:38 UTC |
Description (Michael Young, 2011-03-23 11:53:15 UTC)
We actually have a 3min timeout on everything. Please boot with "systemd.log_level=debug" and "systemd.log_target=kmsg", then paste the last output generated on screen before the system hangs on shutdown (photograph the screen if necessary).

I haven't yet managed to get any useful logging information out, as rsyslog shuts down before the delay happens, and I haven't yet got anything on the screen. Is there any way to leave rsyslog running? Incidentally, I did eventually wait long enough to work out that the system was shutting down eventually, after one or more periods of 3 minutes had passed, but 3 minutes is much too long to wait in most cases: few people will wait that long, particularly if there are multiple 3-minute timeouts. I think it might be lvm2-monitor that isn't shutting down cleanly for some reason, though in a slightly different setup I have problems with libvirt as well.

I have worked out how to reproduce the lvm2-monitor issue. If one of the logical volumes has a snapshot, then `systemctl list-units | grep lvm` reports

```
lvm2-monitor.service loaded active running LSB: Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
```

and there is a (probably 3 minute) delay on shutdown. If the system doesn't have any snapshots, it reports "exited" rather than "running" and shuts down without significant delay.

Please place an executable script like the following in /lib/systemd/systemd-shutdown:

```
#!/bin/sh
mount / -orw,remount
dmesg > /shutdown.dmesg
mount / -oro,remount
```

Then, reboot with "systemd.log_level=debug systemd.log_target=kmsg" on the kernel cmdline. Shut down, and wait until things time out and the machine goes down cleanly after 3min. On the next reboot, look for the /shutdown.dmesg file and attach it here. This should explain in detail what is going wrong and what exactly needs to time out here.

Created attachment 495314 [details]
dmesg output from shutdown
I am attaching the shutdown log in the situation where I see libvirtd issues as well as an lvm one.
Here's the interesting excerpt:

```
[ 1574.041572] systemd[1]: lvm2-monitor.service stopping timed out. Killing.
[ 1574.043385] systemd[1]: lvm2-monitor.service changed stop-sigterm -> stop-sigkill
[ 1574.045161] systemd[1]: Running GC...
[ 1574.124767] systemd[1]: Received SIGCHLD from PID 953 (dmeventd).
[ 1574.126794] systemd[1]: Got SIGCHLD for process 953 (dmeventd)
[ 1574.129091] systemd[1]: Child 953 died (code=killed, status=9/KILL)
[ 1574.130853] systemd[1]: Child 953 belongs to lvm2-monitor.service
[ 1574.132854] systemd[1]: lvm2-monitor.service: main process exited, code=killed, status=9
[ 1574.134629] systemd[1]: lvm2-monitor.service changed stop-sigkill -> failed
```

It seems dmeventd did not react to SIGTERM and after a timeout needed to be killed with SIGKILL. Reassigning to lvm.

The shutdown sequence needs looking at then: dmeventd should die only when it has no clients left. Clients should be explicitly disabled first; then it should shut down OK. (If it dropped all its clients itself, there'd be data-integrity risks: clients must explicitly unregister themselves as no longer needing the monitoring service at the right points during the shutdown sequence. Different classes of devices may want to unregister themselves at different points during that sequence.) The sequence to use is likely to be the inverse of the sequence used during initialisation. Traditionally, people focus on getting init right and forget about shutdown :) Take the actual sequence of lvchange (or vgchange) and mount commands during initialisation, and invert it as far as possible for shutdown. Generally, you'll unmount a filesystem, then run `lvchange -an` on its LV. First for clustered/networked devices (before the cluster/network goes away), then for local devices that aren't tied to the root filesystem and which can be unmounted OK.
Then right at the end, ideally when there's nothing more to write to them, there'll be the root filesystem and perhaps one or two more that you can't unmount and have to shut down with `lvchange --monitor n` before stopping the lvm monitor service and completing the system shutdown.

The implication from comment #12 is that lvm-monitor has never worked as designed? (Because we've never done any of that before.)

It's been pointed out that if there were package upgrades while the system was booted, a 'dmeventd -R' might need to be run before remounting read-only and stopping monitoring.

Re: comment #13, lvm-monitor is relatively new, and in RHEL the configurations we generally see avoid these problems. (I don't recall support cases related to this.) But with systemd, we're now trying to do everything 'right' to support configurations we expect will become more common.

This scheme is really broken and cannot work. If glibc or any other library dmeventd uses (or dmeventd itself) is upgraded, then these binary files cannot really be deleted on the file system they reside on until dmeventd itself is terminated and stops referencing them. Linux will refuse the unmounts until those programs have stopped running. That basically means that with the current scheme you can never cleanly shut down the file systems if the LVM tools are stored on the FS itself, since you require them to be stopped after the fs is gone. That means LVM means constantly dirty file systems at boot when the user dared to upgrade the system. This needs to be fixed properly: allow all tools to be terminated cleanly at any time, so that everything can be unmounted/remounted read-only. Then, drop a tiny executable into /lib/systemd/systemd-shutdown which will be executed as the very last step of the shutdown, and can sync or dispatch whatever events might have been queued in the meantime.
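The inverted teardown order described above can be sketched as a script. This is a hypothetical illustration, not part of the bug report: the volume group "vg0", its "data" LV under /srv, and the "root" LV are invented names, and RUN only prints each command so the ordering can be read off without a live LVM stack.

```shell
#!/bin/sh
# Dry-run sketch of the teardown order (assumed layout: vg0/data on /srv,
# vg0/root holding the unmountable root filesystem).
RUN() { echo "+ $*"; }   # print instead of execute

lvm_teardown() {
    # 1. Unmount every filesystem that is not the root fs.
    RUN umount /srv
    # 2. Deactivate the LVs behind them; this unregisters them from dmeventd.
    RUN lvchange -an vg0/data
    # 3. The root fs cannot be unmounted, so only switch its monitoring off.
    RUN lvchange --monitor n vg0/root
    # 4. With no clients left, dmeventd exits on its own; the monitor
    #    service can now stop without waiting for a SIGKILL timeout.
    RUN service lvm2-monitor stop
}

lvm_teardown
```

Replacing RUN with direct execution would require root and a matching volume layout; the point here is only the ordering, with deactivation strictly after unmounting and monitoring switched off last.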
In F16 we hope to improve the shutdown logic one substantial step further: actually unmount the rootfs itself as part of the late shutdown, by pivot_root()ing into a tmpfs and thus releasing the root fs. With that in place we cannot let any other process run beyond that point. Your drop-in binary would be copied onto the tmpfs, however, and thus would need to be statically compiled. Anyway, the current scheme of kill-after-umount is borked. You *must* be able to kill the process first, and umount/remount-ro afterwards.

(In reply to comment #16)
> If glibc or any other library dmeventd uses (or dmeventd itself) is upgraded, then these binary files cannot really be deleted on the file system they reside on until dmeventd itself is terminated and stops referencing them.

After the upgrade, dmeventd should of course have been restarted to pick up the new files. (Yes, that's a separate problem for rpm/yum to learn how to deal with...) If you're not sure whether or not this happened, as mentioned in comment #14, run 'dmeventd -R' at the appropriate point of the shutdown sequence to reload it afresh. (This starts a new instance, then the old daemon hands over its state to the new one and exits.)

> Anyway, the current scheme of kill-after-umount is borked. You *must* be able to kill the process first, and umount/remount-ro afterwards.

You can do what you need today already, without any changes to dm/lvm, as I've already described (i.e. no need for any kill if you turn off the monitoring cleanly). It opens a window where events would be unhandled, but as you say, that's unavoidable until we have the new pivot solution. (A 'dmeventd.static -R' could likely deal with that future handover too.)

So, how does this work exactly? Are you saying that if you invoke "dmeventd -R" then you can terminate the running instance with SIGTERM? Can you please update the LVM init script so that it can terminate dmeventd cleanly? dmeventd -R is not documented in the man page.
-R is mentioned in the upstream man page, so it will hit rawhide next time we update (but the functionality is already in). What -R will do for you is eliminate any refs to unlinked files from upgrades, where that's the problem.

Are you getting the error 'Not stopping monitoring, this is a dangerous operation. Please use force-stop to override.' from the script? SIGTERM won't ever work. If you need to make it exit forcibly (until your clean system shutdown is developed) you have to invoke the script with 'force-stop' so the right commands get run from *outside the daemon* to shut it down cleanly.

Though dmeventd does not react to SIGTERM (and it won't) if it still has devices registered for monitoring, we're now using systemd units where we switch off monitoring (and we updated the SysV init script to avoid bug #681582). The hang at shutdown mentioned in comment #0 should not occur anymore.
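The two remedies discussed in the thread, restarting dmeventd after an upgrade and forcing it down from outside rather than via SIGTERM, can be combined into a short sketch. This is an assumption-laden illustration: the init-script path is guessed from Fedora's SysV layout, and RUN only prints the commands instead of executing them.

```shell
#!/bin/sh
# Dry-run sketch: upgrade-safe stop path for dmeventd (paths assumed).
RUN() { echo "+ $*"; }   # print instead of execute

stop_dmeventd() {
    # Hand state over to a freshly exec'd daemon so that no deleted
    # (upgraded) binaries or libraries remain referenced:
    RUN dmeventd -R
    # SIGTERM is ignored while devices are still registered for monitoring;
    # the script's force-stop action runs the right commands from outside
    # the daemon to shut it down cleanly:
    RUN /etc/rc.d/init.d/lvm2-monitor force-stop
}

stop_dmeventd
```

On a real system both commands need root, and force-stop is only appropriate when monitoring genuinely must be abandoned, since it opens the window of unhandled events mentioned above.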