Bug 690121

Summary: dmeventd does not react to SIGTERM
Product: [Fedora] Fedora Reporter: Michael Young <m.a.young>
Component: lvm2 Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhide CC: agk, atu, bmarzins, bmr, circular, dwysocha, heinzm, johannbg, jonathan, lpoetter, lvm-team, mbroz, metherid, mschmidt, msnitzer, notting, plautrba, prajnoha, prockai
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-11-07 14:51:38 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
dmesg output from shutdown (flags: none)

Description Michael Young 2011-03-23 11:53:15 UTC
I have tried systemd on a few systems, and it is rare that I get a clean shutdown; I often have to turn the power off. I realize that something is probably causing this (though it isn't obvious how to work out what that is), but systemd should have a fallback timeout anyway to stop it hanging for an unreasonable length of time (for most users this probably means about 30 seconds to a minute without some indication that something is happening before they reach for the off switch).
Upstart and its predecessors were much better at getting clean shutdowns.

Comment 1 Lennart Poettering 2011-03-29 00:01:19 UTC
We actually have a 3min timeout on everything.

Comment 2 Lennart Poettering 2011-04-12 12:10:54 UTC
Please boot with "systemd.log_level=debug" and "systemd.log_target=kmsg", then paste the last output generated on screen before the system hangs on shutdown (Photo if necessary).

Comment 3 Michael Young 2011-04-13 22:05:07 UTC
I haven't yet managed to get any useful logging information out as rsyslog shuts down before the delay happens, and I haven't yet got anything on the screen. Is there any way to leave rsyslog running?

Incidentally, I did eventually wait long enough to find that the system does shut down after one or more 3-minute periods, but 3 minutes is much too long in most cases: few people will wait that long, particularly if there are several consecutive 3-minute timeouts.

Comment 4 Michael Young 2011-04-18 21:26:32 UTC
I think it might be lvm2-monitor that isn't shutting down cleanly for some reason, though in a slightly different set up I have problems with libvirt as well.

Comment 5 Michael Young 2011-04-20 21:11:11 UTC
I have worked out how to reproduce the lvm2-monitor issue. If one of the logical volumes has a snapshot, then systemctl list-units | grep lvm reports
lvm2-monitor.service      loaded active running       LSB: Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
and there is a (probably 3-minute) delay on shutdown. If the system doesn't have any snapshots, it reports "exited" rather than "running" and shuts down without significant delay.

Comment 6 Lennart Poettering 2011-04-27 02:14:01 UTC
Please place an executable script like the following in /lib/systemd/system-shutdown/:

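#!/bin/sh
# Runs near the very end of shutdown: make / writable again, save the
# kernel log (which carries systemd's debug output), then restore read-only.
mount -o remount,rw /
dmesg > /shutdown.dmesg
mount -o remount,ro /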

Then reboot with "systemd.log_level=debug systemd.log_target=kmsg" on the kernel command line. Shut down and wait until things time out and the machine goes down cleanly after 3 minutes.

On next reboot look for the /shutdown.dmesg file and attach it here.

This should explain in detail what is going wrong and what exactly needs to timeout here.

Comment 7 Michael Young 2011-04-27 18:53:25 UTC
Created attachment 495314
dmesg output from shutdown

I am attaching the shutdown log from the setup where I see libvirtd issues as well as the LVM one.

Comment 8 Lennart Poettering 2011-04-28 17:53:24 UTC
Here's the interesting excerpt:

[ 1574.041572] systemd[1]: lvm2-monitor.service stopping timed out. Killing.
[ 1574.043385] systemd[1]: lvm2-monitor.service changed stop-sigterm -> stop-sigkill
[ 1574.045161] systemd[1]: Running GC...
[ 1574.124767] systemd[1]: Received SIGCHLD from PID 953 (dmeventd).
[ 1574.126794] systemd[1]: Got SIGCHLD for process 953 (dmeventd)
[ 1574.129091] systemd[1]: Child 953 died (code=killed, status=9/KILL)
[ 1574.130853] systemd[1]: Child 953 belongs to lvm2-monitor.service
[ 1574.132854] systemd[1]: lvm2-monitor.service: main process exited, code=killed, status=9
[ 1574.134629] systemd[1]: lvm2-monitor.service changed stop-sigkill -> failed

It seems dmeventd did not react to SIGTERM and after a time out needs to be killed with SIGKILL.

Reassigning to lvm.
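Lines like the excerpt above can be pulled out of a captured /shutdown.dmesg mechanically. This is a hypothetical helper (timed_out_units is not part of systemd), and it assumes the message wording matches this excerpt; other systemd versions may phrase it differently.

```shell
#!/bin/sh
# Hypothetical helper: print the units whose stop timed out and were
# escalated from SIGTERM to SIGKILL, based on debug messages of the form
#   "systemd[1]: <unit> stopping timed out. Killing."
timed_out_units() {
    sed -n 's/.*systemd\[1]: \(.*\) stopping timed out\. Killing\..*/\1/p' "$1"
}
```

For the excerpt above, timed_out_units /shutdown.dmesg would print lvm2-monitor.service.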

Comment 9 Alasdair Kergon 2011-04-28 18:04:50 UTC
The shutdown sequence needs looking at then:  dmeventd should die only when it has no clients left.  Clients should be explicitly disabled first, then it should shutdown OK.

Comment 10 Alasdair Kergon 2011-04-28 18:13:09 UTC
(If it dropped all its clients itself, there'd be data integrity risks - clients must explicitly unregister themselves as no longer needing the monitoring service at the right points during the shutdown sequence.  Different classes of devices may want to unregister themselves at different points during that sequence.)
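The policy described in comments 9 and 10 can be illustrated with a toy model. This is not dmeventd's code: REGISTERED and on_term are hypothetical names standing in for the daemon's set of monitored devices and its SIGTERM handling.

```shell
#!/bin/sh
# Toy model, not dmeventd source: a daemon that honours SIGTERM only
# once every client has unregistered its devices.
# REGISTERED is a hypothetical count of devices still being monitored.
REGISTERED=${REGISTERED:-0}

on_term() {
    if [ "$REGISTERED" -gt 0 ]; then
        # Exiting here would leave mirrors/snapshots silently unmonitored.
        echo "ignoring SIGTERM: $REGISTERED device(s) still registered"
    else
        echo "no devices registered, exiting"
        exit 0
    fi
}

trap on_term TERM
```

In this model a kill -TERM does nothing until the count reaches zero. That is why the clients have to unregister first, from outside the daemon, before the service is stopped.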

Comment 11 Alasdair Kergon 2011-04-28 18:15:22 UTC
The sequence to use is likely to be the inverse of the sequence used during initialisation. Traditionally, people focus on getting init right and forget about shutdown :)

Comment 12 Alasdair Kergon 2011-04-28 18:32:49 UTC
Take the actual sequence of lvchange (or vgchange) and mount commands used during initialisation, and invert it as far as possible for shutdown. Generally, you'll unmount a filesystem, then run lvchange -an on its LV. Do this first for clustered/networked devices (before the cluster or network goes away), then for local devices that aren't tied to the root filesystem and can be unmounted. Right at the end, ideally when there's nothing more to write to them, come the root filesystem and perhaps one or two others that you can't unmount; for those, run lvchange --monitor n before stopping the lvm2-monitor service and completing the system shutdown.
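The order described in comment 12 can be sketched as a shell script. The mount points, VG/LV names, the run dry-run wrapper and the function names are all hypothetical. A real setup would derive the lists from its own start-up sequence.

```shell
#!/bin/sh
# Sketch of a shutdown order that inverts a typical LVM start-up sequence.
# All mount points and VG/LV names are hypothetical examples.
# With DRY_RUN=1 (the default) commands are only printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# For each "mountpoint:VG/LV" entry: unmount first, then deactivate the LV.
umount_and_deactivate() {
    for entry in "$@"; do
        run umount "${entry%%:*}"
        run lvchange -an "${entry#*:}"
    done
}

shutdown_sequence() {
    # 1. Clustered/networked devices, before the cluster or network goes away.
    umount_and_deactivate /srv/shared:vg_cluster/lv_shared
    # 2. Local devices not tied to the root filesystem.
    umount_and_deactivate /home:vg_local/lv_home
    # 3. LVs that cannot be unmounted (e.g. root): only switch off monitoring.
    run lvchange --monitor n vg_root/lv_root
    # 4. No clients remain registered, so the monitor service can stop cleanly.
    run service lvm2-monitor stop
}
```

With the default DRY_RUN=1, shutdown_sequence only prints the six commands in order. Setting DRY_RUN=0 would execute them, which only makes sense at the corresponding point of a real shutdown.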

Comment 13 Bill Nottingham 2011-04-28 18:38:40 UTC
The implication from comment #12 is that lvm-monitor has never worked as designed? (Because we've never done any of that before.)

Comment 14 Alasdair Kergon 2011-04-28 18:58:09 UTC
It's been pointed out that if there were package upgrades while the system was booted, a 'dmeventd -R' might need to be run before remounting read-only and stopping monitoring.

Comment 15 Alasdair Kergon 2011-04-28 19:04:58 UTC
Re: comment #13, lvm-monitor is relatively new, and in RHEL the configurations we generally see avoid these problems.  (I don't recall support cases related to this.)  But with systemd, we're now trying to do everything 'right' to support configurations we expect will become more common.

Comment 16 Lennart Poettering 2011-04-30 01:57:10 UTC
This scheme is really broken and cannot work. If glibc or any other library dmeventd uses (or dmeventd itself) is upgraded, then these binary files cannot really be deleted on the file system they reside on until dmeventd itself is terminated and stops referencing them. Linux will refuse unmounts until those programs have stopped running. That basically means that with the current scheme you can never cleanly shut down the file systems if the LVM tools are stored on them, since you require the tools to be stopped after the fs is gone. With LVM, that means constantly dirty file systems at boot whenever the user has dared to upgrade the system.

This needs to be fixed properly: allow all tools to be terminated cleanly at any time, so that everything can be unmounted or remounted read-only. Then drop a tiny executable into /lib/systemd/system-shutdown/, which will be executed as the very last step of the shutdown and can sync or dispatch whatever events might have been queued in the meantime.

In F16 we hope to improve the shutdown logic one substantial step further: actually unmount the rootfs itself as part of the late shutdown, by pivot_root()ing into a tmpfs and thus releasing the root fs. With that in place we cannot let any other process run until that time. Your drop-in binary would be copied onto the tmpfs however, and thus would need to be statically compiled.

Anyway, the current scheme of kill-after-umount is borked. You *must* be able to kill the process first, and umount/remount-ro afterwards.

Comment 17 Alasdair Kergon 2011-04-30 13:46:51 UTC
(In reply to comment #16)
> If glibc or any other library
> dmeventd uses (or dmeventd itself) is upgraded, then these binary files cannot
> really be deleted on the file system they reside on until dmeventd itself is
> terminated and stops referencing them.

After the upgrade, dmeventd should of course have been restarted to pick up the new files.  (Yes, that's a separate problem for rpm/yum to learn how to deal with...)

If you're not sure whether or not this happened, as mentioned in comment #14, run 'dmeventd -R' at the appropriate point of the shutdown sequence to reload it afresh.  (This starts a new instance, then the old daemon hands over its state to the new one and exits.)
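Whether the reload is needed can be checked rather than guessed. On Linux, /proc/<pid>/maps marks mappings of unlinked files with a trailing " (deleted)", which is exactly the situation comment 16 describes. This sketch assumes a single running dmeventd, and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether dmeventd needs 'dmeventd -R' because it still
# maps binaries or libraries that an upgrade has unlinked.

# Return 0 if the given maps file references at least one deleted file.
maps_have_deleted() {
    grep -q ' (deleted)$' "$1"
}

reload_dmeventd_if_stale() {
    pid=$(pidof dmeventd) || return 0    # not running: nothing to reload
    if maps_have_deleted "/proc/$pid/maps"; then
        dmeventd -R                      # old instance hands state to a new one
    fi
}
```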

> Anyway, the current scheme of kill-after-umount is borked. You *must* be able
> to kill the process first, and umount/remount-ro afterwards.

You can do what you need today already without any changes to dm/lvm as I've already described (i.e. no need for any kill if you turn off the monitoring cleanly).  It opens a window where events would be unhandled but as you say, that's unavoidable until we have the new pivot solution.  (A 'dmeventd.static -R' could likely deal with that future handover too.)

Comment 18 Lennart Poettering 2011-05-01 22:26:47 UTC
So, how does this work exactly? Are you saying that if you involve "dmeventd -R" then you can terminate the running instance with SIGTERM?

Can you please update the LVM init script so that it can terminate dmeventd cleanly?

dmeventd -R is not documented in the man page.

Comment 19 Alasdair Kergon 2011-05-01 23:32:47 UTC
-R is mentioned in the upstream man page so will hit rawhide next time we update (but the functionality is already in).

What -R will do for you is eliminate any refs to unlinked files from upgrades where that's the problem.

Are you getting the error 'Not stopping monitoring, this is a dangerous operation. Please use force-stop to override.' from the script?

SIGTERM won't ever work. If you need to force it to exit (until your clean system shutdown is developed), you have to invoke the script with 'force-stop' so that the right commands get run from *outside the daemon* to shut it down cleanly.

Comment 20 Peter Rajnoha 2011-11-07 14:51:38 UTC
Though dmeventd does not react to SIGTERM (and it won't) while it still has devices registered for monitoring, we now use systemd units that switch off monitoring (and we updated the SysV init script to avoid bug #681582).

The hang at shutdown as mentioned in comment #0 should not occur anymore.