Description of problem: See https://bugzilla.redhat.com/show_bug.cgi?id=681582#c61. Actual results: SysV init script seems to be executed properly, but still the hang occurs during systemd's systemctl execution. Additional info: Attaching logs from the lvm2-monitor sysv init script execution and strace of the "systemctl stop lvm2-monitor.service" that hangs.
Created attachment 529256 [details] Log from the lvm2-monitor SysV script execution on "lvm2-monitor stop"
Created attachment 529257 [details] Strace log of "systemctl stop lvm2-monitor.service"
Please boot with "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg", reproduce the hang and attach the output of "dmesg". While the hang is occuring, does "systemctl status lvm2-monitor.service" run from another terminal finish successfully? What does it say?
After clean boot I was able to stop lvm2-monitor service properly and immediately. But after I ran virt-manager, started up a virtual machine (using LV as its disk) and destroyed the virtual machine, the problem with stopping lvm2-monitor service occurred. While "systemctl stop lvm2-monitor.service" was in progress (hanging), I ran this in another terminal: # systemctl status lvm2-monitor.service lvm2-monitor.service - LSB: Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling Loaded: loaded (/etc/rc.d/init.d/lvm2-monitor) Active: deactivating (stop-sigterm) since Thu, 20 Oct 2011 15:30:15 +0200; 22s ago Process: 2935 ExecStop=/etc/rc.d/init.d/lvm2-monitor stop (code=exited, status=0/SUCCESS) Process: 2786 ExecStart=/etc/rc.d/init.d/lvm2-monitor start (code=exited, status=0/SUCCESS) Main PID: 2795 (dmeventd) CGroup: name=systemd:/system/lvm2-monitor.service └ 2795 /sbin/dmeventd After the "systemctl stop lvm2-monitor.service" ended (in several minutes), I gathered the output of "dmesg" and I append it here.
Created attachment 529278 [details] dmesg output
So it did not hang completely, but it took 5 minutes to finish. "/etc/rc.d/init.d/lvm2-monitor stop" finished with success, but PID 2795 kept running. systemd tried to kill it with SIGTERM. After 5 minutes it was still running, so systemd killed it with SIGKILL.
(In reply to comment #6) > So it did not hang completely, but it took 5 minutes to finish. > "/etc/rc.d/init.d/lvm2-monitor stop" finished with success, but PID 2795 kept > running. systemd tried to kill it with SIGTERM. After 5 minutes it was still > running, so systemd killed it with SIGKILL. "lvm2-monitor stop" causes all monitored devices (lvm2 snaphsot and mirror volumes) to be unregistered from dmeventd monitoring. Of course, the dmeventd is still there and if there's nothing to monitor, it could be killed by SIGTERM normally. If dmeventd still has devices that are being monitored, SIGKILL is needed (SIGTERM will be ignored in this case). But based on the log - all the LVM2 volumes were unmonitored: 3 logical volume(s) in volume group "encvg" unmonitored 12 logical volume(s) in volume group "vg" unmonitored Kamil, can you please try running quickly this sequence (assuming the "encvg" and "vg" are the only vgs on your machine): vgchange --monitor n encvg vgchange --monitor n vg kill <dmeventd_pid> ...and check if dmeventd is really killed and gone. If not, then this is probably an lvm2 bug where devices are not fully unmonitored for some reason.
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
(In reply to comment #7) > Kamil, can you please try running quickly this sequence (assuming the "encvg" > and "vg" are the only vgs on your machine): > > vgchange --monitor n encvg > vgchange --monitor n vg > kill <dmeventd_pid> > > ...and check if dmeventd is really killed and gone. This works. Also I tested that when I run "systemctl stop lvm2-monitor.service" and it waits for a long time, and when I switch to another terminal and send SIGTERM to dmeventd, then systemctl call ends immediately. It seems as if systemd didn't send SIGTERM to dmeventd at all (or it was ignored, but my manual signal is not ignored).
(In reply to comment #9) > It seems as if systemd didn't send SIGTERM to dmeventd at all (or it was > ignored, but my manual signal is not ignored). You can verify if the SIGTERM is being sent: echo "1" > /sys/kernel/debug/tracing/events/syscalls/sys_enter_kill/enable systemctl stop lvm2-monitor.service & cat /sys/kernel/debug/tracing/trace Also check /var/log/messages for something like: There are still devices being monitored. Refusing to exit.
I no longer have Fedora 15, I upgraded to Fedora 16. Does it still apply? I do not see any lvm related shutdown problems now. > Also check /var/log/messages for something like: > There are still devices being monitored. > Refusing to exit. I see these messages all the time on all my machines (but the systems usually shut down OK).
(In reply to comment #11) > I no longer have Fedora 15, I upgraded to Fedora 16. Does it still apply? I do > not see any lvm related shutdown problems now. > We have native systemd unit files for LVM2 in F16 so the problem should not occur anymore in F16+. > > Also check /var/log/messages for something like: > > There are still devices being monitored. > > Refusing to exit. > > I see these messages all the time on all my machines (but the systems usually > shut down OK). The message is issued only if you try to kill the dmeventd process without unregistering monitored devices before - the "vgchange --monitor n" call in the stop function of the LVM2 SysV init script. But this one is executed properly with the updated package (bug #681582) as shown by the log that is attached. So those "Refusing to exit" messages originate by sending the SIGTERM before calling the "lvm2-monitor stop" script, it seems.
(In reply to comment #12) > The message is issued only if you try to kill the dmeventd process without > unregistering monitored devices before - the "vgchange --monitor n" call in the > stop function of the LVM2 SysV init script. Peter, is the "vgchange --monitor n" call completely synchronous? I.e., does it exit only after the monitoring is stopped and it is immediately safe to kill dmeventd? If the unmonitoring is done asynchronously, dmeventd may get the SIGTERM too early...
(In reply to comment #13) > is the "vgchange --monitor n" call completely synchronous? It is - we're reading a status line from dmeventd containing a return value and a sequence number to compare it with request sequence number. That should be handled fine...
Is this still an issue or can this bug be closed?
I have Fedora 16 and I don't see any more problems. I have no idea about F15 users.
If anyone experience this on F15 please reopen this bug thanks.