747582 – Old SysV init script is executed properly (lvm2-monitor) but systemctl hangs

Bug 747582 - Old SysV init script is executed properly (lvm2-monitor) but systemctl hangs

Summary: Old SysV init script is executed properly (lvm2-monitor) but systemctl hangs

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	systemd
Sub Component:
Version:	15
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	systemd-maint
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2011-10-20 12:26 UTC by Peter Rajnoha
Modified:	2012-01-25 16:26 UTC (History)
CC List:	12 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-01-25 16:26:58 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Log from the lvm2-monitor SysV script execution on "lvm2-monitor stop" (1.95 KB, text/plain) 2011-10-20 12:27 UTC, Peter Rajnoha	no flags	Details
Strace log of "systemctl stop lvm2-monitor.service" (9.81 KB, text/plain) 2011-10-20 12:27 UTC, Peter Rajnoha	no flags	Details
dmesg output (284.00 KB, text/plain) 2011-10-20 13:36 UTC, Kamil Páral	no flags	Details
View All

Description Peter Rajnoha 2011-10-20 12:26:06 UTC

Description of problem:
See https://bugzilla.redhat.com/show_bug.cgi?id=681582#c61.

Actual results:
SysV init script seems to be executed properly, but still the hang occurs during systemd's systemctl execution.

Additional info:
Attaching logs from the lvm2-monitor sysv init script execution and strace of the "systemctl stop lvm2-monitor.service" that hangs.

Comment 1 Peter Rajnoha 2011-10-20 12:27:01 UTC

Created attachment 529256 [details]
Log from the lvm2-monitor SysV script execution on "lvm2-monitor stop"

Comment 2 Peter Rajnoha 2011-10-20 12:27:54 UTC

Created attachment 529257 [details]
Strace log of "systemctl stop lvm2-monitor.service"

Comment 3 Michal Schmidt 2011-10-20 12:36:16 UTC

Please boot with "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg", reproduce the hang and attach the output of "dmesg".

While the hang is occuring, does "systemctl status lvm2-monitor.service" run from another terminal finish successfully? What does it say?

Comment 4 Kamil Páral 2011-10-20 13:36:15 UTC

After clean boot I was able to stop lvm2-monitor service properly and immediately. But after I ran virt-manager, started up a virtual machine (using LV as its disk) and destroyed the virtual machine, the problem with stopping lvm2-monitor service occurred.

While "systemctl stop lvm2-monitor.service" was in progress (hanging), I ran this in another terminal:

# systemctl status lvm2-monitor.service 
lvm2-monitor.service - LSB: Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
	  Loaded: loaded (/etc/rc.d/init.d/lvm2-monitor)
	  Active: deactivating (stop-sigterm) since Thu, 20 Oct 2011 15:30:15 +0200; 22s ago
	 Process: 2935 ExecStop=/etc/rc.d/init.d/lvm2-monitor stop (code=exited, status=0/SUCCESS)
	 Process: 2786 ExecStart=/etc/rc.d/init.d/lvm2-monitor start (code=exited, status=0/SUCCESS)
	Main PID: 2795 (dmeventd)
	  CGroup: name=systemd:/system/lvm2-monitor.service
		  └ 2795 /sbin/dmeventd

After the "systemctl stop lvm2-monitor.service" ended (in several minutes), I gathered the output of "dmesg" and I append it here.

Comment 5 Kamil Páral 2011-10-20 13:36:39 UTC

Created attachment 529278 [details]
dmesg output

Comment 6 Michal Schmidt 2011-10-20 14:02:45 UTC

So it did not hang completely, but it took 5 minutes to finish.
"/etc/rc.d/init.d/lvm2-monitor stop" finished with success, but PID 2795 kept running. systemd tried to kill it with SIGTERM. After 5 minutes it was still running, so systemd killed it with SIGKILL.

Comment 7 Peter Rajnoha 2011-10-20 14:35:33 UTC

(In reply to comment #6)
> So it did not hang completely, but it took 5 minutes to finish.
> "/etc/rc.d/init.d/lvm2-monitor stop" finished with success, but PID 2795 kept
> running. systemd tried to kill it with SIGTERM. After 5 minutes it was still
> running, so systemd killed it with SIGKILL.

"lvm2-monitor stop" causes all monitored devices (lvm2 snaphsot and mirror volumes) to be unregistered from dmeventd monitoring. Of course, the dmeventd is still there and if there's nothing to monitor, it could be killed by SIGTERM normally.

If dmeventd still has devices that are being monitored, SIGKILL is needed (SIGTERM will be ignored in this case).

But based on the log - all the LVM2 volumes were unmonitored:

  3 logical volume(s) in volume group "encvg" unmonitored
  12 logical volume(s) in volume group "vg" unmonitored

Kamil, can you please try running quickly this sequence (assuming the "encvg" and "vg" are the only vgs on your machine):

  vgchange --monitor n encvg
  vgchange --monitor n vg
  kill <dmeventd_pid>

...and check if dmeventd is really killed and gone. If not, then this is probably an lvm2 bug where devices are not fully unmonitored for some reason.

Comment 8 Fedora Admin XMLRPC Client 2011-10-20 16:29:12 UTC

This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 9 Kamil Páral 2011-10-21 11:54:17 UTC

(In reply to comment #7)
> Kamil, can you please try running quickly this sequence (assuming the "encvg"
> and "vg" are the only vgs on your machine):
> 
>   vgchange --monitor n encvg
>   vgchange --monitor n vg
>   kill <dmeventd_pid>
> 
> ...and check if dmeventd is really killed and gone.

This works.

Also I tested that when I run "systemctl stop lvm2-monitor.service" and it waits for a long time, and when I switch to another terminal and send SIGTERM to dmeventd, then systemctl call ends immediately.

It seems as if systemd didn't send SIGTERM to dmeventd at all (or it was ignored, but my manual signal is not ignored).

Comment 10 Michal Schmidt 2011-11-20 21:40:11 UTC

(In reply to comment #9)
> It seems as if systemd didn't send SIGTERM to dmeventd at all (or it was
> ignored, but my manual signal is not ignored).

You can verify if the SIGTERM is being sent:

echo "1" > /sys/kernel/debug/tracing/events/syscalls/sys_enter_kill/enable
systemctl stop lvm2-monitor.service &
cat /sys/kernel/debug/tracing/trace


Also check /var/log/messages for something like:
 There are still devices being monitored.
 Refusing to exit.

Comment 11 Kamil Páral 2011-11-21 17:48:43 UTC

I no longer have Fedora 15, I upgraded to Fedora 16. Does it still apply? I do not see any lvm related shutdown problems now.

> Also check /var/log/messages for something like:
>  There are still devices being monitored.
>  Refusing to exit.

I see these messages all the time on all my machines (but the systems usually shut down OK).

Comment 12 Peter Rajnoha 2011-11-22 08:33:07 UTC

(In reply to comment #11)
> I no longer have Fedora 15, I upgraded to Fedora 16. Does it still apply? I do
> not see any lvm related shutdown problems now.
> 

We have native systemd unit files for LVM2 in F16 so the problem should not occur anymore in F16+.

> > Also check /var/log/messages for something like:
> >  There are still devices being monitored.
> >  Refusing to exit.
> 
> I see these messages all the time on all my machines (but the systems usually
> shut down OK).

The message is issued only if you try to kill the dmeventd process without unregistering monitored devices before - the "vgchange --monitor n" call in the stop function of the LVM2 SysV init script. But this one is executed properly with the updated package (bug #681582) as shown by the log that is attached. So those "Refusing to exit" messages originate by sending the SIGTERM before calling the "lvm2-monitor stop" script, it seems.

Comment 13 Michal Schmidt 2011-11-22 12:15:46 UTC

(In reply to comment #12)
> The message is issued only if you try to kill the dmeventd process without
> unregistering monitored devices before - the "vgchange --monitor n" call in the
> stop function of the LVM2 SysV init script.

Peter,
is the "vgchange --monitor n" call completely synchronous? I.e., does it exit only after the monitoring is stopped and it is immediately safe to kill dmeventd?
If the unmonitoring is done asynchronously, dmeventd may get the SIGTERM too early...

Comment 14 Peter Rajnoha 2011-11-22 13:04:01 UTC

(In reply to comment #13)
> is the "vgchange --monitor n" call completely synchronous?

It is - we're reading a status line from dmeventd containing a return value and a sequence number to compare it with request sequence number. That should be handled fine...

Comment 15 Jóhann B. Guðmundsson 2012-01-25 13:42:59 UTC

Is this still an issue or can this bug be closed?

Comment 16 Kamil Páral 2012-01-25 14:02:41 UTC

I have Fedora 16 and I don't see any more problems. I have no idea about F15 users.

Comment 17 Jóhann B. Guðmundsson 2012-01-25 16:26:58 UTC

If anyone experience this on F15 please reopen this bug thanks.

Note You need to log in before you can comment on or make changes to this bug.