Red Hat Bugzilla – Bug 843587
Hang on socket if A.service autostarts B.service and "After=B.service" is defined for A.service
Last modified: 2014-10-14 03:26:38 EDT
Service A uses resources of service B via a socket, and service B is configured
to autostart on demand via that socket. At the same time, we need to ensure that
service A is always stopped *before* service B itself is stopped on shutdown.
Also, service B was stopped (manually) during normal system run before the
Trying to stop service A results in a hang if service B is not running at that
moment: stopping A accesses B's socket, which should autostart B, but B is not
autostarted in this case and A keeps waiting on B's socket.
Now, to be more concrete, we have these units:
Description=LVM2 metadata daemon socket
Description=LVM2 metadata daemon
ExecStartPost=/usr/sbin/lvm pvscan --cache
Description=Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
ExecStart=/usr/sbin/lvm vgchange --monitor y
ExecStop=/usr/sbin/lvm vgchange --monitor n
The "After=lvm2-lvmetad.service" in the lvm2-monitor.service is used to make
sure that the lvmetad service is always stopped after the monitor service
(otherwise, without this, I ended up with the lvmetad service being stopped
prematurely and the monitor service trying to use the dead service).
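For reference, the ordering-only relationship described above might look like this in lvm2-monitor.service. This is a partial sketch: only directives quoted in this report are shown, the [Unit] and [Service] section headers are the standard systemd ones, and everything else in the real unit is omitted.

```ini
# lvm2-monitor.service (partial sketch, other directives omitted)
[Unit]
Description=Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
# Ordering only: start after lvmetad, and therefore stop before it on shutdown.
# There is no Requires= here, so this line alone does not pull lvmetad in.
After=lvm2-lvmetad.service

[Service]
ExecStart=/usr/sbin/lvm vgchange --monitor y
ExecStop=/usr/sbin/lvm vgchange --monitor n
```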
Now, if someone calls "systemctl stop lvm2-lvmetad.service", the service is
stopped. That's perfectly fine as the service should be autostarted on next
access to its socket... But the "After=lvm2-lvmetad.service" in lvm2-monitor
service causes the deadlock:
 rawhide/~ # systemctl stop lvm2-lvmetad.service
Warning: Stopping lvm2-lvmetad.service, but it can still be activated by:
 rawhide/~ # systemctl stop lvm2-monitor.service
...looking further at where this hangs - it's the "vgchange --monitor n" call
(the ExecStop action) which tries to communicate with lvmetad through the socket
and it's waiting on the socket that is not handled and no autostart happens...
The lvmetad service stays inactive:
 rawhide/~ # systemctl status lvm2-lvmetad.service
lvm2-lvmetad.service - LVM2 metadata daemon
Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmetad.service; disabled)
Active: inactive (dead) since Thu, 26 Jul 2012 16:53:47 +0200; 1min 23s ago
So systemd has failed to autostart it based on the socket access.
 rawhide/~ # systemctl list-jobs
JOB UNIT TYPE STATE
321 lvm2-monitor.service stop running
322 lvm2-lvmetad.service start waiting
This does not happen if that "After=lvm2-lvmetad.service" statement is not used.
But then, on shutdown, lvmetad is stopped first before lvm2-monitor.service,
which is wrong.
If using the "After" statement this way is not correct, what is the alternative way to make sure that the services are stopped in the right order on shutdown in this scenario?
(In reply to comment #0)
>  rawhide/~ # systemctl list-jobs
> JOB UNIT TYPE STATE
> 321 lvm2-monitor.service stop running
> 322 lvm2-lvmetad.service start waiting
> This does not happen if that "After=lvm2-lvmetad.service" statement is not
So, to summarize: The deadlock occurs because the lvm2-monitor.service/stop process waits on lvm2-lvmetad.service to become available, but lvm2-lvmetad.service/start cannot proceed, because of the existing ordering dependency between the two units and the general rule that stop jobs run before start jobs.
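The circular wait summarized above can be sketched as a small wait-for graph. This is a toy model, not systemd's job engine: the job names are taken from the list-jobs output, while `build_waits` and `find_cycle` are hypothetical helpers introduced only for illustration.

```python
from typing import Dict, List, Optional

MONITOR_STOP = "lvm2-monitor.service/stop"
LVMETAD_START = "lvm2-lvmetad.service/start"


def build_waits(ordered_after: bool) -> Dict[str, List[str]]:
    """Map each job to the jobs it is waiting for (toy model)."""
    waits: Dict[str, List[str]] = {
        # ExecStop ("vgchange --monitor n") blocks on the lvmetad socket,
        # i.e. until the lvmetad start job has completed.
        MONITOR_STOP: [LVMETAD_START],
        LVMETAD_START: [],
    }
    if ordered_after:
        # With an ordering dependency between the two units, the stop job
        # is run before the start job, so the start job waits for the stop.
        waits[LVMETAD_START].append(MONITOR_STOP)
    return waits


def find_cycle(waits_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph, or None if there is none."""
    visiting: List[str] = []
    done = set()

    def visit(node: str) -> Optional[List[str]]:
        if node in done:
            return None
        if node in visiting:
            # Reached a job that is already on the current path: a cycle.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in waits_for.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in waits_for:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


with_after = find_cycle(build_waits(ordered_after=True))
without_after = find_cycle(build_waits(ordered_after=False))
print("with After=:   ", with_after)
print("without After=:", without_after)
```

In this model the cycle only appears when the ordering dependency is present, which matches the observation that the hang goes away once the "After=" line is removed.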
This needs some thinking.
Meanwhile, as a workaround, would it be acceptable for you to avoid the late on-demand activation by adding "Requires=lvm2-lvmetad.service" to the lvm2-monitor.service unit?
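As a rough sketch, the suggested workaround would amount to something like the following in lvm2-monitor.service. Only the two dependency lines come from this report; the rest of the unit is omitted, and keeping the existing "After=" line alongside the new one is an assumption.

```ini
# lvm2-monitor.service (partial sketch of the proposed workaround)
[Unit]
# Requires= pulls lvmetad in when the monitor service is started,
# instead of relying on late on-demand socket activation.
Requires=lvm2-lvmetad.service
After=lvm2-lvmetad.service
```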
(In reply to comment #1)
> Meanwhile, as a workaround, would it be acceptable for you to avoid the late
> on-demand activation by adding "Requires=lvm2-lvmetad.service" to the
> lvm2-monitor.service unit?
Unfortunately not... lvm2-monitor is a service independent of lvmetad, which is just an LVM cache daemon to speed things up (and replace direct scans for devices) + it provides an autoactivation feature through udev by listening to udev events. The monitoring, like most of the other commands, just uses lvmetad as a shortcut to information about existing metadata without the need to touch devices directly... But all the commands can fall back to standard functionality without the lvmetad enhancement.
Lvmetad is still under development, so we'd like to make it disabled by default and only use it if global/use_lvmetad=1 lvm.conf setting is used. So having it on-demand perfectly suits our needs - "instantiate only if really needed".
Also, adding "Requires=lvm2-lvmetad.service" to lvm2-monitor.service could cause the monitoring service to fail, as lvmetad won't start if global/use_lvmetad=0 is used, which is the default for now until it matures and stabilizes - the lvmetad unit would fail, and so would the monitoring unit because of the added requirement.
What we need is just to apply the ordering if lvmetad is enabled, but no requirement.
(In reply to comment #2)
> Lvmetad is still under development, so we'd like to make it disabled by
> default and only use it if global/use_lvmetad=1 lvm.conf setting is used. So
> having it on-demand perfectly suits our needs - "instantiate only if really
(...if use_lvmetad=0 is set in lvm.conf, all LVM commands will honor this setting, so no command will try to communicate with lvmetad through the socket and lvmetad won't be instantiated, saving resources that would otherwise be uselessly consumed... However, one can override those global settings, e.g. through the lvm2app library! And in this case, we need to instantiate the daemon. So on-demand activation really suits us here...)
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.
(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)
It seems this is now handled a bit better:
lvm2-lvmetad.socket failed to queue service startup job (Maybe the service file is missing or not a non-template unit?): Transaction is destructive.
So the transaction requested is cancelled automatically which avoids the hang.
Further, for all the other services calling lvm commands that try to connect to the lvmetad daemon, the commands provide a warning:
WARNING: Failed to connect to lvmetad. Falling back to internal scanning.
...which is correct since the service has been stopped and there's no lvmetad instance running during shutdown while some other services/commands are still requesting it.
Considering this is a corner case where the lvmetad service is stopped *manually* and lvm.conf is not changed accordingly (so a configuration error actually), such behaviour at shutdown is acceptable - the error messages are correct here. Also, the hang which happened before is avoided as systemd can handle this situation better now - I'm closing this report.