Bug 843587 - Hang on socket if A.service autostarts B.service and "After=B.service" is defined for A.service
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: systemd
Version: 19
Platform: All Linux
Severity: medium
Priority: unspecified
Assigned To: systemd-maint
QA Contact: Fedora Extras Quality Assurance
Blocks: 871527
Reported: 2012-07-26 13:13 EDT by Peter Rajnoha
Modified: 2014-10-14 03:26 EDT
Doc Type: Bug Fix
Last Closed: 2014-10-14 03:26:38 EDT
Type: Bug

Attachments: None
Description Peter Rajnoha 2012-07-26 13:13:45 EDT
Problem scenario:
Service A uses resources of service B via a socket, and service B is configured
to autostart on demand via that socket. At the same time, we need to ensure that
service A is always stopped *before* service B is stopped on shutdown. Also,
service B was stopped manually during normal system run, before the shutdown.

Consequence:
Trying to stop service A hangs if service B is not running: service B is not
autostarted in this case, and A's stop action blocks waiting on B's socket.
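
The setup can be reduced to a minimal sketch (the unit names a.service,
b.service and b.socket are hypothetical, used only to illustrate the shape of
the problem):

---

(b.socket)
[Socket]
ListenStream=/run/b.socket

(b.service)
[Unit]
Requires=b.socket
After=b.socket

(a.service)
[Unit]
After=b.service
[Service]
Type=oneshot
RemainAfterExit=yes
# b-client is a hypothetical program that connects to /run/b.socket
ExecStop=/usr/bin/b-client

---

Stop b.service manually, then stop a.service: a's ExecStop blocks on
/run/b.socket, while systemd will not start b.service before a.service's stop
job has finished.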

Now, to be more concrete, we have these units:
A=lvm2-monitor.service, B=lvm2-lvmetad.service

---

(lvm2-lvmetad.socket)
[Unit]
Description=LVM2 metadata daemon socket
DefaultDependencies=no
[Socket]
ListenStream=/run/lvm/lvmetad.socket
SocketMode=0600
[Install]
WantedBy=sockets.target

(lvm2-lvmetad.service)
[Unit]
Description=LVM2 metadata daemon
Requires=lvm2-lvmetad.socket
After=lvm2-lvmetad.socket
DefaultDependencies=no
Conflicts=shutdown.target
[Service]
Type=forking
NonBlocking=true
ExecStart=/usr/sbin/lvmetad
ExecStartPost=/usr/sbin/lvm pvscan --cache
ExecReload=/usr/sbin/lvmetad -R
Environment=SD_ACTIVATION=1
Restart=on-abort
PIDFile=/run/lvmetad.pid
[Install]
WantedBy=sysinit.target

(lvm2-monitor.service)
[Unit]
Description=Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
Requires=dm-event.socket
After=dm-event.socket lvm2-lvmetad.service
Before=local-fs.target
DefaultDependencies=no
Conflicts=shutdown.target
[Service]
Type=oneshot
Environment=LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES=1
ExecStart=/usr/sbin/lvm vgchange --monitor y
ExecStop=/usr/sbin/lvm vgchange --monitor n
RemainAfterExit=yes
[Install]
WantedBy=sysinit.target

---

The "After=lvm2-lvmetad.service" in lvm2-monitor.service is used to make
sure that the lvmetad service is always stopped after the monitor service
(otherwise, without it, I ended up with the lvmetad service being stopped
prematurely and the monitor service trying to use the dead service).

Now, if someone calls "systemctl stop lvm2-lvmetad.service", the service is
stopped. That's perfectly fine, as the service should be autostarted on the next
access to its socket... But the "After=lvm2-lvmetad.service" in the lvm2-monitor
service causes a deadlock:

[0] rawhide/~ # systemctl stop lvm2-lvmetad.service 
Warning: Stopping lvm2-lvmetad.service, but it can still be activated by:
  lvm2-lvmetad.socket
[0] rawhide/~ # systemctl stop lvm2-monitor.service 
  [HANGS HERE]

...looking further at where this hangs - it's the "vgchange --monitor n" call
(the ExecStop action), which tries to communicate with lvmetad through the
socket; it waits on a socket that is not being serviced, because no autostart
happens...

The lvmetad service stays inactive:

[1] rawhide/~ # systemctl status lvm2-lvmetad.service
lvm2-lvmetad.service - LVM2 metadata daemon
	  Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmetad.service; disabled)
	  Active: inactive (dead) since Thu, 26 Jul 2012 16:53:47 +0200; 1min 23s ago

So systemd has failed to autostart it based on the socket access.

[0] rawhide/~ # systemctl list-jobs
 JOB UNIT                      TYPE            STATE  
 321 lvm2-monitor.service      stop            running
 322 lvm2-lvmetad.service      start           waiting

This does not happen if the "After=lvm2-lvmetad.service" statement is not used.
But then, on shutdown, lvmetad is stopped before lvm2-monitor.service,
which is wrong.

If using the "After" statement this way is not correct, what is the alternative way to make sure that the services are stopped in the right order on shutdown in this scenario?
Comment 1 Michal Schmidt 2012-07-27 05:06:38 EDT
(In reply to comment #0)
> [0] rawhide/~ # systemctl list-jobs
>  JOB UNIT                      TYPE            STATE  
>  321 lvm2-monitor.service      stop            running
>  322 lvm2-lvmetad.service      start           waiting
> 
> This does not happen if that "After=lvm2-lvmetad.service" statement is not
> used.

So, to summarize: The deadlock occurs because the lvm2-monitor.service/stop process waits on lvm2-lvmetad.service to become available, but lvm2-lvmetad.service/start cannot proceed, because of the existing ordering dependency between the two units and the general rule that stop jobs run before start jobs.
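
Spelled out as a sketch (based on the job list quoted above):

  lvm2-monitor.service/stop   (running)  -> ExecStop connects to lvmetad's socket
  lvm2-lvmetad.service/start  (waiting)  -> queued by socket activation, but held
                                            behind the stop job by the
                                            "After=lvm2-lvmetad.service" ordering
                                            plus the rule that stop jobs run
                                            before start jobs

The stop job cannot finish until ExecStop returns, and ExecStop cannot return
until the start job runs: a deadlock.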

This needs some thinking.

Meanwhile, as a workaround, would it be acceptable for you to avoid the late on-demand activation by adding "Requires=lvm2-lvmetad.service" to the lvm2-monitor.service unit?
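
For clarity, that workaround would amount to the following change in the [Unit]
section of lvm2-monitor.service (a sketch; only the second Requires= line is
new, the other lines are quoted from the unit above - systemd merges repeated
dependency directives):

[Unit]
Requires=dm-event.socket
Requires=lvm2-lvmetad.service
After=dm-event.socket lvm2-lvmetad.service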
Comment 2 Peter Rajnoha 2012-07-27 11:34:02 EDT
(In reply to comment #1)
> Meanwhile, as a workaround, would it be acceptable for you to avoid the late
> on-demand activation by adding "Requires=lvm2-lvmetad.service" to the
> lvm2-monitor.service unit?

Unfortunately not... lvm2-monitor is a service independent of lvmetad, which is just an LVM cache daemon to speed things up (and replace direct scanning of devices); it also provides the autoactivation feature through udev by listening to udev events. The monitoring, like most of the other commands, just uses lvmetad as a shortcut to information about existing metadata, without the need to touch devices directly... But all the commands can fall back to standard functionality without the lvmetad enhancement.

Lvmetad is still under development, so we'd like to keep it disabled by default and only use it if the global/use_lvmetad=1 lvm.conf setting is used. So having it on demand perfectly suits our needs - "instantiate only if really needed".

Also, adding "Requires=lvm2-lvmetad.service" to lvm2-monitor.service could cause the monitoring service to fail: lvmetad won't start if global/use_lvmetad=0 is used (the default for now, until it gets more mature and stabilizes), so the lvmetad unit would fail, and the monitoring unit would fail too because of the added requirement.

What we need is just to apply the ordering if lvmetad is enabled, but without a requirement.
Comment 3 Peter Rajnoha 2012-07-27 11:40:20 EDT
(In reply to comment #2)
> Lvmetad is still under development, so we'd like to make it disabled by
> default and only use it if global/use_lvmetad=1 lvm.conf setting is used. So
> having it on-demand perfectly suits our needs - "instantiate only if really
> needed".

(...if use_lvmetad=0 is set in lvm.conf, all LVM commands will honor this setting, so no command will try to communicate with lvmetad through the socket and lvmetad won't be instantiated, saving resources that would otherwise be uselessly consumed... However, one can override those global settings, e.g. through the lvm2app library! And in this case, we need to instantiate the daemon. So on-demand activation really suits us here...)
Comment 4 Fedora End Of Life 2013-04-03 13:44:55 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could also affect pre-Fedora 19 development
cycle bugs. We are very sorry. This will help us with cleanup during the Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
Comment 5 Peter Rajnoha 2014-10-14 03:26:38 EDT
It seems this is now handled a bit better:

  lvm2-lvmetad.socket failed to queue service startup job (Maybe the service file is missing or not a non-template unit?): Transaction is destructive.

So the requested transaction is cancelled automatically, which avoids the hang.
Further, for all the other services calling LVM commands that try to connect to the lvmetad daemon, the commands print a warning:

  WARNING: Failed to connect to lvmetad. Falling back to internal scanning.

...which is correct, since the service has been stopped and there's no lvmetad instance running during shutdown, when some other services/commands still request it.

Considering this is a corner case where the lvmetad service is stopped *manually* and lvm.conf is not changed accordingly (so it is actually a configuration error), such behaviour at shutdown is acceptable - the error messages are correct here. Also, the hang that happened before is avoided, as systemd can handle this situation better now - I'm closing this report.
