Bug 1060134

Summary: The dm-event.service reload goes wrong - it passes through the failed state and a completely new instance is used in the end

Product: [Fedora] Fedora
Component: lvm2
Version: 20
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Peter Rajnoha <prajnoha>
Assignee: Peter Rajnoha <prajnoha>
QA Contact: James Hartsock <hartsjc>
CC: agk, bmarzins, bmr, dwysocha, hartsjc, heinzm, jonathan, lvm-team, msnitzer, okozina, prajnoha, prockai, zkabelac
Fixed In Version: lvm2-2.02.106-1.fc20
Doc Type: Bug Fix
Cloned as: 1060737 (view as bug list)
Last Closed: 2014-04-20 01:28:46 UTC
Type: Bug

Description Peter Rajnoha 2014-01-31 10:57:33 UTC
When trying to reload dm-event.service, everything looks OK from the high-level status:

[0] raw/~ # systemctl status dm-event
dm-event.service - Device-mapper event daemon
   Loaded: loaded (/usr/lib/systemd/system/dm-event.service; disabled)
   Active: active (running) since Fri 2014-01-31 11:40:32 CET; 24s ago
     Docs: man:dmeventd(8)
  Process: 604 ExecStart=/usr/sbin/dmeventd (code=exited, status=0/SUCCESS)
 Main PID: 605 (dmeventd)
   CGroup: /system.slice/dm-event.service
           `-605 /usr/sbin/dmeventd

Jan 31 11:40:32 raw.virt dmeventd[605]: dmeventd ready for processing.
Jan 31 11:40:32 raw.virt lvm[605]: Monitoring snapshot vg-lvol1

[0] raw/~ # systemctl reload dm-event

[0] raw/~ # systemctl status dm-event
dm-event.service - Device-mapper event daemon
   Loaded: loaded (/usr/lib/systemd/system/dm-event.service; disabled)
   Active: active (running) since Fri 2014-01-31 11:41:14 CET; 4s ago
     Docs: man:dmeventd(8)
  Process: 1198 ExecReload=/usr/sbin/dmeventd -R (code=exited, status=0/SUCCESS)
  Process: 1205 ExecStart=/usr/sbin/dmeventd (code=exited, status=0/SUCCESS)
 Main PID: 1206 (dmeventd)
   CGroup: /system.slice/dm-event.service
           `-1206 /usr/sbin/dmeventd

Jan 31 11:41:14 raw.virt dmeventd[1206]: dmeventd ready for processing.
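(For reference, the reload part of a debug log like the one quoted below can be reproduced by temporarily raising PID 1's log level at runtime. A rough sketch, assuming the installed systemd supports the log-level signals documented in systemd(1):)

~~~
# SIGRTMIN+22 switches systemd's log level to debug, SIGRTMIN+23 sets it back to info
kill -s SIGRTMIN+22 1
systemctl reload dm-event.service
kill -s SIGRTMIN+23 1
journalctl -b -u dm-event.service -u dm-event.socket
~~~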



But looking in more detail at the systemd debug log, we can see that things are not going as expected:

Jan 31 11:40:30 raw.virt systemd[1]: Installed new job dm-event.socket/start as 128
...
Jan 31 11:40:30 raw.virt systemd[1]: dm-event.socket changed dead -> listening
Jan 31 11:40:30 raw.virt systemd[1]: Job dm-event.socket/start finished, result=done
Jan 31 11:40:30 raw.virt systemd[1]: Listening on Device-mapper event daemon FIFOs.
Jan 31 11:40:30 raw.virt systemd[1]: Starting Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Jan 31 11:40:30 raw.virt systemd[1]: About to execute: /usr/sbin/lvm vgchange --monitor
...
Jan 31 11:40:32 raw.virt systemd[1]: Incoming traffic on dm-event.socket
Jan 31 11:40:32 raw.virt systemd[1]: Trying to enqueue job dm-event.service/start/replace
Jan 31 11:40:32 raw.virt systemd[1]: Installed new job dm-event.service/start as 275
Jan 31 11:40:32 raw.virt systemd[1]: Enqueued job dm-event.service/start as 275
Jan 31 11:40:32 raw.virt systemd[1]: dm-event.socket changed listening -> running
Jan 31 11:40:32 raw.virt systemd[1]: Starting Device-mapper event daemon...
Jan 31 11:40:32 raw.virt systemd[1]: About to execute: /usr/sbin/dmeventd
Jan 31 11:40:32 raw.virt systemd[1]: Forked /usr/sbin/dmeventd as 604
Jan 31 11:40:32 raw.virt systemd[1]: dm-event.service changed dead -> start
Jan 31 11:40:32 raw.virt systemd[604]: Executing: /usr/sbin/dmeventd
Jan 31 11:40:32 raw.virt systemd[1]: Received SIGCHLD from PID 604 (dmeventd).
Jan 31 11:40:32 raw.virt systemd[1]: Got SIGCHLD for process 604 (dmeventd)
Jan 31 11:40:32 raw.virt systemd[1]: Child 604 died (code=exited, status=0/SUCCESS)
Jan 31 11:40:32 raw.virt systemd[1]: Child 604 belongs to dm-event.service
Jan 31 11:40:32 raw.virt systemd[1]: dm-event.service: control process exited, code=exited status=0
Jan 31 11:40:32 raw.virt systemd[1]: dm-event.service got final SIGCHLD for state start
Jan 31 11:40:32 raw.virt systemd[1]: Main PID loaded: 605
Jan 31 11:40:32 raw.virt systemd[1]: dm-event.service changed start -> running
Jan 31 11:40:32 raw.virt systemd[1]: Job dm-event.service/start finished, result=done
Jan 31 11:40:32 raw.virt systemd[1]: Started Device-mapper event daemon.
...

  ^^ this part is OK - dm-event.service is automatically activated on boot from the dm-event.socket unit


And below is the "reload" action that goes wrong:

Jan 31 11:41:08 raw.virt systemd[1]: Trying to enqueue job dm-event.service/reload/replace
Jan 31 11:41:08 raw.virt systemd[1]: Installed new job dm-event.service/reload as 668
Jan 31 11:41:08 raw.virt systemd[1]: Enqueued job dm-event.service/reload as 668
Jan 31 11:41:08 raw.virt systemd[1]: Reloading Device-mapper event daemon.
Jan 31 11:41:08 raw.virt systemd[1]: About to execute: /usr/sbin/dmeventd -R
Jan 31 11:41:08 raw.virt systemd[1]: Forked /usr/sbin/dmeventd as 1198
Jan 31 11:41:08 raw.virt systemd[1]: dm-event.service changed running -> reload
...
Jan 31 11:41:08 raw.virt systemd[1]: Received SIGCHLD from PID 605 (dmeventd).
Jan 31 11:41:08 raw.virt systemd[1]: Got SIGCHLD for process 605 (dmeventd)
Jan 31 11:41:08 raw.virt systemd[1]: Child 605 died (code=killed, status=9/KILL)
Jan 31 11:41:08 raw.virt systemd[1]: Child 605 belongs to dm-event.service
Jan 31 11:41:08 raw.virt systemd[1]: dm-event.service: main process exited, code=killed, status=9/KILL
Jan 31 11:41:14 raw.virt dmeventd[1198]: No input from event server.
Jan 31 11:41:14 raw.virt dmeventd[1199]: dmeventd ready for processing.
Jan 31 11:41:14 raw.virt systemd[1]: Received SIGCHLD from PID 1198 (dmeventd).
Jan 31 11:41:14 raw.virt systemd[1]: Got SIGCHLD for process 1198 (dmeventd)
Jan 31 11:41:14 raw.virt systemd[1]: Child 1198 died (code=exited, status=0/SUCCESS)
Jan 31 11:41:14 raw.virt systemd[1]: Child 1198 belongs to dm-event.service
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service: control process exited, code=exited status=0
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service got final SIGCHLD for state reload
Jan 31 11:41:14 raw.virt systemd[1]: Main PID changing: 0 -> 1199
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service changed reload -> running
Jan 31 11:41:14 raw.virt systemd[1]: Job dm-event.service/reload finished, result=done
Jan 31 11:41:14 raw.virt systemd[1]: Reloaded Device-mapper event daemon.
...
Jan 31 11:41:14 raw.virt systemd[1]: Received SIGCHLD from PID 1199 (dmeventd).
Jan 31 11:41:14 raw.virt systemd[1]: Got SIGCHLD for process 1199 (dmeventd)
Jan 31 11:41:14 raw.virt systemd[1]: Child 1199 died (code=killed, status=9/KILL)
Jan 31 11:41:14 raw.virt systemd[1]: Child 1199 belongs to dm-event.service
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service: main process exited, code=killed, status=9/KILL
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service changed running -> failed
Jan 31 11:41:14 raw.virt systemd[1]: Unit dm-event.service entered failed state.
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.socket got notified about service death (failed permanently: no)
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.socket changed running -> listening
Jan 31 11:41:14 raw.virt systemd[1]: Incoming traffic on dm-event.socket
Jan 31 11:41:14 raw.virt systemd[1]: Trying to enqueue job dm-event.service/start/replace
Jan 31 11:41:14 raw.virt systemd[1]: Installed new job dm-event.service/start as 669
Jan 31 11:41:14 raw.virt systemd[1]: Enqueued job dm-event.service/start as 669
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.socket changed listening -> running
Jan 31 11:41:14 raw.virt systemd[1]: Starting Device-mapper event daemon...
Jan 31 11:41:14 raw.virt systemd[1]: About to execute: /usr/sbin/dmeventd
Jan 31 11:41:14 raw.virt systemd[1]: Forked /usr/sbin/dmeventd as 1202
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service changed failed -> start
...
Jan 31 11:41:14 raw.virt systemd[1202]: Executing: /usr/sbin/dmeventd
...
Jan 31 11:41:14 raw.virt dmeventd[1203]: dmeventd ready for processing.
Jan 31 11:41:14 raw.virt systemd[1]: Received SIGCHLD from PID 1202 (dmeventd).
Jan 31 11:41:14 raw.virt systemd[1]: Got SIGCHLD for process 1202 (dmeventd)
Jan 31 11:41:14 raw.virt systemd[1]: Child 1202 died (code=exited, status=0/SUCCESS)
Jan 31 11:41:14 raw.virt systemd[1]: Child 1202 belongs to dm-event.service
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service: control process exited, code=exited status=0
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service got final SIGCHLD for state start
Jan 31 11:41:14 raw.virt systemd[1]: PID 1203 read from file /run/dmeventd.pid does not exist.
Jan 31 11:41:14 raw.virt systemd[1]: Setting watch for dm-event.service's PID file /run/dmeventd.pid
Jan 31 11:41:14 raw.virt systemd[1]: Trying to read dm-event.service's PID file /run/dmeventd.pid in case it changed
Jan 31 11:41:14 raw.virt systemd[1]: Stopping watch for dm-event.service's PID file /run/dmeventd.pid
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service changed start -> failed
Jan 31 11:41:14 raw.virt systemd[1]: Job dm-event.service/start finished, result=failed
Jan 31 11:41:14 raw.virt systemd[1]: Failed to start Device-mapper event daemon.
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.socket got notified about service death (failed permanently: no)
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.socket changed running -> listening
...
Jan 31 11:41:14 raw.virt systemd[1]: Unit dm-event.service entered failed state.
Jan 31 11:41:14 raw.virt systemd[1]: Incoming traffic on dm-event.socket
Jan 31 11:41:14 raw.virt systemd[1]: Trying to enqueue job dm-event.service/start/replace
Jan 31 11:41:14 raw.virt systemd[1]: Installed new job dm-event.service/start as 674
Jan 31 11:41:14 raw.virt systemd[1]: Enqueued job dm-event.service/start as 674
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.socket changed listening -> running
Jan 31 11:41:14 raw.virt systemd[1]: Starting Device-mapper event daemon...
Jan 31 11:41:14 raw.virt systemd[1]: About to execute: /usr/sbin/dmeventd
Jan 31 11:41:14 raw.virt systemd[1]: Forked /usr/sbin/dmeventd as 1205
Jan 31 11:41:14 raw.virt systemd[1205]: Executing: /usr/sbin/dmeventd
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service changed failed -> start
...
Jan 31 11:41:14 raw.virt dmeventd[1206]: dmeventd ready for processing.
Jan 31 11:41:14 raw.virt systemd[1]: Received SIGCHLD from PID 1205 (dmeventd).
Jan 31 11:41:14 raw.virt systemd[1]: Got SIGCHLD for process 1205 (dmeventd)
Jan 31 11:41:14 raw.virt systemd[1]: Child 1205 died (code=exited, status=0/SUCCESS)
Jan 31 11:41:14 raw.virt systemd[1]: Child 1205 belongs to dm-event.service
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service: control process exited, code=exited status=0
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service got final SIGCHLD for state start
Jan 31 11:41:14 raw.virt systemd[1]: Main PID loaded: 1206
Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service changed start -> running
Jan 31 11:41:14 raw.virt systemd[1]: Job dm-event.service/start finished, result=done
Jan 31 11:41:14 raw.virt systemd[1]: Started Device-mapper event daemon.


As can be seen from the log above, the dm-event.service unit even passes through the "failed" state and is then instantiated again only because the dm-event.socket unit triggers a *completely new* instance (just because the FIFO is accessed). This is normally not visible in a simple "systemctl status", as shown at the beginning of this comment.
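For completeness, the effect is also visible from the outside by comparing the service's main PID across the reload and asking lvm whether the snapshot is still registered for monitoring. A rough sketch (systemctl show and lvs are standard commands; seg_monitor is lvm's reporting field for the monitoring state, and vg/lvol1 are the names from the example above):

~~~
systemctl show -p MainPID dm-event.service   # note the current dmeventd PID
systemctl reload dm-event.service
systemctl show -p MainPID dm-event.service   # a different PID means a brand new instance
lvs -o lv_name,seg_monitor vg                # lvol1 is expected to show up as not monitored
~~~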

Comment 1 Peter Rajnoha 2014-01-31 12:37:46 UTC
I think this could work for the dmeventd restart in a systemd environment:

  - dmeventd -R calls the "restart" fn
  - we get the state from the old dmeventd
  - we order the old dmeventd to die

  (up until here, the procedure is the same as it was before)

  - if under systemd management, send a request to register all the original event registrations directly to the FIFO
  - this instantiates a new dmeventd, which reads the initial registrations right after it's started
  - the dmeventd -R call finishes
  - the newly instantiated dmeventd continues...
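Roughly, a manual equivalent of that flow under systemd would look like the sketch below (only an illustration of the idea, not the actual -R implementation):

~~~
# tell the old dmeventd instance to hand over its state and exit
/usr/sbin/dmeventd -R
# under systemd, the next access to the dmeventd FIFOs socket-activates a fresh
# instance; re-registering monitoring then recreates the original registrations in it
/usr/sbin/lvm vgchange --monitor y
~~~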

Comment 2 Peter Rajnoha 2014-02-03 13:33:22 UTC
The direct outcome of this bug is that we lose existing event monitoring on a dmeventd daemon restart (which normally happens on a package update!).

Comment 4 Fedora Update System 2014-04-15 10:29:05 UTC
lvm2-2.02.106-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/lvm2-2.02.106-1.fc20

Comment 5 Fedora Update System 2014-04-16 09:25:41 UTC
Package lvm2-2.02.106-1.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing lvm2-2.02.106-1.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-5241/lvm2-2.02.106-1.fc20
then log in and leave karma (feedback).

Comment 6 Fedora Update System 2014-04-20 01:28:46 UTC
lvm2-2.02.106-1.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 7 James Hartsock 2015-06-17 22:47:22 UTC
As the customer needs something to get the systems usable, and we found that a sleep in dracut-pre-pivot.service was working, I came up with the following so we could be a little more dynamic in our waiting period.


Script example to wait for mlx4-setup.sh
~~~
# cat /usr/bin/mlx4-wait.sh
#!/bin/bash
PATH=/bin:/usr/bin
# when not attached to a terminal, append all output to a log under /run/initramfs
tty -s 2>/dev/null || exec >>/run/initramfs/mlx4-wait.log 2>&1

Sleep=3
Max=20

count=1
# wait while an mlx4 modprobe is still running, polling at most Max times
#while ps -C mlx4-setup.sh >/dev/null 2>&1 && [[ ${count} -lt ${Max} ]] ; do
while ps -flC modprobe | grep -q mlx4 && [[ ${count} -lt ${Max} ]] ; do
  #ps -fC mlx4-setup.sh
  ps -ef | grep -e mlx4 -e modprobe
  echo "Sleeping for ${Sleep} sec (${count} of ${Max})" && sleep ${Sleep}
  echo
  count=$((count+1))
done
~~~
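The script has to be executable and present in /usr/bin before the initramfs is rebuilt, since the dracut module edit below pulls it in from there (a sketch, assuming the script was written out as mlx4-wait.sh in the current directory):

~~~
# install the wait script where the dracut inst call below will find it
install -m 0755 mlx4-wait.sh /usr/bin/mlx4-wait.sh
~~~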


Then had them add it as an ExecStartPre to dracut-pre-pivot.service:
~~~
# cp -p /usr/lib/systemd/system/dracut-pre-pivot.service /usr/lib/systemd/system/dracut-pre-pivot.service.orig
# echo -e "/^ExecStart=\ni\nExecStartPre=-/usr/bin/mlx4-wait.sh\n.\nw\nq" | ed -s /usr/lib/systemd/system/dracut-pre-pivot.service
# cp -p /usr/lib/dracut/modules.d/05rdma/module-setup.sh /usr/lib/dracut/modules.d/05rdma/module-setup.sh.orig
# echo -e "/mlx4-setup.sh\ni\n\tinst /usr/bin/mlx4-wait.sh\n.\nw\nq" | ed -s /usr/lib/dracut/modules.d/05rdma/module-setup.sh
# systemctl daemon-reload
# dracut -f
# kexec -l /boot/vmlinuz-$(uname -r) \
  --initrd=/boot/initramfs-$(uname -r).img \
  --reuse-cmdline
# systemctl kexec
~~~
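A quick sanity check that both edits landed and that the rebuilt initramfs contains the script (a sketch; lsinitrd ships with dracut):

~~~
grep ExecStartPre /usr/lib/systemd/system/dracut-pre-pivot.service
grep mlx4-wait.sh /usr/lib/dracut/modules.d/05rdma/module-setup.sh
lsinitrd | grep mlx4-wait.sh    # the script should be packed into the new initramfs
~~~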



This all worked for a system with 3 ConnectX-3 cards (i.e. 6 ports). However, on a system that has 7 cards (14 ports) we were hitting issues. We found that we had to wait 130 seconds. Once that was in place, we were able to get this system up.