When trying to reload dm-event.service, all looks OK from the high-level status: [0] raw/~ # systemctl status dm-event dm-event.service - Device-mapper event daemon Loaded: loaded (/usr/lib/systemd/system/dm-event.service; disabled) Active: active (running) since Fri 2014-01-31 11:40:32 CET; 24s ago Docs: man:dmeventd(8) Process: 604 ExecStart=/usr/sbin/dmeventd (code=exited, status=0/SUCCESS) Main PID: 605 (dmeventd) CGroup: /system.slice/dm-event.service `-605 /usr/sbin/dmeventd Jan 31 11:40:32 raw.virt dmeventd[605]: dmeventd ready for processing. Jan 31 11:40:32 raw.virt lvm[605]: Monitoring snapshot vg-lvol1 [0] raw/~ # systemctl reload dm-event [0] raw/~ # systemctl status dm-event dm-event.service - Device-mapper event daemon Loaded: loaded (/usr/lib/systemd/system/dm-event.service; disabled) Active: active (running) since Fri 2014-01-31 11:41:14 CET; 4s ago Docs: man:dmeventd(8) Process: 1198 ExecReload=/usr/sbin/dmeventd -R (code=exited, status=0/SUCCESS) Process: 1205 ExecStart=/usr/sbin/dmeventd (code=exited, status=0/SUCCESS) Main PID: 1206 (dmeventd) CGroup: /system.slice/dm-event.service `-1206 /usr/sbin/dmeventd Jan 31 11:41:14 raw.virt dmeventd[1206]: dmeventd ready for processing. But looking in more detail in the systemd debug log, we can see things are not going as expected: Jan 31 11:40:30 raw.virt systemd[1]: Installed new job dm-event.socket/start as 128 ... Jan 31 11:40:30 raw.virt systemd[1]: dm-event.socket changed dead -> listening Jan 31 11:40:30 raw.virt systemd[1]: Job dm-event.socket/start finished, result=done Jan 31 11:40:30 raw.virt systemd[1]: Listening on Device-mapper event daemon FIFOs. Jan 31 11:40:30 raw.virt systemd[1]: Starting Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling... Jan 31 11:40:30 raw.virt systemd[1]: About to execute: /usr/sbin/lvm vgchange --monitor ... Jan 31 11:40:32 raw.virt systemd[1]: Incoming traffic on dm-event.socket Jan 31 11:40:32 raw.virt systemd[1]: Trying to enqueue job dm-event.service/start/replace Jan 31 11:40:32 raw.virt systemd[1]: Installed new job dm-event.service/start as 275 Jan 31 11:40:32 raw.virt systemd[1]: Enqueued job dm-event.service/start as 275 Jan 31 11:40:32 raw.virt systemd[1]: dm-event.socket changed listening -> running Jan 31 11:40:32 raw.virt systemd[1]: Starting Device-mapper event daemon... Jan 31 11:40:32 raw.virt systemd[1]: About to execute: /usr/sbin/dmeventd Jan 31 11:40:32 raw.virt systemd[1]: Forked /usr/sbin/dmeventd as 604 Jan 31 11:40:32 raw.virt systemd[1]: dm-event.service changed dead -> start Jan 31 11:40:32 raw.virt systemd[604]: Executing: /usr/sbin/dmeventd Jan 31 11:40:32 raw.virt systemd[1]: Received SIGCHLD from PID 604 (dmeventd). Jan 31 11:40:32 raw.virt systemd[1]: Got SIGCHLD for process 604 (dmeventd) Jan 31 11:40:32 raw.virt systemd[1]: Child 604 died (code=exited, status=0/SUCCESS) Jan 31 11:40:32 raw.virt systemd[1]: Child 604 belongs to dm-event.service Jan 31 11:40:32 raw.virt systemd[1]: dm-event.service: control process exited, code=exited status=0 Jan 31 11:40:32 raw.virt systemd[1]: dm-event.service got final SIGCHLD for state start Jan 31 11:40:32 raw.virt systemd[1]: Main PID loaded: 605 Jan 31 11:40:32 raw.virt systemd[1]: dm-event.service changed start -> running Jan 31 11:40:32 raw.virt systemd[1]: Job dm-event.service/start finished, result=done Jan 31 11:40:32 raw.virt systemd[1]: Started Device-mapper event daemon. ... ^^ this part is OK - dm-event.service automatically activated on boot from the dm-event.socket unit And below is the "reload" action that goes wrong: Jan 31 11:41:08 raw.virt systemd[1]: Trying to enqueue job dm-event.service/reload/replace Jan 31 11:41:08 raw.virt systemd[1]: Installed new job dm-event.service/reload as 668 Jan 31 11:41:08 raw.virt systemd[1]: Enqueued job dm-event.service/reload as 668 Jan 31 11:41:08 raw.virt systemd[1]: Reloading Device-mapper event daemon. Jan 31 11:41:08 raw.virt systemd[1]: About to execute: /usr/sbin/dmeventd -R Jan 31 11:41:08 raw.virt systemd[1]: Forked /usr/sbin/dmeventd as 1198 Jan 31 11:41:08 raw.virt systemd[1]: dm-event.service changed running -> reload ... Jan 31 11:41:08 raw.virt systemd[1]: Received SIGCHLD from PID 605 (dmeventd). Jan 31 11:41:08 raw.virt systemd[1]: Got SIGCHLD for process 605 (dmeventd) Jan 31 11:41:08 raw.virt systemd[1]: Child 605 died (code=killed, status=9/KILL) Jan 31 11:41:08 raw.virt systemd[1]: Child 605 belongs to dm-event.service Jan 31 11:41:08 raw.virt systemd[1]: dm-event.service: main process exited, code=killed, status=9/KILL Jan 31 11:41:14 raw.virt dmeventd[1198]: No input from event server. Jan 31 11:41:14 raw.virt dmeventd[1199]: dmeventd ready for processing. Jan 31 11:41:14 raw.virt systemd[1]: Received SIGCHLD from PID 1198 (dmeventd). Jan 31 11:41:14 raw.virt systemd[1]: Got SIGCHLD for process 1198 (dmeventd) Jan 31 11:41:14 raw.virt systemd[1]: Child 1198 died (code=exited, status=0/SUCCESS) Jan 31 11:41:14 raw.virt systemd[1]: Child 1198 belongs to dm-event.service Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service: control process exited, code=exited status=0 Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service got final SIGCHLD for state reload Jan 31 11:41:14 raw.virt systemd[1]: Main PID changing: 0 -> 1199 Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service changed reload -> running Jan 31 11:41:14 raw.virt systemd[1]: Job dm-event.service/reload finished, result=done Jan 31 11:41:14 raw.virt systemd[1]: Reloaded Device-mapper event daemon. ... Jan 31 11:41:14 raw.virt systemd[1]: Received SIGCHLD from PID 1199 (dmeventd). Jan 31 11:41:14 raw.virt systemd[1]: Got SIGCHLD for process 1199 (dmeventd) Jan 31 11:41:14 raw.virt systemd[1]: Child 1199 died (code=killed, status=9/KILL) Jan 31 11:41:14 raw.virt systemd[1]: Child 1199 belongs to dm-event.service Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service: main process exited, code=killed, status=9/KILL Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service changed running -> failed Jan 31 11:41:14 raw.virt systemd[1]: Unit dm-event.service entered failed state. Jan 31 11:41:14 raw.virt systemd[1]: dm-event.socket got notified about service death (failed permanently: no) Jan 31 11:41:14 raw.virt systemd[1]: dm-event.socket changed running -> listening Jan 31 11:41:14 raw.virt systemd[1]: Incoming traffic on dm-event.socket Jan 31 11:41:14 raw.virt systemd[1]: Trying to enqueue job dm-event.service/start/replace Jan 31 11:41:14 raw.virt systemd[1]: Installed new job dm-event.service/start as 669 Jan 31 11:41:14 raw.virt systemd[1]: Enqueued job dm-event.service/start as 669 Jan 31 11:41:14 raw.virt systemd[1]: dm-event.socket changed listening -> running Jan 31 11:41:14 raw.virt systemd[1]: Starting Device-mapper event daemon... Jan 31 11:41:14 raw.virt systemd[1]: About to execute: /usr/sbin/dmeventd Jan 31 11:41:14 raw.virt systemd[1]: Forked /usr/sbin/dmeventd as 1202 Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service changed failed -> start ... Jan 31 11:41:14 raw.virt systemd[1202]: Executing: /usr/sbin/dmeventd ... Jan 31 11:41:14 raw.virt dmeventd[1203]: dmeventd ready for processing. Jan 31 11:41:14 raw.virt systemd[1]: Received SIGCHLD from PID 1202 (dmeventd). Jan 31 11:41:14 raw.virt systemd[1]: Got SIGCHLD for process 1202 (dmeventd) Jan 31 11:41:14 raw.virt systemd[1]: Child 1202 died (code=exited, status=0/SUCCESS) Jan 31 11:41:14 raw.virt systemd[1]: Child 1202 belongs to dm-event.service Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service: control process exited, code=exited status=0 Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service got final SIGCHLD for state start Jan 31 11:41:14 raw.virt systemd[1]: PID 1203 read from file /run/dmeventd.pid does not exist. Jan 31 11:41:14 raw.virt systemd[1]: Setting watch for dm-event.service's PID file /run/dmeventd.pid Jan 31 11:41:14 raw.virt systemd[1]: Trying to read dm-event.service's PID file /run/dmeventd.pid in case it changed Jan 31 11:41:14 raw.virt systemd[1]: Stopping watch for dm-event.service's PID file /run/dmeventd.pid Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service changed start -> failed Jan 31 11:41:14 raw.virt systemd[1]: Job dm-event.service/start finished, result=failed Jan 31 11:41:14 raw.virt systemd[1]: Failed to start Device-mapper event daemon. Jan 31 11:41:14 raw.virt systemd[1]: dm-event.socket got notified about service death (failed permanently: no) Jan 31 11:41:14 raw.virt systemd[1]: dm-event.socket changed running -> listening ... Jan 31 11:41:14 raw.virt systemd[1]: Unit dm-event.service entered failed state. Jan 31 11:41:14 raw.virt systemd[1]: Incoming traffic on dm-event.socket Jan 31 11:41:14 raw.virt systemd[1]: Trying to enqueue job dm-event.service/start/replace Jan 31 11:41:14 raw.virt systemd[1]: Installed new job dm-event.service/start as 674 Jan 31 11:41:14 raw.virt systemd[1]: Enqueued job dm-event.service/start as 674 Jan 31 11:41:14 raw.virt systemd[1]: dm-event.socket changed listening -> running Jan 31 11:41:14 raw.virt systemd[1]: Starting Device-mapper event daemon... Jan 31 11:41:14 raw.virt systemd[1]: About to execute: /usr/sbin/dmeventd Jan 31 11:41:14 raw.virt systemd[1]: Forked /usr/sbin/dmeventd as 1205 Jan 31 11:41:14 raw.virt systemd[1205]: Executing: /usr/sbin/dmeventd Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service changed failed -> start ... Jan 31 11:41:14 raw.virt dmeventd[1206]: dmeventd ready for processing. Jan 31 11:41:14 raw.virt systemd[1]: Received SIGCHLD from PID 1205 (dmeventd). Jan 31 11:41:14 raw.virt systemd[1]: Got SIGCHLD for process 1205 (dmeventd) Jan 31 11:41:14 raw.virt systemd[1]: Child 1205 died (code=exited, status=0/SUCCESS) Jan 31 11:41:14 raw.virt systemd[1]: Child 1205 belongs to dm-event.service Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service: control process exited, code=exited status=0 Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service got final SIGCHLD for state start Jan 31 11:41:14 raw.virt systemd[1]: Main PID loaded: 1206 Jan 31 11:41:14 raw.virt systemd[1]: dm-event.service changed start -> running Jan 31 11:41:14 raw.virt systemd[1]: Job dm-event.service/start finished, result=done Jan 31 11:41:14 raw.virt systemd[1]: Started Device-mapper event daemon. As can be seen from the log above, the dm-event.service unit even passes "failed" state and then it's instantiated again only because the dm-event.socket unit triggers a *completely new* instance (just because the FIFO is accessed). This is normally not visible in simple "systemctl status" as indicated at the beginning of this comment.
I think this could work for the dmeventd restart in systemd environment: - dmeventd -R calls the "restart" fn - we get the state from the old dmeventd - we order the old dmeventd to die (up until here, the procedure is the same as it was before) - if under systemd management, send request to register all the original event registrations directly to the fifo - this instantiates a new dmeventd which will read the initial registrations right after it's started - the dmeventd -R call finishes - the newly instantiated dmeventd continues....
The direct outcome of this bug is that we lose existing event monitoring on dmeventd daemon restart (that normally happens on package update!).
The fix has been pushed upstream: https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=8a8abc5ed9358f45b127927a20f844a7ef122466 https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=fbeb08f320414cf0b5ae22096504bc4293fe939a https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=90286fa0e96dde8f79c1d06eff4a878cb3024283 https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=73f30ed6a4b85c3b70aa39efa14ce78be5bf8a3a https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=290e58b0b6f001b2f3563304fd17d606eff51ba2
lvm2-2.02.106-1.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/lvm2-2.02.106-1.fc20
Package lvm2-2.02.106-1.fc20: * should fix your issue, * was pushed to the Fedora 20 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing lvm2-2.02.106-1.fc20' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2014-5241/lvm2-2.02.106-1.fc20 then log in and leave karma (feedback).
lvm2-2.02.106-1.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
As customer needs something to get systems usable and found that a sleep in dracut-pre-pivot.service was working, I came up with the following so we could be a little more dynamic in our waiting period. Script example to wait for mlx4-setup.sh ~~~ # cat /usr/bin/mlx4-wait.sh #!/bin/bash PATH=/bin:/usr/bin tty -s 2>/dev/null || exec >>/run/initramfs/mlx4-wait.log 2>&1 Sleep=3 Max=20 count=1 #while ps -C mlx4-setup.sh >/dev/null 2>&1 && [[ ${count:=1} -lt ${Max=3} ]] ; do while ps -flC modprobe | grep -q mlx4 && [[ ${count:=1} -lt ${Max=3} ]] ; do #ps -fC mlx4-setup.sh ps -ef | grep -e mlx4 -e modprobe echo "Sleeping for ${Sleep:=3} sec (${count} of ${Max})" && sleep ${Sleep} echo count=$((count+1)) done ~~~ Then had them add it to the ExecPre for dracut-pre-pivot.service ~~~ # cp -p /usr/lib/systemd/system/dracut-pre-pivot.service /usr/lib/systemd/system/dracut-pre-pivot.service.orig # echo -e "/^ExecStart=\ni\nExecStartPre=-/usr/bin/mlx4-wait.sh\n.\nw\nq" | ed -s /usr/lib/systemd/system/dracut-pre-pivot.service # cp -p /usr/lib/dracut/modules.d/05rdma/module-setup.sh /usr/lib/dracut/modules.d/05rdma/module-setup.sh.orig # echo -e "/mlx4-setup.sh\ni\n\tinst /usr/bin/mlx4-wait.sh\n.\nw\nq" | ed -s /usr/lib/dracut/modules.d/05rdma/module-setup.sh # systemctl daemon-reload # dracut -f # kexec -l /boot/vmlinuz-$(uname -r) \ --initrd=/boot/initramfs-$(uname -r).img \ --reuse-cmdline # systemctl kexec ~~~ This all worked for system with 3 (ie 6ports) ConnectX-3 Cards. However a system that has 7 cards (14 ports) we were hitting issues. We found that we had to wait 130 seconds. Once was in place we were able to get this system up.