Bug 1543167
| Summary: | LVM2 Metadata and Device-Mapper daemons do not shut down correctly after a snapshot merge | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Rob Leese <rleese> |
| Component: | lvm2 | Assignee: | Zdenek Kabelac <zkabelac> |
| lvm2 sub component: | LVM Metadata / lvmetad | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | unspecified | CC: | agk, akarvi, cmarthal, heinzm, jbrassow, loberman, mcsontos, msnitzer, mtowey, prajnoha, rhandlin, zkabelac |
| Version: | 7.5 | ||
| Target Milestone: | rc | ||
| Target Release: | 7.5 | ||
| Hardware: | x86_64 | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | lvm2-2.02.179-3.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-10-30 11:02:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1546181, 1546815, 1549666 | ||
|
Description
Rob Leese
2018-02-07 21:25:03 UTC
*** Bug 1543991 has been marked as a duplicate of this bug. *** Pushed upstream 2 fixes: https://www.redhat.com/archives/lvm-devel/2018-June/msg00100.html https://www.redhat.com/archives/lvm-devel/2018-June/msg00099.html It might be needed to also enhance systemd service files to lvmetad socked conflicts with shutdown.target and possibly we may also want to disable usage of lvmetad (--config 'global/use_lvmetad=0') with unmonitoring VG. What is the new expected behavior to verify in the latest rpms? It still appears the 1m 31s timeout exists, although, I didn't see any "Sending SIGTERM" or watchdog messages this time. Is this now correct?
[root@host-093 ~]# lvcreate -L 512M -s -n snaplv /dev/rhel_host-093/root
Logical volume "snaplv" created.
[root@host-093 ~]# lvs -a -o +devices
LV VG Attr LSize Pool Origin Data% Meta% Devices
[lvol0_pmspare] rhel_host-093 ewi------- 4.00m /dev/vda2(205)
pool00 rhel_host-093 twi-aotz-- <4.79g 48.34 29.79 pool00_tdata(0)
[pool00_tdata] rhel_host-093 Twi-ao---- <4.79g /dev/vda2(206)
[pool00_tmeta] rhel_host-093 ewi-ao---- 4.00m /dev/vda2(1431)
root rhel_host-093 owi-aotz-- <4.79g pool00 48.34
snaplv rhel_host-093 swi-a-s--- 512.00m root 0.03 /dev/vda2(1432)
swap rhel_host-093 -wi-ao---- 820.00m /dev/vda2(0)
[root@host-093 ~]# dd if=/dev/zero of=/root/tempfile bs=1M count=300 ; sync
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 0.986615 s, 319 MB/s
[root@host-093 ~]# lvconvert --merge rhel_host-093/snaplv
Delaying merge since origin is open.
Merging of snapshot rhel_host-093/snaplv will occur on next activation of rhel_host-093/root.
[root@host-093 ~]# lvs -ao +devices
LV VG Attr LSize Pool Origin Data% Meta% Devices
[lvol0_pmspare] rhel_host-093 ewi------- 4.00m /dev/vda2(205)
pool00 rhel_host-093 twi-aotz-- <4.79g 52.45 32.13 pool00_tdata(0)
[pool00_tdata] rhel_host-093 Twi-ao---- <4.79g /dev/vda2(206)
[pool00_tmeta] rhel_host-093 ewi-ao---- 4.00m /dev/vda2(1431)
root rhel_host-093 Owi-aotz-- <4.79g pool00 52.45
[snaplv] rhel_host-093 Swi-a-s--- 512.00m root 58.92 /dev/vda2(1432)
swap rhel_host-093 -wi-ao---- 820.00m /dev/vda2(0)
[root@host-093 ~]# reboot
Connection to host-093 closed by remote host.
Connection to host-093 closed.
Aug 13 16:33:38 host-093 lvm[19824]: Monitoring snapshot rhel_host--093-snaplv.
Aug 13 16:33:38 host-093 lvm[19824]: Monitoring thin pool rhel_host--093-pool00-tpool.
Aug 13 16:34:41 host-093 systemd: Stopping Session 1 of user root.
Aug 13 16:34:41 host-093 systemd: Removed slice system-selinux\x2dpolicy\x2dmigrate\x2dlocal\x2dchanges.slice.
Aug 13 16:34:41 host-093 systemd: Stopped target rpc_pipefs.target.
Aug 13 16:34:41 host-093 systemd: Stopping Authorization Manager...
Aug 13 16:34:41 host-093 systemd: Closed LVM2 poll daemon socket.
Aug 13 16:34:41 host-093 systemd: Unmounting RPC Pipe File System...
Aug 13 16:34:41 host-093 systemd: Stopped target Multi-User System.
Aug 13 16:34:41 host-093 systemd: Stopping libstoragemgmt plug-in server daemon...
Aug 13 16:34:41 host-093 systemd: Stopping Postfix Mail Transport Agent...
Aug 13 16:34:41 host-093 systemd: Stopping OpenSSH server daemon...
[448637.958323] beah-beaker-backend[1716]: 2018-08-13 16:34:41,133 backend set_controller: INFO Connection to controller lost.
[448638.028693] beah-beaker-backend[1716]: 2018-08-13 16:34:41,138 backend linfo: INFO BackendFactory: Lost connection. Reason: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion: Connection lost.
[448638.029750] beah-beaker-backend[1716]: ]
[448638.031579] beah-beaker-backend[1716]: 2018-08-13 16:34:41,140 backend set_controller: INFO Connection to controller lost.
[448638.044334] beah-fwd-backend[1713]: 2018-08-13 16:34:41,235 backend linfo: INFO BackendFactory: Lost connection. Reason: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion: Connection lost.
[448638.045574] beah-fwd-backend[1713]: ]
[448638.046702] beah-srv[1715]: This is a Controller server.
[448638.049000] beah-srv[1715]: Type <Ctrl-C> to exit. Alternatively issue a kill command from cmd_backend.
[[32m OK [0m] Removed slice User Slice of root.
Stopping Login Service...
[[32m OK [0m] Stopped /etc/rc.d/rc.local Compatibility.
Stopping Permit User Sessions...
[[32m OK [0m] Removed slice system-serial\x2dgetty.slice.
[[32m OK [0m] Stopped Login Service.
[[32m OK [0m] Stopped LSB: Starts the Spacewalk Daemon.
[[32m OK [0m] Stopped target Network is Online.
[[32m OK [0m] Stopped target System Time Synchronized.
[[32m OK [0m] Stopped Wait for chrony to synchronize system clock.
Stopping NTP client/server...
[[32m OK [0m] Stopped Permit User Sessions.
[[32m OK [0m] Stopped NTP client/server.
[[32m OK [0m] Stopped target Remote File Systems.
[[32m OK [0m] Stopped target Remote File Systems (Pre).
[[32m OK [0m] Stopped target NFS client services.
Stopping GSSAPI Proxy Daemon...
[[32m OK [0m] Stopped GSSAPI Proxy Daemon.
[[32m OK [0m] Stopped VDO volume services.
[[32m OK [0m] Stopped Availability of block devices.
[[32m OK [0m] Stopped Login and scanning of iSCSI devices.
Stopping Logout off all iSCSI sessions on shutdown...
Stopping LVM2 metadata daemon...
[ ***] (1 of 2) A stop job is running for ...ata daemon (1min 16s / 1min 30s)
[ *] (1 of 2) A stop job is running for ...data daemon (1min 31s / 3min 1s)[448727.270759] audit_printk_skb: 18 callbacks suppressed
[448727.272145] type=1130 audit(1534196172.094:22740): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=lvm2-lvmetad comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[ OK ] Stopped LVM2 metadata daemon.
[ OK ] Closed LVM2 metadata daemon socket.
[[448729.521268] type=1130 audit(1534196174.345:22741): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dm-event comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
OK ] Stopped Device-mapper event daemon.
[ OK ] Reached target Shutdown.
[448727.272145] type=1130 audit(1534196172.094:22740): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=lvm2-lvmetad comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[[32m OK [0m] Stopped LVM2 metadata daemon.
[[32m OK [0m] Closed LVM2 metadata daemon socket.
[[448729.521268] type=1130 audit(1534196174.345:22741): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dm-event comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[32m OK [0m] Stopped Device-mapper event daemon.
[[32m OK [0m] Reached target Shutdown.
[root@host-093 ~]# uptime
16:39:06 up 1 min, 1 user, load average: 0.72, 0.27, 0.10
[root@host-093 ~]# lvs -a -o +devices
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
[lvol0_pmspare] rhel_host-093 ewi------- 4.00m /dev/vda2(205)
pool00 rhel_host-093 twi-aotz-- <4.79g 52.45 32.13 pool00_tdata(0)
[pool00_tdata] rhel_host-093 Twi-ao---- <4.79g /dev/vda2(206)
[pool00_tmeta] rhel_host-093 ewi-ao---- 4.00m /dev/vda2(1431)
root rhel_host-093 Vwi-aotz-- <4.79g pool00 52.45
swap rhel_host-093 -wi-ao---- 820.00m /dev/vda2(0)
Timeout is definitely a bug. The question is - is it hanging on dmeventd - or now it's lvmetad being a 'resource holder/blocker' It's always good idea to enable 'systemd' debuging I've on my rawhide - so with 'Alt+F9' there is quick way how to jump to systemd shell and be able to collect at least 'ps aux' trace and just current journal log and systemctl status - would be possible to attachem then to this BZ ? I tried the same example in comment #8 twice, and both times shutdown "just worked". Marking this verified in the latest rpms. I'll file a new bug for the hang in comment #8, if I'm ever able to reproduce it again. 3.10.0-954.el7.x86_64 lvm2-2.02.180-8.el7 BUILT: Mon Sep 10 04:45:22 CDT 2018 lvm2-libs-2.02.180-8.el7 BUILT: Mon Sep 10 04:45:22 CDT 2018 lvm2-cluster-2.02.180-8.el7 BUILT: Mon Sep 10 04:45:22 CDT 2018 device-mapper-1.02.149-8.el7 BUILT: Mon Sep 10 04:45:22 CDT 2018 device-mapper-libs-1.02.149-8.el7 BUILT: Mon Sep 10 04:45:22 CDT 2018 device-mapper-event-1.02.149-8.el7 BUILT: Mon Sep 10 04:45:22 CDT 2018 device-mapper-event-libs-1.02.149-8.el7 BUILT: Mon Sep 10 04:45:22 CDT 2018 device-mapper-persistent-data-0.7.3-3.el7 BUILT: Tue Nov 14 05:07:18 CST 2017 [root@host-073 ~]# lvs -a -o +devices LV VG Attr LSize Pool Origin Data% Meta% Devices [lvol0_pmspare] rhel_host-073 ewi------- 4.00m /dev/vda2(205) pool00 rhel_host-073 twi-aotz-- <4.79g 69.09 52.15 pool00_tdata(0) [pool00_tdata] rhel_host-073 Twi-ao---- <4.79g /dev/vda2(206) [pool00_tmeta] rhel_host-073 ewi-ao---- 4.00m /dev/vda2(1431) root rhel_host-073 Vwi-aotz-- <4.79g pool00 69.09 swap rhel_host-073 -wi-ao---- 820.00m /dev/vda2(0) [root@host-073 ~]# lvcreate -L 512M -s -n snaplv /dev/rhel_host-073/root Logical volume "snaplv" created. [root@host-073 ~]# lvs -a -o +devices LV VG Attr LSize Pool Origin Data% Meta% Devices [lvol0_pmspare] rhel_host-073 ewi------- 4.00m /dev/vda2(205) pool00 rhel_host-073 twi-aotz-- <4.79g 69.09 52.15 pool00_tdata(0) [pool00_tdata] rhel_host-073 Twi-ao---- <4.79g /dev/vda2(206) [pool00_tmeta] rhel_host-073 ewi-ao---- 4.00m /dev/vda2(1431) root rhel_host-073 owi-aotz-- <4.79g pool00 69.09 snaplv rhel_host-073 swi-a-s--- 512.00m root 0.03 /dev/vda2(1432) swap rhel_host-073 -wi-ao---- 820.00m /dev/vda2(0) [root@host-073 ~]# dd if=/dev/zero of=/root/tempfile bs=1M count=300 ; sync 300+0 records in 300+0 records out 314572800 bytes (315 MB) copied, 1.40484 s, 224 MB/s [root@host-073 ~]# lvconvert --merge rhel_host-073/snaplv Delaying merge since origin is open. Merging of snapshot rhel_host-073/snaplv will occur on next activation of rhel_host-073/root. Sep 25 14:08:48 host-073 dmeventd[2590]: No longer monitoring thin pool rhel_host--073-pool00-tpool. Sep 25 14:08:48 host-073 lvm[2590]: Monitoring snapshot rhel_host--073-snaplv. Sep 25 14:08:48 host-073 lvm[2590]: Monitoring thin pool rhel_host--073-pool00-tpool. [ OK ] Unmounted Configuration File System. [ OK ] Stopped target Swap. Deactivating swap /dev/disk/by-id/d...WLRYznWyZ1kyvlHolhi1355jpFiQh... [ 5368.843381] XFS (vda1): Unmounting Filesystem [ OK ] Unmounted /boot. [ OK ] Stopped target Local File Systems (Pre). [ OK [ 5368.860927] type=1131 audit(1537902621.445:233): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' ] Stopped Create Static Device Nodes in /dev. Stopping Monitoring of LVM2 mirrors... dmeventd or progress polling... [ OK [ 5368.873228] type=1131 audit(1537902621.458:234): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-remount-fs comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' ] Stopped Remount Root and Kernel File Systems. [ OK ] Deactivated swap /dev/rhel_host-073/swap. [ OK ] Deactivated swap /dev/disk/by-uuid/...b5e-8b73-439e-bc47-7a1403bcda7c. [ OK ] Deactivated swap /dev/disk/by-id/dm...59WLRYznWyZ1kyvlHolhi1355jpFiQh. [ OK ] Deactivated swap /dev/disk/by-id/dm-name-rhel_host--073-swap. [ OK ] Deactivated swap /dev/dm-7. [ OK ] Deactivated swap /dev/mapper/rhel_host--073-swap. [ OK ] Reached target Unmount All Filesystems. [ OK ] Stopped Monitoring of LVM2 mirrors,...ng dmeventd or progress polling. Stopping LVM2 metadata daemon... Stopping Device-mapper event daemon... [ OK ] Stopped LVM2 metadata daemon. [ OK ] Stopped Device-mapper event daemon. [ OK ] Reached target Shutdown. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3193 |