Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1543167

Summary: LVM2 Metadata and Device-Mapper daemons do not shut down correctly after a snapshot merge
Product: Red Hat Enterprise Linux 7 Reporter: Rob Leese <rleese>
Component: lvm2Assignee: Zdenek Kabelac <zkabelac>
lvm2 sub component: LVM Metadata / lvmetad QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: agk, akarvi, cmarthal, heinzm, jbrassow, loberman, mcsontos, msnitzer, mtowey, prajnoha, rhandlin, zkabelac
Version: 7.5   
Target Milestone: rc   
Target Release: 7.5   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: lvm2-2.02.179-3.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-10-30 11:02:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1546181, 1546815, 1549666    

Description Rob Leese 2018-02-07 21:25:03 UTC
Description of problem:
When creating LVM snapshots of partitions, and then rolling the snapshots back (using lvconvert --merge) on the following reboot the "LVM2 Metadata" & "Device-mapper" daemons fail to stop and are timed out/killed by systemd.

Version-Release number of selected component (if applicable):
device-mapper-1.02.145-5.el7.x86_64
kernel-3.10.0-830.el7.x86_64
lvm2-2.02.176-5.el7.x86_64

How reproducible:


Steps to Reproduce:

[root@localhost ~]# lvcreate -L 512M -s -n snaplv /dev/rhel00/root
  Using default stripesize 64.00 KiB.
  Logical volume "snaplv" created.

[root@localhost ~]# dd if=/dev/zero of=/root/tempfile bs=1M count=300 ; sync
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 0.119493 s, 2.6 GB/s

[root@localhost ~]# lvconvert --merge /dev/rhel00/snaplv
  Can't merge until origin volume is closed.
  Merging of snapshot rhel00/snaplv will occur on next activation of rhel00/root.

[root@localhost ~]# lvs -ao +devices
  LV       VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  root     rhel00 Owi-aos---  <6.20g                                                     /dev/sda2(205) 
  [snaplv] rhel00 Swi-a-s--- 512.00m      root   59.22                                   /dev/sda2(1791)
  swap     rhel00 -wi-ao---- 820.00m                                                     /dev/sda2(0)   

[root@localhost ~]# reboot


Actual results:
We see the 2 services fail to stop correctly, and they time out

Expected results:
Services should shut down correctly

Additional info:

Comment 5 Zdenek Kabelac 2018-06-09 17:42:45 UTC
*** Bug 1543991 has been marked as a duplicate of this bug. ***

Comment 6 Zdenek Kabelac 2018-06-11 20:45:17 UTC
Pushed upstream 2 fixes:


https://www.redhat.com/archives/lvm-devel/2018-June/msg00100.html
https://www.redhat.com/archives/lvm-devel/2018-June/msg00099.html


It might be needed to also enhance systemd service files to  lvmetad socked conflicts with shutdown.target and possibly we may also want to disable usage of lvmetad (--config 'global/use_lvmetad=0') with unmonitoring VG.

Comment 8 Corey Marthaler 2018-08-13 22:02:27 UTC
What is the new expected behavior to verify in the latest rpms? It still appears the 1m 31s timeout exists, although, I didn't see any "Sending SIGTERM" or watchdog messages this time. Is this now correct?



[root@host-093 ~]# lvcreate -L 512M -s -n snaplv /dev/rhel_host-093/root
  Logical volume "snaplv" created.
[root@host-093 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool   Origin Data%  Meta%   Devices        
  [lvol0_pmspare] rhel_host-093 ewi-------   4.00m                              /dev/vda2(205) 
  pool00          rhel_host-093 twi-aotz--  <4.79g               48.34  29.79   pool00_tdata(0)
  [pool00_tdata]  rhel_host-093 Twi-ao----  <4.79g                              /dev/vda2(206) 
  [pool00_tmeta]  rhel_host-093 ewi-ao----   4.00m                              /dev/vda2(1431)
  root            rhel_host-093 owi-aotz--  <4.79g pool00        48.34                         
  snaplv          rhel_host-093 swi-a-s--- 512.00m        root   0.03           /dev/vda2(1432)
  swap            rhel_host-093 -wi-ao---- 820.00m                              /dev/vda2(0)   
[root@host-093 ~]# dd if=/dev/zero of=/root/tempfile bs=1M count=300 ; sync
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 0.986615 s, 319 MB/s
[root@host-093 ~]# lvconvert --merge rhel_host-093/snaplv
  Delaying merge since origin is open.
  Merging of snapshot rhel_host-093/snaplv will occur on next activation of rhel_host-093/root.
[root@host-093 ~]# lvs -ao +devices
  LV              VG            Attr       LSize   Pool   Origin Data%  Meta%   Devices        
  [lvol0_pmspare] rhel_host-093 ewi-------   4.00m                              /dev/vda2(205) 
  pool00          rhel_host-093 twi-aotz--  <4.79g               52.45  32.13   pool00_tdata(0)
  [pool00_tdata]  rhel_host-093 Twi-ao----  <4.79g                              /dev/vda2(206) 
  [pool00_tmeta]  rhel_host-093 ewi-ao----   4.00m                              /dev/vda2(1431)
  root            rhel_host-093 Owi-aotz--  <4.79g pool00        52.45                         
  [snaplv]        rhel_host-093 Swi-a-s--- 512.00m        root   58.92          /dev/vda2(1432)
  swap            rhel_host-093 -wi-ao---- 820.00m                              /dev/vda2(0)   
[root@host-093 ~]# reboot
Connection to host-093 closed by remote host.
Connection to host-093 closed.


Aug 13 16:33:38 host-093 lvm[19824]: Monitoring snapshot rhel_host--093-snaplv.
Aug 13 16:33:38 host-093 lvm[19824]: Monitoring thin pool rhel_host--093-pool00-tpool.
Aug 13 16:34:41 host-093 systemd: Stopping Session 1 of user root.
Aug 13 16:34:41 host-093 systemd: Removed slice system-selinux\x2dpolicy\x2dmigrate\x2dlocal\x2dchanges.slice.
Aug 13 16:34:41 host-093 systemd: Stopped target rpc_pipefs.target.
Aug 13 16:34:41 host-093 systemd: Stopping Authorization Manager...
Aug 13 16:34:41 host-093 systemd: Closed LVM2 poll daemon socket.
Aug 13 16:34:41 host-093 systemd: Unmounting RPC Pipe File System...
Aug 13 16:34:41 host-093 systemd: Stopped target Multi-User System.
Aug 13 16:34:41 host-093 systemd: Stopping libstoragemgmt plug-in server daemon...
Aug 13 16:34:41 host-093 systemd: Stopping Postfix Mail Transport Agent...
Aug 13 16:34:41 host-093 systemd: Stopping OpenSSH server daemon...
[448637.958323] beah-beaker-backend[1716]: 2018-08-13 16:34:41,133 backend set_controller: INFO Connection to controller lost.
[448638.028693] beah-beaker-backend[1716]: 2018-08-13 16:34:41,138 backend linfo: INFO BackendFactory: Lost connection.  Reason: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion: Connection lost.
[448638.029750] beah-beaker-backend[1716]: ]
[448638.031579] beah-beaker-backend[1716]: 2018-08-13 16:34:41,140 backend set_controller: INFO Connection to controller lost.
[448638.044334] beah-fwd-backend[1713]: 2018-08-13 16:34:41,235 backend linfo: INFO BackendFactory: Lost connection.  Reason: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion: Connection lost.
[448638.045574] beah-fwd-backend[1713]: ]
[448638.046702] beah-srv[1715]: This is a Controller server.
[448638.049000] beah-srv[1715]: Type <Ctrl-C> to exit. Alternatively issue a kill command from cmd_backend.
[[32m  OK  [0m] Removed slice User Slice of root.
         Stopping Login Service...
[[32m  OK  [0m] Stopped /etc/rc.d/rc.local Compatibility.
         Stopping Permit User Sessions...
[[32m  OK  [0m] Removed slice system-serial\x2dgetty.slice.
[[32m  OK  [0m] Stopped Login Service.
[[32m  OK  [0m] Stopped LSB: Starts the Spacewalk Daemon.
[[32m  OK  [0m] Stopped target Network is Online.
[[32m  OK  [0m] Stopped target System Time Synchronized.
[[32m  OK  [0m] Stopped Wait for chrony to synchronize system clock.
         Stopping NTP client/server...
[[32m  OK  [0m] Stopped Permit User Sessions.
[[32m  OK  [0m] Stopped NTP client/server.
[[32m  OK  [0m] Stopped target Remote File Systems.
[[32m  OK  [0m] Stopped target Remote File Systems (Pre).
[[32m  OK  [0m] Stopped target NFS client services.
         Stopping GSSAPI Proxy Daemon...
[[32m  OK  [0m] Stopped GSSAPI Proxy Daemon.
[[32m  OK  [0m] Stopped VDO volume services.
[[32m  OK  [0m] Stopped Availability of block devices.
[[32m  OK  [0m] Stopped Login and scanning of iSCSI devices.
         Stopping Logout off all iSCSI sessions on shutdown...
         Stopping LVM2 metadata daemon...


[   ***] (1 of 2) A stop job is running for ...ata daemon (1min 16s / 1min 30s)
[     *] (1 of 2) A stop job is running for ...data daemon (1min 31s / 3min 1s)[448727.270759] audit_printk_skb: 18 callbacks suppressed
[448727.272145] type=1130 audit(1534196172.094:22740): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=lvm2-lvmetad comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[  OK  ] Stopped LVM2 metadata daemon.
[  OK  ] Closed LVM2 metadata daemon socket.
[[448729.521268] type=1130 audit(1534196174.345:22741): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dm-event comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
  OK  ] Stopped Device-mapper event daemon.
[  OK  ] Reached target Shutdown.


[448727.272145] type=1130 audit(1534196172.094:22740): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=lvm2-lvmetad comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[[32m  OK  [0m] Stopped LVM2 metadata daemon. 
[[32m  OK  [0m] Closed LVM2 metadata daemon socket.
[[448729.521268] type=1130 audit(1534196174.345:22741): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dm-event comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'   
[32m  OK  [0m] Stopped Device-mapper event daemon.
[[32m  OK  [0m] Reached target Shutdown.



[root@host-093 ~]# uptime
 16:39:06 up 1 min,  1 user,  load average: 0.72, 0.27, 0.10
[root@host-093 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  [lvol0_pmspare] rhel_host-093 ewi-------   4.00m                                                       /dev/vda2(205) 
  pool00          rhel_host-093 twi-aotz--  <4.79g               52.45  32.13                            pool00_tdata(0)
  [pool00_tdata]  rhel_host-093 Twi-ao----  <4.79g                                                       /dev/vda2(206) 
  [pool00_tmeta]  rhel_host-093 ewi-ao----   4.00m                                                       /dev/vda2(1431)
  root            rhel_host-093 Vwi-aotz--  <4.79g pool00        52.45                                                  
  swap            rhel_host-093 -wi-ao---- 820.00m                                                       /dev/vda2(0)

Comment 9 Zdenek Kabelac 2018-08-29 15:07:40 UTC
Timeout is definitely a bug.

The question is - is it hanging on dmeventd - or now it's lvmetad being a 'resource holder/blocker'

It's always good idea to enable  'systemd' debuging I've on my rawhide - so with 'Alt+F9' there is quick way how to jump to systemd shell and be able to collect at least 'ps aux'  trace
and just current journal log and systemctl status -  would be possible to attachem then to this BZ ?

Comment 10 Corey Marthaler 2018-09-25 19:14:54 UTC
I tried the same example in comment #8 twice, and both times shutdown "just worked". Marking this verified in the latest rpms. I'll file a new bug for the hang in comment #8, if I'm ever able to reproduce it again.


3.10.0-954.el7.x86_64
lvm2-2.02.180-8.el7    BUILT: Mon Sep 10 04:45:22 CDT 2018
lvm2-libs-2.02.180-8.el7    BUILT: Mon Sep 10 04:45:22 CDT 2018
lvm2-cluster-2.02.180-8.el7    BUILT: Mon Sep 10 04:45:22 CDT 2018
device-mapper-1.02.149-8.el7    BUILT: Mon Sep 10 04:45:22 CDT 2018
device-mapper-libs-1.02.149-8.el7    BUILT: Mon Sep 10 04:45:22 CDT 2018
device-mapper-event-1.02.149-8.el7    BUILT: Mon Sep 10 04:45:22 CDT 2018
device-mapper-event-libs-1.02.149-8.el7    BUILT: Mon Sep 10 04:45:22 CDT 2018
device-mapper-persistent-data-0.7.3-3.el7    BUILT: Tue Nov 14 05:07:18 CST 2017




[root@host-073 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool   Origin Data%  Meta%   Devices        
  [lvol0_pmspare] rhel_host-073 ewi-------   4.00m                              /dev/vda2(205) 
  pool00          rhel_host-073 twi-aotz--  <4.79g               69.09  52.15   pool00_tdata(0)
  [pool00_tdata]  rhel_host-073 Twi-ao----  <4.79g                              /dev/vda2(206) 
  [pool00_tmeta]  rhel_host-073 ewi-ao----   4.00m                              /dev/vda2(1431)
  root            rhel_host-073 Vwi-aotz--  <4.79g pool00        69.09
  swap            rhel_host-073 -wi-ao---- 820.00m                              /dev/vda2(0)   

[root@host-073 ~]# lvcreate -L 512M -s -n snaplv /dev/rhel_host-073/root
  Logical volume "snaplv" created.
[root@host-073 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool   Origin Data%  Meta%   Devices        
  [lvol0_pmspare] rhel_host-073 ewi-------   4.00m                              /dev/vda2(205) 
  pool00          rhel_host-073 twi-aotz--  <4.79g               69.09  52.15   pool00_tdata(0)
  [pool00_tdata]  rhel_host-073 Twi-ao----  <4.79g                              /dev/vda2(206) 
  [pool00_tmeta]  rhel_host-073 ewi-ao----   4.00m                              /dev/vda2(1431)
  root            rhel_host-073 owi-aotz--  <4.79g pool00        69.09
  snaplv          rhel_host-073 swi-a-s--- 512.00m        root   0.03           /dev/vda2(1432)
  swap            rhel_host-073 -wi-ao---- 820.00m                              /dev/vda2(0)   

[root@host-073 ~]# dd if=/dev/zero of=/root/tempfile bs=1M count=300 ; sync
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 1.40484 s, 224 MB/s

[root@host-073 ~]# lvconvert --merge rhel_host-073/snaplv
  Delaying merge since origin is open.
  Merging of snapshot rhel_host-073/snaplv will occur on next activation of rhel_host-073/root.

Sep 25 14:08:48 host-073 dmeventd[2590]: No longer monitoring thin pool rhel_host--073-pool00-tpool.
Sep 25 14:08:48 host-073 lvm[2590]: Monitoring snapshot rhel_host--073-snaplv.
Sep 25 14:08:48 host-073 lvm[2590]: Monitoring thin pool rhel_host--073-pool00-tpool.


[  OK  ] Unmounted Configuration File System.
[  OK  ] Stopped target Swap.
         Deactivating swap /dev/disk/by-id/d...WLRYznWyZ1kyvlHolhi1355jpFiQh...
[ 5368.843381] XFS (vda1): Unmounting Filesystem
[  OK  ] Unmounted /boot.
[  OK  ] Stopped target Local File Systems (Pre).
[  OK  [ 5368.860927] type=1131 audit(1537902621.445:233): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
] Stopped Create Static Device Nodes in /dev.
         Stopping Monitoring of LVM2 mirrors... dmeventd or progress polling...
[  OK  [ 5368.873228] type=1131 audit(1537902621.458:234): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-remount-fs comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
] Stopped Remount Root and Kernel File Systems.
[  OK  ] Deactivated swap /dev/rhel_host-073/swap.
[  OK  ] Deactivated swap /dev/disk/by-uuid/...b5e-8b73-439e-bc47-7a1403bcda7c.
[  OK  ] Deactivated swap /dev/disk/by-id/dm...59WLRYznWyZ1kyvlHolhi1355jpFiQh.
[  OK  ] Deactivated swap /dev/disk/by-id/dm-name-rhel_host--073-swap.
[  OK  ] Deactivated swap /dev/dm-7.
[  OK  ] Deactivated swap /dev/mapper/rhel_host--073-swap.
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Stopped Monitoring of LVM2 mirrors,...ng dmeventd or progress polling.
         Stopping LVM2 metadata daemon...
         Stopping Device-mapper event daemon...
[  OK  ] Stopped LVM2 metadata daemon.
[  OK  ] Stopped Device-mapper event daemon.
[  OK  ] Reached target Shutdown.

Comment 12 errata-xmlrpc 2018-10-30 11:02:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3193