Bug 1704165

Summary: [RHEL7]lvm2-lvmetad.service always failed to stop
Product: Red Hat Enterprise Linux 7
Component: lvm2
lvm2 sub component: LVM Metadata / lvmetad
Version: 7.7
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Reporter: Frank Liang <xiliang>
Assignee: Zdenek Kabelac <zkabelac>
QA Contact: cluster-qe <cluster-qe>
CC: agk, cheshi, heinzm, jbrassow, ldoktor, ldu, leiwang, linl, mcsontos, msnitzer, prajnoha, vkuznets, wshi, yanqzhan, yisun, zkabelac
Fixed In Version: lvm2-2.02.184-3.el7
Doc Type: No Doc Update
Type: Bug
Last Closed: 2019-08-06 13:10:44 UTC

Description Frank Liang 2019-04-29 09:08:04 UTC
After starting a RHEL7.7 Xen HVM guest, lvm2-lvmetad.service cannot be stopped within the stop timeout.
Each time I try to restart this service, a new failure message is recorded in the journalctl log:

Apr 29 16:40:27 dhcp-14-158.nay.redhat.com systemd[1]: lvm2-lvmetad.service stop-sigterm timed out. Killing.
Apr 29 16:40:27 dhcp-14-158.nay.redhat.com systemd[1]: lvm2-lvmetad.service: main process exited, code=killed, status=9/KILL
Apr 29 16:40:27 dhcp-14-158.nay.redhat.com systemd[1]: Stopped LVM2 metadata daemon.
Apr 29 16:40:27 dhcp-14-158.nay.redhat.com systemd[1]: Unit lvm2-lvmetad.service entered failed state.
Apr 29 16:40:27 dhcp-14-158.nay.redhat.com systemd[1]: lvm2-lvmetad.service failed.
Apr 29 16:40:27 dhcp-14-158.nay.redhat.com systemd[1]: Started LVM2 metadata daemon.
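
For triage, the timeout signature in the log above can be detected mechanically. The following is a minimal sketch and not part of the original report: the function name `find_stop_timeouts` is hypothetical, and it assumes the journal text is in the default short format shown above (for example, captured with `journalctl -u lvm2-lvmetad.service`).

```python
import re

# Matches the systemd message logged when SIGTERM did not stop the unit in time.
# The message wording is taken from the journal excerpt in this report.
TIMEOUT_RE = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+\.service) stop-sigterm timed out\. Killing\."
)


def find_stop_timeouts(journal_text):
    """Return the unit names whose stop job timed out, in log order."""
    units = []
    for line in journal_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            units.append(match.group("unit"))
    return units


sample = """\
Apr 29 16:40:27 dhcp-14-158.nay.redhat.com systemd[1]: lvm2-lvmetad.service stop-sigterm timed out. Killing.
Apr 29 16:40:27 dhcp-14-158.nay.redhat.com systemd[1]: lvm2-lvmetad.service: main process exited, code=killed, status=9/KILL
Apr 29 16:40:27 dhcp-14-158.nay.redhat.com systemd[1]: Stopped LVM2 metadata daemon.
"""

print(find_stop_timeouts(sample))  # ['lvm2-lvmetad.service']
```

Only the "stop-sigterm timed out" line is matched; the follow-up "main process exited" line is ignored so that each timeout is counted once.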

Because the service cannot be stopped, the system waits 90s for the stop job to time out on every reboot.
[root@dhcp-14-158 ~]# reboot
[     *] A stop job is running for LVM2 metadata daemon (48s / 1min 31s)

[root@dhcp-14-158 ~]# lsblk
NAME                     MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda                     202:0    0    8G  0 disk
├─xvda1                  202:1    0    1G  0 part /boot
└─xvda2                  202:2    0    7G  0 part
  ├─rhel_dhcp--1--8-root 253:0    0  6.2G  0 lvm  /
  └─rhel_dhcp--1--8-swap 253:1    0  820M  0 lvm  [SWAP]
[root@dhcp-14-158 ~]# uname -r
3.10.0-1040.el7.x86_64
[root@dhcp-14-158 ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-1040.el7.x86_64 root=/dev/mapper/rhel_dhcp--1--8-root ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rd.lvm.lv=rhel_dhcp-1-8/root rd.lvm.lv=rhel_dhcp-1-8/swap rhgb quiet LANG=en_US.UTF-8

RHEL Version:
RHEL7.7 (3.10.0-1040.el7.x86_64)

How reproducible:
100%

Steps to Reproduce:

1. Start a RHEL7.7 xen hvm guest

2. Reboot system

Actual results: 

Every reboot waits 90s for this service to stop.

Expected results:

Reboot completes without this timeout.

Additional info:

- The issue also reproduces in a KVM guest.

- The issue does not reproduce with kernel "3.10.0-990.el7.x86_64" installed.

Comment 2 Zdenek Kabelac 2019-04-29 10:25:54 UTC
Hi

What was the version of lvm2 in use?

Are you suggesting that with the *SAME* version of lvm2 it works with kernel 3.10.0-990 and does not work with 3.10.0-1040?

Comment 3 Zdenek Kabelac 2019-04-29 11:29:52 UTC
Hmm, I think we have reproduced the issue.

It's enough to start with 'lvmetad' running (a single 'lvs' call after service start is enough),
and then the shutdown is 'endless' until systemd sends SIGKILL (-9).

It looks like SIGTERM is properly recognized and _shutdown_requested is set to '1'.
However, s.threads->next is NOT NULL, so it looks like a thread is still linked in the list even though no thread is running ATM. So the update of the linked list got broken...

To be further explored....

Comment 4 Zdenek Kabelac 2019-04-29 11:38:54 UTC
OK, the problem is a missing call to the _reap() function in this code path.
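
The diagnosis in comments 3 and 4 can be illustrated with a small model. This is only a sketch in Python, not the lvmetad source: `_shutdown_requested`, `s.threads->next`, and `_reap()` are the names quoted in this thread, while `ThreadState`, `add_thread`, `reap`, `can_shut_down`, the `active` flag, and the sentinel list head are assumptions made for illustration.

```python
class ThreadState:
    """One node in the daemon's list of client threads (simplified model)."""

    def __init__(self, name, active=True):
        self.name = name
        self.active = active  # False once the thread has finished its work
        self.next = None


def add_thread(head, node):
    """Link a new node right after the sentinel head."""
    node.next = head.next
    head.next = node


def reap(head):
    """Unlink every finished node and return how many were removed."""
    removed = 0
    prev, cur = head, head.next
    while cur is not None:
        if not cur.active:
            prev.next = cur.next  # drop the finished node from the list
            removed += 1
        else:
            prev = cur
        cur = cur.next
    return removed


def can_shut_down(shutdown_requested, head):
    """Model of the exit check: leave only when no thread is linked."""
    return shutdown_requested and head.next is None


head = ThreadState("sentinel")
client = ThreadState("lvs-client")
add_thread(head, client)
client.active = False  # the single 'lvs' request has completed

print(can_shut_down(True, head))  # False: the stale node is still linked
reap(head)
print(can_shut_down(True, head))  # True: the list is empty after reaping
```

In this model the finished client stays linked until something reaps it, so a shutdown check that only tests the list head never succeeds and the process waits for SIGKILL, which matches the observed symptom.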

Comment 6 Zdenek Kabelac 2019-04-29 11:52:00 UTC
*** Bug 1703902 has been marked as a duplicate of this bug. ***

Comment 8 Frank Liang 2019-05-05 07:06:29 UTC
There is no such issue with lvm2-2.02.184-3.el7.x86_64 installed. I will set this bug to 'VERIFIED' when the next RHEL7.7 compose is available.
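
To check whether a host already carries the fixed build named in this report (lvm2-2.02.184-3.el7), a package string such as the one printed by `rpm -q lvm2` can be compared against it. The sketch below is an illustration only: `parse_nvr` and `has_fix` are hypothetical helpers, and the comparison is deliberately simplified (numeric version fields plus the leading release number). It is not RPM's own version-comparison algorithm and assumes package strings of the form seen in this thread.

```python
import re

# Fixed build per this report: lvm2-2.02.184-3.el7
FIXED = ((2, 2, 184), 3)


def parse_nvr(package):
    """Split e.g. 'lvm2-2.02.185-1.el7.x86_64' into ((2, 2, 185), 1)."""
    match = re.fullmatch(r"lvm2-(\d+(?:\.\d+)*)-(\d+)\..*", package)
    if match is None:
        raise ValueError(f"unrecognized package string: {package!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    release = int(match.group(2))
    return version, release


def has_fix(package):
    """True if the package is at or above the fixed build (simplified check)."""
    return parse_nvr(package) >= FIXED


print(has_fix("lvm2-2.02.185-1.el7.x86_64"))  # True
print(has_fix("lvm2-2.02.184-2.el7.x86_64"))  # False
```

Strings for other packages, such as lvm2-libs, are rejected with a ValueError rather than guessed at.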

Comment 9 ldu 2019-05-10 01:52:56 UTC
*** Bug 1708037 has been marked as a duplicate of this bug. ***

Comment 10 Frank Liang 2019-05-15 08:29:01 UTC
No such issue in the RHEL-7.7-20190514.2 build, so I am setting it to 'VERIFIED'.
[root@localhost ~]# rpm -qa|grep lvm2
lvm2-2.02.185-1.el7.x86_64
lvm2-libs-2.02.185-1.el7.x86_64
[root@localhost ~]# uname -a
Linux localhost.localdomain 3.10.0-1048.el7.x86_64 #1 SMP Sat May 11 16:13:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# reboot
Rebooting.
[  248.032213] Restarting system.
[root@localhost ~]#

Comment 11 Zdenek Kabelac 2019-05-16 12:30:02 UTC
*** Bug 1707528 has been marked as a duplicate of this bug. ***

Comment 13 errata-xmlrpc 2019-08-06 13:10:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2253