Bug 1752635

Summary: systemd fails to start lvmlockd with sanlock running
Product: Red Hat Enterprise Linux 8
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
Assignee: Marian Csontos <mcsontos>
lvm2 sub component: LVM lock daemon / lvmlockd
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA
Severity: high
Priority: urgent
CC: agk, heinzm, jbrassow, mcsontos, prajnoha, teigland, wchadwic, zkabelac
Version: 8.1
Keywords: Regression
Target Milestone: rc
Target Release: 8.0
Hardware: x86_64
OS: Linux
Fixed In Version: lvm2-2.03.05-5.el8
Last Closed: 2019-11-05 22:35:36 UTC
Type: Bug

Description Corey Marthaler 2019-09-16 18:44:45 UTC
Description of problem:
[root@host-087 ~]# systemctl status sanlock
● sanlock.service - Shared Storage Lease Manager
   Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-09-16 10:28:10 CDT; 3h 12min ago
  Process: 11732 ExecStart=/usr/sbin/sanlock daemon (code=exited, status=0/SUCCESS)
 Main PID: 11737 (sanlock)
    Tasks: 6 (limit: 26213)
   Memory: 16.7M
   CGroup: /system.slice/sanlock.service
           ├─11737 /usr/sbin/sanlock daemon
           └─11739 /usr/sbin/sanlock daemon


[root@host-087 ~]# systemctl start lvmlockd
Job for lvmlockd.service failed because a timeout was exceeded.
See "systemctl status lvmlockd.service" and "journalctl -xe" for details.


Sep 16 13:40:51 host-093 systemd[1]: Starting LVM lock daemon...
Sep 16 13:40:51 host-093 lvmlockd[19607]: [D] creating /run/lvm/lvmlockd.socket
Sep 16 13:40:51 host-093 lvmlockd[19607]: 1568659251 lvmlockd started
Sep 16 13:42:21 host-093 systemd[1]: lvmlockd.service: Start operation timed out. Terminating.
Sep 16 13:42:21 host-093 systemd[1]: lvmlockd.service: Failed with result 'timeout'.
Sep 16 13:42:21 host-093 systemd[1]: Failed to start LVM lock daemon.


[root@host-087 ~]# systemctl status lvmlockd.service
● lvmlockd.service - LVM lock daemon
   Loaded: loaded (/usr/lib/systemd/system/lvmlockd.service; disabled; vendor preset: disabled)
   Active: failed (Result: timeout) since Mon 2019-09-16 13:42:21 CDT; 1min 30s ago
     Docs: man:lvmlockd(8)
  Process: 19776 ExecStart=/usr/sbin/lvmlockd --foreground (code=exited, status=0/SUCCESS)
 Main PID: 19776 (code=exited, status=0/SUCCESS)

Sep 16 13:40:51 host-087.virt.lab.msp.redhat.com systemd[1]: Starting LVM lock daemon...
Sep 16 13:40:51 host-087.virt.lab.msp.redhat.com lvmlockd[19776]: [D] creating /run/lvm/lvmlockd.socket
Sep 16 13:40:51 host-087.virt.lab.msp.redhat.com lvmlockd[19776]: 1568659251 lvmlockd started
Sep 16 13:42:21 host-087.virt.lab.msp.redhat.com systemd[1]: lvmlockd.service: Start operation timed out. Terminating.
Sep 16 13:42:21 host-087.virt.lab.msp.redhat.com systemd[1]: lvmlockd.service: Failed with result 'timeout'.
Sep 16 13:42:21 host-087.virt.lab.msp.redhat.com systemd[1]: Failed to start LVM lock daemon.



Version-Release number of selected component (if applicable):
kernel-4.18.0-144.el8    BUILT: Thu Sep 12 09:52:43 CDT 2019
lvm2-2.03.05-4.el8    BUILT: Sun Aug 18 11:44:11 CDT 2019
lvm2-libs-2.03.05-4.el8    BUILT: Sun Aug 18 11:44:11 CDT 2019
device-mapper-1.02.163-4.el8    BUILT: Sun Aug 18 11:44:11 CDT 2019
device-mapper-libs-1.02.163-4.el8    BUILT: Sun Aug 18 11:44:11 CDT 2019
device-mapper-event-1.02.163-4.el8    BUILT: Sun Aug 18 11:44:11 CDT 2019
device-mapper-event-libs-1.02.163-4.el8    BUILT: Sun Aug 18 11:44:11 CDT 2019
device-mapper-persistent-data-0.8.5-2.el8    BUILT: Wed Jun  5 10:28:04 CDT 2019
sanlock-3.8.0-2.el8    BUILT: Wed Jun 12 15:50:27 CDT 2019
sanlock-lib-3.8.0-2.el8    BUILT: Wed Jun 12 15:50:27 CDT 2019


How reproducible:
Every time

Comment 1 David Teigland 2019-09-16 18:47:40 UTC
This is a regression caused by this commit:
https://sourceware.org/git/?p=lvm2.git;a=commit;h=71af650760c2b75b3c267277f47d8ba235421300

I reverted that commit on the master branch, which fixed the problem:
https://sourceware.org/git/?p=lvm2.git;a=commit;h=6e1cf248d4a9520b6e342dc583f1e4f7d4c5f8fa
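The failure pattern in the journal (lvmlockd starts and logs "lvmlockd started", then systemd terminates it 90 seconds later with "Start operation timed out") is what systemd reports when a service it expects a readiness notification from never sends one. As an illustrative sketch only, and assuming a Type=notify-style readiness handshake was involved (the bug report does not spell out the unit type or what the reverted commit changed), this is a minimal Python rendering of the sd_notify(3) datagram protocol a daemon uses to tell systemd it is ready:

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send a state message to the systemd notification socket.

    Mirrors what sd_notify(3) does: write a datagram such as
    "READY=1" to the AF_UNIX socket named in $NOTIFY_SOCKET.
    Returns False when not running under systemd (no socket set).
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    # A leading '@' denotes a Linux abstract-namespace socket.
    if addr.startswith("@"):
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), addr)
    return True
```

If a notify-type daemon never sends "READY=1" after initialization, systemd waits out TimeoutStartSec (90 s by default) and then kills the unit with exactly the "Start operation timed out. Terminating." / "Failed with result 'timeout'" lines seen above.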

Comment 4 Corey Marthaler 2019-09-17 16:49:00 UTC
Verified that these operations *did* indeed work in RHEL 8.0.


kernel-4.18.0-80.el8    BUILT: Wed Mar 13 07:47:44 CDT 2019
lvm2-2.03.02-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
lvm2-libs-2.03.02-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
lvm2-lockd-2.03.02-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
device-mapper-1.02.155-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
device-mapper-libs-1.02.155-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
device-mapper-event-1.02.155-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
device-mapper-event-libs-1.02.155-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
device-mapper-persistent-data-0.7.6-1.el8    BUILT: Sun Aug 12 04:21:55 CDT 2018
sanlock-3.6.0-5.el8    BUILT: Thu Dec  6 13:31:26 CST 2018
sanlock-lib-3.6.0-5.el8    BUILT: Thu Dec  6 13:31:26 CST 2018



[root@host-087 ~]# systemctl status sanlock
● sanlock.service - Shared Storage Lease Manager
   Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2019-09-17 11:44:20 CDT; 2min 43s ago
  Process: 11576 ExecStart=/usr/sbin/sanlock daemon (code=exited, status=0/SUCCESS)
 Main PID: 11580 (sanlock)
    Tasks: 6 (limit: 26213)
   Memory: 16.4M
   CGroup: /system.slice/sanlock.service
           ├─11580 /usr/sbin/sanlock daemon
           └─11581 /usr/sbin/sanlock daemon

Sep 17 11:44:20 host-087.virt.lab.msp.redhat.com systemd[1]: Starting Shared Storage Lease Manager...
Sep 17 11:44:20 host-087.virt.lab.msp.redhat.com systemd[1]: Started Shared Storage Lease Manager.


[root@host-087 ~]# systemctl status lvmlockd
● lvmlockd.service - LVM lock daemon
   Loaded: loaded (/usr/lib/systemd/system/lvmlockd.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2019-09-17 11:44:21 CDT; 2min 58s ago
     Docs: man:lvmlockd(8)
 Main PID: 11597 (lvmlockd)
    Tasks: 3 (limit: 26213)
   Memory: 2.6M
   CGroup: /system.slice/lvmlockd.service
           └─11597 /usr/sbin/lvmlockd --foreground

Sep 17 11:44:20 host-087.virt.lab.msp.redhat.com systemd[1]: Starting LVM lock daemon...
Sep 17 11:44:21 host-087.virt.lab.msp.redhat.com lvmlockd[11597]: [D] creating /run/lvm/lvmlockd.socket
Sep 17 11:44:21 host-087.virt.lab.msp.redhat.com lvmlockd[11597]: 1568738661 lvmlockd started
Sep 17 11:44:21 host-087.virt.lab.msp.redhat.com systemd[1]: Started LVM lock daemon.

Comment 5 David Teigland 2019-09-17 17:57:19 UTC
This is a scratch build with the bad commit reverted
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=23540800

Comment 6 Corey Marthaler 2019-09-18 14:51:03 UTC
I ran some basic-level lvm regression tests on this scratch build overnight and saw no issues.

Comment 9 Corey Marthaler 2019-09-30 16:30:24 UTC
Fix verified in the latest rpms.

kernel-4.18.0-147.el8    BUILT: Thu Sep 26 11:15:44 CDT 2019
lvm2-2.03.05-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
lvm2-libs-2.03.05-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
lvm2-dbusd-2.03.05-5.el8    BUILT: Thu Sep 26 01:43:33 CDT 2019
lvm2-lockd-2.03.05-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
device-mapper-1.02.163-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
device-mapper-libs-1.02.163-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
device-mapper-event-1.02.163-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
device-mapper-event-libs-1.02.163-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
device-mapper-persistent-data-0.8.5-2.el8    BUILT: Wed Jun  5 10:28:04 CDT 2019
sanlock-3.8.0-2.el8    BUILT: Wed Jun 12 15:50:27 CDT 2019
sanlock-lib-3.8.0-2.el8    BUILT: Wed Jun 12 15:50:27 CDT 2019


Turning ON lvmlockd in lvm.conf
        host-087
        host-093

Starting sanlock on
        host-087
        host-093

Starting lvmlockd on
        host-087
        host-093


[root@host-093 ~]# systemctl status sanlock
● sanlock.service - Shared Storage Lease Manager
   Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-09-30 11:27:51 CDT; 29s ago
  Process: 31740 ExecStart=/usr/sbin/sanlock daemon (code=exited, status=0/SUCCESS)
 Main PID: 31744 (sanlock)
    Tasks: 6 (limit: 26213)
   Memory: 14.6M
   CGroup: /system.slice/sanlock.service
           ├─31744 /usr/sbin/sanlock daemon
           └─31745 /usr/sbin/sanlock daemon

Sep 30 11:27:51 host-093.virt.lab.msp.redhat.com systemd[1]: Starting Shared Storage Lease Manager...
Sep 30 11:27:51 host-093.virt.lab.msp.redhat.com systemd[1]: Started Shared Storage Lease Manager.

[root@host-093 ~]# systemctl status lvmlockd
● lvmlockd.service - LVM lock daemon
   Loaded: loaded (/usr/lib/systemd/system/lvmlockd.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-09-30 11:27:52 CDT; 43s ago
     Docs: man:lvmlockd(8)
 Main PID: 31761 (lvmlockd)
    Tasks: 3 (limit: 26213)
   Memory: 2.8M
   CGroup: /system.slice/lvmlockd.service
           └─31761 /usr/sbin/lvmlockd --foreground

Sep 30 11:27:52 host-093.virt.lab.msp.redhat.com systemd[1]: Starting LVM lock daemon...
Sep 30 11:27:52 host-093.virt.lab.msp.redhat.com lvmlockd[31761]: [D] creating /run/lvm/lvmlockd.socket
Sep 30 11:27:52 host-093.virt.lab.msp.redhat.com lvmlockd[31761]: 1569860872 lvmlockd started
Sep 30 11:27:52 host-093.virt.lab.msp.redhat.com systemd[1]: Started LVM lock daemon.

Comment 11 errata-xmlrpc 2019-11-05 22:35:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3654