Bug 1270338

Summary: pvscan segfaults if lvmlockd is running yet lvmetad is not
Product: Red Hat Enterprise Linux 7
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
Assignee: David Teigland <teigland>
lvm2 sub component: LVM lock daemon / lvmlockd
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: unspecified
CC: agk, heinzm, jbrassow, prajnoha, rbednar, teigland, zkabelac
Version: 7.2
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: lvm2-2.02.152-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 04:11:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1295577, 1313485

Description Corey Marthaler 2015-10-09 16:53:40 UTC
Description of problem:
[root@host-080 /]# systemctl status lvm2-lvmetad
● lvm2-lvmetad.service - LVM2 metadata daemon
   Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmetad.service; disabled; vendor preset: enabled)
   Active: inactive (dead) since Fri 2015-10-09 11:22:46 CDT; 25min ago
     Docs: man:lvmetad(8)
 Main PID: 484 (code=exited, status=0/SUCCESS)

Oct 09 16:19:29 host-080.virt.lab.msp.redhat.com systemd[1]: Started LVM2 metadata daemon.
Oct 09 16:19:29 host-080.virt.lab.msp.redhat.com systemd[1]: Starting LVM2 metadata daemon...
Oct 09 11:22:46 host-080.virt.lab.msp.redhat.com systemd[1]: Stopping LVM2 metadata daemon...
Oct 09 11:22:46 host-080.virt.lab.msp.redhat.com systemd[1]: Stopped LVM2 metadata daemon.
[root@host-080 /]# ps -ef | grep lvmetad
root     13312 13076  0 11:48 pts/0    00:00:00 grep --color=auto lvmetad
[root@host-080 /]# grep use_lvmetad /etc/lvm/lvm.conf
        # See the use_lvmetad comment for a special case regarding filters.
        #     This is incompatible with lvmetad. If use_lvmetad is enabled,
        # Configuration option global/use_lvmetad.
        # while use_lvmetad was disabled, it must be stopped, use_lvmetad
        use_lvmetad = 1
[root@host-080 /]# systemctl status lvm2-lvmlockd
● lvm2-lvmlockd.service - LVM2 lock daemon
   Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmlockd.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2015-10-09 11:46:13 CDT; 2min 56s ago
     Docs: man:lvmlockd(8)
 Main PID: 13227 (lvmlockd)
   CGroup: /system.slice/lvm2-lvmlockd.service
           └─13227 /usr/sbin/lvmlockd -f

Oct 09 11:46:13 host-080.virt.lab.msp.redhat.com systemd[1]: Started LVM2 lock daemon.
Oct 09 11:46:13 host-080.virt.lab.msp.redhat.com systemd[1]: Starting LVM2 lock daemon...
Oct 09 11:46:13 host-080.virt.lab.msp.redhat.com lvmlockd[13227]: 1444409173 lvmlockd started
Oct 09 11:46:13 host-080.virt.lab.msp.redhat.com lvmlockd[13227]: 1444409173 lvmetad_open error 2
Oct 09 11:46:13 host-080.virt.lab.msp.redhat.com lvmlockd[13227]: [D] creating /run/lvm/lvmlockd.socket
Oct 09 11:46:13 host-080.virt.lab.msp.redhat.com lvmlockd[13227]: /run/lvm/lvmetad.socket: connect failed: No such file or directory
[root@host-080 /]# pvscan
  /run/lvm/lvmetad.socket: connect failed: No such file or directory
  WARNING: Failed to connect to lvmetad. Falling back to internal scanning.
  Skipping global lock: lockspace not found or started
  Cannot proceed since lvmetad is not active.
  Internal error: Daemon send: socket fd cannot be negative -1
  lvmetad_validate_global_cache set_global_info error 22
Segmentation fault (core dumped)

[root@host-080 /]# systemctl start lvm2-lvmetad
[root@host-080 /]# ps -ef | grep lvmetad
root     13392     1  0 11:49 ?        00:00:00 /usr/sbin/lvmetad -f
root     13394 13076  0 11:49 pts/0    00:00:00 grep --color=auto lvmetad

[root@host-080 /]# pvscan
  Skipping global lock: lockspace not found or started
  PV /dev/vda2   VG rhel_host-080   lvm2 [29.51 GiB / 44.00 MiB free]
  Total: 1 [29.51 GiB] / in use: 1 [29.51 GiB] / in no VG: 0 [0   ]

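For reference, a minimal reproduction sketch distilled from the transcript above. It assumes use_lvmetad = 1 in /etc/lvm/lvm.conf, the lvm2-lockd package installed, and that the lvm2-lvmetad.socket unit is stopped along with the service so pvscan cannot socket-activate the daemon:

  systemctl stop lvm2-lvmetad.service lvm2-lvmetad.socket
  systemctl start lvm2-lvmlockd.service
  pvscan            # segfaults on the unfixed build
  echo $?           # 139 (128+SIGSEGV) indicates the crash

With lvmetad started again, the same pvscan completes normally, as shown at the end of the transcript.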

Version-Release number of selected component (if applicable):
3.10.0-322.el7.x86_64

lvm2-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
lvm2-libs-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
lvm2-cluster-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-libs-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-event-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-event-libs-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-persistent-data-0.5.5-1.el7    BUILT: Thu Aug 13 09:58:10 CDT 2015
cmirror-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
sanlock-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
sanlock-lib-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
lvm2-lockd-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015


How reproducible:
Every time

Comment 2 David Teigland 2015-10-09 17:25:11 UTC
Another instance of using lvmetad_used() instead of lvmetad_active().  Fixed here:
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=21a8ac0cd3a392feaa049ab509c4727eee548d6b
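
The linked commit can be inspected from any local lvm2 git checkout to see exactly what changed; a quick sketch (the commands below are generic git usage, not lvm2-specific tooling):

  git show --stat 21a8ac0cd3a392feaa049ab509c4727eee548d6b    # view the patch and the files it touches
  git tag --contains 21a8ac0cd3a392feaa049ab509c4727eee548d6b # upstream releases carrying the fix;
                                                              # this should include the 2.02.152 release,
                                                              # matching the "Fixed In Version" field above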

Comment 6 Mike McCune 2016-03-28 23:39:55 UTC
This bug was accidentally moved from POST to MODIFIED by an error in automation; please see mmccune with any questions.

Comment 8 Corey Marthaler 2016-08-03 22:38:27 UTC
Fix verified in the latest rpms.

3.10.0-480.el7.x86_64
lvm2-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
lvm2-libs-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
lvm2-cluster-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-libs-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-event-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-event-libs-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 05:29:13 CDT 2016
cmirror-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
sanlock-3.3.0-1.el7    BUILT: Wed Feb 24 09:52:30 CST 2016
sanlock-lib-3.3.0-1.el7    BUILT: Wed Feb 24 09:52:30 CST 2016
lvm2-lockd-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016



[root@harding-02 ~]# systemctl status lvm2-lvmetad
● lvm2-lvmetad.service - LVM2 metadata daemon
   Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmetad.service; disabled; vendor preset: enabled)
   Active: failed (Result: signal) since Wed 2016-08-03 17:30:50 CDT; 1min 13s ago
     Docs: man:lvmetad(8)
  Process: 6597 ExecStart=/usr/sbin/lvmetad -f (code=killed, signal=KILL)
 Main PID: 6597 (code=killed, signal=KILL)

Aug 03 15:56:15 harding-02.lab.msp.redhat.com systemd[1]: Started LVM2 metadata daemon.
Aug 03 15:56:15 harding-02.lab.msp.redhat.com systemd[1]: Starting LVM2 metadata daemon...
Aug 03 17:29:20 harding-02.lab.msp.redhat.com systemd[1]: Stopping LVM2 metadata daemon...
Aug 03 17:29:20 harding-02.lab.msp.redhat.com lvmetad[6597]: Failed to accept connection.
Aug 03 17:30:50 harding-02.lab.msp.redhat.com systemd[1]: lvm2-lvmetad.service stop-sigterm timed out. Killing.
Aug 03 17:30:50 harding-02.lab.msp.redhat.com systemd[1]: lvm2-lvmetad.service: main process exited, code=killed, status=9/KILL
Aug 03 17:30:50 harding-02.lab.msp.redhat.com systemd[1]: Stopped LVM2 metadata daemon.
Aug 03 17:30:50 harding-02.lab.msp.redhat.com systemd[1]: Unit lvm2-lvmetad.service entered failed state.
Aug 03 17:30:50 harding-02.lab.msp.redhat.com systemd[1]: lvm2-lvmetad.service failed.

[root@harding-02 ~]# ps -ef | grep lvmetad
root     12406 21277  0 17:32 pts/0    00:00:00 grep --color=auto lvmetad

[root@harding-02 ~]# grep use_lvmetad /etc/lvm/lvm.conf
        # See the use_lvmetad comment for a special case regarding filters.
        #     This is incompatible with lvmetad. If use_lvmetad is enabled,
        # Configuration option global/use_lvmetad.
        # while use_lvmetad was disabled, it must be stopped, use_lvmetad
       use_lvmetad = 1

[root@harding-02 ~]# systemctl status lvm2-lvmlockd
● lvm2-lvmlockd.service - LVM2 lock daemon
   Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmlockd.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-08-01 17:25:22 CDT; 2 days ago
     Docs: man:lvmlockd(8)
 Main PID: 4445 (lvmlockd)
   CGroup: /system.slice/lvm2-lvmlockd.service
           └─4445 /usr/sbin/lvmlockd -f

Aug 01 17:25:22 harding-02.lab.msp.redhat.com systemd[1]: Started LVM2 lock daemon.
Aug 01 17:25:22 harding-02.lab.msp.redhat.com systemd[1]: Starting LVM2 lock daemon...
Aug 01 17:25:22 harding-02.lab.msp.redhat.com lvmlockd[4445]: [D] creating /run/lvm/lvmlockd.socket
Aug 01 17:25:22 harding-02.lab.msp.redhat.com lvmlockd[4445]: 1470090322 lvmlockd started
Aug 03 16:28:38 harding-02.lab.msp.redhat.com lvmlockd[4445]: 1470259718 S lvm_global lockspace hosts 1
Aug 03 16:28:43 harding-02.lab.msp.redhat.com lvmlockd[4445]: 1470259723 S lvm_global lockspace hosts 1
Aug 03 16:28:49 harding-02.lab.msp.redhat.com lvmlockd[4445]: 1470259729 S lvm_global lockspace hosts 1

[root@harding-02 ~]# pvscan
  Skipping global lock: lockspace not found or started
  PV /dev/sda2             VG rhel_harding-02   lvm2 [92.16 GiB / 0    free]
  PV /dev/sdb1             VG rhel_harding-02   lvm2 [93.16 GiB / 0    free]
  PV /dev/sdc1             VG rhel_harding-02   lvm2 [93.16 GiB / 0    free]
  PV /dev/mapper/mpathh1   VG snapper_thinp     lvm2 [249.96 GiB / 245.71 GiB free]
  PV /dev/mapper/mpathb1   VG snapper_thinp     lvm2 [249.96 GiB / 245.96 GiB free]
  PV /dev/mapper/mpatha1   VG snapper_thinp     lvm2 [249.96 GiB / 249.96 GiB free]
  PV /dev/mapper/mpathe1   VG snapper_thinp     lvm2 [249.96 GiB / 249.96 GiB free]
  PV /dev/mapper/mpathg1   VG snapper_thinp     lvm2 [249.96 GiB / 249.96 GiB free]
  PV /dev/mapper/mpathf1                        lvm2 [250.00 GiB]
  Total: 9 [1.74 TiB] / in use: 8 [1.49 TiB] / in no VG: 1 [250.00 GiB]
[root@harding-02 ~]# echo $?
0

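The same check can be scripted when re-verifying on other machines. A small sketch based on the steps above; it assumes the lvm2-lvmetad.socket unit exists (socket activation) and that lvmlockd is already running:

  rpm -q lvm2 lvm2-lockd                     # expect >= lvm2-2.02.152-1.el7 per "Fixed In Version"
  systemctl stop lvm2-lvmetad.service lvm2-lvmetad.socket
  systemctl is-active lvm2-lvmlockd.service
  pvscan; echo "pvscan exit status: $?"      # expect 0; 139 would mean the old segfault
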
Comment 10 errata-xmlrpc 2016-11-04 04:11:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1445.html