Bug 1348327

Summary: vgcfgrestore segfaults if attempted with missing PV
Product: Red Hat Enterprise Linux 7
Component: lvm2
lvm2 sub component: Command-line tools
Version: 7.3
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Keywords: Regression, TestBlocker
Target Milestone: rc
Reporter: Corey Marthaler <cmarthal>
Assignee: LVM and device-mapper development team <lvm-team>
QA Contact: cluster-qe <cluster-qe>
CC: agk, heinzm, jbrassow, mnavrati, msnitzer, prajnoha, prockai, rbednar, teigland, thornber, zkabelac
Fixed In Version: lvm2-2.02.160-1.el7
Doc Type: No Doc Update
Doc Text: Intra-release bug, no documentation needed.
Clones: 1583805 (view as bug list)
Bug Blocks: 1583805
Type: Bug
Last Closed: 2016-11-04 04:21:47 UTC

Description Corey Marthaler 2016-06-20 20:11:36 UTC
Description of problem:
This appears to be a regression of the check added for bug 871630.


host-085: pvcreate /dev/sda2 /dev/sda1 /dev/sdc2 /dev/sdc1 /dev/sdg2 /dev/sdg1 /dev/sdf2 /dev/sdf1 /dev/sdb2 /dev/sdb1
host-085: vgcreate  raid_sanity /dev/sda2 /dev/sda1 /dev/sdc2 /dev/sdc1 /dev/sdg2 /dev/sdg1 /dev/sdf2 /dev/sdf1 /dev/sdb2 /dev/sdb1

============================================================
Iteration 1 of 2 started at Mon Jun 20 14:57:44 CDT 2016
============================================================
SCENARIO (raid1) - [vgcfgrestore_raid_with_missing_pv]
Create a raid, force remove a leg, and then restore its VG
host-085: lvcreate  --nosync --type raid1 -m 1 -n missing_pv_raid -L 100M raid_sanity
  WARNING: New raid1 won't be synchronised. Don't read what you didn't write!

Deactivating missing_pv_raid raid
Backup the VG config
host-085 vgcfgbackup -f /tmp/raid_sanity.bkup.29027 raid_sanity

Force removing PV /dev/sda2 (used in this raid)
host-085: 'pvremove -ff --yes /dev/sda2'
  WARNING: PV /dev/sda2 is used by VG raid_sanity
  WARNING: Wiping physical volume label from /dev/sda2 of volume group "raid_sanity"
Verifying that this VG is now corrupt
  WARNING: Device for PV EXUnKE-nBCx-AjYA-iMde-yMbh-Icva-OoKlHN not found or rejected by a filter.
  Failed to find physical volume "/dev/sda2".

Attempt to restore the VG back to its original state (should not segfault)
host-085 vgcfgrestore -f /tmp/raid_sanity.bkup.29027 raid_sanity
  Couldn't find device with uuid EXUnKE-nBCx-AjYA-iMde-yMbh-Icva-OoKlHN.
Checking syslog to see if vgcfgrestore segfaulted

(gdb) bt
#0  0x00007f91c3bc7127 in __strncpy_sse2 () from /lib64/libc.so.6
#1  0x00007f91c4e752ce in strncpy (__len=32, __src=<optimized out>, __dest=0x7ffe1dc56770 " \361\373\306\221\177") at /usr/include/bits/string3.h:120
#2  lvmcache_info_from_pvid (pvid=<optimized out>, dev=0x0, valid_only=valid_only@entry=0) at cache/lvmcache.c:717
#3  0x00007f91c4e92de2 in _restore_vg_should_write_pv (do_pvcreate=0, pv=0x7f91c6fd8e30) at format_text/archiver.c:342
#4  backup_restore_vg (cmd=cmd@entry=0x7f91c6f21020, vg=vg@entry=0x7f91c6fd8bc0, drop_lvmetad=drop_lvmetad@entry=1, do_pvcreate=do_pvcreate@entry=0, pva=pva@entry=0x0)
    at format_text/archiver.c:449
#5  0x00007f91c4e9350d in backup_restore_from_file (cmd=cmd@entry=0x7f91c6f21020, vg_name=vg_name@entry=0x7ffe1dc56f62 "raid_sanity", file=<optimized out>, force=force@entry=0)
    at format_text/archiver.c:553
#6  0x00007f91c4e63c91 in vgcfgrestore (cmd=0x7f91c6f21020, argc=<optimized out>, argv=<optimized out>) at vgcfgrestore.c:63
#7  0x00007f91c4e4c7c0 in lvm_run_command (cmd=cmd@entry=0x7f91c6f21020, argc=1, argc@entry=4, argv=0x7ffe1dc56bd0, argv@entry=0x7ffe1dc56bb8) at lvmcmdline.c:1706
#8  0x00007f91c4e4d360 in lvm2_main (argc=4, argv=0x7ffe1dc56bb8) at lvmcmdline.c:2175
#9  0x00007f91c3b54b15 in __libc_start_main () from /lib64/libc.so.6
#10 0x00007f91c4e33ea1 in _start ()
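
My reading of the arguments in these frames (an inference, not a confirmed analysis): backup_restore_vg() reaches lvmcache_info_from_pvid() with dev=0x0 for the force-removed PV, and the pvid pointer handed in was derived from that NULL device, so the strncpy() at lvmcache.c:717 copies from an invalid address. A minimal toy sketch of that pattern, using hypothetical struct and function names that only mirror the frame names above, not the real lvm2 sources:

/* Hypothetical sketch of the crashing pattern inferred from the backtrace;
 * names mirror the frames above, not the actual lvm2 code. */
#include <stddef.h>
#include <string.h>

struct device { char pvid[32 + 1]; };
struct physical_volume { struct device *dev; };

static void *info_from_pvid(const char *pvid, struct device *dev)
{
	char id[32 + 1];

	/* Frames #0/#1: strncpy() reads 32 bytes from pvid, which here
	 * points just past address 0 because it came from a NULL dev. */
	strncpy(id, pvid, 32);
	id[32] = '\0';
	(void) dev;
	return NULL;
}

static int restore_should_write_pv(struct physical_volume *pv)
{
	/* Frames #2/#3: pv->dev is NULL for the force-removed PV, so
	 * pv->dev->pvid is a near-NULL pointer and the copy faults. */
	return info_from_pvid(pv->dev->pvid, pv->dev) != NULL;
}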


Version-Release number of selected component (if applicable):
3.10.0-419.el7.x86_64

lvm2-2.02.156-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
lvm2-libs-2.02.156-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
lvm2-cluster-2.02.156-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
device-mapper-1.02.126-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
device-mapper-libs-1.02.126-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
device-mapper-event-1.02.126-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
device-mapper-event-libs-1.02.126-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
device-mapper-persistent-data-0.6.2-0.1.rc8.el7    BUILT: Wed May  4 02:56:34 CDT 2016
cmirror-2.02.156-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
sanlock-3.3.0-1.el7    BUILT: Wed Feb 24 09:52:30 CST 2016
sanlock-lib-3.3.0-1.el7    BUILT: Wed Feb 24 09:52:30 CST 2016
lvm2-lockd-2.02.156-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016


How reproducible:
Every time

Comment 1 David Teigland 2016-06-20 21:05:26 UTC
This has already been fixed indirectly by this commit, which makes vgcfgrestore not use lvmetad:

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=6ae22125c62ddea4340916a5e255d55844bfd087

$ vgcfgrestore -f /tmp/test.bak test
  Couldn't find device with uuid 7TXhnS-877L-KOp9-BTle-0E2C-23iD-Zh9n4E.
  Cannot restore Volume Group test with 1 PVs marked as missing.
  Restore failed.

However, the code would still benefit from being defensive in the function identified in the backtrace, so I've pushed out this check for a missing device:

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=f96de674905cd9f109cd19e03ba5e92ac84104b8
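
For illustration only, a guard of that shape (reusing the toy types from the sketch under the backtrace; the real commit f96de674905c may word and place the check differently) simply skips the lvmcache lookup when the PV has no device:

/* Hypothetical version of the missing-device guard; an illustration of
 * the idea, not the literal diff in commit f96de674905c. */
static int restore_should_write_pv_guarded(struct physical_volume *pv)
{
	if (!pv->dev)
		/* Missing device: nothing to look up, and dereferencing
		 * pv->dev->pvid here is what segfaulted before. */
		return 0;

	return info_from_pvid(pv->dev->pvid, pv->dev) != NULL;
}

With a check like that in place, vgcfgrestore falls through to the "Cannot restore Volume Group ... with 1 PVs marked as missing" error shown above instead of crashing.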

Comment 4 Roman Bednář 2016-07-11 09:10:47 UTC
Adding QA ACK for 7.3.

Comment 6 Corey Marthaler 2016-08-03 18:42:06 UTC
Fix verified in the latest rpms.


3.10.0-480.el7.x86_64
lvm2-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
lvm2-libs-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
lvm2-cluster-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-libs-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-event-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-event-libs-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 05:29:13 CDT 2016
cmirror-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
sanlock-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
sanlock-lib-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
lvm2-lockd-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016



[root@host-079 ~]# lvs -a -o +devices
  WARNING: Device for PV dQka0Y-fc5Q-r0ZY-Otpm-z1yY-wXas-cmfSfR not found or rejected by a filter.
  LV                         VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                                                
  missing_pv_raid            raid_sanity   Rwi---r-p- 100.00m                                                     missing_pv_raid_rimage_0(0),missing_pv_raid_rimage_1(0)
  [missing_pv_raid_rimage_0] raid_sanity   Iwi---r-p- 100.00m                                                     [unknown](1)                                           
  [missing_pv_raid_rimage_1] raid_sanity   Iwi---r--- 100.00m                                                     /dev/sdc1(1)                                           
  [missing_pv_raid_rmeta_0]  raid_sanity   ewi---r-p-   4.00m                                                     [unknown](0)                                           
  [missing_pv_raid_rmeta_1]  raid_sanity   ewi---r---   4.00m                                                     /dev/sdc1(0)                                           

[root@host-079 ~]# vgcfgrestore -f /tmp/raid_sanity.bkup.21159 raid_sanity
  Couldn't find device with uuid dQka0Y-fc5Q-r0ZY-Otpm-z1yY-wXas-cmfSfR.
  Cannot restore Volume Group raid_sanity with 1 PVs marked as missing.
  Restore failed.

Comment 8 errata-xmlrpc 2016-11-04 04:21:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1445.html