Bug 1348327

Summary: vgcfgrestore segfaults if attempted with missing PV
Product: Red Hat Enterprise Linux 7
Component: lvm2
lvm2 sub component: Command-line tools
Version: 7.3
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Keywords: Regression, TestBlocker
Target Milestone: rc
Reporter: Corey Marthaler <cmarthal>
Assignee: LVM and device-mapper development team <lvm-team>
QA Contact: cluster-qe <cluster-qe>
CC: agk, heinzm, jbrassow, mnavrati, msnitzer, prajnoha, prockai, rbednar, teigland, thornber, zkabelac
Fixed In Version: lvm2-2.02.160-1.el7
Doc Type: No Doc Update
Doc Text: Intra-release bug, no documentation needed.
Clones: 1583805 (view as bug list)
Bug Blocks: 1583805
Type: Bug
Last Closed: 2016-11-04 04:21:47 UTC

Description Corey Marthaler 2016-06-20 20:11:36 UTC
Description of problem:
This appears to be a regression of the check added for bug 871630.


host-085: pvcreate /dev/sda2 /dev/sda1 /dev/sdc2 /dev/sdc1 /dev/sdg2 /dev/sdg1 /dev/sdf2 /dev/sdf1 /dev/sdb2 /dev/sdb1
host-085: vgcreate  raid_sanity /dev/sda2 /dev/sda1 /dev/sdc2 /dev/sdc1 /dev/sdg2 /dev/sdg1 /dev/sdf2 /dev/sdf1 /dev/sdb2 /dev/sdb1

============================================================
Iteration 1 of 2 started at Mon Jun 20 14:57:44 CDT 2016
============================================================
SCENARIO (raid1) - [vgcfgrestore_raid_with_missing_pv]
Create a raid, force remove a leg, and then restore its VG
host-085: lvcreate  --nosync --type raid1 -m 1 -n missing_pv_raid -L 100M raid_sanity
  WARNING: New raid1 won't be synchronised. Don't read what you didn't write!

Deactivating missing_pv_raid raid
Backup the VG config
host-085 vgcfgbackup -f /tmp/raid_sanity.bkup.29027 raid_sanity

Force removing PV /dev/sda2 (used in this raid)
host-085: 'pvremove -ff --yes /dev/sda2'
  WARNING: PV /dev/sda2 is used by VG raid_sanity
  WARNING: Wiping physical volume label from /dev/sda2 of volume group "raid_sanity"
Verifying that this VG is now corrupt
  WARNING: Device for PV EXUnKE-nBCx-AjYA-iMde-yMbh-Icva-OoKlHN not found or rejected by a filter.
  Failed to find physical volume "/dev/sda2".

Attempt to restore the VG back to its original state (should not segfault)
host-085 vgcfgrestore -f /tmp/raid_sanity.bkup.29027 raid_sanity
  Couldn't find device with uuid EXUnKE-nBCx-AjYA-iMde-yMbh-Icva-OoKlHN.
Checking syslog to see if vgcfgrestore segfaulted

(gdb) bt
#0  0x00007f91c3bc7127 in __strncpy_sse2 () from /lib64/libc.so.6
#1  0x00007f91c4e752ce in strncpy (__len=32, __src=<optimized out>, __dest=0x7ffe1dc56770 " \361\373\306\221\177") at /usr/include/bits/string3.h:120
#2  lvmcache_info_from_pvid (pvid=<optimized out>, dev=0x0, valid_only=valid_only@entry=0) at cache/lvmcache.c:717
#3  0x00007f91c4e92de2 in _restore_vg_should_write_pv (do_pvcreate=0, pv=0x7f91c6fd8e30) at format_text/archiver.c:342
#4  backup_restore_vg (cmd=cmd@entry=0x7f91c6f21020, vg=vg@entry=0x7f91c6fd8bc0, drop_lvmetad=drop_lvmetad@entry=1, do_pvcreate=do_pvcreate@entry=0, pva=pva@entry=0x0)
    at format_text/archiver.c:449
#5  0x00007f91c4e9350d in backup_restore_from_file (cmd=cmd@entry=0x7f91c6f21020, vg_name=vg_name@entry=0x7ffe1dc56f62 "raid_sanity", file=<optimized out>, force=force@entry=0)
    at format_text/archiver.c:553
#6  0x00007f91c4e63c91 in vgcfgrestore (cmd=0x7f91c6f21020, argc=<optimized out>, argv=<optimized out>) at vgcfgrestore.c:63
#7  0x00007f91c4e4c7c0 in lvm_run_command (cmd=cmd@entry=0x7f91c6f21020, argc=1, argc@entry=4, argv=0x7ffe1dc56bd0, argv@entry=0x7ffe1dc56bb8) at lvmcmdline.c:1706
#8  0x00007f91c4e4d360 in lvm2_main (argc=4, argv=0x7ffe1dc56bb8) at lvmcmdline.c:2175
#9  0x00007f91c3b54b15 in __libc_start_main () from /lib64/libc.so.6
#10 0x00007f91c4e33ea1 in _start ()
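
My reading of the arguments in these frames (an inference, not a confirmed analysis): backup_restore_vg() reaches lvmcache_info_from_pvid() with dev=0x0 for the force-removed PV, and the pvid pointer handed in was derived from that NULL device, so the strncpy() at lvmcache.c:717 copies from an invalid address. A minimal toy sketch of that pattern, using hypothetical struct and function names that only mirror the frame names above, not the real lvm2 sources:

/* Hypothetical sketch of the crashing pattern inferred from the backtrace;
 * names mirror the frames above, not the actual lvm2 code. */
#include <stddef.h>
#include <string.h>

struct device { char pvid[32 + 1]; };
struct physical_volume { struct device *dev; };

static void *info_from_pvid(const char *pvid, struct device *dev)
{
	char id[32 + 1];

	/* Frames #0/#1: strncpy() reads 32 bytes from pvid, which here
	 * points just past address 0 because it came from a NULL dev. */
	strncpy(id, pvid, 32);
	id[32] = '\0';
	(void) dev;
	return NULL;
}

static int restore_should_write_pv(struct physical_volume *pv)
{
	/* Frames #2/#3: pv->dev is NULL for the force-removed PV, so
	 * pv->dev->pvid is a near-NULL pointer and the copy faults. */
	return info_from_pvid(pv->dev->pvid, pv->dev) != NULL;
}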


Version-Release number of selected component (if applicable):
3.10.0-419.el7.x86_64

lvm2-2.02.156-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
lvm2-libs-2.02.156-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
lvm2-cluster-2.02.156-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
device-mapper-1.02.126-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
device-mapper-libs-1.02.126-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
device-mapper-event-1.02.126-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
device-mapper-event-libs-1.02.126-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
device-mapper-persistent-data-0.6.2-0.1.rc8.el7    BUILT: Wed May  4 02:56:34 CDT 2016
cmirror-2.02.156-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
sanlock-3.3.0-1.el7    BUILT: Wed Feb 24 09:52:30 CST 2016
sanlock-lib-3.3.0-1.el7    BUILT: Wed Feb 24 09:52:30 CST 2016
lvm2-lockd-2.02.156-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016


How reproducible:
Every time

Comment 1 David Teigland 2016-06-20 21:05:26 UTC
This has already been fixed indirectly by this commit, which makes vgcfgrestore not use lvmetad:

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=6ae22125c62ddea4340916a5e255d55844bfd087

$ vgcfgrestore -f /tmp/test.bak test
  Couldn't find device with uuid 7TXhnS-877L-KOp9-BTle-0E2C-23iD-Zh9n4E.
  Cannot restore Volume Group test with 1 PVs marked as missing.
  Restore failed.

However, the code would still benefit from being defensive in the function identified in the backtrace, so I've pushed out this check for a missing device:

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=f96de674905cd9f109cd19e03ba5e92ac84104b8
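
For illustration only, a guard of that shape (reusing the toy types from the sketch under the backtrace; the real commit f96de674905c may word and place the check differently) simply skips the lvmcache lookup when the PV has no device:

/* Hypothetical version of the missing-device guard; an illustration of
 * the idea, not the literal diff in commit f96de674905c. */
static int restore_should_write_pv_guarded(struct physical_volume *pv)
{
	if (!pv->dev)
		/* Missing device: nothing to look up, and dereferencing
		 * pv->dev->pvid here is what segfaulted before. */
		return 0;

	return info_from_pvid(pv->dev->pvid, pv->dev) != NULL;
}

With a check like that in place, vgcfgrestore falls through to the "Cannot restore Volume Group ... with 1 PVs marked as missing" error shown above instead of crashing.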

Comment 4 Roman Bednář 2016-07-11 09:10:47 UTC
Adding QA ACK for 7.3.

Comment 6 Corey Marthaler 2016-08-03 18:42:06 UTC
Fix verified in the latest rpms.


3.10.0-480.el7.x86_64
lvm2-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
lvm2-libs-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
lvm2-cluster-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-libs-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-event-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-event-libs-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 05:29:13 CDT 2016
cmirror-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
sanlock-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
sanlock-lib-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
lvm2-lockd-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016



[root@host-079 ~]# lvs -a -o +devices
  WARNING: Device for PV dQka0Y-fc5Q-r0ZY-Otpm-z1yY-wXas-cmfSfR not found or rejected by a filter.
  LV                         VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                                                
  missing_pv_raid            raid_sanity   Rwi---r-p- 100.00m                                                     missing_pv_raid_rimage_0(0),missing_pv_raid_rimage_1(0)
  [missing_pv_raid_rimage_0] raid_sanity   Iwi---r-p- 100.00m                                                     [unknown](1)                                           
  [missing_pv_raid_rimage_1] raid_sanity   Iwi---r--- 100.00m                                                     /dev/sdc1(1)                                           
  [missing_pv_raid_rmeta_0]  raid_sanity   ewi---r-p-   4.00m                                                     [unknown](0)                                           
  [missing_pv_raid_rmeta_1]  raid_sanity   ewi---r---   4.00m                                                     /dev/sdc1(0)                                           

[root@host-079 ~]# vgcfgrestore -f /tmp/raid_sanity.bkup.21159 raid_sanity
  Couldn't find device with uuid dQka0Y-fc5Q-r0ZY-Otpm-z1yY-wXas-cmfSfR.
  Cannot restore Volume Group raid_sanity with 1 PVs marked as missing.
  Restore failed.

Comment 8 errata-xmlrpc 2016-11-04 04:21:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1445.html