Bug 1354646

Summary: vgreduce --removemissing of partially activated raid0 volume segfaults
Product: Red Hat Enterprise Linux 7
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
lvm2 sub component: Mirroring and RAID
Assignee: Heinz Mauelshagen <heinzm>
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
CC: agk, heinzm, jbrassow, msnitzer, prajnoha, prockai, rbednar, zkabelac
Version: 7.3
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: lvm2-2.02.161-1.el7
Last Closed: 2016-11-04 04:15:31 UTC
Type: Bug

Description Corey Marthaler 2016-07-11 20:33:16 UTC
Description of problem:
This isn't really an applicable test case for raid0, since raid0 provides no redundancy, but it still shouldn't segfault.



SCENARIO (raid0) - [partial_raid_activation_replace_missing_segment]
Create a raid, corrupt an image, and then reactivate it partially with an error dm target

Recreating PVs/VG with smaller sizes
pvcreate --setphysicalvolumesize 200M /dev/sda1 /dev/sda2 /dev/sdb1 /dev/sdb2 /dev/sdd1
vgcreate raid_sanity /dev/sda1 /dev/sda2 /dev/sdb1 /dev/sdb2 /dev/sdd1
host-084: lvcreate  --type raid0 -i 2 -n partial_activation -L 188M raid_sanity /dev/sda1 /dev/sda2 /dev/sdb1
Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Deactivating volume group
vgchange -an raid_sanity
host-084: dd if=/dev/zero of=/dev/sda1 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0213283 s, 49.2 MB/s
pvscan --cache /dev/sda1
Verify there's an unknown device where the corrupt PV used to be
  WARNING: Device for PV NjFEpi-9AoQ-4Ur9-Ooms-aSHi-P045-F6GmYf not found or rejected by a filter.
Activating VG in partial readonly mode
vgchange -ay --partial raid_sanity
  PARTIAL MODE. Incomplete logical volumes will be processed.
  WARNING: Device for PV NjFEpi-9AoQ-4Ur9-Ooms-aSHi-P045-F6GmYf not found or rejected by a filter.
Verify an error target now exists for the corrupted image

[root@host-084 ~]# lvs -a -o +devices
  WARNING: Device for PV NjFEpi-9AoQ-4Ur9-Ooms-aSHi-P045-F6GmYf not found or rejected by a filter.
  LV                            VG            Attr       LSize   Devices
  partial_activation            raid_sanity   rwi-a-r-p- 192.00m partial_activation_rimage_0(0),partial_activation_rimage_1(0)
  [partial_activation_rimage_0] raid_sanity   iwi-aor-p-  96.00m [unknown](0)
  [partial_activation_rimage_1] raid_sanity   iwi-aor---  96.00m /dev/sda2(0)

Restoring VG to default extent size
vgreduce --removemissing --force raid_sanity
  WARNING: Device for PV NjFEpi-9AoQ-4Ur9-Ooms-aSHi-P045-F6GmYf not found or rejected by a filter.
unable to --removemissing PVs from VG raid_sanity

Jul 11 15:01:20 host-084 qarshd[14732]: Running cmdline: vgreduce --removemissing --force raid_sanity
Jul 11 15:01:21 host-084 kernel: vgreduce[14733]: segfault at 8 ip 00007f40636c2a42 sp 00007ffd211a6d50 error 4 in lvm[7f40635e4000+193000]

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `vgreduce --removemissing --force raid_sanity'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f40636c2a42 in lv_raid_remove_missing (lv=lv@entry=0x7f4063f46510) at metadata/raid_manip.c:3089
3089                    if (!lv_is_partial(seg_lv(seg, s)) &&
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.163-3.el7.x86_64 elfutils-libs-0.163-3.el7.x86_64 glibc-2.17-105.el7.x86_64 libattr-2.4.46-12.el7.x86_64 libblkid-2.23.2-26.el7.x86_64 libcap-2.22-8.el7.x86_64 libgcc-4.8.5-4.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 libsepol-2.1.9-3.el7.x86_64 libuuid-2.23.2-26.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 pcre-8.32-15.el7.x86_64 readline-6.2-9.el7.x86_64 systemd-libs-219-21.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64
(gdb) bt
#0  0x00007f40636c2a42 in lv_raid_remove_missing (lv=lv@entry=0x7f4063f46510) at metadata/raid_manip.c:3089
#1  0x00007f406364b85a in _make_vg_consistent (vg=0x7f4063f45b50, cmd=0x7f4063e8d020) at vgreduce.c:99
#2  _vgreduce_repair_single (cmd=cmd@entry=0x7f4063e8d020, vg_name=vg_name@entry=0x7f4063ed6ab0 "raid_sanity", vg=vg@entry=0x7f4063f45b50, handle=handle@entry=0x7f4063ed69b8)
    at vgreduce.c:160
#3  0x00007f406363e81a in _process_vgnameid_list (process_single_vg=0x7f406364b5c0 <_vgreduce_repair_single>, handle=0x7f4063ed69b8, arg_tags=0x7ffd211a6e80, arg_vgnames=0x7ffd211a6e90, 
    vgnameids_to_process=0x7ffd211a6eb0, read_flags=1179648, cmd=0x7f4063e8d020) at toollib.c:1962
#4  process_each_vg (cmd=cmd@entry=0x7f4063e8d020, argc=argc@entry=0, argv=argv@entry=0x0, one_vgname=one_vgname@entry=0x7ffd211a8f6a "raid_sanity", use_vgnames=use_vgnames@entry=0x0, 
    read_flags=read_flags@entry=1179648, include_internal=include_internal@entry=0, handle=handle@entry=0x7f4063ed69b8, 
    process_single_vg=process_single_vg@entry=0x7f406364b5c0 <_vgreduce_repair_single>) at toollib.c:2275
#5  0x00007f406364bb7b in vgreduce (cmd=0x7f4063e8d020, argc=0, argv=0x7ffd211a7348) at vgreduce.c:248
#6  0x00007f40636298e9 in lvm_run_command (cmd=cmd@entry=0x7f4063e8d020, argc=1, argc@entry=4, argv=0x7ffd211a7340, argv@entry=0x7ffd211a7328) at lvmcmdline.c:1715
#7  0x00007f406362a490 in lvm2_main (argc=4, argv=0x7ffd211a7328) at lvmcmdline.c:2184
#8  0x00007f4062326b15 in __libc_start_main () from /lib64/libc.so.6
#9  0x00007f406360df91 in _start ()


Version-Release number of selected component (if applicable):
3.10.0-419.el7.x86_64

lvm2-2.02.160-1.el7    BUILT: Wed Jul  6 11:16:47 CDT 2016
lvm2-libs-2.02.160-1.el7    BUILT: Wed Jul  6 11:16:47 CDT 2016
lvm2-cluster-2.02.160-1.el7    BUILT: Wed Jul  6 11:16:47 CDT 2016
device-mapper-1.02.130-1.el7    BUILT: Wed Jul  6 11:16:47 CDT 2016
device-mapper-libs-1.02.130-1.el7    BUILT: Wed Jul  6 11:16:47 CDT 2016
device-mapper-event-1.02.130-1.el7    BUILT: Wed Jul  6 11:16:47 CDT 2016
device-mapper-event-libs-1.02.130-1.el7    BUILT: Wed Jul  6 11:16:47 CDT 2016
device-mapper-persistent-data-0.6.2-0.1.rc8.el7    BUILT: Wed May  4 02:56:34 CDT 2016
cmirror-2.02.160-1.el7    BUILT: Wed Jul  6 11:16:47 CDT 2016
sanlock-3.3.0-1.el7    BUILT: Wed Feb 24 09:52:30 CST 2016
sanlock-lib-3.3.0-1.el7    BUILT: Wed Feb 24 09:52:30 CST 2016
lvm2-lockd-2.02.160-1.el7    BUILT: Wed Jul  6 11:16:47 CDT 2016

Comment 2 Heinz Mauelshagen 2016-07-12 15:55:52 UTC
A check for the MetaLV was missing in raid_manip: raid0 LVs have no metadata sub-LVs, so the reference to the MetaLV in a log_debug() call dereferenced a NULL pointer and caused the segfault.
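
For illustration only, here is a minimal sketch of the kind of guard described above (not the actual upstream patch). It assumes LVM2's internal metadata API (struct logical_volume, struct lv_segment, first_seg(), seg_lv(), seg_metalv(), lv_is_partial(), log_debug()); the helper function itself is hypothetical. The point is that raid0 segments allocate no meta_areas, so seg_metalv() must not be used unconditionally, even when its result only feeds a debug message:

/*
 * Hypothetical helper, sketching the guard only.  raid0 segments have
 * no metadata sub-LVs (seg->meta_areas is NULL), so seg_metalv() may
 * only be dereferenced after checking meta_areas -- even when the
 * result just feeds a log_debug() message.
 *
 * Assumes LVM2's internal headers for struct logical_volume,
 * struct lv_segment, first_seg(), seg_lv(), seg_metalv(),
 * lv_is_partial() and log_debug().
 */
static int _sketch_remove_missing(struct logical_volume *lv)
{
	struct lv_segment *seg = first_seg(lv);
	uint32_t s;

	for (s = 0; s < seg->area_count; s++) {
		/* A MetaLV exists only for raid types that allocate meta_areas. */
		struct logical_volume *meta_lv =
			seg->meta_areas ? seg_metalv(seg, s) : NULL;

		/* Skip sub-LVs whose devices are all present. */
		if (!lv_is_partial(seg_lv(seg, s)) &&
		    (!meta_lv || !lv_is_partial(meta_lv)))
			continue;

		log_debug("Replacing %s%s%s with error target.",
			  seg_lv(seg, s)->name,
			  meta_lv ? " and " : "",
			  meta_lv ? meta_lv->name : "");

		/* ... replace the affected sub-LV(s) with an error target ... */
	}

	return 1;
}

With a guard like this, a raid0 LV simply takes the meta_lv == NULL path instead of dereferencing the missing meta_areas array.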

Comment 5 Corey Marthaler 2016-07-28 21:22:03 UTC
Fix verified in the latest rpms.

3.10.0-480.el7.x86_64

lvm2-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
lvm2-libs-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
lvm2-cluster-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-libs-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-event-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-event-libs-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 05:29:13 CDT 2016
cmirror-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
sanlock-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
sanlock-lib-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
lvm2-lockd-2.02.161-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016



============================================================
Iteration 10 of 10 started at Thu Jul 28 16:18:27 CDT 2016
============================================================
SCENARIO (raid1) - [partial_raid_activation_replace_missing_segment]
Create a raid, corrupt an image, and then reactivate it partially with an error dm target
Recreating PVs/VG with smaller sizes
pvcreate --setphysicalvolumesize 200M /dev/sdb1 /dev/sdb2 /dev/sdd1 /dev/sdd2 /dev/sdf1
vgcreate raid_sanity /dev/sdb1 /dev/sdb2 /dev/sdd1 /dev/sdd2 /dev/sdf1
host-078: lvcreate  --type raid1 -m 1 -n partial_activation -L 188M raid_sanity /dev/sdb1 /dev/sdb2
Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Deactivating volume group
vgchange -an raid_sanity
host-078: dd if=/dev/zero of=/dev/sdb1 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0181949 s, 57.6 MB/s
pvscan --cache /dev/sdb1
Verify there's an unknown device where the corrupt PV used to be
  WARNING: Device for PV 1z5Jta-Edi9-wYia-9k7R-h7HE-CVUG-4qoa0O not found or rejected by a filter.
Activating VG in partial readonly mode
vgchange -ay --partial raid_sanity
  PARTIAL MODE. Incomplete logical volumes will be processed.
  WARNING: Device for PV 1z5Jta-Edi9-wYia-9k7R-h7HE-CVUG-4qoa0O not found or rejected by a filter.
Verify an error target now exists for the corrupted image
Restoring VG to default extent size
vgreduce --removemissing --force raid_sanity
  WARNING: Device for PV 1z5Jta-Edi9-wYia-9k7R-h7HE-CVUG-4qoa0O not found or rejected by a filter.
Remove -missing_0_0 images

perform raid scrubbing (lvchange --syncaction repair) on raid raid_sanity/partial_activation
Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Deactivating raid partial_activation... and removing
Restoring VG back to default parameters
vgremove --yes raid_sanity
pvremove --yes /dev/sdb2 /dev/sdd1 /dev/sdd2 /dev/sdf1
pvcreate /dev/sdb1 /dev/sdb2 /dev/sdd1 /dev/sdd2 /dev/sdf1 /dev/sdf2 /dev/sdg1 /dev/sdg2 /dev/sdh1 /dev/sdh2
vgcreate raid_sanity /dev/sdb1 /dev/sdb2 /dev/sdd1 /dev/sdd2 /dev/sdf1 /dev/sdf2 /dev/sdg1 /dev/sdg2 /dev/sdh1 /dev/sdh2

Comment 7 errata-xmlrpc 2016-11-04 04:15:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1445.html