Bug 1412843

Summary: vgs segfault after re-enabling failed raid10 images when lvmetad is not running
Product: Red Hat Enterprise Linux 6
Component: lvm2
Sub component: Command-line tools (RHEL6)
Version: 6.9
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: unspecified
Target Milestone: rc
Reporter: Corey Marthaler <cmarthal>
Assignee: Heinz Mauelshagen <heinzm>
QA Contact: cluster-qe <cluster-qe>
CC: agk, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, zkabelac
Type: Bug
Bug Blocks: 1434054
Last Closed: 2017-12-06 10:58:26 UTC

Description Corey Marthaler 2017-01-12 23:32:44 UTC
Description of problem:
This test case is being run again to verify bug 1025322. The bug does not appear to occur when this case is run with lvmetad running. I downgraded to the 6.8 lvm rpms (lvm2-2.02.143-7) and was able to reproduce it there as well, so this does not appear to be a regression. Other raid10 image failure scenarios do not appear to hit this; it seems specific to this "kill three in-sync raid10 images" case.


Core was generated by `vgs'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f8a17aede37 in __strncpy_sse2 () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f8a17aede37 in __strncpy_sse2 () from /lib64/libc.so.6
#1  0x00007f8a18d3dada in lvmcache_info_from_pvid (pvid=<value optimized out>, valid_only=0) at /usr/include/bits/string3.h:121
#2  0x00007f8a18d8990d in _check_or_repair_pv_ext (cmd=<value optimized out>, vgname=<value optimized out>, vgid=0x7f8a00000000 <Address 0x7f8a00000000 out of bounds>, warn_flags=423090032,
    consistent=<value optimized out>, precommitted=0) at metadata/metadata.c:3752
#3  _vg_read (cmd=<value optimized out>, vgname=<value optimized out>, vgid=0x7f8a00000000 <Address 0x7f8a00000000 out of bounds>, warn_flags=423090032, consistent=<value optimized out>,
    precommitted=0) at metadata/metadata.c:4308
#4  0x00007f8a18d8d0a8 in vg_read_internal (cmd=<value optimized out>, vgname=0x7f8a1936fd28 "black_bird", vgid=<value optimized out>, warn_flags=1, consistent=0x7ffdd7370198)
    at metadata/metadata.c:4461
#5  0x00007f8a18d8d9ad in _recover_vg (cmd=0x7f8a1932a110, vg_name=0x7f8a1936fd28 "black_bird", vgid=0x7f8a1936fd00 "ro7f88KddlxD0DTXVdckkq2isNePid1j", read_flags=262144,
    lockd_state=<value optimized out>) at metadata/metadata.c:5189
#6  _vg_lock_and_read (cmd=0x7f8a1932a110, vg_name=0x7f8a1936fd28 "black_bird", vgid=0x7f8a1936fd00 "ro7f88KddlxD0DTXVdckkq2isNePid1j", read_flags=262144, lockd_state=<value optimized out>)
    at metadata/metadata.c:5499
#7  vg_read (cmd=0x7f8a1932a110, vg_name=0x7f8a1936fd28 "black_bird", vgid=0x7f8a1936fd00 "ro7f88KddlxD0DTXVdckkq2isNePid1j", read_flags=262144, lockd_state=<value optimized out>)
    at metadata/metadata.c:5585
#8  0x00007f8a18d274ce in _process_vgnameid_list (cmd=0x7f8a1932a110, argc=<value optimized out>, argv=<value optimized out>, one_vgname=<value optimized out>, read_flags=3610706576,
    handle=0x7ffdd7370410, process_single_vg=0x7f8a18d22590 <_vgs_single>) at toollib.c:1967
#9  process_each_vg (cmd=0x7f8a1932a110, argc=<value optimized out>, argv=<value optimized out>, one_vgname=<value optimized out>, read_flags=3610706576, handle=0x7ffdd7370410,
    process_single_vg=0x7f8a18d22590 <_vgs_single>) at toollib.c:2281
#10 0x00007f8a18d216d3 in _report (cmd=0x7f8a1932a110, argc=0, argv=0x7ffdd7370770, report_type=VGS) at reporter.c:920
#11 0x00007f8a18d13559 in lvm_run_command (cmd=0x7f8a1932a110, argc=0, argv=0x7ffdd7370770) at lvmcmdline.c:1655
#12 0x00007f8a18d177e9 in lvm2_main (argc=1, argv=0x7ffdd7370768) at lvmcmdline.c:2121
#13 0x00007f8a17a7ed1d in __libc_start_main () from /lib64/libc.so.6
#14 0x00007f8a18cfc269 in _start ()
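
The faulting frame is the strncpy() that the compiler inlined into lvmcache_info_from_pvid() (frame #1 points into bits/string3.h): the caller's PVID is copied into a fixed-length buffer before the cache lookup, and the out-of-bounds vgid/pvid pointers in frames #2-#3 suggest that copy is being handed a stale or garbage pointer once the failed PVs reappear. The sketch below is illustrative only and is not the lvm2 source; it assumes that general call shape (a hypothetical info_from_pvid_guarded() with a stand-in info_from_hash() lookup) and shows the kind of pointer guard that would turn the fault into a clean "not found" return.

/* Illustrative sketch only -- not the lvm2 implementation.  It mimics the
 * pattern visible in frame #1: the pvid argument is copied with strncpy()
 * into a fixed ID_LEN buffer before the hash lookup, so a stale or invalid
 * pointer faults inside the copy.  Guarding the pointer first makes the
 * lookup fail cleanly instead of segfaulting. */
#include <stdio.h>
#include <string.h>

#define ID_LEN 32                       /* lvm2 stores PV/VG UUIDs as 32 chars; dashes are display only */

struct lvmcache_info;                   /* opaque here; the real type lives in lvmcache */

/* Hypothetical stand-in for the real lvmcache hash-table lookup. */
static struct lvmcache_info *info_from_hash(const char *pvid_s, int valid_only)
{
        (void) pvid_s;
        (void) valid_only;
        return NULL;                    /* stub: always "not found" in this sketch */
}

static struct lvmcache_info *info_from_pvid_guarded(const char *pvid, int valid_only)
{
        char id[ID_LEN + 1];

        if (!pvid || !*pvid)            /* the bad pointer seen in the backtrace stops here */
                return NULL;

        strncpy(id, pvid, ID_LEN);      /* frame #1 faulted inside this copy */
        id[ID_LEN] = '\0';

        return info_from_hash(id, valid_only);
}

int main(void)
{
        /* A NULL pvid no longer reaches strncpy(); the lookup simply fails. */
        printf("lookup with NULL pvid -> %p\n",
               (void *) info_from_pvid_guarded(NULL, 0));
        return 0;
}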




================================================================================
Iteration 0.1 started at Thu Jan 12 17:00:06 CST 2017
================================================================================
Scenario kill_three_synced_raid10_3legs: Kill three legs (none of which share the same stripe leg) of synced 3 leg raid10 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_three_raid10_3legs_1
* sync:               1
* type:               raid10
* -m |-i value:       3
* leg devices:        /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdh1 /dev/sdf1
* spanned legs:       0
* manual repair:      0
* failpv(s):          /dev/sdb1 /dev/sdd1 /dev/sdh1
* additional snap:    /dev/sdc1
* failnode(s):        host-081
* lvmetad:            0
* raid fault policy:  allocate
******************************************************

Creating raids(s) on host-081...
host-081: lvcreate --type raid10 -i 3 -n synced_three_raid10_3legs_1 -L 500M black_bird /dev/sdb1:0-2400 /dev/sdc1:0-2400 /dev/sdd1:0-2400 /dev/sde1:0-2400 /dev/sdh1:0-2400 /dev/sdf1:0-2400

Current mirror/raid device structure(s):
  LV                                     Attr       LSize   Cpy%Sync Devices
   synced_three_raid10_3legs_1            rwi-a-r--- 504.00m 0.00     synced_three_raid10_3legs_1_rimage_0(0),synced_three_raid10_3legs_1_rimage_1(0),synced_three_raid10_3legs_1_rimage_2(0),synced_three_raid10_3legs_1_rimage_3(0),synced_three_raid10_3legs_1_rimage_4(0),synced_three_raid10_3legs_1_rimage_5(0)
   [synced_three_raid10_3legs_1_rimage_0] Iwi-aor--- 168.00m          /dev/sdb1(1)
   [synced_three_raid10_3legs_1_rimage_1] Iwi-aor--- 168.00m          /dev/sdc1(1)
   [synced_three_raid10_3legs_1_rimage_2] Iwi-aor--- 168.00m          /dev/sdd1(1)
   [synced_three_raid10_3legs_1_rimage_3] Iwi-aor--- 168.00m          /dev/sde1(1)
   [synced_three_raid10_3legs_1_rimage_4] Iwi-aor--- 168.00m          /dev/sdh1(1)
   [synced_three_raid10_3legs_1_rimage_5] Iwi-aor--- 168.00m          /dev/sdf1(1)
   [synced_three_raid10_3legs_1_rmeta_0]  ewi-aor---   4.00m          /dev/sdb1(0)
   [synced_three_raid10_3legs_1_rmeta_1]  ewi-aor---   4.00m          /dev/sdc1(0)
   [synced_three_raid10_3legs_1_rmeta_2]  ewi-aor---   4.00m          /dev/sdd1(0)
   [synced_three_raid10_3legs_1_rmeta_3]  ewi-aor---   4.00m          /dev/sde1(0)
   [synced_three_raid10_3legs_1_rmeta_4]  ewi-aor---   4.00m          /dev/sdh1(0)
   [synced_three_raid10_3legs_1_rmeta_5]  ewi-aor---   4.00m          /dev/sdf1(0)

* NOTE: not enough available devices for allocation fault policies to fully work *
(well technically, since we have 1, some allocation should work)

Waiting until all mirror|raid volumes become fully synced...
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating ext on top of mirror(s) on host-081...
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on host-081...

PV=/dev/sdd1
        synced_three_raid10_3legs_1_rimage_2: 1.P
        synced_three_raid10_3legs_1_rmeta_2: 1.P
PV=/dev/sdh1
        synced_three_raid10_3legs_1_rimage_4: 1.P
        synced_three_raid10_3legs_1_rmeta_4: 1.P
PV=/dev/sdb1
        synced_three_raid10_3legs_1_rimage_0: 1.P
        synced_three_raid10_3legs_1_rmeta_0: 1.P

Creating a snapshot volume of each of the raids
Writing verification files (checkit) to mirror(s) on...
        ---- host-081 ----

<start name="host-081_synced_three_raid10_3legs_1"  pid="9966" time="Thu Jan 12 17:00:48 2017" type="cmd" />
Sleeping 15 seconds to get some outstanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- host-081 ----

Disabling device sdb on host-081
Disabling device sdd on host-081
Disabling device sdh on host-081

Getting recovery check start time from /var/log/messages: Jan 12 17:01
Attempting I/O to cause mirror down conversion(s) on host-081
dd if=/dev/zero of=/mnt/synced_three_raid10_3legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.311566 s, 135 MB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  Couldn't find device with uuid RNNCKN-Jrty-rA0j-xBfP-PcWX-9o5S-DMej09.
  Couldn't find device with uuid GGAkpd-uJGq-dfyR-kul4-NdEQ-TkQv-OUyf7r.
  Couldn't find device with uuid ELYzqw-yZzR-ggjZ-uIvv-oQJt-YVLk-MKy39E.
  LV                                     Attr       LSize   Cpy%Sync Devices
   bb_snap1                               swi-a-s--- 252.00m          /dev/sdc1(43)
   synced_three_raid10_3legs_1            owi-aor-p- 504.00m 100.00   synced_three_raid10_3legs_1_rimage_0(0),synced_three_raid10_3legs_1_rimage_1(0),synced_three_raid10_3legs_1_rimage_2(0),synced_three_raid10_3legs_1_rimage_3(0),synced_three_raid10_3legs_1_rimage_4(0),synced_three_raid10_3legs_1_rimage_5(0)
   [synced_three_raid10_3legs_1_rimage_0] iwi-a-r-p- 168.00m          unknown device(1)
   [synced_three_raid10_3legs_1_rimage_1] iwi-aor--- 168.00m          /dev/sdc1(1)
   [synced_three_raid10_3legs_1_rimage_2] iwi-a-r-p- 168.00m          unknown device(1)
   [synced_three_raid10_3legs_1_rimage_3] iwi-aor--- 168.00m          /dev/sde1(1)
   [synced_three_raid10_3legs_1_rimage_4] iwi-aor--- 168.00m          /dev/sda1(1)
   [synced_three_raid10_3legs_1_rimage_5] iwi-aor--- 168.00m          /dev/sdf1(1)
   [synced_three_raid10_3legs_1_rmeta_0]  ewi-a-r-p-   4.00m          unknown device(0)
   [synced_three_raid10_3legs_1_rmeta_1]  ewi-aor---   4.00m          /dev/sdc1(0)
   [synced_three_raid10_3legs_1_rmeta_2]  ewi-a-r-p-   4.00m          unknown device(0)
   [synced_three_raid10_3legs_1_rmeta_3]  ewi-aor---   4.00m          /dev/sde1(0)
   [synced_three_raid10_3legs_1_rmeta_4]  ewi-aor---   4.00m          /dev/sda1(0)
   [synced_three_raid10_3legs_1_rmeta_5]  ewi-aor---   4.00m          /dev/sdf1(0)


Verifying FAILED device /dev/sdb1 is *NOT* in the volume(s)
Verifying FAILED device /dev/sdd1 is *NOT* in the volume(s)
Verifying FAILED device /dev/sdh1 is *NOT* in the volume(s)
Verifying IMAGE device /dev/sdc1 *IS* in the volume(s)
Verifying IMAGE device /dev/sde1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdf1 *IS* in the volume(s)
Verify the rimage/rmeta dm devices remain after the failures

Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rimage_2 on: host-081 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rmeta_2 on: host-081 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rimage_4 on: host-081 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rmeta_4 on: host-081 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rimage_0 on: host-081 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rmeta_0 on: host-081 

Verify the raid image order is what's expected based on raid fault policy
EXPECTED LEG ORDER: unknown /dev/sdc1 unknown /dev/sde1 unknown /dev/sdf1
ACTUAL LEG ORDER: unknown /dev/sdc1 unknown /dev/sde1 /dev/sda1 /dev/sdf1

Verifying files (checkit) on mirror(s) on...
        ---- host-081 ----

Enabling device sdb on host-081 
        Running vgs to make LVM update metadata version if possible (will restore a-m PVs)
  Couldn't find device with uuid RNNCKN-Jrty-rA0j-xBfP-PcWX-9o5S-DMej09.
  Couldn't find device with uuid ELYzqw-yZzR-ggjZ-uIvv-oQJt-YVLk-MKy39E.

Enabling device sdd on host-081 
        Running vgs to make LVM update metadata version if possible (will restore a-m PVs)
  Couldn't find device with uuid ELYzqw-yZzR-ggjZ-uIvv-oQJt-YVLk-MKy39E.
  WARNING: Inconsistent metadata found for VG black_bird - updating to use version 7
  Missing device /dev/sdd1 reappeared, updating metadata for VG black_bird to version 7.
  Device still marked missing because of allocated data on it, remove volumes and consider vgreduce --removemissing.
  Missing device /dev/sdb1 reappeared, updating metadata for VG black_bird to version 7.
  Device still marked missing because of allocated data on it, remove volumes and consider vgreduce --removemissing.
  Missing device unknown device reappeared, updating metadata for VG black_bird to version 7.

Simple vgs after device enable failed after bringing sdd online

[root@host-081 ~]# lvs -a -o +devices
  Couldn't find device with uuid ELYzqw-yZzR-ggjZ-uIvv-oQJt-YVLk-MKy39E.
  LV                                     VG         Attr       LSize   Origin                      Data% Cpy%Sync Devices
  bb_snap1                               black_bird swi-a-s--- 252.00m synced_three_raid10_3legs_1 28.26          /dev/sdc1(43)
  synced_three_raid10_3legs_1            black_bird owi-aor-p- 504.00m                                   100.00   synced_three_raid10_3legs_1_rimage_0(0),synced_three_raid10_3legs_1_rimage_1(0),synced_three_raid10_3legs_1_rimage_2(0),synced_three_raid10_3legs_1_rimage_3(0),synced_three_raid10_3legs_1_rimage_4(0),synced_three_raid10_3legs_1_rimage_5(0)
  [synced_three_raid10_3legs_1_rimage_0] black_bird iwi-a-r-p- 168.00m                                            /dev/sdb1(1)
  [synced_three_raid10_3legs_1_rimage_1] black_bird iwi-aor--- 168.00m                                            /dev/sdc1(1)
  [synced_three_raid10_3legs_1_rimage_2] black_bird iwi-a-r-p- 168.00m                                            /dev/sdd1(1)
  [synced_three_raid10_3legs_1_rimage_3] black_bird iwi-aor--- 168.00m                                            /dev/sde1(1)
  [synced_three_raid10_3legs_1_rimage_4] black_bird iwi-aor--- 168.00m                                            /dev/sda1(1)
  [synced_three_raid10_3legs_1_rimage_5] black_bird iwi-aor--- 168.00m                                            /dev/sdf1(1)
  [synced_three_raid10_3legs_1_rmeta_0]  black_bird ewi-a-r-p-   4.00m                                            /dev/sdb1(0)
  [synced_three_raid10_3legs_1_rmeta_1]  black_bird ewi-aor---   4.00m                                            /dev/sdc1(0)
  [synced_three_raid10_3legs_1_rmeta_2]  black_bird ewi-a-r-p-   4.00m                                            /dev/sdd1(0)
  [synced_three_raid10_3legs_1_rmeta_3]  black_bird ewi-aor---   4.00m                                            /dev/sde1(0)
  [synced_three_raid10_3legs_1_rmeta_4]  black_bird ewi-aor---   4.00m                                            /dev/sda1(0)
  [synced_three_raid10_3legs_1_rmeta_5]  black_bird ewi-aor---   4.00m                                            /dev/sdf1(0)


Version-Release number of selected component (if applicable):
2.6.32-682.el6.x86_64

lvm2-2.02.143-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
lvm2-libs-2.02.143-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
lvm2-cluster-2.02.143-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
udev-147-2.73.el6_8.2    BUILT: Tue Aug 30 08:17:19 CDT 2016
device-mapper-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-libs-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-event-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-event-libs-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-persistent-data-0.6.2-0.1.rc7.el6    BUILT: Tue Mar 22 08:58:09 CDT 2016


How reproducible:
Every time, as long as lvmetad isn't running.

Comment 4 Jonathan Earl Brassow 2017-10-03 22:38:12 UTC
Fixed in RHEL 7.4; the same fix should be applicable here.

Comment 5 Jan Kurik 2017-12-06 10:58:26 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/