Bug 1418478

Summary: thin pool/raid stack: device mapper keeps missing_0_0 devices listed even after the LV/VG containing raid is removed
Product: Red Hat Enterprise Linux 6
Component: lvm2
lvm2 sub component: Mirroring and RAID (RHEL6)
Version: 6.9
Hardware: x86_64
OS: Linux
Reporter: Corey Marthaler <cmarthal>
Assignee: Heinz Mauelshagen <heinzm>
QA Contact: cluster-qe <cluster-qe>
CC: agk, heinzm, jbrassow, msnitzer, prajnoha, prockai, zkabelac
Status: CLOSED WONTFIX
Severity: low
Priority: unspecified
Target Milestone: rc
Target Release: ---
Type: Bug
Regression: ---
Doc Type: If docs needed, set a value
Last Closed: 2017-12-06 12:03:14 UTC

Description Corey Marthaler 2017-02-01 23:48:07 UTC
Description of problem:
This is the same test case as in bug 1025322, but with a thin pool data volume stacked on top of the raid volume being failed. In this case, as in bug 1025322, the missing devices are not thoroughly cleaned up.
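
For orientation, here is a condensed approximation of the scenario exercised in the log below (volume names and devices come from the log; the harness script itself is not shown, so treat this as a sketch rather than the literal reproducer):

# Build the stack: raid10 LV as thin pool data, small linear LV as pool metadata
vgcreate black_bird /dev/sd[a-g]1
lvcreate --type raid10 -i 3 -n synced_three_raid10_3legs_1 -L 500M black_bird
lvcreate -n meta -L 500M black_bird /dev/sda1
lvconvert --yes --thinpool black_bird/synced_three_raid10_3legs_1 \
          --poolmetadata black_bird/meta --poolmetadataspare n
lvcreate --virtualsize 200M --thinpool black_bird/synced_three_raid10_3legs_1 \
         -n virt_synced_three_raid10_3legs_1
# Fail three raid10 legs, do I/O, re-enable the devices, then deactivate and
# remove the LVs/VG.  Afterwards the *-missing_0_0 error mappings are still
# listed by 'dmsetup ls' even though 'lvs' shows nothing.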


host-073: pvcreate /dev/sdd1 /dev/sdc1 /dev/sda1 /dev/sdb1 /dev/sde1 /dev/sdf1 /dev/sdg1                                                                                                                        
host-073: vgcreate   black_bird /dev/sdd1 /dev/sdc1 /dev/sda1 /dev/sdb1 /dev/sde1 /dev/sdf1 /dev/sdg1                                                                                                           
                                                                                                                                                                                                                
================================================================================
Iteration 0.1 started at Wed Feb  1 17:24:01 CST 2017
================================================================================
Scenario kill_three_synced_raid10_3legs: Kill three legs (none of which share the same stripe leg) of synced 3 leg raid10 volume(s)                                                                             
                                                                                                                                                                                                                
********* RAID hash info for this scenario *********                                                                                                                                                            
* names:              synced_three_raid10_3legs_1                                                                                                                                                               
* sync:               1                                                                                                                                                                                         
* type:               raid10                                                                                                                                                                                    
* -m |-i value:       3                                                                                                                                                                                                           
* leg devices:        /dev/sde1 /dev/sda1 /dev/sdc1 /dev/sdg1 /dev/sdb1 /dev/sdf1
* spanned legs:       0
* manual repair:      0
* failpv(s):          /dev/sde1 /dev/sdc1 /dev/sdb1
* failnode(s):        host-073
* lvmetad:            0
* thinpool stack:      1
* raid fault policy:  warn
******************************************************

Creating raids(s) on host-073...
host-073: lvcreate --type raid10 -i 3 -n synced_three_raid10_3legs_1 -L 500M black_bird /dev/sde1:0-2400 /dev/sda1:0-2400 /dev/sdc1:0-2400 /dev/sdg1:0-2400 /dev/sdb1:0-2400 /dev/sdf1:0-2400

Current mirror/raid device structure(s):
  LV                                     Attr       LSize   Cpy%Sync Devices
   synced_three_raid10_3legs_1            rwi-a-r--- 504.00m 0.00     synced_three_raid10_3legs_1_rimage_0(0),synced_three_raid10_3legs_1_rimage_1(0),synced_three_raid10_3legs_1_rimage_2(0),synced_three_raid10_3legs_1_rimage_3(0),synced_three_raid10_3legs_1_rimage_4(0),synced_three_raid10_3legs_1_rimage_5(0)
   [synced_three_raid10_3legs_1_rimage_0] Iwi-aor--- 168.00m          /dev/sde1(1)
   [synced_three_raid10_3legs_1_rimage_1] Iwi-aor--- 168.00m          /dev/sda1(1)
   [synced_three_raid10_3legs_1_rimage_2] Iwi-aor--- 168.00m          /dev/sdc1(1)
   [synced_three_raid10_3legs_1_rimage_3] Iwi-aor--- 168.00m          /dev/sdg1(1)
   [synced_three_raid10_3legs_1_rimage_4] Iwi-aor--- 168.00m          /dev/sdb1(1)
   [synced_three_raid10_3legs_1_rimage_5] Iwi-aor--- 168.00m          /dev/sdf1(1)
   [synced_three_raid10_3legs_1_rmeta_0]  ewi-aor---   4.00m          /dev/sde1(0)
   [synced_three_raid10_3legs_1_rmeta_1]  ewi-aor---   4.00m          /dev/sda1(0)
   [synced_three_raid10_3legs_1_rmeta_2]  ewi-aor---   4.00m          /dev/sdc1(0)
   [synced_three_raid10_3legs_1_rmeta_3]  ewi-aor---   4.00m          /dev/sdg1(0)
   [synced_three_raid10_3legs_1_rmeta_4]  ewi-aor---   4.00m          /dev/sdb1(0)
   [synced_three_raid10_3legs_1_rmeta_5]  ewi-aor---   4.00m          /dev/sdf1(0)

* NOTE: not enough available devices for allocation fault policies to fully work *
(well, technically, since we have one spare, some allocation should still work)

Waiting until all mirror|raid volumes become fully synced...
   1/1 mirror(s) are fully synced: ( 100.00% )

********* THIN POOL info for this scenario *********
* Killing the raid pool DATA device
* poolmetadataspare:    n
****************************************************

Convert mirror/raid volume(s) to Thinpool volume(s) on host-073...

Creating the META device on /dev/sda1 (which will not be failed), converting to a thin pool, and creating a virtual device
lvcreate -n meta -L 500M black_bird /dev/sda1
lvconvert --yes --thinpool black_bird/synced_three_raid10_3legs_1 --poolmetadata black_bird/meta  --poolmetadataspare n
  WARNING: Converting logical volume black_bird/synced_three_raid10_3legs_1 and black_bird/meta to pool's data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  WARNING: recovery of pools without pool metadata spare LV is not automated.
lvcreate --virtualsize 200M --thinpool black_bird/synced_three_raid10_3legs_1 -n virt_synced_three_raid10_3legs_1

Creating ext on top of mirror(s) on host-073...
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on host-073...


Current mirror/raid device structure(s):
  LV                                           Attr       LSize   Cpy%Sync Devices
   synced_three_raid10_3legs_1                  twi-aotz-- 504.00m          synced_three_raid10_3legs_1_tdata(0)
   [synced_three_raid10_3legs_1_tdata]          rwi-aor--- 504.00m 100.00   synced_three_raid10_3legs_1_tdata_rimage_0(0),synced_three_raid10_3legs_1_tdata_rimage_1(0),synced_three_raid10_3legs_1_tdata_rimage_2(0),synced_three_raid10_3legs_1_tdata_rimage_3(0),synced_three_raid10_3legs_1_tdata_rimage_4(0),synced_three_raid10_3legs_1_tdata_rimage_5(0)
   [synced_three_raid10_3legs_1_tdata_rimage_0] iwi-aor--- 168.00m          /dev/sde1(1)
   [synced_three_raid10_3legs_1_tdata_rimage_1] iwi-aor--- 168.00m          /dev/sda1(1)
   [synced_three_raid10_3legs_1_tdata_rimage_2] iwi-aor--- 168.00m          /dev/sdc1(1)
   [synced_three_raid10_3legs_1_tdata_rimage_3] iwi-aor--- 168.00m          /dev/sdg1(1)
   [synced_three_raid10_3legs_1_tdata_rimage_4] iwi-aor--- 168.00m          /dev/sdb1(1)
   [synced_three_raid10_3legs_1_tdata_rimage_5] iwi-aor--- 168.00m          /dev/sdf1(1)
   [synced_three_raid10_3legs_1_tdata_rmeta_0]  ewi-aor---   4.00m          /dev/sde1(0)
   [synced_three_raid10_3legs_1_tdata_rmeta_1]  ewi-aor---   4.00m          /dev/sda1(0)
   [synced_three_raid10_3legs_1_tdata_rmeta_2]  ewi-aor---   4.00m          /dev/sdc1(0)
   [synced_three_raid10_3legs_1_tdata_rmeta_3]  ewi-aor---   4.00m          /dev/sdg1(0)
   [synced_three_raid10_3legs_1_tdata_rmeta_4]  ewi-aor---   4.00m          /dev/sdb1(0)
   [synced_three_raid10_3legs_1_tdata_rmeta_5]  ewi-aor---   4.00m          /dev/sdf1(0)
   [synced_three_raid10_3legs_1_tmeta]          ewi-ao---- 500.00m          /dev/sda1(43)
   virt_synced_three_raid10_3legs_1             Vwi-aotz-- 200.00m

PV=/dev/sde1
        synced_three_raid10_3legs_1_tdata_rimage_0: 2
        synced_three_raid10_3legs_1_tdata_rmeta_0: 2
PV=/dev/sdb1
        synced_three_raid10_3legs_1_tdata_rimage_4: 2
        synced_three_raid10_3legs_1_tdata_rmeta_4: 2
PV=/dev/sdc1
        synced_three_raid10_3legs_1_tdata_rimage_2: 2
        synced_three_raid10_3legs_1_tdata_rmeta_2: 2

Writing verification files (checkit) to mirror(s) on...
        ---- host-073 ----

<start name="host-073_synced_three_raid10_3legs_1"  pid="27837" time="Wed Feb  1 17:24:48 2017 -0600" type="cmd" />
Sleeping 15 seconds to get some outstanding I/O locks before the failure

lvcreate -s /dev/black_bird/virt_synced_three_raid10_3legs_1 -n snap1_synced_three_raid10_3legs_1
lvcreate -s /dev/black_bird/virt_synced_three_raid10_3legs_1 -n snap2_synced_three_raid10_3legs_1
  WARNING: Sum of all thin volume sizes (600.00 MiB) exceeds the size of thin pool black_bird/synced_three_raid10_3legs_1 (504.00 MiB)!
lvcreate -s /dev/black_bird/virt_synced_three_raid10_3legs_1 -n snap3_synced_three_raid10_3legs_1
  WARNING: Sum of all thin volume sizes (800.00 MiB) exceeds the size of thin pool black_bird/synced_three_raid10_3legs_1 (504.00 MiB)!

Verifying files (checkit) on mirror(s) on...
        ---- host-073 ----


Disabling device sde on host-073
Disabling device sdc on host-073
Disabling device sdb on host-073
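
The transcript does not show how the harness disables these devices; one common way to simulate this kind of PV failure (an assumption here, not something taken from this run) is the SCSI sysfs interface:

# Hypothetical failure injection - the harness may use a different mechanism.
echo offline > /sys/block/sde/device/state   # stop the device from accepting I/O
# or remove the device from the SCSI layer entirely:
echo 1 > /sys/block/sde/device/delete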

Getting recovery check start time from /var/log/messages: Feb  1 17:25
Attempting I/O to cause mirror down conversion(s) on host-073
dd if=/dev/zero of=/mnt/synced_three_raid10_3legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.318626 s, 132 MB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  LV                                           Attr       LSize   Cpy%Sync Devices
  snap1_synced_three_raid10_3legs_1            Vwi---tzpk 200.00m
  snap2_synced_three_raid10_3legs_1            Vwi---tzpk 200.00m
  snap3_synced_three_raid10_3legs_1            Vwi---tzpk 200.00m
  synced_three_raid10_3legs_1                  twi-aotzp- 504.00m          synced_three_raid10_3legs_1_tdata(0)
  [synced_three_raid10_3legs_1_tdata]          rwi-aor-p- 504.00m 100.00   synced_three_raid10_3legs_1_tdata_rimage_0(0),synced_three_raid10_3legs_1_tdata_rimage_1(0),synced_three_raid10_3legs_1_tdata_rimage_2(0),synced_three_raid10_3legs_1_tdata_rimage_3(0),synced_three_raid10_3legs_1_tdata_rimage_4(0),synced_three_raid10_3legs_1_tdata_rimage_5(0)
  [synced_three_raid10_3legs_1_tdata_rimage_0] iwi-aor-p- 168.00m          unknown device(1)
  [synced_three_raid10_3legs_1_tdata_rimage_1] iwi-aor--- 168.00m          /dev/sda1(1)
  [synced_three_raid10_3legs_1_tdata_rimage_2] iwi-aor-p- 168.00m          unknown device(1)
  [synced_three_raid10_3legs_1_tdata_rimage_3] iwi-aor--- 168.00m          /dev/sdg1(1)
  [synced_three_raid10_3legs_1_tdata_rimage_4] iwi-aor-p- 168.00m          unknown device(1)
  [synced_three_raid10_3legs_1_tdata_rimage_5] iwi-aor--- 168.00m          /dev/sdf1(1)
  [synced_three_raid10_3legs_1_tdata_rmeta_0]  ewi-aor-p-   4.00m          unknown device(0)
  [synced_three_raid10_3legs_1_tdata_rmeta_1]  ewi-aor---   4.00m          /dev/sda1(0)
  [synced_three_raid10_3legs_1_tdata_rmeta_2]  ewi-aor-p-   4.00m          unknown device(0)
  [synced_three_raid10_3legs_1_tdata_rmeta_3]  ewi-aor---   4.00m          /dev/sdg1(0)
  [synced_three_raid10_3legs_1_tdata_rmeta_4]  ewi-aor-p-   4.00m          unknown device(0)
  [synced_three_raid10_3legs_1_tdata_rmeta_5]  ewi-aor---   4.00m          /dev/sdf1(0)
  [synced_three_raid10_3legs_1_tmeta]          ewi-ao---- 500.00m          /dev/sda1(43)
  virt_synced_three_raid10_3legs_1             Vwi-aotzp- 200.00m

Verifying FAILED device /dev/sde1 is *NOT* in the volume(s)
Verifying FAILED device /dev/sdc1 is *NOT* in the volume(s)
Verifying FAILED device /dev/sdb1 is *NOT* in the volume(s)
Verifying IMAGE device /dev/sda1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdg1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdf1 *IS* in the volume(s)
Verify the rimage/rmeta dm devices remain after the failures
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_tdata_rimage_0 on: host-073 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_tdata_rmeta_0 on: host-073 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_tdata_rimage_4 on: host-073 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_tdata_rmeta_4 on: host-073 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_tdata_rimage_2 on: host-073 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_tdata_rmeta_2 on: host-073 

Verify the raid image order is what's expected based on raid fault policy
EXPECTED LEG ORDER: unknown /dev/sda1 unknown /dev/sdg1 unknown /dev/sdf1
ACTUAL LEG ORDER: unknown /dev/sda1 unknown /dev/sdg1 unknown /dev/sdf1
unknown ne unknown
/dev/sda1 ne /dev/sda1
unknown ne unknown
/dev/sdg1 ne /dev/sdg1
unknown ne unknown
/dev/sdf1 ne /dev/sdf1
Waiting until all mirror|raid volumes become fully synced...
   1/1 mirror(s) are fully synced: ( 100.00% )


Verifying files (checkit) on mirror(s) on...
        ---- host-073 ----

Enabling device sde on host-073 Running vgs to make LVM update metadata version if possible (will restore a-m PVs)
Enabling device sdc on host-073 Running vgs to make LVM update metadata version if possible (will restore a-m PVs)
Enabling device sdb on host-073 Running vgs to make LVM update metadata version if possible (will restore a-m PVs)



Checking for leftover '-missing_0_0' or 'unknown devices'
'-missing' devices still exist (normal for partial allocation scenarios, see BUG 825026)

Checking for PVs marked as missing (a-m)...
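
Both of these checks can be rerun by hand, for example:

# Any leftover device-mapper error mappings for missing PVs:
dmsetup ls | grep -- '-missing'
# PVs still flagged as missing ('m' in the attribute field):
pvs -o pv_name,vg_name,pv_attr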

Verifying files (checkit) on mirror(s) on...
        ---- host-073 ----

Stopping the io load (collie/xdoio) on mirror(s)
Unmounting ext and removing mnt point on host-073...


Deactivating and removing raid(s)


** -missing_ devices should no longer exist ** 

[root@host-073 ~]# lvs -a -o +devices
  LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        

[root@host-073 ~]# dmsetup ls
black_bird-synced_three_raid10_3legs_1_tdata_rmeta_2-missing_0_0        (253:21)
black_bird-synced_three_raid10_3legs_1_tdata_rimage_2-missing_0_0       (253:20)
black_bird-synced_three_raid10_3legs_1_tdata_rmeta_4-missing_0_0        (253:19)
black_bird-synced_three_raid10_3legs_1_tdata_rimage_4-missing_0_0       (253:17)
black_bird-synced_three_raid10_3legs_1_tdata_rmeta_0-missing_0_0        (253:23)
black_bird-synced_three_raid10_3legs_1_tdata_rimage_0-missing_0_0       (253:22)

[root@host-073 ~]# dmsetup table
black_bird-synced_three_raid10_3legs_1_tdata_rmeta_2-missing_0_0: 0 8192 error 
black_bird-synced_three_raid10_3legs_1_tdata_rimage_2-missing_0_0: 0 344064 error 
black_bird-synced_three_raid10_3legs_1_tdata_rmeta_4-missing_0_0: 0 8192 error 
black_bird-synced_three_raid10_3legs_1_tdata_rimage_4-missing_0_0: 0 344064 error 
black_bird-synced_three_raid10_3legs_1_tdata_rmeta_0-missing_0_0: 0 8192 error 
black_bird-synced_three_raid10_3legs_1_tdata_rimage_0-missing_0_0: 0 344064 error 
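
Since the VG is gone and these mappings are plain error targets, they should have no remaining users, so a manual cleanup along these lines ought to work (a workaround sketch, not a fix, assuming nothing still holds the nodes open):

# Remove every leftover *-missing_0_0 mapping by hand:
dmsetup ls | awk '{print $1}' | grep -- '-missing_0_0$' | xargs -r -n1 dmsetup remove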


Version-Release number of selected component (if applicable):
2.6.32-688.el6.x86_64

lvm2-2.02.143-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
lvm2-libs-2.02.143-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
lvm2-cluster-2.02.143-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
udev-147-2.73.el6_8.2    BUILT: Tue Aug 30 08:17:19 CDT 2016
device-mapper-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-libs-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-event-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-event-libs-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-persistent-data-0.6.2-0.1.rc7.el6    BUILT: Tue Mar 22 08:58:09 CDT 2016


How reproducible:
Often, not always

Comment 2 Jan Kurik 2017-12-06 12:03:14 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/