Bug 1312403 - RFE: get rid of the -missing devices during allocation failure scenarios

Status: NEW
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.8
Hardware/OS: x86_64 Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Assigned To: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
Keywords: FutureFeature
Type: Bug
Doc Type: Enhancement
Reported: 2016-02-26 10:53 EST by Corey Marthaler
Modified: 2017-09-14 07:41 EDT
CC: 7 users
Description Corey Marthaler 2016-02-26 10:53:12 EST
Description of problem:
This is similar to closed bug 825026; in this case, however, allocation did take place and the failed raid images were replaced, yet these "-missing_0" devices still remain.
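
The leftover mappings can be seen directly in the device-mapper table. A minimal check, assuming the stale subdevices always carry a "-missing" suffix as in the dmsetup output below:

# Any match after a completed allocate-policy repair indicates leftover
# stale subdevices; a clean repair should print nothing here.
dmsetup ls | grep -- '-missing'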


================================================================================
Iteration 1.31 started at Thu Feb 25 18:59:01 CST 2016
================================================================================

Scenario kill_multiple_synced_raid1_3legs: Kill multiple legs of synced 3 leg raid1 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_multiple_raid1_3legs_1 synced_multiple_raid1_3legs_2
* sync:               1
* type:               raid1
* -m |-i value:       3
* leg devices:        /dev/sdf1 /dev/sda1 /dev/sdb1 /dev/sdd1
* spanned legs:        0
* failpv(s):          /dev/sdb1 /dev/sdd1
* additional snap:    /dev/sdf1
* failnode(s):        host-119.virt.lab.msp.redhat.com
* lvmetad:            0
* raid fault policy:  allocate
******************************************************
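
For reference, the "allocate" fault policy exercised here is the dmeventd repair policy set in lvm.conf (a sketch; the surrounding comment text varies between lvm2 releases):

activation {
    # "allocate" makes dmeventd attempt to replace a failed raid image
    # using free space on another PV in the VG; "warn" would only log
    # the failure and leave the repair to the administrator.
    raid_fault_policy = "allocate"
}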

Creating raids(s) on host-119.virt.lab.msp.redhat.com...
host-119.virt.lab.msp.redhat.com: lvcreate --type raid1 -m 3 -n synced_multiple_raid1_3legs_1 -L 500M black_bird /dev/sdf1:0-2400 /dev/sda1:0-2400 /dev/sdb1:0-2400 /dev/sdd1:0-2400
host-119.virt.lab.msp.redhat.com: lvcreate --type raid1 -m 3 -n synced_multiple_raid1_3legs_2 -L 500M black_bird /dev/sdf1:0-2400 /dev/sda1:0-2400 /dev/sdb1:0-2400 /dev/sdd1:0-2400

Current mirror/raid device structure(s):
  LV                                       Attr       LSize   Cpy%Sync Devices
   synced_multiple_raid1_3legs_1            rwi-a-r--- 500.00m 5.60     synced_multiple_raid1_3legs_1_rimage_0(0),synced_multiple_raid1_3legs_1_rimage_1(0),synced_multiple_raid1_3legs_1_rimage_2(0),synced_multiple_raid1_3legs_1_rimage_3(0)
   [synced_multiple_raid1_3legs_1_rimage_0] Iwi-aor--- 500.00m          /dev/sdf1(1)
   [synced_multiple_raid1_3legs_1_rimage_1] Iwi-aor--- 500.00m          /dev/sda1(1)
   [synced_multiple_raid1_3legs_1_rimage_2] Iwi-aor--- 500.00m          /dev/sdb1(1)
   [synced_multiple_raid1_3legs_1_rimage_3] Iwi-aor--- 500.00m          /dev/sdd1(1)
   [synced_multiple_raid1_3legs_1_rmeta_0]  ewi-aor---   4.00m          /dev/sdf1(0)
   [synced_multiple_raid1_3legs_1_rmeta_1]  ewi-aor---   4.00m          /dev/sda1(0)
   [synced_multiple_raid1_3legs_1_rmeta_2]  ewi-aor---   4.00m          /dev/sdb1(0)
   [synced_multiple_raid1_3legs_1_rmeta_3]  ewi-aor---   4.00m          /dev/sdd1(0)
   synced_multiple_raid1_3legs_2            rwi-a-r--- 500.00m 0.00     synced_multiple_raid1_3legs_2_rimage_0(0),synced_multiple_raid1_3legs_2_rimage_1(0),synced_multiple_raid1_3legs_2_rimage_2(0),synced_multiple_raid1_3legs_2_rimage_3(0)
   [synced_multiple_raid1_3legs_2_rimage_0] Iwi-aor--- 500.00m          /dev/sdf1(127)
   [synced_multiple_raid1_3legs_2_rimage_1] Iwi-aor--- 500.00m          /dev/sda1(127)
   [synced_multiple_raid1_3legs_2_rimage_2] Iwi-aor--- 500.00m          /dev/sdb1(127)
   [synced_multiple_raid1_3legs_2_rimage_3] Iwi-aor--- 500.00m          /dev/sdd1(127)
   [synced_multiple_raid1_3legs_2_rmeta_0]  ewi-aor---   4.00m          /dev/sdf1(126)
   [synced_multiple_raid1_3legs_2_rmeta_1]  ewi-aor---   4.00m          /dev/sda1(126)
   [synced_multiple_raid1_3legs_2_rmeta_2]  ewi-aor---   4.00m          /dev/sdb1(126)
   [synced_multiple_raid1_3legs_2_rmeta_3]  ewi-aor---   4.00m          /dev/sdd1(126)
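
(The device structure tables in this report correspond to an lvs invocation along the following lines; the exact field list used by the harness is an assumption:)

lvs -a -o lv_name,lv_attr,lv_size,copy_percent,devices black_bird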


Waiting until all mirror|raid volumes become fully synced...
   2/2 mirror(s) are fully synced: ( 100.00% 100.00% )

Creating ext on top of mirror(s) on host-119.virt.lab.msp.redhat.com...
mke2fs 1.41.12 (17-May-2010)
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on host-119.virt.lab.msp.redhat.com...

PV=/dev/sdd1
        synced_multiple_raid1_3legs_1_rimage_3: 1.0
        synced_multiple_raid1_3legs_1_rmeta_3: 1.0
        synced_multiple_raid1_3legs_2_rimage_3: 1.0
        synced_multiple_raid1_3legs_2_rmeta_3: 1.0
PV=/dev/sdb1
        synced_multiple_raid1_3legs_1_rimage_2: 1.0
        synced_multiple_raid1_3legs_1_rmeta_2: 1.0
        synced_multiple_raid1_3legs_2_rimage_2: 1.0
        synced_multiple_raid1_3legs_2_rmeta_2: 1.0

Creating a snapshot volume of each of the raids
Writing verification files (checkit) to mirror(s) on...
        ---- host-119.virt.lab.msp.redhat.com ----

Sleeping 15 seconds to get some outstanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- host-119.virt.lab.msp.redhat.com ----


Disabling device sdb on host-119.virt.lab.msp.redhat.com
Disabling device sdd on host-119.virt.lab.msp.redhat.com

Getting recovery check start time from /var/log/messages: Feb 25 19:00
Attempting I/O to cause mirror down conversion(s) on host-119.virt.lab.msp.redhat.com
dd if=/dev/zero of=/mnt/synced_multiple_raid1_3legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.861715 s, 48.7 MB/s
dd if=/dev/zero of=/mnt/synced_multiple_raid1_3legs_2/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.987503 s, 42.5 MB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  Couldn't find device with uuid dyxBNG-1zlk-dJzA-glCj-wqLS-KSxV-Xi25X3.
  Couldn't find device with uuid HozHAI-oyOi-OMf1-qSL2-660k-xb9m-kL70no.
  LV                                       Attr       LSize   Cpy%Sync Devices
   bb_snap1                                 swi-a-s--- 252.00m          /dev/sdf1(252)
   bb_snap2                                 swi-a-s--- 252.00m          /dev/sdf1(315)
   synced_multiple_raid1_3legs_1            owi-aor--- 500.00m 100.00   synced_multiple_raid1_3legs_1_rimage_0(0),synced_multiple_raid1_3legs_1_rimage_1(0),synced_multiple_raid1_3legs_1_rimage_2(0),synced_multiple_raid1_3legs_1_rimage_3(0)
   [synced_multiple_raid1_3legs_1_rimage_0] iwi-aor--- 500.00m          /dev/sdf1(1)
   [synced_multiple_raid1_3legs_1_rimage_1] iwi-aor--- 500.00m          /dev/sda1(1)
   [synced_multiple_raid1_3legs_1_rimage_2] iwi-aor--- 500.00m          /dev/sdg1(127)
   [synced_multiple_raid1_3legs_1_rimage_3] iwi-aor--- 500.00m          /dev/sdh1(1)
   [synced_multiple_raid1_3legs_1_rmeta_0]  ewi-aor---   4.00m          /dev/sdf1(0)
   [synced_multiple_raid1_3legs_1_rmeta_1]  ewi-aor---   4.00m          /dev/sda1(0)
   [synced_multiple_raid1_3legs_1_rmeta_2]  ewi-aor---   4.00m          /dev/sdg1(126)
   [synced_multiple_raid1_3legs_1_rmeta_3]  ewi-aor---   4.00m          /dev/sdh1(0)
   synced_multiple_raid1_3legs_2            owi-aor--- 500.00m 100.00   synced_multiple_raid1_3legs_2_rimage_0(0),synced_multiple_raid1_3legs_2_rimage_1(0),synced_multiple_raid1_3legs_2_rimage_2(0),synced_multiple_raid1_3legs_2_rimage_3(0)
   [synced_multiple_raid1_3legs_2_rimage_0] iwi-aor--- 500.00m          /dev/sdf1(127)
   [synced_multiple_raid1_3legs_2_rimage_1] iwi-aor--- 500.00m          /dev/sda1(127)
   [synced_multiple_raid1_3legs_2_rimage_2] iwi-aor--- 500.00m          /dev/sdg1(1)
   [synced_multiple_raid1_3legs_2_rimage_3] iwi-aor--- 500.00m          /dev/sdh1(127)
   [synced_multiple_raid1_3legs_2_rmeta_0]  ewi-aor---   4.00m          /dev/sdf1(126)
   [synced_multiple_raid1_3legs_2_rmeta_1]  ewi-aor---   4.00m          /dev/sda1(126)
   [synced_multiple_raid1_3legs_2_rmeta_2]  ewi-aor---   4.00m          /dev/sdg1(0)
   [synced_multiple_raid1_3legs_2_rmeta_3]  ewi-aor---   4.00m          /dev/sdh1(126)


Verifying FAILED device /dev/sdb1 is *NOT* in the volume(s)
Verifying FAILED device /dev/sdd1 is *NOT* in the volume(s)
Verifying IMAGE device /dev/sdf1 *IS* in the volume(s)
Verifying IMAGE device /dev/sda1 *IS* in the volume(s)
Verify the rimage/rmeta dm devices remain after the failures

Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_1_rimage_3 on: host-119.virt.lab.msp.redhat.com 
Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_1_rmeta_3 on: host-119.virt.lab.msp.redhat.com 
Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_2_rimage_3 on: host-119.virt.lab.msp.redhat.com 
Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_2_rmeta_3 on: host-119.virt.lab.msp.redhat.com 
Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_1_rimage_2 on: host-119.virt.lab.msp.redhat.com 
Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_1_rmeta_2 on: host-119.virt.lab.msp.redhat.com 
Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_2_rimage_2 on: host-119.virt.lab.msp.redhat.com 
Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_2_rmeta_2 on: host-119.virt.lab.msp.redhat.com 

Verify the raid image order is what's expected based on raid fault policy
EXPECTED LEG ORDER: /dev/sdf1 /dev/sda1 unknown unknown
ACTUAL LEG ORDER: /dev/sdf1 /dev/sda1 /dev/sdg1 /dev/sdh1
/dev/sdf1 ne /dev/sdf1
/dev/sda1 ne /dev/sda1
unknown ne /dev/sdg1
unknown ne /dev/sdh1
ACTUAL LEG ORDER: /dev/sdf1 /dev/sda1 /dev/sdg1 /dev/sdh1
/dev/sdf1 ne /dev/sdf1
/dev/sda1 ne /dev/sda1
unknown ne /dev/sdg1
unknown ne /dev/sdh1
Verifying files (checkit) on mirror(s) on...
        ---- host-119.virt.lab.msp.redhat.com ----

Enabling device sdb on host-119.virt.lab.msp.redhat.com
Running vgs to make LVM update metadata version if possible (will restore a-m PVs)
  Couldn't find device with uuid HozHAI-oyOi-OMf1-qSL2-660k-xb9m-kL70no.
Enabling device sdd on host-119.virt.lab.msp.redhat.com
Running vgs to make LVM update metadata version if possible (will restore a-m PVs)
  WARNING: Inconsistent metadata found for VG black_bird - updating to use version 682
  Missing device /dev/sdb1 reappeared, updating metadata for VG black_bird to version 682.
  Missing device /dev/sdd1 reappeared, updating metadata for VG black_bird to version 682.


Verify that each of the raid repairs finished successfully

Checking for leftover '-missing_0_0' or 'unknown devices'
There should not be any 'missing' dm devices for full allocation scenarios on host-119.virt.lab.msp.redhat.com.


[root@host-119 ~]# dmsetup ls | grep black_bird
black_bird-synced_multiple_raid1_3legs_2_rimage_3-missing_0_0   (253:30)
black_bird-synced_multiple_raid1_3legs_2_rmeta_3        (253:6)
black_bird-synced_multiple_raid1_3legs_2_rmeta_2        (253:26)
black_bird-synced_multiple_raid1_3legs_2_rimage_3       (253:7)
black_bird-synced_multiple_raid1_3legs_2_rmeta_1        (253:13)
black_bird-synced_multiple_raid1_3legs_2_rimage_2       (253:27)
black_bird-synced_multiple_raid1_3legs_1_rimage_3       (253:29)
black_bird-synced_multiple_raid1_3legs_2_rmeta_0        (253:11)
black_bird-bb_snap2     (253:25)
black_bird-synced_multiple_raid1_3legs_2_rimage_1       (253:14)
black_bird-synced_multiple_raid1_3legs_1_rimage_2       (253:16)
black_bird-synced_multiple_raid1_3legs_2_rmeta_3-missing_0_0    (253:31)
black_bird-bb_snap1     (253:22)
black_bird-synced_multiple_raid1_3legs_2_rimage_0       (253:12)
black_bird-synced_multiple_raid1_3legs_1_rimage_1       (253:5)
black_bird-synced_multiple_raid1_3legs_1_rimage_0       (253:3)
black_bird-bb_snap1-cow (253:21)
black_bird-synced_multiple_raid1_3legs_2        (253:19)
black_bird-synced_multiple_raid1_3legs_1_rmeta_3        (253:28)
black_bird-synced_multiple_raid1_3legs_1        (253:10)
black_bird-synced_multiple_raid1_3legs_1_rmeta_2        (253:15)
black_bird-bb_snap2-cow (253:24)
black_bird-synced_multiple_raid1_3legs_1-real   (253:20)
black_bird-synced_multiple_raid1_3legs_1_rmeta_1        (253:4)
black_bird-synced_multiple_raid1_3legs_2-real   (253:23)
black_bird-synced_multiple_raid1_3legs_1_rmeta_0        (253:2)
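
A possible manual workaround, shown only as an untested sketch: deactivating and reactivating the affected LV should make LVM tear down and rebuild its dm subdevice tree, after which any stale -missing_0_0 mappings left behind (open count 0) could be removed directly:

# Deactivate/reactivate so LVM rebuilds the LV's dm device tree
lvchange -an black_bird/synced_multiple_raid1_3legs_2
lvchange -ay black_bird/synced_multiple_raid1_3legs_2

# Remove any stale mappings that survive reactivation
dmsetup remove black_bird-synced_multiple_raid1_3legs_2_rimage_3-missing_0_0
dmsetup remove black_bird-synced_multiple_raid1_3legs_2_rmeta_3-missing_0_0

The RFE, of course, is for lvm/dmeventd to perform this cleanup automatically once the allocation succeeds.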




Feb 25 19:00:15 host-119 lvm[4863]: WARNING: Inconsistent metadata found for VG black_bird - updating to use version 675
Feb 25 19:00:15 host-119 lvm[4863]: WARNING: Failed to write an MDA of VG black_bird.
Feb 25 19:00:15 host-119 lvm[4863]: WARNING: Failed to write an MDA of VG black_bird.
Feb 25 19:00:15 host-119 kernel: md: super_written gets error=-5, uptodate=0
Feb 25 19:00:15 host-119 kernel: md/raid1:mdX: Disk failure on dm-18, disabling device.
Feb 25 19:00:15 host-119 kernel: md/raid1:mdX: Operation continuing on 2 devices.
Feb 25 19:00:15 host-119 kernel: device-mapper: raid: Failed to read superblock of device at position 3
Feb 25 19:00:15 host-119 kernel: device-mapper: raid: Device 2 specified for rebuild: Clearing superblock
Feb 25 19:00:15 host-119 kernel: md: super_written gets error=-5, uptodate=0
Feb 25 19:00:15 host-119 kernel: md/raid1:mdX: Disk failure on dm-9, disabling device.
Feb 25 19:00:15 host-119 kernel: md/raid1:mdX: Operation continuing on 3 devices.
Feb 25 19:00:15 host-119 lvm[4863]: Device #3 of raid1 array, black_bird-synced_multiple_raid1_3legs_1-real, has failed.
Feb 25 19:00:15 host-119 kernel: md: super_written gets error=-5, uptodate=0
Feb 25 19:00:15 host-119 kernel: md/raid1:mdX: Disk failure on dm-7, disabling device.
Feb 25 19:00:15 host-119 kernel: md/raid1:mdX: Operation continuing on 2 devices.
Feb 25 19:00:15 host-119 kernel: md/raid1:mdX: active with 2 out of 4 mirrors
Feb 25 19:00:15 host-119 kernel: created bitmap (1 pages) for device mdX
Feb 25 19:00:15 host-119 lvm[4863]: Internal error: Performing unsafe table load while 9 device(s) are known to be suspended:  (253:30)
Feb 25 19:00:15 host-119 lvm[4863]: Internal error: Performing unsafe table load while 9 device(s) are known to be suspended:  (253:18)
Feb 25 19:00:15 host-119 lvm[4863]: Internal error: Performing unsafe table load while 9 device(s) are known to be suspended:  (253:31)
Feb 25 19:00:15 host-119 lvm[4863]: Internal error: Performing unsafe table load while 9 device(s) are known to be suspended:  (253:17)
Feb 25 19:00:15 host-119 kernel: mdX: bitmap initialized from disk: read 1 pages, set 65 of 1000 bits
Feb 25 19:00:15 host-119 kernel: md: recovery of RAID array mdX
Feb 25 19:00:15 host-119 kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Feb 25 19:00:15 host-119 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Feb 25 19:00:15 host-119 kernel: md: using 128k window, over a total of 512000k.
Feb 25 19:00:15 host-119 lvm[4863]: WARNING: Failed to write an MDA of VG black_bird.
Feb 25 19:00:16 host-119 kernel: device-mapper: raid: Failed to read superblock of device at position 3
Feb 25 19:00:16 host-119 kernel: md/raid1:mdX: active with 2 out of 4 mirrors
Feb 25 19:00:16 host-119 kernel: created bitmap (1 pages) for device mdX
Feb 25 19:00:16 host-119 kernel: md: mdX: recovery interrupted.
Feb 25 19:00:16 host-119 kernel: mdX: bitmap initialized from disk: read 1 pages, set 65 of 1000 bits
Feb 25 19:00:16 host-119 lvm[4863]: Faulty devices in black_bird/synced_multiple_raid1_3legs_2 successfully replaced.
Feb 25 19:00:17 host-119 lvm[4863]: Faulty devices in black_bird/synced_multiple_raid1_3legs_1 successfully replaced.

Feb 25 19:00:22 host-119 qarshd[14186]: Running cmdline: dd if=/dev/zero of=/mnt/synced_multiple_raid1_3legs_1/ddfile count=10 bs=4M
Feb 25 19:00:23 host-119 qarshd[14188]: Running cmdline: dd if=/dev/zero of=/mnt/synced_multiple_raid1_3legs_2/ddfile count=10 bs=4M
Feb 25 19:00:25 host-119 qarshd[14190]: Running cmdline: sync
Feb 25 19:00:32 host-119 kernel: md: mdX: recovery done.
Feb 25 19:00:32 host-119 lvm[4863]: Device #3 of raid1 array, black_bird-synced_multiple_raid1_3legs_1-real, has failed.
Feb 25 19:00:32 host-119 kernel: md: mdX: recovery done.
Feb 25 19:00:32 host-119 lvm[4863]: Device #2 of raid1 array, black_bird-synced_multiple_raid1_3legs_2-real, has failed.

Feb 25 19:00:32 host-119 lvm[4863]: Faulty devices in black_bird/synced_multiple_raid1_3legs_1 successfully replaced.
Feb 25 19:00:33 host-119 lvm[4863]: Faulty devices in black_bird/synced_multiple_raid1_3legs_2 successfully replaced.

Feb 25 19:00:33 host-119 lvm[4863]: Device #2 of raid1 array, black_bird-synced_multiple_raid1_3legs_2-real, has failed.
Feb 25 19:00:33 host-119 lvm[4863]: WARNING: black_bird/synced_multiple_raid1_3legs_2 is not in-sync.
Feb 25 19:00:33 host-119 lvm[4863]: WARNING: Portions of the array may be unrecoverable.
Feb 25 19:00:33 host-119 lvm[4863]: Faulty devices in black_bird/synced_multiple_raid1_3legs_2 successfully replaced.

Feb 25 19:00:37 host-119 qarshd[14284]: Running cmdline: lvs > /dev/null 2>&1
Feb 25 19:00:38 host-119 lvm[4863]: Device #2 of raid1 array, black_bird-synced_multiple_raid1_3legs_2-real, has failed.
Feb 25 19:00:38 host-119 lvm[4863]: Faulty devices in black_bird/synced_multiple_raid1_3legs_2 successfully replaced.



Version-Release number of selected component (if applicable):
2.6.32-616.el6.x86_64
lvm2-2.02.143-1.el6    BUILT: Wed Feb 24 07:59:50 CST 2016
lvm2-libs-2.02.143-1.el6    BUILT: Wed Feb 24 07:59:50 CST 2016
lvm2-cluster-2.02.143-1.el6    BUILT: Wed Feb 24 07:59:50 CST 2016
udev-147-2.71.el6    BUILT: Wed Feb 10 07:07:17 CST 2016
device-mapper-1.02.117-1.el6    BUILT: Wed Feb 24 07:59:50 CST 2016
device-mapper-libs-1.02.117-1.el6    BUILT: Wed Feb 24 07:59:50 CST 2016
device-mapper-event-1.02.117-1.el6    BUILT: Wed Feb 24 07:59:50 CST 2016
device-mapper-event-libs-1.02.117-1.el6    BUILT: Wed Feb 24 07:59:50 CST 2016
device-mapper-persistent-data-0.6.2-0.1.rc5.el6    BUILT: Wed Feb 24 07:07:09 CST 2016
cmirror-2.02.143-1.el6    BUILT: Wed Feb 24 07:59:50 CST 2016
