Bug 1459584 - synced raid10 multiple image repair attempt fails
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assigned To: Heinz Mauelshagen
QA Contact: cluster-qe@redhat.com
Keywords: TestOnly
 
Reported: 2017-06-07 10:01 EDT by Corey Marthaler
Modified: 2018-03-22 11:19 EDT

Last Closed: 2018-03-22 11:19:52 EDT
Type: Bug


Attachments:
verbose lvconvert --repair attempt (160.78 KB, text/plain)
2017-06-07 10:03 EDT, Corey Marthaler

Description Corey Marthaler 2017-06-07 10:01:48 EDT
Description of problem:
In this scenario, every other device fails, i.e. the entire primary side of the mirror. This is the test case used to reproduce/verify bug 1434054, and it can also potentially trigger bug 1281922.
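
In lvm's raid10 layout the images pair up per stripe (rimage_0/rimage_1 hold the two copies of stripe 0, and so on), so failing /dev/sdg1, /dev/sda1, and /dev/sde1 removes exactly one copy of every stripe while the volume stays readable. Schematically (pairing per the standard near-2 raid10 layout; the failed legs are listed in the scenario info below):

  stripe 0:  rimage_0 (/dev/sdg1, failed)  <-->  rimage_1 (/dev/sdc1)
  stripe 1:  rimage_2 (/dev/sda1, failed)  <-->  rimage_3 (/dev/sdb1)
  stripe 2:  rimage_4 (/dev/sde1, failed)  <-->  rimage_5 (/dev/sdh1)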


  WARNING: Not using lvmetad because a repair command was run.

creating lvm devices...
host-127: pvcreate /dev/sde1 /dev/sda1 /dev/sdb1 /dev/sdh1 /dev/sdg1 /dev/sdd1 /dev/sdc1
host-127: vgcreate   black_bird /dev/sde1 /dev/sda1 /dev/sdb1 /dev/sdh1 /dev/sdg1 /dev/sdd1 /dev/sdc1

================================================================================
Iteration 0.1 started at Tue Jun  6 17:25:04 CDT 2017
================================================================================
Scenario kill_three_synced_raid10_3legs: Kill three legs (none of which share the same stripe leg) of synced 3 leg raid10 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_three_raid10_3legs_1
* sync:               1
* type:               raid10
* -m |-i value:       3
* leg devices:        /dev/sdg1 /dev/sdc1 /dev/sda1 /dev/sdb1 /dev/sde1 /dev/sdh1
* spanned legs:       0
* manual repair:      0
* no MDA devices:     
* failpv(s):          /dev/sdg1 /dev/sda1 /dev/sde1
* failnode(s):        host-127
* lvmetad:            1
* raid fault policy:  warn
******************************************************
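
Because the raid fault policy here is "warn", dmeventd only logs leg failures; nothing is replaced automatically, and the repair below has to be issued by hand. A minimal sketch of the corresponding lvm.conf setting (key and values as documented in lvm.conf(5)):

  activation {
      # "warn": only log raid leg failures; repair manually with
      #         'lvconvert --repair <vg>/<lv>'.
      # "allocate": dmeventd automatically tries to replace failed images
      #             using free extents elsewhere in the volume group.
      raid_fault_policy = "warn"
  }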

Creating raid(s) on host-127...
host-127: lvcreate  --type raid10 -i 3 -n synced_three_raid10_3legs_1 -L 500M black_bird /dev/sdg1:0-2400 /dev/sdc1:0-2400 /dev/sda1:0-2400 /dev/sdb1:0-2400 /dev/sde1:0-2400 /dev/sdh1:0-2400

Current mirror/raid device structure(s):
  LV                                     Attr       LSize   Cpy%Sync Devices
   synced_three_raid10_3legs_1            rwi-a-r--- 504.00m 100.00  synced_three_raid10_3legs_1_rimage_0(0),synced_three_raid10_3legs_1_rimage_1(0),synced_three_raid10_3legs_1_rimage_2(0),synced_three_raid10_3legs_1_rimage_3(0),synced_three_raid10_3legs_1_rimage_4(0),synced_three_raid10_3legs_1_rimage_5(0)
   [synced_three_raid10_3legs_1_rimage_0] iwi-aor--- 168.00m         /dev/sdg1(1)
   [synced_three_raid10_3legs_1_rimage_1] iwi-aor--- 168.00m         /dev/sdc1(1)
   [synced_three_raid10_3legs_1_rimage_2] iwi-aor--- 168.00m         /dev/sda1(1)
   [synced_three_raid10_3legs_1_rimage_3] iwi-aor--- 168.00m         /dev/sdb1(1)
   [synced_three_raid10_3legs_1_rimage_4] iwi-aor--- 168.00m         /dev/sde1(1)
   [synced_three_raid10_3legs_1_rimage_5] iwi-aor--- 168.00m         /dev/sdh1(1)
   [synced_three_raid10_3legs_1_rmeta_0]  ewi-aor---   4.00m         /dev/sdg1(0)
   [synced_three_raid10_3legs_1_rmeta_1]  ewi-aor---   4.00m         /dev/sdc1(0)
   [synced_three_raid10_3legs_1_rmeta_2]  ewi-aor---   4.00m         /dev/sda1(0)
   [synced_three_raid10_3legs_1_rmeta_3]  ewi-aor---   4.00m         /dev/sdb1(0)
   [synced_three_raid10_3legs_1_rmeta_4]  ewi-aor---   4.00m         /dev/sde1(0)
   [synced_three_raid10_3legs_1_rmeta_5]  ewi-aor---   4.00m         /dev/sdh1(0)

* NOTE: not enough available devices for allocation fault policies to fully work *
(well, technically, since we have one available device, some allocation should work)

Waiting until all mirror|raid volumes become fully synced...
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
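
Sync progress can be polled via the copy_percent reporting field; a minimal illustrative loop (the harness's own polling code is not shown in this log):

  # Block until the raid LV reports 100% synced.
  while [ "$(lvs --noheadings -o copy_percent black_bird/synced_three_raid10_3legs_1 | tr -d ' ')" != "100.00" ]; do
      sleep 5
  done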

Creating xfs on top of mirror(s) on host-127...
Mounting mirrored xfs filesystems on host-127...
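
These filesystem steps are the standard ones; a sketch (the mount point matches the dd path used later in this log):

  mkfs.xfs /dev/black_bird/synced_three_raid10_3legs_1
  mkdir -p /mnt/synced_three_raid10_3legs_1
  mount /dev/black_bird/synced_three_raid10_3legs_1 /mnt/synced_three_raid10_3legs_1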

PV=/dev/sda1
        synced_three_raid10_3legs_1_rimage_2: 2
        synced_three_raid10_3legs_1_rmeta_2: 2
PV=/dev/sdg1
        synced_three_raid10_3legs_1_rimage_0: 2
        synced_three_raid10_3legs_1_rmeta_0: 2
PV=/dev/sde1
        synced_three_raid10_3legs_1_rimage_4: 2
        synced_three_raid10_3legs_1_rmeta_4: 2

Writing verification files (checkit) to mirror(s) on...
        ---- host-127 ----

<start name="host-127_synced_three_raid10_3legs_1"  pid="19280" time="Tue Jun  6 17:25:50 2017 -0500" type="cmd" />
Sleeping 15 seconds to get some outstanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- host-127 ----

Disabling device sdg on host-127... rescanning devices
Disabling device sda on host-127... rescanning devices
Disabling device sde on host-127... rescanning devices
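
The harness's exact disable mechanism isn't shown in this log; one common way to inject this kind of failure on SCSI test devices is to drop them at the SCSI layer via sysfs (an assumption about the method, not confirmed by the log):

  # Assumed failure-injection method: delete the disks from the SCSI layer,
  # then rescan so LVM notices the missing PVs.
  echo 1 > /sys/block/sdg/device/delete
  echo 1 > /sys/block/sda/device/delete
  echo 1 > /sys/block/sde/device/delete
  pvscan --cache    # refresh lvmetad's view of the devices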

Getting recovery check start time from /var/log/messages: Jun  6 17:26
Attempting I/O to cause mirror down conversion(s) on host-127
dd if=/dev/zero of=/mnt/synced_three_raid10_3legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.0648834 s, 646 MB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  LV                                     Attr       LSize   Cpy%Sync Devices
  synced_three_raid10_3legs_1            rwi-aor-p- 504.00m 100.00  synced_three_raid10_3legs_1_rimage_0(0),synced_three_raid10_3legs_1_rimage_1(0),synced_three_raid10_3legs_1_rimage_2(0),synced_three_raid10_3legs_1_rimage_3(0),synced_three_raid10_3legs_1_rimage_4(0),synced_three_raid10_3legs_1_rimage_5(0)
  [synced_three_raid10_3legs_1_rimage_0] Iwi-aor-p- 168.00m         [unknown](1)
  [synced_three_raid10_3legs_1_rimage_1] iwi-aor--- 168.00m         /dev/sdc1(1)
  [synced_three_raid10_3legs_1_rimage_2] Iwi-aor-p- 168.00m         [unknown](1)
  [synced_three_raid10_3legs_1_rimage_3] iwi-aor--- 168.00m         /dev/sdb1(1)
  [synced_three_raid10_3legs_1_rimage_4] Iwi-aor-p- 168.00m         [unknown](1)
  [synced_three_raid10_3legs_1_rimage_5] iwi-aor--- 168.00m         /dev/sdh1(1)
  [synced_three_raid10_3legs_1_rmeta_0]  ewi-aor-p-   4.00m         [unknown](0)
  [synced_three_raid10_3legs_1_rmeta_1]  ewi-aor---   4.00m         /dev/sdc1(0)
  [synced_three_raid10_3legs_1_rmeta_2]  ewi-aor-p-   4.00m         [unknown](0)
  [synced_three_raid10_3legs_1_rmeta_3]  ewi-aor---   4.00m         /dev/sdb1(0)
  [synced_three_raid10_3legs_1_rmeta_4]  ewi-aor-p-   4.00m         [unknown](0)
  [synced_three_raid10_3legs_1_rmeta_5]  ewi-aor---   4.00m         /dev/sdh1(0)


Verifying FAILED device /dev/sdg1 is *NOT* in the volume(s)
Verifying FAILED device /dev/sda1 is *NOT* in the volume(s)
Verifying FAILED device /dev/sde1 is *NOT* in the volume(s)
Verifying IMAGE device /dev/sdc1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdb1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdh1 *IS* in the volume(s)
Verify the rimage/rmeta dm devices remain after the failures
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rimage_2 on: host-127 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rmeta_2 on: host-127 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rimage_0 on: host-127 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rmeta_0 on: host-127 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rimage_4 on: host-127 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rmeta_4 on: host-127 
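
These existence/state checks can be reproduced with dmsetup; an illustrative pair of commands (not the harness's actual code):

  # The sub-LV's dm device should still exist even though its PV is gone, and
  # the top-level raid status line shows per-leg health (A = alive/in-sync, D = dead).
  dmsetup info -c black_bird-synced_three_raid10_3legs_1_rimage_2
  dmsetup status black_bird-synced_three_raid10_3legs_1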

Verify the raid image order is what's expected based on raid fault policy
EXPECTED LEG ORDER: unknown /dev/sdc1 unknown /dev/sdb1 unknown /dev/sdh1
ACTUAL LEG ORDER: [unknown] /dev/sdc1 [unknown] /dev/sdb1 [unknown] /dev/sdh1

Fault policy is warn... Manually repairing failed raid volumes
host-127: 'lvconvert --yes --repair black_bird/synced_three_raid10_3legs_1'
  WARNING: Disabling lvmetad cache for repair command.
  WARNING: Not using lvmetad because of repair.
  /dev/sda1: read failed after 0 of 1024 at 22545367040: Input/output error
  [...]
  WARNING: Couldn't find all devices for LV black_bird/synced_three_raid10_3legs_1_rimage_0 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV black_bird/synced_three_raid10_3legs_1_rmeta_0 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV black_bird/synced_three_raid10_3legs_1_rimage_2 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV black_bird/synced_three_raid10_3legs_1_rmeta_2 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV black_bird/synced_three_raid10_3legs_1_rimage_4 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV black_bird/synced_three_raid10_3legs_1_rmeta_4 while checking used and assumed devices.
  Insufficient suitable allocatable extents for logical volume : 129 more required
  Failed to replace 2 devices.  Attempting to replace 3 instead.
  Insufficient suitable allocatable extents for logical volume : 86 more required
  Failed to replace 1 devices.  Attempting to replace 2 instead.
  device-mapper: create ioctl on black_bird-synced_three_raid10_3legs_1_rimage_2-missing_0_0 LVM-g5jrM8UcbNT52T604O15eeXWLefQDY3AjR7pU57QtwL83sVvbelabWiK4alZmsUR-missing_0_0 failed: Device or resource busy
  Failed to lock logical volume black_bird/synced_three_raid10_3legs_1.
  Failed to replace faulty devices in black_bird/synced_three_raid10_3legs_1.
lvconvert repair failed for black_bird/synced_three_raid10_3legs_1 on host-127


Version-Release number of selected component (if applicable):
3.10.0-666.el7.x86_64

lvm2-2.02.171-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
lvm2-libs-2.02.171-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
lvm2-cluster-2.02.171-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
device-mapper-1.02.140-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
device-mapper-libs-1.02.140-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
device-mapper-event-1.02.140-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
device-mapper-event-libs-1.02.140-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017


How reproducible:
Often
Comment 2 Corey Marthaler 2017-06-07 10:03 EDT
Created attachment 1285808
verbose lvconvert --repair attempt
Comment 3 Heinz Mauelshagen 2018-03-14 10:58:34 EDT
Corey,
is this still failing for you?
Works here with 2.02.177...

[root@rhel-7-5 ~]# lvm version
  LVM version:     2.02.177(2)-RHEL7 (2018-01-22)
  Library version: 1.02.146-RHEL7 (2018-01-22)
  Driver version:  4.37.0
  Configuration:   ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm --with-default-pid-dir=/run --with-default-locking-dir=/run/lock/lvm --with-usrlibdir=/usr/lib64 --enable-lvm1_fallback --enable-fsadm --with-pool=internal --enable-write_install --with-user= --with-group= --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --enable-pkgconfig --enable-applib --enable-cmdlib --enable-dmeventd --enable-blkid_wiping --enable-python2-bindings --with-cluster=internal --with-clvmd=corosync --enable-cmirrord --with-udevdir=/usr/lib/udev/rules.d --enable-udev_sync --with-thin=internal --enable-lvmetad --with-cache=internal --enable-lvmpolld --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-dmfilemapd

[root@rhel-7-5 ~]# lvs -ao+devices
  /dev/sdaf: open failed: No such device or address
  /dev/sdag: open failed: No such device or address
  /dev/sdah: open failed: No such device or address
  LV           VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                                                                            
  r            nvm  rwi-a-r-r- 24.00m                                    100.00           r_rimage_0(0),r_rimage_1(0),r_rimage_2(0),r_rimage_3(0),r_rimage_4(0),r_rimage_5(0)
  [r_rimage_0] nvm  iwi-aor---  8.00m                                                     /dev/sda(1)                                                                        
  [r_rimage_1] nvm  Iwi-aor-r-  8.00m                                                     /dev/sdaf(1)                                                                       
  [r_rimage_2] nvm  iwi-aor---  8.00m                                                     /dev/sdab(1)                                                                       
  [r_rimage_3] nvm  Iwi-aor-r-  8.00m                                                     /dev/sdag(1)                                                                       
  [r_rimage_4] nvm  Iwi-aor-r-  8.00m                                                     /dev/sdah(1)                                                                       
  [r_rimage_5] nvm  iwi-aor---  8.00m                                                     /dev/sdae(1)                                                                       
  [r_rmeta_0]  nvm  ewi-aor---  4.00m                                                     /dev/sda(0)                                                                        
  [r_rmeta_1]  nvm  ewi-aor-r-  4.00m                                                     /dev/sdaf(0)                                                                       
  [r_rmeta_2]  nvm  ewi-aor---  4.00m                                                     /dev/sdab(0)                                                                       
  [r_rmeta_3]  nvm  ewi-aor-r-  4.00m                                                     /dev/sdag(0)                                                                       
  [r_rmeta_4]  nvm  ewi-aor-r-  4.00m                                                     /dev/sdah(0)                                                                       
  [r_rmeta_5]  nvm  ewi-aor---  4.00m                                                     /dev/sdae(0)                                                                       
  root         rhel -wi-ao---- 45.12g                                                     /dev/vda2(0)                                                                       
  swap         rhel -wi-ao----  3.88g                                                     /dev/vda2(11550)                                                                   
[root@rhel-7-5 ~]# lvconvert --repair -y nvm/r       
  WARNING: Disabling lvmetad cache for repair command.
  WARNING: Not using lvmetad because of repair.
  Couldn't find device with uuid Rm2mLI-w7Ge-EAw1-Dm3T-pIAw-eq6g-AQXbFE.
  Couldn't find device with uuid dRPPpW-2g4O-7Hjd-fgDf-LwNa-bxVV-MmzP7s.
  Couldn't find device with uuid 43CswJ-jF2b-CAiv-HL9w-fbvq-zwmM-M4JTOo.
  Faulty devices in nvm/r successfully replaced.
[root@rhel-7-5 ~]# lvs -ao+devices                 
  WARNING: Not using lvmetad because a repair command was run.
  Couldn't find device with uuid Rm2mLI-w7Ge-EAw1-Dm3T-pIAw-eq6g-AQXbFE.
  Couldn't find device with uuid dRPPpW-2g4O-7Hjd-fgDf-LwNa-bxVV-MmzP7s.
  Couldn't find device with uuid 43CswJ-jF2b-CAiv-HL9w-fbvq-zwmM-M4JTOo.
  LV           VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                                                                            
  r            nvm  rwi-a-r--- 24.00m                                    100.00           r_rimage_0(0),r_rimage_1(0),r_rimage_2(0),r_rimage_3(0),r_rimage_4(0),r_rimage_5(0)
  [r_rimage_0] nvm  iwi-aor---  8.00m                                                     /dev/sda(1)                                                                        
  [r_rimage_1] nvm  iwi-aor---  8.00m                                                     /dev/sdaa(1)                                                                       
  [r_rimage_2] nvm  iwi-aor---  8.00m                                                     /dev/sdab(1)                                                                       
  [r_rimage_3] nvm  iwi-aor---  8.00m                                                     /dev/sdac(1)                                                                       
  [r_rimage_4] nvm  iwi-aor---  8.00m                                                     /dev/sdad(1)                                                                       
  [r_rimage_5] nvm  iwi-aor---  8.00m                                                     /dev/sdae(1)                                                                       
  [r_rmeta_0]  nvm  ewi-aor---  4.00m                                                     /dev/sda(0)                                                                        
  [r_rmeta_1]  nvm  ewi-aor---  4.00m                                                     /dev/sdaa(0)                                                                       
  [r_rmeta_2]  nvm  ewi-aor---  4.00m                                                     /dev/sdab(0)                                                                       
  [r_rmeta_3]  nvm  ewi-aor---  4.00m                                                     /dev/sdac(0)                                                                       
  [r_rmeta_4]  nvm  ewi-aor---  4.00m                                                     /dev/sdad(0)                                                                       
  [r_rmeta_5]  nvm  ewi-aor---  4.00m                                                     /dev/sdae(0)                                                                       
  root         rhel -wi-ao---- 45.12g                                                     /dev/vda2(0)                                                                       
  swap         rhel -wi-ao----  3.88g                                                     /dev/vda2(11550)
Comment 4 Corey Marthaler 2018-03-22 11:19:52 EDT
I'm no longer able to reproduce this issue with the latest rpms. Closing.

3.10.0-860.el7.x86_64
lvm2-2.02.177-4.el7    BUILT: Fri Feb 16 13:22:31 CET 2018
lvm2-libs-2.02.177-4.el7    BUILT: Fri Feb 16 13:22:31 CET 2018
device-mapper-1.02.146-4.el7    BUILT: Fri Feb 16 13:22:31 CET 2018
device-mapper-libs-1.02.146-4.el7    BUILT: Fri Feb 16 13:22:31 CET 2018
device-mapper-event-1.02.146-4.el7    BUILT: Fri Feb 16 13:22:31 CET 2018
device-mapper-event-libs-1.02.146-4.el7    BUILT: Fri Feb 16 13:22:31 CET 2018
device-mapper-persistent-data-0.7.3-3.el7    BUILT: Tue Nov 14 12:07:18 CET 2017
