Bug 1459584 - synced raid10 multiple image repair attempt fails
Status: NEW
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
 
Reported: 2017-06-07 10:01 EDT by Corey Marthaler
Modified: 2017-08-17 14:31 EDT
CC List: 7 users

Type: Bug


Attachments
verbose lvconvert --repair attempt (160.78 KB, text/plain), added 2017-06-07 10:03 EDT by Corey Marthaler

Description Corey Marthaler 2017-06-07 10:01:48 EDT
Description of problem:
In this scenario every other device was failed, i.e. the entire primary side of the mirror. This is the test case used to reproduce and verify bug 1434054, and it can also potentially trigger bug 1281922.


  WARNING: Not using lvmetad because a repair command was run.

creating lvm devices...
host-127: pvcreate /dev/sde1 /dev/sda1 /dev/sdb1 /dev/sdh1 /dev/sdg1 /dev/sdd1 /dev/sdc1
host-127: vgcreate   black_bird /dev/sde1 /dev/sda1 /dev/sdb1 /dev/sdh1 /dev/sdg1 /dev/sdd1 /dev/sdc1

================================================================================
Iteration 0.1 started at Tue Jun  6 17:25:04 CDT 2017
================================================================================
Scenario kill_three_synced_raid10_3legs: Kill three legs (none of which share the same stripe leg) of synced 3 leg raid10 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_three_raid10_3legs_1
* sync:               1
* type:               raid10
* -m |-i value:       3
* leg devices:        /dev/sdg1 /dev/sdc1 /dev/sda1 /dev/sdb1 /dev/sde1 /dev/sdh1
* spanned legs:       0
* manual repair:      0
* no MDA devices:     
* failpv(s):          /dev/sdg1 /dev/sda1 /dev/sde1
* failnode(s):        host-127
* lvmetad:            1
* raid fault policy:  warn
******************************************************

Creating raids(s) on host-127...
host-127: lvcreate  --type raid10 -i 3 -n synced_three_raid10_3legs_1 -L 500M black_bird /dev/sdg1:0-2400 /dev/sdc1:0-2400 /dev/sda1:0-2400 /dev/sdb1:0-2400 /dev/sde1:0-2400 /dev/sdh1:0-2400
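The sizes reported in the structure listing follow from the lvcreate arguments. A minimal sketch of the arithmetic, assuming the default 4 MiB extent size (consistent with each rmeta sub-LV being 4.00m), two copies per stripe, and that LVM rounds the requested size up to a whole number of extents per stripe. The function name raid10_layout is hypothetical:

```python
# Sketch: reproduce the raid10 sizes reported by lvs for
#   lvcreate --type raid10 -i 3 -L 500M ...
# Assumptions (not stated in the report): 4 MiB extents, 2 copies per stripe,
# size rounded up to a multiple of the stripe count (in extents).
EXTENT_MIB = 4       # assumed default physical extent size
STRIPES = 3          # -i 3
COPIES = 2           # raid10: each stripe is mirrored once
REQUESTED_MIB = 500  # -L 500M


def raid10_layout(requested_mib, stripes, copies, extent_mib):
    """Return (lv_size_mib, image_size_mib, image_count)."""
    extents = -(-requested_mib // extent_mib)    # round up to whole extents
    extents = -(-extents // stripes) * stripes   # round up to a full stripe
    per_image = extents // stripes               # extents on each rimage
    return extents * extent_mib, per_image * extent_mib, stripes * copies


lv_mib, image_mib, images = raid10_layout(REQUESTED_MIB, STRIPES, COPIES, EXTENT_MIB)
print(f"LV {lv_mib}m, {images} images of {image_mib}m each")  # LV 504m, 6 images of 168m each
```

The result matches the 504.00m LV and the six 168.00m rimage sub-LVs shown by lvs.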

Current mirror/raid device structure(s):
  LV                                     Attr       LSize   Cpy%Sync Devices
   synced_three_raid10_3legs_1            rwi-a-r--- 504.00m 100.00  synced_three_raid10_3legs_1_rimage_0(0),synced_three_raid10_3legs_1_rimage_1(0),synced_three_raid10_3legs_1_rimage_2(0),synced_three_raid10_3legs_1_rimage_3(0),synced_three_raid10_3legs_1_rimage_4(0),synced_three_raid10_3legs_1_rimage_5(0)
   [synced_three_raid10_3legs_1_rimage_0] iwi-aor--- 168.00m         /dev/sdg1(1)
   [synced_three_raid10_3legs_1_rimage_1] iwi-aor--- 168.00m         /dev/sdc1(1)
   [synced_three_raid10_3legs_1_rimage_2] iwi-aor--- 168.00m         /dev/sda1(1)
   [synced_three_raid10_3legs_1_rimage_3] iwi-aor--- 168.00m         /dev/sdb1(1)
   [synced_three_raid10_3legs_1_rimage_4] iwi-aor--- 168.00m         /dev/sde1(1)
   [synced_three_raid10_3legs_1_rimage_5] iwi-aor--- 168.00m         /dev/sdh1(1)
   [synced_three_raid10_3legs_1_rmeta_0]  ewi-aor---   4.00m         /dev/sdg1(0)
   [synced_three_raid10_3legs_1_rmeta_1]  ewi-aor---   4.00m         /dev/sdc1(0)
   [synced_three_raid10_3legs_1_rmeta_2]  ewi-aor---   4.00m         /dev/sda1(0)
   [synced_three_raid10_3legs_1_rmeta_3]  ewi-aor---   4.00m         /dev/sdb1(0)
   [synced_three_raid10_3legs_1_rmeta_4]  ewi-aor---   4.00m         /dev/sde1(0)
   [synced_three_raid10_3legs_1_rmeta_5]  ewi-aor---   4.00m         /dev/sdh1(0)

* NOTE: not enough available devices for allocation fault polices to fully work *
(well technically, since we have 1, some allocation should work)
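The NOTE can be checked against the device lists above. A sketch, assuming that a replacement image may not be placed on a PV that already holds another image of the same LV. The device names are copied from the vgcreate, lvcreate and failpv(s) lines; spare_pvs is a hypothetical helper:

```python
# Sketch: which PVs in black_bird could host a replacement image?
# Assumption: a replacement may not share a PV with any image of this LV.
VG_PVS = ["/dev/sde1", "/dev/sda1", "/dev/sdb1", "/dev/sdh1",
          "/dev/sdg1", "/dev/sdd1", "/dev/sdc1"]
LEG_PVS = ["/dev/sdg1", "/dev/sdc1", "/dev/sda1",
           "/dev/sdb1", "/dev/sde1", "/dev/sdh1"]
FAILED_PVS = ["/dev/sdg1", "/dev/sda1", "/dev/sde1"]


def spare_pvs(vg_pvs, leg_pvs, failed_pvs):
    """Healthy PVs in the VG that hold no image of the LV."""
    used = set(leg_pvs) | set(failed_pvs)
    return [pv for pv in vg_pvs if pv not in used]


spares = spare_pvs(VG_PVS, LEG_PVS, FAILED_PVS)
print(spares)  # ['/dev/sdd1']
print(f"{len(FAILED_PVS)} images failed, at most "
      f"{min(len(spares), len(FAILED_PVS))} can be reallocated")
```

Under that assumption only one of the three failed images can be moved to a new PV.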

Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Creating xfs on top of mirror(s) on host-127...
Mounting mirrored xfs filesystems on host-127...

PV=/dev/sda1
        synced_three_raid10_3legs_1_rimage_2: 2
        synced_three_raid10_3legs_1_rmeta_2: 2
PV=/dev/sdg1
        synced_three_raid10_3legs_1_rimage_0: 2
        synced_three_raid10_3legs_1_rmeta_0: 2
PV=/dev/sde1
        synced_three_raid10_3legs_1_rimage_4: 2
        synced_three_raid10_3legs_1_rmeta_4: 2
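The scenario requires that no two failed legs back the same stripe. A sketch of that check, assuming LVM's raid10 places the two copies of each stripe on adjacent images (rimage_0/1, rimage_2/3, rimage_4/5), as in the MD "near" layout. This mapping is an assumption and is not stated in the report; surviving_copies is a hypothetical helper:

```python
# Sketch: does every raid10 mirror pair keep at least one healthy image?
# Assumption: 2-copy "near" layout, so images (0,1), (2,3), (4,5) mirror each other.
FAILED_IMAGES = {0, 2, 4}  # rimage_0, rimage_2, rimage_4 from the PV map above
IMAGE_COUNT = 6
COPIES = 2


def surviving_copies(failed, image_count, copies):
    """Map each stripe index to the number of healthy images holding it."""
    result = {}
    for stripe in range(image_count // copies):
        members = range(stripe * copies, (stripe + 1) * copies)
        result[stripe] = sum(1 for i in members if i not in failed)
    return result


survivors = surviving_copies(FAILED_IMAGES, IMAGE_COUNT, COPIES)
print(survivors)                                               # {0: 1, 1: 1, 2: 1}
print("data intact:", all(n > 0 for n in survivors.values()))  # data intact: True
```

Failing rimage_0 and rimage_1 together instead would leave stripe 0 with no copy, which is the case this scenario avoids.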

Writing verification files (checkit) to mirror(s) on...
        ---- host-127 ----

<start name="host-127_synced_three_raid10_3legs_1"  pid="19280" time="Tue Jun  6 17:25:50 2017 -0500" type="cmd" />
Sleeping 15 seconds to get some outsanding I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-127 ----

Disabling device sdg on host-127rescan device...
Disabling device sda on host-127rescan device...
Disabling device sde on host-127rescan device...

Getting recovery check start time from /var/log/messages: Jun  6 17:26
Attempting I/O to cause mirror down conversion(s) on host-127
dd if=/dev/zero of=/mnt/synced_three_raid10_3legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.0648834 s, 646 MB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  LV                                     Attr       LSize   Cpy%Sync Devices
  synced_three_raid10_3legs_1            rwi-aor-p- 504.00m 100.00  synced_three_raid10_3legs_1_rimage_0(0),synced_three_raid10_3legs_1_rimage_1(0),synced_three_raid10_3legs_1_rimage_2(0),synced_three_raid10_3legs_1_rimage_3(0),synced_three_raid10_3legs_1_rimage_4(0),synced_three_raid10_3legs_1_rimage_5(0)
  [synced_three_raid10_3legs_1_rimage_0] Iwi-aor-p- 168.00m         [unknown](1)
  [synced_three_raid10_3legs_1_rimage_1] iwi-aor--- 168.00m         /dev/sdc1(1)
  [synced_three_raid10_3legs_1_rimage_2] Iwi-aor-p- 168.00m         [unknown](1)
  [synced_three_raid10_3legs_1_rimage_3] iwi-aor--- 168.00m         /dev/sdb1(1)
  [synced_three_raid10_3legs_1_rimage_4] Iwi-aor-p- 168.00m         [unknown](1)
  [synced_three_raid10_3legs_1_rimage_5] iwi-aor--- 168.00m         /dev/sdh1(1)
  [synced_three_raid10_3legs_1_rmeta_0]  ewi-aor-p-   4.00m         [unknown](0)
  [synced_three_raid10_3legs_1_rmeta_1]  ewi-aor---   4.00m         /dev/sdc1(0)
  [synced_three_raid10_3legs_1_rmeta_2]  ewi-aor-p-   4.00m         [unknown](0)
  [synced_three_raid10_3legs_1_rmeta_3]  ewi-aor---   4.00m         /dev/sdb1(0)
  [synced_three_raid10_3legs_1_rmeta_4]  ewi-aor-p-   4.00m         [unknown](0)
  [synced_three_raid10_3legs_1_rmeta_5]  ewi-aor---   4.00m         /dev/sdh1(0)


Verifying FAILED device /dev/sdg1 is *NOT* in the volume(s)
Verifying FAILED device /dev/sda1 is *NOT* in the volume(s)
Verifying FAILED device /dev/sde1 is *NOT* in the volume(s)
Verifying IMAGE device /dev/sdc1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdb1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdh1 *IS* in the volume(s)
Verify the rimage/rmeta dm devices remain after the failures
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rimage_2 on: host-127 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rmeta_2 on: host-127 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rimage_0 on: host-127 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rmeta_0 on: host-127 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rimage_4 on: host-127 
Checking EXISTENCE and STATE of synced_three_raid10_3legs_1_rmeta_4 on: host-127 

Verify the raid image order is what's expected based on raid fault policy
EXPECTED LEG ORDER: unknown /dev/sdc1 unknown /dev/sdb1 unknown /dev/sdh1
ACTUAL LEG ORDER: [unknown] /dev/sdc1 [unknown] /dev/sdb1 [unknown] /dev/sdh1
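Under the warn fault policy nothing is reallocated automatically, so each failed leg is expected to keep its slot and be reported as unknown. A sketch of that comparison, assuming this is how the expected order is derived (the helper names are hypothetical, and the test harness may do it differently):

```python
# Sketch: expected leg order under raid_fault_policy=warn.
# Assumption: failed legs keep their position and are shown as unknown devices.
LEG_PVS = ["/dev/sdg1", "/dev/sdc1", "/dev/sda1",
           "/dev/sdb1", "/dev/sde1", "/dev/sdh1"]
FAILED_PVS = {"/dev/sdg1", "/dev/sda1", "/dev/sde1"}


def expected_leg_order(legs, failed):
    return ["unknown" if pv in failed else pv for pv in legs]


def normalise(actual):
    # lvs prints a missing device as "[unknown]"; strip the brackets to compare
    return [pv.strip("[]") for pv in actual]


expected = expected_leg_order(LEG_PVS, FAILED_PVS)
actual = "[unknown] /dev/sdc1 [unknown] /dev/sdb1 [unknown] /dev/sdh1".split()
print(" ".join(expected))  # unknown /dev/sdc1 unknown /dev/sdb1 unknown /dev/sdh1
print("match:", normalise(actual) == expected)  # match: True
```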

Fault policy is warn... Manually repairing failed raid volumes
host-127: 'lvconvert --yes --repair black_bird/synced_three_raid10_3legs_1'
  WARNING: Disabling lvmetad cache for repair command.
  WARNING: Not using lvmetad because of repair.
  /dev/sda1: read failed after 0 of 1024 at 22545367040: Input/output error
  [...]
  WARNING: Couldn't find all devices for LV black_bird/synced_three_raid10_3legs_1_rimage_0 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV black_bird/synced_three_raid10_3legs_1_rmeta_0 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV black_bird/synced_three_raid10_3legs_1_rimage_2 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV black_bird/synced_three_raid10_3legs_1_rmeta_2 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV black_bird/synced_three_raid10_3legs_1_rimage_4 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV black_bird/synced_three_raid10_3legs_1_rmeta_4 while checking used and assumed devices.
  Insufficient suitable allocatable extents for logical volume : 129 more required
  Failed to replace 2 devices.  Attempting to replace 3 instead.
  Insufficient suitable allocatable extents for logical volume : 86 more required
  Failed to replace 1 devices.  Attempting to replace 2 instead.
  device-mapper: create ioctl on black_bird-synced_three_raid10_3legs_1_rimage_2-missing_0_0 LVM-g5jrM8UcbNT52T604O15eeXWLefQDY3AjR7pU57QtwL83sVvbelabWiK4alZmsUR-missing_0_0 failed: Device or resource busy
  Failed to lock logical volume black_bird/synced_three_raid10_3legs_1.
  Failed to replace faulty devices in black_bird/synced_three_raid10_3legs_1.
lvconvert repair failed for black_bird/synced_three_raid10_3legs_1 on host-127
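The two extent shortfalls in the repair output are consistent with each replacement image needing one rimage plus one rmeta sub-LV. A sketch of the arithmetic, assuming 4 MiB extents (a 168.00m rimage is 42 extents and a 4.00m rmeta is 1 extent, per the structure listing). The values 129 and 86 come from the log; the per-image breakdown and the helper extents_for are inferences:

```python
# Sketch: extents needed to re-create n failed images (rimage + rmeta each).
# Assumption: 4 MiB extents, sizes taken from the lvs listing above.
EXTENT_MIB = 4
RIMAGE_MIB = 168
RMETA_MIB = 4


def extents_for(images, rimage_mib=RIMAGE_MIB, rmeta_mib=RMETA_MIB,
                extent_mib=EXTENT_MIB):
    per_image = rimage_mib // extent_mib + rmeta_mib // extent_mib  # 42 + 1
    return images * per_image


for n in (3, 2, 1):
    print(f"replace {n}: {extents_for(n)} extents")
# replace 3: 129 extents
# replace 2: 86 extents
# replace 1: 43 extents
```

Read this way, the shortfalls correspond to attempts at three and then two images, although the neighbouring "Failed to replace N devices" messages appear to print the counts in swapped order. The final one-image attempt did not report an allocation shortfall and instead failed with "Device or resource busy" while creating the rimage_2-missing_0_0 device.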


Version-Release number of selected component (if applicable):
3.10.0-666.el7.x86_64

lvm2-2.02.171-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
lvm2-libs-2.02.171-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
lvm2-cluster-2.02.171-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
device-mapper-1.02.140-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
device-mapper-libs-1.02.140-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
device-mapper-event-1.02.140-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
device-mapper-event-libs-1.02.140-3.el7    BUILT: Wed May 31 08:36:29 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017


How reproducible:
Often
Comment 2 Corey Marthaler 2017-06-07 10:03 EDT
Created attachment 1285808
verbose lvconvert --repair attempt
