Bug 1461562 - RAID RESHAPE: Reshape request failed on exclusive raid on clustered VG (md: pers->run() failed)
Status: ASSIGNED
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.4
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Heinz Mauelshagen
QA Contact: cluster-qe@redhat.com
Depends On:
Blocks:
Reported: 2017-06-14 15:22 EDT by Corey Marthaler
Modified: 2017-06-20 22:01 EDT

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
verbose lvconvert attempt (287.19 KB, text/plain)
2017-06-14 16:01 EDT, Corey Marthaler

Description Corey Marthaler 2017-06-14 15:22:41 EDT
Description of problem:
This appears very similar to bug 1448116. Unlike that bug, however, the "takeover" operation passed and the subsequent reshape (adding an image) is what failed, and the failure did not cause a deadlock. I'll attempt to reproduce and provide verbose output from the lvconvert command. Feel free to mark this as a duplicate of bug 1448116 if that appears to be the case.


3.10.0-681.el7.bz1443999a.x86_64

lvm2-2.02.171-4.el7    BUILT: Wed Jun  7 09:16:17 CDT 2017
lvm2-libs-2.02.171-4.el7    BUILT: Wed Jun  7 09:16:17 CDT 2017
lvm2-cluster-2.02.171-4.el7    BUILT: Wed Jun  7 09:16:17 CDT 2017
device-mapper-1.02.140-4.el7    BUILT: Wed Jun  7 09:16:17 CDT 2017
device-mapper-libs-1.02.140-4.el7    BUILT: Wed Jun  7 09:16:17 CDT 2017
device-mapper-event-1.02.140-4.el7    BUILT: Wed Jun  7 09:16:17 CDT 2017
device-mapper-event-libs-1.02.140-4.el7    BUILT: Wed Jun  7 09:16:17 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017
cmirror-2.02.171-4.el7    BUILT: Wed Jun  7 09:16:17 CDT 2017



================================================================================
Iteration 0.2 started at Wed Jun 14 13:44:43 CDT 2017
================================================================================
Scenario raid5_ra: Convert Striped raid5_ra volume
********* Take over hash info for this scenario *********
* from type:    raid5_ra
* to type:      raid6_ra_6
* from legs:    4
* to legs:      5
* from region:  8192.00k
* to region:    1024.00k
* contiguous:   0
* snapshot:     0
******************************************************

Creating original volume on harding-03...
harding-03: lvcreate -aye --type raid5_ra -R 8192.00k -i 4 -n takeover -L 4G centipede2
Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 44.89% )
   0/1 mirror(s) are fully synced: ( 82.78% )
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Placing a spacer on all raid image PVs so that expansion will have to be placed beyond
Extending raid beyond spacer
        lvextend -L +50M centipede2/takeover

Current volume device structure:
  LV                  Attr       LSize    Cpy%Sync Devices
  lvol0               -wi-a-----   20.00m          /dev/mapper/mpatha1(257)
  lvol1               -wi-a-----   20.00m          /dev/mapper/mpatha1(262)
  lvol2               -wi-a-----   20.00m          /dev/mapper/mpathb1(257)
  lvol3               -wi-a-----   20.00m          /dev/mapper/mpathb1(262)
  lvol4               -wi-a-----   20.00m          /dev/mapper/mpathc1(257)
  lvol5               -wi-a-----   20.00m          /dev/mapper/mpathc1(262)
  lvol6               -wi-a-----   20.00m          /dev/mapper/mpathd1(257)
  lvol7               -wi-a-----   20.00m          /dev/mapper/mpathd1(262)
  lvol8               -wi-a-----   20.00m          /dev/mapper/mpathe1(257)
  lvol9               -wi-a-----   20.00m          /dev/mapper/mpathe1(262)
  takeover            rwi-a-r---    4.06g 100.00   takeover_rimage_0(0),takeover_rimage_1(0),takeover_rimage_2(0),takeover_rimage_3(0),takeover_rimage_4(0)
  [takeover_rimage_0] iwi-aor---   <1.02g          /dev/mapper/mpatha1(1)
  [takeover_rimage_0] iwi-aor---   <1.02g          /dev/mapper/mpatha1(267)
  [takeover_rimage_1] iwi-aor---   <1.02g          /dev/mapper/mpathb1(1)
  [takeover_rimage_1] iwi-aor---   <1.02g          /dev/mapper/mpathb1(267)
  [takeover_rimage_2] iwi-aor---   <1.02g          /dev/mapper/mpathc1(1)
  [takeover_rimage_2] iwi-aor---   <1.02g          /dev/mapper/mpathc1(267)
  [takeover_rimage_3] iwi-aor---   <1.02g          /dev/mapper/mpathd1(1)
  [takeover_rimage_3] iwi-aor---   <1.02g          /dev/mapper/mpathd1(267)
  [takeover_rimage_4] iwi-aor---   <1.02g          /dev/mapper/mpathe1(1)
  [takeover_rimage_4] iwi-aor---   <1.02g          /dev/mapper/mpathe1(267)
  [takeover_rmeta_0]  ewi-aor---    4.00m          /dev/mapper/mpatha1(0)
  [takeover_rmeta_1]  ewi-aor---    4.00m          /dev/mapper/mpathb1(0)
  [takeover_rmeta_2]  ewi-aor---    4.00m          /dev/mapper/mpathc1(0)
  [takeover_rmeta_3]  ewi-aor---    4.00m          /dev/mapper/mpathd1(0)
  [takeover_rmeta_4]  ewi-aor---    4.00m          /dev/mapper/mpathe1(0)

Creating xfs on top of mirror(s) on harding-03...
Mounting mirrored xfs filesystems on harding-03...

Writing verification files (checkit) to mirror(s) on...
        ---- harding-03 ----

Sleeping 15 seconds to get some outsanding I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- harding-03 ----

TAKEOVER: lvconvert --yes -R 1024.00k  --type raid6_ra_6 centipede2/takeover
Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 27.27% )
   0/1 mirror(s) are fully synced: ( 54.59% )
   0/1 mirror(s) are fully synced: ( 80.40% )
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Current volume device structure:
  LV                  Attr       LSize    Cpy%Sync Devices
  lvol0               -wi-a-----   20.00m          /dev/mapper/mpatha1(257)
  lvol1               -wi-a-----   20.00m          /dev/mapper/mpatha1(262)
  lvol2               -wi-a-----   20.00m          /dev/mapper/mpathb1(257)
  lvol3               -wi-a-----   20.00m          /dev/mapper/mpathb1(262)
  lvol4               -wi-a-----   20.00m          /dev/mapper/mpathc1(257)
  lvol5               -wi-a-----   20.00m          /dev/mapper/mpathc1(262)
  lvol6               -wi-a-----   20.00m          /dev/mapper/mpathd1(257)
  lvol7               -wi-a-----   20.00m          /dev/mapper/mpathd1(262)
  lvol8               -wi-a-----   20.00m          /dev/mapper/mpathe1(257)
  lvol9               -wi-a-----   20.00m          /dev/mapper/mpathe1(262)
  takeover            rwi-aor---    4.06g 100.00   takeover_rimage_0(0),takeover_rimage_1(0),takeover_rimage_2(0),takeover_rimage_3(0),takeover_rimage_4(0),takeover_rimage_5(0)
  [takeover_rimage_0] iwi-aor---   <1.02g          /dev/mapper/mpatha1(1)
  [takeover_rimage_0] iwi-aor---   <1.02g          /dev/mapper/mpatha1(267)
  [takeover_rimage_1] iwi-aor---   <1.02g          /dev/mapper/mpathb1(1)
  [takeover_rimage_1] iwi-aor---   <1.02g          /dev/mapper/mpathb1(267)
  [takeover_rimage_2] iwi-aor---   <1.02g          /dev/mapper/mpathc1(1)
  [takeover_rimage_2] iwi-aor---   <1.02g          /dev/mapper/mpathc1(267)
  [takeover_rimage_3] iwi-aor---   <1.02g          /dev/mapper/mpathd1(1)
  [takeover_rimage_3] iwi-aor---   <1.02g          /dev/mapper/mpathd1(267)
  [takeover_rimage_4] iwi-aor---   <1.02g          /dev/mapper/mpathe1(1)
  [takeover_rimage_4] iwi-aor---   <1.02g          /dev/mapper/mpathe1(267)
  [takeover_rimage_5] iwi-aor---   <1.02g          /dev/mapper/mpathf1(1)
  [takeover_rmeta_0]  ewi-aor---    4.00m          /dev/mapper/mpatha1(0)
  [takeover_rmeta_1]  ewi-aor---    4.00m          /dev/mapper/mpathb1(0)
  [takeover_rmeta_2]  ewi-aor---    4.00m          /dev/mapper/mpathc1(0)
  [takeover_rmeta_3]  ewi-aor---    4.00m          /dev/mapper/mpathd1(0)
  [takeover_rmeta_4]  ewi-aor---    4.00m          /dev/mapper/mpathe1(0)
  [takeover_rmeta_5]  ewi-aor---    4.00m          /dev/mapper/mpathf1(0)

Verifying files (checkit) on mirror(s) on...
        ---- harding-03 ----

RESHAPE: lvconvert --yes  --stripes 5 centipede2/takeover
  WARNING: Adding stripes to active and open logical volume centipede2/takeover will grow it from 1040 to 1300 extents!
  Error locking on node 2: device-mapper: reload ioctl on  (253:29) failed: Invalid argument
  Failed to lock logical volume centipede2/takeover.
  Internal error: Update of LV centipede2/takeover failed.
  Reshape request failed on LV centipede2/takeover.
couldn't reshape volume
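
The extent counts in the lvconvert warning above follow from the stripe counts: the raid6 LV has 6 images, 2 of which hold parity, so `--stripes 5` takes it from 4 to 5 data stripes. A minimal sketch of that arithmetic, assuming each image keeps its current size so that the LV size scales with the number of data stripes (the function name is hypothetical and not part of lvm2):

```python
def extents_after_adding_stripes(current_extents: int,
                                 old_stripes: int,
                                 new_stripes: int) -> int:
    """Return the LV size in extents after changing the data-stripe count.

    Assumption: every image keeps its size, so the usable size scales
    linearly with the number of data stripes.
    """
    if old_stripes <= 0 or new_stripes <= 0:
        raise ValueError("stripe counts must be positive")
    if current_extents % old_stripes != 0:
        raise ValueError("LV size is not a multiple of the stripe count")
    extents_per_stripe = current_extents // old_stripes
    return extents_per_stripe * new_stripes


# Values from the warning: 1040 extents on 4 data stripes, grown to 5 stripes.
grown = extents_after_adding_stripes(1040, old_stripes=4, new_stripes=5)
print(grown)  # 1300
```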



Jun 14 13:47:18 harding-03 qarshd[8448]: Running cmdline: lvconvert --yes --stripes 5 centipede2/takeover
Jun 14 13:47:19 harding-03 multipathd: dm-42: remove map (uevent)
Jun 14 13:47:19 harding-03 multipathd: dm-42: devmap not registered, can't remove
Jun 14 13:47:19 harding-03 multipathd: dm-42: remove map (uevent)
Jun 14 13:47:19 harding-03 kernel: md/raid:mdX: device dm-20 operational as raid disk 0
Jun 14 13:47:19 harding-03 kernel: md/raid:mdX: device dm-22 operational as raid disk 1
Jun 14 13:47:19 harding-03 kernel: md/raid:mdX: device dm-24 operational as raid disk 2
Jun 14 13:47:19 harding-03 kernel: md/raid:mdX: device dm-26 operational as raid disk 3
Jun 14 13:47:19 harding-03 kernel: md/raid:mdX: device dm-28 operational as raid disk 4
Jun 14 13:47:19 harding-03 kernel: md/raid:mdX: device dm-41 operational as raid disk 5
Jun 14 13:47:19 harding-03 kernel: md/raid:mdX: raid level 6 active with 6 out of 6 devices, algorithm 17
Jun 14 13:47:19 harding-03 dmeventd[7295]: No longer monitoring RAID device centipede2-takeover for events.
Jun 14 13:47:19 harding-03 kernel: dm-29: detected capacity change from 5452595200 to 4362076160
Jun 14 13:47:19 harding-03 kernel: VFS: busy inodes on changed media or resized disk dm-29
Jun 14 13:47:19 harding-03 lvm[7295]: Monitoring RAID device centipede2-takeover for events.
Jun 14 13:47:19 harding-03 kernel: md/raid:mdX: reshape_position too early for auto-recovery - aborting.
Jun 14 13:47:19 harding-03 kernel: md: pers->run() failed ...
Jun 14 13:47:19 harding-03 kernel: device-mapper: table: 253:29: raid: Failed to run raid array
Jun 14 13:47:19 harding-03 kernel: device-mapper: ioctl: error adding target to table


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 2 Corey Marthaler 2017-06-14 16:01 EDT
Created attachment 1287808
verbose lvconvert attempt
Comment 4 Heinz Mauelshagen 2017-06-19 12:27:06 EDT
This is another effect of growing the rimages and reordering their address space in one step rather than two (bz1447812 is another instance). In the clustered VG case, the grown size of the rimage LVs is not propagated properly, which causes the raid personality function to fail the respective validation.

For the time being, we have to restrict reshaping on clustered LVs until a fix is properly designed, implemented, and tested.
Comment 5 Jonathan Earl Brassow 2017-06-19 13:49:56 EDT
Disallowing reshape/takeover while the LV is in a clustered VG until a future release.
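
Until such a restriction is in place, a test script could skip the reshape step itself on clustered VGs. A minimal sketch, assuming the sixth character of the `vg_attr` field reported by `vgs` is `c` for a clustered VG (as described in the vgs man page); the function name and the sample attribute strings are hypothetical and not taken from this report:

```python
def is_clustered_vg(vg_attr: str) -> bool:
    """Return True if a vgs `vg_attr` string marks the VG as clustered.

    Assumption: the sixth attribute character is 'c' for clustered VGs.
    """
    attr = vg_attr.strip()
    if len(attr) < 6:
        raise ValueError(f"unexpected vg_attr value: {vg_attr!r}")
    return attr[5] == "c"


# Illustrative attribute strings, not output captured from the test system.
print(is_clustered_vg("wz--nc"))  # True
print(is_clustered_vg("wz--n-"))  # False
```

The attribute string could be obtained with something like `vgs --noheadings -o vg_attr centipede2`.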
