Bug 1618806
Summary: | RAID TAKEOVER: issues and corruption attempting raid5_n -> linear conversion | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Corey Marthaler <cmarthal> |
Component: | lvm2 | Assignee: | LVM and device-mapper development team <lvm-team> |
lvm2 sub component: | Mirroring and RAID | QA Contact: | cluster-qe <cluster-qe> |
Status: | CLOSED NOTABUG | Docs Contact: | |
Severity: | high | ||
Priority: | unspecified | CC: | agk, heinzm, jbrassow, msnitzer, prajnoha, prockai, zkabelac |
Version: | 7.6 | ||
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Last Closed: | 2018-08-17 19:49:08 UTC | Type: | Bug |
Description
Corey Marthaler
2018-08-17 16:16:04 UTC
Here's the actual test output. It appears the conversion to raid1 is what causes the corruption.

================================================================================
Iteration 0.1 started at Fri Aug 17 10:55:18 CDT 2018
================================================================================
Scenario random_interim_conversion: Convert Random striped type to indirect interim volume
********* Take over hash info for this scenario *********
* from type: raid5_n
* to type: linear
* from legs: 3
* to legs: 3
* from region: 1024.00k
* to region: 0
* contiguous: 0
******************************************************

Creating original volume on hayes-01...
hayes-01: lvcreate --type raid5_n -R 1024.00k -i 3 -n takeover -L 2.75G centipede2

Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 49.97% )
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Placing a spacer on all raid image PVs so that expansion will have to be placed beyond
Extending raid beyond spacer
lvextend -L +50M centipede2/takeover

Current volume device structure:
  LV                  Type    Attr       LSize   Cpy%Sync Devices
  lvol0               linear  -wi-a----- 20.00m           /dev/sdg1(236)
  lvol1               linear  -wi-a----- 20.00m           /dev/sdg1(241)
  lvol2               linear  -wi-a----- 20.00m           /dev/sdh1(236)
  lvol3               linear  -wi-a----- 20.00m           /dev/sdh1(241)
  lvol4               linear  -wi-a----- 20.00m           /dev/sdi1(236)
  lvol5               linear  -wi-a----- 20.00m           /dev/sdi1(241)
  lvol6               linear  -wi-a----- 20.00m           /dev/sdk1(236)
  lvol7               linear  -wi-a----- 20.00m           /dev/sdk1(241)
  takeover            raid5_n rwi-a-r--- 2.81g   97.92    takeover_rimage_0(0),takeover_rimage_1(0),takeover_rimage_2(0),takeover_rimage_3(0)
  [takeover_rimage_0] linear  Iwi-aor--- 960.00m          /dev/sdg1(1)
  [takeover_rimage_0] linear  Iwi-aor--- 960.00m          /dev/sdg1(246)
  [takeover_rimage_1] linear  Iwi-aor--- 960.00m          /dev/sdk1(1)
  [takeover_rimage_1] linear  Iwi-aor--- 960.00m          /dev/sdk1(246)
  [takeover_rimage_2] linear  Iwi-aor--- 960.00m          /dev/sdi1(1)
  [takeover_rimage_2] linear  Iwi-aor--- 960.00m          /dev/sdi1(246)
  [takeover_rimage_3] linear  Iwi-aor--- 960.00m          /dev/sdh1(1)
  [takeover_rimage_3] linear  Iwi-aor--- 960.00m          /dev/sdh1(246)
  [takeover_rmeta_0]  linear  ewi-aor--- 4.00m            /dev/sdg1(0)
  [takeover_rmeta_1]  linear  ewi-aor--- 4.00m            /dev/sdk1(0)
  [takeover_rmeta_2]  linear  ewi-aor--- 4.00m            /dev/sdi1(0)
  [takeover_rmeta_3]  linear  ewi-aor--- 4.00m            /dev/sdh1(0)

Creating xfs on top of mirror(s) on hayes-01...
Mounting mirrored xfs filesystems on hayes-01...

Writing verification files (checkit) to mirror(s) on...
---- hayes-01 ----

Sleeping 15 seconds to get some outsanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
---- hayes-01 ----

TAKEOVER: lvconvert --force --yes --type linear centipede2/takeover
  Converting raid5_n LV centipede2/takeover to 2 stripes first.
  Replaced LV type linear with possible type raid5_n.
  Repeat this command to convert to linear after an interim conversion has finished.
  WARNING: Removing stripes from active and open logical volume centipede2/takeover will shrink it from 2.81 GiB to 960.00 MiB!
  THIS MAY DESTROY (PARTS OF) YOUR DATA!
  WARNING: to remove freed stripes after the conversion has finished, you have to run "lvconvert --stripes 1 centipede2/takeover"

Current volume device structure:
  LV                  Type    Attr       LSize   Cpy%Sync Devices
  lvol0               linear  -wi-a----- 20.00m           /dev/sdg1(236)
  lvol1               linear  -wi-a----- 20.00m           /dev/sdg1(241)
  lvol2               linear  -wi-a----- 20.00m           /dev/sdh1(236)
  lvol3               linear  -wi-a----- 20.00m           /dev/sdh1(241)
  lvol4               linear  -wi-a----- 20.00m           /dev/sdi1(236)
  lvol5               linear  -wi-a----- 20.00m           /dev/sdi1(241)
  lvol6               linear  -wi-a----- 20.00m           /dev/sdk1(236)
  lvol7               linear  -wi-a----- 20.00m           /dev/sdk1(241)
  takeover            raid5_n rwi-aor-s- 2.81g   56.67    takeover_rimage_0(0),takeover_rimage_1(0),takeover_rimage_2(0),takeover_rimage_3(0)
  [takeover_rimage_0] linear  Iwi-aor--- 964.00m          /dev/sdg1(1)
  [takeover_rimage_0] linear  Iwi-aor--- 964.00m          /dev/sdg1(246)
  [takeover_rimage_1] linear  Iwi-aor--- 964.00m          /dev/sdk1(1)
  [takeover_rimage_1] linear  Iwi-aor--- 964.00m          /dev/sdk1(246)
  [takeover_rimage_2] linear  Iwi-aor-R- 964.00m          /dev/sdi1(1)
  [takeover_rimage_2] linear  Iwi-aor-R- 964.00m          /dev/sdi1(246)
  [takeover_rimage_3] linear  Iwi-aor-R- 964.00m          /dev/sdh1(1)
  [takeover_rimage_3] linear  Iwi-aor-R- 964.00m          /dev/sdh1(246)
  [takeover_rmeta_0]  linear  ewi-aor--- 4.00m            /dev/sdg1(0)
  [takeover_rmeta_1]  linear  ewi-aor--- 4.00m            /dev/sdk1(0)
  [takeover_rmeta_2]  linear  ewi-aor-R- 4.00m            /dev/sdi1(0)
  [takeover_rmeta_3]  linear  ewi-aor-R- 4.00m            /dev/sdh1(0)

Verifying files (checkit) on mirror(s) on...
---- hayes-01 ----

current segtype doesn't match the takeover attempted current: raid5_n ne expected: linear
Continuing to run this same takeover cmd until at the requested segtype (attempts 1)

TAKEOVER: lvconvert --force --yes --type linear centipede2/takeover
  Converting raid5_n LV centipede2/takeover to 2 stripes first.
  Replaced LV type linear with possible type raid5_n.
  Repeat this command to convert to linear after an interim conversion has finished.

Current volume device structure:
  LV                  Type    Attr       LSize   Cpy%Sync Devices
  lvol0               linear  -wi-a----- 20.00m           /dev/sdg1(236)
  lvol1               linear  -wi-a----- 20.00m           /dev/sdg1(241)
  lvol2               linear  -wi-a----- 20.00m           /dev/sdh1(236)
  lvol3               linear  -wi-a----- 20.00m           /dev/sdh1(241)
  lvol4               linear  -wi-a----- 20.00m           /dev/sdi1(236)
  lvol5               linear  -wi-a----- 20.00m           /dev/sdi1(241)
  lvol6               linear  -wi-a----- 20.00m           /dev/sdk1(236)
  lvol7               linear  -wi-a----- 20.00m           /dev/sdk1(241)
  takeover            raid5_n rwi-aor--- 960.00m 100.00   takeover_rimage_0(0),takeover_rimage_1(0)
  [takeover_rimage_0] linear  iwi-aor--- 964.00m          /dev/sdg1(1)
  [takeover_rimage_0] linear  iwi-aor--- 964.00m          /dev/sdg1(246)
  [takeover_rimage_1] linear  iwi-aor--- 964.00m          /dev/sdk1(1)
  [takeover_rimage_1] linear  iwi-aor--- 964.00m          /dev/sdk1(246)
  [takeover_rmeta_0]  linear  ewi-aor--- 4.00m            /dev/sdg1(0)
  [takeover_rmeta_1]  linear  ewi-aor--- 4.00m            /dev/sdk1(0)

Verifying files (checkit) on mirror(s) on...
---- hayes-01 ----

current segtype doesn't match the takeover attempted current: raid5_n ne expected: linear
Continuing to run this same takeover cmd until at the requested segtype (attempts 2)

# It appears to be this conversion that causes the corruption:
TAKEOVER: lvconvert --force --yes --type linear centipede2/takeover
  Replaced LV type linear with possible type raid1.
  Repeat this command to convert to linear after an interim conversion has finished.

Current volume device structure:
  LV                  Type    Attr       LSize   Cpy%Sync Devices
  lvol0               linear  -wi-a----- 20.00m           /dev/sdg1(236)
  lvol1               linear  -wi-a----- 20.00m           /dev/sdg1(241)
  lvol2               linear  -wi-a----- 20.00m           /dev/sdh1(236)
  lvol3               linear  -wi-a----- 20.00m           /dev/sdh1(241)
  lvol4               linear  -wi-a----- 20.00m           /dev/sdi1(236)
  lvol5               linear  -wi-a----- 20.00m           /dev/sdi1(241)
  lvol6               linear  -wi-a----- 20.00m           /dev/sdk1(236)
  lvol7               linear  -wi-a----- 20.00m           /dev/sdk1(241)
  takeover            raid1   rwi-aor--- 960.00m 100.00   takeover_rimage_0(0),takeover_rimage_1(0)
  [takeover_rimage_0] linear  iwi-aor--- 960.00m          /dev/sdg1(2)
  [takeover_rimage_0] linear  iwi-aor--- 960.00m          /dev/sdg1(246)
  [takeover_rimage_1] linear  iwi-aor--- 960.00m          /dev/sdk1(2)
  [takeover_rimage_1] linear  iwi-aor--- 960.00m          /dev/sdk1(246)
  [takeover_rmeta_0]  linear  ewi-aor--- 4.00m            /dev/sdg1(0)
  [takeover_rmeta_1]  linear  ewi-aor--- 4.00m            /dev/sdk1(0)

Verifying files (checkit) on mirror(s) on...
---- hayes-01 ----
Can not stat bnpiblpnmfqywmenh: Input/output error
checkit write verify failed
Corruption was **NOT** expected for this scenario!

Actually, it appears the corruption can happen after the first takeover attempt.

================================================================================
Iteration 0.1 started at Fri Aug 17 13:21:38 CDT 2018
================================================================================
Scenario random_interim_conversion: Convert Random striped type to indirect interim volume
********* Take over hash info for this scenario *********
* from type: raid5_n
* to type: linear
* from legs: 3
* to legs: 3
* from region: 8192.00k
* to region: 0
* contiguous: 1
******************************************************

Creating original volume on hayes-01...
hayes-01: lvcreate --type raid5_n -R 8192.00k -i 3 -n takeover -L 2.75G centipede2

Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Current volume device structure:
  LV                  Type    Attr       LSize   Cpy%Sync Devices
  takeover            raid5_n rwi-a-r--- 2.75g   100.00   takeover_rimage_0(0),takeover_rimage_1(0),takeover_rimage_2(0),takeover_rimage_3(0)
  [takeover_rimage_0] linear  iwi-aor--- 940.00m          /dev/sdg1(1)
  [takeover_rimage_1] linear  iwi-aor--- 940.00m          /dev/sdk1(1)
  [takeover_rimage_2] linear  iwi-aor--- 940.00m          /dev/sdi1(1)
  [takeover_rimage_3] linear  iwi-aor--- 940.00m          /dev/sdh1(1)
  [takeover_rmeta_0]  linear  ewi-aor--- 4.00m            /dev/sdg1(0)
  [takeover_rmeta_1]  linear  ewi-aor--- 4.00m            /dev/sdk1(0)
  [takeover_rmeta_2]  linear  ewi-aor--- 4.00m            /dev/sdi1(0)
  [takeover_rmeta_3]  linear  ewi-aor--- 4.00m            /dev/sdh1(0)

Creating xfs on top of mirror(s) on hayes-01...
Mounting mirrored xfs filesystems on hayes-01...

Writing verification files (checkit) to mirror(s) on...
---- hayes-01 ----

Sleeping 15 seconds to get some outsanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
---- hayes-01 ----

TAKEOVER: lvconvert --force --yes --type linear centipede2/takeover
  Converting raid5_n LV centipede2/takeover to 2 stripes first.
  Replaced LV type linear with possible type raid5_n.
  Repeat this command to convert to linear after an interim conversion has finished.
  WARNING: Removing stripes from active and open logical volume centipede2/takeover will shrink it from 2.75 GiB to 940.00 MiB!
  THIS MAY DESTROY (PARTS OF) YOUR DATA!
  WARNING: to remove freed stripes after the conversion has finished, you have to run "lvconvert --stripes 1 centipede2/takeover"

Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )
<fail name="hayes-01_takeover" pid="21472" time="Fri Aug 17 13:23:14 2018 -0500" type="cmd" duration="60" ec="1" />
Sleeping 15 sec

Current volume device structure:
  LV                  Type    Attr       LSize   Cpy%Sync Devices
  takeover            raid5_n rwi-aor-R- 2.75g   100.00   takeover_rimage_0(0),takeover_rimage_1(0),takeover_rimage_2(0),takeover_rimage_3(0)
  [takeover_rimage_0] linear  iwi-aor--- 944.00m          /dev/sdg1(1)
  [takeover_rimage_1] linear  iwi-aor--- 944.00m          /dev/sdk1(1)
  [takeover_rimage_2] linear  Iwi-aor-R- 944.00m          /dev/sdi1(1)
  [takeover_rimage_3] linear  Iwi-aor-R- 944.00m          /dev/sdh1(1)
  [takeover_rmeta_0]  linear  ewi-aor--- 4.00m            /dev/sdg1(0)
  [takeover_rmeta_1]  linear  ewi-aor--- 4.00m            /dev/sdk1(0)
  [takeover_rmeta_2]  linear  ewi-aor-R- 4.00m            /dev/sdi1(0)
  [takeover_rmeta_3]  linear  ewi-aor-R- 4.00m            /dev/sdh1(0)

Verifying files (checkit) on mirror(s) on...
---- hayes-01 ----
Can not cd to /mnt/takeover/checkit: Input/output error
checkit write verify failed

Aug 17 13:22:45 hayes-01 qarshd[54963]: Running cmdline: lvconvert --force --yes --type linear centipede2/takeover
Aug 17 13:22:45 hayes-01 dmeventd[70059]: No longer monitoring RAID device centipede2-takeover for events.
Aug 17 13:22:45 hayes-01 lvm[70059]: Monitoring RAID device centipede2-takeover for events.
Aug 17 13:22:45 hayes-01 kernel: md/raid:mdX: device dm-1 operational as raid disk 0
Aug 17 13:22:45 hayes-01 kernel: md/raid:mdX: device dm-3 operational as raid disk 1
Aug 17 13:22:45 hayes-01 kernel: md/raid:mdX: device dm-5 operational as raid disk 2
Aug 17 13:22:45 hayes-01 kernel: md/raid:mdX: device dm-7 operational as raid disk 3
Aug 17 13:22:45 hayes-01 kernel: md/raid:mdX: raid level 5 active with 4 out of 4 devices, algorithm 5
Aug 17 13:22:45 hayes-01 dmeventd[70059]: No longer monitoring RAID device centipede2-takeover for events.
Aug 17 13:22:45 hayes-01 kernel: dm-8: detected capacity change from 2956984320 to 985661440
Aug 17 13:22:45 hayes-01 kernel: VFS: busy inodes on changed media or resized disk dm-8
Aug 17 13:22:45 hayes-01 lvm[70059]: Monitoring RAID device centipede2-takeover for events.
Aug 17 13:22:45 hayes-01 lvm[70059]: raid5_n array, centipede2-takeover, is not in-sync.
Aug 17 13:22:45 hayes-01 kernel: md: reshape of RAID array mdX
Aug 17 13:22:57 hayes-01 kernel: md/raid:mdX: device dm-1 operational as raid disk 0
Aug 17 13:22:57 hayes-01 kernel: md/raid:mdX: device dm-3 operational as raid disk 1
Aug 17 13:22:57 hayes-01 kernel: md/raid:mdX: device dm-5 operational as raid disk 2
Aug 17 13:22:57 hayes-01 kernel: md/raid:mdX: device dm-7 operational as raid disk 3
Aug 17 13:22:57 hayes-01 kernel: md/raid:mdX: raid level 5 active with 2 out of 2 devices, algorithm 5
Aug 17 13:22:57 hayes-01 dmeventd[70059]: No longer monitoring RAID device centipede2-takeover for events.
Aug 17 13:22:57 hayes-01 kernel: md: mdX: reshape interrupted.
Aug 17 13:22:57 hayes-01 kernel: dm-8: detected capacity change from 2956984320 to 985661440
Aug 17 13:22:57 hayes-01 kernel: VFS: busy inodes on changed media or resized disk dm-8
Aug 17 13:22:57 hayes-01 lvm[70059]: Monitoring RAID device centipede2-takeover for events.
Aug 17 13:22:57 hayes-01 lvm[70059]: raid5_n array, centipede2-takeover, is not in-sync.
Aug 17 13:22:57 hayes-01 kernel: md: reshape of RAID array mdX
Aug 17 13:23:05 hayes-01 kernel: md: mdX: reshape done.
Aug 17 13:23:05 hayes-01 lvm[70059]: raid5_n array, centipede2-takeover, is not in-sync.
[...]
Aug 17 13:23:14 hayes-01 kernel: attempt to access beyond end of device
Aug 17 13:23:14 hayes-01 kernel: dm-8: rw=7217, want=2887680, limit=1925120
Aug 17 13:23:14 hayes-01 kernel: XFS (dm-8): metadata I/O error: block 0x2c0f80 ("xlog_iodone") error 5 numblks 128
Aug 17 13:23:14 hayes-01 kernel: XFS (dm-8): xfs_do_force_shutdown(0x2) called from line 1221 of file fs/xfs/xfs_log.c. Return address = 0xffffffffc0665270
Aug 17 13:23:14 hayes-01 kernel: XFS (dm-8): Log I/O Error Detected. Shutting down filesystem
Aug 17 13:23:14 hayes-01 kernel: XFS (dm-8): Please umount the filesystem and rectify the problem(s)

Originally, I thought this corruption was a bug because it differs from other reshape scenarios, where you explicitly request a reduction in images from N -> N-1 stripes; here lvm does the reduction "automatically". But after re-evaluating, lvm does:
A. provide a warning about corruption,
B. require --force, and
C. then prompt the user for confirmation (unless --yes is given).

So, closing NOTABUG.

[root@host-086 ~]# lvconvert --type linear centipede2/lvol0
  Converting raid5_n LV centipede2/lvol0 to 2 stripes first.
  Replaced LV type linear with possible type raid5_n.
  Repeat this command to convert to linear after an interim conversion has finished.
  WARNING: Removing stripes from active logical volume centipede2/lvol0 will shrink it from 108.00 MiB to 36.00 MiB!
  THIS MAY DESTROY (PARTS OF) YOUR DATA!
  Interrupt the conversion and run "lvresize -y -l81 centipede2/lvol0" to keep the current size if not done already!
  If that leaves the logical volume larger than 81 extents due to stripe rounding, you may want to grow the content afterwards (filesystem etc.)
  WARNING: to remove freed stripes after the conversion has finished, you have to run "lvconvert --stripes 1 centipede2/lvol0"
  Can't remove stripes without --force option.
  Reshape request failed on LV centipede2/lvol0.

[root@host-086 ~]# lvconvert --force --type linear centipede2/lvol0
  Converting raid5_n LV centipede2/lvol0 to 2 stripes first.
  Replaced LV type linear with possible type raid5_n.
  Repeat this command to convert to linear after an interim conversion has finished.
  WARNING: Removing stripes from active logical volume centipede2/lvol0 will shrink it from 108.00 MiB to 36.00 MiB!
  THIS MAY DESTROY (PARTS OF) YOUR DATA!
  Interrupt the conversion and run "lvresize -y -l81 centipede2/lvol0" to keep the current size if not done already!
  If that leaves the logical volume larger than 81 extents due to stripe rounding, you may want to grow the content afterwards (filesystem etc.)
  WARNING: to remove freed stripes after the conversion has finished, you have to run "lvconvert --stripes 1 centipede2/lvol0"
Are you sure you want to remove 2 images from raid5_n LV centipede2/lvol0? [y/n]:
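
For anyone reading the numbers in this report later, here is a rough cross-check of the sizes involved. It is only a sketch and not part of the original test run: the 512-byte sector unit and the 4 MiB extent size are assumptions, and the helper names are hypothetical. All other values are copied from the kernel log and the lvconvert warnings in this report.

```python
# Sketch only: SECTOR_BYTES and EXTENT_BYTES are assumptions, and the helper
# names are hypothetical; every other number is copied from this report.
SECTOR_BYTES = 512            # assumed unit of the kernel's want=/limit= values
EXTENT_BYTES = 4 * 1024**2    # assumed 4 MiB physical extents
MIB = 1024**2


def capacity_after_stripe_removal(lv_bytes, old_stripes, new_stripes):
    """LV size once the data stripes are reduced from old to new."""
    return lv_bytes * new_stripes // old_stripes


def extents_to_keep_size(lv_bytes, old_stripes, new_stripes):
    """Extent count to grow to beforehand so the final size is unchanged."""
    return (lv_bytes // EXTENT_BYTES) * old_stripes // new_stripes


# Second run: "detected capacity change from 2956984320 to 985661440".
new_capacity = capacity_after_stripe_removal(2956984320, 3, 1)
limit_sectors = new_capacity // SECTOR_BYTES

# XFS log write: block 0x2c0f80, numblks 128 gives the end sector requested.
want_sector = 0x2C0F80 + 128

print(new_capacity, limit_sectors, want_sector, want_sector > limit_sectors)
print(extents_to_keep_size(108 * MIB, 3, 1))
```

Under those assumptions the figures line up with the log: three data stripes reduced to one gives 985661440 bytes, which is limit=1925120 sectors, and the XFS log write ends at want=2887680, beyond the new end of the device. The same ratio explains the "lvresize -y -l81" hint: a 108 MiB LV is 27 extents, and 27 * 3 = 81.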