Bug 1818914
| Summary: | [rhel 8.0] mdadm reshape from RAID5 to RAID6 hangs [rhel-8] |
|---|---|
| Product: | Red Hat Enterprise Linux 8 |
| Reporter: | Alicja Kario <hkario> |
| Component: | mdadm |
| Assignee: | Nigel Croxon <ncroxon> |
| Status: | CLOSED CURRENTRELEASE |
| QA Contact: | Storage QE <storage-qe> |
| Severity: | medium |
| Docs Contact: | |
| Priority: | medium |
| Version: | 8.3 |
| CC: | agk, dledford, extras-qa, heinzm, jbrassow, jes.sorensen, Ken.Green, ncroxon, xni |
| Target Milestone: | rc |
| Flags: | pm-rhel: mirror+ |
| Target Release: | 8.0 |
| Hardware: | x86_64 |
| OS: | Unspecified |
| Whiteboard: | |
| Fixed In Version: | |
| Doc Type: | If docs needed, set a value |
| Doc Text: | |
| Story Points: | --- |
| Clone Of: | 1818912 |
| : | 1818931 (view as bug list) |
| Environment: | |
| Last Closed: | 2021-08-31 20:11:33 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Bug Depends On: | 1818912 |
| Bug Blocks: | 1818931 |
Just ran fine on Fedora 31 (kernel 5.5.11-200, mdadm 'v4.1 - 2018-10-01'), but on real disks, not on loop devices. Can you reproduce on newer Fedora?

Same behaviour on Fedora 31:
kernel-5.5.10-200.fc31.x86_64
mdadm-4.1-4.fc31.x86_64
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 loop3[4] loop2[3] loop1[1] loop0[0]
2093056 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUU_]
[>....................] reshape = 0.0% (1/1046528) finish=0.0min speed=174421K/sec
unused devices: <none>
And on Fedora 32:
kernel-5.6.0-0.rc7.git0.2.fc32.x86_64
mdadm-4.1-4.fc32.x86_64
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 loop3[4] loop2[3] loop1[1] loop0[0]
2093056 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUU_]
[>....................] reshape = 0.0% (1/1046528) finish=0.0min speed=174421K/sec
unused devices: <none>
Test version of mdadm: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=27903664

That's a rhel-7 package, this is a rhel-8 bug...

Yes, but it will work on rhel8. It's a test version of mdadm; it's not going outside RH.

Worked with kernel-4.18.0-193.5.el8.x86_64 and mdadm-4.1-njc.el7_8.x86_64:
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 loop3[4] loop2[3] loop1[1] loop0[0]
2093056 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
Moving our conversation to the RHEL8 bz for tracking...

Hubert, if you could try this test version on a RHEL8 machine and give feedback: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=28291821 Thanks, Nigel

I'm afraid that the scratch build was cleaned up; there are no rpms there any more.

Seems to work fine:
mdadm-4.1-njc2.el8.x86_64
kernel-4.18.0-193.13.el8.x86_64
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 loop3[4] loop2[3] loop1[1] loop0[0]
2093056 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
https://marc.info/?l=linux-raid&m=159195299630680&w=2

Verified the above patch fixes the hang and allows the grow to proceed.

Hubert, if you want to give this mdadm a test: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=29810872

Worked on my test machine, but I just tried 1minutetip and it failed.

[root@ci-vm-10-0-139-241 ~]# uname -a
Linux ci-vm-10-0-139-241.hosted.upshift.rdu2.redhat.com 4.18.0-234.el8.x86_64 #1 SMP Thu Aug 20 10:25:32 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@ci-vm-10-0-139-241 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.4 Beta (Ootpa)

Sep 14 14:50:13 ci-vm-10-0-139-241 kernel: md/raid:md0: device loop1 operational as raid disk 1
Sep 14 14:50:13 ci-vm-10-0-139-241 kernel: md/raid:md0: device loop0 operational as raid disk 0
Sep 14 14:50:13 ci-vm-10-0-139-241 kernel: md/raid:md0: raid level 5 active with 2 out of 3 devices, algorithm 2
Sep 14 14:50:13 ci-vm-10-0-139-241 kernel: md0: detected capacity change from 0 to 2143289344
Sep 14 14:50:13 ci-vm-10-0-139-241 kernel: md: recovery of RAID array md0
Sep 14 14:50:24 ci-vm-10-0-139-241 kernel: md: md0: recovery done.
Sep 14 14:50:52 ci-vm-10-0-139-241 kernel: md/raid:md0: device loop2 operational as raid disk 2
Sep 14 14:50:52 ci-vm-10-0-139-241 kernel: md/raid:md0: device loop1 operational as raid disk 1
Sep 14 14:50:52 ci-vm-10-0-139-241 kernel: md/raid:md0: device loop0 operational as raid disk 0
Sep 14 14:50:52 ci-vm-10-0-139-241 kernel: md/raid:md0: raid level 6 active with 3 out of 4 devices, algorithm 18
Sep 14 14:50:53 ci-vm-10-0-139-241 kernel: md: reshape of RAID array md0
Sep 14 14:50:53 ci-vm-10-0-139-241 systemd[1]: Started Manage MD Reshape on /dev/md0.
Sep 14 14:50:53 ci-vm-10-0-139-241 mdadm[1500]: mdadm: array: Cannot grow - need backup-file
Sep 14 14:50:53 ci-vm-10-0-139-241 mdadm[1500]: mdadm: Please provide one with "--backup=..."
Sep 14 14:50:53 ci-vm-10-0-139-241 systemd[1]: mdadm-grow-continue: Main process exited, code=exited, status=1/FAILURE
Sep 14 14:50:53 ci-vm-10-0-139-241 systemd[1]: mdadm-grow-continue: Failed with result 'exit-code'.

This happens on 8.2 as well (admittedly, I tested it on CentOS). It's not just reshaping RAID5 to RAID6 that hangs; I also hit it when adding a disk to a RAID5 array.
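For context, the `mdadm-grow-continue` unit fails above because the reshape needs its backup file passed explicitly, and in that situation the reshape can usually be resumed by hand with `mdadm --grow --continue`. A minimal sketch; the device and backup path are placeholders taken from this report, and the wrapper itself is illustrative, not part of mdadm (shown in dry-run form so it only prints the command):

```shell
#!/bin/sh
# Build the command that resumes a frozen reshape with an explicit
# backup file, mirroring what the mdadm-grow-continue service would run.
# With DRY_RUN=1 the command is printed instead of executed.
resume_reshape() {
    md_dev=$1      # e.g. /dev/md0 (placeholder)
    backup=$2      # e.g. /tmp/md0-backup-raid5 (placeholder)
    cmd="mdadm --grow --continue $md_dev --backup-file=$backup"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

DRY_RUN=1
resume_reshape /dev/md0 /tmp/md0-backup-raid5
# prints: mdadm --grow --continue /dev/md0 --backup-file=/tmp/md0-backup-raid5
```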
[root@lg02vmcentos82 ~]#
[root@lg02vmcentos82 ~]#
[root@lg02vmcentos82 ~]# mdadm --create /dev/md0 --level 5 -n 4 /dev/sd[bcde]
mdadm: partition table exists on /dev/sdb
mdadm: partition table exists on /dev/sdb but will be lost or
meaningless after creating array
mdadm: partition table exists on /dev/sdc
mdadm: partition table exists on /dev/sdc but will be lost or
meaningless after creating array
mdadm: partition table exists on /dev/sdd
mdadm: partition table exists on /dev/sdd but will be lost or
meaningless after creating array
mdadm: partition table exists on /dev/sde
mdadm: partition table exists on /dev/sde but will be lost or
meaningless after creating array
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
[root@lg02vmcentos82 ~]#
[root@lg02vmcentos82 ~]#
[root@lg02vmcentos82 ~]# mdadm /dev/md0 --add /dev/sd[fg]
mdadm: added /dev/sdf
mdadm: added /dev/sdg
[root@lg02vmcentos82 ~]#
[root@lg02vmcentos82 ~]#
[root@lg02vmcentos82 ~]# cat /proc/mdstat
Personalities : [raid0] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sdg[6](S) sdf[5](S) sde[4] sdd[2] sdc[1] sdb[0]
6282240 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
[root@lg02vmcentos82 ~]#
[root@lg02vmcentos82 ~]#
[root@lg02vmcentos82 ~]# mdadm --grow /dev/md0 --backup-file=/tmp/md0-backup-raid5 --raid-devices=5
mdadm: Need to backup 6144K of critical section..
[root@lg02vmcentos82 ~]#
[root@lg02vmcentos82 ~]# cat /proc/mdstat Personalities : [raid0] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sdg[6] sdf[5](S) sde[4] sdd[2] sdc[1] sdb[0]
6282240 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
[>....................] reshape = 0.0% (1/2094080) finish=0.3min speed=104394K/sec
unused devices: <none>
[root@lg02vmcentos82 ~]#
[root@lg02vmcentos82 ~]# cat /proc/mdstat
Personalities : [raid0] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sdg[6] sdf[5](S) sde[4] sdd[2] sdc[1] sdb[0]
6282240 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
[>....................] reshape = 0.0% (1/2094080) finish=0.9min speed=37283K/sec
unused devices: <none>
[root@lg02vmcentos82 ~]#
Then wander off for a few hours:
[root@lg02vmcentos82 ~]# cat /proc/mdstat
Personalities : [raid0] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sdg[6] sdf[5](S) sde[4] sdd[2] sdc[1] sdb[0]
6282240 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
[>....................] reshape = 0.0% (1/2094080) finish=402.1min speed=86K/sec
unused devices: <none>
[root@lg02vmcentos82 ~]# date
Sun Oct 18 13:27:57 EDT 2020
[root@lg02vmcentos82 ~]# ls -l /run/mdadm/
total 4
lrwxrwxrwx. 1 root root 21 Oct 18 10:00 backup_file-md0 -> /tmp/md0-backup-raid5
-rw-------. 1 root root 53 Oct 18 09:59 map
[root@lg02vmcentos82 ~]#
[root@lg02vmcentos82 ~]# ls -l /tmp/md0-backup-raid5
-rw-------. 1 root root 6295552 Oct 18 10:00 /tmp/md0-backup-raid5
[root@lg02vmcentos82 ~]# cat /proc/mdstat
Personalities : [raid0] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sdg[6] sdf[5](S) sde[4] sdd[2] sdc[1] sdb[0]
6282240 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
[>....................] reshape = 0.0% (1/2094080) finish=415.9min speed=83K/sec
unused devices: <none>
[root@lg02vmcentos82 ~]#
I get exactly the same behaviour when converting the RAID5 to RAID6, as noted above.
My setup runs in a KVM-based virtual machine.
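A stall like the one above can be spotted mechanically from `/proc/mdstat`: the position stays at the same block while the reported speed collapses (here from ~100 MB/s to 83 K/sec). A small helper that pulls the speed figure out of mdstat-style text; this is a sketch around the stock mdstat layout, and the 1000 K/sec threshold is an arbitrary example, not anything mdadm defines:

```shell
#!/bin/sh
# Extract the reshape/resync speed (in K/sec) from mdstat-style text
# on stdin, and flag the array as stalled below a chosen threshold.
reshape_speed() {
    awk 'match($0, /speed=[0-9]+K\/sec/) {
        # RSTART/RLENGTH delimit "speed=NNNK/sec"; cut out just NNN
        print substr($0, RSTART + 6, RLENGTH - 11)
    }'
}

sample='[>....................] reshape = 0.0% (1/2094080) finish=415.9min speed=83K/sec'
speed=$(printf '%s\n' "$sample" | reshape_speed)
if [ "$speed" -lt 1000 ]; then
    echo "stalled: ${speed}K/sec"
fi
# prints: stalled: 83K/sec
```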
Description of problem:
Reshaping a 3-disk RAID5 to a 4-disk RAID6 hangs; restoring from the critical section is impossible.

Version-Release number of selected component (if applicable):
mdadm-4.1-13.el8.x86_64
kernel-4.18.0-190.3.el8.x86_64

How reproducible:
always

Steps to Reproduce:
truncate -s 1G disk1
truncate -s 1G disk2
truncate -s 1G disk3
truncate -s 1G disk4
DEVS=($(losetup --find --show disk1))
DEVS+=($(losetup --find --show disk2))
DEVS+=($(losetup --find --show disk3))
ADD=$(losetup --find --show disk4)
mdadm --create /dev/md0 --level=5 --raid-devices=3 "${DEVS[@]}"
mdadm --wait /dev/md0
mdadm /dev/md0 --add "$ADD"
mdadm --grow /dev/md0 --level=6 --raid-devices=4 --backup-file=mdadm.backup

Actual results:
Hangs at the beginning of the migration:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 loop3[4] loop2[3] loop1[1] loop0[0]
      2093056 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUU_]
      [>....................] reshape = 0.0% (1/1046528) finish=2.0min speed=8305K/sec
unused devices: <none>

Expected results:
a RAID6 array with the previously existing data

Additional info:
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 "${DEVS[@]}" $ADD --backup-file=mdadm.backup
mdadm: Failed to restore critical section for reshape, sorry.
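With the fix in place, the expected end state of the reproducer is a fully populated 4-device RAID6 (`[4/4] [UUUU]`, algorithm 2), as shown in the successful transcripts earlier. A small check that could be scripted around the reproducer; it only parses mdstat-style text, the sample lines are taken from this report, and the helper name is a hypothetical one:

```shell
#!/bin/sh
# Succeed when mdstat text on stdin shows md0 as an active raid6
# with no missing members (an unbroken [UU...U] status field).
md0_is_healthy_raid6() {
    awk '
        /^md0 : active raid6/ { seen = 1 }
        seen && /\[U+\]/      { ok = 1 }   # e.g. [UUUU]; [UUU_] does not match
        END { exit ok ? 0 : 1 }
    '
}

fixed='md0 : active raid6 loop3[4] loop2[3] loop1[1] loop0[0]
      2093056 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]'
if printf '%s\n' "$fixed" | md0_is_healthy_raid6; then
    echo "raid6 reshape complete"
fi
# prints: raid6 reshape complete
```

A degraded array from the hung case (`[4/3] [UUU_]`) fails the same check, so the helper distinguishes a finished reshape from the stuck state.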