Bug 1818931 - mdadm reshape from RAID5 to RAID6 hangs [rhel-7]
Summary: mdadm reshape from RAID5 to RAID6 hangs [rhel-7]
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: mdadm
Version: 7.8
Hardware: x86_64
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Nigel Croxon
QA Contact: Fine Fan
URL:
Whiteboard:
Depends On: 1818912 1818914
Blocks:
 
Reported: 2020-03-30 18:02 UTC by Hubert Kario
Modified: 2020-09-23 14:06 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1818914
Environment:
Last Closed: 2020-09-23 14:06:47 UTC
Target Upstream Version:
Embargoed:



Description Hubert Kario 2020-03-30 18:02:10 UTC
Description of problem:
Reshaping a 3-disk RAID5 array to a 4-disk RAID6 array hangs, and restoring from the critical section is impossible.

Version-Release number of selected component (if applicable):
mdadm-4.1-4.el7.x86_64
kernel-3.10.0-1127.el7.x86_64

How reproducible:
always

Steps to Reproduce:
truncate -s 1G disk1
truncate -s 1G disk2
truncate -s 1G disk3
truncate -s 1G disk4
DEVS=($(losetup --find --show disk1))
DEVS+=($(losetup --find --show disk2))
DEVS+=($(losetup --find --show disk3))
ADD=$(losetup --find --show disk4)
mdadm --create /dev/md0 --level=5 --raid-devices=3 "${DEVS[@]}"
mdadm --wait /dev/md0
mdadm /dev/md0 --add "$ADD"
mdadm --grow /dev/md0 --level=6 --raid-devices=4 --backup-file=mdadm.backup
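
For reference, a minimal way to watch the reshape while the grow runs (a sketch, assuming the array is /dev/md0 as created above):

watch -n1 cat /proc/mdstat    # live view of the reshape progress
mdadm --detail /dev/md0       # per-device state and reshape position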

Actual results:
It hangs at the beginning of the migration:

# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid6 loop3[4] loop2[3] loop1[1] loop0[0]
      2093056 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUU_]
      [>....................]  reshape =  0.0% (1/1046528) finish=0.0min speed=261632K/sec
      
unused devices: <none>


Expected results:
a RAID6 array with previously existing data

Additional info:
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 "${DEVS[@]}" $ADD --backup-file=mdadm.backup

mdadm: Failed to restore critical section for reshape, sorry.
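
For completeness, a sketch of how the test setup can be torn down between attempts (assuming the loop devices and backing files from the steps above):

mdadm --stop /dev/md0                         # stop the hung array if it is still assembled
losetup -d "${DEVS[@]}" "$ADD"                # detach the loop devices
rm -f disk1 disk2 disk3 disk4 mdadm.backup    # remove the backing files and the backup file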

Comment 2 Nigel Croxon 2020-04-13 18:51:55 UTC
Test version of mdadm:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=27903664

Comment 3 Hubert Kario 2020-04-14 11:11:24 UTC
mdadm-4.1-njc.el7_8.src.rpm seems to have helped:

Personalities : [raid6] [raid5] [raid4] 
md0 : active raid6 loop3[4] loop2[3] loop1[1] loop0[0]
      2093056 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      
unused devices: <none>

Comment 4 Nigel Croxon 2020-04-29 17:48:30 UTC
Issue #1
In the steps to reproduce, you must specify a full path for --backup-file= (e.g. --backup-file=/home/hkario/mdadm.backup).

Issue #2 that I am encountering is:
Apr 29 13:38:38 localhost setroubleshoot[3993]: SELinux is preventing /usr/sbin/mdadm from 'read, write' accesses on the file mdadm.backup. For complete SELinux messages run: sealert -l dbf13fb1-be67-46e0-b49a-4abfa57856a0
Apr 29 13:38:38 localhost platform-python[3993]: SELinux is preventing /usr/sbin/mdadm from 'read, write' accesses on the file mdadm.backup.

*****  Plugin catchall (100. confidence) suggests   **************************

If you believe that mdadm should be allowed read write access on the mdadm.backup file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'mdadm' --raw | audit2allow -M my-mdadm
# semodule -X 300 -i my-mdadm.pp

I executed:
ausearch -c 'mdadm' --raw | audit2allow -M my-mdadm
semodule -i my-mdadm.pp
and then followed the reproduce steps.

And the array synchronized.
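
A quick way to confirm the generated policy module is actually loaded (a sketch, using the my-mdadm module name from the commands above):

semodule -l | grep my-mdadm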

Comment 5 Nigel Croxon 2020-04-29 18:15:31 UTC
I see no code changes needed in the kernel MD driver or in userspace mdadm.

Steps to Reproduce (change the backup directory to your own):

truncate -s 1G disk1
truncate -s 1G disk2
truncate -s 1G disk3
truncate -s 1G disk4
DEVS=($(losetup --find --show disk1))
DEVS+=($(losetup --find --show disk2))
DEVS+=($(losetup --find --show disk3))
ADD=$(losetup --find --show disk4)
ausearch -c 'mdadm' --raw | audit2allow -M my-mdadm
semodule -i my-mdadm.pp
mdadm --create /dev/md0 --level=5 --raid-devices=3 "${DEVS[@]}"
mdadm --wait /dev/md0
mdadm /dev/md0 --add "$ADD"
mdadm --grow /dev/md0 --level=6 --raid-devices=4 --backup-file=/home/ncroxon/mdadm.backup

Comment 6 Nigel Croxon 2020-04-29 18:48:03 UTC
Turning off SELinux also appears to work:
# setenforce 0
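
If SELinux is suspected, a quick check of the current mode and of any mdadm denials (a sketch; ausearch requires auditd to be running):

getenforce                             # Enforcing / Permissive / Disabled
ausearch -m avc -c mdadm -ts recent    # AVC denials logged for mdadm, if any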

Comment 7 Hubert Kario 2020-04-30 10:49:45 UTC
(In reply to Nigel Croxon from comment #4)
> Issue #1
> In the steps to reproduce, you must specify a full path for
> --backup-file= (e.g. --backup-file=/home/hkario/mdadm.backup).

Then why did it work with the mdadm-4.1-njc.el7_8 package?
Passing a full path doesn't change the behaviour either; it still hangs on the first block.
 
> Issue #2 That I am encountering is:
> Apr 29 13:38:38 localhost setroubleshoot[3993]: SELinux is preventing
> /usr/sbin/mdadm from 'read, write' accesses on the file mdadm.backup. For
> complete SELinux messages run: sealert -l
> dbf13fb1-be67-46e0-b49a-4abfa57856a0
> Apr 29 13:38:38 localhost platform-python[3993]: SELinux is preventing
> /usr/sbin/mdadm from 'read, write' accesses on the file mdadm.backup.
>
> *****  Plugin catchall (100. confidence) suggests   **************************
>
> If you believe that mdadm should be allowed read write access on the
> mdadm.backup file by default.
> Then you should report this as a bug.
> You can generate a local policy module to allow this access.
> Do
> allow this access for now by executing:
> # ausearch -c 'mdadm' --raw | audit2allow -M my-mdadm
> # semodule -X 300 -i my-mdadm.pp

Then it looks to me like mdadm should verify read/write access to the backup file before starting the reshape/grow.
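
A minimal pre-flight check that can already be done by hand (a sketch, using the example backup path from comment 4; the SELinux label check is only an approximation of what mdadm itself would need to verify):

BACKUP=/home/hkario/mdadm.backup
touch "$BACKUP" && [ -w "$BACKUP" ] || echo "backup file not writable"
ls -Z "$BACKUP"    # SELinux context that mdadm will be checked against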

also, I don't see any AVC denials:

[root@ci-vm-10-0-138-143 ~]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid6 loop3[4] loop2[3] loop1[1] loop0[0]
      2093056 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUU_]
      [>....................]  reshape =  0.0% (1/1046528) finish=3.0min speed=5508K/sec
      
unused devices: <none>

[root@ci-vm-10-0-138-143 ~]# ausearch -m avc -ts today
<no matches>
 
> I executed:
> ausearch -c 'mdadm' --raw | audit2allow -M my-mdadm
> semodule -i my-mdadm.pp
> and then follow the reproduce steps.
> 
> And the array synchronized.

> turning off selinux appears to work also
> # setenforce 0

And, exactly consistent with the lack of AVC denials, disabling SELinux doesn't change anything: the process still hangs on the first block with the mdadm-4.1-5.el7.x86_64 package.

Comment 8 Nigel Croxon 2020-04-30 11:13:31 UTC
The mdadm-4.1-njc.el7_8 package is a test package that I made. It does not contain a proper fix.


If SELinux is not enabled, then mdadm should have a clear path to read/write the backup file on the filesystem.

Download the latest mdadm and retry:
http://download-node-02.eng.bos.redhat.com/nightly/RHEL-8.3.0-20200430.n.0/compose/BaseOS/x86_64/os/Packages/mdadm-4.1-13.el8.x86_64.rpm

Comment 9 Hubert Kario 2020-04-30 11:30:23 UTC
(In reply to Nigel Croxon from comment #8)
> mdadm-4.1-njc.el7_8 package is a test package that I made. It does not
> contain a proper fix.
> 
> 
> If SELinux is not enabled, then mdadm should have a clear path to read/write
> the backup file on the filesystem.
> 
> Download the latest mdadm and retry:
> http://download-node-02.eng.bos.redhat.com/nightly/RHEL-8.3.0-20200430.n.0/
> compose/BaseOS/x86_64/os/Packages/mdadm-4.1-13.el8.x86_64.rpm

[root@ci-vm-10-0-139-17 ~]# rpm -ivh mdadm-4.1-13.el8.x86_64.rpm 
warning: mdadm-4.1-13.el8.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
error: Failed dependencies:
        libc.so.6(GLIBC_2.27)(64bit) is needed by mdadm-4.1-13.el8.x86_64
        libc.so.6(GLIBC_2.28)(64bit) is needed by mdadm-4.1-13.el8.x86_64
        libreport-filesystem is needed by mdadm-4.1-13.el8.x86_64
        dracut < 034-1 conflicts with mdadm-4.1-13.el8.x86_64

Comment 10 Nigel Croxon 2020-05-01 11:55:38 UTC
A RHEL7.X version for testing.

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=28318976

Comment 11 Hubert Kario 2020-05-04 14:23:49 UTC
yes, kernel-3.10.0-1136.el7.x86_64 with mdadm-4.1-njc2.el7.x86_64 works as expected:

# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid6 loop3[4] loop2[3] loop1[1] loop0[0]
      2093056 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      
unused devices: <none>

Comment 12 Nigel Croxon 2020-05-05 19:43:22 UTC
https://www.spinics.net/lists/raid/msg64347.html

Comment 13 Nigel Croxon 2020-07-01 13:23:46 UTC
https://marc.info/?l=linux-raid&m=159195299630680&w=2

Verified the above patch fixes the hang and allows the grow to proceed.
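
A quick way to confirm that the reshape completed and the array ended up at RAID6 (a sketch, assuming /dev/md0 as in the earlier comments):

mdadm --detail /dev/md0 | grep -E 'Raid Level|Layout|State'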

