Bug 1818931 - mdadm reshape from RAID5 to RAID6 hangs [rhel-7]
Summary: mdadm reshape from RAID5 to RAID6 hangs [rhel-7]
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: mdadm
Version: 7.8
Hardware: x86_64
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Nigel Croxon
QA Contact: Fine Fan
URL:
Whiteboard:
Depends On: 1818912 1818914
Blocks:
 
Reported: 2020-03-30 18:02 UTC by Hubert Kario
Modified: 2020-09-23 14:06 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1818914
Environment:
Last Closed: 2020-09-23 14:06:47 UTC
Target Upstream Version:
Embargoed:



Description Hubert Kario 2020-03-30 18:02:10 UTC
Description of problem:
Reshaping a 3-disk RAID5 array to a 4-disk RAID6 array hangs, and restoring from the critical section is impossible.

Version-Release number of selected component (if applicable):
mdadm-4.1-4.el7.x86_64
kernel-3.10.0-1127.el7.x86_64

How reproducible:
always

Steps to Reproduce:
truncate -s 1G disk1
truncate -s 1G disk2
truncate -s 1G disk3
truncate -s 1G disk4
DEVS=($(losetup --find --show disk1))
DEVS+=($(losetup --find --show disk2))
DEVS+=($(losetup --find --show disk3))
ADD=$(losetup --find --show disk4)
mdadm --create /dev/md0 --level=5 --raid-devices=3 "${DEVS[@]}"
mdadm --wait /dev/md0
mdadm /dev/md0 --add "$ADD"
mdadm --grow /dev/md0 --level=6 --raid-devices=4 --backup-file=mdadm.backup
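
For reference, a minimal way to watch the reshape while the grow runs (a sketch, assuming the array is /dev/md0 as created above):

watch -n1 cat /proc/mdstat    # live view of the reshape progress
mdadm --detail /dev/md0       # per-device state and reshape position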

Actual results:
It hangs at the beginning of the migration:

# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid6 loop3[4] loop2[3] loop1[1] loop0[0]
      2093056 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUU_]
      [>....................]  reshape =  0.0% (1/1046528) finish=0.0min speed=261632K/sec
      
unused devices: <none>


Expected results:
a RAID6 array with previously existing data

Additional info:
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 "${DEVS[@]}" $ADD --backup-file=mdadm.backup

mdadm: Failed to restore critical section for reshape, sorry.
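
For completeness, a sketch of how the test setup can be torn down between attempts (assuming the loop devices and backing files from the steps above):

mdadm --stop /dev/md0                         # stop the hung array if it is still assembled
losetup -d "${DEVS[@]}" "$ADD"                # detach the loop devices
rm -f disk1 disk2 disk3 disk4 mdadm.backup    # remove the backing files and the backup file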

Comment 2 Nigel Croxon 2020-04-13 18:51:55 UTC
Test version of mdadm:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=27903664

Comment 3 Hubert Kario 2020-04-14 11:11:24 UTC
mdadm-4.1-njc.el7_8.src.rpm seems to have helped:

Personalities : [raid6] [raid5] [raid4] 
md0 : active raid6 loop3[4] loop2[3] loop1[1] loop0[0]
      2093056 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      
unused devices: <none>

Comment 4 Nigel Croxon 2020-04-29 17:48:30 UTC
Issue #1
In the steps to reproduce, you must specify a full path for --backup-file= (e.g. --backup-file=/home/hkario/mdadm.backup).

Issue #2 that I am encountering is:
Apr 29 13:38:38 localhost setroubleshoot[3993]: SELinux is preventing /usr/sbin/mdadm from 'read, write' accesses on the file mdadm.backup. For complete SELinux messages run: sealert -l dbf13fb1-be67-46e0-b49a-4abfa57856a0
Apr 29 13:38:38 localhost platform-python[3993]: SELinux is preventing /usr/sbin/mdadm from 'read, write' accesses on the file mdadm.backup.

*****  Plugin catchall (100. confidence) suggests   **************************

If you believe that mdadm should be allowed read write access on the mdadm.backup file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'mdadm' --raw | audit2allow -M my-mdadm
# semodule -X 300 -i my-mdadm.pp

I executed:
ausearch -c 'mdadm' --raw | audit2allow -M my-mdadm
semodule -i my-mdadm.pp
and then followed the reproduce steps.

And the array synchronized.
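
A quick way to confirm the generated policy module is actually loaded (a sketch, using the my-mdadm module name from the commands above):

semodule -l | grep my-mdadm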

Comment 5 Nigel Croxon 2020-04-29 18:15:31 UTC
I see no code changes needed in the kernel MD driver or in userspace mdadm.

Steps to Reproduce (change the backup directory to your own):

truncate -s 1G disk1
truncate -s 1G disk2
truncate -s 1G disk3
truncate -s 1G disk4
DEVS=($(losetup --find --show disk1))
DEVS+=($(losetup --find --show disk2))
DEVS+=($(losetup --find --show disk3))
ADD=$(losetup --find --show disk4)
ausearch -c 'mdadm' --raw | audit2allow -M my-mdadm
semodule -i my-mdadm.pp
mdadm --create /dev/md0 --level=5 --raid-devices=3 "${DEVS[@]}"
mdadm --wait /dev/md0
mdadm /dev/md0 --add "$ADD"
mdadm --grow /dev/md0 --level=6 --raid-devices=4 --backup-file=/home/ncroxon/mdadm.backup

Comment 6 Nigel Croxon 2020-04-29 18:48:03 UTC
Turning off SELinux also appears to work:
# setenforce 0
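
If SELinux is suspected, a quick check of the current mode and of any mdadm denials (a sketch; ausearch requires auditd to be running):

getenforce                             # Enforcing / Permissive / Disabled
ausearch -m avc -c mdadm -ts recent    # AVC denials logged for mdadm, if any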

Comment 7 Hubert Kario 2020-04-30 10:49:45 UTC
(In reply to Nigel Croxon from comment #4)
> Issue #1
> In the steps to reproduce, you must specify a full path for
> --backup-file= (e.g. --backup-file=/home/hkario/mdadm.backup).

Then why did it work with the mdadm-4.1-njc.el7_8 package?
Passing a full path doesn't change the behaviour either; it still hangs on the first block.
 
> Issue #2 That I am encountering is:
> Apr 29 13:38:38 localhost setroubleshoot[3993]: SELinux is preventing
> /usr/sbin/mdadm from 'read, write' accesses on the file mdadm.backup. For
> complete SELinux messages run: sealert -l
> dbf13fb1-be67-46e0-b49a-4abfa57856a0
> Apr 29 13:38:38 localhost platform-python[3993]: SELinux is preventing
> /usr/sbin/mdadm from 'read, write' accesses on the file mdadm.backup.
>
> *****  Plugin catchall (100. confidence) suggests   **************************
>
> If you believe that mdadm should be allowed read write access on the
> mdadm.backup file by default.
> Then you should report this as a bug.
> You can generate a local policy module to allow this access.
> Do
> allow this access for now by executing:
> # ausearch -c 'mdadm' --raw | audit2allow -M my-mdadm
> # semodule -X 300 -i my-mdadm.pp

Then it looks to me like mdadm should verify read/write access to the backup file before starting the reshape/grow.
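
A minimal pre-flight check that can already be done by hand (a sketch, using the example backup path from comment 4; the SELinux label check is only an approximation of what mdadm itself would need to verify):

BACKUP=/home/hkario/mdadm.backup
touch "$BACKUP" && [ -w "$BACKUP" ] || echo "backup file not writable"
ls -Z "$BACKUP"    # SELinux context that mdadm will be checked against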

also, I don't see any AVC denials:

[root@ci-vm-10-0-138-143 ~]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid6 loop3[4] loop2[3] loop1[1] loop0[0]
      2093056 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUU_]
      [>....................]  reshape =  0.0% (1/1046528) finish=3.0min speed=5508K/sec
      
unused devices: <none>

[root@ci-vm-10-0-138-143 ~]# ausearch -m avc -ts today
<no matches>
 
> I executed:
> ausearch -c 'mdadm' --raw | audit2allow -M my-mdadm
> semodule -i my-mdadm.pp
> and then follow the reproduce steps.
> 
> And the array synchronized.

> turning off selinux appears to work also
> # setenforce 0

And, exactly consistent with the lack of AVC denials, disabling SELinux doesn't change anything: the process still hangs on the first block with the mdadm-4.1-5.el7.x86_64 package.

Comment 8 Nigel Croxon 2020-04-30 11:13:31 UTC
The mdadm-4.1-njc.el7_8 package is a test package that I made. It does not contain a proper fix.


If SELinux is not enabled, then mdadm should have a clear path to read/write the backup file on the filesystem.

Download the latest mdadm and retry:
http://download-node-02.eng.bos.redhat.com/nightly/RHEL-8.3.0-20200430.n.0/compose/BaseOS/x86_64/os/Packages/mdadm-4.1-13.el8.x86_64.rpm

Comment 9 Hubert Kario 2020-04-30 11:30:23 UTC
(In reply to Nigel Croxon from comment #8)
> mdadm-4.1-njc.el7_8 package is a test package that I made. It does not
> contain a proper fix.
> 
> 
> If SELinux is not enabled, then mdadm should have a clear path to read/write
> the backup file on the filesystem.
> 
> Download the latest mdadm and retry:
> http://download-node-02.eng.bos.redhat.com/nightly/RHEL-8.3.0-20200430.n.0/
> compose/BaseOS/x86_64/os/Packages/mdadm-4.1-13.el8.x86_64.rpm

[root@ci-vm-10-0-139-17 ~]# rpm -ivh mdadm-4.1-13.el8.x86_64.rpm 
warning: mdadm-4.1-13.el8.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
error: Failed dependencies:
        libc.so.6(GLIBC_2.27)(64bit) is needed by mdadm-4.1-13.el8.x86_64
        libc.so.6(GLIBC_2.28)(64bit) is needed by mdadm-4.1-13.el8.x86_64
        libreport-filesystem is needed by mdadm-4.1-13.el8.x86_64
        dracut < 034-1 conflicts with mdadm-4.1-13.el8.x86_64

Comment 10 Nigel Croxon 2020-05-01 11:55:38 UTC
A RHEL7.X version for testing.

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=28318976

Comment 11 Hubert Kario 2020-05-04 14:23:49 UTC
yes, kernel-3.10.0-1136.el7.x86_64 with mdadm-4.1-njc2.el7.x86_64 works as expected:

# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid6 loop3[4] loop2[3] loop1[1] loop0[0]
      2093056 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      
unused devices: <none>

Comment 12 Nigel Croxon 2020-05-05 19:43:22 UTC
https://www.spinics.net/lists/raid/msg64347.html

Comment 13 Nigel Croxon 2020-07-01 13:23:46 UTC
https://marc.info/?l=linux-raid&m=159195299630680&w=2

Verified the above patch fixes the hang and allows the grow to proceed.
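
A quick way to confirm that the reshape completed and the array ended up at RAID6 (a sketch, assuming /dev/md0 as in the earlier comments):

mdadm --detail /dev/md0 | grep -E 'Raid Level|Layout|State'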

