
Bug 2203859

Summary: mdadm: add support for transient devices
Product: Red Hat Enterprise Linux 9
Reporter: Nigel Croxon <ncroxon>
Component: mdadm
Assignee: XiaoNi <xni>
Status: CLOSED ERRATA
QA Contact: Fine Fan <ffan>
Severity: unspecified
Docs Contact:
Priority: high
Version: 9.3
CC: cwei, ffan, ncroxon
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: mdadm-4.2-9.el9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-11-07 08:54:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Nigel Croxon 2023-05-15 13:17:16 UTC
Description of problem:


Currently, MD’s userspace program, mdadm, can detect an I/O failure on a member device. When it does, it marks the affected device(s) as faulty, and the MD array enters a degraded state. The array continues to work as long as the number of remaining devices meets the minimum required to keep it alive.
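
As a rough illustration of the degraded state described above, the kernel reports how many members an array is missing in the sysfs attribute md/degraded. The helper name md_degraded_count and the SYSFS_ROOT override are hypothetical, introduced only so the sketch can be tried against a scratch directory.

```shell
# Hypothetical helper (not part of mdadm): print how many member devices
# the array named $1 (for example "md0") is currently missing.
# SYSFS_ROOT is an assumption of this sketch; it defaults to /sys.
md_degraded_count() {
    local md_name="$1"
    local attr="${SYSFS_ROOT:-/sys}/block/${md_name}/md/degraded"
    # Fail quietly if the attribute cannot be read.
    [ -r "$attr" ] || return 1
    # 0 means no members are missing; anything higher means degraded.
    cat "$attr"
}
```

Calling md_degraded_count md0 on a healthy array should print 0.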

The MD array then requires manual interaction to resolve this situation.

If the device had a temporary failure, for example a lost connection to the storage array, it can be re-integrated into the degraded MD array.
If the device had a permanent failure, it needs to be replaced with a spare device.
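
The two manual paths above can be sketched with standard mdadm manage-mode options. The function names md_readd and md_replace are hypothetical, and the device names in the usage comments are placeholders. The replacement path assumes the kernel still lists the old member as faulty so that --remove can succeed.

```shell
# MDADM is overridable, as in the script attached to this bug.
MDADM="${MDADM:-/sbin/mdadm}"

# Transient failure: the same device is back, so re-add it.
# usage: md_readd /dev/md0 /dev/sdb
md_readd() {
    "$MDADM" --manage "$1" --re-add "$2"
}

# Permanent failure: drop the dead member, then add a replacement.
# usage: md_replace /dev/md0 /dev/sdb /dev/sdc
md_replace() {
    "$MDADM" --manage "$1" --remove "$2" &&
    "$MDADM" --manage "$1" --add "$3"
}
```

Setting MDADM=echo before calling either function prints the commands instead of running them, which is a convenient dry run.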


The motivation for this functionality is to check the box for the customer when we say that Red Hat Enterprise Linux has the ability to automatically add transient devices back into an existing MD array.


The solution is a udev script that recognizes a newly added block device and checks whether it was a member of an MD array. If it was, the script adds it back in.

Required Files
/lib/udev/rules.d/66-md-auto-re-add.rules
/sbin/md_raid_auto_readd.sh


Requires MD Array to have a bitmap
	mdadm -CR /dev/md0 -l1 -n2 /dev/sd[ab] --bitmap=internal
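
One possible way to confirm the bitmap requirement on an existing array is to read the sysfs attribute md/bitmap/location, which holds "none" when no write-intent bitmap is configured. The helper name md_has_bitmap and the SYSFS_ROOT override are hypothetical and exist only for this sketch.

```shell
# Hypothetical helper: succeed if the array named $1 (for example "md0")
# has a write-intent bitmap. SYSFS_ROOT defaults to /sys and can be
# pointed at a scratch directory for testing.
md_has_bitmap() {
    local md_name="$1"
    local attr="${SYSFS_ROOT:-/sys}/block/${md_name}/md/bitmap/location"
    local location
    [ -r "$attr" ] || return 1
    location=$(cat "$attr")
    # "none" (or an empty value) means there is no bitmap.
    [ -n "$location" ] && [ "$location" != "none" ]
}
```

For example, "md_has_bitmap md0 || echo 'md0 has no bitmap'" warns before relying on automatic re-add.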

Comment 1 Nigel Croxon 2023-05-15 13:21:38 UTC
/sbin/md_raid_auto_readd.sh

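#!/usr/bin/bash
# Re-add a returning block device to the MD array it used to belong to.
MDADM=/sbin/mdadm
DEVNAME="$1"

# Import the MD_* variables reported for the device's RAID superblock.
# The command substitution is left unquoted on purpose so that each
# KEY=VALUE line becomes a separate argument to export.
export $(${MDADM} --examine --export "${DEVNAME}")
if [ -z "${MD_UUID}" ]; then
     exit 1
fi

# Find the running array with the same UUID through its by-id symlink.
UUID_LINK=$(readlink "/dev/disk/by-id/md-uuid-${MD_UUID}")
MD_DEVNAME=${UUID_LINK##*/}
# Import the MD_* variables of the array; bail out unless MD_METADATA is set.
export $(${MDADM} --detail --export "/dev/${MD_DEVNAME}")
if [ -z "${MD_METADATA}" ] ; then
     exit 1
fi

${MDADM} --manage "/dev/${MD_DEVNAME}" --re-add "${DEVNAME}" --verbose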
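
The script relies on two small mechanisms: reading KEY=VALUE lines from the --export output, and trimming a symlink target down to a device name. The sketch below isolates both so they can be tried without an array. The helper names are hypothetical, and the symlink target "../../md0" is only a typical value, not one taken from this bug.

```shell
# Hypothetical helper: print the value of variable $1 from KEY=VALUE
# lines on stdin, such as the output of "mdadm --examine --export".
# Prints nothing if the variable is absent.
md_export_value() {
    sed -n "s/^$1=//p" | head -n 1
}

# Hypothetical helper: reduce a symlink target such as "../../md0" to
# the bare device name, as the script does with ${UUID_LINK##*/}.
md_link_basename() {
    local target="$1"
    printf '%s\n' "${target##*/}"
}
```

Reading a single variable this way avoids exporting every MD_* name into the environment, at the cost of one extra mdadm call per variable.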

Comment 2 Nigel Croxon 2023-05-15 13:22:04 UTC
#
# Enable/Disable - default is Disabled
# to disable this rule, GOTO="md_end" should be the first active command.
# to enable this rule, comment out GOTO="md_end".
GOTO="md_end"

# Required: MD arrays must have a bitmap for transient devices to
# be added back in the array.
# mdadm -CR /dev/md0 -l1 -n2 /dev/sd[ab] --bitmap=internal

# Don't process any events if anaconda is running as anaconda brings up
# raid devices manually
ENV{ANACONDA}=="?*", GOTO="md_end"

# Also don't process disks that are slated to be a multipath device
ENV{DM_MULTIPATH_DEVICE_PATH}=="1", GOTO="md_end"

# We process add events on block devices (since they are ready as soon as
# they are added to the system), but we must process change events as well
# on any dm devices (like LUKS partitions or LVM logical volumes) and on
# md devices because both of these first get added, then get brought live
# and trigger a change event.  The reason we don't process change events
# on bare hard disks is because if you stop all arrays on a disk, then
# run fdisk on the disk to change the partitions, when fdisk exits it
# triggers a change event, and we want to wait until all the fdisks on
# all member disks are done before we do anything.  Unfortunately, we have
# no way of knowing that, so we just have to let those arrays be brought
# up manually after fdisk has been run on all of the disks.

# First, process all add events (md and dm devices will not really do
# anything here, just regular disks, and this also won't get any imsm
# array members either)

ACTION!="add", GOTO="md_end"
ENV{ID_FS_TYPE}!="linux_raid_member", GOTO="md_end"
SUBSYSTEM=="block", ACTION=="add", RUN{program}+="/sbin/md_raid_auto_readd.sh $devnode" 

#
# Land here to exit cleanly
LABEL="md_end"
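
Because the rule ships disabled, it can help to check which state a given copy of the rules file is in. The sketch below treats the file as disabled when its first active line (not blank, not a comment) is GOTO="md_end", as the header comment of the rules file describes. The helper name md_rule_is_disabled is hypothetical.

```shell
# Hypothetical helper: succeed if the rules file $1 is still in its
# default disabled state, i.e. its first active (non-comment,
# non-blank) line is GOTO="md_end".
md_rule_is_disabled() {
    local rules_file="$1"
    local first
    # Drop comments and blank lines, keep the first remaining line,
    # and strip trailing whitespace before comparing.
    first=$(grep -Ev '^[[:space:]]*(#|$)' "$rules_file" | head -n 1 |
            sed 's/[[:space:]]*$//')
    [ "$first" = 'GOTO="md_end"' ]
}
```

For example, "md_rule_is_disabled /lib/udev/rules.d/66-md-auto-re-add.rules && echo disabled". After editing the file, "udevadm control --reload" asks udev to pick up the change.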

Comment 7 errata-xmlrpc 2023-11-07 08:54:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (mdadm bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6651