Bug 1268955 - Mdadm -Ss (from CLI) race with mdadm -I (triggered from udev rules)
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: mdadm
Version: 22
Assigned To: Jes Sorensen
QA Contact: Fedora Extras Quality Assurance
Reported: 2015-10-05 13:51 EDT by Jes Sorensen
Modified: 2015-10-26 09:24 EDT
Fixed In Version: mdadm-3.3.4-2.fc22
Doc Type: Bug Fix
Last Closed: 2015-10-26 09:24:12 EDT
Type: Bug


Description Jes Sorensen 2015-10-05 13:51:42 EDT
Description of problem:
Stopping volumes and containers does not always complete successfully and can lead to various kinds of unwanted MD behavior.

How reproducible:
Almost Always (more likely with faster machines)

Steps to Reproduce:
1. Create IMSM RAID
#mdadm -CR /dev/md/imsm0 -e imsm -n4 /dev/sd[a-d]
#mdadm -CR /dev/md/vol0 -l5 -n4 /dev/sd[a-d]
2. Try to stop all volumes
#mdadm -Ss
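
Since step 2 may never return, it can help to run it under a time limit when reproducing. A minimal sketch, assuming GNU coreutils `timeout` is available; the helper name run_with_deadline is made up for illustration and is not part of mdadm:

```shell
# Run a command with a time limit and report a hang instead of
# blocking the shell. run_with_deadline is a hypothetical helper.
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
run_with_deadline() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    # GNU timeout exits with 124 when the time limit was reached.
    if [ "$status" -eq 124 ]; then
        echo "HANG: '$*' did not finish within ${limit}s" >&2
    fi
    return "$status"
}

# Example (as root, on a test machine only):
#   run_with_deadline 30 mdadm -Ss
```

An exit status of 124 then indicates the hang described under "Actual results" rather than a normal mdadm failure.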

Actual results:
"mdadm -Ss" hangs (the prompt is not returned) and two mdadm processes remain running:
#ps -aux | grep mdadm
root      76003  1.2  0.0   7052   832 pts/0    S+   05:18   0:00 mdadm -Ss
root      76005  1.2  0.0   7052  1052 ?        S    05:18   0:00 /sbin/mdadm -I /dev/md127

#cat /proc/mdstat
Personalities : [raid10] [raid6] [raid5] [raid4]
md127 : inactive sda[3](S) sdb[2](S) sdc[1](S) sdd[0](S)
      12612 blocks super external:imsm

Or, in a worse scenario, a volume without a container is assembled as a result of this race; the container does not exist in the system (no mdadm process in the background):

#cat /proc/mdstat
Personalities : [raid10] [raid6] [raid5] [raid4]
md126 : active raid5 sda[3] sdb[2] sdc[1] sdd[0]
      3145728 blocks super external:/md127/0 level 5, 128k chunk, algorithm 0 [4/4] [UUUU]

After killing the hung mdadm processes and running "mdadm -Ss" once more, the volume is stopped, but after re-assembly the volume is very often failed/degraded/broken.

Expected results:
All volumes and containers are stopped successfully.
#cat /proc/mdstat
Personalities : [raid10] [raid6] [raid5] [raid4]
unused devices: <none>
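
The expected state can also be checked from a script instead of by eye. This is a rough sketch that only looks at the "mdNNN : ..." device lines shown in the /proc/mdstat output above; the function names are hypothetical:

```shell
# List md devices still present in mdstat-format text.
# Reads the file given as $1, defaulting to /proc/mdstat.
md_devices_remaining() {
    # Device lines look like "md127 : inactive ..." or "md126 : active ...".
    awk '$1 ~ /^md/ && $2 == ":" { print $1 }' "${1:-/proc/mdstat}"
}

# Succeeds only when no md device is left, as in the expected results.
all_md_stopped() {
    [ -z "$(md_devices_remaining "$1")" ]
}

# Example:
#   all_md_stopped || echo "still present: $(md_devices_remaining)"
```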

All of the above problems are caused by this part of the 65-md-incremental.rules file:
---
# In case the initramfs only started some of the arrays in our container,
# run incremental assembly on the container itself.  Note: we ran mdadm
# on the container in 64-md-raid.rules, and that's how the MD_LEVEL
# environment variable is already set.  If that disappears from the other
# file, we will need to add this line into the middle of the next rule:
#       IMPORT{program}="/sbin/mdadm -D --export $tempnode", \

SUBSYSTEM=="block", ACTION=="add|change", KERNEL=="md*", \
        ENV{MD_LEVEL}=="container", RUN+="/sbin/mdadm -I $env{DEVNAME}"

When all volumes and containers are stopped, a "change" event is generated for some reason. This event triggers "mdadm -I /dev/md{container}" while the container is still being stopped, which either blocks the stop or assembles the volume.
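
To see the "change" event while the stop is running, the udev event stream can be filtered in a second terminal. A sketch only: it assumes the usual one-line event format printed by "udevadm monitor" (source tag, timestamp, action, devpath, subsystem), and the function name md_change_events is hypothetical:

```shell
# Print the md device name for every udev "change" event read from stdin.
# Assumed input line format (as printed by "udevadm monitor"):
#   UDEV  [timestamp] change   /devices/virtual/block/md127 (block)
md_change_events() {
    awk '$1 == "UDEV" && $3 == "change" && $4 ~ /\/block\/md[0-9]+$/ {
        n = split($4, parts, "/")   # last path component is the device name
        print parts[n]
    }'
}

# Example (run while "mdadm -Ss" is executed elsewhere):
#   udevadm monitor --udev --subsystem-match=block | md_change_events
```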

This part of the udev rules file should be removed.
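
As a possible local workaround until a fixed package is installed, the rule could be commented out in an override copy of the rules file. This is an untested sketch, not the packaged fix; the function name is hypothetical, and it only prints the modified text so the original file is left untouched:

```shell
# Print a rules file with the container "mdadm -I" rule commented out.
# disable_container_incremental_rule is a hypothetical helper.
# Usage: disable_container_incremental_rule RULES_FILE
disable_container_incremental_rule() {
    awk '
        {
            buf[++n] = $0              # collect physical lines of one logical rule
            if ($0 ~ /\\$/) next       # trailing backslash: the rule continues
            rule = ""
            for (i = 1; i <= n; i++) rule = rule buf[i]
            # Match only the active rule that runs "mdadm -I" on containers.
            drop = (rule !~ /^[ \t]*#/ &&
                    index(rule, "ENV{MD_LEVEL}==\"container\"") > 0 &&
                    index(rule, "RUN+=\"/sbin/mdadm -I") > 0)
            prefix = drop ? "#" : ""
            for (i = 1; i <= n; i++) print prefix buf[i]
            n = 0
        }
        END { for (i = 1; i <= n; i++) print buf[i] }
    ' "$1"
}
```

To try it, the output could be saved under the same file name in /etc/udev/rules.d/, where a file overrides a same-named packaged copy, followed by "udevadm control --reload". The location of the packaged file (typically /usr/lib/udev/rules.d/) is an assumption and should be checked on the system in question.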
Comment 1 Fedora Update System 2015-10-05 14:18:40 EDT
mdadm-3.3.4-2.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-b58f6008ee
Comment 2 Fedora Update System 2015-10-07 12:26:44 EDT
mdadm-3.3.4-2.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update mdadm'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-b58f6008ee
Comment 3 Fedora Update System 2015-10-26 09:24:07 EDT
mdadm-3.3.4-2.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.
