Bug 616597 - Hot-pluggable RAID components sometimes not assembled correctly
Summary: Hot-pluggable RAID components sometimes not assembled correctly
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: mdadm   
Version: 6.0
Hardware: All Linux
Target Milestone: rc
Assignee: Doug Ledford
QA Contact: Yulia Kopkova
Duplicates: 622922
Depends On:
Reported: 2010-07-20 21:31 UTC by Doug Ledford
Modified: 2018-10-27 13:15 UTC
CC: 6 users

Fixed In Version: mdadm-3.1.3-0.git20100722.1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 616596
Last Closed: 2010-11-10 21:09:45 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments

Description Doug Ledford 2010-07-20 21:31:41 UTC
+++ This bug was initially created as a clone of Bug #616596 +++

+++ This bug was initially created as a clone of Bug #600900 +++

Description of problem:
I have a number of RAID-6 volumes, each composed of partitions spread across 10 USB disks.

The disks are connected together through USB hubs, so a single USB cable connects them all to the PC.

Once the cable is connected, the 10 disks are initialized and incremental assembly of the various RAID volumes starts.

There seems to be some sort of race condition: sometimes partitions belonging to the same RAID volume are assembled into different RAID volumes, leaving two incomplete volumes and the resulting mess.

So, for example, the partitions /dev/sd[d-m]1 form one RAID-6 volume.

Incremental assembly sometimes creates two arrays, /dev/md127 and /dev/md126, composed, for example, one of only /dev/sdd1 and the other of /dev/sd[e-m]1, which of course does not work.

Version-Release number of selected component (if applicable):

How reproducible:
Intermittently. After the first hot-plug it usually seems to work better.

Steps to Reproduce:
Hot-plug the HDDs.
Wait for the assembly activity to settle.
Check /proc/mdstat.

Actual results:
Sometimes more RAID volumes than expected are assembled, and they do not work.

Expected results:
The incremental assembly should work properly

Additional info:
The volumes are PVs of an LVM VG, which means LVM also runs on hot-plug.

Hope this helps,



--- Additional comment from rmy@pobox.com on 2010-06-17 15:21:23 EDT ---

I'm seeing something similar, but with ATA devices, not USB. I have ten partitions across three ATA drives that are combined into five RAID1 volumes. Here's what they look like in F12:

Personalities : [raid1] 
md123 : active raid1 sda5[0] sdb5[1]
      31463168 blocks [2/2] [UU]
md124 : active raid1 sda6[0] sdb6[1]
      30033408 blocks [2/2] [UU]
md125 : active raid1 sda7[0] sdb7[1]
      30796480 blocks [2/2] [UU]
md126 : active raid1 sda9[0] sdc10[1]
      41953600 blocks [2/2] [UU]
md127 : active raid1 sdc12[0] sda8[1]
      30788416 blocks [2/2] [UU]
unused devices: <none>

I installed F13 onto the partitions that used to hold F11 but, as is my custom, didn't tell anaconda what to do with the RAID volumes. Later I added them to fstab in F13. Initially there were problems that I took to be due to the line 'AUTO +imsm +1.x -all' that anaconda had put in mdadm.conf. All my RAID partitions have 0.9 metadata. I commented out the AUTO line and put in ARRAY lines specifying the UUIDs of the RAID devices.
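For reference, the kind of mdadm.conf change described above would look roughly like this (the UUIDs below are placeholders for illustration, not the reporter's actual values):

```
# AUTO line that anaconda had put in, commented out as described:
#AUTO +imsm +1.x -all

# Explicit ARRAY lines pinning each volume to its UUID (placeholder values):
ARRAY /dev/md123 UUID=00000000:00000000:00000000:00000000
ARRAY /dev/md124 UUID=11111111:11111111:11111111:11111111
```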

Now I find that more often than not F13 fails to correctly assemble the arrays. F12 always succeeds. In six boots of F13 the arrays were only properly built once. The failures are all different.  Here's one example:

Personalities : [raid1] 
md127 : active (auto-read-only) raid1 sda8[1]
      30788416 blocks [2/1] [_U]
md125 : active raid1 sda7[0] sdb7[1]
      30796480 blocks [2/2] [UU]
md123 : inactive sda5[0](S)
      31463168 blocks
md126 : active (auto-read-only) raid1 sdc10[1]
      41953600 blocks [2/1] [_U]
md124 : active raid1 sdb6[1] sda6[0]
      30033408 blocks [2/2] [UU]
unused devices: <none>

I'll attach some more information in case anyone can see a pattern in this. I certainly can't.

--- Additional comment from rmy@pobox.com on 2010-06-17 15:22:42 EDT ---

Created an attachment (id=424916)
dmesg and mdstat from six boots of F13

--- Additional comment from rmy@pobox.com on 2010-06-20 12:22:23 EDT ---

I seem to have got this working more reliably by removing rd_NO_MD from the kernel line in grub.conf.  At least, I've been able to boot Fedora 13 five times now and the RAID arrays have been assembled correctly every time.

Without rd_NO_MD the arrays are assembled earlier in the boot process, though I don't know why that would make any difference.

--- Additional comment from dledford@redhat.com on 2010-07-20 17:29:19 EDT ---

@Ron: The difference you are seeing is that earlier in the boot process udev is likely processing disk add events sequentially instead of in parallel.  Evidently there is a race condition when devices are added in parallel.

OK, after some code inspection, I've found the race. Specifically, if two devices belonging to the same array are assembled in parallel, and the array is not yet listed in the md-device-map file, each parallel instance opens a lock file and then attempts to take an exclusive lock on it. One process gets the lock; the other waits. The process holding the lock adds the array, calls map_update to write out the new map entry, releases the lock, and then unlinks the lock file. The problem is that an instance already waiting on the lock doesn't care that the file has been unlinked: it acquires a lock on the now-unlinked file, while a completely different mdadm instance creates a fresh lock file and locks that instead. The result is two instances holding exclusive locks on two different lock files, both allowed to run in parallel, which produces this problem.

My solution is to change the locking mechanism to open the lock file with the flags O_CREAT and O_EXCL, which makes the open fail for any process other than the one that created the file. As long as the open fails because the file already exists, we keep retrying. Once we succeed in creating the file, we hold the lock and are free to run. The fix for this will be in the next mdadm update (mdadm-3.1.3-0.git07202010.1 or later).

Comment 2 Peter Martuccelli 2010-08-13 15:38:19 UTC
*** Bug 622922 has been marked as a duplicate of this bug. ***

Comment 4 releng-rhel@redhat.com 2010-11-10 21:09:45 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
