Bug 621524 - dangling md-device-map.lock
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: mdadm
Version: 13
Hardware: All
OS: Linux
Priority: low
Severity: medium
Assigned To: Doug Ledford
QA Contact: Fedora Extras Quality Assurance
Reported: 2010-08-05 07:20 EDT by Michal Schmidt
Modified: 2010-12-07 15:14 EST (History)
Fixed In Version: mdadm-3.1.3-0.git20100804.2.fc13
Doc Type: Bug Fix
Last Closed: 2010-12-07 15:12:46 EST


Description Michal Schmidt 2010-08-05 07:20:29 EDT
Description of problem:
After a failed attempt to add a device to an active array, /dev/md/md-device-map.lock is left dangling. mdadm processes run afterwards then spin at 100% CPU, waiting for the lock to be released.

A reproducer:

for i in {0..2}; do
        dd if=/dev/zero of=/tmp/testmd$i bs=64K count=2048
        losetup /dev/loop$i /tmp/testmd$i
done
mdadm --create /dev/md/testraid --level=5 --metadata=0.90 --raid-devices=3 /dev/loop{0..2}
# give it no time to resync:
mdadm --stop /dev/md/testraid
# now try incremental reassembling:
for i in {0..2}; do
        mdadm -I /dev/loop$i
        ls -l /dev/md/*.lock
done


Version-Release number of selected component (if applicable):
mdadm-3.1.3-0.git20100722.2.fc13.x86_64

How reproducible:
always

Steps to Reproduce:
1. Run the reproducer script
  
Actual results:
mdadm: array /dev/md/testraid started.
mdadm: stopped /dev/md/testraid
mdadm: /dev/loop0 attached to /dev/md/127, not enough to start (1).
ls: cannot access /dev/md/*.lock: No such file or directory
mdadm: /dev/loop1 attached to /dev/md/127, which has been started.
ls: cannot access /dev/md/*.lock: No such file or directory
mdadm: not adding /dev/loop2 to active array (without --run) /dev/md/127
-rw-------. 1 root root 0 Aug  5 13:15 /dev/md/md-device-map.lock

After the failed attempt to add /dev/loop2, the lock file was left behind.

Expected results:
The lock file must be gone after mdadm exits.
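
To check the expected result after running the reproducer, a small helper along these lines may be useful. It is only a sketch: the function name lock_is_dangling is made up for illustration, and "no mdadm process is running" is used as a rough proxy for "nobody holds the lock", which is an assumption rather than something mdadm guarantees.

```shell
# Succeed if LOCKFILE exists while no process named PROCNAME is running.
# Assumption: a running process of that name is the only possible lock owner.
lock_is_dangling() {
    lockfile=$1      # e.g. /dev/md/md-device-map.lock
    procname=$2      # e.g. mdadm
    [ -e "$lockfile" ] || return 1          # no file, so nothing is dangling
    if pgrep -x "$procname" >/dev/null 2>&1; then
        return 1                            # a possible owner is still alive
    fi
    return 0
}

if lock_is_dangling /dev/md/md-device-map.lock mdadm; then
    echo "dangling lock file: /dev/md/md-device-map.lock"
fi
```

Run it after the last "mdadm -I" in the reproducer; on an affected build it should report the leftover file.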
Comment 1 Doug Ledford 2010-08-05 10:16:06 EDT
This is a known issue fixed in the mdadm-3.1.3-0.git20100804.1 and later builds.  A push of this later build to testing is forthcoming.
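
For anyone wanting to check whether an installed build is at least the git20100804 snapshot named above, the comparison can be sketched as follows. The helper names snapshot_date and has_lock_fix are hypothetical, and the check only understands package names that carry a ".gitYYYYMMDD." tag; any other naming scheme is reported as unknown rather than guessed at.

```shell
# Print the 8-digit git snapshot date found in a package name, e.g.
# mdadm-3.1.3-0.git20100804.2.fc13 -> 20100804. Prints nothing if absent.
snapshot_date() {
    printf '%s\n' "$1" | sed -n 's/.*\.git\([0-9]\{8\}\)\..*/\1/p'
}

# Succeed if the package is a git snapshot dated 2010-08-04 or later.
# Limitation: packages without a snapshot tag fail the check ("unknown").
has_lock_fix() {
    snap=$(snapshot_date "$1")
    [ -n "$snap" ] && [ "$snap" -ge 20100804 ]
}

for pkg in mdadm-3.1.3-0.git20100722.2.fc13.x86_64 mdadm-3.1.3-0.git20100804.2.fc13; do
    if has_lock_fix "$pkg"; then
        echo "$pkg: has the fix"
    else
        echo "$pkg: older snapshot or unknown"
    fi
done
```

On a live system the argument would typically be the output of "rpm -q mdadm".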
Comment 2 Fedora Update System 2010-08-05 10:25:19 EDT
mdadm-3.1.3-0.git20100804.2.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc13
Comment 3 Fedora Update System 2010-08-05 10:25:55 EDT
mdadm-3.1.3-0.git20100804.2.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc12
Comment 4 Fedora Update System 2010-08-05 10:26:34 EDT
mdadm-3.1.3-0.git20100804.2.fc14 has been submitted as an update for Fedora 14.
http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc14
Comment 5 Michal Schmidt 2010-08-05 11:01:22 EDT
With mdadm-3.1.3-0.git20100804.2.fc13 I can still see the file /dev/md/md-device-map.lock is present after the test is over.
But now it does not prevent a follow-up "mdadm -S /dev/md127" from completing successfully (and deleting the lock file afterwards).

Not sure if this is exactly the expected behaviour, but it is usable.
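
When testing a follow-up command like the one above on a build that may still have the bug, it can help to run it under a time limit so that a spinning mdadm does not hang the test. The following is a minimal sketch, assuming the coreutils "timeout" utility is available; run_bounded is a made-up wrapper name.

```shell
# Run a command with a time limit and report when the limit was hit.
# Assumption: coreutils `timeout` is installed; it exits with status 124
# when it has to kill the command.
run_bounded() {
    limit=$1; shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Example (needs root and an assembled array, so it is left commented out):
# run_bounded 30 mdadm -S /dev/md127
```

A return status of 124 then indicates that the command was still running when the limit expired, which on an affected build would be consistent with the lock spin described in this report.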
Comment 6 Doug Ledford 2010-08-05 11:21:33 EDT
It is the expected behaviour. There is nothing we can do about a dangling lock file left by an interrupted command. Think of a segv or similar: the lock file will be left behind whether or not we have a signal handler that should clean it up, because on fatal errors like that the signal handler is never run. So subsequent runs must be able to deal with a dangling lock. The new code does exactly that.
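
One common way to make later runs tolerate a leftover lock file is a kernel advisory lock, shown here with the util-linux flock(1) utility. This report does not say how the mdadm fix is implemented, so the sketch below only illustrates the general idea: the lock is tied to an open file descriptor and is released by the kernel when the holder exits for any reason, so the mere presence of the file blocks nobody. The name with_lock and the temporary file are hypothetical.

```shell
# Run a command while holding an advisory lock on LOCKFILE.
# The lock belongs to file descriptor 9, not to the file's existence, so a
# file left behind by a crashed run does not block later callers.
with_lock() {
    lockfile=$1; shift
    (
        # Wait at most 5 seconds for the lock instead of spinning forever.
        flock -w 5 9 || exit 1
        "$@"
    ) 9>"$lockfile"
}

# Simulate a dangling lock file left by a crashed process, then lock anyway.
stale=$(mktemp)
with_lock "$stale" echo "got lock despite leftover file"
rm -f "$stale"
```

The bounded wait is a design choice for the sketch: a caller that cannot get the lock fails after a few seconds instead of consuming CPU indefinitely.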
Comment 7 Fedora Update System 2010-08-05 19:29:33 EDT
mdadm-3.1.3-0.git20100804.2.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with:
 su -c 'yum --enablerepo=updates-testing update mdadm'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc12
Comment 8 Fedora Update System 2010-08-05 19:53:05 EDT
mdadm-3.1.3-0.git20100804.2.fc13 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with:
 su -c 'yum --enablerepo=updates-testing update mdadm'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc13
Comment 9 Fedora Update System 2010-08-09 21:30:09 EDT
mdadm-3.1.3-0.git20100804.2.fc14 has been pushed to the Fedora 14 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with:
 su -c 'yum --enablerepo=updates-testing update mdadm'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc14
Comment 10 Mike Gahagan 2010-11-22 16:55:33 EST
Anaconda, when installing from a USB live image, seems to trigger this behavior while looking for storage devices. When installing to a system that has a 3-disk RAID 5 array (0.90 MD on-disk format), I had to kill the mdadm process manually to get the installer to continue (the RAID array contains data only, so it isn't needed for booting at all).

After installation I updated to mdadm-3.1.3-0.git20100804.2.fc14, which seemed to fix all the issues I had seen post-install (mostly related to bz 650803, I believe).
Comment 11 Fedora Update System 2010-12-07 15:12:04 EST
mdadm-3.1.3-0.git20100804.2.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 12 Fedora Update System 2010-12-07 15:14:01 EST
mdadm-3.1.3-0.git20100804.2.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.
