Bug 119532

Summary: mdmonitor prevents raidstop or mdadm --stop
Product: [Fedora] Fedora
Reporter: Alexandre Oliva <oliva>
Component: mdadm
Assignee: Doug Ledford <dledford>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 2
CC: paul.clements
Hardware: i386
OS: Linux
Fixed In Version: mdadm-1.5.0-10
Doc Type: Bug Fix
Last Closed: 2004-07-16 17:02:38 UTC

Description Alexandre Oliva 2004-03-31 05:46:33 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040312

Description of problem:
mdadm --monitor --scan keeps all raid devices open (presumably in
order to catch events?), but this has the ugly side effect that one
has to stop mdmonitor in order to be able to stop a raid device that
it's monitoring.  At least mdadm --stop should somehow tell mdmonitor
to let the raid device go, but it does no such thing.
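
To check which process is holding an array open before trying to stop it, the per-process descriptor links under /proc can be scanned. The following is a minimal sketch, not part of mdadm: the function name find_holders and its proc_root parameter are made up for illustration, and it assumes a Linux /proc layout and enough privilege (typically root) to read other processes' fd directories.

```python
import os


def find_holders(device, proc_root="/proc"):
    """Return the sorted PIDs whose /proc/<pid>/fd links point at `device`."""
    holders = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we may not inspect it
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were looking
            if target == device:
                holders.add(int(entry))
    return sorted(holders)
```

Calling find_holders("/dev/md0") while mdmonitor is running should list the PID of the mdadm --monitor process if it is the one keeping the device busy.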

Version-Release number of selected component (if applicable):
mdadm-1.5.0-3

How reproducible:
Always

Steps to Reproduce:
1. Create a raid device
2. Restart mdmonitor
3. Try to stop the raid device

Actual Results:  It's reported as busy

Expected Results:  It should stop

Additional info:

Comment 1 Doug Ledford 2004-03-31 15:49:59 UTC
This will need to be worked upstream.

Comment 2 Doug Ledford 2004-04-22 22:37:32 UTC
*** Bug 121076 has been marked as a duplicate of this bug. ***

Comment 3 Doug Ledford 2004-05-14 15:41:21 UTC
This has been brought to the attention of the upstream maintainer. 
The maintainer is planning an update to the mdadm package in the near
future, and I suspect this will be fixed then.

Comment 4 Doug Ledford 2004-05-22 21:08:30 UTC
The fix for this has been identified.  I added a patch to prevent a
file descriptor leak in the --scan mode of --monitor for mdadm.

[root@test dledford]# ls /proc/4805/fd/ -l
total 0
lr-x------    1 root     root           64 May 22 16:55 0 -> /dev/null
lrwx------    1 root     root           64 May 22 16:55 1 -> /dev/console
lrwx------    1 root     root           64 May 22 16:55 2 -> /dev/console
lr-x------    1 root     root           64 May 22 16:55 3 -> /etc/mdadm.conf
lr-x------    1 root     root           64 May 22 16:55 4 -> /dev/md0
lr-x------    1 root     root           64 May 22 16:55 5 -> /dev/md2
lr-x------    1 root     root           64 May 22 16:55 6 -> /dev/md1
[root@test dledford]# rpm -q mdadm
mdadm-1.5.0-3
[root@test dledford]# rpm -Uvh /tmp/mdadm-1.5.0-8.i386.rpm 
Preparing...                ########################################### [100%]
   1:mdadm                  ########################################### [100%]
[root@test dledford]# ps axf | grep mdadm
 5802 pts/0    S      0:00                      \_ grep mdadm
 5774 ?        S      0:00 mdadm --monitor --scan -f
[root@test dledford]# ls /proc/5774/fd/ -l
total 0
l---------    1 root     root           64 May 22 16:56 0 -> /dev/null
l---------    1 root     root           64 May 22 16:56 1 -> /dev/null
l---------    1 root     root           64 May 22 16:56 2 -> /dev/null
lr-x------    1 root     root           64 May 22 16:56 3 -> /etc/mdadm.conf
[root@test dledford]# 
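
The difference between the two listings above is that the patched monitor no longer keeps /dev/md* descriptors open between checks. The sketch below illustrates that open, check, close pattern in general terms. It is not the actual mdadm patch (mdadm is written in C and the patch itself is not shown in this report); poll_once is a hypothetical name, and the "check" is reduced to whether the device could be opened at all.

```python
import os


def poll_once(devices):
    """Open each device read-only, record whether it opened, then close it."""
    status = {}
    for path in devices:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            status[path] = False  # missing device or no permission
            continue
        try:
            status[path] = True  # a real monitor would query array state here
        finally:
            os.close(fd)  # never carry the descriptor over to the next poll
    return status
```

Because every descriptor is closed before poll_once returns, a monitor built this way only holds a device open for the duration of a single check.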


Comment 5 Doug Ledford 2004-05-22 21:21:06 UTC
Correction, the 1.5.0-8 tag was already in use.  I bumped this one to
1.5.0-9.

Comment 6 Brock Organ 2004-06-11 18:02:30 UTC
Testing with the RHEL3-U2 AS product on ia64 and ppc, this still fails
with mdadm-1.5.0-9:

> # mdadm --stop /dev/md0
> mdadm: fail to stop array /dev/md0: Device or resource busy
> # ps afxw | grep $(fuser /dev/md0 \
>     | awk ' { print $2;}')
>  2880 ?        S      0:00 mdadm --monitor --scan
> #

(it does work properly for i386, x86_64, s390, s390x though ...)


Comment 7 Doug Ledford 2004-06-11 19:26:58 UTC
Please verify that this isn't a transient error (i.e., mdadm --monitor
reopens each device once every 15 seconds, checks status, then closes
the device IIRC, so it may just have been bad luck that the stop
attempt hit while the device was still open).  If it isn't transient,
then can you please attach the output of:

ls -l /proc/<pid_of_mdadm>/fd/
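
One way to tell a transient "busy" from a persistent one is to retry the stop a few times, spaced out over more than one polling interval. This is only a sketch under assumptions: stop_with_retry and mdadm_stop are hypothetical helper names, the default of 4 attempts 5 seconds apart is chosen to span the 15-second interval recalled above, and mdadm_stop relies only on the mdadm --stop invocation already shown in this report.

```python
import subprocess
import time


def mdadm_stop(device):
    """Run `mdadm --stop <device>` and report whether it exited successfully."""
    result = subprocess.run(["mdadm", "--stop", device], capture_output=True)
    return result.returncode == 0


def stop_with_retry(stop, attempts=4, delay=5.0, sleep=time.sleep):
    """Call `stop()` up to `attempts` times; return the attempt that succeeded.

    Returns None if every attempt failed, which suggests the device is held
    open persistently rather than during a brief status check.
    """
    for attempt in range(1, attempts + 1):
        if stop():
            return attempt
        if attempt < attempts:
            sleep(delay)  # wait before retrying, in case the busy state passes
    return None
```

For example, stop_with_retry(lambda: mdadm_stop("/dev/md0")) returns the number of the attempt that succeeded, or None if the array stayed busy throughout, in which case the /proc listing requested above is the next thing to look at.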

Comment 8 Brock Organ 2004-06-11 21:35:06 UTC
this appears to have been a transient error ... rebooting the machines
and trying the tests again worked without a problem ...

Comment 9 Alexandre Oliva 2004-07-16 17:02:38 UTC
Confirmed fixed in mdadm-1.5.0-10, thanks.

Comment 10 Jay Turner 2004-09-02 02:12:00 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-226.html