Bug 816527

Summary: after check md2 mdmonitor enters "failed state"
Product: Fedora
Component: mdadm
Version: 16
Status: CLOSED DUPLICATE
Reporter: Harald Reindl <h.reindl>
Assignee: Jes Sorensen <Jes.Sorensen>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: agk, dledford, Jes.Sorensen, mbroz
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Last Closed: 2012-05-04 13:16:09 UTC

Description Harald Reindl 2012-04-26 10:35:09 UTC
The following has happened for months on each weekly raid-check.
The only reason "mdmonitor" does not stop completely is my modified unit:

[Unit]
Description=Software RAID monitoring and management
After=syslog.target
ConditionPathExists=/etc/mdadm.conf
[Service]
Type=forking
PIDFile=/var/run/mdadm/mdadm.pid
EnvironmentFile=-/etc/sysconfig/mdmonitor
ExecStart=/sbin/mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid
Restart=always
RestartSec=1
[Install]
WantedBy=multi-user.target
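
The relevant change is Restart=always with RestartSec=1: when the mdadm --monitor process dies (killed with SIGABRT, status=6, as the log below shows), systemd respawns it a second later instead of leaving the service down. Whether such a respawn has happened can be checked against the PID file (a sketch, assuming the paths from the unit above):

cat /var/run/mdadm/mdadm.pid
ps -p "$(cat /var/run/mdadm/mdadm.pid)" -o pid,etime,cmd   # etime resets after each respawn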
__________________

Apr 26 12:17:28 srv-rhsoft kernel: md: md2: data-check done.
Apr 26 12:17:28 srv-rhsoft systemd[1]: mdmonitor.service: main process exited, code=killed, status=6
Apr 26 12:17:29 srv-rhsoft systemd[1]: mdmonitor.service holdoff time over, scheduling restart.
Apr 26 12:17:29 srv-rhsoft systemd[1]: Unit mdmonitor.service entered failed state.
__________________
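For anyone who wants to reproduce this without waiting for the weekly cron job, a data-check on a single array can be started by hand through the md sysfs interface (a sketch; md2 is the array from this report):

echo check > /sys/block/md2/md/sync_action
cat /sys/block/md2/md/sync_action   # "check" while running, "idle" once done
__________________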

[root@srv-rhsoft:~]$ cat /proc/mdstat 
Personalities : [raid10] [raid1] 
md0 : active raid1 sdd1[3] sda1[4] sdc1[0] sdb1[5]
      511988 blocks super 1.0 [4/4] [UUUU]
      
md1 : active raid10 sda2[4] sdd2[3] sdb2[5] sdc2[0]
      30716928 blocks super 1.1 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md2 : active raid10 sda3[4] sdd3[3] sdb3[5] sdc3[0]
      3875222528 blocks super 1.1 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 2/29 pages [8KB], 65536KB chunk

unused devices: <none>
[root@srv-rhsoft:~]$ df
Filesystem     Type     Size  Used Avail Use% Mounted on
/dev/md1       ext4      30G  6.8G   23G   24% /
/dev/md0       ext4     497M   48M  450M   10% /boot
/dev/md2       ext4     3.7T  1.4T  2.3T   39% /mnt/data

Comment 1 Jes Sorensen 2012-05-04 13:16:09 UTC
Hi,

This sounds like the problem fixed in
https://bugzilla.redhat.com/show_bug.cgi?id=817023

817023 is against Fedora 17, but I pushed the same update out for Fedora 16.
Please try mdadm-3.2.3-9 currently available in updates-testing:

https://admin.fedoraproject.org/updates/FEDORA-2012-7145/mdadm-3.2.3-9.fc16
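
The testing build can be pulled ahead of the stable push (a sketch, assuming the standard Fedora 16 yum tooling):

yum --enablerepo=updates-testing update mdadm
rpm -q mdadm   # should report mdadm-3.2.3-9.fc16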

Thanks,
Jes

*** This bug has been marked as a duplicate of bug 817023 ***

Comment 2 Harald Reindl 2012-05-04 13:19:55 UTC
Confirmed the fix with mdadm-3.2.3-9.fc16.x86_64.
I saw the Koji build some days ago, and Thursday/Friday
are my "raid-check days" on 3 machines; 2 of them had no
problem this time, one is still running.

Comment 3 Jes Sorensen 2012-05-04 13:23:49 UTC
Excellent!

If you could leave karma on the Fedora update page once you've confirmed the third
one completed OK, that would be great.
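
Karma can be left directly on the Bodhi page linked above, or (a sketch, assuming the fedora-easy-karma package is installed) from the command line:

fedora-easy-karma   # prompts for FAS credentials, then walks through installed updates-testing packages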

Thanks,
Jes