Bug 816527

Summary: after check md2 mdmonitor enters "failed state"
Product: Fedora
Component: mdadm
Version: 16
Status: CLOSED DUPLICATE
Reporter: Harald Reindl <h.reindl>
Assignee: Jes Sorensen <Jes.Sorensen>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: agk, dledford, Jes.Sorensen, mbroz
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Last Closed: 2012-05-04 13:16:09 UTC

Description Harald Reindl 2012-04-26 10:35:09 UTC
The following has happened for months on each weekly raid-check.
The only reason "mdmonitor" does not stop completely is my modified unit:

[Unit]
Description=Software RAID monitoring and management
After=syslog.target
ConditionPathExists=/etc/mdadm.conf
[Service]
Type=forking
PIDFile=/var/run/mdadm/mdadm.pid
EnvironmentFile=-/etc/sysconfig/mdmonitor
ExecStart=/sbin/mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid
Restart=always
RestartSec=1
[Install]
WantedBy=multi-user.target
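
The relevant change is Restart=always with RestartSec=1: when the mdadm --monitor process dies (killed with SIGABRT, status=6, as the log below shows), systemd respawns it a second later instead of leaving the service down. Whether such a respawn has happened can be checked against the PID file (a sketch, assuming the paths from the unit above):

cat /var/run/mdadm/mdadm.pid
ps -p "$(cat /var/run/mdadm/mdadm.pid)" -o pid,etime,cmd   # etime resets after each respawn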
__________________

Apr 26 12:17:28 srv-rhsoft kernel: md: md2: data-check done.
Apr 26 12:17:28 srv-rhsoft systemd[1]: mdmonitor.service: main process exited, code=killed, status=6
Apr 26 12:17:29 srv-rhsoft systemd[1]: mdmonitor.service holdoff time over, scheduling restart.
Apr 26 12:17:29 srv-rhsoft systemd[1]: Unit mdmonitor.service entered failed state.
__________________
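For anyone who wants to reproduce this without waiting for the weekly cron job, a data-check on a single array can be started by hand through the md sysfs interface (a sketch; md2 is the array from this report):

echo check > /sys/block/md2/md/sync_action
cat /sys/block/md2/md/sync_action   # "check" while running, "idle" once done
__________________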

[root@srv-rhsoft:~]$ cat /proc/mdstat 
Personalities : [raid10] [raid1] 
md0 : active raid1 sdd1[3] sda1[4] sdc1[0] sdb1[5]
      511988 blocks super 1.0 [4/4] [UUUU]
      
md1 : active raid10 sda2[4] sdd2[3] sdb2[5] sdc2[0]
      30716928 blocks super 1.1 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md2 : active raid10 sda3[4] sdd3[3] sdb3[5] sdc3[0]
      3875222528 blocks super 1.1 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 2/29 pages [8KB], 65536KB chunk

unused devices: <none>
[root@srv-rhsoft:~]$ df
Filesystem     Type     Size  Used Avail Use% Mounted on
/dev/md1       ext4      30G  6.8G   23G   24% /
/dev/md0       ext4     497M   48M  450M   10% /boot
/dev/md2       ext4     3.7T  1.4T  2.3T   39% /mnt/data

Comment 1 Jes Sorensen 2012-05-04 13:16:09 UTC
Hi,

This sounds like the problem fixed in
https://bugzilla.redhat.com/show_bug.cgi?id=817023

817023 is against Fedora 17, but I pushed the same update out for Fedora 16.
Please try mdadm-3.2.3-9 currently available in updates-testing:

https://admin.fedoraproject.org/updates/FEDORA-2012-7145/mdadm-3.2.3-9.fc16
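
The testing build can be pulled ahead of the stable push (a sketch, assuming the standard Fedora 16 yum tooling):

yum --enablerepo=updates-testing update mdadm
rpm -q mdadm   # should report mdadm-3.2.3-9.fc16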

Thanks,
Jes

*** This bug has been marked as a duplicate of bug 817023 ***

Comment 2 Harald Reindl 2012-05-04 13:19:55 UTC
Confirmed the fix with mdadm-3.2.3-9.fc16.x86_64.
I saw the Koji build some days ago, and Thursday/Friday
are my "raid-check days" on 3 machines; 2 of them had no
problem this time, one is still running.

Comment 3 Jes Sorensen 2012-05-04 13:23:49 UTC
Excellent!

If you could leave karma on the Fedora update page once you've confirmed the third
one completed OK, that would be great.
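
Karma can be left directly on the Bodhi page linked above, or (a sketch, assuming the fedora-easy-karma package is installed) from the command line:

fedora-easy-karma   # prompts for FAS credentials, then walks through installed updates-testing packages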

Thanks,
Jes