816527 – after check md2 mdmonitor enters "failed state"

Keywords:
Status:	CLOSED DUPLICATE of bug 817023
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	mdadm
Sub Component:
Version:	16
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Jes Sorensen
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-04-26 10:35 UTC by Harald Reindl
Modified:	2012-05-04 13:23 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-05-04 13:16:09 UTC
Type:	Bug
Embargoed:
Dependent Products:

Description Harald Reindl 2012-04-26 10:35:09 UTC

The following has happened for months at each weekly raid-check.
The only reason "mdmonitor" does not stop completely is my modified unit:

[Unit]
Description=Software RAID monitoring and management
After=syslog.target
ConditionPathExists=/etc/mdadm.conf
[Service]
Type=forking
PIDFile=/var/run/mdadm/mdadm.pid
EnvironmentFile=-/etc/sysconfig/mdmonitor
ExecStart=/sbin/mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid
Restart=always
RestartSec=1
[Install]
WantedBy=multi-user.target
__________________

Apr 26 12:17:28 srv-rhsoft kernel: md: md2: data-check done.
Apr 26 12:17:28 srv-rhsoft systemd[1]: mdmonitor.service: main process exited, code=killed, status=6
Apr 26 12:17:29 srv-rhsoft systemd[1]: mdmonitor.service holdoff time over, scheduling restart.
Apr 26 12:17:29 srv-rhsoft systemd[1]: Unit mdmonitor.service entered failed state.
__________________
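
For readers of the log above: in the systemd message, "code=killed" together with "status=6" appears to mean that the main process was terminated by signal number 6, which is SIGABRT on Linux. The following is only a minimal sketch for decoding such a line. The regular expression and the helper name describe_exit are assumptions made for illustration, and the message wording may differ between systemd versions.

```python
import re
import signal

# Pattern for the systemd message quoted in the log above. The exact wording
# is an assumption based on that single line and may vary by systemd version.
EXIT_RE = re.compile(
    r"(?P<unit>\S+\.service): main process exited, "
    r"code=(?P<code>\w+), status=(?P<status>\d+)"
)


def describe_exit(line):
    """Return (unit, code, detail) for a matching log line, else None.

    describe_exit is a hypothetical helper, not part of systemd or mdadm.
    """
    match = EXIT_RE.search(line)
    if match is None:
        return None
    code = match.group("code")
    status = int(match.group("status"))
    if code in ("killed", "dumped"):
        # For a signal-terminated process, status holds the signal number.
        detail = signal.Signals(status).name
    else:
        detail = "exit status %d" % status
    return match.group("unit"), code, detail


line = ("Apr 26 12:17:28 srv-rhsoft systemd[1]: mdmonitor.service: "
        "main process exited, code=killed, status=6")
print(describe_exit(line))
```

An abort signal would be consistent with mdadm terminating itself after an internal error rather than being stopped from outside, but the log alone does not show the cause.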

[root@srv-rhsoft:~]$ cat /proc/mdstat 
Personalities : [raid10] [raid1] 
md0 : active raid1 sdd1[3] sda1[4] sdc1[0] sdb1[5]
      511988 blocks super 1.0 [4/4] [UUUU]
      
md1 : active raid10 sda2[4] sdd2[3] sdb2[5] sdc2[0]
      30716928 blocks super 1.1 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md2 : active raid10 sda3[4] sdd3[3] sdb3[5] sdc3[0]
      3875222528 blocks super 1.1 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 2/29 pages [8KB], 65536KB chunk

unused devices: <none>
[root@srv-rhsoft:~]$ df
Filesystem     Type      Size    Used Avail  Use% Mounted on
/dev/md1       ext4       30G    6,8G   23G   24% /
/dev/md0       ext4      497M     48M  450M   10% /boot
/dev/md2       ext4      3,7T    1,4T  2,3T   39% /mnt/data

Comment 1 Jes Sorensen 2012-05-04 13:16:09 UTC

Hi,

This sounds like the problem fixed in
https://bugzilla.redhat.com/show_bug.cgi?id=817023

817023 is against Fedora 17, but I pushed the same update out for Fedora 16.
Please try mdadm-3.2.3-9 currently available in updates-testing:

https://admin.fedoraproject.org/updates/FEDORA-2012-7145/mdadm-3.2.3-9.fc16

Thanks,
Jes

*** This bug has been marked as a duplicate of bug 817023 ***

Comment 2 Harald Reindl 2012-05-04 13:19:55 UTC

Confirmed: fixed with mdadm-3.2.3-9.fc16.x86_64.
I saw the Koji build some days ago, and Thursday/Friday
are my "raid-check days" on 3 machines. Two of them had no
problem this time; one is still running.

Comment 3 Jes Sorensen 2012-05-04 13:23:49 UTC

Excellent!

If you could leave karma on the Fedora page once you've confirmed the third
one completed OK, that would be great.

Thanks,
Jes
