Red Hat Bugzilla – Bug 107988
mdadm --monitor rejects Event: line from /proc/mdstat
Last modified: 2007-11-30 17:06:59 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031023
Description of problem:
mdadm fails to ignore the Event: line present in the kernel's /proc/mdstat
output, which looks like this:
Personalities : [raid1]
read_ahead 1024 sectors
md11 : active raid1 [dev 08:01] hdc1
The Event: line seems to be specific to the Red Hat Enterprise Linux kernel; I
can't see it in the kernels shipped with other releases such as Red Hat Linux
or Fedora Core. If RAID monitoring is enabled, this error line is printed to
the tty in which the mdmonitor service was started every few minutes.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. service mdmonitor start
Actual Results: Starting mdmonitor: mdadm: bad /proc/mdstat line starts: Event:
Expected Results: This line should probably be ignored, although it could be
used to skip re-checking if it hasn't changed since the last check.
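The fix being asked for amounts to skipping line types the parser does not recognize instead of erroring out. A minimal sketch of that tolerant strategy, in hypothetical Python rather than mdadm's actual C code (the "Event: 1" value in the comment below is an invented placeholder, since the exact format of that line is not shown in this report):

```python
# Known top-level prefixes in /proc/mdstat output.
KNOWN_PREFIXES = ("Personalities", "read_ahead", "unused devices")

def parse_mdstat(text):
    """Parse /proc/mdstat content, silently skipping unknown line types.

    Array lines ("md1 : active ...") start a new entry; indented lines are
    treated as detail for the current array; any other top-level line the
    parser does not recognize (e.g. a hypothetical "Event: 1") is ignored
    rather than treated as an error, which is what this report asks for.
    """
    arrays = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None
            continue
        if line[0].isspace():
            # Continuation line (block counts, recovery progress, ...).
            if current:
                arrays[current].append(line.strip())
            continue
        if line.startswith("md") and " : " in line:
            name, rest = line.split(" : ", 1)
            current = name.strip()
            arrays[current] = [rest.strip()]
        elif not line.startswith(KNOWN_PREFIXES):
            # Unknown top-level line: skip it instead of failing.
            continue
    return arrays
```

The point of the sketch is only the last branch: an unrecognized line type falls through harmlessly instead of producing "bad /proc/mdstat line starts: ...".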
I'm seeing the same problem, which makes monitoring less than helpful. I
just checked the mdadm-1.4.0 code and don't see anything there to handle
this Event: line.
Also, /usr/sbin/handle-mdadm-events is shown in the example
mdadm.conf, but there is no such program.
The combination of kernel-2.4.21-6.EL and mdadm-1.4.0-1, as found in the
RHEL3-Update-Beta1 channel, still seems to exhibit this bug.
I am particularly worried that this has escaped notice somewhere, because
the mdadm-1.4.0 build date is 17Nov03, roughly 3 weeks after this bug was
filed. This may also have slipped past RH9 QA, because the problem occurs
only when running on the enterprise kernel (standard Red Hat kernels do
not have the Event: line).
I would suggest the SEVERITY be set to HIGH, because mdmonitor does
NOT WORK AT ALL due to this problem.
The problem still persists with kernel-2.4.21-9.EL and mdadm-1.4.0-1
from Quarterly Update #1.
Slight correction to Mario Lorenz: mdadm --monitor *does* work in
spite of this warning. Basically, this is an annoying cosmetic bug,
it does *not* keep things from working. Case in point:
[root@dledford root]# cat /proc/mdstat
Personalities : [raid5]
read_ahead 1024 sectors
md1 : active raid5 sdf1 sde1 sdg1 sdd1 sdc1
1638144 blocks level 5, 64k chunk, algorithm 0 [5/5] [UUUUU]
md0 : active raid5 sde2 sdf2 sdg2 sdd2 sdc2 sdb2 sda2
104196864 blocks level 5, 64k chunk, algorithm 0 [7/6] [UUUUU_U]
[===>.................] recovery = 17.0% (2966244/17366144)
unused devices: <none>
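The [7/6] [UUUUU_U] token and the recovery percentage in the output above are exactly what a monitor keys on: 7 devices configured, 6 active, with the underscore marking the failed slot. A hedged sketch of extracting that state (hypothetical helper functions, not mdadm's own code):

```python
import re

def array_status(detail_line):
    """Extract (configured, active, per-slot up/down flags) from a status
    line such as '... algorithm 0 [7/6] [UUUUU_U]'. Returns None if the
    line carries no such token."""
    m = re.search(r"\[(\d+)/(\d+)\]\s*\[([U_]+)\]", detail_line)
    if not m:
        return None
    configured, active = int(m.group(1)), int(m.group(2))
    slots = [c == "U" for c in m.group(3)]  # True = device up
    return configured, active, slots

def recovery_percent(progress_line):
    """Extract the percentage from a progress line such as
    '[===>....] recovery = 17.0% (2966244/17366144)'."""
    m = re.search(r"recovery\s*=\s*([\d.]+)%", progress_line)
    return float(m.group(1)) if m else None
```

Comparing consecutive snapshots of this state is enough to detect the Fail event reported in the email below; the Event(s) counter the kernel exports would merely let a monitor skip re-parsing when nothing has changed.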
Notice the mdadm rpm version and the presence of the Event line in the
/proc/mdstat file. Here's the email I got from mdadm this morning:
From: mdadm monitoring <root@dledford>
Subject: Fail event on /dev/md0:dledford
Date: Mon, 23 Feb 2004 05:47:57 -0500
This is an automatically generated mail message from mdadm
running on dledford
A Fail event had been detected on md device /dev/md0.
It could be related to component device /dev/sde2.
Faithfully yours, etc.
So, just to set everyone at ease, this is *not* a functional problem,
just cosmetic, so priority doesn't need to be HIGH. As far as the
Event line is concerned, that may only be in Red Hat kernels in the
2.4 kernel series, but it's also in 2.6 kernels. It was an upstream
change that came from the md code maintainer.
In any case, mdadm-1.5.0-1 (which solves the Event issue) has been
built. I'll submit it for possible inclusion in the next update. If
it doesn't go through, then I'll make it available elsewhere.
In case anybody is interested, I made my own build of version 1.5.0. This
build is based on the RHEL3 errata (mdadm-1.4.0 plus a patch fixing a
problem with the recovery thread sleeping in mdmpd):
The release number is zero to allow a regular update of this package by
Red Hat's mdadm-1.5.0-1 (their next possible update through RHN).
I stand corrected. Yes, it does indeed work, provided mdadm.conf has the
correct devices in there, and not the /dev/loop devices I used for some
earlier tests....
Hmm, my issue is different, then. mdadm fails even in cases where
mdadm.conf is not necessary. It works on RH9 and FC1, but not on RHEL3.
An errata has been issued which should help the problem described in this bug report.
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen
this bug report if the solution does not work for you.