Red Hat Bugzilla – Bug 107988
mdadm --monitor rejects Event: line from /proc/mdstat
Last modified: 2007-11-30 17:06:59 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031023
Description of problem:
mdadm fails to ignore the Event: line present in the kernel's /proc/mdstat
output, which looks like this:
Personalities : [raid1]
read_ahead 1024 sectors
md11 : active raid1 [dev 08:01] hdc1
The Event: line seems to be specific to the Red Hat Enterprise Linux kernel; I
can't see it in the kernels shipped with other releases such as Red Hat Linux
or Fedora Core. If RAID monitoring is enabled, this error line is printed to
the tty in which the mdmonitor service was started every few minutes.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. service mdmonitor start
Actual Results: Starting mdmonitor: mdadm: bad /proc/mdstat line starts: Event:
Expected Results: This line should probably be ignored, although it could be
used to skip re-checking if it hasn't changed since the last check.
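The fix being asked for amounts to skipping line types the parser does not recognize instead of erroring out. A minimal sketch of that tolerant strategy, in hypothetical Python rather than mdadm's actual C code (the "Event: 1" value in the comment below is an invented placeholder, since the exact format of that line is not shown in this report):

```python
# Known top-level prefixes in /proc/mdstat output.
KNOWN_PREFIXES = ("Personalities", "read_ahead", "unused devices")

def parse_mdstat(text):
    """Parse /proc/mdstat content, silently skipping unknown line types.

    Array lines ("md1 : active ...") start a new entry; indented lines are
    treated as detail for the current array; any other top-level line the
    parser does not recognize (e.g. a hypothetical "Event: 1") is ignored
    rather than treated as an error, which is what this report asks for.
    """
    arrays = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None
            continue
        if line[0].isspace():
            # Continuation line (block counts, recovery progress, ...).
            if current:
                arrays[current].append(line.strip())
            continue
        if line.startswith("md") and " : " in line:
            name, rest = line.split(" : ", 1)
            current = name.strip()
            arrays[current] = [rest.strip()]
        elif not line.startswith(KNOWN_PREFIXES):
            # Unknown top-level line: skip it instead of failing.
            continue
    return arrays
```

The point of the sketch is only the last branch: an unrecognized line type falls through harmlessly instead of producing "bad /proc/mdstat line starts: ...".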
I'm seeing the same problem, which makes monitoring less than helpful. I
just checked the mdadm-1.4.0 code and don't see anything there to handle
this Event: line.
Also, /usr/sbin/handle-mdadm-events is shown in the example
mdadm.conf, but there is no such program.
The combination of kernel-2.4.21-6.EL and mdadm-1.4.0-1, as found in the
RHEL3-Update-Beta1 channel, still seems to exhibit this bug.
I am particularly worried that this has escaped notice somewhere, because
the mdadm-1.4.0 build date is 17Nov03, roughly 3 weeks after this bug was
filed. This may also have slipped past RH9 QA, because the problem occurs
only when running on the enterprise kernel (standard Red Hat kernels do
not have the Event: line).
I would suggest the SEVERITY be set to HIGH, because mdmonitor does
NOT WORK AT ALL due to this problem.
The problem still persists with kernel-2.4.21-9.EL and mdadm-1.4.0-1
from Quarterly Update #1.
Slight correction to Mario Lorenz: mdadm --monitor *does* work in
spite of this warning. Basically, this is an annoying cosmetic bug,
it does *not* keep things from working. Case in point:
[root@dledford root]# cat /proc/mdstat
Personalities : [raid5]
read_ahead 1024 sectors
md1 : active raid5 sdf1 sde1 sdg1 sdd1 sdc1
1638144 blocks level 5, 64k chunk, algorithm 0 [5/5] [UUUUU]
md0 : active raid5 sde2 sdf2 sdg2 sdd2 sdc2 sdb2 sda2
104196864 blocks level 5, 64k chunk, algorithm 0 [7/6] [UUUUU_U]
[===>.................] recovery = 17.0% (2966244/17366144)
unused devices: <none>
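The [7/6] [UUUUU_U] token and the recovery percentage in the output above are exactly what a monitor keys on: 7 devices configured, 6 active, with the underscore marking the failed slot. A hedged sketch of extracting that state (hypothetical helper functions, not mdadm's own code):

```python
import re

def array_status(detail_line):
    """Extract (configured, active, per-slot up/down flags) from a status
    line such as '... algorithm 0 [7/6] [UUUUU_U]'. Returns None if the
    line carries no such token."""
    m = re.search(r"\[(\d+)/(\d+)\]\s*\[([U_]+)\]", detail_line)
    if not m:
        return None
    configured, active = int(m.group(1)), int(m.group(2))
    slots = [c == "U" for c in m.group(3)]  # True = device up
    return configured, active, slots

def recovery_percent(progress_line):
    """Extract the percentage from a progress line such as
    '[===>....] recovery = 17.0% (2966244/17366144)'."""
    m = re.search(r"recovery\s*=\s*([\d.]+)%", progress_line)
    return float(m.group(1)) if m else None
```

Comparing consecutive snapshots of this state is enough to detect the Fail event reported in the email below; the Event(s) counter the kernel exports would merely let a monitor skip re-parsing when nothing has changed.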
Notice the mdadm rpm version and the presence of the Event line in the
/proc/mdstat file. Here's the email I got from mdadm this morning:
From: mdadm monitoring <root@dledford>
Subject: Fail event on /dev/md0:dledford
Date: Mon, 23 Feb 2004 05:47:57 -0500
This is an automatically generated mail message from mdadm
running on dledford
A Fail event had been detected on md device /dev/md0.
It could be related to component device /dev/sde2.
Faithfully yours, etc.
So, just to set everyone at ease, this is *not* a functional problem,
just cosmetic, so priority doesn't need to be HIGH. As far as the
Event line is concerned, that may only be in Red Hat kernels in the
2.4 kernel series, but it's also in 2.6 kernels. It was an upstream
change that came from the md code maintainer.
In any case, mdadm-1.5.0-1 (which solves the Event issue) has been
built. I'll submit it for possible inclusion in the next update. If
it doesn't go through, then I'll make it available elsewhere.
In case anybody is interested, I made my own build of version 1.5.0. This
build is based on the RHEL3 errata (mdadm-1.4.0 plus a patch fixing a
problem with the recovery thread sleeping in mdmpd):
The release number is zero to allow a regular update of this package by
Red Hat's mdadm-1.5.0-1 (their next possible update through RHN).
I stand corrected. Yes, it does indeed work, provided mdadm.conf has the
correct devices in there, and not the /dev/loop devices I used for some
earlier tests....
Hmm, my issue is different, then. mdadm fails even in cases where
mdadm.conf is not necessary. It works on RH9 and FC1, but not on RHEL3.
An errata has been issued which should help the problem described in this bug report.
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen
this bug report if the solution does not work for you.