Bug 433932 - RAID arrays not properly set up intermittently
Summary: RAID arrays not properly set up intermittently
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: mdadm
Version: rawhide
Hardware: i686
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Doug Ledford
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F9Blocker
 
Reported: 2008-02-22 07:23 UTC by Bruno Wolff III
Modified: 2008-04-21 21:33 UTC
CC: 3 users

Fixed In Version: mdadm-2.6.4-4
Clone Of:
Environment:
Last Closed: 2008-04-21 21:33:53 UTC
Type: ---
Embargoed:


Attachments
Console logs taken during startup (1.05 MB, application/x-bzip2)
2008-04-12 20:17 UTC, Frank Arnold
no flags
Output of dmesg (40.47 KB, text/plain)
2008-04-15 22:59 UTC, Frank Arnold
no flags
More verbose output of dmesg (49.84 KB, text/plain)
2008-04-16 11:44 UTC, Frank Arnold
no flags

Description Bruno Wolff III 2008-02-22 07:23:06 UTC
Description of problem: Occasionally, during the file system check, the boot process is halted with a complaint about a bad superblock on a RAID array. This does not happen consistently, and if I reboot enough times, eventually all of the arrays are set up correctly.


Version-Release number of selected component (if applicable):
udev-118-1.fc9.i386

How reproducible:
Roughly 50%.

Steps to Reproduce:
1. Reboot
  
Actual results: Boot halts with a complaint about a bad superblock.


Expected results:
Boot runs to completion.

Additional info: The RAID arrays are RAID 1 on PATA drives and the filesystem is ext3.

Comment 1 Harald Hoyer 2008-02-22 11:47:54 UTC
udev is doing the file system check??

Comment 2 Bill Nottingham 2008-02-22 15:20:12 UTC
Is it a specific component of the RAID array that is missing each time?

Comment 3 Bruno Wolff III 2008-02-22 16:39:19 UTC
I am not sure that udev is doing this. There is a pause at the starting udev prompt
(sometimes unusually long), and by the time the error happens, that has scrolled off
the screen, so it might be something that happens shortly afterwards. Also, the error
message appears where the check is made; the actual problem may be happening earlier.
There are messages about the various RAID arrays being started before the starting
udev message.
Different arrays have the problem on different boots. It has happened on two
different machines, but it happens a lot more at home, since I have to reboot
more often there to clear up some hardware problems (video card and multiple-CPU
related) that my old work machine doesn't have.
I haven't seen a message about the swap RAID array yet, but that isn't getting fsck'd, so it might still be having problems.

Comment 4 Bruno Wolff III 2008-03-30 16:40:27 UTC
This is still happening, but since I have been watching more carefully it has
only been happening on two /boot partitions on one machine (out of three
machines using RAID).
It seems to be happening a little less often, but that might just be statistical
variation.
I believe those two partitions have the same label, as they were both initialized
as /boot partitions (for different installs), and perhaps this has some bearing
on the problem.

Comment 5 Bruno Wolff III 2008-04-06 04:41:19 UTC
I am still seeing this with kernel 2.6.25-0.195.rc8.git1.fc9.i686 and
udev-120-1.fc9.i386.
This is the mdadm.conf file for both my system and the initrd image:
# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root
ARRAY /dev/md5 super-minor=5
ARRAY /dev/md1 super-minor=1
ARRAY /dev/md0 super-minor=0
ARRAY /dev/md2 super-minor=2
ARRAY /dev/md7 super-minor=7
ARRAY /dev/md3 super-minor=3


Comment 6 Frank Arnold 2008-04-07 04:45:51 UTC
Just a 'me too' here. As a workaround I put a '/sbin/mdadm -A -s' into
/etc/rc.sysinit right before fsck to get at least some other things tested.
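A rough sketch of that workaround, assuming it is placed immediately before the
fsck invocation in /etc/rc.sysinit and guarded the same way Fedora 8's script
was (the exact surrounding lines vary between releases):

  # Illustrative workaround only: assemble any arrays that udev missed
  # before rc.sysinit runs fsck.
  if [ -f /etc/mdadm.conf ]; then
          /sbin/mdadm -A -s
  fi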

Comment 7 Bruno Wolff III 2008-04-08 13:30:41 UTC
I needed to do some testing this morning for another bug and I recorded what
happened over 10 reboots. The part that is relevant here is that in 5 of those
10 reboots RAID failed to work properly, and in the other 5 it worked properly.
Maybe that statistical information will be useful.

Comment 8 Bruno Wolff III 2008-04-11 17:52:39 UTC
Is there some more information that I can collect to help track down what is
happening with this? When this fails I have a shell prompt and can do some
poking around. I'd at least be able to find out how not to trigger the condition.
(I have a remotely administered machine running FC5 that I'd like to upgrade to
F9, but having reboots fail this way would be painful.)

Comment 9 Harald Hoyer 2008-04-11 18:16:19 UTC
I think this bug is in rc.sysinit, not udev.
Reassigning.

Comment 10 Frank Arnold 2008-04-11 22:12:23 UTC
(In reply to comment #8)
> Is there some more information that I can collect to help track down what is
> happening with this?

Ok, let me recap what I've read so far:

* Fedora 8 runs a `/sbin/mdadm -A -s --auto=yes` inside /etc/rc.sysinit
  depending on the existence of /etc/mdadm.conf
* Now notting came up with a udev rule (/etc/udev/rules.d/70-mdadm.rules) that is
  carried with the mdadm package
* This rule is supposed to replace the invocation of mdadm inside
  /etc/rc.sysinit, but it obviously does not work sometimes (a rough sketch of
  such a rule follows this list)
* Some discussion that I don't (want to) fully understand happens in bug 429604
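For illustration, a udev-driven assembly rule is conceptually something like the
sketch below. This is not the actual 70-mdadm.rules shipped by the package, just
an assumed shape of such a rule (the match keys and the use of incremental
assembly are assumptions):

  # Illustrative sketch of a udev-driven RAID assembly rule -- NOT the
  # actual 70-mdadm.rules shipped by the mdadm package.  When a block
  # device tagged as a Linux RAID member appears, hand it to mdadm.
  SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
      RUN+="/sbin/mdadm --incremental $env{DEVNAME}"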

Practically, this one could be marked as a duplicate of bug 429604. If it isn't
fixed in time for F9, it should be possible to add the invocation of mdadm back
into rc.sysinit.

notting, harald, please correct me if I'm wrong.

Comment 11 Bill Nottingham 2008-04-12 02:02:40 UTC
The directory stuff in 429604 is only if you partition your arrays. For the
non-partitioned case, I need to get a machine that reproduces it. So far, all my
attempts to reproduce it have failed.

Comment 12 Bill Nottingham 2008-04-12 02:05:12 UTC
Bruno/Frank - are you currently up to date with udev? 

Comment 13 Bruno Wolff III 2008-04-12 10:55:35 UTC
The machine that is showing the problems is at udev-120-2.fc9.i386. It looks
like -3 is available now. The comments don't suggest that a change affecting
this bug was made though.

Here is a listing of /etc/udev/rules.d:
[root@bruno rules.d]# ls
10-console.rules       60-pcmcia.rules                    80-drivers.rules
10-libifp.rules        60-persistent-input.rules          85-pcscd_ccid.rules
40-alsa.rules          60-persistent-storage.rules        85-pcscd_egate.rules
40-multipath.rules     60-persistent-storage-tape.rules   88-clock.rules
40-redhat.rules        60-wacom.rules                     90-alsa.rules
50-udev-default.rules  61-persistent-storage-edd.rules    90-hal.rules
51-vdr.rules           64-device-mapper.rules             91-drm-modeset.rules
52_nut-usbups.rules    64-md-raid.rules                   95-pam-console.rules
60-cdrom_id.rules      70-mdadm.rules                     95-udev-late.rules
60-libmtp.rules        70-persistent-cd.rules             97-bluetooth.rules
60-libnjb.rules        70-persistent-net.rules            99-fuse.rules
60-net.rules           75-cd-aliases-generator.rules      zaptel.rules
60-openct.rules        75-persistent-net-generator.rules


Comment 14 Frank Arnold 2008-04-12 18:09:32 UTC
(In reply to comment #12)
> Bruno/Frank - are you currently up to date with udev? 

I'm updating daily right now. Currently udev-120-3.fc9.x86_64, which fails too.

If it would be of any help I can capture the startup debug output. With
udev_log="info" I'm not able to reproduce it anymore. But with udev_log="debug"
it failed. Tried it only once though because it takes some time.

Comment 15 Frank Arnold 2008-04-12 20:17:09 UTC
Created attachment 302239 [details]
Console logs taken during startup

Anyway, here is what I was able to fetch during startup (.tar.bz2, 1.1M):
udev-debug-startup-failed.log	 37M	Log generated with udev_log=debug
udev-info-startup-succeeded.log  655K	Log generated with udev_log=info

Comment 16 Bill Nottingham 2008-04-14 18:32:10 UTC
Does it persist if you move some of the non-required rule files out of the way?

Comment 17 Bill Nottingham 2008-04-14 20:32:36 UTC
udevsettle[668]: udevsettle: timeout waiting for queue

(from your failed log)

So, it seems to 'fail' with debug only because it causes udev startup to take so
long that it times out and continues with it running in the background.

Comment 18 Harald Hoyer 2008-04-14 20:45:34 UTC
You can set the start_udev timeout on the kernel command line; try:
udevtimeout=1000
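For example, appended to the kernel line of the boot entry in
/boot/grub/grub.conf (the file names and root device below are placeholders; the
kernel version is just the one mentioned in comment 5):

  title Fedora (2.6.25-0.195.rc8.git1.fc9.i686)
          root (hd0,0)
          # append udevtimeout=1000 to the existing kernel arguments
          kernel /vmlinuz-2.6.25-0.195.rc8.git1.fc9.i686 ro root=/dev/mdX udevtimeout=1000
          initrd /initrd-2.6.25-0.195.rc8.git1.fc9.i686.img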

Comment 19 Bruno Wolff III 2008-04-14 22:48:59 UTC
How do I tell which ones are non-required? Should I just look for cruft left
behind after upgrades (by checking rpm -q --whatprovides for each entry), or did
you have something else in mind? Or was your comment directed at solving the
udev timeout issue before you had a better recommendation?

Comment 20 Bill Nottingham 2008-04-15 01:32:15 UTC
Bruno: are you still seeing this in cases where you aren't passing udev debug
options?

If so, I'd try just moving out of the way:
40-multipath.rules

and see if that makes things better.

Comment 21 Frank Arnold 2008-04-15 08:46:22 UTC
(In reply to comment #18)
> you can set the start_udev timeout on the kernel command line.. try:
> udevtimeout=1000

No udevsettle anymore, but it didn't run fully through and locked up somewhere.
Increased the timeout to 10000, ran it again, and it got stuck somewhere else.

I don't think this gives us much because it takes more than an hour to do a
single run.

I had a glance at the udev man page. Would it be a good idea to modify the rules
to produce some output, so we can actually see what gets set up? For example, a
RUN that outputs the imported ENV variables for each device that gets set up?
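Something along those lines might look like the sketch below; the file name, log
path, and match keys are made up for illustration. The idea is to place it before
70-mdadm.rules so the environment gets recorded as each member device shows up:

  # /etc/udev/rules.d/69-md-debug.rules  (hypothetical debug rule)
  # Log the device name and the imported ID_FS_* variables for every
  # RAID member udev processes, so a failing boot can be compared
  # against a good one afterwards.
  SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
      RUN+="/bin/sh -c 'echo $env{DEVNAME} $env{ID_FS_TYPE} $env{ID_FS_UUID} >> /dev/.md-debug.log'"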

(In reply to comment #16)
> Does it persist if you move out some of the non-required rule files?

I excluded the following rules:
  40-multipath
  60-pcmcia
  60-persistent-storage-tape
  60-wacom
  64-device-mapper

But still, one out of six arrays does not get set up properly. This result is
quite reliably reproducible.

Comment 22 Bruno Wolff III 2008-04-15 10:57:59 UTC
I just tried 5 reboots this morning with udev set to debug and using the
udevtimeout kernel option and I didn't get it to fail. I'll try again tonight.
My reboots take about 15-20 minutes with the udev debugging on. I only end up
with about 9MB of logs. I am not sure why I am getting so much less.

Comment 23 Bill Nottingham 2008-04-15 15:10:50 UTC
Is it always the same array that fails? If so, is there anything about it that is
configured differently (either at setup time or in mdadm.conf)?

mdadm -D output may help here.
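For example (the device name here is only an illustration):

  mdadm -D /dev/md0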

The problem is I can't reproduce this; without that, it's hard to code a fix.

Comment 24 Bruno Wolff III 2008-04-15 15:48:33 UTC
In my case it's the two /boot arrays (one is mounted as /otherboot, but was
originally a /boot partition on a test install). I have only seen those two have
the problem recently. They are md0 and md1.
I did take a look at the mdadm -D output and didn't see anything odd-looking.
When I tried it after a failed boot, it said there wasn't any such array for the
one that didn't work. I'll look more carefully to see if I can spot something.


Comment 25 Frank Arnold 2008-04-15 15:57:08 UTC
(In reply to comment #23)
> Is it always the same array that fails?

No. Different ones each time. All arrays are similar in their setup in my case.

> The problem is I can't reproduce this; without that, it's hard to code a fix.

Yeah, I know ;) I'm currently at work, testing the other thing coming from you
that's scheduled to be released... so I can't do anything now. One thing I tried
before going to work:
I modified 70-mdadm.rules. I added the line
        RUN+="echo something", \
before the mdadm RUN line. This caused the system to boot up cleanly twice.
I tried the same thing with a usleep 1000, and it didn't work. I'll tinker around
a bit more as soon as I'm back home.

Comment 26 Frank Arnold 2008-04-15 22:59:23 UTC
Created attachment 302535 [details]
Output of dmesg

(In reply to comment #25)
> I've modified the 70-mdadm.rules. I added the line
>	  RUN+="echo something", \
> before the mdadm RUN line. This caused the system to boot up cleanly twice.

Pure luck. It does not work every time. But I should have had the idea to look at
the output of dmesg sooner...

The arrays that fail get logged there with:
md: array mdx already has disks!

Full dmesg attached (md3 failed in this case).

Comment 27 Frank Arnold 2008-04-16 11:44:06 UTC
Created attachment 302584 [details]
More verbose output of dmesg

I added an MD_BUG(); call for the condition where it fails, to see the actual
state of all arrays. In this case md6 failed and already had one of its members
assigned...

Comment 28 Bill Nottingham 2008-04-16 23:17:55 UTC
Thanks for the debugging!

Can you try the mdadm packages at:
http://koji.fedoraproject.org/koji/taskinfo?taskID=569782


Comment 29 Frank Arnold 2008-04-17 00:02:11 UTC
Some more information for interested parties, without looking at the changes to
mdadm:

For each array that failed to get set up, md_ioctl() got called twice with the
SET_ARRAY_INFO command within a short period of time, and those calls seemed to
tread on each other's toes.
The most thorough fix would be a 'binding already in progress' flag for each
device inside the md driver, to cover each and every corner case, if that's
feasible.

At least the usual case now works reliably. I rebooted more than 10 times in a
row and no double call was made.

Am I right? ... :)

Comment 30 Bill Nottingham 2008-04-17 00:17:45 UTC
Basically, the change made in that package is for the routine to open the device
with O_EXCL so that only one mdadm call has the device open at once; if it can't
open it, it waits a short time and tries again.
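In code terms, the pattern is roughly the following. This is only an illustrative
C sketch of the open-with-O_EXCL-and-retry idea described above, not the actual
mdadm patch; the function name and retry parameters are invented here:

  /* Illustrative sketch, not the real mdadm change: open the md device
   * exclusively so two concurrent mdadm invocations cannot both start
   * assembling it; on EBUSY, back off briefly and retry. */
  #include <errno.h>
  #include <fcntl.h>
  #include <unistd.h>

  static int open_md_exclusive(const char *devname)
  {
          int fd, tries;

          for (tries = 0; tries < 25; tries++) {
                  fd = open(devname, O_RDWR | O_EXCL);
                  if (fd >= 0)
                          return fd;       /* we now hold the device exclusively */
                  if (errno != EBUSY)
                          return -1;       /* real error, give up */
                  usleep(200000);          /* another opener has it; wait and retry */
          }
          return -1;                       /* still busy after all retries */
  }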

Comment 31 Bill Nottingham 2008-04-17 03:29:55 UTC
Frank, just confirming - the new package works for you?

Comment 32 Frank Arnold 2008-04-17 08:12:18 UTC
(In reply to comment #31)
> Frank, just confirming - the new package works for you?

Yes, it's working.

Comment 33 Bruno Wolff III 2008-04-17 16:20:18 UTC
I tried the foo version of mdadm this morning (and removed the udev debugging,
which seemed to mask the problem) and had no RAID failures in 10 reboots. At the
50% failure rate I was seeing previously, this should have happened by chance
only 1 time in 1000. So I think it is pretty likely you fixed the problem.
Thanks, you guys!

Comment 34 Bill Nottingham 2008-04-17 16:27:12 UTC
Fixed in 2.6.4-4.

Comment 35 Will Woods 2008-04-21 21:33:53 UTC
This package has landed in rawhide, and the problem seems fixed for me.

I'm closing the bug to get it off the blocker list, but feel free to reopen it
if the problem manifests with mdadm-2.6.4-4 or later.

