Bug 569019 - udevd sometimes fires off lots of processes during boot
Summary: udevd sometimes fires off lots of processes during boot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: mdadm
Version: 13
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Doug Ledford
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 572066 576253 585527 (view as bug list)
Depends On:
Blocks: 575768
 
Reported: 2010-02-27 16:45 UTC by Bruno Wolff III
Modified: 2010-08-23 16:20 UTC (History)
9 users

Fixed In Version: initscripts-9.09-1.fc13
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 575768 (view as bug list)
Environment:
Last Closed: 2010-04-28 03:09:10 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
Boot.log (30.83 KB, text/plain)
2010-03-04 05:54 UTC, Mace Moneta
no flags

Description Bruno Wolff III 2010-02-27 16:45:43 UTC
Description of problem:
Sometimes during udev rule processing at boot there is a long delay, and then a timeout occurs that allows the boot to proceed. A large number of messages referring to md device paths are displayed. Typically when this happens, udevd processes are still being created for a few minutes after the boot has completed.
It doesn't happen on every boot, but I haven't noticed a pattern to when it does or doesn't.

boot.log contained a bit over 800 lines mostly similar to:
  /sys/devices/virtual/block/md6 (5384)
  /sys/devices/virtual/block/md8 (5385)
  /sys/devices/virtual/block/md4 (5386)
  /sys/devices/virtual/block/md9 (5387)
  /sys/devices/virtual/block/md10 (5388)
  /sys/devices/virtual/block/md6 (5389)
  /sys/devices/virtual/block/md8 (5390)
  /sys/devices/virtual/block/md4 (5391)
  /sys/devices/virtual/block/md9 (5392)
  /sys/devices/virtual/block/md10 (5393)

Version-Release number of selected component (if applicable):
udev-151-3.fc13.i686
mdadm-3.1.1-0.gcd9a8b5.6.fc13.i686

How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Mace Moneta 2010-03-04 05:54:58 UTC
Created attachment 397725 [details]
Boot.log

I'm seeing the same thing in F12.

kernel-2.6.32.9-70.fc12.x86_64
udev-145-15.fc12.x86_64
mdadm-3.1.1-0.gcd9a8b5.3.fc12.x86_64

Smolt Profile: http://www.smolts.org/client/show/pub_d98859db-2a89-44d6-baec-284b6acac7f9

Comment 2 Harald Hoyer 2010-03-04 10:11:25 UTC
(In reply to comment #1)
> Created an attachment (id=397725) [details]
> Boot.log
> 
> I'm seeing the same thing in F12.
> 
> kernel-2.6.32.9-70.fc12.x86_64
> udev-145-15.fc12.x86_64
> mdadm-3.1.1-0.gcd9a8b5.3.fc12.x86_64
> 
> Smolt Profile:
> http://www.smolts.org/client/show/pub_d98859db-2a89-44d6-baec-284b6acac7f9    

mdadm-3.1.1-0.gcd9a8b5.3.fc12.x86_64?? Where did you get this from?

Does the problem persist if you downgrade to mdadm-3.0.3-2.fc12? If not, then file a bug against mdadm.

Comment 3 Bruno Wolff III 2010-03-06 18:40:31 UTC
Downgrading to mdadm-3.0.3-3.fc13.i686 makes the problem go away, so I am switching this bug over to mdadm.

Comment 4 Anton Arapov 2010-03-16 14:37:10 UTC
I've just hit the same problem. The same mdadm versions. Downgrade helped.

Comment 5 Doug Ledford 2010-03-17 03:24:22 UTC
This is caused by a shift in the location of mdadm's map file.  The fact that it only happens intermittently is because there is a race between mdadm triggering a change event and handling the change event it is triggering.

Code has been added to the latest mdadm build to rectify this situation.  The fixed code will be present in the 3.1.2-1 build.

Comment 6 Bruno Wolff III 2010-03-17 04:04:45 UTC
I see there is an f14 build, but I don't see an f13 build. Is there going to be an f13 build soon? Is it worthwhile to test the f14 build on an f13 system?

Comment 7 Bruno Wolff III 2010-03-17 04:35:48 UTC
I tried out mdadm-3.1.2-1.fc14.i686 on one machine for one reboot so far and things look good. But with the intermittent nature of the problem, this isn't a guarantee that things are really fixed. I have a couple more machines updating now (along with a bunch of other f13 updates) and I'll try it on my work machine tomorrow. I'll report back again if I see problems or after all four machines have tested out OK.

Comment 8 Doug Ledford 2010-03-17 13:04:36 UTC
Trust me, the problem is well understood and is fixed ;-)  I'm actually going to be making a few more fixes for other bugs today, plus one more failsafe in this same code, and that's why F13/F12 builds aren't done yet.

The problem was that /dev/md didn't exist, and the new mdadm wants to put its map file in /dev/md/map.  Without /dev/md already existing, it couldn't open the map file, so it would call RebuildMap internally, which would attempt to rebuild the map from scratch and write it out.  That write would fail.  However, before it exited, it would trigger a udev change event on the device on the *assumption* that the RebuildMap had succeeded.  Since it hadn't, and /dev/md still didn't exist, the next udev-spawned mdadm would simply trigger a new RebuildMap call when it tried to process the change event.  This was the loop, and the only things that would break out of it were udev deciding that the currently running mdadm had already handled the change, or something creating /dev/md.

The solution then is to either A) make sure that both the initrd image and the root filesystem contain /dev/md before we start calling mdadm (which would require a dracut change and an mdadm packaging change), or B) make mdadm create /dev/md if it doesn't already exist when dealing with the map file.  I chose B because mdadm already creates the /dev/md directory when creating symlinks for arrays, so there is a certain expectation that mdadm will create that directory itself, and I didn't want to go around adding exceptions to other programs to cover the fact that mdadm wasn't doing so.
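The shape of fix B described above can be sketched as a small helper in C. This is a hypothetical illustration, not mdadm's actual code: the function name `open_map_file` and the parameterized directory path are assumptions made here for clarity (real mdadm hardcodes /dev/md/map).

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical sketch of fix B: make sure the map directory exists
 * before opening the map file, so a missing /dev/md no longer forces
 * an endless RebuildMap/change-event loop. */
FILE *open_map_file(const char *dir, const char *path, const char *mode)
{
    /* Create the directory if it is missing; an already-existing
     * directory (EEXIST) is fine and not an error. */
    if (mkdir(dir, 0755) != 0 && errno != EEXIST)
        return NULL;
    return fopen(path, mode);
}
```

With this approach the directory is created lazily at the point of use, which avoids touching dracut or other packages, matching the reasoning above.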

Comment 9 Harald Hoyer 2010-03-17 17:32:30 UTC
*** Bug 572066 has been marked as a duplicate of this bug. ***

Comment 10 Bruno Wolff III 2010-03-17 17:44:11 UTC
Thanks for the detailed explanation.
mdadm-3.1.2-1.fc14 is working well for me and might be useful for other people who reboot their systems fairly often pending new releases. Thanks for providing that build.

Comment 11 Doug Ledford 2010-03-23 16:30:22 UTC
*** Bug 576253 has been marked as a duplicate of this bug. ***

Comment 12 Fedora Update System 2010-04-09 20:12:47 UTC
mdadm-3.1.2-9.fc13,initscripts-9.09-1.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/mdadm-3.1.2-9.fc13,initscripts-9.09-1.fc13

Comment 13 Fedora Update System 2010-04-13 01:40:20 UTC
mdadm-3.1.2-9.fc13, initscripts-9.09-1.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update mdadm initscripts'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/mdadm-3.1.2-9.fc13,initscripts-9.09-1.fc13

Comment 14 Doug Ledford 2010-04-26 17:20:29 UTC
*** Bug 585527 has been marked as a duplicate of this bug. ***

Comment 15 Fedora Update System 2010-04-28 03:08:18 UTC
initscripts-9.09-1.fc13, mdadm-3.1.2-10.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 16 Dan Davis 2010-08-23 16:20:35 UTC
I'm having a problem very much like this during system installation of F13 with a specialized configuration.  The Packages directory contains mdadm-3.1.2-10.fc13 and initscripts-9.12-1.fc13, but I'm not sure what is in initrd etc.

On our old 1U appliance, we have four internal drives, and we use md to set up two mirrors for /boot and /, and a RAID-10 for our data file system.

Specifically, anaconda is calling udev to wait for /dev/md0 to settle, after md0 has already been deactivated in the "teardown" phase of Anaconda storage detection.  We believe it is specific to software RAID configurations.

I've changed the udev_log option to udev_log="debug" and I'm recreating the CDs now.  I should be able to watch udev rules in /tmp/syslog in the installation environment.

Any suggestions as to what else I should try?

I should mention that it hangs forever in this partitioning phase; it is more than just slower than normal to settle.

