Red Hat Bugzilla – Bug 569019
udevd sometimes fires off lots of processes during boot
Last modified: 2010-08-23 12:20:35 EDT
Description of problem:
Sometimes while booting during udev rule processing there is a long delay and then a timeout occurs allowing the boot to proceed. A large number of messages referring to md device paths is displayed. Typically when this happens, udevd processes are still being created for a few minutes after boot has completed.
It doesn't happen every boot, but I haven't noticed a pattern to when it does or doesn't.
boot.log contained a bit over 800 lines mostly similar to:
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Created attachment 397725 [details]
I'm seeing the same thing in F12.
Smolt Profile: http://www.smolts.org/client/show/pub_d98859db-2a89-44d6-baec-284b6acac7f9
(In reply to comment #1)
> Created an attachment (id=397725) [details]
> I'm seeing the same thing in F12.
> Smolt Profile:
mdadm-3.1.1-0.gcd9a8b5.3.fc12.x86_64?? Where did you get this from?
Does the problem persist if you downgrade to mdadm-3.0.3-2.fc12? If not, then file a bug against mdadm.
Going back to mdadm-3.0.3-3.fc13.i686 makes the problem go away, so I am switching this over to mdadm.
I've just hit the same problem. The same mdadm versions. Downgrade helped.
This is caused by a shift in the location of mdadm's map file. It only happens intermittently because there is a race between mdadm triggering a change event and handling the change event it triggered.
Code has been added to the latest mdadm build to rectify this situation. The fixed code will be present in the 3.1.2-1 build.
I see there is an f14 build, but I don't see an f13 build. Is there going to be an f13 build soon? Is it worthwhile to test the f14 build on an f13 system?
I tried out mdadm-3.1.2-1.fc14.i686 on one machine for one reboot so far and things look good. But with the intermittent nature of the problem, this isn't a guarantee that things are really fixed. I have a couple more machines updating now (along with a bunch of other f13 updates) and I'll try it on my work machine tomorrow. I'll report back again if I see problems or after all four machines have tested out OK.
Trust me, the problem is well understood and is fixed ;-) I'm actually going to be making a few more fixes for other bugs today, plus one more failsafe in this same code, and that's why F13/F12 builds aren't done yet.
The problem was that /dev/md didn't exist, and the new mdadm wants to put its map file in /dev/md/map. Without /dev/md already existing, it couldn't open the map file, so it would call RebuildMap internally, which would attempt to rebuild the map from scratch and write it out. That write would fail too. However, before it exited, it would trigger a udev change event on the device on the *assumption* that the RebuildMap had succeeded. Since it hadn't, and /dev/md still didn't exist, the next udev-spawned mdadm that processed the change event would simply trigger a new RebuildMap call. This was the loop, and the only things that would break you out of it were udev deciding that the currently running mdadm had already handled the change (or something like that), or something creating /dev/md.

The solution then is to either A) make sure that both the initrd image and the root filesystem contain /dev/md before we start calling mdadm (which would require a dracut change and an mdadm packaging change), or B) make mdadm create /dev/md if it doesn't already exist when dealing with the map file. I chose B because mdadm already creates /dev/md when creating symlinks for arrays, so there is a certain amount of expectation that mdadm will create that directory itself, and I didn't want to go around adding exceptions to other programs to cover the fact that mdadm wasn't doing so.
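Option B above boils down to "make the directory before opening the map file, and treat an already-existing directory as success." A minimal C sketch of that idea follows; the function and parameter names (open_mapfile, mapfile_dir) are illustrative, not mdadm's actual identifiers, and the real mdadm code differs in detail.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical sketch: ensure the map directory (e.g. /dev/md) exists
 * before opening the "map" file inside it. Treating EEXIST as success
 * makes the mkdir idempotent, so repeated callers don't race each other
 * into failure the way the old code did when /dev/md was missing. */
static FILE *open_mapfile(const char *mapfile_dir, const char *mode)
{
    char path[4096];

    if (mkdir(mapfile_dir, 0755) != 0 && errno != EEXIST)
        return NULL;  /* directory missing and uncreatable: give up */

    snprintf(path, sizeof(path), "%s/map", mapfile_dir);
    return fopen(path, mode);
}
```

With this shape, the "open map file" path can no longer fail merely because /dev/md hasn't been created yet, which is what kept RebuildMap looping.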
*** Bug 572066 has been marked as a duplicate of this bug. ***
Thanks for the detailed explanation.
mdadm-3.1.2-1.fc14 is working well for me and might be useful for other people who reboot systems fairly often while waiting for new releases. Thanks for providing that build.
*** Bug 576253 has been marked as a duplicate of this bug. ***
mdadm-3.1.2-9.fc13,initscripts-9.09-1.fc13 has been submitted as an update for Fedora 13.
mdadm-3.1.2-9.fc13, initscripts-9.09-1.fc13 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update mdadm initscripts'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/mdadm-3.1.2-9.fc13,initscripts-9.09-1.fc13
*** Bug 585527 has been marked as a duplicate of this bug. ***
initscripts-9.09-1.fc13, mdadm-3.1.2-10.fc13 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report.
I'm having a problem very much like this during system installation of F13 with a specialized configuration. The Packages directory contains mdadm-3.1.2-10.fc13 and initscripts-9.12-1.fc13, but I'm not sure what is in initrd etc.
On our old 1U appliance, we have four internal drives, and we use md to set up two mirrors for /boot and /, and a RAID-10 for our data file system.
Specifically, anaconda is calling udev to wait for /dev/md0 to settle,
after md0 has already been deactivated in the "teardown" phase of Anaconda storage detection. We believe it is specific to software RAID configurations.
I've changed the udev_log option to udev_log="debug" and I'm recreating the CDs now. I should be able to watch udev rules in /tmp/syslog in the installation environment.
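The change described above amounts to flipping one line in udev.conf (on F13 the file lives at /etc/udev/udev.conf). A minimal sketch, done here on a scratch copy so it is safe to run anywhere:

```shell
# Sketch of the udev_log change: make a scratch copy standing in for
# /etc/udev/udev.conf, then switch the log level to "debug" so udev
# rule processing shows up in syslog.
conf=$(mktemp)
printf 'udev_log="err"\n' > "$conf"

sed -i 's/^udev_log=.*/udev_log="debug"/' "$conf"
grep '^udev_log' "$conf"   # -> udev_log="debug"
```

On a real install image the same sed line would be applied to /etc/udev/udev.conf before regenerating the media.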
Any suggestions as to what else I should try?
I should mention that it hangs forever in this partitioning phase; it is more than just slower than normal to settle.