Bug 200514 - Udev initialisation takes so long it can affect fsck
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: udev
Version: 5
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Harald Hoyer
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2006-07-28 11:49 UTC by David Howells
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-09-20 11:04:08 UTC
Type: ---
Embargoed:



Description David Howells 2006-07-28 11:49:03 UTC
Description of problem:

On my Dual 200MHz PPro testbox, with SELinux enabled and using a vanilla 
kernel (approximately FC6's kernel), the system will almost always wind up in 
the filesystem repair shell.

Version-Release number of selected component (if applicable):

e2fsprogs-1.38-12
udev-084-13
initscripts-8.31.5-1
linux-2.6.18-rc2 up to date with git to 27th July 2006.

How reproducible:

Almost 100% (it appears timing related), but it requires a slow system to show 
the effect.

Steps to Reproduce:
1. Install vanilla kernel
2. Boot to it
3. Wait for fsck to fail or complete.
  
Actual results:

System jumps to filesystem repair shell.

Expected results:

System should boot normally.

Additional info:

The problem appears to be that udev takes such a long time to run that it gets 
backgrounded by the boot procedure.  However, shortly after it is 
backgrounded, fsck is run.  What appears to be happening is that udev hasn't 
actually created any block device references at this point, and so fsck goes 
searching through all the chardev lists in /sys - of which there are a lot, 
since each tty dev entry points back to the list of tty dev entries. 
Eventually fsck is killed by SIGKILL.

From examining things with strace, I can say:

 (1) rc.sysinit isn't SIGKILL'ing fsck.

 (2) stracing fsck will cause fsck to succeed - I think because it slows fsck 
and thus allows udev to catch up.

 (3) fsck goes and does a lot of stat64'ing of chardevs in /sys, indirectly 
via /dev/.udev.  The paths look like this:

/dev/.udev/failed/devices@platform@serial8250/tty:ttyS0/subsystem/ptyc3/subsystem/ptyc3/dev

     Note that each of four tty:ttyS[0-3] is iterated through, as are all 
584 "?ty??" in the first subsystem directory, and also in the second 
subdirectory (that's recursive through symlinkage).  This is on the order of 
1.4 million chardevs.

     Using gdb shows the search is being conducted in libblkid from 
e2fsprogs-libs.
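
The "1.4 million" estimate in point (3) can be checked with a little arithmetic. The sketch below assumes the walk covers each of the four tty:ttyS[0-3] entries and then follows two levels of "subsystem" symlinks, each listing the same 584 tty entries; the function name is only illustrative.

```python
# Counts taken from the report: four serial ports, 584 "?ty??" entries.
SERIAL_PORTS = 4
TTY_ENTRIES = 584


def paths_visited(ports, entries, symlink_levels=2):
    """Number of .../dev paths reached when every port is expanded
    through `symlink_levels` nested "subsystem" directories."""
    return ports * entries ** symlink_levels


print(paths_visited(SERIAL_PORTS, TTY_ENTRIES))  # 1364224
```

Each extra level of symlink recursion multiplies the count by another 584, which is why the search never finishes in reasonable time on a slow machine.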

I've upgraded my testbox from an early FC5 to the latest FC5 and that doesn't 
change the problem.

As I said above, I think the root of the problem is that udev takes so long to 
run (judging by how fast the PIDs climb, it's running nearly 2000 programs), and 
this is a problem on a slow machine.

Running fsck with the same parameter list once the repair shell is available 
works almost instantly.

fsck -T -t noopts=_netdev -A -a -C

I'm not sure whether this belongs against the udev, initscripts or e2fsprogs 
packages, but I think the first has to be the major culprit: udev needs to be 
faster or optional.

Comment 1 Kay Sievers 2006-08-02 14:46:29 UTC
Searching /dev for device nodes that way can't really work. It's a weird
concept, and in this implementation obviously broken. (For that reason, on SUSE,
we patched mount and fsck to use libvolume_id provided by udev.)

Comment 2 Bill Nottingham 2006-08-02 14:52:37 UTC
However, shouldn't having udevsettle in the udev start procedure handle this
for any reasonably local devices (IDE, SCSI)?

Comment 3 Kay Sievers 2006-08-03 07:39:36 UTC
Hmm, isn't this the problem:
 "3) fsck goes and does a lot of stat64'ing of chardevs in /sys"
 "This is on the order of 1.4 million chardevs."

What do you mean? What should udevsettle handle?

Comment 4 Bill Nottingham 2006-08-03 13:02:11 UTC
Maybe I misread; I thought the problem was:

udev starts, starts coldplug, start scripts exit
  <loads scsi, sata, etc driver>
  <disk scan starts>
fsck runs, can't find device
  <disk scan finishes>

and udevsettle would help with that. Perhaps I missed the actual problem here.


Comment 5 Kay Sievers 2006-08-03 15:12:26 UTC
Oh right, looks like it. You need a newer udev, or a backport of udevsettle, for that.
Udevd needs to export the current seqnum, which udevsettle can compare against
the actual kernel number to see if the kernel has events in the queue and not
only in the udev daemon queue. Maybe that's the reason for the failure?

