Bug 670964 - udevadm settle - timeout of 180 seconds reached
udevadm settle - timeout of 180 seconds reached
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: kernel (Show other bugs)
14
x86_64 Linux
low Severity medium
: ---
: ---
Assigned To: Kernel Maintainer List
Fedora Extras Quality Assurance
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2011-01-19 14:15 EST by Andrew McNabb
Modified: 2015-01-27 06:08 EST (History)
10 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-08-16 18:01:09 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
boot.log (21.93 KB, text/plain)
2011-01-19 14:15 EST, Andrew McNabb
no flags Details
/var/log/messages from a boot with a hang (56.23 KB, text/plain)
2011-03-23 13:24 EDT, Andrew McNabb
no flags Details
/var/log/messages from a boot without hanging (58.95 KB, text/plain)
2011-03-23 13:26 EDT, Andrew McNabb
no flags Details

  None (edit)
Description Andrew McNabb 2011-01-19 14:15:39 EST
Created attachment 474339 [details]
boot.log

Description of problem:

At boot, the system hangs for 3 minutes at "Starting udev".  The boot.log contains the message, "udevadm settle - timeout of 180 seconds reached, the event queue contains" followed by a big long list (which I will attach).  However, if I run "udevadm settle" manually after boot, it finishes in 0.004 seconds.


Version-Release number of selected component (if applicable):

udev-161-8.fc14.x86_64


How reproducible:

Always or nearly always.


Please let me know if there is any other information that would be helpful.  Thanks.
Comment 1 Andrew McNabb 2011-01-19 14:27:11 EST
So I've had this problem with a large number of new machines, but it looks like the problem goes away after a few boots.  I'm starting to think that it might have something to do with the new motherboards arriving with the date set in the future.  Does this seem plausible?  Anaconda doesn't seem to run an ntp client, so lots of timestamps get set in the future.
Comment 2 Harald Hoyer 2011-01-20 03:01:58 EST
can you add to the kernel command line "udev.log-priority=debug udev.children-max=1" and without "rhgb quiet" and see, if it hangs on a special udev event?
Comment 3 Andrew McNabb 2011-01-31 18:11:20 EST
I would be happy to add the kernel command-line options, but I won't be able to visually check whether it hangs on a special udev event (most of the machines are headless).  Will there be a log file that will show when the hang happens?  Unfortunately, the problem is unpredictable enough that I can't easily plug a monitor into a machine and force the problem to happen, but I can reboot a few dozen machines and note which of them have the problem.  Thanks.
Comment 4 Andrew McNabb 2011-03-03 17:59:29 EST
So I've continued to see this error message (so its apparent correlation with the clock described in comment #1 turns out to have been purely chance).

Since I added the "udev.log-priority=debug udev.children-max=1" to the kernel command line in grub.conf, I haven't seen the "udevadm settle - timeout" error.  I'll keep on rebooting 25 machines a few times and see if I can reproduce it.
Comment 5 Andrew McNabb 2011-03-03 18:11:30 EST
After about 100 reboots, I haven't been able to reproduce the problem when "udev.log-priority=debug udev.children-max=1" is in the kernel command line. However, for the machines that do not have this line I see the "udevadm settle - timeout" error about 30% of the time.  Is there anything else I can do to produce useful information?  Was there anything in the original boot.log that would be helpful?
Comment 6 Harald Hoyer 2011-03-04 02:08:58 EST
you could try to increase the children-max or decrease the log level:

next steps:

1. "udev.log-priority=debug udev.children-max=2"
2. "udev.log-priority=info udev.children-max=1"
3. "udev.log-priority=info udev.children-max=2"
4. "udev.children-max=1"
5. "udev.children-max=2"
Comment 7 Andrew McNabb 2011-03-22 16:21:50 EDT
I've tried increasing children-max and decreasing log level, and in all cases the timeout message did not appear (out of 25 boots on each setting).  When I then removed the children-max and log-priority settings, the message appeared very frequently (15 out of 25 boots).  What else should I try?
Comment 8 Harald Hoyer 2011-03-23 09:25:32 EDT
Any suspicious kernel messages in /var/log/messages?
Comment 9 Andrew McNabb 2011-03-23 13:24:35 EDT
Created attachment 487093 [details]
/var/log/messages from a boot with a hang
Comment 10 Andrew McNabb 2011-03-23 13:26:06 EDT
Created attachment 487094 [details]
/var/log/messages from a boot without hanging

I've attached /var/log/messages (trimmed to the appropriate time interval) for a boot with a hang and a boot without hanging.  The machines have essentially identical hardware.

I didn't see anything suspicious, but I may easily have missed something important.
Comment 11 Harald Hoyer 2011-03-24 12:22:33 EDT
Mar 22 14:14:15 19potato kernel: [    4.362669] udev[451]: starting version 161
Mar 22 14:14:15 19potato kernel: [  304.765206] Clocksource tsc unstable (delta = 299936213078 ns)
Mar 22 14:14:15 19potato kernel: [  304.770030] Switching to clocksource hpet
Mar 22 14:14:15 19potato kernel: [  306.047584] ACPI: WMI: Mapper loaded

versus

Mar 22 14:09:40 18potato kernel: [    2.014453] udev[162]: starting version 161
Mar 22 14:09:40 18potato kernel: [    2.037836] [drm] Initialized drm 1.1.0 20060810
....

Maybe "Clocksource tsc unstable" is a problem.

You can tell it to use the HPET clocksource from the start by adding "clocksource=hpet" to your kernel command line. 

Hope that helps.
Comment 12 Andrew McNabb 2011-03-24 12:54:33 EDT
I just tried adding "clocksource=hpet" to the kernel command line.  Unfortunately, the timeout error is still appearing on most boots.
Comment 13 Harald Hoyer 2011-03-24 13:10:51 EDT
does "Clocksource tsc unstable" appear for every timeout error?
Comment 14 Andrew McNabb 2011-03-24 13:37:58 EDT
In the latest boot of 25 machines, 14 had the timeout, and 15 had the "Clocksource tsc unstable" error. All 14 machines that timed out had the "Clocksource tsc unstable" error. One machine timed out with giving the "Clocksource tsc unstable" error.

The correlation isn't 100%, but it's extraordinarily close.
Comment 15 Andrew McNabb 2011-03-24 13:40:16 EDT
Sorry, I had that last sentence backwards.  One machine had the "Clocksource tsc unstable" error but did not time out.
Comment 16 Andreas Haertel 2011-05-02 15:49:30 EDT
I have the same problem, but with 
 udevd[453]: worker [457] unexpectedly returned with status 0x0100
 udevd[453]: worker [457] failed while handling '/devices/pci0000:00/0000:00:1e.0/0000:04:01.2/video4linux/video1'
A more detailed description can be found on 
http://www.fedoraforum.org/forum/showthread.php?p=1467388&posted=1#post1467388

Often the timeout vanishes if I try booting two or three times, using the reset button during waiting for udev. 

Hope to find a solution here, thanks and please ask for more information if you need it! 

Andreas
Comment 17 dosboss64 2011-06-06 19:44:17 EDT
I also encountered this problem with my Fedora 14 workstation (2.6.35.13-91.fc14.x86_64) I think I have found the issue in my case.  I get all the same issues listed in the ticket (e.g. 'udevadm settle - timeout...', 'Clocksource tsc unstable', etc.) with the exception in my case that after finally booting, a 'udevadm settle' command reproduces the issue.  In addition, my machine's load average stays in the 8.0-10.0 range until reboot (extremely abnormal, usually 0.05 or less), and a 'top' and 'iotop' show almost no activity in the system.

In addition to several other devices (USB and IEEE1394), I have a cheap Logitech webcam attached, the output from hot-plugging this device is below (serial number obscured on purpose):

[ 1298.615502] usb 1-3.2: new high speed USB device using ehci_hcd and address 5
[ 1298.829132] usb 1-3.2: New USB device found, idVendor=046d, idProduct=0802
[ 1298.829142] usb 1-3.2: New USB device strings: Mfr=0, Product=0, SerialNumber=2
[ 1298.829150] usb 1-3.2: SerialNumber: XXXXXXXXXX
[ 1299.102907] Linux video capture interface: v2.00
[ 1299.404793] uvcvideo: Found UVC 1.00 device <unnamed> (046d:0802)
[ 1299.405092] usbcore: registered new interface driver snd-usb-audio
[ 1299.422466] input: UVC Camera (046d:0802) as /devices/pci0000:00/0000:00:0b.1/usb1/1-3/1-3.2/1-3.2:1.0/input/input4
[ 1299.422687] usbcore: registered new interface driver uvcvideo
[ 1299.422693] USB Video Class driver (v1.0.0)

When the device is plugged in before boot, the errors and timeouts occur.  When removed *before* boot, the machine boots normally.  Removal during boot seems to not make a difference.  Plugging it in after booting completely allows the camera and system to function normally.

I honestly cannot remember the exact model number (and it's not printed on the unit), but the device is extremely similar to this one: http://www.logitech.com/en-us/webcam-communications/webcams/devices/5865 (Logitech - Webcam C200)  I acquired this device while still running F11, and the problem has remained through all Fedora versions till the current one I am running.  I don't mind removing the camera when not needed (to facilitate booting automatically after e.g. a power outage), but I just hope this info helps someone.
Comment 18 Fedora End Of Life 2012-08-16 18:01:13 EDT
This message is a notice that Fedora 14 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 14. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained.  At this time, all open bugs with a Fedora 'version'
of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this 
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen 
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we were unable to fix it before Fedora 14 reached end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" (top right of this page) and open it against that 
version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Note You need to log in before you can comment on or make changes to this bug.