Bug 405361 - System freeze at "Starting udev..."
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: athlon
OS: Linux
Priority: low
Severity: medium
Assigned To: Anton Arapov
QA Contact: Martin Jenner
Reported: 2007-11-29 23:15 EST by Andrew Schultz
Modified: 2014-06-18 04:01 EDT
CC List: 6 users

Doc Type: Bug Fix
Last Closed: 2008-08-01 02:38:22 EDT


Attachments
Patch against initscripts to remove leftover files in /var/run/console (373 bytes, patch)
2008-06-19 06:13 EDT, Klaus Steinberger
Patch to remove leftover files in /var/run/console (622 bytes, patch)
2008-06-19 11:57 EDT, Klaus Steinberger
kernel log with udevdebug (42.31 KB, text/plain)
2008-06-19 19:41 EDT, Andrew Schultz
udevdebug log without libmtp (81.85 KB, text/plain)
2008-06-20 12:38 EDT, Andrew Schultz
lspci output (893 bytes, text/plain)
2008-07-08 20:01 EDT, Andrew Schultz
/proc/cpuinfo (831 bytes, text/plain)
2008-07-09 16:53 EDT, Andrew Schultz
dmesg with s1 (12.23 KB, text/plain)
2008-07-09 16:55 EDT, Andrew Schultz
proposed patch (1.92 KB, patch)
2008-07-09 17:16 EDT, Anton Arapov
lspci -vv (5.34 KB, text/plain)
2008-08-15 14:27 EDT, Andrew Schultz

Description Andrew Schultz 2007-11-29 23:15:35 EST
On boot, the system freezes at "Starting udev". Unfortunately, there is no
output to any log. This is an AMD Athlon 2800 MP. As a workaround, I can boot
with nosmp and the system runs fine.

Version-Release number of selected component (if applicable):
kernel-2.6.18-53.1.4.el5 (also 2.6.18-53.el5)
The system ran fine with the RHEL4 kernel (2.6.9-55.0.12.ELsmp)

How reproducible:
100%
Comment 1 Andrew Schultz 2007-12-01 21:10:11 EST
I tried running FC6's release kernel (2.6.18-1.2798.fc6) and its latest updated
kernel (2.6.22.9-61.fc6), and both booted fine in SMP mode.
Comment 2 Prarit Bhargava 2007-12-06 10:01:16 EST
Andrew, modify the rc.sysinit script so that it produces verbose output for the
"udev" steps.

This should give you some better output (and debugging info for this BZ!)

P.
Comment 3 Andrew Schultz 2007-12-06 14:09:55 EST
/sbin/start_udev invokes udevsettle, which never exits.

I ran udevsettle under strace, and it ended with:

nanosleep({0, 50000000}, NULL)
stat64("/dev/.udev/queue",...
nanosleep({0, 50000000},

[the last nanosleep never finishes]
Comment 4 Anton Arapov 2008-06-16 04:59:43 EDT
Andrew, I'd like to know whether /sbin/start_udev still fails for you on
RHEL 5.2 (kernel-2.6.18-92 or later).

Thanks in advance.
Comment 5 Andrew Schultz 2008-06-16 10:36:19 EDT
This still fails for me with the latest kernel, 2.6.18-92.1.1.el5
Comment 6 Harald Hoyer 2008-06-17 05:22:36 EDT
udevsettle simply waits until the udevd event queue is empty.

In most cases, udevd fails to finish the queue because a kernel module is
hanging.

Adding "udevinfo" or "udevdebug" to the kernel command line enables debug
messages, which can be captured over a serial console with e.g.
"console=ttyS0,9600n8" (see
http://tldp.org/HOWTO/Remote-Serial-Console-HOWTO/configure-kernel.html )
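Combined with the strace from comment 3 (a stat64() of /dev/.udev/queue followed by a 50 ms nanosleep(), repeated), the waiting described here can be sketched as a simple polling loop. This is a minimal Python sketch, not udev's source: it assumes the queue counts as empty once the queue path no longer exists, and the function name and timeout_s parameter are hypothetical.

```python
import os
import time

# Polling interval taken from the strace in comment 3: nanosleep({0, 50000000}) = 50 ms.
POLL_INTERVAL_S = 0.05


def wait_for_queue_empty(queue_path="/dev/.udev/queue", timeout_s=None):
    """Hypothetical sketch of a udevsettle-style wait.

    Assumes the queue is empty once `queue_path` no longer exists.
    Returns True when the queue is empty, or False if `timeout_s`
    elapses first. With timeout_s=None it waits forever.
    """
    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    while os.path.exists(queue_path):  # corresponds to the stat64() call
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(POLL_INTERVAL_S)  # corresponds to the nanosleep() call
    return True
```

Since the only blocking call in such a loop is the sleep, a 50 ms nanosleep() that never returns (as observed in comment 3) stalls it regardless of the queue state, which is consistent with the kernel-side diagnosis reached later in this bug.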
Comment 7 Harald Hoyer 2008-06-17 05:23:38 EDT
BTW, udev can't log anywhere because the filesystem is not yet mounted writable.
Comment 8 Anton Arapov 2008-06-17 06:59:31 EDT
Andrew, I'd like to ask you to provide debug messages.
Comment 9 Klaus Steinberger 2008-06-19 06:13:51 EDT
Created attachment 309838
Patch against initscripts to remove leftover files in /var/run/console
Comment 10 Klaus Steinberger 2008-06-19 06:20:23 EDT
I also had the udev hangs. My systems authenticate against LDAP. I did some
debugging: udev started to set up devices and began forking off
"pam_console_apply", after which it hung. (Sorry, no debug screenshot; it was
on the console.)

The hang happened when pam_console had left its status files behind in
/var/run/console, so pam_console_apply tried to grant permissions to the
console user. But at the time udev starts, no network connection is
available. I suspect that the LDAP requests did not fail but hung instead,
so pam_console_apply did not return either.

I have already uploaded a one-line patch against initscripts, which removes
leftover files just before udev starts.

In my case that fixed the problem.

Sincerely,
Klaus
Comment 11 Harald Hoyer 2008-06-19 10:00:36 EDT
(In reply to comment #9)
> Created an attachment (id=309838)
> Patch against initscripts to remove leftover files in /var/run/console
> 

Nice catch. Did you file a bug against initscripts for this?
Comment 12 Klaus Steinberger 2008-06-19 11:57:37 EDT
Created attachment 309859
Patch to remove leftover files in /var/run/console

The previous patch did not work, as the filesystem is not writable at that
time.

As a workaround, the new patch remounts the filesystem read-write, removes the
files, and then remounts it read-only again.

That's not the perfect solution, but it should work.
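The cleanup step of this patch (removing stale pam_console state before udev runs) can be sketched as follows. This is a hypothetical Python illustration, not the attached patch: the function name is invented, it assumes the filesystem has already been remounted read-write, and it assumes every regular file directly under /var/run/console is stale at that point in boot. Remounting read-only again is left to the caller.

```python
import os


def remove_leftover_console_files(console_dir="/var/run/console"):
    """Hypothetical sketch: delete stale pam_console state files.

    Assumes `console_dir` is currently writable (the attached patch
    remounts the filesystem read-write first) and that any regular file
    found here during early boot is a leftover from a previous session.
    Returns the sorted names of the files that were removed.
    """
    removed = []
    try:
        entries = os.listdir(console_dir)
    except FileNotFoundError:
        return removed  # nothing to clean up
    for name in entries:
        path = os.path.join(console_dir, name)
        # Only remove regular files; leave directories and symlinks alone.
        if os.path.isfile(path) and not os.path.islink(path):
            os.remove(path)
            removed.append(name)
    return sorted(removed)
```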

As a long-term solution, two things should be done:
1. Make pam_console_apply robust against network outages, so that it does not
   hang on LDAP requests.
2. Also clean up /var/run/console on shutdown (this will not help after a
   crash, of course).
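One generic way to approach item 1 is to put an upper bound on how long a helper may block. The sketch below is hypothetical and is not how pam_console_apply works: the function name and the 5-second default are assumptions. It relies on Python's subprocess.run with a timeout, which kills the child when the limit is exceeded, so a helper stuck on an unreachable LDAP server would be terminated instead of holding up the udev event queue indefinitely.

```python
import subprocess


def run_with_timeout(argv, timeout_s=5.0):
    """Hypothetical sketch: run a helper but never let it block forever.

    Returns the helper's exit code, or None if it ran longer than
    `timeout_s` (subprocess.run kills the child before raising
    TimeoutExpired).
    """
    try:
        completed = subprocess.run(argv, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

A caller would treat None as "permissions not applied" and carry on booting, rather than waiting on the network.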
Comment 13 Andrew Schultz 2008-06-19 19:41:57 EDT
Created attachment 309891
kernel log with udevdebug

This is the boot log with udevdebug. I tried udevinfo, but it didn't seem to
print anything extra. The "add_to_rules: unknown key 'ATTR{id____}'" messages
only started appearing with more recent kernels (perhaps 2.6.18-92). I get them
even with nosmp.
Comment 14 Harald Hoyer 2008-06-20 02:04:27 EDT
please provide the output of:
# grep -rl ATTR /etc/udev/rules.d/|sort -u
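The command above lists every rules file that mentions ATTR. A rough Python equivalent, which also reports the matching lines so that keys like the ATTR{id____} from comment 13 are easy to spot, might look like this. The function name is hypothetical, and like the grep it matches the plain substring "ATTR".

```python
import os


def find_attr_rules(rules_dir="/etc/udev/rules.d"):
    """Rough equivalent of: grep -rl ATTR /etc/udev/rules.d/ | sort -u

    Returns a sorted list of (path, [(line_number, line), ...]) for every
    file under `rules_dir` that contains the substring "ATTR".
    """
    results = []
    for root, _dirs, files in os.walk(rules_dir):
        for name in files:
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                hits = [(i, line.rstrip("\n"))
                        for i, line in enumerate(fh, 1) if "ATTR" in line]
            if hits:
                results.append((path, hits))
    return sorted(results)
```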
Comment 15 Harald Hoyer 2008-06-20 02:08:01 EDT
Hmm, looking at the udevdebug output, I see that no module gets loaded and
nothing special is happening.

I guess this is a kernel SMP lockup.
Comment 16 Anton Arapov 2008-06-20 02:15:13 EDT
(In reply to comment #15)
Yes, that was my original suspicion... I'm working on it and wanted to rule out
any possible non-kernel issues first.
Comment 17 Harald Hoyer 2008-06-20 02:25:34 EDT
The output from comment #14 would still be nice to have.
Comment 18 Andrew Schultz 2008-06-20 12:38:41 EDT
Created attachment 309935
udevdebug log without libmtp

# grep -rl ATTR /etc/udev/rules.d/|sort -u

The offending file there came from libmtp (rebuilt from the F8 SRPM). I removed
it and collected a new log.
Comment 19 Anton Arapov 2008-06-23 04:53:12 EDT
What a wonderful udev log...
Andrew, does it still hang after you removed libmtp?
Comment 20 Andrew Schultz 2008-06-23 11:50:00 EDT
yes
Comment 21 Anton Arapov 2008-06-24 03:38:12 EDT
Hmm... I'm taken aback! There is no oops or similar message in your dmesg. I
have had no luck reproducing this, but I don't have the appropriate hardware... :(

Andrew, please try to capture a vmcore when the system hangs.
http://kbase.redhat.com/faq/FAQ_105_9036.shtm
Try Alt-SysRq-C if a vmcore is not produced automatically on the hang.
Comment 22 Andrew Schultz 2008-06-24 20:27:59 EDT
kdump did not seem to kick in automatically, and Alt-SysRq-C did not seem to
help. I'm not sure where it would put the vmcore file anyway at that point in
the boot sequence.
Comment 23 Anton Arapov 2008-06-25 02:24:42 EDT
So much for the hope that this would trigger automatically at that point in
the boot. :(
Comment 24 Anton Arapov 2008-07-07 08:07:17 EDT
Andrew, please provide your dmesg, everything you can capture before the hang,
or at least the output of grep -i -E "hpet|clock" run over it.

Thanks.
Comment 25 Andrew Schultz 2008-07-07 10:14:06 EDT
AFAICT, all of the dmesg output before the hang is in attachment 309935. grep says:

Real Time Clock Driver v1.12ac
Time: pit clocksource has been installed.
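The same filter can be applied to a saved dmesg log without grep. This is a minimal Python sketch of grep -i -E "hpet|clock"; the function name is hypothetical. Applied to the two lines above, it keeps both, and neither of them mentions HPET.

```python
import re

# Same pattern and case-insensitivity as: grep -i -E "hpet|clock"
CLOCK_PATTERN = re.compile(r"hpet|clock", re.IGNORECASE)


def clock_lines(dmesg_text):
    """Return the lines of `dmesg_text` that mention HPET or a clock."""
    return [line for line in dmesg_text.splitlines()
            if CLOCK_PATTERN.search(line)]
```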
Comment 26 Anton Arapov 2008-07-07 10:18:01 EDT
Oops, my bad; I didn't notice that it's already here...
Comment 27 Anton Arapov 2008-07-08 03:11:45 EDT
Andrew, does the acpi=noirq parameter avoid the issue?
Comment 28 Anton Arapov 2008-07-08 03:49:32 EDT
Andrew, please provide the lspci output as well.

Meanwhile, it would also be useful to know whether the system boots with the
noapic option and with hpet=force, each tried separately. I'm not sure whether
hpet=force exists in 2.6.18, but please try it.
Comment 29 Anton Arapov 2008-07-08 04:04:54 EDT
Putting this bug into NEEDINFO.

Andrew, please provide the info I've asked for as soon as you can, and I will
take a couple of shots in the dark to fix the issue. :)
Comment 30 Andrew Schultz 2008-07-08 20:01:06 EDT
Created attachment 311328
lspci output

One of your shots in the dark hit the mark!

acpi=noirq  => still hangs
noapic      => no hang
hpet=force  => still hangs

Please let me know of any other info that would be helpful.
Comment 31 Anton Arapov 2008-07-09 02:49:10 EDT
Hehe, I haven't even shot yet... wait a little. :)
Comment 32 Anton Arapov 2008-07-09 06:23:47 EDT
Please test these two kernels:
http://people.redhat.com/aarapov/kernel/bz405361/
kernel-2.6.18-92.1.6.el5.bz405361.s1.i686.rpm
kernel-2.6.18-92.1.6.el5.bz405361.s2.i686.rpm

and let me know the results.
/me crosses fingers. :)
Comment 33 Andrew Schultz 2008-07-09 16:01:03 EDT
kernel-2.6.18-92.1.6.el5.bz405361.s1.i686.rpm
kernel-2.6.18-92.1.6.el5.bz405361.s2.i686.rpm

Both of these kernels booted successfully. :)
Comment 34 Anton Arapov 2008-07-09 16:16:27 EDT
Perfect! So my assumptions were correct!
Andrew, keep using the .s1.i686.rpm kernel; it's less intrusive and will likely
be used as the fix. If you get a chance to run this kernel for a while, at
least a week, let me know whether you run into any problems.
Lastly, please attach the output of cat /proc/cpuinfo here. :-\

Thank you for your help. :)
Comment 35 Anton Arapov 2008-07-09 16:24:12 EDT
Andrew, please also attach the dmesg output from the 's1' kernel.
Thanks again.
Comment 36 Anton Arapov 2008-07-09 16:26:14 EDT
hmm... I'm a little bit confused by success of both kernels ...
I need the dmesg from 's2' as well, to get the full picture. :)
Comment 37 Andrew Schultz 2008-07-09 16:53:45 EDT
Created attachment 311419
/proc/cpuinfo
Comment 38 Andrew Schultz 2008-07-09 16:55:05 EDT
Created attachment 311420
dmesg with s1

> hmm... I'm a little bit confused by success of both kernels ...

"oops"
RPM happily added nosmp as a command-line argument for both kernels. With that
removed, s1 still boots happily and s2 hangs.
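The mix-up described here (nosmp silently added to both test kernels) can be caught by checking the running kernel's command line before trusting a test result. This is a minimal Python sketch; the function names are hypothetical, and it assumes whitespace-separated parameters as found in /proc/cmdline (quoted values containing spaces are not handled).

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into bare flags and key=value parameters.

    Assumes whitespace-separated tokens, as read from /proc/cmdline;
    quoted values containing spaces are not handled by this sketch.
    """
    flags, params = set(), {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep:
            params[key] = value  # e.g. root=LABEL=/ -> "root": "LABEL=/"
        else:
            flags.add(key)
    return flags, params


def has_nosmp(cmdline):
    """True if the bare flag nosmp appears on the command line."""
    flags, _params = parse_cmdline(cmdline)
    return "nosmp" in flags
```

For example, `has_nosmp(open("/proc/cmdline").read())` on each test kernel would have shown the extra argument before the results were compared.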
Comment 39 Anton Arapov 2008-07-09 17:03:18 EDT
Hah! Now it makes much more sense! :) That's exactly what I expected.
Comment 41 Anton Arapov 2008-07-09 17:16:58 EDT
Created attachment 311423
proposed patch
Comment 43 Anton Arapov 2008-07-11 03:16:45 EDT
Andrew, I'd like to ask you once again. :)
I need 'lspci -vv' output.
Comment 44 Anton Arapov 2008-08-01 02:38:22 EDT
I'm closing this bug as WONTFIX:
- the upstream fix avoids the problem rather than solving it
- the fix changes behavior, so it could hurt existing Red Hat customers
- the problem will never be fixed upstream; this clocksource is old and no
  longer used
- I have no hardware to test on, and will not have any

you can use noapic parameter as a workaround so far if it helps to boot the
system and it works stable after.
Comment 45 Andrew Schultz 2008-08-15 14:27:55 EDT
Created attachment 314399
lspci -vv

> you can use noapic parameter as a workaround so far if it helps to boot the
> system and it works stable after.

FWIW, noapic allows the system to boot, but it's unstable. After switching to 2.6.18-92.1.10, I tried noapic and ended up having to grab the SRPM and rebuild with attachment 311423 to get a stable SMP system.

Booting with nosmp is stable, but certainly suboptimal.

I'm attaching the lspci -vv output you asked for previously.
Comment 46 Pekka Savola 2008-09-14 10:16:52 EDT
FWIW, I had a kind of inverse problem: on F9 running a 2.6.25 or 2.6.26 kernel, if I add "nosmp" as a boot option, the boot hangs at udev. "acpi=off nosmp" works, though, as does the default SMP mode. This is an AMD Sempron system with 1 CPU.
Comment 47 deckies 2012-01-08 20:15:47 EST
Hi guys,

If you are using VMware and you give your virtual machine two (2) processors, your system will freeze at udev. Try using only one (1) processor in your virtual machine.

I hope I can contribute to this forum...

Cheers to all!
