405361 – System freeze at "Starting udev..."

Summary: System freeze at "Starting udev..."

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.1
Hardware:	athlon
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Anton Arapov
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-11-30 04:15 UTC by Andrew Schultz
Modified:	2014-06-18 08:01 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-08-01 06:38:22 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Patch against initscripts to remove leftover files in /var/run/console (373 bytes, patch), 2008-06-19 10:13 UTC, Klaus Steinberger
Patch to remove leftover files in /var/run/console (622 bytes, patch), 2008-06-19 15:57 UTC, Klaus Steinberger
kernel log with udevdebug (42.31 KB, text/plain), 2008-06-19 23:41 UTC, Andrew Schultz
udevdebug log without libmtp (81.85 KB, text/plain), 2008-06-20 16:38 UTC, Andrew Schultz
lspci output (893 bytes, text/plain), 2008-07-09 00:01 UTC, Andrew Schultz
/proc/cpuinfo (831 bytes, text/plain), 2008-07-09 20:53 UTC, Andrew Schultz
dmesg with s1 (12.23 KB, text/plain), 2008-07-09 20:55 UTC, Andrew Schultz
proposed patch (1.92 KB, patch), 2008-07-09 21:16 UTC, Anton Arapov
lspci -vv (5.34 KB, text/plain), 2008-08-15 18:27 UTC, Andrew Schultz

Description Andrew Schultz 2007-11-30 04:15:35 UTC

On bootup, the system freezes at "Starting udev".  Unfortunately, nothing is
written to any log.  This is an AMD Athlon 2800 MP.  As a workaround, I can boot
with nosmp and the system runs fine.

Version-Release number of selected component (if applicable):
kernel-2.6.18-53.1.4.el5 (also 2.6.18-53.el5)
The system ran fine with the RHEL4 kernel (2.6.9-55.0.12.ELsmp)

How reproducible:
100%

Comment 1 Andrew Schultz 2007-12-02 02:10:11 UTC

I tried FC6's release kernel (2.6.18-1.2798.fc6) and its most recent update
kernel (2.6.22.9-61.fc6), and both booted fine in SMP mode.

Comment 2 Prarit Bhargava 2007-12-06 15:01:16 UTC

Andrew, modify the rc.sysinit script so that it produces verbose output for the
"udev" steps.

That should give you better output (and debugging info for this BZ!)

P.

Comment 3 Andrew Schultz 2007-12-06 19:09:55 UTC

/sbin/start_udev invokes udevsettle, which never exits.

I ran udevsettle under strace and it ended with

nanosleep({0, 50000000}, NULL)
stat64("/dev/.udev/queue",...
nanosleep({0, 50000000},

[the last nanosleep never finishes]
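
The trace suggests that udevsettle polls for /dev/.udev/queue every 50 ms and returns once it is gone. Below is a minimal Python sketch of that kind of loop, for illustration only: the function name, the timeout parameter, and its default value are assumptions and are not taken from the udev source.

```python
import os
import time


def wait_for_queue_empty(queue_path="/dev/.udev/queue",
                         timeout_s=180.0, poll_interval_s=0.05):
    """Poll until queue_path no longer exists; return True if it went away.

    Hypothetical helper that mirrors the stat64()/nanosleep() pattern seen
    in the strace output. timeout_s is an assumed value, not read from udev.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if not os.path.exists(queue_path):    # corresponds to stat64()
            return True
        if time.monotonic() >= deadline:      # give up after the timeout
            return False
        time.sleep(poll_interval_s)           # corresponds to nanosleep(50 ms)
```

A loop like this can only time out if the sleep itself returns. In the trace above the final nanosleep never does, so the stall appears to sit below the polling logic.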

Comment 4 Anton Arapov 2008-06-16 08:59:43 UTC

Andrew, I'd like to know whether /sbin/start_udev still fails for you on
RHEL5.2, kernel-2.6.18-92+

Thanks in advance.

Comment 5 Andrew Schultz 2008-06-16 14:36:19 UTC

This still fails for me with the latest kernel, 2.6.18-92.1.1.el5

Comment 6 Harald Hoyer 2008-06-17 09:22:36 UTC

udevsettle just waits until the work queue of udevd is empty.

In most cases where udevd does not complete the queue, a kernel module is
hanging.

Adding "udevinfo" or "udevdebug" to the kernel command line enables output of
debug messages, which can be redirected via a serial console with e.g.
"console=ttyS0,9600n8" (
http://tldp.org/HOWTO/Remote-Serial-Console-HOWTO/configure-kernel.html )

Comment 7 Harald Hoyer 2008-06-17 09:23:38 UTC

btw, udev can't log anywhere, because the filesystem is not mounted writeable.

Comment 8 Anton Arapov 2008-06-17 10:59:31 UTC

Andrew, I'd like to ask you to provide debug messages.

Comment 9 Klaus Steinberger 2008-06-19 10:13:51 UTC

Created attachment 309838
Patch against initscripts to remove leftover files in /var/run/console

Comment 10 Klaus Steinberger 2008-06-19 10:20:23 UTC

I also had udev hangs. My systems authenticate against LDAP. I did
some debugging: udev started to set up devices and began to fork off
"pam_console_apply", and after that it hung. (Sorry, no debug screenshot; it was
on the console.)

The hang happened when pam_console had left its status files behind in
/var/run/console, so pam_console_apply tried to give the permissions to the
console user. But at the time udev starts, no network connection is
available. I suspect that the LDAP requests did not fail but hung instead, so
pam_console_apply did not return either.

I have already uploaded a one-liner patch against initscripts that removes the
leftover files just before udev starts.

In my case that fixed the problem.

Sincerely,
Klaus

Comment 11 Harald Hoyer 2008-06-19 14:00:36 UTC

(In reply to comment #9)
> Created an attachment (id=309838)
> Patch against initscripts to remove leftover files in /var/run/console

Nice catch. Did you file a bugzilla against initscripts for this?

Comment 12 Klaus Steinberger 2008-06-19 15:57:37 UTC

Created attachment 309859
Patch to remove leftover files in /var/run/console

The previous patch did not work, as the filesystem is not writable at that
time.

As a workaround, the new patch remounts rw, removes the files, and then remounts
ro again.

That's not the perfect solution, but it should work.

As a long-term solution, two things should be done:
1. Make pam_console_apply bulletproof against network outages, so that it does
   not hang on LDAP requests.
2. Also clean up /var/run/console on shutdown (this will not help after crashes,
   of course).
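
For illustration, the clean-up step described above could look roughly like the following Python sketch. It is not the attached patch (which changes the initscripts shell code and also remounts the filesystem read-write first); the function name is hypothetical and only the removal of leftover files is shown.

```python
import os


def remove_stale_console_files(console_dir="/var/run/console"):
    """Delete files left behind in console_dir and return their names.

    Hypothetical helper: only regular files and symlinks are removed,
    subdirectories are left alone. The directory must be writable, so on
    a read-only root this has to run after a remount (not shown here).
    """
    removed = []
    if not os.path.isdir(console_dir):
        return removed
    for name in sorted(os.listdir(console_dir)):
        path = os.path.join(console_dir, name)
        if os.path.isfile(path) or os.path.islink(path):
            os.unlink(path)
            removed.append(name)
    return removed
```

Running this before udev starts would mean pam_console_apply finds no recorded console user, which is the effect the patch aims for.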

Comment 13 Andrew Schultz 2008-06-19 23:41:57 UTC

Created attachment 309891
kernel log with udevdebug

This is the bootup log with udevdebug.  I tried udevinfo, but it didn't seem to
print anything extra.  The "add_to_rules: unknown key 'ATTR{id____}'" messages
started only with more recent kernels (perhaps 2.6.18-92).  I get them
even with nosmp.

Comment 14 Harald Hoyer 2008-06-20 06:04:27 UTC

please provide the output of:
# grep -rl ATTR /etc/udev/rules.d/|sort -u

Comment 15 Harald Hoyer 2008-06-20 06:08:01 UTC

Hmm, looking at the udevdebug output, I see no module gets loaded and nothing
special happening.

I guess this is a kernel smp lockup.

Comment 16 Anton Arapov 2008-06-20 06:15:13 UTC

(In reply to comment #15)
Yes, and that was the original intention... I'm working on it and wanted to
rule out any possible non-kernel issues.

Comment 17 Harald Hoyer 2008-06-20 06:25:34 UTC

The output requested in comment #14 would still be nice to have.

Comment 18 Andrew Schultz 2008-06-20 16:38:41 UTC

Created attachment 309935
udevdebug log without libmtp

# grep -rl ATTR /etc/udev/rules.d/|sort -u

the offending file there was from libmtp (rebuilt from the F8 SRPM).  I removed
that and collected a new log.

Comment 19 Anton Arapov 2008-06-23 08:53:12 UTC

what a wonderful udevlog...
Andrew, does it still hang after you removed libmtp?

Comment 20 Andrew Schultz 2008-06-23 15:50:00 UTC

yes

Comment 21 Anton Arapov 2008-06-24 07:38:12 UTC

hmm... I'm taken aback! There is no oops or similar message in dmesg. I have
had no luck reproducing it, but I have no appropriate hardware... :(

Andrew, please try to get a vmcore when the system hangs.
http://kbase.redhat.com/faq/FAQ_105_9036.shtm
Try Alt-SysRq-c if it does not produce a vmcore automatically on hang.

Comment 22 Andrew Schultz 2008-06-25 00:27:59 UTC

kdump did not seem to kick in automatically, and Alt-SysRq-c did not seem to
help.  I'm not sure where it would put the vmcore file anyway at that point in
the bootup sequence.

Comment 23 Anton Arapov 2008-06-25 06:24:42 UTC

The hope that this would trigger automatically at this point in the boot has
died. :(

Comment 24 Anton Arapov 2008-07-07 12:07:17 UTC

Andrew, please provide your dmesg, everything you can get before the hang. Or at
least the output of grep -i -E "hpet|clock" over the dmesg output.

thanks.

Comment 25 Andrew Schultz 2008-07-07 14:14:06 UTC

AFAICT, all of the dmesg output before the hang is in attachment 309935.  grep says:

Real Time Clock Driver v1.12ac
Time: pit clocksource has been installed.

Comment 26 Anton Arapov 2008-07-07 14:18:01 UTC

oops. my bad, didn't notice that it's already here ...

Comment 27 Anton Arapov 2008-07-08 07:11:45 UTC

Andrew, does the acpi=noirq parameter avoid the issue?

Comment 28 Anton Arapov 2008-07-08 07:49:32 UTC

Andrew, please provide lspci output as well.

Meanwhile, it would also be useful to know whether the system boots with the
noapic option and with hpet=force, each tried separately. I'm not sure whether
we have =force in 2.6.18, but please try it.

Comment 29 Anton Arapov 2008-07-08 08:04:54 UTC

putting bz to NEEDINFO.

Andrew, please provide the info I've asked for as soon as you can. Then I will
take a couple of shots in the dark in order to fix the issue. :)

Comment 30 Andrew Schultz 2008-07-09 00:01:06 UTC

Created attachment 311328
lspci output

One of your shots in the dark found a mark!

acpi=noirq  => still hangs
noapic      => no hang
hpet=force  => still hangs

Please let me know of any other info that would be helpful

Comment 31 Anton Arapov 2008-07-09 06:49:10 UTC

hehehe, I did not shoot yet... wait for a while. :)

Comment 32 Anton Arapov 2008-07-09 10:23:47 UTC

please, test these two kernels:
http://people.redhat.com/aarapov/kernel/bz405361/
kernel-2.6.18-92.1.6.el5.bz405361.s1.i686.rpm
kernel-2.6.18-92.1.6.el5.bz405361.s2.i686.rpm

and let me know the results.
/me crossed fingers. :)

Comment 33 Andrew Schultz 2008-07-09 20:01:03 UTC

kernel-2.6.18-92.1.6.el5.bz405361.s1.i686.rpm
kernel-2.6.18-92.1.6.el5.bz405361.s2.i686.rpm

Both of these kernels booted successfully. :)

Comment 34 Anton Arapov 2008-07-09 20:16:27 UTC

Perfect! So my assumptions were correct!
Andrew, keep using .s1.i686.rpm; it's less intrusive and will likely be used
as the fix. If you get a chance to run this kernel for a while, at least a
week, let me know whether you run into any problems.
And lastly, please attach the output of: cat /proc/cpuinfo. :-\

thank you for your help. :)

Comment 35 Anton Arapov 2008-07-09 20:24:12 UTC

Andrew, please also attach the dmesg output of the 's1' kernel.
thanks again.

Comment 36 Anton Arapov 2008-07-09 20:26:14 UTC

hmm... I'm a little bit confused by success of both kernels ...
I need dmesg of 's2' also, to get the picture. :)

Comment 37 Andrew Schultz 2008-07-09 20:53:45 UTC

Created attachment 311419
/proc/cpuinfo

Comment 38 Andrew Schultz 2008-07-09 20:55:05 UTC

Created attachment 311420
dmesg with s1

> hmm... I'm a little bit confused by success of both kernels ...

"oops"
RPM happily added nosmp as commandline argument for both kernels.  Removing
that, s1 still boots happily and s2 hangs.
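
A leftover nosmp on the command line is easy to miss when comparing test kernels. The following Python sketch shows one way to check for it before drawing conclusions. The function names are hypothetical, and the parsing is simplified (quoted values containing spaces are not handled).

```python
def kernel_params(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (such as nosmp) map to None. Simplified sketch:
    quoted values that contain spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def booted_with(flag, cmdline_path="/proc/cmdline"):
    """Return True if the running kernel was booted with the given flag."""
    with open(cmdline_path) as f:
        return flag in kernel_params(f.read())
```

For example, `booted_with("nosmp")` on the running system reports whether that boot was actually an SMP test.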

Comment 39 Anton Arapov 2008-07-09 21:03:18 UTC

hah! now it's much better! :) that's exactly what I expected.

Comment 40 Anton Arapov 2008-07-09 21:12:52 UTC

fix:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3f4a0b917ce72ef47e438d354c433eb645218e87

Comment 41 Anton Arapov 2008-07-09 21:16:58 UTC

Created attachment 311423
proposed patch

Comment 43 Anton Arapov 2008-07-11 07:16:45 UTC

Andrew, I'd like to ask you once again. :)
I need 'lspci -vv' output.

Comment 44 Anton Arapov 2008-08-01 06:38:22 UTC

I'm closing this bug as WONTFIX.
- the upstream fix avoids the problem rather than solving it
- Red Hat has customers who could be hurt by the fix, because
  it changes behavior
- the problem will never be fixed upstream;
  the clocksource is old and is not used anymore
- I have no hardware to test on, and will not get any

You can use the noapic parameter as a workaround for now, if it lets the system
boot and the system runs stably afterwards.

Comment 45 Andrew Schultz 2008-08-15 18:27:55 UTC

Created attachment 314399
lspci -vv

> you can use noapic parameter as a workaround so far if it helps to boot the
> system and it works stable after.

FWIW, noapic allows the system to boot, but it's unstable.  After switching to 2.6.18-92.1.10, I tried noapic and then had to grab the SRPM and rebuild with attachment 311423 to get a stable SMP system.

booting with nosmp is stable, but certainly suboptimal.

I'm attaching the lspci -vv output you asked for previously.

Comment 46 Pekka Savola 2008-09-14 14:16:52 UTC

FWIW, I had kind of the inverse problem: on F9 running a 2.6.25 or 2.6.26 kernel, if I add "nosmp" as a boot option, boot hangs at udev.  "acpi=off nosmp" works though, as does the default SMP mode.  This is an AMD Sempron system with 1 CPU.

Comment 47 deckies 2012-01-09 01:15:47 UTC

Hi guys,

If you are using VMware and you choose two (2) processors in your virtual machine, your system will freeze at udev... Try using only one (1) processor in your virtual machine.

I hope I can contribute to this forum...

Cheers to all!
