On bootup, the system freezes at "Starting udev". Unfortunately, there is no output to any log. This is an AMD Athlon 2800 MP. As a workaround, I can boot with nosmp and the system runs fine.
Version-Release number of selected component (if applicable):
kernel-2.6.18-53.1.4.el5 (also 2.6.18-53.el5)
The system ran fine with the RHEL4 kernel (2.6.9-55.0.12.ELsmp)
How reproducible:
100%
I tried running FC6's release kernel (2.6.18-1.2798.fc6) and most updated kernel (2.6.22.9-61.fc6) and both booted fine in SMP mode.
Andrew, modify the rc.sysinit script so that it prints verbose output for the "udev" steps. This should give you some better output (and debugging info for this BZ!) P.
/sbin/start_udev invokes udevsettle, which never exits. I ran udevsettle under strace and it ended with:
    nanosleep({0, 50000000}, NULL)
    stat64("/dev/.udev/queue",...
    nanosleep({0, 50000000},
[the last nanosleep never finishes]
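The strace above suggests a simple poll loop: sleep, check whether /dev/.udev/queue still exists, repeat. A rough sketch of the same wait with an upper bound, so a debugging session does not block forever, might look like this. The function name wait_for_queue and the timeout are made up for illustration and are not part of udev; the one-second step is coarser than the 50 ms seen in the trace.

```shell
#!/bin/sh
# wait_for_queue PATH TIMEOUT_SECONDS  (hypothetical helper, not a udev tool)
# Poll until PATH no longer exists; give up after TIMEOUT_SECONDS.
wait_for_queue() {
    queue=$1
    remaining=$2
    while [ -e "$queue" ]; do
        if [ "$remaining" -le 0 ]; then
            echo "timeout: $queue still present" >&2
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

Called as `wait_for_queue /dev/.udev/queue 30`, it returns non-zero instead of hanging when the queue never drains. That only helps while userspace is still being scheduled; in the trace above the final nanosleep itself never returns, in which case this loop would stall the same way.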
Andrew, I'd like to know whether /sbin/start_udev still fails for you on RHEL 5.2 (kernel-2.6.18-92 or later). Thanks in advance.
This still fails for me with the latest kernel, 2.6.18-92.1.1.el5
udevsettle just waits until the work queue of udevd is empty. In most cases where udevd does not finish the queue, it is because a kernel module is hanging. Adding "udevinfo" or "udevdebug" to the kernel command line enables output of debug messages, which can be redirected to a serial console with e.g. "console=ttyS0,9600n8" ( http://tldp.org/HOWTO/Remote-Serial-Console-HOWTO/configure-kernel.html )
btw, udev can't log anywhere, because the filesystem is not yet mounted writable at that point.
Andrew, I'd like to ask you to provide debug messages.
Created attachment 309838 [details] Patch against initscripts to remove leftover files in /var/run/console
I also had the udev hangs. My systems authenticate against LDAP. I did some debugging: udev started to set up devices and began to fork off "pam_console_apply", and after that it hung. (Sorry, no debug screenshot; it was on the console.) The hang happened when pam_console had left its status files behind in /var/run/console, so pam_console_apply tried to give the permissions to the console user. But at the time udev starts, there is no network connection available. I suspect that the LDAP requests did not fail but hung instead, so pam_console_apply did not return either. I have already uploaded a one-line patch against initscripts, which removes the leftover files just before udev starts. In my case that fixed the problem. Sincerely, Klaus
(In reply to comment #9)
> Created an attachment (id=309838)
> Patch against initscripts to remove leftover files in /var/run/console
Nice catch. Did you file a bugzilla against initscripts with this?
Created attachment 309859 [details] Patch to remove leftover files in /var/run/console
The previous patch did not work, as the filesystem is not writable at that time. As a workaround, the new patch remounts rw, removes the files and then remounts ro again. That's not the perfect solution, but it should work. As a long-term solution, two things should happen:
1. Make pam_console_apply bulletproof against network outages, so it will not hang on LDAP requests.
2. Also clean up /var/run/console on shutdown (this will not help after crashes, of course).
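For readers without access to the attachment, the remount/remove/remount idea can be sketched roughly as follows. This is not the attached patch. The helper name clean_console_state is invented here, and the call site assumes /var/run/console lives on the root filesystem, which may not hold on systems with a separate /var.

```shell
#!/bin/sh
# clean_console_state DIR  (hypothetical helper, not from the attached patch)
# Remove leftover regular files directly inside DIR; a missing DIR is not an error.
clean_console_state() {
    dir=$1
    [ -d "$dir" ] || return 0
    find "$dir" -mindepth 1 -maxdepth 1 -type f -exec rm -f {} +
}

# Assumed call site in rc.sysinit, just before start_udev runs:
#   mount -n -o remount,rw /
#   clean_console_state /var/run/console
#   mount -n -o remount,ro /
```

Keeping the file removal in its own function means it can be tried out on a scratch directory before touching the boot scripts.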
Created attachment 309891 [details] kernel log with udevdebug This is the bootup log with udevdebug. I tried udevinfo, but it didn't seem to print anything extra. The "add_to_rules: unknown key 'ATTR{id____}'" messages have started only with more recent kernels (perhaps 2.6.18-92). I get them even with nosmp.
please provide the output of: # grep -rl ATTR /etc/udev/rules.d/|sort -u
Hmm, looking at the udevdebug output, I see no module gets loaded and nothing special happening. I guess this is a kernel smp lockup.
(In reply to comment #15) Yes, and that was the original intention... I'm working on it and wanted to rule out any possible non-kernel issues.
The output requested in comment #14 would be nice to have.
Created attachment 309935 [details] udevdebug log without libmtp
# grep -rl ATTR /etc/udev/rules.d/|sort -u
The offending file there was from libmtp (rebuilt from the F8 SRPM). I removed that and collected a new log.
What a wonderful udev log... Andrew, does it still hang after you removed libmtp?
yes
hmm... I'm taken aback! There is no oops or similar message in your dmesg. I have had no luck reproducing this, and I have no appropriate hardware... :( Andrew, please try to get a vmcore when the system hangs. http://kbase.redhat.com/faq/FAQ_105_9036.shtm Try Alt-SysRq-c if the hang does not produce a vmcore automagically.
kdump did not seem to kick in automatically, and Alt-SysRq-c did not seem to help. I'm not sure where it would put the vmcore file anyway at that point in the bootup sequence.
The hope that this would trigger automatically at this point in the boot has died. :(
Andrew, please provide your dmesg, everything you can get before the hang. Or at least: grep -i -E "hpet|clock" over the dmesg output. Thanks.
AFAICT, all of the dmesg output before the hang is in attachment 309935 [details]. grep says:
    Real Time Clock Driver v1.12ac
    Time: pit clocksource has been installed.
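Since the machine hangs before dmesg can be run, the same filter has to be applied to a log captured elsewhere, for example over the serial console. A small sketch, where the function name clock_lines and the log file argument are assumptions for illustration:

```shell
#!/bin/sh
# clock_lines FILE  (hypothetical helper)
# Print the clock/HPET related lines from a saved boot log.
clock_lines() {
    grep -i -E 'hpet|clock' "$1"
}
```

Running `clock_lines boot.log` against the captured console output gives the same kind of result as the grep requested above.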
oops. my bad, didn't notice that it's already here ...
Andrew, does the acpi=noirq parameter avoid the issue?
Andrew, please provide the lspci output as well. Meanwhile, it would also be useful to know whether the system boots with the noapic option and with hpet=force, each tried separately... I'm not sure whether we have =force in 2.6.18, but please try it.
Putting the bz into NEEDINFO. Andrew, please provide the info I've asked for as soon as you can. Then I will take a couple of shots in the dark to try to fix the issue. :)
Created attachment 311328 [details] lspci output
One of your shots in the dark found a mark!
    acpi=noirq => still hangs
    noapic => no hang
    hpet=force => still hangs
Please let me know of any other info that would be helpful.
hehehe, I haven't fired yet... wait a while. :)
Please test these two kernels:
http://people.redhat.com/aarapov/kernel/bz405361/
kernel-2.6.18-92.1.6.el5.bz405361.s1.i686.rpm
kernel-2.6.18-92.1.6.el5.bz405361.s2.i686.rpm
and let me know the results. /me crossed fingers. :)
kernel-2.6.18-92.1.6.el5.bz405361.s1.i686.rpm kernel-2.6.18-92.1.6.el5.bz405361.s2.i686.rpm Both of these kernels booted successfully. :)
Perfect! So my assumptions were correct! Andrew, keep using .s1.i686.rpm; it's less intrusive and will likely be used as the fix. If you get a chance to run this kernel for a while, at least a week, let me know whether you hit any problems. And lastly, please attach the output of: cat /proc/cpuinfo. :-\ Thank you for your help. :)
Andrew, please also attach the dmesg output from the 's1' kernel. Thanks again.
hmm... I'm a little bit confused by the success of both kernels... I need the dmesg of 's2' as well, to get the picture. :)
Created attachment 311419 [details] /proc/cpuinfo
Created attachment 311420 [details] dmesg with s1
> hmm... I'm a little bit confused by success of both kernels ...
Oops. RPM happily added nosmp as a command-line argument for both kernels. With that removed, s1 still boots happily and s2 hangs.
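A stray nosmp like this is easy to miss when test kernels are installed from a running nosmp boot. A quick check of the booted command line, sketched below, can rule it out before reporting results. The helper name has_boot_param is invented for illustration; /proc/cmdline is the standard place to read the booted arguments.

```shell
#!/bin/sh
# has_boot_param CMDLINE PARAM  (hypothetical helper)
# Succeed if PARAM appears as a whole space-separated word in CMDLINE.
has_boot_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}

# Typical use on the running system:
if has_boot_param "$(cat /proc/cmdline 2>/dev/null)" nosmp; then
    echo "nosmp is set: this boot does not exercise SMP"
fi
```

The whole-word match avoids false positives from longer parameters that merely contain the string.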
hah! now it's much better! :) that's exactly what I expected.
fix: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3f4a0b917ce72ef47e438d354c433eb645218e87
Created attachment 311423 [details] proposed patch
Andrew, I'd like to ask you once again. :) I need 'lspci -vv' output.
I'm closing this bug as WONTFIX.
- the upstream fix avoids the problem rather than solving it
- Red Hat has customers who could be hurt by the fix, since it changes behavior
- the problem will never be properly fixed upstream; the clocksource is old and is not used anymore
- I have no hardware to test on, and will not get any
You can use the noapic parameter as a workaround for now, if it lets the system boot and it runs stably afterwards.
Created attachment 314399 [details] lspci -vv
> you can use noapic parameter as a workaround so far if it helps to boot the
> system and it works stable after.
FWIW, noapic allows the system to boot, but it's unstable. After switching to 2.6.18-92.1.10, I tried noapic, and in the end had to grab the SRPM and rebuild it with attachment 311423 [details] to get a stable SMP system. Booting with nosmp is stable, but certainly suboptimal. I'm attaching the lspci -vv output you asked for previously.
FWIW, I had a kind of inverse problem: on F9 running a 2.6.25 or 2.6.26 kernel, if I add "nosmp" as a boot option, boot hangs at udev. "acpi=off nosmp" works though, as does the default SMP mode. This is an AMD Sempron system with 1 CPU.
Hi guys, if you are using VMware and you choose two (2) processors for your virtual machine, your system will freeze at udev... Try using only one (1) processor in your virtual machine. I hope this contributes something to the thread. Cheers to all!