Bug 625077
Summary: | kernel-2.6.36-0.[1234].rc1.git[01].fc15 fails to boot | | |
---|---|---|---|
Product: | Fedora | Reporter: | Tom London <selinux> |
Component: | kernel | Assignee: | Roland McGrath <roland> |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Priority: | low |
Version: | rawhide | Hardware: | All |
OS: | Linux | Doc Type: | Bug Fix |
Last Closed: | 2010-08-24 14:07:28 UTC | | |
CC: | anton, aquini, dougsland, eblake, frankly3d, gansalmon, itamar, jlayton, jonathan, kernel-maint, kurtdriver, madhu.chinakonda, masao-takahashi, michal, pebolle, rjones, robatino, spoffley, tomek, yaneti, zing | | |
kernel-2.6.36-0.3.rc1.git1.fc15.x86_64 fails to boot in exactly the same fashion as kernel-2.6.36-0.1.rc1.git0.fc15.x86_64. Sorry, no better debug info other than the screenshot already posted. Is there some other way to help?

I have the same problem with kernel-2.6.36-0.2.rc1.git1.fc15.i686.

Created attachment 439634 [details]
log of failed boot with kernel-2.6.36-0.1.rc1.git0.fc15.i686
0) Similar situation with kernel-2.6.36-0.1.rc1.git0.fc15.i686 for me, on an IBM ThinkPad T41.
1) Note from the log that the cryptomgr_test message is just the last thing printed to the screen: there's a lot more happening after that. One just never gets to see a (graphical or text) login prompt.
initcall_debug ends with "calling sha1_generic_mod_init+0x0/0x12 @ 1"

(In reply to comment #3)
> One just never gets to see a (graphical or text) login prompt.

For what it's worth: ditto for logging in over serial line. These are the last lines printed over the serial line to the other machine:

[...]
type=2000 audit(1282213485.561:1): initialized
highmem bounce pool size: 64 pages
HugeTLB registered 4 MB page size, pre-allocated 0 pages
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
msgmni has been set to 1680
SELinux: Registering netfilter hooks
cryptomgr_test used greatest stack depth: 7096 bytes left

(In reply to comment #3)
> Created attachment 439634 [details]
> log of failed boot with kernel-2.6.36-0.1.rc1.git0.fc15.i686
>
> 1) Note from the log that the cryptomgr_test message is just the last thing
> printed to the screen: there's a lot more happening after that.

2.6.36-0.0.rc0.git1.fc15.i686 != kernel-2.6.36-0.1.rc1.git0.fc15.i686, so the log wasn't from latest kernel (that fails to boot) but from the last one that still works for me. It seems the cryptomgr_test line really was the last thing the kernel prints. (And since the system needs to be shut down by removing power, nothing shows up in the log.)

* Wed Aug 18 2010 Chuck Ebbert <cebbert> - 2.6.36-0.3.rc1.git1
- Fix hangs on boot with some AMD processors (x86-cpu-fix-regression-in-amd-errata-checking-code.patch)
- Drop unused ssb_check_for_sprom.patch

kernel-2.6.36-0.3.rc1.git1.fc15.x86_64 & kernel-2.6.36-0.1.rc1.git0.fc15.x86_64 hang at boot for me. kernel-2.6.36-0.0.rc0.git1.fc15.x86_64 works. The above fix either didn't work or was not all that was wrong.

Created attachment 439746 [details]
boot messages from guest serial console
The machine not booting in my case is a rawhide KVM guest. I have it set up for serial console and here are the boot messages. It freezes after the cryptomgr_test line.

kernel-2.6.36-0.4.rc1.git1.fc15.x86_64 fails to boot, seemingly in exactly the same way. It freezes after the cryptomgr_test line.

Created attachment 439831 [details]
traces displayed on a screen after a failed boot
2.6.36-0.1.rc1.git0.fc15.x86_64 fails to boot for me, getting stuck in what looks like a somewhat different place. The lines printed on my screen look like this:
.....
Trying to unpack rootfs image as initramfs...
Freeing initrd memory: 4440k freed
DMA-API: preallocated 32768 debug entries
DMA-API: debugging enabled by kernel config
agpgart-amd64 0000:00:00.0: AGP bridge [1106/3188]
agpgart-amd64 0000:00:00.0: AGP aperture is 256M @ 0xe0000000
work_for_cpu used greatest stack depth: 6008 bytes left
At this point everything gets stuck for a number of minutes, but not forever. If you wait long enough, the screen suddenly looks like the attached picture and then freezes, apparently permanently. Obviously something scrolled off the top, but this happens in one swoop; there is no visible scrolling whatsoever.
For reference, see the dmesg from a boot with 2.6.36-0.0.rc0.git1.fc15.x86_64. The next line, which never shows up with rc1, should read:
audit: initializing netlink socket (disabled)
Created attachment 439832 [details]
dmesg output from booting 2.6.36-0.0.rc0.git1.fc15.x86_64
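
The comparison described above (find the last line the hung kernel printed, then look up what the working kernel printed next) can be sketched in Python. This is only an illustration: the helper names `normalize` and `next_expected_line` are hypothetical, and the sample "good" excerpt is assembled from the lines quoted in this comment, not copied from the attached dmesg.

```python
import re


def normalize(line):
    """Drop a leading dmesg timestamp such as "[    0.123456] " and mask
    numbers, so values like stack depths do not prevent a match."""
    line = re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line.strip())
    return re.sub(r"\d+", "N", line)


def next_expected_line(good_log, hung_log):
    """Return the line that follows, in good_log, the last line of hung_log
    that also occurs in good_log. Returns None if nothing matches or the
    match is the final line of good_log."""
    good = [l for l in good_log if l.strip()]
    keys = [normalize(l) for l in good]
    for line in reversed([l for l in hung_log if l.strip()]):
        key = normalize(line)
        if key in keys:
            idx = keys.index(key)
            return good[idx + 1].strip() if idx + 1 < len(good) else None
    return None


# Assumed excerpt of the working boot (2.6.36-0.0.rc0.git1), built from the
# lines quoted in this comment.
good = [
    "agpgart-amd64 0000:00:00.0: AGP aperture is 256M @ 0xe0000000",
    "work_for_cpu used greatest stack depth: 6008 bytes left",
    "audit: initializing netlink socket (disabled)",
]
# Last lines visible on screen with the hanging rc1 kernel.
hung = [
    "agpgart-amd64 0000:00:00.0: AGP aperture is 256M @ 0xe0000000",
    "work_for_cpu used greatest stack depth: 6008 bytes left",
]

print(next_expected_line(good, hung))
```

Masking the numbers is a deliberate simplification: it lets lines that differ only in a reported stack depth match, at the cost of possible false matches on lines that differ only in a number.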

Michal, I have the exact same symptoms and I have isolated it to the Fedora utrace patch. Perhaps someone here can test whether it also causes the cryptomgr_test hang.

Same problem in a VirtualBox x86_64 guest. The last 3 kernels hang at the cryptomgr_test line. The last one which is bootable is 2.6.36-0.0.rc0.git1.fc15.x86_64.
Smolt URL: http://www.smolts.org/client/show/pub_7ab0c5a8-dcfb-41fc-8c85-fcd0ec5fc674

This continues on my laptop even after commenting out the latest kernel. I just today upgraded from F13 to rawhide, this time stopping for a few hours at F14.

I wonder whether http://lkml.org/lkml/2010/8/17/439 and followups, with the thread subject "2.6.36-rc1: Doesn't boot on an AMD-based machine", is relevant to at least some of the reported troubles?

(In reply to comment #15)
> I wonder if http://lkml.org/lkml/2010/8/17/439 and followups with a thread
> subject "2.6.36-rc1: Doesn't boot on an AMD-based machine" is not relevant at
> least to some reported troubles?

That was fixed in 2.6.36-0.3.rc1.git1

Still hangs with the latest patches from http://people.redhat.com/roland/utrace/2.6-current/

Confirmed, removing the utrace patches makes it boot.

Dropped the utrace patch in 2.6.36-0.6.rc1.git3

Confirmed that kernel-2.6.36-0.5.rc1.git3.fc15 from today's Rawhide push doesn't boot, but kernel-2.6.36-0.6.rc1.git3.fc15 from Koji does.

kernel-2.6.36-0.6.rc1.git3.fc15.x86_64 boots for me as well.

kernel-2.6.36-0.6.rc1.git3.fc15 boots, yet I am getting:

kernel: CPU0: AMD Phenom(tm) 9500 Quad-Core Processor stepping 02
kernel: NMI watchdog enabled, takes one hw-pmu counter.
kernel: lockdep: fixing up alternatives.
kernel:
kernel: ===================================================
kernel: [ INFO: suspicious rcu_dereference_check() usage. ]
kernel: ---------------------------------------------------
kernel: kernel/sched.c:618 invoked rcu_dereference_check() without protection!
kernel:
kernel: other info that might help us debug this:
kernel:
kernel:
kernel: rcu_scheduler_active = 1, debug_locks = 0
kernel: 3 locks held by kworker/0:0/4:
kernel: #0: (events){+.+.+.}, at: [<ffffffff81067e11>] process_one_work+0x160/0x2ec
kernel: #1: ((&c_idle.work)){+.+.+.}, at: [<ffffffff81067e11>] process_one_work+0x160/0x2ec
kernel: #2: (&rq->lock){-.....}, at: [<ffffffff81490279>] init_idle+0x30/0x131
kernel:
kernel: stack backtrace:
kernel: Pid: 4, comm: kworker/0:0 Not tainted 2.6.36-0.6.rc1.git3.fc15.x86_64 #1
kernel: Call Trace:
kernel: [<ffffffff8107d944>] lockdep_rcu_dereference+0xaa/0xb3
kernel: [<ffffffff810401ce>] task_group+0x80/0x8f
kernel: [<ffffffff810401f4>] set_task_rq+0x17/0x73
kernel: [<ffffffff81490333>] init_idle+0xea/0x131
kernel: [<ffffffff81490703>] fork_idle+0x92/0xa3
kernel: [<ffffffff8101050d>] ? sched_clock+0x9/0xd
kernel: [<ffffffff8148e17c>] do_fork_idle+0x1c/0x2d
kernel: [<ffffffff81067e6d>] process_one_work+0x1bc/0x2ec
kernel: [<ffffffff81067e11>] ? process_one_work+0x160/0x2ec
kernel: [<ffffffff8148e160>] ? do_fork_idle+0x0/0x2d
kernel: [<ffffffff81068d6c>] ? manage_workers.clone.9+0xe0/0x173
kernel: [<ffffffff81068f03>] worker_thread+0x104/0x19b
kernel: [<ffffffff81068dff>] ? worker_thread+0x0/0x19b
kernel: [<ffffffff8106c84c>] kthread+0x9d/0xa5
kernel: [<ffffffff8100aae4>] kernel_thread_helper+0x4/0x10
kernel: [<ffffffff81498150>] ? restore_args+0x0/0x30
kernel: [<ffffffff8106c7af>] ? kthread+0x0/0xa5
kernel: [<ffffffff8100aae0>] ? kernel_thread_helper+0x0/0x10
kernel: Booting Node 0, Processors #1
kernel: NMI watchdog enabled, takes one hw-pmu counter.
kernel: lockdep: fixing up alternatives.
kernel: #2
kernel: NMI watchdog enabled, takes one hw-pmu counter.
kernel: lockdep: fixing up alternatives.
kernel: #3
kernel: NMI watchdog enabled, takes one hw-pmu counter.
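
When checking whether a trace like the one above duplicates an existing report, it can help to reduce it to the chain of function names. The following is a rough Python sketch, not part of any kernel tooling: the `FRAME` pattern and the `reliable_frames` helper are hypothetical names, and the sketch assumes the `[<address>] symbol+0xoff/0xsize` frame format seen in this log. Frames marked with `?` are addresses found on the stack that the unwinder could not confirm as part of the call chain, so they are skipped here.

```python
import re

# One stack frame: "[<addr>] symbol+0xoff/0xsize", optionally prefixed by "? ".
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(log_text):
    """Return the function names of the frames after "Call Trace:" that are
    not marked with "?". Returns an empty list if there is no call trace."""
    trace = log_text.partition("Call Trace:")[2]
    return [m.group("sym") for m in FRAME.finditer(trace)
            if not m.group("unreliable")]


# Excerpt of the log quoted in this comment.
sample = """\
kernel: #0: (events){+.+.+.}, at: [<ffffffff81067e11>] process_one_work+0x160/0x2ec
kernel: Call Trace:
kernel: [<ffffffff8107d944>] lockdep_rcu_dereference+0xaa/0xb3
kernel: [<ffffffff810401ce>] task_group+0x80/0x8f
kernel: [<ffffffff810401f4>] set_task_rq+0x17/0x73
kernel: [<ffffffff81490333>] init_idle+0xea/0x131
kernel: [<ffffffff81490703>] fork_idle+0x92/0xa3
kernel: [<ffffffff8101050d>] ? sched_clock+0x9/0xd
kernel: [<ffffffff8148e17c>] do_fork_idle+0x1c/0x2d
"""

for name in reliable_frames(sample):
    print(name)
```

Only text after "Call Trace:" is scanned, so the "locks held" lines, which use the same address notation, are not mistaken for stack frames.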

(In reply to comment #22)
I guess the above information is irrelevant, since I looked back at my logs and have been getting something of this sort as far back as 2.6.34-0.38.rc5.git0.fc14.x86_64

*** Bug 624854 has been marked as a duplicate of this bug. ***

(In reply to comment #22)
> kernel-2.6.36-0.6.rc1.git3.fc15 boots, yet I am getting :
...
> kernel: kernel/sched.c:618 invoked rcu_dereference_check() without protection!

That one will be bug 572520 and bug 610967 AFAICS.

I am getting a similar problem on an ACER Aspire 5810T-8952 laptop with an Intel Core 2 Solo SU3500 processor running 64-bit rawhide. When I attempt to boot any kernel after 2.6.36-0.0.rc0.git1.fc15.x86_64 I just get a black screen, the disks do not appear to be accessed, and I cannot even boot into init level 1. No messages are written to any log files.

This appears to fix it for me: http://koji.fedoraproject.org/koji/buildinfo?buildID=191339
Anyone else?

The fixed kernel-2.6.36-0.6.rc1.git3.fc15 is in today's Rawhide push as well. From "rawhide report: 20100822 changes":

kernel-2.6.36-0.6.rc1.git3.fc15
-------------------------------
* Sat Aug 21 2010 Chuck Ebbert <cebbert at redhat.com> - 2.6.36-0.6.rc1.git3
- Drop utrace patch that causes hang on boot.

The last two kernels (2.6.36-0.7.rc2.git0.fc15 and 2.6.36-0.8.rc2.git0.fc15) boot as well, even though today's rawhide update says:

kernel-2.6.36-0.8.rc2.git0.fc15
-------------------------------
* Mon Aug 23 2010 Roland McGrath <roland at redhat.com> - 2.6.36-0.8.rc2.git0
- utrace update

Does this mean that the cause of the hang is now understood, and this bug can be closed?
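
The boot results scattered through the comments above can be collected in one place to see which builds are affected. A minimal Python sketch, assuming the results exactly as reported in this thread (architectures are ignored, and the helper names `build_number` and `failing_window` are hypothetical):

```python
import re

# Boot results as reported in the comments (True = boots, False = hangs).
reports = {
    "2.6.36-0.0.rc0.git1.fc15": True,
    "2.6.36-0.1.rc1.git0.fc15": False,
    "2.6.36-0.2.rc1.git1.fc15": False,
    "2.6.36-0.3.rc1.git1.fc15": False,
    "2.6.36-0.4.rc1.git1.fc15": False,
    "2.6.36-0.5.rc1.git3.fc15": False,
    "2.6.36-0.6.rc1.git3.fc15": True,
    "2.6.36-0.7.rc2.git0.fc15": True,
    "2.6.36-0.8.rc2.git0.fc15": True,
}


def build_number(release):
    """Extract N from "2.6.36-0.N.rcX.gitY.fc15" so builds sort in order."""
    m = re.fullmatch(r"2\.6\.36-0\.(\d+)\.rc\d+\.git\d+\.fc15", release)
    if m is None:
        raise ValueError("unexpected release string: " + release)
    return int(m.group(1))


def failing_window(results):
    """Return (first, last) failing build in build order, or None."""
    ordered = sorted(results, key=build_number)
    bad = [r for r in ordered if not results[r]]
    return (bad[0], bad[-1]) if bad else None


print(failing_window(reports))
```

With these inputs the failing builds run from 0.1.rc1.git0 through 0.5.rc1.git3, which is consistent with the changelog entry that dropped the utrace patch in 0.6.rc1.git3.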

Created attachment 439401 [details]
Screenshot showing boot hang

Description of problem:
Boot hangs very early in cycle. I attach screenshot showing last messages:

SELinux: Registering netfilter hooks
cryptomgr_test used greatest stack depth: 5968 bytes left
cryptomgr_test used greatest stack depth: 5944 bytes left

System is Thinkpad X200:

[root@tlondon ~]# lspci
00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)
00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)
00:02.1 Display controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)
00:03.0 Communication controller: Intel Corporation Mobile 4 Series Chipset MEI Controller (rev 07)
00:19.0 Ethernet controller: Intel Corporation 82567LM Gigabit Network Connection (rev 03)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93)
00:1f.0 ISA bridge: Intel Corporation ICH9M-E LPC Interface Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03)
03:00.0 Network controller: Intel Corporation PRO/Wireless 5100 AGN [Shiloh] Network Connection
[root@tlondon ~]#

Version-Release number of selected component (if applicable):
kernel-2.6.36-0.1.rc1.git0.fc15.x86_64

How reproducible:
Every boot attempt

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info: