Bug 795992

Summary: Install DVD won't boot, displays messages: rcu_sched detected stalls on CPUs/tasks
Product: [Fedora] Fedora
Reporter: Tim Flink <tflink>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 17
CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, stanley.king
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kernel-3.3.0-0.rc4.git1.4.fc17
Doc Type: Bug Fix
Last Closed: 2012-02-28 10:56:03 UTC
Attachments:
console log of failed boot attempt with f17 alpha RC3 DVD (flags: none)
console log of boot attempt with updated kernel (flags: none)

Description Tim Flink 2012-02-22 00:19:29 UTC
Created attachment 564804
console log of failed boot attempt with f17 alpha RC3 DVD

When I boot the F17 Alpha RC3 or RC4 DVD on my machine, it just spits out error messages at regular intervals, one for each core on my CPU (Intel Core 2 Quad Q6600). I tried leaving it alone for at least 30 minutes, but the boot process never finished.

The error messages that I'm seeing (1 for each core) are similar to:

NMI backtrace for cpu 1
CPU 1 
Modules linked in:

Pid: 0, comm: swapper/1 Tainted: G          I  3.3.0-0.rc3.git7.2.fc17.x86_64 #1 Hewlett-Packard HP xw4600 Workstation/0AA0h
RIP: 0010:[<ffffffff81043ca6>]  [<ffffffff81043ca6>] native_safe_halt+0x6/0x10
RSP: 0018:ffff880118cade18  EFLAGS: 00000206
RAX: ffff880118ca2680 RBX: ffff8801100e2770 RCX: 0000000225c17d03
RDX: ffff880118ca2680 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff880118cade18 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8801100e2520
R13: 0000000000000001 R14: ffff8801100e2540 R15: 127488014ef52db3
FS:  0000000000000000(0000) GS:ffff88011b000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001c05000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/1 (pid: 0, threadinfo ffff880118cac000, task ffff880118ca2680)
Stack:
 ffff880118cade28 ffffffff813b1061 ffff880118cade38 ffffffff813b109a
 ffff880118cade98 ffffffff813b1111 0000000000000003 000000003b91ebf1
 0000000000000003 000000003b91ebf1 0000000000000000 ffff8801100e2540
Call Trace:
 [<ffffffff813b1061>] acpi_safe_halt+0x2f/0x4d
 [<ffffffff813b109a>] acpi_idle_do_entry+0x1b/0x2b
 [<ffffffff813b1111>] acpi_idle_enter_c1+0x67/0xc9
 [<ffffffff81519c53>] cpuidle_idle_call+0xb3/0x540
 [<ffffffff8101821f>] cpu_idle+0xbf/0x130 
 [<ffffffff8168a5f9>] start_secondary+0x290/0x292
Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84 
Call Trace:
 [<ffffffff813b1061>] acpi_safe_halt+0x2f/0x4d
 [<ffffffff813b109a>] acpi_idle_do_entry+0x1b/0x2b
 [<ffffffff813b1111>] acpi_idle_enter_c1+0x67/0xc9
 [<ffffffff81519c53>] cpuidle_idle_call+0xb3/0x540
 [<ffffffff8101821f>] cpu_idle+0xbf/0x130
 [<ffffffff8168a5f9>] start_secondary+0x290/0x292
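
When comparing backtraces like the one above across cores, it can help to reduce each trace to its list of symbols. The following is a minimal sketch, not part of the original report; the names `FRAME_RE` and `parse_frames` are made up for illustration, and the pattern only assumes the `[<address>] symbol+0xoff/0xlen` frame layout visible in this log.

```python
import re

# Matches call-trace frames in the layout seen in this log, for example:
#   [<ffffffff813b1061>] acpi_safe_halt+0x2f/0x4d
#   [<ffffffff81044119>] ? native_flush_tlb_single+0x9/0x10
# A leading "? " marks a frame the kernel's unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+(?P<unreliable>\? )?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_frames(log_text):
    """Return (symbol, reliable) tuples for every frame found in log_text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("symbol"), match.group("unreliable") is None))
    return frames


SAMPLE = """\
Call Trace:
 [<ffffffff813b1061>] acpi_safe_halt+0x2f/0x4d
 [<ffffffff813b109a>] acpi_idle_do_entry+0x1b/0x2b
 [<ffffffff813b1111>] acpi_idle_enter_c1+0x67/0xc9
 [<ffffffff81519c53>] cpuidle_idle_call+0xb3/0x540
 [<ffffffff8101821f>] cpu_idle+0xbf/0x130
 [<ffffffff8168a5f9>] start_secondary+0x290/0x292
"""

if __name__ == "__main__":
    for symbol, reliable in parse_frames(SAMPLE):
        print(symbol if reliable else symbol + " (unreliable)")
```

For the trace above this lists the idle path from `start_secondary` through `cpu_idle` down to `acpi_safe_halt`, which makes it easy to see whether every core reports the same stack.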

I've attached the boot log that I grabbed from the serial console, booting with these args:
initrd=initrd.img root=live:CDLABEL=Fedora\x2017-Alpha\x20x86_64 rd.luks=0 rd.md=0 rd.dm=0 rd.debug console=tty0 console=ttyS0,38400n8 BOOT_IMAGE=vmlinuz
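
For readers who want to inspect these arguments programmatically, here is a rough sketch that splits the line into key/value pairs. The function name `parse_cmdline` is hypothetical, and the splitting is deliberately naive: it breaks on whitespace and on the first `=` only, and does not handle quoted values the way the kernel's own parser does.

```python
# The boot arguments quoted in this report, kept as a raw string so the
# \x20 escapes in the CD label stay literal.
CMDLINE = (
    r"initrd=initrd.img root=live:CDLABEL=Fedora\x2017-Alpha\x20x86_64 "
    r"rd.luks=0 rd.md=0 rd.dm=0 rd.debug console=tty0 "
    r"console=ttyS0,38400n8 BOOT_IMAGE=vmlinuz"
)


def parse_cmdline(cmdline):
    """Split a kernel command line into (key, value) pairs.

    Pairs are returned in order as a list, because a key such as
    "console" may legitimately appear more than once. Flags without
    "=" (for example rd.debug) get a value of None.
    """
    params = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params.append((key, value if sep else None))
    return params


if __name__ == "__main__":
    consoles = [value for key, value in parse_cmdline(CMDLINE) if key == "console"]
    print(consoles)  # ['tty0', 'ttyS0,38400n8']
```

The two `console=` entries are what send the boot messages to both the local display and the serial port at 38400 baud, which is how the attached log was captured.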

Comment 1 Josh Boyer 2012-02-22 01:29:45 UTC
That's, erm... cute.

We had a report of massive slowness for some people in bug 795050.  Because of it, I dropped an RCU-related patch that might also be causing this.  I will admit that is just a slightly educated guess, but if you could try this build when it completes, we'll know for sure:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3809140

Comment 2 Tim Flink 2012-02-22 17:46:51 UTC
Created attachment 565056
console log of boot attempt with updated kernel

I built a custom boot.iso with the kernel mentioned in comment#1

I see the same symptoms on boot with a slightly different stack trace. I wonder if the following warning is at all related:

------------[ cut here ]------------
WARNING: at drivers/iommu/dmar.c:492 warn_invalid_dmar+0x92/0xa0()
Hardware name: HP xw4600 Workstation
Your BIOS is broken; DMAR reported at address fed90000 returns all ones!
BIOS vendor: Hewlett-Packard; Ver: 786F3 v01.22; Product Version:  
Modules linked in:
Pid: 0, comm: swapper Not tainted 3.3.0-0.rc3.git7.2.fc17.x86_64 #1
Call Trace:
 [<ffffffff81060bef>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff81060c8f>] warn_slowpath_fmt_taint+0x3f/0x50
 [<ffffffff81044119>] ? native_flush_tlb_single+0x9/0x10
 [<ffffffff81f0aa98>] ? __early_set_fixmap+0x99/0xa0
 [<ffffffff815390f2>] warn_invalid_dmar+0x92/0xa0
 [<ffffffff81f34f9f>] check_zero_address+0xc8/0xf7
 [<ffffffff816ab0df>] ? bad_to_user+0x7f9/0x7f9
 [<ffffffff81f34fe5>] detect_intel_iommu+0x17/0xb9
 [<ffffffff81efe068>] pci_iommu_alloc+0x4a/0x73
 [<ffffffff81f0a857>] mem_init+0x19/0xed
 [<ffffffff816900b5>] ? set_nmi_gate+0x48/0x4a
 [<ffffffff81ef6a3a>] start_kernel+0x1f4/0x407
 [<ffffffff81ef6346>] x86_64_start_reservations+0x131/0x135
 [<ffffffff81ef644a>] x86_64_start_kernel+0x100/0x10f
---[ end trace a7919e7f17c0a725 ]---

Comment 3 Josh Boyer 2012-02-22 18:05:03 UTC
(In reply to comment #2)
> Created attachment 565056
> console log of boot attempt with updated kernel
> 
> I built a custom boot.iso with the kernel mentioned in comment#1

Erm... I think whatever you did went wrong.  3.3.0-rc3.git7.2.fc17.x86_64 is the kernel you originally had issues with.  The kernel I built in comment #1 is 3.3.0-rc4.git1.4
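
The mismatch is visible in the console log itself, since each backtrace header carries the running kernel version. A small sketch of that check follows; `kernel_versions` and `has_expected_kernel` are illustrative names, and the pattern assumes the `<version> #<build>` layout seen in the `Pid:` lines of this log.

```python
import re

# Kernel version strings as they appear in the "Pid:" lines of this log,
# e.g. "... Not tainted 3.3.0-0.rc3.git7.2.fc17.x86_64 #1".
VERSION_RE = re.compile(r"\b(\d+\.\d+\.\d+-\S+) #\d+")


def kernel_versions(log_text):
    """Return the set of kernel version strings found in log_text."""
    return set(VERSION_RE.findall(log_text))


def has_expected_kernel(log_text, expected):
    """True if the log reports at least one kernel and all start with expected."""
    versions = kernel_versions(log_text)
    return bool(versions) and all(v.startswith(expected) for v in versions)


LOG = """\
Pid: 0, comm: swapper/1 Tainted: G          I  3.3.0-0.rc3.git7.2.fc17.x86_64 #1 Hewlett-Packard HP xw4600 Workstation/0AA0h
Pid: 0, comm: swapper Not tainted 3.3.0-0.rc3.git7.2.fc17.x86_64 #1
"""

if __name__ == "__main__":
    print(sorted(kernel_versions(LOG)))
    print(has_expected_kernel(LOG, "3.3.0-0.rc4.git1.4.fc17"))
```

Run against lines from the two attached logs, this reports only the rc3.git7.2 kernel, so the check for the rc4.git1.4 build fails.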

> I see the same symptoms on boot with a slightly different stack trace. I wonder
> if the following warning is at all related:
> 
> ------------[ cut here ]------------
> WARNING: at drivers/iommu/dmar.c:492 warn_invalid_dmar+0x92/0xa0()
> Hardware name: HP xw4600 Workstation
> Your BIOS is broken; DMAR reported at address fed90000 returns all ones!
> BIOS vendor: Hewlett-Packard; Ver: 786F3 v01.22; Product Version:  
> Modules linked in:

Broken BIOSes are usually pretty crappy.  Look for a BIOS update, or boot with iommu=off.

Comment 4 Tim Flink 2012-02-22 18:46:50 UTC
(In reply to comment #3)
> Erm... I think whatever you did went wrong.  3.3.0-rc3.git7.2.fc17.x86_64 is
> the kernel you originally had issues with.  The kernel I built in comment #1 is 3.3.0-rc4.git1.4

Crap, I didn't notice that. One of these days, I'm going to fix this iso building script to quit when it can't find the updated builds I want.

Will retry, verifying the presence of the updated kernel this time.
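
A fail-fast check of the kind described above could look roughly like the sketch below. It is not the actual ISO building script, which is not shown in this report; the function names, the directory argument, and the assumption that updated builds are downloaded as `*.rpm` files named `<name>-<version>-<release>.<arch>.rpm` are all hypothetical.

```python
import sys
from pathlib import Path


def missing_builds(available_files, required_nvrs):
    """Return the required name-version-release strings with no matching file.

    A file matches when its name starts with "<nvr>.", e.g. the file
    kernel-3.3.0-0.rc4.git1.4.fc17.x86_64.rpm matches the NVR
    kernel-3.3.0-0.rc4.git1.4.fc17.
    """
    return [
        nvr
        for nvr in required_nvrs
        if not any(name.startswith(nvr + ".") for name in available_files)
    ]


def require_builds(rpm_dir, required_nvrs):
    """Exit with an error instead of silently building with stale packages."""
    available = [path.name for path in Path(rpm_dir).glob("*.rpm")]
    missing = missing_builds(available, required_nvrs)
    if missing:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("missing updated builds: " + ", ".join(missing))


if __name__ == "__main__":
    # Hypothetical usage: the directory and the NVR come from the command line.
    require_builds(sys.argv[1], sys.argv[2:])
```

Calling something like `require_builds` before the image is assembled would stop the build as soon as the requested kernel is absent, rather than producing an ISO that still carries the old one.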

Comment 5 Tim Flink 2012-02-22 19:35:14 UTC
OK, I built another custom boot.iso using the right kernel this time.

I am now able to boot into the installer without issue. The new kernel appears to have fixed the problem I was seeing.

Comment 6 Josh Boyer 2012-02-22 19:53:41 UTC
(In reply to comment #5)
> OK, I built another custom boot.iso using the right kernel this time.
> 
> I am now able to boot into the installer without issue. The new kernel appears
> to have fixed the problem I was seeing.

Thanks Tim.  I'll get this queued up as an update today.

Comment 7 Fedora Update System 2012-02-22 19:58:20 UTC
kernel-3.3.0-0.rc4.git1.4.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/kernel-3.3.0-0.rc4.git1.4.fc17

Comment 8 Fedora Update System 2012-02-23 22:31:13 UTC
Package kernel-3.3.0-0.rc4.git1.4.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.3.0-0.rc4.git1.4.fc17'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-2304/kernel-3.3.0-0.rc4.git1.4.fc17
then log in and leave karma (feedback).

Comment 9 Fedora Update System 2012-02-28 10:56:03 UTC
kernel-3.3.0-0.rc4.git1.4.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.