Bug 455621
Summary: | 100% CPU Use in Hardware Interrupts with noapic option after updating to new kernel | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Alex Chernyakhovsky <achernya> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 9 | CC: | macro |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-09-10 01:20:10 UTC | Type: | --- |
Description
Alex Chernyakhovsky
2008-07-16 17:24:49 UTC
I have a tx1000 that also needs noapic to boot. It occasionally gets into that state where a CPU gets stuck processing interrupts, but usually it stops after a while.

F9 kernels starting with 2.6.25.11-95 have the sysrq-l key added to show a backtrace on all processors. It can be activated by the command

echo 'l' >/proc/sysrq-trigger

Additional debugging will hopefully be added to the next release so we can find out why noapic is needed too.

I have checked my tx2000z laptop's F9 installation, and I only have kernel-2.6.25.9-76.fc9.x86_64 and kernel-2.6.25.10-86.fc9.x86_64 installed. Is 2.6.25.11-95 still in rawhide?

Additionally, I see the following message shortly after boot-up or reloading the USB modules:

Jul 20 15:57:38 localhost kernel: irq 7: nobody cared (try booting with the "irqpoll" option)
Jul 20 15:57:38 localhost kernel: Pid: 0, comm: swapper Tainted: P 2.6.25.9-76.fc9.x86_64 #1
Jul 20 15:57:38 localhost kernel:
Jul 20 15:57:38 localhost kernel: Call Trace:
Jul 20 15:57:38 localhost kernel: <IRQ> [<ffffffff8107180f>] __report_bad_irq+0x38/0x7c
Jul 20 15:57:38 localhost kernel: [<ffffffff81071a35>] note_interrupt+0x1e2/0x249
Jul 20 15:57:38 localhost kernel: [<ffffffff8107221b>] handle_level_irq+0xb1/0xe7
Jul 20 15:57:38 localhost kernel: [<ffffffff8100e48f>] do_IRQ+0xf7/0x167
Jul 20 15:57:38 localhost kernel: [<ffffffff8100aff0>] ? default_idle+0x0/0x5f
Jul 20 15:57:38 localhost kernel: [<ffffffff8100c3f1>] ret_from_intr+0x0/0xa
Jul 20 15:57:38 localhost kernel: <EOI> [<ffffffff8100b029>] ? default_idle+0x39/0x5f
Jul 20 15:57:38 localhost kernel: [<ffffffff8100b024>] ? default_idle+0x34/0x5f
Jul 20 15:57:38 localhost kernel: [<ffffffff8100aff0>] ? default_idle+0x0/0x5f
Jul 20 15:57:38 localhost kernel: [<ffffffff8100afa8>] ? cpu_idle+0x78/0xc0
Jul 20 15:57:38 localhost kernel: [<ffffffff81289e7b>] ? start_secondary+0x3fc/0x40b
Jul 20 15:57:38 localhost kernel:
Jul 20 15:57:38 localhost kernel: handlers:
Jul 20 15:57:38 localhost kernel: [<ffffffff811be414>] (usb_hcd_irq+0x0/0x63)
Jul 20 15:57:38 localhost kernel: Disabling IRQ #7

Perhaps it is related? Following the recommendation to boot with irqpoll (in addition to my normal noapic) results in a deadlocked machine.

However, USB works, as long as no new devices are plugged in.

(In reply to comment #2)
> I have checked my tx2000z laptop's F9 installation, and I only have
> kernel-2.6.25.9-76.fc9.x86_64 and kernel-2.6.25.10-86.fc9.x86_64 installed. Is
> 2.6.25.11-95 still in rawhide?
>
It has not been built yet.

> Additionally, I see the following message shortly after boot-up or reloading the
> USB modules:
>
> Jul 20 15:57:38 localhost kernel: irq 7: nobody cared (try booting with the
> "irqpoll" option)
> Jul 20 15:57:38 localhost kernel: Pid: 0, comm: swapper Tainted: P
> 2.6.25.9-76.fc9.x86_64 #1
> Jul 20 15:57:38 localhost kernel:
> Jul 20 15:57:38 localhost kernel: Call Trace:
> Jul 20 15:57:38 localhost kernel: <IRQ> [<ffffffff8107180f>]
> __report_bad_irq+0x38/0x7c
> Jul 20 15:57:38 localhost kernel: [<ffffffff81071a35>] note_interrupt+0x1e2/0x249
> Jul 20 15:57:38 localhost kernel: [<ffffffff8107221b>] handle_level_irq+0xb1/0xe7
> Jul 20 15:57:38 localhost kernel: [<ffffffff8100e48f>] do_IRQ+0xf7/0x167
> Jul 20 15:57:38 localhost kernel: [<ffffffff8100aff0>] ? default_idle+0x0/0x5f
> Jul 20 15:57:38 localhost kernel: [<ffffffff8100c3f1>] ret_from_intr+0x0/0xa
> Jul 20 15:57:38 localhost kernel: <EOI> [<ffffffff8100b029>] ?
> default_idle+0x39/0x5f
> Jul 20 15:57:38 localhost kernel: [<ffffffff8100b024>] ? default_idle+0x34/0x5f
> Jul 20 15:57:38 localhost kernel: [<ffffffff8100aff0>] ? default_idle+0x0/0x5f
> Jul 20 15:57:38 localhost kernel: [<ffffffff8100afa8>] ? cpu_idle+0x78/0xc0
> Jul 20 15:57:38 localhost kernel: [<ffffffff81289e7b>] ?
> start_secondary+0x3fc/0x40b
> Jul 20 15:57:38 localhost kernel:
> Jul 20 15:57:38 localhost kernel: handlers:
> Jul 20 15:57:38 localhost kernel: [<ffffffff811be414>] (usb_hcd_irq+0x0/0x63)
> Jul 20 15:57:38 localhost kernel: Disabling IRQ #7
>
> Perhaps it is related? Following the recommendation to boot with irqpoll (in
> addition to my normal noapic) results in a deadlocked machine.
>
> However, USB works, as long as no new devices are plugged in.

Use 'noirqdebug' and the bogus interrupts will be ignored instead of causing errors.

2.6.25.11-97 has been submitted to the updates-testing repository.

I have installed 2.6.25.11-97 and run the specified command. I see the following in /var/log/messages:

Jul 30 13:33:10 localhost kernel: SysRq : Show backtrace of all active CPUs
Jul 30 13:33:10 localhost kernel: CPU1:
Jul 30 13:33:10 localhost kernel: ffff8100bb693f18 0000000000000046 ffffffff81195b3e 0000000000000000
Jul 30 13:33:10 localhost kernel: 0000000000000001 00007f464090b760 ffff8100bb693f58 ffffffff8100d817
Jul 30 13:33:10 localhost kernel: ffff8100bb693f78 ffffffff81195b86 ffff8100bb693f78 0000000000000000
Jul 30 13:33:10 localhost kernel: Call Trace:
Jul 30 13:33:10 localhost kernel: <IRQ> [<ffffffff81195b3e>] ? showacpu+0x0/0x5b
Jul 30 13:33:10 localhost kernel: [<ffffffff8100d817>] ? show_stack+0x10/0x12
Jul 30 13:33:10 localhost kernel: [<ffffffff81195b86>] ? showacpu+0x48/0x5b
Jul 30 13:33:10 localhost kernel: [<ffffffff8101b0ea>] ? smp_call_function_interrupt+0x48/0x71
Jul 30 13:33:10 localhost kernel: [<ffffffff8100ca36>] ? call_function_interrupt+0x66/0x70
Jul 30 13:33:10 localhost kernel: <EOI>

This is a boot with noapic.

Backtrace without noapic (lucky boot? unsure at this time):

Jul 30 13:39:57 localhost kernel: SysRq : Show backtrace of all active CPUs
Jul 30 13:39:57 localhost kernel: CPU0:
Jul 30 13:39:57 localhost kernel: ffffffff814bbf18 0000000000000046 ffffffff81195b3e 0000000000000000
Jul 30 13:39:57 localhost kernel: 0000000000000020 0000000000000001 ffffffff814bbf58 ffffffff8100d817
Jul 30 13:39:57 localhost kernel: ffffffff814bbf78 ffffffff81195b86 ffffffff814bbf88 0000000000000000
Jul 30 13:39:57 localhost kernel: Call Trace:
Jul 30 13:39:57 localhost kernel: <IRQ> [<ffffffff81195b3e>] ? showacpu+0x0/0x5b
Jul 30 13:39:57 localhost kernel: [<ffffffff8100d817>] ? show_stack+0x10/0x12
Jul 30 13:39:57 localhost kernel: [<ffffffff81195b86>] ? showacpu+0x48/0x5b
Jul 30 13:39:57 localhost kernel: [<ffffffff8101b0ea>] ? smp_call_function_interrupt+0x48/0x71
Jul 30 13:39:57 localhost kernel: [<ffffffff8100ca36>] ? call_function_interrupt+0x66/0x70
Jul 30 13:39:57 localhost kernel: <EOI> [<ffffffff810cc63c>] ? inotify_inode_queue_event+0x34/0xd9
Jul 30 13:39:57 localhost kernel: [<ffffffff8110149b>] ? security_file_permission+0x11/0x13
Jul 30 13:39:57 localhost kernel: [<ffffffff810a43d6>] ? do_readv_writev+0x17e/0x193
Jul 30 13:39:57 localhost kernel: [<ffffffff8106c6c3>] ? audit_syscall_entry+0x126/0x15a
Jul 30 13:39:57 localhost kernel: [<ffffffff8106c394>] ? audit_syscall_exit+0x331/0x353
Jul 30 13:39:57 localhost kernel: [<ffffffff810a4429>] ? vfs_writev+0x3e/0x49
Jul 30 13:39:57 localhost kernel: [<ffffffff810a447b>] ? sys_writev+0x47/0x94
Jul 30 13:39:57 localhost kernel: [<ffffffff8100c052>] ? tracesys+0xd5/0xda
Jul 30 13:39:57 localhost kernel:

The computer should work fine without noapic with kernels 2.6.25 and later. It was only older kernels that needed that option. Check the boot messages and look for a line containing " using 0xed I/O delay port". If that is present, IOAPIC mode should just work.

Unfortunately, I cannot get the tx2000z to reliably boot without noapic in any kernel.
I just tried, and received a Machine Check Exception (the whole reason I had to use noapic in the first place):

HARDWARE ERROR
CPU 0: Machine Check Exception 4 Bank 4: b2000000000f0f
TSC 11e4deb697

The kernel claims that this is a hardware issue, but since Windows Vista can survive, as can a noapic'd kernel, I strongly doubt the kernel's claim.

Furthermore, when I _do_ get the tx2000z to boot without noapic, it can't wake up from suspend-to-ram. When I run 2.6.25.9-76 with noapic, everything works properly (with the exception of an irq 7: nobody cared).

After some playing with options, including re-installing kmod-nvidia from livna, I found that 2.6.25.11-97 performs the same way as the earlier kernels, with the exception that the second core is stuck at 100% hardware interrupts until "irq 7: nobody cared" occurs. (This uses the noapic option.)

(In reply to comment #8)
> Unfortunately, I cannot get the tx2000z to reliably boot without noapic in any
> kernel. I just tried, and received a Machine Check Exception (the whole reason I
> had to use noapic in the first place):
>
> HARDWARE ERROR
> CPU 0: Machine Check Exception 4 Bank 4: b2000000000f0f
> TSC 11e4deb697
>
> The kernel claims that this is a hardware issue, but since Windows Vista can
> survive, as can a noapic'd kernel, I strongly doubt the kernel's claim.
>
> Furthermore, when I _do_ get the tx2000z to boot without noapic, it can't wake
> up from suspend-to-ram. When I run 2.6.25.9-76 with noapic, everything works
> properly (with the exception of an irq 7: nobody cared).

Does it print the line containing " using 0xed I/O delay port"? If not, try booting with the option "io_delay=0xed".

I have checked the boot logs. There is no mention of 0xed at all. I continue to get an MCE if I boot without any options. However, adding io_delay=0xed seems to fix that. There is still nothing printed, but no MCE, and I am able to suspend to ram. It has survived a few boot-ups and seems to be functioning correctly.
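The check suggested in this thread (look in the boot messages for the 0xed I/O delay port line, and otherwise try io_delay=0xed) could be scripted roughly as below. This is only a sketch in Python: the function name `check_boot_log` and the wording of the returned hint are made up for illustration, and the matched substrings are the ones quoted in this report, not a guaranteed kernel message format.

```python
def check_boot_log(text):
    """Summarise saved boot messages (e.g. `dmesg` output) for the io_delay check.

    Returns a dict with:
      uses_0xed - True if some line mentions the 0xed I/O delay port
      saw_mce   - True if a Machine Check Exception was logged
      hint      - a suggestion string (hypothetical wording)
    """
    lines = text.splitlines()
    uses_0xed = any("using 0xed I/O delay port" in line for line in lines)
    saw_mce = any("Machine Check Exception" in line for line in lines)
    if uses_0xed:
        hint = "0xed delay port already in use; IOAPIC mode should work"
    else:
        hint = "try booting with io_delay=0xed"
    return {"uses_0xed": uses_0xed, "saw_mce": saw_mce, "hint": hint}


# Example input built from the MCE text quoted in this report.
sample = "HARDWARE ERROR\nCPU 0: Machine Check Exception 4 Bank 4: b2000000000f0f\n"
result = check_boot_log(sample)
```

As this thread shows, a missing 0xed line does not prove the option is inactive: with io_delay=0xed on the command line nothing was printed either, so the hint is a starting point rather than a verdict.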
Also, I found out that the 100% CPU utilization (under noapic) seems to be caused by the USB bus. Once the "irq 7: nobody cared" message occurs, the (spurious?) USB interrupts stop, and the CPU utilization returns to idle.

Furthermore, the io_delay option has cleared up the "irq 7: nobody cared" message and reduced the total wakeups (as shown by powertop). It is now between 500 and 600 wakeups per 10 seconds, depending on use, while earlier it would be at least 1500 wakeups per 10 seconds. This is a massive improvement.
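To confirm from the logs which IRQ was disabled and which handler was attached to it, the "nobody cared" reports can be pulled out of /var/log/messages with a small script. The following Python sketch assumes the line layout seen in the excerpts in this report; the function name `find_unhandled_irqs` and the regular expressions are illustrative, and other kernel versions may format these lines differently.

```python
import re

# Patterns follow the log lines quoted in this report (an assumption, not a spec).
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
HANDLER = re.compile(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+\)")
DISABLED = re.compile(r"Disabling IRQ #(\d+)")


def find_unhandled_irqs(log_text):
    """Return one dict per 'nobody cared' report: irq, handlers, disabled."""
    reports = []
    current = None        # report currently being filled in
    in_handlers = False   # True once the 'handlers:' line has been seen
    for line in log_text.splitlines():
        m = NOBODY_CARED.search(line)
        if m:
            current = {"irq": int(m.group(1)), "handlers": [], "disabled": False}
            reports.append(current)
            in_handlers = False
            continue
        if current is None:
            continue
        if line.rstrip().endswith("handlers:"):
            in_handlers = True
            continue
        m = DISABLED.search(line)
        if m and int(m.group(1)) == current["irq"]:
            current["disabled"] = True
            current = None
            in_handlers = False
            continue
        if in_handlers:
            m = HANDLER.search(line)
            if m:
                current["handlers"].append(m.group(1))
    return reports


sample = """\
Jul 20 15:57:38 localhost kernel: irq 7: nobody cared (try booting with the "irqpoll" option)
Jul 20 15:57:38 localhost kernel: handlers:
Jul 20 15:57:38 localhost kernel: [<ffffffff811be414>] (usb_hcd_irq+0x0/0x63)
Jul 20 15:57:38 localhost kernel: Disabling IRQ #7
"""
reports = find_unhandled_irqs(sample)
```

For the excerpt above this yields a single report for IRQ 7 with `usb_hcd_irq` as the only handler, which matches the observation that the stuck interrupts come from the USB host controller.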