From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9) Gecko/2008061712 Fedora/3.0-1.fc9 Firefox/3.0

Description of problem:
I am running an HP tx2000z series laptop (a tx2117cl, to be exact) with a dual-core AMD Turion 64 X2 TL-62 (2.1 GHz cores). I had Fedora 9 with kernel-2.6.25.9-76.fc9 running perfectly, including suspend-to-ram and hibernate, as long as I booted with the noapic option. According to top, there were no negative side effects to this, and the machine worked perfectly, aside from a few minor glitches not relevant to this bug.

However, since I updated to kernel-2.6.25.10-86.fc9.x86_64, the second CPU is always 100% in use servicing hardware interrupts (hi). Since this did not occur earlier, I can only presume that it is a bug.

I cannot boot consistently without noapic; doing so gives rise to a Machine Check Exception. However, since this machine works perfectly with noapic and in Vista, I strongly doubt that there is a hardware issue.

Version-Release number of selected component (if applicable):
kernel-2.6.25.10-86.fc9.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install the latest kernel update (kernel-2.6.25.10-86.fc9.x86_64) on a tx2000z series tablet
2. Attempt to boot with noapic
3. Observe 100% CPU use in one core doing hardware interrupts

Actual Results:
100% CPU use in hardware interrupts.

Expected Results:
Both cores 99% idle, being used by processes, not the hardware.

Additional info:
This is a tablet PC manufactured by HP. As with all other HP laptops, it requires noapic to boot properly. This is the first time I have seen this CPU utilization on this tablet when doing so.
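For anyone who wants to record the "hi" figure that top shows without watching top interactively, the following is a minimal sketch that derives the per-CPU hardware-interrupt percentage from two snapshots of /proc/stat. The function names and the snapshot values in the usage note are hypothetical; the field order (user, nice, system, idle, iowait, irq, softirq, steal) is assumed to match the running kernel.

```python
IRQ_FIELD = 5  # assumed column order: user nice system idle iowait irq softirq steal


def cpu_times(stat_text):
    """Parse /proc/stat-style text into {cpu_name: [tick counters]} for per-CPU rows."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-CPU rows are named cpu0, cpu1, ...; the aggregate row is plain "cpu".
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            times[fields[0]] = [int(value) for value in fields[1:]]
    return times


def hardirq_percent(before, after):
    """Return {cpu_name: percent of time spent in hardware interrupts} between snapshots."""
    first, second = cpu_times(before), cpu_times(after)
    result = {}
    for cpu, new in second.items():
        old = first.get(cpu)
        if old is None:
            continue  # CPU not present in the first snapshot
        deltas = [n - o for n, o in zip(new, old)]
        total = sum(deltas[:8])  # ignore guest columns, which overlap with user time
        result[cpu] = 100.0 * deltas[IRQ_FIELD] / total if total else 0.0
    return result
```

Read /proc/stat twice, a few seconds apart, and pass both texts to hardirq_percent; a core stuck in interrupt handling as described above would show a value close to 100.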
I have a tx1000 that also needs noapic to boot. It occasionally gets into that state where a CPU gets stuck processing interrupts, but usually it stops after a while.

F9 kernels starting with 2.6.25.11-95 have the sysrq-l key added to show a backtrace on all processors. It can be activated by the command

    echo 'l' >/proc/sysrq-trigger

Additional debugging will hopefully be added to the next release so we can find out why noapic is needed too.
I have checked my tx2000z laptop's F9 installation, and I only have kernel-2.6.25.9-76.fc9.x86_64 and kernel-2.6.25.10-86.fc9.x86_64 installed. Is 2.6.25.11-95 still in rawhide?

Additionally, I see the following message shortly after boot-up or reloading the USB modules:

Jul 20 15:57:38 localhost kernel: irq 7: nobody cared (try booting with the "irqpoll" option)
Jul 20 15:57:38 localhost kernel: Pid: 0, comm: swapper Tainted: P 2.6.25.9-76.fc9.x86_64 #1
Jul 20 15:57:38 localhost kernel:
Jul 20 15:57:38 localhost kernel: Call Trace:
Jul 20 15:57:38 localhost kernel: <IRQ> [<ffffffff8107180f>] __report_bad_irq+0x38/0x7c
Jul 20 15:57:38 localhost kernel: [<ffffffff81071a35>] note_interrupt+0x1e2/0x249
Jul 20 15:57:38 localhost kernel: [<ffffffff8107221b>] handle_level_irq+0xb1/0xe7
Jul 20 15:57:38 localhost kernel: [<ffffffff8100e48f>] do_IRQ+0xf7/0x167
Jul 20 15:57:38 localhost kernel: [<ffffffff8100aff0>] ? default_idle+0x0/0x5f
Jul 20 15:57:38 localhost kernel: [<ffffffff8100c3f1>] ret_from_intr+0x0/0xa
Jul 20 15:57:38 localhost kernel: <EOI> [<ffffffff8100b029>] ? default_idle+0x39/0x5f
Jul 20 15:57:38 localhost kernel: [<ffffffff8100b024>] ? default_idle+0x34/0x5f
Jul 20 15:57:38 localhost kernel: [<ffffffff8100aff0>] ? default_idle+0x0/0x5f
Jul 20 15:57:38 localhost kernel: [<ffffffff8100afa8>] ? cpu_idle+0x78/0xc0
Jul 20 15:57:38 localhost kernel: [<ffffffff81289e7b>] ? start_secondary+0x3fc/0x40b
Jul 20 15:57:38 localhost kernel:
Jul 20 15:57:38 localhost kernel: handlers:
Jul 20 15:57:38 localhost kernel: [<ffffffff811be414>] (usb_hcd_irq+0x0/0x63)
Jul 20 15:57:38 localhost kernel: Disabling IRQ #7

Perhaps it is related? Following the recommendation to boot with irqpoll (in addition to my normal noapic) results in a deadlocked machine.

However, USB works, as long as no new devices are plugged in.
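When such reports are buried in a long /var/log/messages, it can help to pull out just the IRQ number and the registered handlers. The following is a rough sketch that does so for log text shaped like the excerpt above; the function name and regular expressions are illustrative assumptions, not part of any kernel tooling, and other kernel versions may format the report differently.

```python
import re

# Headline printed when an interrupt goes unhandled,
# e.g. 'irq 7: nobody cared (try booting with the "irqpoll" option)'.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
# Handler entry such as '[<ffffffff811be414>] (usb_hcd_irq+0x0/0x63)'.
HANDLER = re.compile(r"\[<[0-9a-f]+>\]\s+\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+\)")


def find_unhandled_irqs(log_lines):
    """Return {irq_number: [handler names]} for each 'nobody cared' report."""
    reports = {}
    current_irq = None
    in_handlers = False
    for line in log_lines:
        headline = NOBODY_CARED.search(line)
        if headline:
            current_irq = int(headline.group(1))
            reports.setdefault(current_irq, [])
            in_handlers = False
            continue
        if current_irq is None:
            continue
        if line.rstrip().endswith("handlers:"):
            in_handlers = True
            continue
        if in_handlers:
            handler = HANDLER.search(line)
            if handler:
                name = handler.group(1)
                if name not in reports[current_irq]:
                    reports[current_irq].append(name)
            else:
                # First non-handler line (e.g. 'Disabling IRQ #7') ends the report.
                current_irq = None
                in_handlers = False
    return reports
```

Applied to the lines quoted above, this should associate IRQ 7 with usb_hcd_irq, which matches the observation that the problem appears when the USB modules are reloaded.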
(In reply to comment #2)
> I have checked my tx2000z laptop's F9 installation, and I only have
> kernel-2.6.25.9-76.fc9.x86_64 and kernel-2.6.25.10-86.fc9.x86_64 installed. Is
> 2.6.25.11-95 still in rawhide?
>

It has not been built yet.

> Additionally, I see the following message shortly after boot-up or reloading the
> USB modules:
>
> Jul 20 15:57:38 localhost kernel: irq 7: nobody cared (try booting with the
> "irqpoll" option)
> Jul 20 15:57:38 localhost kernel: Pid: 0, comm: swapper Tainted: P
> 2.6.25.9-76.fc9.x86_64 #1
> Jul 20 15:57:38 localhost kernel:
> Jul 20 15:57:38 localhost kernel: Call Trace:
> Jul 20 15:57:38 localhost kernel: <IRQ> [<ffffffff8107180f>]
> __report_bad_irq+0x38/0x7c
> Jul 20 15:57:38 localhost kernel: [<ffffffff81071a35>] note_interrupt+0x1e2/0x249
> Jul 20 15:57:38 localhost kernel: [<ffffffff8107221b>] handle_level_irq+0xb1/0xe7
> Jul 20 15:57:38 localhost kernel: [<ffffffff8100e48f>] do_IRQ+0xf7/0x167
> Jul 20 15:57:38 localhost kernel: [<ffffffff8100aff0>] ? default_idle+0x0/0x5f
> Jul 20 15:57:38 localhost kernel: [<ffffffff8100c3f1>] ret_from_intr+0x0/0xa
> Jul 20 15:57:38 localhost kernel: <EOI> [<ffffffff8100b029>] ?
> default_idle+0x39/0x5f
> Jul 20 15:57:38 localhost kernel: [<ffffffff8100b024>] ? default_idle+0x34/0x5f
> Jul 20 15:57:38 localhost kernel: [<ffffffff8100aff0>] ? default_idle+0x0/0x5f
> Jul 20 15:57:38 localhost kernel: [<ffffffff8100afa8>] ? cpu_idle+0x78/0xc0
> Jul 20 15:57:38 localhost kernel: [<ffffffff81289e7b>] ?
> start_secondary+0x3fc/0x40b
> Jul 20 15:57:38 localhost kernel:
> Jul 20 15:57:38 localhost kernel: handlers:
> Jul 20 15:57:38 localhost kernel: [<ffffffff811be414>] (usb_hcd_irq+0x0/0x63)
> Jul 20 15:57:38 localhost kernel: Disabling IRQ #7
>
> Perhaps it is related? Following the recommendation to boot with irqpoll (in
> addition to my normal noapic) results in a deadlocked machine.
>
> However, USB works, as long as no new devices are plugged in.
Use 'noirqdebug' and the bogus interrupts will be ignored instead of causing errors.
2.6.25.11-97 has been submitted to the updates-testing repository.
I have installed 2.6.25.11-97 and run the specified command. I see the following in /var/log/messages:

Jul 30 13:33:10 localhost kernel: SysRq : Show backtrace of all active CPUs
Jul 30 13:33:10 localhost kernel: CPU1:
Jul 30 13:33:10 localhost kernel: ffff8100bb693f18 0000000000000046 ffffffff81195b3e 0000000000000000
Jul 30 13:33:10 localhost kernel: 0000000000000001 00007f464090b760 ffff8100bb693f58 ffffffff8100d817
Jul 30 13:33:10 localhost kernel: ffff8100bb693f78 ffffffff81195b86 ffff8100bb693f78 0000000000000000
Jul 30 13:33:10 localhost kernel: Call Trace:
Jul 30 13:33:10 localhost kernel: <IRQ> [<ffffffff81195b3e>] ? showacpu+0x0/0x5b
Jul 30 13:33:10 localhost kernel: [<ffffffff8100d817>] ? show_stack+0x10/0x12
Jul 30 13:33:10 localhost kernel: [<ffffffff81195b86>] ? showacpu+0x48/0x5b
Jul 30 13:33:10 localhost kernel: [<ffffffff8101b0ea>] ? smp_call_function_interrupt+0x48/0x71
Jul 30 13:33:10 localhost kernel: [<ffffffff8100ca36>] ? call_function_interrupt+0x66/0x70
Jul 30 13:33:10 localhost kernel: <EOI>

This is a boot with noapic.
Backtrace without noapic (lucky boot? unsure at this time)

Jul 30 13:39:57 localhost kernel: SysRq : Show backtrace of all active CPUs
Jul 30 13:39:57 localhost kernel: CPU0:
Jul 30 13:39:57 localhost kernel: ffffffff814bbf18 0000000000000046 ffffffff81195b3e 0000000000000000
Jul 30 13:39:57 localhost kernel: 0000000000000020 0000000000000001 ffffffff814bbf58 ffffffff8100d817
Jul 30 13:39:57 localhost kernel: ffffffff814bbf78 ffffffff81195b86 ffffffff814bbf88 0000000000000000
Jul 30 13:39:57 localhost kernel: Call Trace:
Jul 30 13:39:57 localhost kernel: <IRQ> [<ffffffff81195b3e>] ? showacpu+0x0/0x5b
Jul 30 13:39:57 localhost kernel: [<ffffffff8100d817>] ? show_stack+0x10/0x12
Jul 30 13:39:57 localhost kernel: [<ffffffff81195b86>] ? showacpu+0x48/0x5b
Jul 30 13:39:57 localhost kernel: [<ffffffff8101b0ea>] ? smp_call_function_interrupt+0x48/0x71
Jul 30 13:39:57 localhost kernel: [<ffffffff8100ca36>] ? call_function_interrupt+0x66/0x70
Jul 30 13:39:57 localhost kernel: <EOI> [<ffffffff810cc63c>] ? inotify_inode_queue_event+0x34/0xd9
Jul 30 13:39:57 localhost kernel: [<ffffffff8110149b>] ? security_file_permission+0x11/0x13
Jul 30 13:39:57 localhost kernel: [<ffffffff810a43d6>] ? do_readv_writev+0x17e/0x193
Jul 30 13:39:57 localhost kernel: [<ffffffff8106c6c3>] ? audit_syscall_entry+0x126/0x15a
Jul 30 13:39:57 localhost kernel: [<ffffffff8106c394>] ? audit_syscall_exit+0x331/0x353
Jul 30 13:39:57 localhost kernel: [<ffffffff810a4429>] ? vfs_writev+0x3e/0x49
Jul 30 13:39:57 localhost kernel: [<ffffffff810a447b>] ? sys_writev+0x47/0x94
Jul 30 13:39:57 localhost kernel: [<ffffffff8100c052>] ? tracesys+0xd5/0xda
Jul 30 13:39:57 localhost kernel:
The computer should work fine without noapic with kernels 2.6.25 and later. It was only older kernels that needed that option. Check the boot messages and look for a line containing " using 0xed I/O delay port". If that is present, IOAPIC mode should just work.
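The check for that boot message can be scripted. This is a small sketch, assuming the boot messages are available as text (for example the output of dmesg or the contents of /var/log/messages); the function name is hypothetical and the marker string is taken from the comment above.

```python
def uses_0xed_io_delay(boot_log_text):
    """Return True if any boot message mentions the 0xed I/O delay port."""
    marker = "using 0xed I/O delay port"
    return any(marker in line for line in boot_log_text.splitlines())
```

If this returns False for a full boot log, the kernel presumably did not select the 0xed port on its own for that machine.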
Unfortunately, I cannot get the tx2000z to reliably boot without noapic in any kernel. I just tried, and received a Machine Check Exception (the whole reason I had to use noapic in the first place):

HARDWARE ERROR
CPU 0: Machine Check Exception 4 Bank 4: b2000000000f0f
TSC 11e4deb697

The kernel claims that this is a hardware issue, but since Windows Vista can survive, as can a noapic'd kernel, I strongly doubt the kernel's claim.

Furthermore, when I _do_ get the tx2000z to boot without noapic, it can't wake up from suspend-to-ram. When I run 2.6.25.9-76 with noapic, everything works properly (with the exception of an irq 7: nobody cared).
After some playing with options, including re-installing kmod-nvidia from livna, I found that 2.6.25.11-97 performs the same way as the earlier kernels, except that the second core is stuck at 100% hardware interrupts until the "irq 7: nobody cared" message occurs. (This is with the noapic option.)
(In reply to comment #8)
> Unfortunately, I cannot get the tx2000z to reliably boot without noapic in any
> kernel. I just tried, and received a Machine Check Exception (the whole reason I
> had to use noapic in the first place):
>
> HARDWARE ERROR
> CPU 0: Machine Check Exception 4 Bank 4: b2000000000f0f
> TSC 11e4deb697
>
> The kernel claims that this is a hardware issue, but since Windows Vista can
> survive, as can a noapic'd kernel, I strongly doubt the kernel's claim.
>
> Furthermore, when I _do_ get the tx2000z to boot without noapic, it can't wake
> up from suspend-to-ram. When I run 2.6.25.9-76 with noapic, everything works
> properly (with the exception of an irq 7: nobody cared).

Does it print the line containing " using 0xed I/O delay port"?

If not, try booting with the option "io_delay=0xed"
I have checked the boot logs. There is no mention of 0xed at all. I continue to get an MCE if I boot without any options. However, adding io_delay=0xed seems to fix that. There is still nothing printed, but no MCE, and I am able to suspend to ram. It has survived a few boot ups, and seems to be functioning correctly.
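Since nothing is printed at boot, one way to confirm which options a given boot actually used is to inspect /proc/cmdline. The following is a minimal sketch; the function name is hypothetical, and it does not handle quoted values containing spaces.

```python
def parse_kernel_cmdline(cmdline):
    """Split a kernel command line into {option: value}; bare flags map to True."""
    options = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        options[name] = value if sep else True
    return options
```

For a boot like the one described here, the result for the contents of /proc/cmdline would be expected to contain "io_delay" mapped to "0xed" and no "noapic" key.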
Also, I found out that the 100% CPU utilization (under noapic) seems to be caused by the USB bus. Once the "irq 7: nobody cared" message occurs, the (spurious?) USB interrupts stop and CPU utilization returns to idle.

Furthermore, io_delay=0xed has cleared up the "irq 7: nobody cared" message and reduced the total wakeups (as shown by powertop). It is now between 500 and 600 wakeups per 10 seconds, depending on use, whereas earlier it was at least 1500 wakeups per 10 seconds. This is a massive improvement.
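To double-check which interrupt line is responsible for the load, two snapshots of /proc/interrupts can be compared. This is only a sketch under the assumption that the file has the usual layout (a header row of CPU names followed by one row per IRQ); the function names are hypothetical.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq_label: [count per CPU]}."""
    lines = text.strip().splitlines()
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        per_cpu = []
        for field in rest.split()[:cpu_count]:
            if not field.isdigit():
                break  # reached the controller or device name columns
            per_cpu.append(int(field))
        counts[label.strip()] = per_cpu
    return counts


def interrupt_rates(before, after, seconds):
    """Return {irq_label: [interrupts per second per CPU]} between two snapshots."""
    first = parse_interrupts(before)
    second = parse_interrupts(after)
    rates = {}
    for label, new_counts in second.items():
        old_counts = first.get(label, [0] * len(new_counts))
        rates[label] = [(new - old) / seconds
                        for new, old in zip(new_counts, old_counts)]
    return rates
```

Taking the snapshots 10 seconds apart gives figures on the same time base as the powertop numbers above; a spurious USB interrupt storm would be expected to show up as a very high rate on the IRQ shared by the USB host controllers.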