Description of problem:
With cpuspeed enabled, heavy I/O tends to crash the kernel. It usually throws messages about DMA errors on hd? on the console. Everything works fine with cpuspeed off. The problem seems very similar to https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=125419 and https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=140873, even though this is with acpi-cpufreq instead of longhaul.

Version-Release number of selected component (if applicable):
kernel-2.6.18-1.2798.fc6

How reproducible:
Start something that does a lot of disk I/O. Strangely enough a kernel compile might not trigger it, but an "svnadmin load" (~45MB dumpfile, ~1100 revisions) does it nearly every time.

Steps to Reproduce:
1. start cpuspeed
2. start an I/O-heavy process
3. wait

Actual results:
Kernel panic or lockup.

Expected results:
Uneventful operation.

Additional info:
The problem occurs even when booting off a SATA disk with ATA disabled (ide0=noprobe ide1=noprobe), but in that case it just hangs silently. I also tried a vanilla 2.6.19-rc2 kernel (just in case longhaul works better - there seemed to be several longhaul-related fixes in -rc1) and it works without any problems with acpi-cpufreq (longhaul refuses to load due to APIC detected, and doesn't recognise the CPU with APIC disabled).
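For anyone trying to reproduce this, the "start an I/O-heavy process and wait" step could be scripted roughly as follows. This is only a sketch: the helper name `run_load_loop`, the repository path and the dumpfile name are made up for illustration and are not from the report. It assumes `svnadmin load REPOS_PATH` reading the dump from stdin.

```python
import subprocess
import time


def run_load_loop(command, iterations, stdin_path=None, pause_seconds=0.0):
    """Run `command` repeatedly and return the exit code of each run.

    `stdin_path`, if given, is reopened and fed to the command on stdin
    for every iteration (svnadmin load reads the dump from stdin).
    """
    exit_codes = []
    for _ in range(iterations):
        stdin = open(stdin_path, "rb") if stdin_path else subprocess.DEVNULL
        try:
            result = subprocess.run(
                command,
                stdin=stdin,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        finally:
            if stdin_path:
                stdin.close()
        exit_codes.append(result.returncode)
        time.sleep(pause_seconds)
    return exit_codes


# Hypothetical usage (paths are placeholders, not from the report):
# run_load_loop(["svnadmin", "load", "/tmp/testrepo"], 100,
#               stdin_path="dump.svn", pause_seconds=5)
```

Collecting the exit codes is just a convenience. If the machine panics or locks up, the loop obviously never returns.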
Hello, I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel. http://fedoraproject.org/wiki/KernelBugTriage I am CC'ing myself to this bug, however this version of Fedora is no longer maintained. Please attempt to reproduce this bug with a current version of Fedora (presently Fedora 8). If the bug no longer exists, please close the bug or I'll do so in a few days if there is no further information lodged. Thanks for using Fedora!
Feel free to close the bug since the F8 kernel uses eps instead of acpi-cpufreq (yes, /proc/cpuinfo does list the est flag) so it doesn't apply. FWIW, I hadn't tested it extensively before because once cpuspeed started and the cpu went to the lower clock speed it wouldn't scale up again. This doesn't seem to be the case with the current kernel (2.6.23.9-85.fc8PAE) and it seems stable so far (no problems after 1 hour of continuous "svnadmin load"s).
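To check which situation a given box is in, something along these lines could report the CPU flags and the active cpufreq driver. This is a sketch, not an official tool: the function names are invented here, and the sysfs path is the usual cpufreq location, which may be absent if no scaling driver is loaded.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def read_scaling_driver(path="/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver"):
    """Return the cpufreq driver name for cpu0, or None if the file is unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


# Hypothetical usage:
# with open("/proc/cpuinfo") as handle:
#     print("est" in cpu_flags(handle.read()), read_scaling_driver())
```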
Looks like I spoke too soon :-( I got a kernel panic a little bit later (svnadmin load loop still running). The part of the stack trace that was still visible on the screen was:
 [<c0445678>] tick_sched_timer+0x0/0xbb
 [<c05edff5>] tcp_transmit_skb+0x6x6/0x6f7
 [<c0432c40>] irq_exit+0x53/0x6b
 [<c041d05e>] smp_apic_timer_interrupt+0x71/0x7d
 [<c0405c2c>] apic_timer_interrupt+0x28/0x30
 [<c04359f6>] lock_timer_base+0x19/0x35
 [<c05ef9a2>] __tcp_push_pending_frames+0x709/0x7ba
 [<c05e5326>] tcp_sendmsg+0x16c/0xa40
 [<c05e5b02>] tcp_sendmsg+0x948/0xa40
 [<c04938bc>] dput+0x30/0xd7
 [<c048cde1>] __link_path_walk+0xa74/0xbaf
 [<c05b44e5>] sock_aio_write+0xea/0xf6
 [<c0484698>] do_sync_write+0xc7/0x10a
 [<c043e3f5>] autoremove_wake_function+0x0/0x35
 [<c04453fe>] tick_program_event+0x33/0x52
 [<c0484f1a>] vfs_write+0xbc/0x15a
 [<c0485523>] sys_write+0x41/0x67
 [<c0405112>] sysenter_past_esp+0x6b/0xa1
 =======================
Code: 00 00 00 85 d2 74 06 83 7a 0c 00 75 17 89 54 24 04 89 f0 89 ea 89 0c 24 83 c9 ff e8 37 fa ff ff 89 c3 eb 0d 8b 5a 0c 0f b7 42 0a <8b> 04 83 89 42 0c 89 f8 50 9d 90 8d b4 26 00 00 00 00 66 85 ed
EIP: [<c048108e>] kmem_cache_alloc+0x5a/0x99 SS:ESP 0068:d0b4bc1c
Kernel panic - not syncing: Fatal exception in interrupt
(transcribed by hand, I doublechecked but there may be errors)
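Since the trace above was transcribed by hand, a small parser can help spot entries that do not fit the usual `[<address>] symbol+offset/size` shape. The following is a sketch under that assumption about the format. The names `FRAME_RE` and `parse_frames` are invented here and are not part of any kernel tooling.

```python
import re

# One oops entry looks like "[<c0445678>] tick_sched_timer+0x0/0xbb":
# a hex address, a symbol name, then offset/size, both in hex.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)"
)


def parse_frames(text):
    """Return a list of dicts for every well-formed entry found in `text`.

    Entries that do not match the pattern are silently skipped.
    """
    frames = []
    for match in FRAME_RE.finditer(text):
        address, symbol, offset, size = match.groups()
        frames.append(
            {
                "address": int(address, 16),
                "symbol": symbol,
                "offset": int(offset, 16),
                "size": int(size, 16),
            }
        )
    return frames
```

Run against the trace in this comment, the `tcp_transmit_skb` entry (offset written as `0x6x6`) does not match and is skipped, which points at a likely transcription slip rather than anything in the kernel.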
(This is a mass-update to all current FC6 kernel bugs in NEW state) Hello, I'm reviewing this bug list as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel. http://fedoraproject.org/wiki/KernelBugTriage I am CC'ing myself to this bug, however this version of Fedora is no longer maintained. Please attempt to reproduce this bug with a current version of Fedora (presently Fedora 8). If the bug no longer exists, please close the bug or I'll do so in a few days if there is no further information lodged. Thanks for using Fedora!
Per the previous comment in this bug, I am closing it as INSUFFICIENT_DATA, since no information has been lodged for over 30 days. Please re-open this bug or file a new one if you can provide the requested data, and thanks for filing the original report!
The previous comment (#4) was identical to #1, and I did respond to that with the results of testing on F8 as requested. If you need any other information please just ask.
Sorry for not seeing the previous comment before today, and apologies for closing this bug - it got caught in mass closing :( Does booting with nohz acpi=off help?
There's a large number of cases where CPU scaling on boards with VIA CPUs causes problems. The odd thing is that what works fine on one board fails on another, even though they have the same CPU. Short of blacklisting known good/bad boards (which would be a huge task) I'm not really sure we've any hope of diagnosing this problem, and "don't enable power management" is the only recourse open to us.
One thing that might be worth a shot: if there's a BIOS update for your board, it may be worth giving it a try.
Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks. If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them. http://fedoraproject.org/wiki/LifeCycle/EOL If this bug is still open against Fedora Core 1 through 6, thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change to the respective version. If you are unable to do this, please add a comment to this bug requesting the change. Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we are following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again. And if you'd like to join the bug triage team to help make things better, check out http://fedoraproject.org/wiki/BugZappers
Apologies for the delay, it took me a while to track down a USB floppy drive for flashing a newer BIOS. With BIOS 1.09 (from <http://www.via.com.tw/en/products/mainboards/downloads.jsp?motherboard_id=399>, previously running 1.07) and kernel-PAE-2.6.24.4-64.fc8 the box lasted a bit over a day with cpuspeed enabled. The symptoms were different though: instead of a kernel panic I "just" got I/O errors taking the (SATA) disk offline. Kernel messages on the console were the following:
> kernel: journal commit I/O error
> ext3_abort called
> EXT3-fs error (device sda6): ext3_journal_start_sb: Detected aborted journal
> Remounting filesystem read-only
(repeated once for each mounted filesystem), and dmesg output contained just:
> sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
> end_request: I/O error, dev sda, sector 3720455
repeated ad nauseam. Not sure whether the change is because of the kernel, the BIOS or both, or even if it's the same issue (although with cpuspeed off it's been stable for ~10 days now so it's at least related). With nohz acpi=off on the kernel command line I did get a panic, but this time the stack trace ended in
> [<c04052ae>] work_notifysig+0x13/0x19
and the kernel messages ended with
> Fixing recursive fault but reboot is needed!
after the trace. Dave, ISTR that you have some EN1500 boards (which seem identical to the EN1200 in question modulo the CPU clock and cooling), have you managed to get those stable with frequency scaling on?
This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
As this bug is in MODIFIED, Fedora believes that a fix has been committed that resolves the problem listed in this bug report. If this is not the case, please re-open this report, noting the version of the package that you reproduced the bug against. Thanks for the report!
(In reply to comment #13)
> As this bug is in MODIFIED, Fedora believes that a fix has been committed that
> resolves the problem listed in this bug report.
Thanks for the heads up, that seems to be the case. Now running up-to-date F10 (kernel-PAE-2.6.27.5-117.fc10.i686) on the same hardware and so far I have ~46 hours uptime (cpuspeed enabled) with the last ~10 hours continuously running svnadmin load / sleep in a loop and no problems.