212852 – Kernel panics/lockups with acpi-cpufreq on a VIA C7 (EN1200 mini-itx board)

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	8
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:	bzcl34nup
Depends On:
Blocks:	FCMETA_ACPI

Reported:	2006-10-29 17:59 UTC by Lucas Maneos
Modified:	2008-11-28 19:27 UTC
CC List:	5 users
Fixed In Version:	F8
Clone Of:
Environment:
Last Closed:	2008-11-26 17:36:17 UTC
Type:	---
Embargoed:
Dependent Products:

Description Lucas Maneos 2006-10-29 17:59:24 UTC

Description of problem:

With cpuspeed enabled, heavy I/O tends to crash the kernel.  It usually prints
messages about DMA errors on hd? on the console.  Everything works fine with
cpuspeed off.

The problem seems very similar to
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=125419 and
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=140873, even though this is
with acpi-cpufreq instead of longhaul.

Version-Release number of selected component (if applicable):

kernel-2.6.18-1.2798.fc6

How reproducible:

Start something that does a lot of disk I/O.  Strangely enough, a kernel compile
might not trigger it, but an "svnadmin load" (~45MB dumpfile, ~1100 revisions)
does it nearly every time.

Steps to Reproduce:
1. start cpuspeed
2. start an I/O-heavy process
3. wait
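
Before running the steps above, it can help to record which cpufreq driver and governor are actually active. The following is a minimal sketch, assuming the standard cpufreq sysfs layout under /sys/devices/system/cpu/cpu0/cpufreq; the helper name read_cpufreq_state is hypothetical and not part of the original report.

```python
import os

# Assumed location of the cpufreq sysfs entries for the first CPU.
CPUFREQ_DIR = "/sys/devices/system/cpu/cpu0/cpufreq"


def read_cpufreq_state(cpufreq_dir=CPUFREQ_DIR):
    """Return driver, governor and current frequency (kHz) for one CPU.

    Entries that are missing or unreadable are reported as None, so the
    function also works on machines without frequency scaling.
    """
    state = {}
    for name in ("scaling_driver", "scaling_governor", "scaling_cur_freq"):
        try:
            with open(os.path.join(cpufreq_dir, name)) as f:
                state[name] = f.read().strip()
        except OSError:
            state[name] = None
    return state


if __name__ == "__main__":
    for key, value in read_cpufreq_state().items():
        print(f"{key}: {value}")
```

Saving this output alongside a crash report makes it easier to tell later whether a given run used acpi-cpufreq or another driver.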
  
Actual results:

Kernel panic or lockup.

Expected results:

Uneventful operation.

Additional info:

The problem occurs even when booting off a SATA disk with ATA disabled
(ide0=noprobe ide1=noprobe), but in that case it just hangs silently.

I also tried a vanilla 2.6.19-rc2 kernel (just in case longhaul works better -
there seemed to be several longhaul-related fixes in -rc1) and it works without
any problems with acpi-cpufreq (longhaul refuses to load due to APIC detected,
and doesn't recognise the CPU with APIC disabled).

Comment 1 Jon Stanley 2007-12-31 06:38:11 UTC

Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug, however this version of Fedora is no longer
maintained.

Please attempt to reproduce this bug with a current version of Fedora (presently
Fedora 8). If the bug no longer exists, please close the bug or I'll do so in a
few days if there is no further information lodged.

Thanks for using Fedora!

Comment 2 Lucas Maneos 2008-01-02 10:33:12 UTC

Feel free to close the bug since the F8 kernel uses eps instead of acpi-cpufreq
(yes, /proc/cpuinfo does list the est flag) so it doesn't apply.
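
The est flag check mentioned above can be sketched as follows. This is an illustration only: it assumes the usual "flags : ..." line format of /proc/cpuinfo on x86, and the function names cpu_flags and has_est are hypothetical.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU listed in cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_est(cpuinfo_text):
    """True if the Enhanced SpeedStep (est) flag is listed."""
    return "est" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    # Only attempt the read where /proc/cpuinfo exists (Linux).
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print("est flag present:", has_est(f.read()))
```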

FWIW, I hadn't tested it extensively before because once cpuspeed started and
the cpu went to the lower clock speed it wouldn't scale up again.  This doesn't
seem to be the case with the current kernel (2.6.23.9-85.fc8PAE) and it seems
stable so far (no problems after 1 hour of continuous "svnadmin load"s).

Comment 3 Lucas Maneos 2008-01-02 11:30:36 UTC

Looks like I spoke too soon :-(

I got a kernel panic a little bit later (svnadmin load loop still running).  The
stack trace part that was still visible on the screen was:

 [<c0445678>] tick_sched_timer+0x0/0xbb
 [<c05edff5>] tcp_transmit_skb+0x6x6/0x6f7
 [<c0432c40>] irq_exit+0x53/0x6b
 [<c041d05e>] smp_apic_timer_interrupt+0x71/0x7d
 [<c0405c2c>] apic_timer_interrupt+0x28/0x30
 [<c04359f6>] lock_timer_base+0x19/0x35
 [<c05ef9a2>] __tcp_push_pending_frames+0x709/0x7ba
 [<c05e5326>] tcp_sendmsg+0x16c/0xa40
 [<c05e5b02>] tcp_sendmsg+0x948/0xa40
 [<c04938bc>] dput+0x30/0xd7
 [<c048cde1>] __link_path_walk+0xa74/0xbaf
 [<c05b44e5>] sock_aio_write+0xea/0xf6
 [<c0484698>] do_sync_write+0xc7/0x10a
 [<c043e3f5>] autoremove_wake_function+0x0/0x35
 [<c04453fe>] tick_program_event+0x33/0x52
 [<c0484f1a>] vfs_write+0xbc/0x15a
 [<c0485523>] sys_write+0x41/0x67
 [<c0405112>] sysenter_past_esp+0x6b/0xa1
 =======================
Code: 00 00 00 85 d2 74 06 83 7a 0c 00 75 17 89 54 24 04 89 f0 89 ea 89 0c 24 83
 c9 ff e8 37 fa ff ff 89 c3 eb 0d 8b 5a 0c 0f b7 42 0a <8b> 04 83 89 42 0c 89 f8
 50 9d 90 8d b4 26 00 00 00 00 66 85 ed
EIP: [<c048108e>] kmem_cache_alloc+0x5a/0x99 SS:ESP 0068:d0b4bc1c
Kernel panic - not syncing: Fatal exception in interrupt

(transcribed by hand, I doublechecked but there may be errors)
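
Since the trace was copied by hand, a quick format check can point at likely transcription slips. The sketch below assumes each entry has the form "[<address>] symbol+0xoffset/0xsize", as in the lines above; the names parse_trace_line and malformed_entries are hypothetical. It only checks the format and cannot recover the original values.

```python
import re

# One stack-trace entry: [<hex address>] symbol+0xOFFSET/0xSIZE
TRACE_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_trace_line(line):
    """Return (address, symbol, offset, size) or None if the line is malformed."""
    m = TRACE_RE.fullmatch(line.strip())
    if m is None:
        return None
    addr, symbol, offset, size = m.groups()
    return int(addr, 16), symbol, int(offset, 16), int(size, 16)


def malformed_entries(trace_text):
    """List the lines that look like trace entries but do not parse."""
    bad = []
    for line in trace_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[<") and parse_trace_line(stripped) is None:
            bad.append(stripped)
    return bad


if __name__ == "__main__":
    sample = (
        " [<c0445678>] tick_sched_timer+0x0/0xbb\n"
        " [<c05edff5>] tcp_transmit_skb+0x6x6/0x6f7\n"
    )
    for entry in malformed_entries(sample):
        print("does not match the expected format:", entry)
```

Applied to the trace above, the tcp_transmit_skb entry does not match the expected format, which suggests its offset was mistyped.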

Comment 4 Jon Stanley 2008-01-08 01:50:56 UTC

(This is a mass-update to all current FC6 kernel bugs in NEW state)

Hello,

I'm reviewing this bug list as part of the kernel bug triage project, an attempt
to isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug, however this version of Fedora is no longer
maintained.

Please attempt to reproduce this bug with a current version of Fedora (presently
Fedora 8). If the bug no longer exists, please close the bug or I'll do so in a
few days if there is no further information lodged.

Thanks for using Fedora!

Comment 5 Jon Stanley 2008-02-08 04:29:05 UTC

Per the previous comment in this bug, I am closing it as INSUFFICIENT_DATA,
since no information has been lodged for over 30 days.

Please re-open this bug or file a new one if you can provide the requested data,
and thanks for filing the original report!

Comment 6 Lucas Maneos 2008-02-08 07:20:30 UTC

The previous comment (#4) was identical to #1, and I did respond to that with
the results of testing on F8 as requested.   If you need any other information
please just ask.

Comment 7 Jon Stanley 2008-03-06 18:26:37 UTC

Sorry for not seeing the previous comment before today, and apologies for
closing this bug - it got caught in mass closing :(

Does booting with nohz acpi=off help?

Comment 8 Dave Jones 2008-03-06 18:50:14 UTC

There are a large number of cases where CPU scaling on boards with VIA CPUs
causes problems.  The odd thing is that what works fine on one board fails on
another, even though they have the same CPU.

Short of blacklisting known good/bad boards (which would be a huge task) I'm not
really sure we've any hope of diagnosing this problem, and "don't enable power
management" is the only recourse open to us.

Comment 9 Dave Jones 2008-03-06 18:50:40 UTC

One thing that might be worth a shot: if there's a BIOS update for your board,
it may be worth giving it a try.

Comment 10 Bug Zapper 2008-04-04 04:14:33 UTC

Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days
from now, it will be closed 'WONTFIX'. If you can reproduce this bug in
the latest Fedora version, please change to the respective version. If
you are unable to do this, please add a comment to this bug requesting
the change.

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers

Comment 11 Lucas Maneos 2008-04-17 15:08:46 UTC

Apologies for the delay, it took me a while to track down a USB floppy drive for
flashing a newer BIOS.

With BIOS 1.09 (from
<http://www.via.com.tw/en/products/mainboards/downloads.jsp?motherboard_id=399>,
previously running 1.07) and kernel-PAE-2.6.24.4-64.fc8 the box lasted a bit
over a day with cpuspeed enabled.  The symptoms were different, though: instead
of a kernel panic I "just" got I/O errors taking the (SATA) disk offline.

Kernel messages on the console were the following:

> kernel: journal commit I/O error
> ext3_abort called
> EXT3-fs error (device sda6): ext3_journal_start_sb: Detected aborted journal
> Remounting filesystem read-only

(repeated once for each mounted filesystem), and dmesg output contained just:

> sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
> end_request: I/O error, dev sda, sector 3720455

repeated ad nauseam.
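
When the same error repeats many times, a per-device count is easier to compare between runs than the raw log. This is a minimal sketch, assuming the "end_request: I/O error, dev X, sector N" message format quoted above; the function name count_io_errors is hypothetical.

```python
import re
from collections import Counter

# Matches the block-layer error line quoted in the dmesg excerpt.
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def count_io_errors(log_text):
    """Count end_request I/O error lines per device name."""
    counts = Counter()
    for m in IO_ERROR_RE.finditer(log_text):
        counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python this_script.py
    for dev, n in count_io_errors(sys.stdin.read()).items():
        print(f"{dev}: {n} I/O errors")
```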

Not sure whether the change is because of the kernel, the BIOS or both, or even
if it's the same issue (although with cpuspeed off it's been stable for ~10 days
now so it's at least related).

With nohz acpi=off on the kernel command line I did get a panic, but this time
the stack trace ended in

> [<c04052ae>] work_notifysig+0x13/0x19

and the kernel messages ended with

> Fixing recursive fault but reboot is needed!

after the trace.

Dave, ISTR that you have some EN1500 boards (which seem identical to the EN1200
in question modulo the CPU clock and cooling), have you managed to get those
stable with frequency scaling on?

Comment 12 Bug Zapper 2008-11-26 07:03:32 UTC

This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 13 Jon Stanley 2008-11-26 17:36:17 UTC

As this bug is in MODIFIED, Fedora believes that a fix has been committed that resolves the problem listed in this bug report.

If this is not the case, please re-open this report, noting the version of the package that you reproduced the bug against.

Thanks for the report!

Comment 14 Lucas Maneos 2008-11-28 19:27:06 UTC

(In reply to comment #13)
> As this bug is in MODIFIED, Fedora believes that a fix has been committed that
> resolves the problem listed in this bug report.

Thanks for the heads up, that seems to be the case.  Now running up-to-date F10 (kernel-PAE-2.6.27.5-117.fc10.i686) on the same hardware and so far I have ~46 hours uptime (cpuspeed enabled) with the last ~10 hours continuously running svnadmin load / sleep in a loop and no problems.
