606582 – fedora completely unstable under high network traffic or disk usage

Status:	CLOSED NOTABUG
Product:	Fedora
Classification:	Fedora
Component:	kernel
Version:	12
Hardware:	i686
OS:	Linux
Priority:	low
Severity:	high
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance

Reported:	2010-06-22 02:15 UTC by Harris Gilliam
Modified:	2010-09-10 04:33 UTC
CC List:	8 users
Last Closed:	2010-09-10 04:33:31 UTC

Attachments
output of lshw (50.86 KB, text/plain) 2010-06-22 02:15 UTC, Harris Gilliam

Description Harris Gilliam 2010-06-22 02:15:56 UTC

Created attachment 425797
output of lshw

Description of problem:

After upgrading to Fedora 12 my system has become increasingly unstable. It crashes almost daily, locking up completely so that I have to hit the reset button.  On some days I get a series of kernel OOPSes just before the lockup. They always involve spinlock errors, memory management errors, etc.  I have reported almost 20 of these to kerneloops.org using ABRT.

These crashes almost always happen when I am running something that does a lot of network access (like deluge) and/or writes to disk a lot (like transcode).  Sometimes it happens when watching a DVD or flash video (YouTube, for example).


Version-Release number of selected component (if applicable):

Every kernel since Fedora 12 was released.  Currently using:

kernel-2.6.32.11-99.fc12.i686
kernel-2.6.32.12-115.fc12.i686
kernel-2.6.32.14-127.fc12.i686


How reproducible:

Consistently crashes.


Steps to Reproduce:
1. boot machine
2. run program that uses lots of network or disk
3. after a while machine crashes
  
Actual results:


Expected results:


Additional info:

I have been able to reduce the frequency of crashes by disabling ksm and kernel mode setting.

I am attaching the output of lshw.

Comment 1 Stanislaw Gruszka 2010-06-22 17:52:40 UTC

(In reply to comment #0)
> They are always something to do with spinlock errors, memory management errors,
> etc.  I have reported almost 20 of these to kerneloops.org using ABRT.

Could you please give us a link to them, or attach logs with the messages here? Ideally, run a debug kernel and attach dmesg output that includes the first oops/call trace.

Comment 2 Harris Gilliam 2010-06-26 03:16:17 UTC

using kernel-debug-2.6.32.14-127.fc12.i686

oops calltrace:

BUG: Bad page state in process kswapd0  pfn:836f0
page:c3c88040 flags:80020008 count:0 mapcount:-128 mapping:(null)
index:18233
Pid: 50, comm: kswapd0 Tainted: P 2.6.32.14-127.fc12.i686.debug #1
Call Trace:
[<c07b7431>] ? printk+0x14/0x1b
[<c04b5bfe>] bad_page+0xe6/0xff
[<c04b60e0>] free_pages_check+0x2b/0x49
[<c04b687c>] free_hot_cold_page+0x32/0x1d5
[<c04b6a75>] __pagevec_free+0x56/0x63
[<c04bb56d>] shrink_page_list+0x2f5/0x3bd
[<c04bb9af>] shrink_list+0x37a/0x5c2
[<c04bbe0a>] shrink_zone+0x213/0x298
[<c04bc42a>] kswapd+0x3b5/0x566
[<c04bad0f>] ? isolate_pages_global+0x0/0x1d1
[<c0457804>] ? autoremove_wake_function+0x0/0x34
[<c04bc075>] ? kswapd+0x0/0x566
[<c04575e2>] kthread+0x64/0x69
[<c045757e>] ? kthread+0x0/0x69
[<c040421f>] kernel_thread_helper+0x7/0x10
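
For anyone comparing several traces like the one above, here is a minimal sketch of pulling the frames out of the pasted text. The function names (parse_frames, reliable_symbols) are hypothetical helpers, not part of any kernel tool, and the pattern assumes frames formatted exactly as in this comment; a leading "?" marks a frame the kernel could not confirm as part of the call chain.

```python
import re

# Matches frames such as "[<c04b5bfe>] bad_page+0xe6/0xff" and
# "[<c07b7431>] ? printk+0x14/0x1b" (the "?" marks an unreliable frame).
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(trace_text):
    """Return one dict per stack frame found in the pasted trace text."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "addr": int(m.group("addr"), 16),
                "symbol": m.group("symbol"),
                "offset": int(m.group("offset"), 16),
                "reliable": m.group("unreliable") is None,
            })
    return frames


def reliable_symbols(trace_text):
    """Return only the symbols of frames not marked with '?'."""
    return [f["symbol"] for f in parse_frames(trace_text) if f["reliable"]]
```

Feeding it the trace from this comment would list bad_page, free_pages_check and the rest of the reclaim path, which makes it easier to see whether different oopses share a code path or hit unrelated ones (the latter tends to point at hardware).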

Comment 3 Stanislaw Gruszka 2010-06-26 16:59:56 UTC

(In reply to comment #2)
> Pid: 50, comm: kswapd0 Tainted: P 2.6.32.14-127.fc12.i686.debug #1
                         ^^^^^^^^^^^

Please stop using any proprietary modules; we do not support tainted kernels. If the problem still happens, provide dmesg output. This time please attach the full dmesg, from system start to the oops.
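
As a rough illustration of what the "Tainted: P" marker means, the sketch below reads the taint letters from an oops header line. The helper names (taint_flags, has_proprietary_module) are hypothetical, and the table covers only a handful of the kernel's taint letters, so treat it as a sketch rather than a complete decoder.

```python
import re

# A small subset of kernel taint letters; the kernel defines more than these.
TAINT_MEANINGS = {
    "P": "proprietary module loaded",
    "F": "module force-loaded",
    "M": "machine check exception occurred",
    "B": "bad page referenced",
    "D": "kernel has oopsed or hit a BUG",
    "W": "kernel issued a warning",
}


def taint_flags(header_line):
    """Return the taint letters from a line like 'Pid: ... Tainted: P 2.6.32...'."""
    m = re.search(r"Tainted:\s+([A-Z\s]+)", header_line)
    if m is None:
        # Covers "Not tainted" and lines with no taint field at all.
        return []
    return [c for c in m.group(1) if c.isalpha()]


def has_proprietary_module(header_line):
    """True if the oops was taken with a proprietary module loaded."""
    return "P" in taint_flags(header_line)
```

Applied to the header line quoted above, taint_flags returns the single letter P, which is the reason a retest without the proprietary module was requested.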

Comment 4 Chuck Ebbert 2010-06-26 20:43:41 UTC

Random errors like that could also be due to memory failures. Try running memtest overnight.

Comment 5 Harris Gilliam 2010-06-27 14:26:10 UTC

I'll repost without the NVidia module loaded... however be aware that the problem happens the same when using the nouveau driver.

The problem is that usually there is no OOPS... the machine just locks up and I have to hit the reset button.  So to get an OOPS trace I have to play with loading the system until it crashes "softly".  Might take a few days.

I'll do the memtest tonight.

Comment 6 Chuck Ebbert 2010-06-28 05:21:25 UTC

Please post the boot messages (contents of /var/log/dmesg).

Comment 7 Stanislaw Gruszka 2010-06-30 12:45:09 UTC

Besides providing the boot dmesg, please configure kdump (http://fedoraproject.org/wiki/Kernel/kdump or use system-config-kdump). This should make it possible to capture a crash dump when the system hangs and get additional information from the memory dump.

Comment 8 Stanislaw Gruszka 2010-09-09 08:12:05 UTC

Hi Harris, could you provide the additional info requested in comments 6 and 7?

Comment 9 Harris Gilliam 2010-09-10 04:29:49 UTC

Sorry I didn't get back to this... performed the memtest as suggested by Chuck.  Turns out I had a faulty DIMM :-(  DOH!! Have been running for a while now without incident.

You can mark this one closed I think :-)

Comment 10 Stanislaw Gruszka 2010-09-10 04:33:31 UTC

Sure :-)
