Bug 606582 - fedora completely unstable under high network traffic or disk usage
Summary: fedora completely unstable under high network traffic or disk usage
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 12
Hardware: i686
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
 
Reported: 2010-06-22 02:15 UTC by Harris Gilliam
Modified: 2010-09-10 04:33 UTC
CC List: 8 users

Clone Of:
Last Closed: 2010-09-10 04:33:31 UTC


Attachments
output of lshw (50.86 KB, text/plain)
2010-06-22 02:15 UTC, Harris Gilliam

Description Harris Gilliam 2010-06-22 02:15:56 UTC
Created attachment 425797
output of lshw

Description of problem:

After upgrading to Fedora 12, my system has become increasingly unstable. Almost daily, it crashes and locks up completely, so I have to hit the reset button. On some days I get a series of kernel OOPSes just before the lockup. They always involve spinlock errors, memory management errors, etc. I have reported almost 20 of these to kerneloops.org using ABRT.

These crashes almost always happen when I am running something that does a lot of network access (like deluge) and/or writes to disk a lot (like transcode). Sometimes it happens when watching a DVD or Flash video (YouTube, for example).


Version-Release number of selected component (if applicable):

Every kernel since Fedora 12 was released.  Currently using:

kernel-2.6.32.11-99.fc12.i686
kernel-2.6.32.12-115.fc12.i686
kernel-2.6.32.14-127.fc12.i686


How reproducible:

Consistently crashes.


Steps to Reproduce:
1. boot machine
2. run program that uses lots of network or disk
3. after a while, the machine crashes
  
Actual results:


Expected results:


Additional info:

I have been able to reduce the frequency of crashes by disabling KSM (kernel samepage merging) and kernel mode setting (KMS).

I am attaching the output of lshw.

Comment 1 Stanislaw Gruszka 2010-06-22 17:52:40 UTC
(In reply to comment #0)
> They are always something to do with spinlock errors, memory management errors,
> etc.  I have reported almost 20 of these to kerneloops.org using ABRT.

Could you please give us links to them? Or attach the log messages here. Ideally, run the debug kernel and attach dmesg, including the first oops/call trace.
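As a rough aid for trimming a saved dmesg log down to "everything up to and including the first oops/call trace", here is a minimal Python sketch. The helper names (`first_oops`, `dmesg_through_first_trace`) are hypothetical, and the set of oops markers and the call-trace frame pattern are assumptions based on the trace format shown in this report, not an exhaustive description of kernel output.

```python
import re
from typing import List, Optional, Tuple

# Assumed markers that commonly open a kernel oops/BUG report (not exhaustive).
OOPS_MARKERS = re.compile(
    r"(BUG:|Oops:|general protection fault|------------\[ cut here \]------------)"
)

# Assumed frame format: optional "[ 123.456789]" timestamp, then "[<address>]".
FRAME = re.compile(r"^\s*(?:\[\s*\d+\.\d+\]\s*)?\[<[0-9a-fA-F]+>\]")


def first_oops(lines: List[str]) -> Optional[Tuple[int, str]]:
    """Return (index, line) of the first line that looks like an oops start."""
    for i, line in enumerate(lines):
        if OOPS_MARKERS.search(line):
            return i, line
    return None


def dmesg_through_first_trace(lines: List[str]) -> List[str]:
    """Return lines from boot through the end of the first oops call trace.

    The trace is assumed to end at the first line after 'Call Trace:' that
    does not look like a '[<address>] symbol+off/len' frame.
    """
    hit = first_oops(lines)
    if hit is None:
        return list(lines)  # no oops found: keep the whole log
    start, _ = hit
    in_trace = False
    for j in range(start, len(lines)):
        if "Call Trace:" in lines[j]:
            in_trace = True
            continue
        if in_trace and not FRAME.match(lines[j]):
            return list(lines[:j])
    return list(lines)
```

For example, feeding it the lines of a saved `/var/log/messages` or `dmesg` dump would keep the boot messages plus the first `BUG: Bad page state ...` report and its frames, and drop whatever follows.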

Comment 2 Harris Gilliam 2010-06-26 03:16:17 UTC
using kernel-debug-2.6.32.14-127.fc12.i686

oops calltrace:

BUG: Bad page state in process kswapd0  pfn:836f0
page:c3c88040 flags:80020008 count:0 mapcount:-128 mapping:(null) index:18233
Pid: 50, comm: kswapd0 Tainted: P 2.6.32.14-127.fc12.i686.debug #1
Call Trace:
[<c07b7431>] ? printk+0x14/0x1b
[<c04b5bfe>] bad_page+0xe6/0xff
[<c04b60e0>] free_pages_check+0x2b/0x49
[<c04b687c>] free_hot_cold_page+0x32/0x1d5
[<c04b6a75>] __pagevec_free+0x56/0x63
[<c04bb56d>] shrink_page_list+0x2f5/0x3bd
[<c04bb9af>] shrink_list+0x37a/0x5c2
[<c04bbe0a>] shrink_zone+0x213/0x298
[<c04bc42a>] kswapd+0x3b5/0x566
[<c04bad0f>] ? isolate_pages_global+0x0/0x1d1
[<c0457804>] ? autoremove_wake_function+0x0/0x34
[<c04bc075>] ? kswapd+0x0/0x566
[<c04575e2>] kthread+0x64/0x69
[<c045757e>] ? kthread+0x0/0x69
[<c040421f>] kernel_thread_helper+0x7/0x10

Comment 3 Stanislaw Gruszka 2010-06-26 16:59:56 UTC
(In reply to comment #2)
> Pid: 50, comm: kswapd0 Tainted: P 2.6.32.14-127.fc12.i686.debug #1
                         ^^^^^^^^^^^

Please stop using any proprietary modules; we do not support tainted kernels. If the problem still happens, provide dmesg. This time, please attach the full dmesg, from system start to the oops.
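The `P` in the `Tainted:` field means a proprietary (non-GPL) module, here the NVIDIA driver, was loaded. The kernel also exposes its taint state as a bitmask in `/proc/sys/kernel/tainted`. The Python sketch below decodes that bitmask into flag letters; the table covers only bits 0 to 10 as listed in the kernel's tainted-kernels documentation, and the helper names are hypothetical.

```python
from typing import List

# Subset of taint bits (0-10) and their flag letters, per the kernel's
# tainted-kernels documentation. Bit 0 prints 'P' when set ('G' when clear).
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force loaded"),
    2: ("S", "kernel running on an out-of-specification system"),
    3: ("R", "module was force unloaded"),
    4: ("M", "machine check exception occurred"),
    5: ("B", "bad page referenced or unexpected page flags"),
    6: ("U", "taint requested by userspace"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    8: ("A", "ACPI table overridden"),
    9: ("W", "kernel issued a warning"),
    10: ("C", "staging driver was loaded"),
}


def decode_taint(value: int) -> List[str]:
    """Return the flag letters for every known bit set in a taint bitmask."""
    return [letter for bit, (letter, _) in sorted(TAINT_FLAGS.items())
            if value & (1 << bit)]


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the current taint bitmask (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())
```

On a system where `read_taint()` returns 1, `decode_taint` yields `["P"]`, matching the `Tainted: P` seen in the oops above; a value of 0 means the kernel is untainted.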

Comment 4 Chuck Ebbert 2010-06-26 20:43:41 UTC
Random errors like that could also be due to memory failures. Try running memtest overnight.

Comment 5 Harris Gilliam 2010-06-27 14:26:10 UTC
I'll repost without the NVIDIA module loaded... however, be aware that the problem happens the same way when using the nouveau driver.

The problem is that usually there is no OOPS... the machine just locks up and I have to hit the reset button.  So to get an OOPS trace I have to play with loading the system until it crashes "softly".  Might take a few days.

I'll do the memtest tonight.

Comment 6 Chuck Ebbert 2010-06-28 05:21:25 UTC
Please post the boot messages (contents of /var/log/dmesg).

Comment 7 Stanislaw Gruszka 2010-06-30 12:45:09 UTC
In addition to providing the boot dmesg, please configure kdump (http://fedoraproject.org/wiki/Kernel/kdump, or use system-config-kdump). This should allow capturing a memory dump when the system hangs, from which we can get additional info.

Comment 8 Stanislaw Gruszka 2010-09-09 08:12:05 UTC
Hi Harris, could you provide the additional info requested in comments 6 and 7?

Comment 9 Harris Gilliam 2010-09-10 04:29:49 UTC
Sorry I didn't get back to this... performed the memtest as suggested by Chuck.  Turns out I had a faulty DIMM :-(  DOH!! Have been running for a while now without incident.

You can mark this one closed I think :-)

Comment 10 Stanislaw Gruszka 2010-09-10 04:33:31 UTC
Sure :-)

