Bug 114740 - Total System freeze during disk I/O under load
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 1
Hardware: athlon
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact:
URL:
Whiteboard:
Duplicates: 145296
Depends On:
Blocks:
 
Reported: 2004-02-02 09:11 UTC by Simon Banks
Modified: 2007-11-30 22:10 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-02-17 17:00:26 UTC
Type: ---
Embargoed:



Description Simon Banks 2004-02-02 09:11:19 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1)
Gecko/20031114

Description of problem:
I've been attempting to pin this on my BIOS/hardware for about 2
months now.  I've run day-long sessions of memtest86 and cpu burn but
can't cause the same problem (so my memory is OK and all of the core
temps appear to be well within spec), nor can I reproduce it using
Microsloth O/S's, so I'm finally coming to the conclusion it's a Fedora
Core issue.


Basically, within 15 minutes to 5 hours of booting a Fedora Core system
it locks up completely and utterly (no flashing lights, no logs) and requires
a power off to get booted again.  However, this freeze only occurs when
the system is under load that involves disk access, not pure CPU load:
recompiling the kernel or transcoding video, for instance.  The
machine always locks during disk IO (the red disk IO light on the
front of my box, which indicates IO by flashing, is always ON).

I've run everything from a stock 2115 kernel to the latest and
greatest 2163 system; the symptoms are the same.

My machine in its skeleton form:-
Athlon XP 2400+ CPU
1 Gig of memory
ABIT KV7 motherboard (VIA KT600 chipset (V8237 southbridge))
1 120gig disk (EIDE)


Version-Release number of selected component (if applicable):
kernel-2.4.22.2140 through 2163

How reproducible:
Always

Steps to Reproduce:
1. Boot the box
2. Start a high load running in a loop (e.g. a kernel compile)
3. Wait anything up to 5 hours for the system to fall over
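
Editorial sketch, not part of the original report: because the freeze leaves no logs, one way to bracket the time of a hang is to run a disk-bound loop that syncs a timestamped heartbeat line to disk after every round. The function name, file names, and sizes below are hypothetical, and the loop is far lighter than a kernel compile; it only illustrates the idea.

```python
import os
import time


def io_load_with_heartbeat(workdir, rounds=3, chunk_mb=1,
                           heartbeat_name="heartbeat.log"):
    """Write and re-read a scratch file, syncing a heartbeat line each round.

    After a hard freeze and reboot, the last line of the heartbeat file
    shows roughly when the machine was last alive.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    scratch = os.path.join(workdir, "scratch.bin")
    heartbeat = os.path.join(workdir, heartbeat_name)
    for i in range(rounds):
        # Force the data out to the disk rather than leaving it in the page cache.
        with open(scratch, "wb") as f:
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        # The read-back will usually be served from the page cache, so it
        # mainly checks that the write completed without corruption.
        with open(scratch, "rb") as f:
            if f.read() != chunk:
                raise IOError("read-back mismatch in round %d" % i)
        # Append and sync the heartbeat so it survives a sudden lockup.
        with open(heartbeat, "a") as hb:
            hb.write("%s round %d\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), i))
            hb.flush()
            os.fsync(hb.fileno())
    os.remove(scratch)
    return heartbeat
```

Calling `io_load_with_heartbeat("/some/scratch/dir", rounds=100000, chunk_mb=64)` would keep the disk busy for a long run; the path and values are placeholders to adjust for the machine under test.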
    

Actual Results:  She's dead, Jim

The video display can still be seen, although nothing is being updated
(cursors, clocks, stock tickers etc.).  The machine's disk IO light is
fixed on and the machine requires a reset or power cycle to get it up
again.  Networking is also dead.

There are no logs and I cannot obtain anything from the console.

Expected Results:  I would not expect the machine to lock up under load

Additional info:

I've spent 2 months on this problem and have tried no end of BIOS
settings, kernel configs and boot options.  I've concluded (via a process
of elimination) that the problem is disk IO related. I've ruled out
CPU/motherboard temps and memory flaws. I've even stuck in a spare 20 gig
EIDE disk and installed Fedora from scratch to prove it's not my 120gig
EIDE drive doing something odd.


I'm more than happy to run tests or provide any other information
that's available, although I've never had the machine lock up in a
recoverable way; it always requires a reset.

Comment 1 Simon Banks 2004-02-17 16:56:26 UTC
Please close/resolve/delete this bug, as I finally tracked down the
problem to a hang in the VIA motherboard (KV7 BIOS revision 11). The
problem has not recurred in over a week at full load using Fedora
kernel 2166 since I upgraded the motherboard BIOS to version 13.
Apologies for even contemplating it was the kernel.

Comment 2 John McCrae 2005-02-09 15:08:15 UTC
*** Bug 145296 has been marked as a duplicate of this bug. ***

Comment 3 Penelope Fudd 2006-02-06 04:10:48 UTC
I just upgraded to the official 2.6.15-1.1830_FC4 and I just had my first kernel
oops in eons.  It got written to the logs (twice in a row).  I had just finished
playing some 'bzflag'.  I'm using the NVIDIA drivers
NVIDIA-Linux-x86-1.0-8178-pkg1.run, and I've got an Nvidia GeForce4 MX 440
graphics card.

Feb  5 20:03:45 kirk kernel: Bad page state at free_hot_cold_page (in process
'bzflag', page c14f94c0)
Feb  5 20:03:45 kirk kernel: flags:0x80000400 mapping:00000000 mapcount:0
count:0 (Tainted: P     )
Feb  5 20:03:45 kirk kernel: Backtrace:
Feb  5 20:03:45 kirk kernel:  [<c0141598>] bad_page+0x8c/0xc3     [<c0141e3f>]
free_hot_cold_page+0x3a/0x117
Feb  5 20:03:45 kirk kernel:  [<c01e097f>] memmove+0x24/0x2d     [<f8d0bba8>]
nv_vm_free_pages+0xa3/0xf5 [nvidia]
Feb  5 20:03:45 kirk kernel:  [<f8d08782>] nv_kern_vma_release+0x7c/0x97
[nvidia]     [<c014dffd>] remove_vma+0x28/0x45
Feb  5 20:03:45 kirk kernel:  [<c0150074>] exit_mmap+0xb1/0xda     [<c011ad09>]
mmput+0x1f/0x95
Feb  5 20:03:45 kirk kernel:  [<c011f647>] do_exit+0xfc/0x3cf     [<c0106edd>]
do_syscall_trace+0x1ac/0x1c4
Feb  5 20:03:45 kirk kernel:  [<c011f96f>] do_group_exit+0x29/0x90    
[<c0102e75>] syscall_call+0x7/0xb
Feb  5 20:03:45 kirk kernel: Trying to fix it up, but a reboot is needed
Feb  5 20:03:45 kirk kernel: Bad page state at free_hot_cold_page (in process
'bzflag', page c14fa080)
Feb  5 20:03:45 kirk kernel: flags:0x80000400 mapping:00000000 mapcount:0
count:0 (Tainted: P    B)
Feb  5 20:03:45 kirk kernel: Backtrace:
Feb  5 20:03:45 kirk kernel:  [<c0141598>] bad_page+0x8c/0xc3     [<c0141e3f>]
free_hot_cold_page+0x3a/0x117
Feb  5 20:03:45 kirk kernel:  [<f8d0bba8>] nv_vm_free_pages+0xa3/0xf5 [nvidia] 
   [<f8d08782>] nv_kern_vma_release+0x7c/0x97 [nvidia]
Feb  5 20:03:45 kirk kernel:  [<c014dffd>] remove_vma+0x28/0x45     [<c0150074>]
exit_mmap+0xb1/0xda
Feb  5 20:03:45 kirk kernel:  [<c011ad09>] mmput+0x1f/0x95     [<c011f647>]
do_exit+0xfc/0x3cf
Feb  5 20:03:45 kirk kernel:  [<c0106edd>] do_syscall_trace+0x1ac/0x1c4    
[<c011f96f>] do_group_exit+0x29/0x90
Feb  5 20:03:45 kirk kernel:  [<c0102e75>] syscall_call+0x7/0xb
Feb  5 20:03:45 kirk kernel: Trying to fix it up, but a reboot is needed
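
Editorial sketch, not part of the original comment: a trace like the one above can be triaged by pulling out the taint letters and the bracketed module names from the backtrace. The helper name `triage` is hypothetical. The flag descriptions are paraphrased from the kernel's tainted-kernels documentation, and the sample lines are copied from this report with the line wrapping undone.

```python
import re

# Taint letters that appear in this report; descriptions are paraphrased.
TAINT_FLAGS = {
    "P": "proprietary module loaded",
    "B": "bad page state was hit",
}


def triage(log_text):
    """Return (taint letters, module names) found in a kernel log excerpt."""
    taints = set()
    for field in re.findall(r"Tainted:([^)]*)\)", log_text):
        taints.update(ch for ch in field if not ch.isspace())
    # Module names appear as [name]; code addresses look like [<c0141598>]
    # and are skipped because '<' is not a word character.
    modules = set(re.findall(r"\[(\w+)\]", log_text))
    return taints, modules


sample = (
    "kernel: flags:0x80000400 mapping:00000000 mapcount:0 count:0 (Tainted: P    B)\n"
    "kernel:  [<f8d0bba8>] nv_vm_free_pages+0xa3/0xf5 [nvidia]\n"
    "kernel:  [<c014dffd>] remove_vma+0x28/0x45\n"
)

taints, modules = triage(sample)
for letter in sorted(taints):
    print(letter, "=", TAINT_FLAGS.get(letter, "unknown flag"))
print("modules in backtrace:", ", ".join(sorted(modules)))
```

A `P` taint together with a third-party module in the backtrace suggests the report may need to be reproduced without that module before it can be attributed to the kernel itself.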

Comment 4 Penelope Fudd 2006-02-06 04:18:54 UTC
I booted from 2.6.14-1.1656_FC4 and it doesn't crash.
If I boot from 2.6.15-1.1830_FC4 it crashes every time I quit bzflag, even from
the starting screen.


