Bug 73729

Summary: 2.4.18-10 kernel crashes during heavy NFS traffic
Product: [Retired] Red Hat Linux Reporter: John Kuehne <jwkuehne>
Component: kernel Assignee: Arjan van de Ven <arjanv>
Status: CLOSED NOTABUG QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 7.3   
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2002-09-09 17:31:47 UTC Type: ---

Description John Kuehne 2002-09-09 17:31:42 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020513

Description of problem:
A Dell-installed RH7.3 system, updated via rhn.redhat.com to the latest
version of all packages, is crashing intermittently during heavy
filesystem/NFS activity in nightly backups. A sample kernel message is:
Aug 30 03:09:38 amanda kernel: ------------[ cut here ]------------
Aug 30 03:09:38 amanda kernel: kernel BUG at page_alloc.c:117!
Aug 30 03:09:38 amanda kernel: invalid operand: 0000
Aug 30 03:09:38 amanda kernel: i810_audio ac97_codec soundcore agpgart NVdriver binfmt_misc nfs nfsd lockd su
Aug 30 03:09:38 amanda kernel: CPU:    0
Aug 30 03:09:38 amanda kernel: EIP:    0010:[<c01316e7>]    Tainted: P
Aug 30 03:09:38 amanda kernel: EFLAGS: 00010282
Aug 30 03:09:38 amanda kernel:
Aug 30 03:09:38 amanda kernel: EIP is at __free_pages_ok [kernel] 0x57 (2.4.18-4)
Aug 30 03:09:38 amanda kernel: eax: 00000020   ebx: c1655748   ecx: 00000001   edx: 00002866
Aug 30 03:09:38 amanda kernel: esi: 00000000   edi: c02c47bc   ebp: 00000000   esp: c177ff58
Aug 30 03:09:38 amanda kernel: ds: 0018   es: 0018   ss: 0018
Aug 30 03:09:38 amanda kernel: Process kswapd (pid: 5, stackpage=c177f000)
Aug 30 03:09:38 amanda kernel: Stack: c0225115 00000075 db6e47e0 c1655748 c013d0e3 dfe29a00 c14bad58 00000030
Aug 30 03:09:38 amanda kernel:        c013b23a c1655748 c1655764 c02c47bc db6e47e0 c012f164 c1655748 00000030
Aug 30 03:09:38 amanda kernel:        c1655748 c1655764 c02c47bc 00000147 c0130726 dff67130 c177e000 c02c47e4
Aug 30 03:09:38 amanda kernel: Call Trace: [<c013d0e3>] try_to_free_buffers [kernel] 0xb3
Aug 30 03:09:38 amanda kernel: [<c013b23a>] try_to_release_page [kernel] 0x3a
Aug 30 03:09:38 amanda kernel: [<c012f164>] drop_page [kernel] 0x34
Aug 30 03:09:38 amanda kernel: [<c0130726>] refill_inactive_zone [kernel] 0x206
Aug 30 03:09:38 amanda kernel: [<c0131090>] kswapd [kernel] 0x280
Aug 30 03:09:38 amanda kernel: [<c0105000>] stext [kernel] 0x0
Aug 30 03:09:38 amanda kernel: [<c0107136>] kernel_thread [kernel] 0x26
Aug 30 03:09:38 amanda kernel: [<c0130e10>] kswapd [kernel] 0x0
Aug 30 03:09:38 amanda kernel:
Aug 30 03:09:38 amanda kernel:
Aug 30 03:09:38 amanda kernel: Code: 0f 0b 5d 58 8b 3d f0 e2 32 c0 89 d8 29 f8 69 c0 b7 6d db b6

The script that causes this is a simple tar-in-a-pipe across NFS:

#!/bin/csh
cd /nfs/otto; chmod -R 700 anna; rm -rf anna; mkdir anna
cd /; /bin/tar cf - home | (cd /nfs/otto/anna; /bin/tar xf -)
/usr/bin/du -sk home > /nfs/otto/anna/anna.du



Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Run the script given in the Description on filesystems containing several GB
2. Repeat nightly until a crash occurs

Actual Results: Occasional kernel crashes. The system is left dead. This is a
showstopper for enterprise computing.

Additional info:

Crashes occur randomly. The chance of a crash seems to be about 5%,
and it increases with the amount of data being transferred. Filesystems
containing more than 6 GB seem to crash more often than those
containing a few tens of megabytes.

In the above script, /nfs/otto is automounted with NFSv3. The nfsstat output
looks normal before crashes. The crash usually occurs in the tar phase.

The machines are Dell Precision 340s with Linux installed and updated via up2date.

Comment 1 Arjan van de Ven 2002-09-09 17:53:26 UTC
Thank you for reporting this NVidia bug. However, you reported it to the
wrong company. Red Hat does not support the NVidia binary-only kernel modules,
nor can it fix bugs in them (and this is a very frequently reported BUG in the
NVidia module). Please report your bug to NVidia instead.
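
Editorial note: the triage above follows from the "Tainted: P" flag on the EIP line of the oops, which 2.4-era kernels set once a proprietary-licensed module (here NVdriver, in the module list) has been loaded. The following is a minimal sketch in Python, not part of the original report, of how that flag might be checked before filing. The function name summarize_oops and the returned keys are hypothetical, and the patterns assume the 2.4-style oops layout shown in the Description.

```python
import re


def summarize_oops(log_text):
    """Extract the BUG location and taint flag from a 2.4-style oops.

    Hypothetical helper: the patterns assume the layout shown in this
    report ("kernel BUG at <file>:<line>!" and "Tainted: <flags>").
    """
    bug = re.search(r"kernel BUG at (\S+?):(\d+)!", log_text)
    taint = re.search(r"Tainted:\s*(\S+)", log_text)
    return {
        "bug_file": bug.group(1) if bug else None,
        "bug_line": int(bug.group(2)) if bug else None,
        "taint_flags": taint.group(1) if taint else "",
        # "P" in the taint flags means a proprietary module was loaded.
        "proprietary_module_loaded": bool(taint and "P" in taint.group(1)),
    }


# Two lines copied from the oops in the Description.
sample = (
    "Aug 30 03:09:38 amanda kernel: kernel BUG at page_alloc.c:117!\n"
    "Aug 30 03:09:38 amanda kernel: EIP:    0010:[<c01316e7>]    Tainted: P\n"
)

if __name__ == "__main__":
    print(summarize_oops(sample))
```

If proprietary_module_loaded is true, a reasonable next step is to try reproducing the crash without the binary module loaded before reporting it to the distribution.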