Bug 89531

Summary:	intermittent kernel oops on PIII
Product:	[Retired] Red Hat Linux	Reporter:	Jason Vasquez <jason>
Component:	kernel	Assignee:	Arjan van de Ven <arjanv>
Status:	CLOSED WONTFIX	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	9
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2004-09-30 15:40:50 UTC	Type:	---

Description Jason Vasquez 2003-04-23 20:37:39 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4a) Gecko/20030401

Description of problem:
I am receiving a kernel Oops on what appears to be a random basis.
Sometimes the Oops has no immediate effect; other times it seems to freeze
all disk I/O. Network ports still function and the command prompt responds,
but anything that hits the disk (e.g., ls) hangs.






Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
No clear path to reproduce.

Additional info:

[root@obiwan /]# uname -a
Linux obiwan.xxxxxxxx 2.4.20-9 #1 Wed Apr 2 13:42:50 EST 2003 i686 i686 i386 GNU/Linux


Entry from /var/log/messages:
Apr 23 15:00:01 obiwan kernel: ------------[ cut here ]------------
Apr 23 15:00:01 obiwan kernel: kernel BUG at vmscan.c:545!
Apr 23 15:00:01 obiwan kernel: invalid operand: 0000
Apr 23 15:00:01 obiwan kernel: smbfs parport_pc lp parport autofs ipt_ttl ipt_unclean ipt_TCPMSS ip_nat_irc ip_nat_ftp ipt_limit ipt_state iptable_mangle ipt_LOG ipt_MASQUERADE ipt_TOS ipt_
Apr 23 15:00:01 obiwan kernel: CPU:    0
Apr 23 15:00:01 obiwan kernel: EIP:    0060:[<c013c419>]    Not tainted
Apr 23 15:00:01 obiwan kernel: EFLAGS: 00010206
Apr 23 15:00:01 obiwan kernel:
Apr 23 15:00:01 obiwan kernel: EIP is at refill_inactive_zone [kernel] 0x469 (2.4.20-9)
Apr 23 15:00:01 obiwan kernel: eax: 00080002   ebx: c16edab8   ecx: c030d680   edx: 00000000
Apr 23 15:00:01 obiwan kernel: esi: 00007ea9   edi: c16eda9c   ebp: 00007eaa   esp: c170df9c
Apr 23 15:00:01 obiwan kernel: ds: 0068   es: 0068   ss: 0068
Apr 23 15:00:01 obiwan kernel: Process kswapd (pid: 5, stackpage=c170d000)
Apr 23 15:00:01 obiwan kernel: Stack: c170dfbc c013d398 c030d834 c030d7b4 00000006 fffffffe 00000006 0000003a
Apr 23 15:00:02 obiwan kernel:        000005dc 00000001 c170c000 c030d680 c170c305 00000000 c013d0dc c030d680
Apr 23 15:00:02 obiwan kernel:        00000006 00000006 c013cff0 00000000 00000000 c010742d 00000000 00000000
Apr 23 15:00:02 obiwan kernel: Call Trace:   [<c013d398>] wakeup_memwaiters [kernel] 0xd8 (0xc170dfa0))
Apr 23 15:00:02 obiwan kernel: [<c013d0dc>] kswapd [kernel] 0xec (0xc170dfd4))
Apr 23 15:00:02 obiwan kernel: [<c013cff0>] kswapd [kernel] 0x0 (0xc170dfe4))
Apr 23 15:00:02 obiwan kernel: [<c010742d>] kernel_thread_helper [kernel] 0x5 (0xc170dff0))
Apr 23 15:00:02 obiwan kernel:
Apr 23 15:00:02 obiwan kernel:
Apr 23 15:00:02 obiwan kernel: Code: 0f 0b 21 02 b0 f3 25 c0 e9 61 fe ff ff 8b 4c 24 3c 8b 99 38
Apr 23 15:00:06 obiwan kernel:  <1>Unable to handle kernel NULL pointer dereference at virtual address 00000074
Apr 23 15:00:06 obiwan kernel:  printing eip:
Apr 23 15:00:06 obiwan kernel: c0143ddb
Apr 23 15:00:06 obiwan kernel: *pde = 00000000
Apr 23 15:00:06 obiwan kernel: Oops: 0000
Apr 23 15:00:06 obiwan kernel: smbfs parport_pc lp parport autofs ipt_ttl ipt_unclean ipt_TCPMSS ip_nat_irc ip_nat_ftp ipt_limit ipt_state iptable_mangle ipt_LOG ipt_MASQUERADE ipt_TOS ipt_
Apr 23 15:00:06 obiwan kernel: CPU:    0
Apr 23 15:00:06 obiwan kernel: EIP:    0060:[<c0143ddb>]    Not tainted
Apr 23 15:00:06 obiwan kernel: EFLAGS: 00010202
Apr 23 15:00:06 obiwan kernel:
Apr 23 15:00:06 obiwan kernel: EIP is at page_referenced [kernel] 0x21b (2.4.20-9)
Apr 23 15:00:06 obiwan kernel: eax: c1000030   ebx: 00000003   ecx: 00000000   edx: 00000001
Apr 23 15:00:06 obiwan kernel: esi: 0000001e   edi: c1356084   ebp: 00000000   esp: dffc5f84
Apr 23 15:00:06 obiwan kernel: ds: 0068   es: 0068   ss: 0068
Apr 23 15:00:06 obiwan kernel: Process kscand/Normal (pid: 7, stackpage=dffc5000)
Apr 23 15:00:06 obiwan kernel: Stack: df689780 00000000 00000001 dffc5fb4 c030d898 c030d898 c030d834 c14041a4
Apr 23 15:00:06 obiwan kernel:        00000000 c013c6de dffc4000 c0125bb0 00000001 00000000 dffc4000 c030d680
Apr 23 15:00:06 obiwan kernel:        dffc4000 c013d5f4 c030d680 00000000 00000000 c025f3fb 000009c4 c013d500
Apr 23 15:00:06 obiwan kernel: Call Trace:   [<c013c6de>] scan_active_list [kernel] 0x3e (0xdffc5fa8))
Apr 23 15:00:06 obiwan kernel: [<c0125bb0>] process_timeout [kernel] 0x0 (0xdffc5fb0))
Apr 23 15:00:06 obiwan kernel: [<c013d5f4>] kscand [kernel] 0xf4 (0xdffc5fc8))
Apr 23 15:00:06 obiwan kernel: [<c013d500>] kscand [kernel] 0x0 (0xdffc5fe0))
Apr 23 15:00:06 obiwan kernel: [<c010742d>] kernel_thread_helper [kernel] 0x5 (0xdffc5ff0))
Apr 23 15:00:06 obiwan kernel:
Apr 23 15:00:06 obiwan kernel:
Apr 23 15:00:06 obiwan kernel: Code: 8b 41 74 39 41 60 0f 43 54 24 04 45 4e 89 54 24 04 0f 89 3e
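
The two Code: lines in the log above can be decoded by hand to tie each trace to its message. What follows is a minimal Python sketch, not a disassembler. It assumes the i386 BUG() layout used by 2.4-era kernels (a ud2 opcode followed by a 16-bit line number and a 32-bit pointer to the file name), and it handles only the single "mov r32, [reg+disp8]" encoding seen in the second trace. The function names parse_code, decode_bug and decode_mov_disp8 are hypothetical helpers introduced for illustration.

```python
import struct

# 32-bit register names in x86 ModRM encoding order.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def parse_code(hexdump):
    """Turn the hex bytes that follow 'Code:' into a bytes object."""
    return bytes(int(tok, 16) for tok in hexdump.split())


def decode_bug(code):
    """Decode 'ud2; .word line; .long file' (ASSUMED i386 BUG() layout)."""
    if code[:2] != b"\x0f\x0b":
        raise ValueError("not a ud2 instruction")
    # Little-endian 16-bit line number, then 32-bit pointer to the file name.
    line, file_ptr = struct.unpack_from("<HI", code, 2)
    return line, file_ptr


def decode_mov_disp8(code, regs):
    """Decode 'mov r32, [base+disp8]' (opcode 8b, ModRM mod=01, no SIB).

    Returns (destination, base register, displacement, effective address).
    """
    if code[0] != 0x8B:
        raise ValueError("not a mov r32, r/m32 instruction")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the [reg+disp8] form without SIB is handled")
    disp = struct.unpack_from("<b", code, 2)[0]  # signed 8-bit displacement
    base = REG32[rm]
    return REG32[reg], base, disp, (regs[base] + disp) & 0xFFFFFFFF


# First trace: bytes at the faulting EIP in refill_inactive_zone.
line, file_ptr = decode_bug(parse_code("0f 0b 21 02 b0 f3 25 c0"))
print("BUG() at line %d, file name string at 0x%08x" % (line, file_ptr))

# Second trace: first instruction at the faulting EIP, with ecx from the dump.
dst, base, disp, addr = decode_mov_disp8(
    parse_code("8b 41 74 39 41 60"), {"ecx": 0x00000000}
)
print("mov %s, [%s+0x%x] -> reads address 0x%08x" % (dst, base, disp, addr))
```

Under those assumptions the first dump yields line 545, which agrees with "kernel BUG at vmscan.c:545", and the second yields a read through ecx at offset 0x74. With ecx: 00000000 in the register dump, that gives the reported fault address 00000074.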

Additional hardware info:
[root@obiwan /]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 7
model name      : Pentium III (Katmai)
stepping        : 3
cpu MHz         : 451.033
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 901.12

[root@obiwan /]# cat /proc/pci
PCI devices found:
  Bus  0, device   0, function  0:
    Host bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 3).
      Master Capable.  Latency=32.
      Prefetchable 32 bit memory at 0xe0000000 [0xe3ffffff].
  Bus  0, device   1, function  0:
    PCI bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 3).
      Master Capable.  Latency=64.  Min Gnt=136.
  Bus  0, device   7, function  0:
    ISA bridge: Intel Corp. 82371AB/EB/MB PIIX4 ISA (rev 2).
  Bus  0, device   7, function  1:
    IDE interface: Intel Corp. 82371AB/EB/MB PIIX4 IDE (rev 1).
      Master Capable.  Latency=32.
      I/O at 0xf000 [0xf00f].
  Bus  0, device   7, function  2:
    USB Controller: Intel Corp. 82371AB/EB/MB PIIX4 USB (rev 1).
      IRQ 10.
      Master Capable.  Latency=32.
      I/O at 0xa000 [0xa01f].
  Bus  0, device   7, function  3:
    Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ACPI (rev 2).
      IRQ 9.
  Bus  0, device  11, function  0:
    Ethernet controller: Lite-On Communications Inc LNE100TX (rev 32).
      IRQ 11.
      Master Capable.  Latency=32.
      I/O at 0xa400 [0xa4ff].
      Non-prefetchable 32 bit memory at 0xeb000000 [0xeb0000ff].
  Bus  0, device  13, function  0:
    Multimedia video controller: Creative Labs SB Live! EMU10k1 (rev 5).
      Master Capable.  Latency=255.  Min Gnt=2.Max Lat=20.
      I/O at 0xa800 [0xa81f].
  Bus  0, device  13, function  1:
    Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 5).
      Master Capable.  Latency=32.
      I/O at 0xac00 [0xac07].
  Bus  0, device  17, function  0:
    Ethernet controller: National Semiconductor Corporation DP83815 (MacPhyter) Ethernet Controller (rev 0).
      IRQ 10.
      Master Capable.  Latency=32.  Min Gnt=11.Max Lat=52.
      I/O at 0xb000 [0xb0ff].
      Non-prefetchable 32 bit memory at 0xeb001000 [0xeb001fff].
  Bus  0, device  19, function  0:
    Unknown mass storage controller: Triones Technologies, Inc. HPT366/368/370/370A/372 (rev 1).
      IRQ 11.
      Master Capable.  Latency=120.  Min Gnt=8.Max Lat=8.
      I/O at 0xb400 [0xb407].
      I/O at 0xb800 [0xb803].
      I/O at 0xbc00 [0xbcff].
  Bus  0, device  19, function  1:
    Unknown mass storage controller: Triones Technologies, Inc. HPT366/368/370/370A/372 (#2) (rev 1).
      IRQ 11.
      Master Capable.  Latency=120.  Min Gnt=8.Max Lat=8.
      I/O at 0xc000 [0xc007].
      I/O at 0xc400 [0xc403].
      I/O at 0xc800 [0xc8ff].
  Bus  1, device   0, function  0:
    VGA compatible controller: nVidia Corporation NV5 [Riva TnT2] (rev 17).
      IRQ 5.
      Master Capable.  Latency=32.  Min Gnt=5.Max Lat=1.
      Non-prefetchable 32 bit memory at 0xe4000000 [0xe4ffffff].
      Prefetchable 32 bit memory at 0xe6000000 [0xe7ffffff].

Comment 1 Arjan van de Ven 2003-04-23 20:40:44 UTC

Hmm, at first sight this sounds like memory corruption or bad memory. Is
there a way you can run memtest86 to test the RAM?

Comment 2 Jason Vasquez 2003-04-23 20:46:13 UTC

That was my first thought as well. I let memtest86 run through 2 complete passes
with 0 errors. I can run it again for a longer period if it seems worthwhile.

Comment 3 Jason Vasquez 2003-04-24 11:10:10 UTC

I just let memtest86 run for 8 hours with 0 errors.  Any other ideas?  Please
let me know if any other info would be helpful.

Thanks,
Jason

Comment 4 Jason Vasquez 2003-05-13 16:22:13 UTC

FYI, I have updated to kernel 2.4.21-rc1-ac1 (non-RPM).  The machine has been up
6 days now with zero kernel oopses.  (Previously I had been seeing them within
24-36 hours of each other.)  I will continue to monitor and update this report,
but things seem to be better at this point...

Comment 5 Alan Cox 2003-06-05 16:20:27 UTC

Thanks, that's good info - it implies that whatever was causing the corruption
was a kernel bug, and better yet, it's fixed 8)

Comment 6 Jay Denebeim 2003-07-13 19:24:40 UTC

For whatever it's worth, I have a Duron-based machine that is regularly hitting
a similar trap.  It's running an up-to-date Red Hat 9 system, with exim and inn
added.  Here's the trap I'm getting:

> Unable to handle kernel NULL pointer dereference at virtual address 00000074
>  printing eip:
> c01418c0
> *pde = 00000000
> Oops: 0000
> parport_pc lp parport nfsd lockd sunrpc autofs ppp_deflate zlib_deflate ppp_async ppp_generic slhc 8139too mii iptable_mangle ipt_MASQUERADE ipt_REDIRECT ipta
> CPU:    0
> EIP:    0060:[<c01418c0>]    Not tainted
> EFLAGS: 00010202
>
> EIP is at page_referenced [kernel] 0x210 (2.4.20-18.9)
> eax: c1000030   ebx: 00000200   ecx: 00000000   edx: 00000001
> esi: 00000000   edi: c1b31b00   ebp: 0000000e   esp: cf771f84
> ds: 0068   es: 0068   ss: 0068
> Process kscand/Normal (pid: 7, stackpage=cf771000)
> Stack: cf393bc0 00000000 00000002 cf771fb4 c135b938 c135b938 c030900c c135b98c
>        00000003 c013a644 cf770000 c0125460 00000001 00000003 cf770000 c0308f00
>        cf770000 c013b484 c0308f00 00000003 00000001 c025ab4c 000009c4 c013b3d0
> Call Trace:   [<c013a644>] scan_active_list [kernel] 0x34 (0xcf771fa8))
> [<c0125460>] process_timeout [kernel] 0x0 (0xcf771fb0))
> [<c013b484>] kscand [kernel] 0xb4 (0xcf771fc8))
> [<c013b3d0>] kscand [kernel] 0x0 (0xcf771fe0))
> [<c010727d>] kernel_thread_helper [kernel] 0x5 (0xcf771ff0))
>
>
> Code: 8b 41 74 39 41 60 0f 43 54 24 04 45 4e 89 54 24 04 0f 89 3e


Feel free to contact me if I can be of assistance.  I'm building a standard
2.4.21 kernel that I'll install as soon as I get the RPM built to see if I
continue to get the problem.

It started the last time I upgraded the kernel, to kernel-2.4.20-18.9 I believe.
It happened once, then waited a week to happen again. The frequency has increased
since then, and now it's happening several times a day.

Jay
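
To check whether the page_referenced trace from the original report and the one quoted in this comment look like the same failure, the key fields can be extracted and compared. Below is a minimal Python sketch. The oops_signature helper and its choice of fields (function at EIP, fault address, first six code bytes) are assumptions made for illustration, and the two sample strings are short excerpts of the traces in this report. A matching signature suggests a shared cause but does not prove one.

```python
import re


def oops_signature(text):
    """Return (function at EIP, fault address, first six code bytes)."""
    # Drop '> ' quoting and syslog prefixes, then undo hard line wrapping.
    lines = [re.sub(r"^(> ?|.*? kernel: ?)", "", ln) for ln in text.splitlines()]
    flat = " ".join(ln.strip() for ln in lines)
    func = re.search(r"EIP is at (\w+)", flat)
    addr = re.search(r"virtual address ([0-9a-f]{8})", flat)
    code = re.search(r"Code: ((?:[0-9a-f]{2} ?)+)", flat)
    return (
        func.group(1) if func else None,
        addr.group(1) if addr else None,
        tuple(code.group(1).split()[:6]) if code else None,
    )


# Excerpt of the second trace in the original description (kernel 2.4.20-9).
report_a = """\
Apr 23 15:00:06 obiwan kernel:  <1>Unable to handle kernel NULL pointer
dereference at virtual address 00000074
Apr 23 15:00:06 obiwan kernel: EIP is at page_referenced [kernel] 0x21b (2.4.20-9)
Apr 23 15:00:06 obiwan kernel: Code: 8b 41 74 39 41 60 0f 43 54 24 04 45 4e 89
54 24 04 0f 89 3e
"""

# Excerpt of the trace quoted in comment 6 (kernel 2.4.20-18.9).
report_b = """\
> Unable to handle kernel NULL pointer dereference at virtual address
> 00000074
> EIP is at page_referenced [kernel] 0x210 (2.4.20-18.9)
> Code: 8b 41 74 39 41 60 0f 43 54 24 04 45 4e 89 54 24 04 0f 89 3e
"""

sig_a = oops_signature(report_a)
sig_b = oops_signature(report_b)
print(sig_a)
print("same signature:", sig_a == sig_b)
```

The offsets within page_referenced differ (0x21b versus 0x210), which is expected across two kernel builds, so the sketch compares the function name rather than the raw EIP.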

Comment 7 Bugzilla owner 2004-09-30 15:40:50 UTC

Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/