Bug 111718

Summary:	kernel oops after bad swap file entry
Product:	[Retired] Red Hat Linux
Version:	7.3
Component:	kernel
Hardware:	i686
OS:	Linux
Status:	CLOSED ERRATA
Severity:	medium
Priority:	medium
Reporter:	Need Real Name <tim>
Assignee:	Arjan van de Ven <arjanv>
QA Contact:	Brian Brock <bbrock>
CC:	riel
Doc Type:	Bug Fix
Last Closed:	2004-01-05 04:30:13 UTC

Description Need Real Name 2003-12-09 10:12:13 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031114

Description of problem:
Our P4-based file server, which had never panicked, oopsed, or crashed
in a year and a half of use, had a kernel oops a few days ago. Here is
the relevant output from dmesg:

swap_free: Bad swap file entry 18f56044
Unable to handle kernel NULL pointer dereference at virtual address 00000070
 printing eip:
c013ade2
*pde = 00000000
Oops: 0000
smbfs radeon agpgart binfmt_misc nfs nfsd lockd sunrpc autofs e1000 ide-scsi ide-cd cdrom ehci-hcd usb-ohci usb-uhci usbcore ext3 jbd raid5 xor ncr53c8xx sd_m
CPU:    0
EIP:    0010:[<c013ade2>]    Not tainted
EFLAGS: 00010202
 
EIP is at page_referenced [kernel] 0x242 (2.4.20-24.7)
eax: c1d84ec8   ebx: c1000030   ecx: 00000000   edx: 00000000
esi: f6535980   edi: 0000001d   ebp: c1575b00   esp: c34b1f7c
ds: 0018   es: 0018   ss: 0018
Process kscand (pid: 6, stackpage=c34b1000)
Stack: 00000001 00000000 00000000 c34b1fac c1575b1c c1575b00 00000003 000001f4
       c0133a9d c34b1fac c15edd0c c02df334 00000001 c34b0000 c02df1e8 00000003
       000001f4 c0135889 c02df1e8 00000003 00000001 c34b0000 00000001 00000000
Call Trace:   [<c0133a9d>] scan_active_list [kernel] 0x5d (0xc34b1f9c))
[<c0135889>] kscand [kernel] 0xc9 (0xc34b1fc0))
[<c0105000>] stext [kernel] 0x0 (0xc34b1fe8))
[<c0107146>] arch_kernel_thread [kernel] 0x26 (0xc34b1ff0))
[<c01357c0>] kscand [kernel] 0x0 (0xc34b1ff8))
 
 
Code: 8b 41 70 42 39 41 5c 0f 43 54 24 04 ff 04 24 4f 89 54 24 04
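The two key values in this dump can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of the original report, and it rests on assumptions: it assumes the stock i386 2.4 non-PAE swap-entry layout (bit 0 = present, bits 1-6 = swap type as in SWP_TYPE(), bits 8 and up = page offset as in SWP_OFFSET()) and the 2.4 default MAX_SWAPFILES of 32; the Red Hat 2.4.20-24.7 kernel may encode entries differently. The helper names swp_type, swp_offset, and fault_address are hypothetical. It also assumes the 2.4 "Code:" line starts at EIP, so the first bytes 8b 41 70 are the faulting instruction, mov 0x70(%ecx),%eax.

```python
# Sketch only: ASSUMES the stock i386 2.4 non-PAE swap-entry layout and
# MAX_SWAPFILES = 32; the Red Hat 2.4.20-24.7 kernel may differ.

PAGE_SIZE_KB = 4      # i386 page size in KB
MAX_SWAPFILES = 32    # assumed 2.4 default limit on swap areas


def swp_type(entry: int) -> int:
    """Hypothetical mirror of SWP_TYPE(): bits 1..6 of the entry."""
    return (entry >> 1) & 0x3F


def swp_offset(entry: int) -> int:
    """Hypothetical mirror of SWP_OFFSET(): page index within the swap area."""
    return entry >> 8


def fault_address(base_reg: int, disp: int) -> int:
    """Effective address of mov disp(%reg),%eax on 32-bit x86."""
    return (base_reg + disp) & 0xFFFFFFFF


entry = 0x18F56044                      # from "Bad swap file entry 18f56044"
swap_pages = 2096480 // PAGE_SIZE_KB    # "Swap: total" from free, in pages

print(f"type={swp_type(entry)} (1 swap area configured, max {MAX_SWAPFILES})")
print(f"offset={swp_offset(entry)} pages (swap area holds {swap_pages} pages)")
# 8b 41 70 decodes as mov 0x70(%ecx),%eax, and ecx was 00000000
print(f"fault address=0x{fault_address(0x00000000, 0x70):08x}")
```

Under those assumptions the entry decodes to swap type 34 and page offset 1635680, both far outside the single ~2 GB swap area (524120 pages), which would point to a corrupted page-table entry rather than a legitimately swapped-out page. The computed fault address, 0x00000070, matches the "virtual address 00000070" in the oops header.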
  
The swap_free error occurred at 7:17:59 AM and the oops at 7:18:02 AM
last Saturday morning. The only significant thing I think the machine
was doing at the time was rsync'ing a wad of CCD data (a few hundred MB
at most, over a 100 Mbit link) from the machines on the mountain. The
machine did not crash and seems to have been working fine since the
oops. xinetd died once since then and had to be restarted, but I didn't
notice anything else until I happened to run dmesg tonight.

I found a few other Bugzilla reports of kscand problems with earlier
errata kernels, but none seemed to match what I see here.


Version-Release number of selected component (if applicable):
kernel-2.4.20-24.7.i686

How reproducible:
Didn't try

Steps to Reproduce:
I haven't tried to reproduce it since this is a production server. The
problem has never been seen before on this hardware.

Actual Results:  The machine kept working mostly normally, though
xinetd crashed some time after the oops.

Expected Results:  Usually a kernel oops messes things up badly, so I'm
probably lucky.

Additional info:

Output from free (values in KB):
             total       used       free     shared    buffers     cached
Mem:       1031184     942960      88224          0      98516     615464
-/+ buffers/cache:     228980     802204
Swap:      2096480      83864    2012616

/proc/cpuinfo:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.00GHz
stepping        : 4
cpu MHz         : 2018.022
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 4023.91

df:
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/md0             459327632 150264584 285730492  35% /
/dev/hda1               505605     13304    466197   3% /boot
none                    515592         0    515592   0% /dev/shm

/proc/interrupts:
          CPU0
  0:   46022934          XT-PIC  timer
  1:          6          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  4:    2675596          XT-PIC  serial
  5:          0          XT-PIC  usb-uhci
  6:         30          XT-PIC  ncr53c8xx
  8:          1          XT-PIC  rtc
  9:   10457017          XT-PIC  ide2, ide3, ide4, ide5, usb-uhci, usb-ohci, usb-ohci, ehci-hcd
 10:   10998648          XT-PIC  eth0
 12:         32          XT-PIC  PS/2 Mouse
 14:    2863036          XT-PIC  ide0
 15:       6319          XT-PIC  ide1
NMI:          0
ERR:     169977

uptime:
  3:07am  up 5 days,  7:51,  5 users,  load average: 0.17, 0.25, 0.16

Comment 1 Dave Jones 2004-01-05 04:30:13 UTC

Various VM stability fixes went into the -27.7 errata kernel. Try that.

RHL 7 and 8 are now EOL, so they won't receive further updates.