Bug 14864

Summary:	OOPS - Unable to handle kernel paging request
Product:	[Retired] Red Hat Linux	Reporter:	Richard Ames <richard>
Component:	kernel	Assignee:	Alan Cox <alan>
Status:	CLOSED WORKSFORME	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	6.2	CC:	fpotter, trev
Target Milestone:	---
Target Release:	---
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2000-10-30 20:28:39 UTC	Type:	---

Description Richard Ames 2000-07-31 00:25:33 UTC

[root@server1 log]# uname -a
Linux server1 2.2.16-3 #1 Mon Jun 19 19:11:44 EDT 2000 i686 unknown

Appears to have occurred just after the cron.daily jobs:
Jul 29 04:02:00 server1 anacron[1297]: Updated timestamp for job `cron.daily' to 2000-07-29
...
Jul 29 04:02:33 server1 kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
Jul 29 04:02:33 server1 kernel: *pde = 00000000
Jul 29 04:02:33 server1 kernel: Oops: 0000
Jul 29 04:02:33 server1 kernel: CPU:    0
Jul 29 04:02:33 server1 kernel: EIP:    0010:[find_vma+62/100]
Jul 29 04:02:33 server1 kernel: EFLAGS: 00010206
Jul 29 04:02:33 server1 kernel: eax: 03ba08c0   ebx: c28583b8   ecx: c0219588   edx: 00000000
Jul 29 04:02:33 server1 kernel: esi: c0219588   edi: c2858000   ebp: 00000030   esp: c3ef7f8c
Jul 29 04:02:33 server1 kernel: ds: 0018   es: 0018   ss: 0018
Jul 29 04:02:33 server1 kernel: Process kswapd (pid: 5, process nr: 5, stackpage=c3ef7000)
Jul 29 04:02:33 server1 kernel: Stack: c28583b8 c0219588 c2858000 00000000 00000003 00000001 c012222b c2858000
Jul 29 04:02:33 server1 kernel:        00000030 0000001f 00000006 00000030 00000e00 c3ed8be0 c01222a2 00000006
Jul 29 04:02:33 server1 kernel:        00000030 c3ef6000 c01d8fce c3ef61c1 c0122337 00000030 00000f00 c3febfc0
Jul 29 04:02:33 server1 kernel: Call Trace: [swap_out+171/204] [do_try_to_free_pages+86/120] [tvecs+7374/13920] [kswapd+115/180] [get_options+0/116] [kernel_thread+35/48]
Jul 29 04:02:33 server1 kernel: Code: 39 48 08 76 0d 89 c2 39 4a 04 76 0e 8b 42 18 eb eb 90 8b 40
Jul 29 04:02:33 server1 kernel: Unable to handle kernel paging request at virtual address 03ba08c8
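
As a rough cross-check of the trace above, the faulting address can be compared with the register dump. The sketch below is illustrative only: the helper names (parse_registers, fault_address, near_registers) and the 64-byte window are assumptions made for this example, not part of any kernel tool. It reports which registers sit just below the faulting address, which may point at the base pointer that was being dereferenced.

```python
import re

# Register dump and fault line copied from the kswapd oops above.
OOPS = """\
eax: 03ba08c0   ebx: c28583b8   ecx: c0219588   edx: 00000000
esi: c0219588   edi: c2858000   ebp: 00000030   esp: c3ef7f8c
Unable to handle kernel paging request at virtual address 03ba08c8
"""

REGISTER_RE = re.compile(r"\b(e(?:ax|bx|cx|dx|si|di|bp|sp)):\s+([0-9a-f]{8})")
FAULT_RE = re.compile(r"virtual address ([0-9a-f]{8})")


def parse_registers(text):
    """Return {register name: value} for the i386 general registers."""
    return {name: int(value, 16) for name, value in REGISTER_RE.findall(text)}


def fault_address(text):
    """Return the faulting virtual address reported in the oops."""
    match = FAULT_RE.search(text)
    if match is None:
        raise ValueError("no fault address found")
    return int(match.group(1), 16)


def near_registers(text, max_offset=64):
    """Return {register: offset} for registers within max_offset bytes
    below (or equal to) the faulting address. max_offset is an arbitrary
    window chosen for this sketch."""
    addr = fault_address(text)
    return {
        name: addr - value
        for name, value in parse_registers(text).items()
        if 0 <= addr - value <= max_offset
    }


print(near_registers(OOPS))
```

For this oops only eax qualifies, at offset 8. That matches the first Code bytes, 39 48 08, which on i386 encode cmp %ecx,0x8(%eax): the kernel faulted reading through a pointer value (03ba08c0) that is not a valid kernel address.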

Comment 1 Fridiric Potter 2000-09-03 14:22:25 UTC

I encountered the same bug under the same conditions.

During cron.daily (the 'locate' script), my kernel reported the error below.

The system then started to misbehave (processes freezing, inetd malfunctioning,
etc.). I had to reboot the system to recover normal operation.

The system had about 15 days of uptime before it occurred.

This kernel supports an Intelligent Network Peripheral in a telco network in
France. This 'strange', 'step-by-step' kernel failure resulted in a
function outage on that network. Any news and feedback appreciated.

Thanks for your help


     Fred

Sep  3 04:02:14 ss7orleans kernel: Unable to handle kernel paging request at virtual address c924ddd0
Sep  3 04:02:14 ss7orleans kernel: current->tss.cr3 = 0457d000, %cr3 = 0457d000
Sep  3 04:02:14 ss7orleans kernel: *pde = 00000000
Sep  3 04:02:14 ss7orleans kernel: Oops: 0002
Sep  3 04:02:14 ss7orleans kernel: CPU:    0
Sep  3 04:02:14 ss7orleans kernel: EIP:    0010:[prune_dcache+126/300]
Sep  3 04:02:14 ss7orleans kernel: EFLAGS: 00010286
Sep  3 04:02:14 ss7orleans kernel: eax: c124ddd0   ebx: c3ceec80   ecx: c924ddd0   edx: c3ceec98
Sep  3 04:02:14 ss7orleans kernel: esi: c3ceec60   edi: c124ddc0   ebp: 00000873   esp: c4fc7e8c
Sep  3 04:02:14 ss7orleans kernel: ds: 0018   es: 0018   ss: 0018
Sep  3 04:02:14 ss7orleans kernel: Process slocate (pid: 25956, process nr: 106, stackpage=c4fc7000)
Sep  3 04:02:14 ss7orleans kernel: Stack: c0254a04 00001006 c4fc7ecc 00000000 00001006 c0135308 fffff86d 00001006
Sep  3 04:02:14 ss7orleans kernel:        00000000 c0283720 c0254a04 c0283720 c43cdba0 c7097210 c709725c 00000000
Sep  3 04:02:14 ss7orleans kernel:        c4fc7ecc c4fc7ecc c0135396 00001006 00000000 c0283720 c0254a04 c0283720
Sep  3 04:02:14 ss7orleans kernel: Call Trace: [try_to_free_inodes+316/396] [grow_inodes+30/440] [get_new_inode+182/300] [get_new_inode+197/300] [iget+112/120] [ext2_lookup+84/124] [real_lookup+80/160]
Sep  3 04:02:14 ss7orleans kernel:        [lookup_dentry+296/488] [__namei+40/88] [sys_newlstat+42/140] [system_call+52/56] [startup_32+43/164]
Sep  3 04:02:14 ss7orleans kernel: Code: 89 01 89 56 38 89 56 3c 83 7f 1c 01 0f 94 c0 31 c9 88 c1 89

Comment 2 Alan Cox 2000-09-16 21:53:32 UTC

Can you run memtest86 on the hardware and check that the RAM passes? A lot of
these problems seem to be hardware (bad RAM, or an underpowered PSU combined
with very high disk load), but not all...
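
One quick way to look for a pattern consistent with a hardware fault is to compare register values in an oops for single-bit differences. The sketch below uses the eax and ecx values from the slocate oops in Comment 1. The helper name differing_bits is hypothetical, and a one-bit difference is only a hint: it does not prove bad RAM and does not replace a memtest86 run.

```python
def differing_bits(a, b, width=32):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = (a ^ b) & ((1 << width) - 1)
    return [bit for bit in range(width) if (diff >> bit) & 1]


# Values copied from the slocate oops in Comment 1.
eax = 0xC124DDD0
ecx = 0xC924DDD0  # also the faulting virtual address in that oops

print(differing_bits(eax, ecx))
```

Here the two values differ only in bit 27, and ecx is the address the kernel failed to write to (the Code bytes 89 01 encode mov %eax,(%ecx) on i386).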

Comment 3 Richard Ames 2000-09-16 22:35:40 UTC

We ran memtest86 on this box for a full weekend (2xx passes) in the middle of
August.  No faults.

As of last Wednesday the problem was still occurring, at intervals of about
2 days to a week.

The start of the issue was after the machine was powered down by pulling the 
plug from the wall  :-) so we suspect some kind of hardware fault or SW / disk 
corruption.

As it always says 'kernel paging', we decided to change the swap file. We had
a 600+ megabyte partition as swap (too big, a config mistake); a bad block scan
said the partition was OK.  Anyway, we now have a 16 MB file as swap, and
we are waiting for developments.

We are willing to do your bidding to help correct this.

Comment 4 Fridiric Potter 2000-09-22 06:40:25 UTC

+Can you run memtest86 on the hardware and check the ram passes. A lot of these
+problems seem to be hardware (bad ram or psu underpowered and very high disk
+load) - but not all...

I agree with you. Some of those problems seem to be hardware.

In fact, the computer ran kernel 2.0.36 for about a year without any problems
before being updated to 2.2.16.
After 15 days on 2.2.16 it crashed as previously described. I don't
suspect it is a hardware issue.

fred

Comment 5 Richard Ames 2000-10-11 06:36:55 UTC

Since changing the swap file on 16 September (see above), this crash has not
happened again, which may lead one to suspect swap file corruption?

Richard.

Comment 6 Trevor Peirce 2000-10-30 20:27:30 UTC

I'm experiencing a similar problem, crashing every 2 to 10 days, and the names slocate and kswapd are very familiar in these Oops problems. I've
commented on Bug #2005 about the problem.

I have repartitioned and reformatted the hard drive in an attempt to clean up the problem, but haven't had any luck.  My swap partition is 128 MB
and usually about 2 MB of that is in use at any given time.  The system is a Pentium 150 with 48 MB of RAM installed. I read somewhere that one
user fixed this exact problem by recompiling the kernel with CONFIG_APM disabled. Could that have anything to do with it?

Right now I've got the kernel that ships with RH 6.2.

Comment 7 Trevor Peirce 2000-10-30 20:28:37 UTC

Woops, Bug #20005

Comment 8 Richard Ames 2001-08-07 20:31:13 UTC

System has now been running for about 9 months without a reappearance of this
problem.