Bug 8660

Summary:	2.2.14 (and 2.2.12-20) -> "Unable to handle paging request" and corruptions
Product:	[Retired] Red Hat Linux	Reporter:	Ian Jones <ian.jones>
Component:	kernel	Assignee:	Michael K. Johnson <johnsonm>
Status:	CLOSED WORKSFORME	QA Contact:
Severity:	high	Docs Contact:
Priority:	medium
Version:	6.1	CC:	army, harris, mashman, philw, wilburn, zack
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2000-08-08 19:07:16 UTC	Type:	---

Description Ian Jones 2000-01-20 18:04:25 UTC

I had these problems with the out-of-the-box 6.1 kernel, so I downloaded the
2.2.14 source and rebuilt, as the indications were that it would be more
stable.  Not so.  The problems usually occur when the system is busy
copying a large file.  The machine is 300 MHz with 256 MB of RAM and 3.5 GB
and 10 GB drives.  The following messages have accompanied the failures
(timestamps removed to make them a little easier to read):

localhost kernel: ll_rw_block: device 16:09: only 1024-char blocks implemented (21504)
localhost kernel: Unable to handle kernel paging request at virtual address 00001000
localhost kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
localhost kernel: *pde = 00000000
localhost kernel: Oops: 0000
localhost kernel: CPU:    0
localhost kernel: EIP:    0010:[__wake_up+25/72]
localhost kernel: EFLAGS: 00010213
localhost kernel: eax: c9373270   ebx: 00001000   ecx: c9373240   edx: 00000003
localhost kernel: esi: c937326c   edi: 00000003   ebp: cffe7f88   esp: cffe7f84
localhost kernel: ds: 0018   es: 0018   ss: 0018
localhost kernel: Process kflushd (pid: 2, process nr: 2, stackpage=cffe7000)
localhost kernel: Stack: 00000001 00000000 c0125ce5 00000000 c0172422 c9373240 00000000 c80c5f60
localhost kernel:        00004511 00000002 00000199 00000016 c0126fba 00000001 00000001 cffe7fec
localhost kernel:        00000f00 cfffbfcc c0106000 00000000 c0106000 c9373240 000046aa 00000008
localhost kernel: Call Trace: [end_buffer_io_sync+41/44] [ll_rw_block+418/432] [bdflush+350/584] [get_options+0/116] [get_options+0/116] [kernel_thread+35/48]
kernel: Code: 8b 13 8b 5b 04 8b 02 85 c7 74 f1 39 f3 74 0c 89 d0 e8 41 f9

localhost kernel: Unable to handle kernel paging request at virtual address 00005034
localhost kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
localhost kernel: *pde = 00000000
localhost kernel: Oops: 0002
localhost kernel: CPU:    0
localhost kernel: EIP:    0010:[remove_from_queues+169/328]
localhost kernel: EFLAGS: 00010206
localhost kernel: eax: 00005000   ebx: 00000000   ecx: c9373240   edx: cfe05664
localhost kernel: esi: c9373240   edi: 00000002   ebp: 00000008   esp: cffe5fac
localhost kernel: ds: 0018   es: 0018   ss: 0018
localhost kernel: Process kupdate (pid: 3, process nr: 3, stackpage=cffe5000)
localhost kernel: Stack: 00000002 c0125eb9 c9373240 c80c5f60 00007679 c0126c51 c0126cd1 c9373240
localhost kernel:        cffe4000 c01b0a6b cffe41c2 00000000 c9373240 c0127114 00000f00 cfffbfc0
localhost kernel:        c0106000 c0107aab 00000000 00000f00 c01e3fd8
localhost kernel: Call Trace: [refile_buffer+77/184] [sync_old_buffers+21/400] [sync_old_buffers+149/400] [tvecs+11563/13280] [kupdate+112/116] [get_options+0/116] [kernel_thread+35/48]
localhost kernel: Code: 89 50 34 c7 01 00 00 00 00 89 02 c7 41 34 00 00 00 00 ff 0d
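
A note on reading the "Oops:" lines in these traces: on i386 the number is the page-fault error code, so 0000 and 0002 differ only in whether the faulting access was a read or a write. The following is a minimal sketch of that decoding, not a kernel tool. The function name decode_oops_code is made up for illustration, and only the three low bits are interpreted.

```python
def decode_oops_code(code_hex):
    """Decode the low three bits of an i386 page-fault error code.

    bit 0: 0 = page not present, 1 = protection violation
    bit 1: 0 = read access,      1 = write access
    bit 2: 0 = kernel mode,      1 = user mode
    """
    code = int(code_hex, 16)
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


if __name__ == "__main__":
    # The two codes that appear in the traces in this report.
    for value in ("0000", "0002"):
        print(value, decode_oops_code(value))
```

Read this way, both faults happened in kernel mode on pages that were not mapped; the first was a read and the second a write.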

Though the system comes back to life, I am suffering badly from corrupt
blocks in Oracle datafiles.  I assume these are related, but as I have
also had the crashes while Oracle was shut down, I think (for once) Oracle is
the innocent party in the subsequent corruption.  Any ideas?

Thanks.

Comment 1 zack 2000-01-26 17:34:59 UTC

I have had the same problem under both Red Hat 5.1 and 6.1 (2.2.12-20) kernels,
on two different machines.  The processes under which this has occurred are
killall and w (during the 4 AM update on 5.1), and tar and rpm on 6.1.
A sample of the messages is below for anyone who can read kernel oops output.
I have experienced severe filesystem corruption after these errors.


myhost kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
myhost kernel: current->tss.cr3 = 06452000, %cr3 = 06452000
myhost kernel: *pde = 00000000
myhost kernel: Oops: 0000
myhost kernel: CPU:    0
myhost kernel: EIP:    0010:[kmem_cache_alloc+49/292]
myhost kernel: EFLAGS: 00010006
myhost kernel: eax: c6831fc0   ebx: c6831fc0   ecx: 00000000   edx: 27d80000
myhost kernel: esi: 00000c00   edi: c7fff740   ebp: 00000282   esp: c20a9e10
myhost kernel: ds: 0018   es: 0018   ss: 0018
myhost kernel: Process tar (pid: 2970, process nr: 62, stackpage=c20a9000)
myhost kernel: Stack: 00000001 00000400 c012727d c7fff740 00000003 00000000 00000001 c012730a
myhost kernel:        00000001 00000400 00000308 00000330 c31bcaa0 c20a9e50 c20a9e50 c20a8000
myhost kernel:        c20a8000 00000000 c0127977 c7b05000 00000400 00000001 00000000 c20a9ef4
myhost kernel: Call Trace: [get_unused_buffer_head+85/160] [create_buffers+66/408] [brw_page+131/880] [__brelse+19/84] [ext2_bmap+360/584] [generic_readpage+129/144] [try_to_read_ahead+254/276]
myhost kernel:        [do_generic_file_read+750/1500] [generic_file_read+99/124] [file_read_actor+0/80] [sys_read+174/196] [system_call+52/56]
myhost kernel: Code: 8b 01 89 03 85 c0 74 2b 8b 73 04 85 f6 75 10 89 19 89 c8 2b

myhost kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000002b
myhost kernel: current->tss.cr3 = 05b4e000, %cr3 = 05b4e000
myhost kernel: *pde = 00000000
myhost kernel: Oops: 0000
myhost kernel: CPU:    0
myhost kernel: EIP:    0010:[try_to_read_ahead+111/276]
myhost kernel: EFLAGS: 00010202
myhost kernel: eax: c7fcfff0   ebx: 00191000   ecx: 00000000   edx: 00000023
myhost kernel: esi: c27be550   edi: c240b000   ebp: 001a7000   esp: c341ff18
myhost kernel: ds: 0018   es: 0018   ss: 0018
myhost kernel: Process rpm (pid: 2806, process nr: 19, stackpage=c341f000)
myhost kernel: Stack: c02ab708 001a7000 c7fcfff0 c011d67a c4b39a40 001a7000 00000000 00000400
myhost kernel:        bffff17c 00000000 00000400 c7fcff1c 00016000 00172000 00020000 0001f000
myhost kernel:        00000000 00000002 00000001 00172000 c27be550 c011da1b c4b39a40 c4b39a54
myhost kernel: Call Trace: [do_generic_file_read+750/1500] [generic_file_read+99/124] [file_read_actor+0/80] [sys_read+174/196] [system_call+52/56] [startup_32+43/286]
myhost kernel: Code: 39 72 08 75 f4 39 6a 0c 75 ef ff 42 14 b8 02 00 00 00 0f ab
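
For anyone comparing traces like these, the bracketed entries in the EIP and Call Trace lines have the form [symbol+offset/size]. Below is a rough sketch of pulling those entries out of a log so traces from different machines can be lined up. It assumes the offsets and sizes are decimal, as they appear in the logs here; the helper name parse_trace is hypothetical.

```python
import re

# Matches entries such as [__wake_up+25/72]: symbol name, offset, function size.
ENTRY = re.compile(r"\[([A-Za-z_]\w*)\+(\d+)/(\d+)\]")


def parse_trace(text):
    """Return a list of (symbol, offset, size) tuples found in the text."""
    return [
        (match.group(1), int(match.group(2)), int(match.group(3)))
        for match in ENTRY.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "myhost kernel: Call Trace: [do_generic_file_read+750/1500] "
        "[generic_file_read+99/124] [sys_read+174/196]"
    )
    for symbol, offset, size in parse_trace(sample):
        print(f"{symbol}: {offset} bytes into a {size}-byte function")
```

Entries printed as raw addresses, such as [<c0120c2e>], do not match this pattern and are skipped.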

Comment 2 army 2000-05-25 23:53:59 UTC

I have had the same problems. I think RAID 1 has saved me from the corruption,
though (long rebuild on reboot)...
The system is RH 6.1 with software RAID 1 (2 x 20 GB HDD), 96 MB RAM, 2 x 128 MB swap
(swap and /boot are not mirrored; md0 and md1 are both 9.8 GB), and a PPro 200.
Here's my log:

code: 8b 01 89 03 85 c0 74 2b 8b 73 04 85 f6 75 10 89 19 89 c8 2b
Unable to handle kernel paging request at virtual address 002203df
current->tss.cr3 = 00101000, %cr3=00101000
*pde = 00000000
cpu: 0
eip: 0010 [<c012ea1>]
eflags: 00010006
eax: c1b53fe0 ebx: c1b53fe0 ecx: 002203df edx: c02a18c0
esi: c1838000 edi: c5fff020 ebp: 00000282 esp: c0225e30
process swapper (pid:0, process nr:0, stackpage: c0225000)
stack: 00000008 000b2000 c0120c2e c5fff020 00000008 c5fff2c8 c5fff2c0 00000008
       00000286 c1838000 00000246 00000003 00000008 00000000 00000000 c01211f7
       c5fff2c0 00000008 c3eca900 00000620 00000008 c56490e0 c014d379 00000624
call trace [<c0120c2e>] [<c01211f7>] [<c014d379>] [<c685560c>] [<c685515f>] [<c010ab38>]
       [<c0111a93>] [<c010ad32>] [<c010aaf7>] [<c01184f5>] [<c010ae54>] [<c010ab38>] [<c01085fd>]
       [<c0106000>] [<c0108620>] [<c0109d08>] [<c0106000>] [<c010607b>] [<c0106000>] [<c0100176>]

code: 8b 01 89 03 85 c0 74 2b 8b 73 04 85 f6 75 10 89 19 89 c8 2b
Aiee, kernel panic
In swapper task - not synching
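
The call trace above shows raw addresses rather than symbol names, so it needs the System.map of the running kernel to be readable (ksymoops is the proper tool for this). As a rough sketch of the idea only: each System.map line is "address type name", and an address belongs to the nearest symbol at or below it. The map entries below are made up for illustration and do not come from any real kernel build.

```python
import bisect


def load_map(lines):
    """Parse System.map-style lines ("address type name") into a sorted list."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return (name, offset) for the nearest symbol at or below address."""
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies below every known symbol
    start, name = symbols[index]
    return name, address - start


# Hypothetical entries, NOT taken from a real System.map.
HYPOTHETICAL_MAP = """\
c0120000 T example_alloc
c0121000 T example_free
c014d000 T example_driver_rx
""".splitlines()

if __name__ == "__main__":
    symbols = load_map(HYPOTHETICAL_MAP)
    for address in (0xC0120C2E, 0xC01211F7, 0xC014D379):
        print(hex(address), resolve(symbols, address))
```

With the real System.map for the kernel that crashed, the same lookup would turn each address into a symbol name plus offset, matching the format of the other traces in this report.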

Comment 3 Alan Cox 2000-08-08 19:07:14 UTC

Let me know if any of these boxes pass memtest86 and still show the problem in
6.2 or 6.9.5.

Comment 4 Alan Cox 2000-09-15 18:21:03 UTC

(re-open if you ever run the memtests)