208378 – Unable to handle kernel paging request (doing rsync)

Summary: Unable to handle kernel paging request (doing rsync)

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.7
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Eric Sandeen
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-09-28 08:46 UTC by Jeremy Sanders
Modified:	2012-06-20 16:12 UTC
CC List:	10 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-06-20 16:12:22 UTC
Target Upstream Version:
Embargoed:

Description Jeremy Sanders 2006-09-28 08:46:19 UTC

Description of problem:

The system crashed while rsyncing the contents of one partition to another (both ext3).

Version-Release number of selected component (if applicable):

kernel-2.6.9-42.0.2.EL.i686
(on Scientific Linux 4.3, but I thought you'd also like to know about it).

How reproducible:

Random; not reproducible.

This is from /var/log/messages:
Sep 28 04:02:08 xpc7 kernel: kjournald starting.  Commit interval 5 seconds
Sep 28 04:02:08 xpc7 kernel: EXT3 FS on hdc1, internal journal
Sep 28 04:02:08 xpc7 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep 28 04:02:19 xpc7 kernel: Unable to handle kernel paging request at virtual address 08000000
Sep 28 04:02:19 xpc7 kernel:  printing eip:
Sep 28 04:02:19 xpc7 kernel: c01877a1
Sep 28 04:02:19 xpc7 kernel: *pde = 39fd7067
Sep 28 04:02:19 xpc7 kernel: Oops: 0000 [#1]
Sep 28 04:02:19 xpc7 kernel: Modules linked in: joydev loop nfsd exportfs nfs lockd nfs_acl autofs4 eeprom lm85 i2c_sensor i2c_i801 i2c_dev i2c_core sunrpc dm_mirror dm_mod button battery ac uhci_hcd ehci_hcd hw_random snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore tg3 e100 mii floppy ext3 jbd sata_sil libata sd_mod scsi_mod
Sep 28 04:02:19 xpc7 kernel: CPU:    0
Sep 28 04:02:19 xpc7 kernel: EIP:    0060:[<c01877a1>]    Not tainted VLI
Sep 28 04:02:19 xpc7 kernel: EFLAGS: 00010206   (2.6.9-42.0.2.EL) 
Sep 28 04:02:19 xpc7 kernel: EIP is at __d_lookup+0x65/0x1ef
Sep 28 04:02:19 xpc7 kernel: eax: c186f8fc   ebx: b95798b7   ecx: 00000011   edx: c1822200
Sep 28 04:02:19 xpc7 kernel: esi: e3d43f0c   edi: 08000000   ebp: c8f07780   esp: e3d43df0
Sep 28 04:02:19 xpc7 kernel: ds: 007b   es: 007b   ss: 0068
Sep 28 04:02:19 xpc7 kernel: Process rsync (pid: 9950, threadinfo=e3d43000 task=f642d8f0)
Sep 28 04:02:19 xpc7 kernel: Stack: 00000000 c186f8fc f7f57020 b95798b7 0000000d e3d43e60 b95798b7 e3d43f0c
Sep 28 04:02:19 xpc7 kernel:        00000000 e3d43e60 c017b783 c18edf80 e3d43e58 b95798b7 c826fbf4 b95798b7
Sep 28 04:02:19 xpc7 kernel:        e3d43f0c c017c22c 00000000 ea42ba60 d274c7c8 d929c03d 40d1baa6 00000000
Sep 28 04:02:19 xpc7 kernel: Call Trace:
Sep 28 04:02:19 xpc7 kernel:  [<c017b783>] do_lookup+0x1f/0x8f
Sep 28 04:02:19 xpc7 kernel:  [<c017c22c>] __link_path_walk+0xa39/0xd98
Sep 28 04:02:19 xpc7 kernel:  [<c017c5cc>] link_path_walk+0x41/0xb9
Sep 28 04:02:19 xpc7 kernel:  [<c017c8c4>] path_lookup+0x104/0x135
Sep 28 04:02:19 xpc7 kernel:  [<c017ca09>] __user_walk+0x21/0x51
Sep 28 04:02:19 xpc7 kernel:  [<c0176c1b>] vfs_lstat+0x11/0x37
Sep 28 04:02:19 xpc7 kernel:  [<c0177210>] sys_lstat64+0xf/0x23
Sep 28 04:02:19 xpc7 kernel:  [<c0318e57>] syscall_call+0x7/0xb
Sep 28 04:02:19 xpc7 kernel:  [<c031007b>] build_polexpire+0x81/0xe0
Sep 28 04:02:19 xpc7 kernel: Code: 24 0c 89 c2 81 f2 01 00 37 9e d3 ea 31 d0 8b 15 68 55 43 c0 23 05 60 55 43 c0 8d 04 82 89 44 24 04 8b 38 85 ff 0f 84 7f 01 00 00 <8b> 07 0f 18 00 90 8d 5f 88 8b 44 24 0c 39 43 2c 0f 85 62 01 00
Sep 28 04:02:19 xpc7 kernel:  <0>Fatal exception: panic in 5 seconds
Sep 28 09:10:59 xpc7 syslogd 1.4.1: restart.

The CPU is an Intel P4 2.8 GHz on an Intel D845PEBT2 motherboard.

Comment 1 Eric Sandeen 2007-05-25 15:33:24 UTC

Jeremy, are you still seeing this? If so, is there any chance you could gather a dump the next time it happens?

Comment 2 Jeremy Sanders 2007-05-25 15:48:56 UTC

I'm afraid I haven't seen this since reporting it. I'll report it if it happens
again.

Comment 3 Jeff Burke 2007-06-06 12:19:40 UTC

Note: this is information I grabbed from an external mailing list:
-----------------------------<snip>--------------------------------
I had a RHEL4 system crash a day or two ago. It's the first Red Hat system
I've ever seen hang completely, requiring a hard power cycle.
Felt like my Windows days. But then it happened yesterday as well,
so something is wrong with this server.

It's fully patched (up2date) and is a Dell 2850. I captured
/var/log/messages right before it panicked; here are the logs:

kernel: Unable to handle kernel paging request at virtual address 0f3514db
kernel: printing eip:
kernel: c01705b8
kernel: *pde = 33c68001
kernel: Oops: 0000 [#1]
kernel: SMP
kernel: Modules linked in: mptctl mptbase ipmi_devintf ipmi_si ipmi_msghandler dell_rbu autofs4 i2c_dev i2c_core sunrpc dm_mirror dm_mod button battery ac md5 ipv6 uhci_hcd ehci_hcd e1000 floppy ata_piix libata sg ext3 jbd megaraid_mbox megaraid_mm sd_mod scsi_mod
kernel: CPU:    0
kernel: EIP:    0060:[<c01705b8>]    Not tainted VLI
kernel: EFLAGS: 00010206   (2.6.9-55.ELsmp)
kernel: EIP is at __d_lookup+0x65/0x109
kernel: eax: c2155c30   ebx: cada98f6   ecx: 00000011   edx: c212e200
kernel: esi: 0f3514db   edi: cada98f6   ebp:f43aa50c   esp: f3789e0c
kernel: ds: 007b   es: 007b   ss: 0068
kernel: Process bbtest-net (pid: 2942,threadinfo=f3789000 task=f24723b0)
kernel: Stack: 00000000 c2155c30 e1cbe00e cada98f6 0000000c f3789e80 cada98f6 00000000
kernel:        cada98f6 f3789f50 c0166ba3 f7f1be00 f3789e78 f3789e80 cada98f6 f543b548
kernel:        cada98f6 f3789f50 c0167475 00000000 00000000 00000000 fffcf000 c1c18aa0
kernel: Call Trace:
kernel:  [<c0166ba3>] do_lookup+0x23/0xb1
kernel:  [<c0167475>] __link_path_walk+0x844/0xc25
kernel:  [<c0167899>] link_path_walk+0x43/0xbe
kernel:  [<c02d443f>] __cond_resched+0x14/0x39
kernel:  [<c01c3e8a>] direct_strncpy_from_user+0x3e/0x5d
kernel:  [<c011b01b>] do_page_fault+0x1ae/0x5c6
kernel:  [<c0167c2e>] path_lookup+0x14b/0x17f
kernel:  [<c0168309>] open_namei+0x99/0x579
kernel:  [<c015a599>] filp_open+0x45/0x70
kernel:  [<c02d443f>] __cond_resched+0x14/0x39
kernel:  [<c01c3e8a>] direct_strncpy_from_user+0x3e/0x5d
kernel:  [<c015a8f5>] sys_open+0x31/0x7d
kernel:  [<c02d5ee3>] syscall_call+0x7/0xb
kernel: Code: 24 0c 89 c2 81 f2 01 00 37 9e d3 ea 31 d0 8b 15 e8 a0 44 c0 23 05 e0 a0 44 c0 8d 04 82 89 44 24 04 8b 30 85 f6 0f 84 99 00 00 00 <8b> 06 0f 18 00 90 8d 5e 98 0f ae e8 8d 76 00 8b 44 24 0c 39 43
kernel:  <0>Fatal exception: panic in 5 seconds

So I'm not sure what to make of that.  I noticed one process name in
there (bbtest-net) which is part of my BigBrother monitoring system.
But that's been running OK for years, and hasn't been changed
recently.  Not sure where else to look.  Could this be a hardware
(memory?) problem?

Shane
-----------------------------</snip>--------------------------------

I have replied to Shane and asked him for additional data.

Comment 4 Shane Presley 2007-06-06 13:58:21 UTC

My system is the one reported to the mailing list above. I've run the Dell
diagnostic tests on disk and memory, with no errors. Here are some additional details:

Dell 2850, 2x 2.8 GHz Xeon processors, 2 GB RAM (2 x 1 GB DIMMs)
PERC 4e/Si (Embedded RAID).  2x76GB SCSI drives in hardware RAID-1
Kernel: 2.6.9-42.0.10.ELsmp

Not using Logical Volumes.  Partition table:
Disk /dev/sda: 73.2 GB, 73274490880 bytes
255 heads, 63 sectors/track, 8908 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          25      200781   83  Linux
/dev/sda2              26         547     4192965   82  Linux swap
/dev/sda3             548        8908    67159732+  83  Linux

Comment 5 Hugh Merz 2007-06-11 14:23:16 UTC

We experienced a similar oops on our RAID server, although in our case a remote
rsync was running at the time. Not sure if it is directly related, but the log follows:

Linux 2.6.9-55.EL #1 Fri Apr 20 16:35:59 EDT 2007 i686 i686 i386 GNU/Linux
Intel(R) Xeon(TM) CPU 2.40GHz -- 1GB RAM

kernel: Unable to handle kernel paging request at virtual address 76617332
kernel:  printing eip:
kernel: f89630cf
kernel: *pde = 00000000
kernel: Oops: 0000 [#1]
kernel: Modules linked in: nls_utf8 loop nfs nfsd exportfs lockd nfs_acl md5 ipv6 autofs4 eeprom i2c_sensor i2c_isa i2c_i801 i2c_dev i2c_core sunrpc iptable_filter ip_tables dm_mirror dm_mod button battery ac uhci_hcd hw_random e1000 floppy ata_piix libata ext3 jbd aic7xxx sd_mod scsi_mod
kernel: CPU:    0
kernel: EIP:    0060:[<f89630cf>]    Not tainted VLI
kernel: EFLAGS: 00010206   (2.6.9-55.EL)
kernel: EIP is at cache_clean+0x1ba/0x301 [sunrpc]
kernel: eax: 73626f6b   ebx: 7661732e   ecx: f8b0bfa0   edx: f8b0bfa0
kernel: esi: e8b6a480   edi: 00000000   ebp: c3af7000   esp: f4732d70
kernel: ds: 007b   es: 007b   ss: 0068
kernel: Process rpc.mountd (pid: 3665,threadinfo=f4732000 task=f47401b0)
kernel: Stack: 00000000 00000000 f4732ea0 f8963266 f8ae8121 f4fbd7e0 f4732da1 f4732de2
kernel:        f4732e22 f4732e62 f4732eaa f8978d76 f4730030 c017bb47 c0185aaf f547a780
kernel:        f4732dec c017ba95 00000000 c0185aaf 00000000 f6ed8d74 f6f9ca20 00000001
kernel: Call Trace:
kernel:  [<f8963266>] cache_flush+0x1d/0x41 [sunrpc]
kernel:  [<f8ae8121>] svc_export_parse+0x2f0/0x32c [nfsd]
kernel:  [<c017bb47>] do_lookup+0x23/0xb1
kernel:  [<c0185aaf>] dput+0x33/0x423
kernel:  [<c017ba95>] follow_mount+0x4b/0x79
kernel:  [<c0185aaf>] dput+0x33/0x423
kernel:  [<c0187ee9>] __d_lookup+0x145/0x1ef
kernel:  [<c017bb47>] do_lookup+0x23/0xb1
kernel:  [<c0185aaf>] dput+0x33/0x423
kernel:  [<c017c923>] __link_path_walk+0xd4e/0xe06
kernel:  [<c0185aaf>] dput+0x33/0x423
kernel:  [<c017ca6b>] link_path_walk+0x90/0xb9
kernel:  [<f8ae7e31>] svc_export_parse+0x0/0x32c [nfsd]
kernel:  [<f8963ed3>] cache_write+0xf6/0x112 [sunrpc]
kernel:  [<c016c520>] vfs_write+0xb6/0xe2
kernel:  [<c016c5ea>] sys_write+0x3c/0x62
kernel:  [<c031aa0b>] syscall_call+0x7/0xb
kernel:  [<c031007b>] skb_to_sgvec+0x108/0x1fc
kernel: Code: c1 eb df 85 d2 0f 84 f6 00 00 00 8b 0d 04 7d 97 f8 3b 0a 0f 8d e8 00 00 00 8b 42 04 8d 34 88 8b 1e 85 db 74 68 8b 15 00 7d 97 f8 <8b> 43 04 39 42 2c 7e 04 40 89 42 2c 8b 43 04 3b 05 d0 d1 42 c0
kernel:  <0>Fatal exception: panic in 5 seconds

Comment 6 Bobby Robins 2009-02-11 18:18:58 UTC

I am seeing similar behavior on one of my systems as well. I am running RHEL ES 4.7 on IBM BladeCenter HS21 blades, with hardware RAID mirroring both internal hard disks. One server is "freezing" with similar kernel output. I have a support ticket open with Red Hat; they pointed me towards physical memory, but all hardware diagnostics passed with no errors.

Output below is from the netdump-server log in /var/crash:

Unable to handle kernel paging request at virtual address a8c283f3
 printing eip:
c012a90f
*pde = 00000000
Oops: 0002 [#1]
SMP 
Modules linked in: parport_pc lp parport netconsole netdump autofs4 i2c_dev i2c_core sunrpc cpufreq_powersave ib_srp ib_sdp ib_ipoib rdma_ucm rdma_cm iw_cm ib_addr ib_umad ib_ucm ib_uverbs ib_cm ib_sa ib_mad ib_core md5 ipv6 dm_mirror button battery ac i5000_edac edac_mc hw_random bnx2 sr_mod ext3 jbd dm_mod lpfc scsi_transport_fc ata_piix libata mptscsih mptsas mptspi mptscsi mptbase usb_storage uhci_hcd ohci_hcd ehci_hcd
EIP is at run_timer_softirq+0xf9/0x145
eax: a8c283ef   ebx: c8c283ef   ecx: c283efa9   edx: c0c283ef
esi: c0c283ef   edi: c283e860   ebp: c03e5fcc   esp: c03e5fc8
ds: 007b   es: 007b   ss: 0068
 __do_softirq+0x4c/0xb1
 [<c01081a3>] do_softirq+0x4f/0x56
 =======================
 [<c01174c8>] smp_apic_timer_interrupt+0x9a/0x9c
 [<c02e1436>] apic_timer_interrupt+0x1a/0x20
 [<c01040e8>] mwait_idle+0x33/0x42
 [<c01040a0>] cpu_idle+0x26/0x3b
Code: 69 04 8b 44 24 04 89 4c 24 04 89 50 04 89 02 89 5b 04 89 5e 10 8b 4d 00 39 e9 0f 84 47 ff ff ff 8b 51 04

Comment 7 Andreas H 2009-03-27 08:41:47 UTC

We also ran into this today:
Linux hostname 2.6.9-78.0.17.ELsmp #1 SMP Thu Mar 5 04:52:17 EST 2009 i686 i686 i386 GNU/Linux, running on a Dell 1850.

Mar 27 07:35:01 hostname kernel: Unable to handle kernel paging request at virtual address 4a542000
Mar 27 07:35:01 hostname kernel:  printing eip:
Mar 27 07:35:01 hostname kernel: c02de6de
Mar 27 07:35:01 hostname kernel: *pde = 00000000
Mar 27 07:35:01 hostname kernel: Oops: 0000 [#1]
Mar 27 07:35:01 hostname kernel: SMP
Mar 27 07:35:01 hostname kernel: Modules linked in: mptctl mptbase ipmi_devintf ipmi_si ipmi_msghandler dell_rbu autofs4 i2c_dev i2c_core cpufreq_powersave dm_mirror dm_multipath dm_mod button battery ac uhci_hcd ehci_hcd e1000 floppy ata_piix libata sg ext3 jbd megaraid_mbox megaraid_mm sd_mod scsi_mod
Mar 27 07:35:01 hostname kernel: CPU:    2
Mar 27 07:35:01 hostname kernel: EIP:    0060:[<c02de6de>]    Not tainted VLI
Mar 27 07:35:01 hostname kernel: EFLAGS: 00010286   (2.6.9-78.0.17.ELsmp)
Mar 27 07:35:01 hostname kernel: EIP is at schedule+0x8ce/0x8f3
Mar 27 07:35:01 hostname kernel: eax: 00000000   ebx: 4a542000   ecx: ca552500   edx: c5469de0
Mar 27 07:35:01 hostname kernel: esi: 00000020   edi: ca5525b0   ebp: ca542fa8   esp: ca542f50
Mar 27 07:35:01 hostname kernel: ds: 007b   es: 007b   ss: 0068
Mar 27 07:35:01 hostname kernel: Process java (pid: 32551, threadinfo=ca542000 task=ca5525b0)
Mar 27 07:35:01 hostname kernel: Stack: 00000038 c044fb20 c01356c4 ca5525b0 00000019 c5471de0 ca5525b0 00000000
Mar 27 07:35:01 hostname kernel:        c546a740 c5469de0 00000002 00000000 a5df81c0 000fb915 ca5525b0 ca5525b0
Mar 27 07:35:01 hostname kernel:        ca552720 00000008 08065418 ca542000 c5469de0 c546a294 ca542fbc c011f184
Mar 27 07:35:01 hostname kernel: Call Trace:
Mar 27 07:35:01 hostname kernel:  [<c01356c4>] futex_wake+0x9f/0xc5
Mar 27 07:35:01 hostname kernel:  [<c011f184>] sys_sched_yield+0x75/0x7c
Mar 27 07:35:01 hostname kernel:  [<c02e081f>] syscall_call+0x7/0xb
Mar 27 07:35:01 hostname kernel:  [<c02e007b>] __lock_text_end+0xa30/0x1071
Mar 27 07:35:01 hostname kernel: Code: e8 c0 21 e6 ff 8b 5d b0 f0 ff 4b 08 0f 94 c0 84 c0 74 11 89 d8 e8 5f 1c e4 ff eb 08 8b 45 cc e8 cc 0e 00 00 bb 00 f0 ff ff 21 e3 <8b> 03 83 78 14 00 78 0a b8 00 4b 3a c0 e8 3f 0c 00 00 8b 43 08
Mar 27 07:35:01 hostname kernel:  <0>Fatal exception: panic in 5 seconds

Comment 8 Eric Sandeen 2009-03-27 13:40:26 UTC

It's interesting that all reports are on x86;  I wonder if this could be corruption due to a stack overflow on 4K stacks.
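A quick way to check the 4K-stack theory against the logs in this bug is to compare each task's esp with its threadinfo address. Assuming an i386 kernel built with 4K stacks (THREAD_SIZE = 4096, an assumption not confirmed for these particular builds), thread_info sits at the bottom of the kernel stack and the stack grows down toward it, so esp - threadinfo is roughly the headroom left when the oops fired. The values below are copied from the Description and Comments 3, 5 and 7 (Comment 6's log has no threadinfo line). This is only a sketch: it shows stack depth at the instant of the fault, not the peak, so it cannot rule out an earlier overflow that corrupted memory.

```python
# Sketch: estimate kernel-stack usage at oops time.
# ASSUMPTION: i386 with 4K stacks, thread_info at the low end of the stack,
# stack growing downward toward it.
THREAD_SIZE = 4096

# (process, threadinfo, esp) copied from the oops reports in this bug
OOPSES = [
    ("rsync", 0xE3D43000, 0xE3D43DF0),
    ("bbtest-net", 0xF3789000, 0xF3789E0C),
    ("rpc.mountd", 0xF4732000, 0xF4732D70),
    ("java", 0xCA542000, 0xCA542F50),
]


def stack_usage(threadinfo: int, esp: int) -> tuple[int, int]:
    """Return (bytes_used, bytes_left) for a stack growing down toward threadinfo."""
    if not threadinfo <= esp < threadinfo + THREAD_SIZE:
        raise ValueError("esp is not inside this task's stack")
    used = threadinfo + THREAD_SIZE - esp
    left = esp - threadinfo  # includes the thread_info struct itself
    return used, left


if __name__ == "__main__":
    for name, ti, esp in OOPSES:
        used, left = stack_usage(ti, esp)
        print(f"{name:12s} used={used:5d} bytes  left={left:5d} bytes")
```

Under that assumption, the four tasks were between 176 and 656 bytes deep when they faulted, so none was near the bottom of its stack at that moment; confirming or excluding an earlier overflow would still need a dump.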

Andreas, yours was in java, though, not rsync, and it looks like a very different panic from the others.
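The panics also differ in the number printed after "Oops:". On i386 this is the page-fault error code: bit 0 set means a protection violation (clear means the page was not present), bit 1 set means a write (clear means a read), and bit 2 set means the access came from user mode. A minimal decoder, interpreting only those three bits and ignoring any higher ones:

```python
# Sketch: decode the low three bits of the i386 page-fault error code
# printed on the "Oops: NNNN [#n]" line. Higher bits are ignored here.

def decode_pf_error(code: int) -> str:
    cause = "protection violation" if code & 0x1 else "page not present"
    access = "write" if code & 0x2 else "read"
    mode = "user mode" if code & 0x4 else "kernel mode"
    return f"{mode} {access}, {cause}"


def parse_oops_line(line: str) -> int:
    """Extract the hex error code from a line such as 'Oops: 0002 [#1]'."""
    _, _, rest = line.partition("Oops:")
    if not rest:
        raise ValueError("not an Oops line")
    return int(rest.split()[0], 16)


if __name__ == "__main__":
    for report, line in [
        ("Description (rsync)", "Oops: 0000 [#1]"),
        ("Comment 6 (timer softirq)", "Oops: 0002 [#1]"),
    ]:
        print(report, "->", decode_pf_error(parse_oops_line(line)))
```

Applied to the logs in this bug, the Description and Comments 3, 5 and 7 all report 0000 (a kernel-mode read of a non-present page), while Comment 6 reports 0002, a kernel-mode write from run_timer_softirq.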

Comment 9 Jiri Pallich 2012-06-20 16:12:22 UTC

Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you requested us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.