Description of problem:
System crashed rsyncing contents of one partition to another (both ext3).

Version-Release number of selected component (if applicable):
kernel-2.6.9-42.0.2.EL.i686 (on Scientific Linux 4.3, but I thought you'd also like to know about it).

How reproducible:
Random. Not reproducible.

This is from /var/log/messages:

Sep 28 04:02:08 xpc7 kernel: kjournald starting. Commit interval 5 seconds
Sep 28 04:02:08 xpc7 kernel: EXT3 FS on hdc1, internal journal
Sep 28 04:02:08 xpc7 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep 28 04:02:19 xpc7 kernel: Unable to handle kernel paging request at virtual address 08000000
Sep 28 04:02:19 xpc7 kernel: printing eip:
Sep 28 04:02:19 xpc7 kernel: c01877a1
Sep 28 04:02:19 xpc7 kernel: *pde = 39fd7067
Sep 28 04:02:19 xpc7 kernel: Oops: 0000 [#1]
Sep 28 04:02:19 xpc7 kernel: Modules linked in: joydev loop nfsd exportfs nfs lockd nfs_acl autofs4 eeprom lm85 i2c_sensor i2c_i801 i2c_dev i2c_core sunrpc dm_mirror dm_mod button battery ac uhci_hcd ehci_hcd hw_random snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore tg3 e100 mii floppy ext3 jbd sata_sil libata sd_mod scsi_mod
Sep 28 04:02:19 xpc7 kernel: CPU: 0
Sep 28 04:02:19 xpc7 kernel: EIP: 0060:[<c01877a1>] Not tainted VLI
Sep 28 04:02:19 xpc7 kernel: EFLAGS: 00010206 (2.6.9-42.0.2.EL)
Sep 28 04:02:19 xpc7 kernel: EIP is at __d_lookup+0x65/0x1ef
Sep 28 04:02:19 xpc7 kernel: eax: c186f8fc ebx: b95798b7 ecx: 00000011 edx: c1822200
Sep 28 04:02:19 xpc7 kernel: esi: e3d43f0c edi: 08000000 ebp: c8f07780 esp: e3d43df0
Sep 28 04:02:19 xpc7 kernel: ds: 007b es: 007b ss: 0068
Sep 28 04:02:19 xpc7 kernel: Process rsync (pid: 9950, threadinfo=e3d43000 task=f642d8f0)
Sep 28 04:02:19 xpc7 kernel: Stack: 00000000 c186f8fc f7f57020 b95798b7 0000000d e3d43e60 b95798b7 e3d43f0c
Sep 28 04:02:19 xpc7 kernel: 00000000 e3d43e60 c017b783 c18edf80 e3d43e58 b95798b7 c826fbf4 b95798b7
Sep 28 04:02:19 xpc7 kernel: e3d43f0c c017c22c 00000000 ea42ba60 d274c7c8 d929c03d 40d1baa6 00000000
Sep 28 04:02:19 xpc7 kernel: Call Trace:
Sep 28 04:02:19 xpc7 kernel: [<c017b783>] do_lookup+0x1f/0x8f
Sep 28 04:02:19 xpc7 kernel: [<c017c22c>] __link_path_walk+0xa39/0xd98
Sep 28 04:02:19 xpc7 kernel: [<c017c5cc>] link_path_walk+0x41/0xb9
Sep 28 04:02:19 xpc7 kernel: [<c017c8c4>] path_lookup+0x104/0x135
Sep 28 04:02:19 xpc7 kernel: [<c017ca09>] __user_walk+0x21/0x51
Sep 28 04:02:19 xpc7 kernel: [<c0176c1b>] vfs_lstat+0x11/0x37
Sep 28 04:02:19 xpc7 kernel: [<c0177210>] sys_lstat64+0xf/0x23
Sep 28 04:02:19 xpc7 kernel: [<c0318e57>] syscall_call+0x7/0xb
Sep 28 04:02:19 xpc7 kernel: [<c031007b>] build_polexpire+0x81/0xe0
Sep 28 04:02:19 xpc7 kernel: Code: 24 0c 89 c2 81 f2 01 00 37 9e d3 ea 31 d0 8b 15 68 55 43 c0 23 05 60 55 43 c0 8d 04 82 89 44 24 04 8b 38 85 ff 0f 84 7f 01 00 00 <8b> 07 0f 18 00 90 8d 5f 88 8b 44 24 0c 39 43 2c 0f 85 62 01 00
Sep 28 04:02:19 xpc7 kernel: <0>Fatal exception: panic in 5 seconds
Sep 28 09:10:59 xpc7 syslogd 1.4.1: restart.

CPU is Intel P4 2.8GHz on Intel D845PEBT2 motherboard.
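For anyone comparing these reports: on i386, the byte wrapped in angle brackets on the Code: line is the first byte of the faulting instruction. The following is only a rough sketch of pulling that instruction's leading bytes out of a pasted Code: line; the helper name `faulting_bytes` is hypothetical and the input is a shortened copy of the line from the oops above.

```python
def faulting_bytes(code_line, count=2):
    """Return the first `count` bytes of the faulting instruction.

    Assumes the i386 oops convention that the byte at EIP is printed
    inside angle brackets, e.g. "... 00 00 <8b> 07 0f 18 ...".
    """
    tokens = code_line.split("Code:", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            window = [tok.strip("<>")] + tokens[i + 1:i + count]
            return bytes(int(b, 16) for b in window)
    raise ValueError("no <..> marker found on Code: line")


# Shortened copy of the Code: line from the rsync oops.
line = "Code: 8b 38 85 ff 0f 84 7f 01 00 00 <8b> 07 0f 18 00 90"
print(" ".join(f"{b:02x}" for b in faulting_bytes(line)))
```

Here the two bytes are `8b 07`, which is `mov (%edi),%eax`. That is consistent with edi holding 08000000, the faulting virtual address in this report.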
Jeremy, are you still seeing this? Any chance you could gather a dump next time it happens if so?
I'm afraid I haven't seen this since reporting it. I'll report it if it happens again.
Note: this is information I grabbed from an external mailing list:

-----------------------------<snip>--------------------------------
I had a RHEL4 system crash a day or two ago. First RedHat system that I've ever seen completely hung, requiring me to hard power cycle it. Felt like my Windows days. But then it happened yesterday as well. So something is wrong with this server. It's fully patched (up2date), and is a Dell 2850. I captured the /var/log/messages right before it panicked and here's the logs:

kernel: Unable to handle kernel paging request at virtual address 0f3514db
kernel: printing eip:
kernel: c01705b8
kernel: *pde = 33c68001
kernel: Oops: 0000 [#1]
kernel: SMP
kernel: Modules linked in: mptctl mptbase ipmi_devintf ipmi_si ipmi_msghandler dell_rbu autofs4 i2c_dev i2c_core sunrpc dm_mirror dm_mod button battery ac md5 ipv6 uhci_hcd ehci_hcd e1000 floppy ata_piix libata sg ext3 jbd megaraid_mbox megaraid_mm sd_mod scsi_mod
kernel: CPU: 0
kernel: EIP: 0060:[<c01705b8>] Not tainted VLI
kernel: EFLAGS: 00010206 (2.6.9-55.ELsmp)
kernel: EIP is at __d_lookup+0x65/0x109
kernel: eax: c2155c30 ebx: cada98f6 ecx: 00000011 edx: c212e200
kernel: esi: 0f3514db edi: cada98f6 ebp: f43aa50c esp: f3789e0c
kernel: ds: 007b es: 007b ss: 0068
kernel: Process bbtest-net (pid: 2942, threadinfo=f3789000 task=f24723b0)
kernel: Stack: 00000000 c2155c30 e1cbe00e cada98f6 0000000c f3789e80 cada98f6 00000000
kernel: cada98f6 f3789f50 c0166ba3 f7f1be00 f3789e78 f3789e80 cada98f6 f543b548
kernel: cada98f6 f3789f50 c0167475 00000000 00000000 00000000 fffcf000 c1c18aa0
kernel: Call Trace:
kernel: [<c0166ba3>] do_lookup+0x23/0xb1
kernel: [<c0167475>] __link_path_walk+0x844/0xc25
kernel: [<c0167899>] link_path_walk+0x43/0xbe
kernel: [<c02d443f>] __cond_resched+0x14/0x39
kernel: [<c01c3e8a>] direct_strncpy_from_user+0x3e/0x5d
kernel: [<c011b01b>] do_page_fault+0x1ae/0x5c6
kernel: [<c0167c2e>] path_lookup+0x14b/0x17f
kernel: [<c0168309>] open_namei+0x99/0x579
kernel: [<c015a599>] filp_open+0x45/0x70
kernel: [<c02d443f>] __cond_resched+0x14/0x39
kernel: [<c01c3e8a>] direct_strncpy_from_user+0x3e/0x5d
kernel: [<c015a8f5>] sys_open+0x31/0x7d
kernel: [<c02d5ee3>] syscall_call+0x7/0xb
kernel: Code: 24 0c 89 c2 81 f2 01 00 37 9e d3 ea 31 d0 8b 15 e8 a0 44 c0 23 05 e0 a0 44 c0 8d 04 82 89 44 24 04 8b 30 85 f6 0f 84 99 00 00 00 <8b> 06 0f 18 00 90 8d 5e 98 0f ae e8 8d 76 00 8b 44 24 0c 39 43
kernel: <0>Fatal exception: panic in 5 seconds

So I'm not sure what to make of that. I noticed one process name in there (bbtest-net) which is part of my BigBrother monitoring system. But that's been running OK for years, and hasn't been changed recently. Not sure where else to look. Could this be a hardware (memory?) problem?

Shane
-----------------------------</snip>--------------------------------

I have replied to Shane and asked him for additional data.
My system is the one reported above to the mailing list. I've run Dell diagnostic tests on disk & memory; no errors. Here are some additional details:

Dell 2850, 2x 2.8GHz Xeon Processors, 2GB RAM (2 x 1GB DIMMs)
PERC 4e/Si (Embedded RAID). 2x76GB SCSI drives in hardware RAID-1
Kernel: 2.6.9-42.0.10.ELsmp
Not using Logical Volumes.

Partition table:

Disk /dev/sda: 73.2 GB, 73274490880 bytes
255 heads, 63 sectors/track, 8908 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          25      200781   83  Linux
/dev/sda2              26         547     4192965   82  Linux swap
/dev/sda3             548        8908    67159732+  83  Linux
We experienced a similar oops on our raid-server, although we were running a remote rsync at the time. Not sure if it is directly related, but the log follows:

Linux 2.6.9-55.EL #1 Fri Apr 20 16:35:59 EDT 2007 i686 i686 i386 GNU/Linux
Intel(R) Xeon(TM) CPU 2.40GHz -- 1GB RAM

kernel: Unable to handle kernel paging request at virtual address 76617332
kernel: printing eip:
kernel: f89630cf
kernel: *pde = 00000000
kernel: Oops: 0000 [#1]
kernel: Modules linked in: nls_utf8 loop nfs nfsd exportfs lockd nfs_acl md5 ipv6 autofs4 eeprom i2c_sensor i2c_isa i2c_i801 i2c_dev i2c_core sunrpc iptable_filter ip_tables dm_mirror dm_mod button battery ac uhci_hcd hw_random e1000 floppy ata_piix libata ext3 jbd aic7xxx sd_mod scsi_mod
kernel: CPU: 0
kernel: EIP: 0060:[<f89630cf>] Not tainted VLI
kernel: EFLAGS: 00010206 (2.6.9-55.EL)
kernel: EIP is at cache_clean+0x1ba/0x301 [sunrpc]
kernel: eax: 73626f6b ebx: 7661732e ecx: f8b0bfa0 edx: f8b0bfa0
kernel: esi: e8b6a480 edi: 00000000 ebp: c3af7000 esp: f4732d70
kernel: ds: 007b es: 007b ss: 0068
kernel: Process rpc.mountd (pid: 3665, threadinfo=f4732000 task=f47401b0)
kernel: Stack: 00000000 00000000 f4732ea0 f8963266 f8ae8121 f4fbd7e0 f4732da1 f4732de2
kernel: f4732e22 f4732e62 f4732eaa f8978d76 f4730030 c017bb47 c0185aaf f547a780
kernel: f4732dec c017ba95 00000000 c0185aaf 00000000 f6ed8d74 f6f9ca20 00000001
kernel: Call Trace:
kernel: [<f8963266>] cache_flush+0x1d/0x41 [sunrpc]
kernel: [<f8ae8121>] svc_export_parse+0x2f0/0x32c [nfsd]
kernel: [<c017bb47>] do_lookup+0x23/0xb1
kernel: [<c0185aaf>] dput+0x33/0x423
kernel: [<c017ba95>] follow_mount+0x4b/0x79
kernel: [<c0185aaf>] dput+0x33/0x423
kernel: [<c0187ee9>] __d_lookup+0x145/0x1ef
kernel: [<c017bb47>] do_lookup+0x23/0xb1
kernel: [<c0185aaf>] dput+0x33/0x423
kernel: [<c017c923>] __link_path_walk+0xd4e/0xe06
kernel: [<c0185aaf>] dput+0x33/0x423
kernel: [<c017ca6b>] link_path_walk+0x90/0xb9
kernel: [<f8ae7e31>] svc_export_parse+0x0/0x32c [nfsd]
kernel: [<f8963ed3>] cache_write+0xf6/0x112 [sunrpc]
kernel: [<c016c520>] vfs_write+0xb6/0xe2
kernel: [<c016c5ea>] sys_write+0x3c/0x62
kernel: [<c031aa0b>] syscall_call+0x7/0xb
kernel: [<c031007b>] skb_to_sgvec+0x108/0x1fc
kernel: Code: c1 eb df 85 d2 0f 84 f6 00 00 00 8b 0d 04 7d 97 f8 3b 0a 0f 8d e8 00 00 00 8b 42 04 8d 34 88 8b 1e 85 db 74 68 8b 15 00 7d 97 f8 <8b> 43 04 39 42 2c 7e 04 40 89 42 2c 8b 43 04 3b 05 d0 d1 42 c0
kernel: <0>Fatal exception: panic in 5 seconds
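One thing that may be worth checking in an oops like this is whether the bad register values are really string data. As a minimal sketch (the helper `as_ascii` is hypothetical), this reads a 32-bit register value as four little-endian bytes and reports them if all four are printable ASCII. The eax, ebx and esi values are taken from the oops above.

```python
import struct


def as_ascii(value):
    """Return the 4 little-endian bytes of `value` as text if all are
    printable ASCII, otherwise None."""
    raw = struct.pack("<I", value)
    if all(0x20 <= b < 0x7F for b in raw):
        return raw.decode("ascii")
    return None


# Register values copied from the rpc.mountd oops.
for name, value in (("eax", 0x73626F6B), ("ebx", 0x7661732E), ("esi", 0xE8B6A480)):
    print(name, hex(value), as_ascii(value))
```

Both eax and ebx decode to printable text (`kobs` and `.sav`), while esi looks like an ordinary kernel pointer. That would fit a pointer field having been overwritten with string data, but by itself it does not say what did the overwriting.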
I am seeing similar behavior on one of my systems as well. I am running RHEL ES 4.7 on IBM BladeCenter HS21 Blades. We are using hardware RAID to mirror both internal hdisks. I have a server "freezing" with similar kernel output. I have a support ticket open with RH. They pointed me towards physical memory, but all diagnostics on hardware were successful and showed no errors. Output below from netdump-server log in /var/crash:

Unable to handle kernel paging request at virtual address a8c283f3
printing eip:
c012a90f
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in: parport_pc lp parport netconsole netdump autofs4 i2c_dev i2c_core sunrpc cpufreq_powersave ib_srp ib_sdp ib_ipoib rdma_ucm rdma_cm iw_cm ib_addr ib_umad ib_ucm ib_uverbs ib_cm ib_sa ib_mad ib_core md5 ipv6 dm_mirror button battery ac i5000_edac edac_mc hw_random bnx2 sr_mod ext3 jbd dm_mod lpfc scsi_transport_fc ata_piix libata mptscsih mptsas mptspi mptscsi mptbase usb_storage uhci_hcd ohci_hcd ehci_hcd
EIP is at run_timer_softirq+0xf9/0x145
eax: a8c283ef ebx: c8c283ef ecx: c283efa9 edx: c0c283ef
esi: c0c283ef edi: c283e860 ebp: c03e5fcc esp: c03e5fc8
ds: 007b es: 007b ss: 0068
__do_softirq+0x4c/0xb1
[<c01081a3>] do_softirq+0x4f/0x56
=======================
[<c01174c8>] smp_apic_timer_interrupt+0x9a/0x9c
[<c02e1436>] apic_timer_interrupt+0x1a/0x20
[<c01040e8>] mwait_idle+0x33/0x42
[<c01040a0>] cpu_idle+0x26/0x3b
Code: 69 04 8b 44 24 04 89 4c 24 04 89 50 04 89 02 89 5b 04 89 5e 10 8b 4d 00 39 e9 0f 84 47 ff ff ff 8b 51 04
We also ran into this today: Linux hostname 2.6.9-78.0.17.ELsmp #1 SMP Thu Mar 5 04:52:17 EST 2009 i686 i686 i386 GNU/Linux, running on a Dell 1850.

Mar 27 07:35:01 hostname kernel: Unable to handle kernel paging request at virtual address 4a542000
Mar 27 07:35:01 hostname kernel: printing eip:
Mar 27 07:35:01 hostname kernel: c02de6de
Mar 27 07:35:01 hostname kernel: *pde = 00000000
Mar 27 07:35:01 hostname kernel: Oops: 0000 [#1]
Mar 27 07:35:01 hostname kernel: SMP
Mar 27 07:35:01 hostname kernel: Modules linked in: mptctl mptbase ipmi_devintf ipmi_si ipmi_msghandler dell_rbu autofs4 i2c_dev i2c_core cpufreq_powersave dm_mirror dm_multipath dm_mod button battery ac uhci_hcd ehci_hcd e1000 floppy ata_piix libata sg ext3 jbd megaraid_mbox megaraid_mm sd_mod scsi_mod
Mar 27 07:35:01 hostname kernel: CPU: 2
Mar 27 07:35:01 hostname kernel: EIP: 0060:[<c02de6de>] Not tainted VLI
Mar 27 07:35:01 hostname kernel: EFLAGS: 00010286 (2.6.9-78.0.17.ELsmp)
Mar 27 07:35:01 hostname kernel: EIP is at schedule+0x8ce/0x8f3
Mar 27 07:35:01 hostname kernel: eax: 00000000 ebx: 4a542000 ecx: ca552500 edx: c5469de0
Mar 27 07:35:01 hostname kernel: esi: 00000020 edi: ca5525b0 ebp: ca542fa8 esp: ca542f50
Mar 27 07:35:01 hostname kernel: ds: 007b es: 007b ss: 0068
Mar 27 07:35:01 hostname kernel: Process java (pid: 32551, threadinfo=ca542000 task=ca5525b0)
Mar 27 07:35:01 hostname kernel: Stack: 00000038 c044fb20 c01356c4 ca5525b0 00000019 c5471de0 ca5525b0 00000000
Mar 27 07:35:01 hostname kernel: c546a740 c5469de0 00000002 00000000 a5df81c0 000fb915 ca5525b0 ca5525b0
Mar 27 07:35:01 hostname kernel: ca552720 00000008 08065418 ca542000 c5469de0 c546a294 ca542fbc c011f184
Mar 27 07:35:01 hostname kernel: Call Trace:
Mar 27 07:35:01 hostname kernel: [<c01356c4>] futex_wake+0x9f/0xc5
Mar 27 07:35:01 hostname kernel: [<c011f184>] sys_sched_yield+0x75/0x7c
Mar 27 07:35:01 hostname kernel: [<c02e081f>] syscall_call+0x7/0xb
Mar 27 07:35:01 hostname kernel: [<c02e007b>] __lock_text_end+0xa30/0x1071
Mar 27 07:35:01 hostname kernel: Code: e8 c0 21 e6 ff 8b 5d b0 f0 ff 4b 08 0f 94 c0 84 c0 74 11 89 d8 e8 5f 1c e4 ff eb 08 8b 45 cc e8 cc 0e 00 00 bb 00 f0 ff ff 21 e3 <8b> 03 83 78 14 00 78 0a b8 00 4b 3a c0 e8 3f 0c 00 00 8b 43 08
Mar 27 07:35:01 hostname kernel: <0>Fatal exception: panic in 5 seconds
It's interesting that all reports are on x86; I wonder if this could be corruption due to a stack overflow on 4K stacks. Andreas, yours was java though, not related to rsync, and looks like a very different panic from the others.
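A rough way to weigh the stack-overflow idea against the posted oopses is to compare esp with threadinfo in each one. The sketch below assumes 4K stacks with thread_info at the base of the stack page, so `esp - threadinfo` is the headroom left at the moment of the fault (a little less in practice, since thread_info itself occupies the bottom of the page). The function name `stack_usage` is hypothetical; the register values are copied from the reports in this bug.

```python
STACK_SIZE = 4096  # assumption: 4K kernel stacks, as suggested above


def stack_usage(esp, threadinfo, stack_size=STACK_SIZE):
    """Return (bytes_used, bytes_left) for a kernel stack that grows down
    from threadinfo + stack_size towards threadinfo."""
    if not threadinfo <= esp < threadinfo + stack_size:
        raise ValueError("esp is outside the task's stack page")
    headroom = esp - threadinfo
    return stack_size - headroom, headroom


# (esp, threadinfo) pairs from the oopses in this report.
reports = {
    "rsync": (0xE3D43DF0, 0xE3D43000),
    "bbtest-net": (0xF3789E0C, 0xF3789000),
    "rpc.mountd": (0xF4732D70, 0xF4732000),
    "java": (0xCA542F50, 0xCA542000),
}

for name, (esp, ti) in reports.items():
    used, left = stack_usage(esp, ti)
    print(f"{name:12s} used={used:5d} left={left:5d}")
```

This only shows the stack depth at the instant of each oops. It cannot rule out a deeper call chain having overrun the stack earlier and corrupted memory that was only touched later.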
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you asked us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to reconsider your feature request for an active release, please re-open the request via the appropriate support channels and provide additional supporting details about the importance of this issue.