Bug 443621
| Summary: | kernel panic xen cluster. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | makoto nohara <nohara.makoto> |
| Component: | kernel-xen | Assignee: | Xen Maintainance List <xen-maint> |
| Status: | CLOSED NOTABUG | QA Contact: | Martin Jenner <mjenner> |
| Severity: | high | Docs Contact: | |
| Priority: | low | ||
| Version: | 5.0 | CC: | clalance, matt.baker, prickett233, tao, xen-maint |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | i386 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2009-05-04 15:46:58 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 492568 | ||
Hardware environment:
------------------
PowerEdge 860 (CPU: Intel Xeon 3040)
Memory: 4 GB
HDD: 80 GB x2 (software RAID1)
-----------------
The failover cluster is built from machines similar to the one above. RHES4.5 runs on Xen under RHEL5.0; in short, Dom0 = RHEL5.0 and DomU = RHES4.5. When trouble occurs, the VM (DomU) is started on the other machine.

The Xeon 3040 installed in the Dell PowerEdge 860 has a microcode bug, but the BIOS version that corrects it is in use.

I'm sorry that necessary information is arriving only piecemeal as additional comments. After writing it, I wonder whether this information is even necessary.

The kernel/hypervisor versions you are using here:

Xen hypervisor 3.0.3-25.el5 (included in RHEL5.0)
Linux kernel 2.6.18-8.el5xen (included in RHEL5.0)

are still the RHEL-5.0 GA releases. Could you re-test with the RHEL-5.2 errata applied to the system?

A RHEL-5.2 system is now being tested. However, it has not yet been running continuously for long enough. A little more time is needed before there is a result.

*** Bug 479756 has been marked as a duplicate of this bug. ***

The RHEL-5.2 system was under test.
However, a panic has occurred.
This panic might have a different cause from the panic that occurred before.
The environment:
Linux kernel 2.6.18-92.1.1.el5xen (downloaded from Red Hat Network)
drbd-8.2.6-3 (compiled from source)
heartbeat-2.1.3-1 (compiled from source)
xen-3.0.3-64.el5 (included in RHEL5.2)
------------
KERNEL: /usr/lib/debug/lib/modules/2.6.18-92.1.1.el5xen/vmlinux
DUMPFILE: /mnt/127.0.0.1-2009-01-08-12:33:06/vmcore
CPUS: 2
DATE: Thu Jan 8 12:32:46 2009
UPTIME: 22 days, 20:09:27
LOAD AVERAGE: 1.73, 1.23, 1.17
TASKS: 226
NODENAME: XXXXX1
RELEASE: 2.6.18-92.1.1.el5xen
VERSION: #1 SMP Thu May 22 09:31:19 EDT 2008
MACHINE: i686 (1866 Mhz)
MEMORY: 520 MB
PANIC: "Oops: 0000 [#1]" (check log for details)
PID: 979
COMMAND: "nautilus"
TASK: de5d5aa0 [THREAD_INFO: cfefc000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
------------
crash log
-------------
BUG: unable to handle kernel paging request at virtual address e071d668
printing eip:
c04e325a
00ecb000 -> *pde = 00000000:c6a49001
197c7000 -> *pme = 00000000:3e0fc067
000fc000 -> *pte = 00000000:00000000
Oops: 0000 [#1]
SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Modules linked in: xt_physdev ip_conntrack_ftp netloop netbk blktap blkbk ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables drbd(U) autofs4 hidp rfcomm l2cap bluetooth sunrpc bridge dummy 8021q dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport sg i3000_edac edac_mc ide_cd r8169 i2c_i801 i2c_core pcspkr serial_core cdrom tg3 serio_raw ata_piix libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU: 0
EIP: 0061:[<c04e325a>] Tainted: G VLI
EFLAGS: 00010296 (2.6.18-92.1.1.el5xen #1)
EIP is at csum_partial+0xca/0x120
eax: 00000000 ebx: c04e325a ecx: 0000000b edx: 000005a8
esi: e071d690 edi: 000005a8 ebp: 00000034 esp: c071cde8
ds: 007b es: 007b ss: 0069
Process nautilus (pid: 979, ti=c071c000 task=de5d5aa0 task.ti=cfefc000)
Stack: e071d000 00000034 c05a9f12 e071d668 000005a8 00000000 00000010 db73112c
00000000 00000020 000005dc cbea7f14 c05aae25 cbea7e00 000005a8 db73112c
c98aa2cc c98aa2e0 c071cef8 c05ae744 c4d5de4c 00000000 00000003 c071cef8
Call Trace:
[<c05a9f12>] skb_checksum+0x111/0x27b
[<c05aae25>] pskb_expand_head+0xd6/0x11a
[<c05ae744>] skb_checksum_help+0x64/0xb3
[<e14bd2ee>] ip_nat_fn+0x42/0x185 [iptable_nat]
[<c0469c92>] kmem_cache_alloc+0x54/0x5e
[<e14bd628>] ip_nat_local_fn+0x34/0xa4 [iptable_nat]
[<c05cb3c8>] dst_output+0x0/0x7
[<c05c3e3c>] nf_iterate+0x30/0x61
[<c05cb3c8>] dst_output+0x0/0x7
[<c05c3f62>] nf_hook_slow+0x3a/0x90
[<c05cb3c8>] dst_output+0x0/0x7
[<c05cd711>] ip_queue_xmit+0x3cd/0x41e
[<c05cb3c8>] dst_output+0x0/0x7
[<c041b708>] __activate_task+0x1c/0x29
[<c041bfd4>] try_to_wake_up+0x309/0x313
[<e14f3dff>] net_rx_action+0x771/0x7de [netbk]
[<c05db34d>] tcp_transmit_skb+0x5e4/0x612
[<c043190c>] autoremove_wake_function+0xd/0x2d
[<c041a99f>] __wake_up_common+0x2f/0x53
[<c05dc0a6>] tcp_retransmit_skb+0x4c0/0x59e
[<c041b567>] __wake_up+0x2a/0x3d
[<c05d5223>] tcp_enter_loss+0x1a2/0x1ff
[<c05de19c>] tcp_write_timer+0x0/0x5e4
[<c05de5a1>] tcp_write_timer+0x405/0x5e4
[<c0429600>] run_timer_softirq+0x101/0x15c
[<c042613e>] __do_softirq+0x5e/0xc3
[<c0406edf>] do_softirq+0x56/0xaf
[<c0406e80>] do_IRQ+0xa5/0xae
[<c0549b63>] evtchn_do_upcall+0x64/0x9b
[<c04055d9>] hypervisor_callback+0x3d/0x48
=======================
Code: 9c 13 46 a0 13 46 a4 13 46 a8 13 46 ac 13 46 b0 13 46 b4 13 46 b8 13 46 bc 13 46 c0 13 46 c4 13 46 c8 13 46 cc 13 46 d0 13 46 d4 <13> 46 d8
13 46 dc 13 46 e0 13 46 e4 13 46 e8 13 46 ec 13 46 f0
EIP: [<c04e325a>] csum_partial+0xca/0x120 SS:ESP 0069:c071cde8
-------------
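The three page-table lines near the top of the crash log can be read mechanically. The sketch below is only an illustration: it assumes each entry is printed as `high:low` 32-bit halves (as a PAE kernel would) and relies on bit 0 of an x86 page-table entry being the Present flag. The helper names `parse_walk` and `first_missing` are made up for this example.

```python
import re

PRESENT = 0x1  # bit 0 of an x86 page-table entry is the Present flag


def parse_walk(log: str):
    """Return [(level, entry_value)] for the *pde / *pme / *pte lines of an oops."""
    walk = []
    pattern = r"\*(pde|pme|pte) = ([0-9a-f]{8}):([0-9a-f]{8})"
    for match in re.finditer(pattern, log):
        level, high, low = match.groups()
        # Assumption: the halves are printed high word first.
        walk.append((level, (int(high, 16) << 32) | int(low, 16)))
    return walk


def first_missing(walk):
    """Name of the first level whose entry is not present, or None."""
    for level, value in walk:
        if not value & PRESENT:
            return level
    return None


# Lines copied from the crash log above.
oops = """\
00ecb000 -> *pde = 00000000:c6a49001
197c7000 -> *pme = 00000000:3e0fc067
000fc000 -> *pte = 00000000:00000000
"""

walk = parse_walk(oops)
for level, value in walk:
    state = "present" if value & PRESENT else "NOT present"
    print(f"{level}: {value:#018x} {state}")
print("walk stops at:", first_missing(walk))
```

Under those assumptions the upper two levels are present and only the final `pte` is empty, which would mean the faulting address lies in a region the kernel knows about but whose individual page is no longer mapped.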
crash> bt
PID: 979 TASK: de5d5aa0 CPU: 0 COMMAND: "nautilus"
#0 [c071cd0c] die at c040606e
#1 [c071cd38] do_page_fault at c060abfc
#2 [c071cdb0] error_code (via page_fault) at c0405595
EAX: 00000000 EBX: c04e325a ECX: 0000000b EDX: 000005a8 EBP: 00000034
DS: 007b ESI: e071d690 ES: 007b EDI: 000005a8
CS: 0061 EIP: c04e325a ERR: ffffffff EFLAGS: 00010296
#3 [c071cde4] csum_partial at c04e325a
#4 [c071cdf0] skb_checksum at c05a9f0d
#5 [c071ce34] skb_checksum_help at c05ae73f
#6 [c071ce48] ip_nat_fn at e14bd2e9
#7 [c071ce6c] ip_nat_local_fn at e14bd623
#8 [c071ce80] nf_iterate at c05c3e39
#9 [c071cea0] nf_hook_slow at c05c3f5d
#10 [c071cecc] ip_queue_xmit at c05cd70c
#11 [c071cf60] tcp_transmit_skb at c05db34b
#12 [c071cf94] tcp_retransmit_skb at c05dc0a1
#13 [c071cfbc] tcp_write_timer at c05de59c
#14 [c071cfcc] run_timer_softirq at c04295fe
#15 [c071cfe8] __do_softirq at c042613c
--- <soft IRQ> ---
------------
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 c066f2c0 RU 0.0 0 0 [swapper]
0 1 1 c0d60550 RU 0.0 0 0 [swapper]
1 0 0 c0d60aa0 IN 0.1 2076 680 init
2 1 0 c0d60000 IN 0.0 0 0 [migration/0]
3 1 0 c0198aa0 IN 0.0 0 0 [ksoftirqd/0]
4 1 0 c0198550 IN 0.0 0 0 [watchdog/0]
5 1 1 c0198000 IN 0.0 0 0 [migration/1]
6 1 1 c5568aa0 IN 0.0 0 0 [ksoftirqd/1]
7 1 1 c5568550 IN 0.0 0 0 [watchdog/1]
8 1 0 c5568000 IN 0.0 0 0 [events/0]
9 1 1 c0cfeaa0 IN 0.0 0 0 [events/1]
10 1 1 c0cfe550 IN 0.0 0 0 [khelper]
11 1 0 c0cfe000 IN 0.0 0 0 [kthread]
13 11 0 c0cc2550 IN 0.0 0 0 [xenwatch]
14 11 0 c0cc2000 IN 0.0 0 0 [xenbus]
17 11 0 c0cb8000 IN 0.0 0 0 [kblockd/0]
18 11 1 c0c9eaa0 IN 0.0 0 0 [kblockd/1]
19 11 0 c0c9e550 IN 0.0 0 0 [kacpid]
98 11 0 c0c0b000 IN 0.0 0 0 [cqueue/0]
99 11 1 c0c11aa0 IN 0.0 0 0 [cqueue/1]
103 11 0 c0c1b550 IN 0.0 0 0 [khubd]
105 11 0 c0c20aa0 IN 0.0 0 0 [kseriod]
170 11 0 c07d7aa0 IN 0.0 0 0 [pdflush]
171 11 1 c07d1000 IN 0.0 0 0 [pdflush]
172 11 0 c07d1550 IN 0.0 0 0 [kswapd0]
173 11 0 c07d1aa0 IN 0.0 0 0 [aio/0]
174 11 1 c0c4a000 IN 0.0 0 0 [aio/1]
320 11 1 c0c6aaa0 IN 0.0 0 0 [kpsmoused]
354 11 0 c0c58aa0 IN 0.0 0 0 [ata/0]
355 11 1 c0c58550 IN 0.0 0 0 [ata/1]
356 11 0 c0c58000 IN 0.0 0 0 [ata_aux]
360 11 0 c0c79550 IN 0.0 0 0 [scsi_eh_0]
361 11 0 c0c89550 IN 0.0 0 0 [scsi_eh_1]
364 11 0 c0c2eaa0 RU 0.0 0 0 [md1_raid1]
367 11 0 c0c28aa0 IN 0.0 0 0 [md0_raid1]
368 11 0 c0c28000 IN 0.0 0 0 [kjournald]
396 11 1 c0c28550 IN 0.0 0 0 [kauditd]
430 1 1 c0c1b000 IN 0.2 2440 888 udevd
675 31848 1 de1bd550 IN 1.3 24024 7176 gnome-session
823 675 0 cb8bc550 DE 0.0 0 0 Xsession
826 675 1 d0120550 IN 0.1 6492 604 ssh-agent
855 1 0 d1536aa0 IN 0.1 2824 788 dbus-launch
857 1 0 d0705aa0 IN 0.2 2756 956 dbus-daemon
875 1 1 d1ea1aa0 IN 0.7 8224 3632 gconfd-2
876 1 1 c8f59000 IN 1.5 38228 7844 scim-panel-gtk
877 1 1 d1536000 IN 1.5 38228 7844 scim-panel-gtk
878 1 1 cb0f5550 IN 0.2 9224 808 scim-launcher
903 11 1 df218000 IN 0.0 0 0 [kedac]
905 1 0 c6fa3aa0 IN 0.1 2576 764 gnome-keyring-d
907 1 1 d64b8aa0 IN 1.5 34872 8088 gnome-settings-
946 1 1 de979aa0 ?? 1.5 34872 8088 gnome-settings-
972 1 1 e0754aa0 IN 2.3 28320 12200 metacity
977 1 1 d1c3baa0 IN 3.0 58048 15960 gnome-panel
> 979 1 0 de5d5aa0 RU 11.6 137020 61616 nautilus
983 1 1 de5d5000 IN 0.6 40764 2996 bonobo-activati
984 1 1 c62b4000 IN 0.6 40764 2996 bonobo-activati
985 1 1 d257eaa0 IN 0.9 23556 4880 gnome-volume-ma
987 1 0 deefc000 IN 1.6 45820 8428 eggcups
989 1 0 c62b4550 IN 0.7 12392 3660 gnome-vfs-daemo
1008 1 0 dfd19aa0 IN 1.0 15412 5068 bt-applet
1016 1 1 d480caa0 IN 5.2 119356 27476 xulrunner-bin
1021 1 1 d64b8550 IN 1.9 46208 9992 nm-applet
1023 1 1 de77e550 IN 0.9 16216 4696 pam-panel-icon
1024 1 0 d257e550 RU 1.2 46028 6276 gnome-power-man
1025 1023 1 de92c000 IN 0.1 1856 620 pam_timestamp_c
1060 1 0 d7f31aa0 IN 2.7 57148 14348 wnck-applet
1062 1 0 dc7c3aa0 IN 1.7 76700 8832 trashapplet
1125 1 1 dc7c3000 IN 5.2 119356 27476 xulrunner-bin
1155 1 1 c6fa3000 IN 0.2 8104 1228 scim-bridge
1197 1 1 c6fa3550 IN 1.5 24076 7876 notification-ar
1199 1 1 ccd97aa0 IN 2.6 39724 13968 clock-applet
1201 1 1 d1536550 IN 2.6 56968 13824 mixer_applet2
1203 1 1 d56f4aa0 IN 5.2 119356 27476 xulrunner-bin
1302 1 1 c8949aa0 IN 0.3 43468 1428 pcscd
1303 1 1 d7f95000 IN 5.2 119356 27476 xulrunner-bin
1487 11 0 df157aa0 IN 0.0 0 0 [kmpathd/0]
1488 11 1 dea3a550 IN 0.0 0 0 [kmpathd/1]
1515 11 0 df9eaaa0 IN 0.0 0 0 [kjournald]
1771 1 1 cd969550 IN 0.9 17996 4676 gnome-screensav
2569 1 1 c07f3aa0 IN 0.2 13188 812 auditd
2570 1 1 c0c4a550 IN 0.2 13188 812 auditd
2571 2569 1 dec66aa0 IN 0.2 14112 980 audispd
2572 2569 1 c07f9550 IN 0.2 14112 980 audispd
2594 1 0 c0c11000 IN 0.1 1732 620 syslogd
2597 1 0 deefc550 RU 0.1 1684 408 klogd
2609 1 1 de2e4000 IN 0.1 2444 368 irqbalance
2630 1 0 c0c33aa0 IN 0.1 1820 548 portmap
2659 1 0 c0c2e000 IN 0.1 1832 740 rpc.statd
2699 1 0 c0c20550 IN 0.1 1848 396 mdadm
2729 1 1 de979550 IN 0.1 5452 572 rpc.idmapd
2789 1 1 df157000 IN 0.2 2888 1104 dbus-daemon
2800 1 0 c0cb8aa0 IN 0.1 2160 780 hcid
2806 1 0 c0c89000 IN 0.1 1752 520 sdpd
2829 1 0 c026caa0 IN 0.0 0 0 [krfcommd]
2870 1 0 c07f9aa0 IN 0.3 43468 1428 pcscd
2885 1 0 c0c4aaa0 IN 0.3 43468 1428 pcscd
2891 1 0 de8a6550 IN 0.1 1924 464 hidd
2907 1 0 c0c6a000 IN 0.2 10852 1320 automount
2908 1 1 de5cdaa0 IN 0.2 10852 1320 automount
2909 1 1 c0c0baa0 IN 0.2 10852 1320 automount
2912 1 1 c0c64aa0 IN 0.2 10852 1320 automount
2915 1 0 df97daa0 IN 0.2 10852 1320 automount
2926 1 0 c0cc2aa0 IN 0.1 1684 544 acpid
2937 1 0 c0c9e000 IN 0.1 5084 764 hpiod
2942 1 1 de1bd000 IN 0.9 14568 4788 python
2957 1 0 df9ea550 IN 0.2 7000 1056 sshd
2968 1 1 de2e4550 IN 0.5 10936 2416 cupsd
2980 1 0 df218aa0 IN 0.2 2736 904 xinetd
2995 1 0 df9ea000 RU 0.0 0 0 [drbd0_worker]
3005 1 0 c0c0b550 RU 0.0 0 0 [drbd0_receiver]
3013 1 0 c07f3550 IN 0.0 0 0 [drbd0_asender]
3030 1 0 de5cd000 IN 0.2 4412 1096 ha_logd
3039 3030 0 de2e4aa0 RU 0.1 4412 796 ha_logd
3079 1 1 de5d5550 IN 2.3 12108 12108 heartbeat
3090 1 1 c0c50aa0 IN 0.1 1908 488 gpm
3101 1 0 c07f9000 IN 0.2 6220 1120 crond
3120 3079 1 c0c2e550 ?? 1.0 5512 5512 heartbeat
3121 3079 1 dea3a000 IN 1.0 5508 5508 heartbeat
3122 3079 1 c07d7000 IN 1.0 5508 5508 heartbeat
3141 1 1 dea3aaa0 IN 0.4 4320 2132 xfs
3162 1 0 c0c50550 IN 0.1 2256 440 atd
3185 1 0 c026c000 IN 0.3 5016 1664 libvirtd
3212 1 0 deefcaa0 IN 0.1 4644 412 rhnsd
3236 1 0 dec66550 IN 0.7 5844 3924 hald
3247 3236 0 df218550 IN 0.2 3148 1084 hald-runner
3337 3185 0 c026c550 IN 0.1 1828 748 dnsmasq
3358 3247 0 d1ea1550 IN 0.2 2008 808 hald-addon-keyb
3360 3247 0 df157550 IN 0.2 2012 812 hald-addon-acpi
3365 3247 0 d232a550 IN 0.1 1968 660 hald-addon-stor
3618 1 1 d074c000 IN 0.2 2252 1112 xenstored
3623 1 1 d1c3b550 IN 0.8 13032 4004 python
3624 3623 0 d066b550 IN 1.0 96184 5552 python
3626 1 0 c0c89aa0 IN 0.1 12212 608 xenconsoled
3627 1 1 e0754550 IN 0.1 13544 792 blktapctrl
3628 1 0 d0120aa0 IN 0.1 12212 608 xenconsoled
3629 1 0 c0c6a550 IN 0.1 13544 792 blktapctrl
3630 3623 0 df97d000 IN 1.0 96184 5552 python
3633 3623 0 d06cd550 IN 1.0 96184 5552 python
3634 3623 1 c0c79000 IN 1.0 96184 5552 python
3900 3623 0 e0754000 IN 1.0 96184 5552 python
3901 3623 1 d06e0550 IN 1.0 96184 5552 python
3902 1 0 d06e0aa0 IN 1.9 25632 10208 yum-updatesd
3907 1 0 d0705000 IN 0.2 2696 1224 gam_server
3974 1 1 d0705550 ?? 0.2 5472 1220 livxen.sh
3975 1 1 d071d550 IN 0.2 5472 1224 xentop-logger.s
3976 1 1 d0720000 IN 0.2 5472 1208 swaps-logger.sh
4211 1 0 d074c550 IN 0.1 1996 520 smartd
4223 1 1 cb792000 IN 0.1 1668 456 mingetty
4224 1 0 cbb9b000 IN 0.1 1668 456 mingetty
4227 1 0 dec66000 IN 0.1 1668 452 mingetty
4236 1 0 cb792550 IN 0.1 1668 456 mingetty
4238 1 1 c07f3000 IN 0.1 1664 452 mingetty
4244 1 1 cb8bcaa0 IN 0.1 1668 456 mingetty
4246 1 0 d0720aa0 IN 0.6 16652 3028 gdm-binary
4471 4246 1 d232aaa0 IN 0.5 17256 2764 gdm-binary
4473 1 0 c8110550 IN 0.8 28332 4200 gdm-rh-security
4482 4471 1 cbb9b550 IN 2.9 38816 15204 Xorg
4578 1 0 d073e000 RU 0.3 43468 1428 pcscd
4584 1 1 cb6da550 IN 0.8 28332 4200 gdm-rh-security
4622 11 1 d071daa0 IN 0.0 0 0 [kjournald]
4745 1 1 df97d550 IN 0.1 15012 724 tapdisk
4746 1 1 de8a6aa0 IN 0.1 15012 724 tapdisk
4751 1 1 c872d000 IN 0.1 15012 720 tapdisk
4753 1 1 d06cdaa0 IN 0.1 15012 720 tapdisk
4786 1 1 c0c33000 IN 0.9 32720 4732 qemu-dm
5014 1 0 c6c8aaa0 IN 0.9 32720 4732 qemu-dm
5033 1 0 cb0f5aa0 IN 0.9 32720 4732 qemu-dm
5083 11 1 c0c79aa0 IN 0.0 0 0 [xvd 1]
5084 11 0 c6aacaa0 IN 0.0 0 0 [xvd 1]
5838 3101 0 d232a000 IN 0.3 6796 1504 crond
5839 5838 0 c72c9000 DE 0.0 0 0 python
5840 5838 0 d64b8000 IN 0.4 7916 2268 sendmail
7168 3976 1 d480c550 IN 0.1 4648 488 sleep
7169 1 0 c92fc000 IN 11.6 137020 61616 nautilus
7178 3974 1 d06cd000 RU 0.1 5472 488 livxen.sh
> 7179 7178 1 c07d7550 RU 1.0 12144 5356 python
7180 7178 1 ca73e550 ?? 0.1 4988 768 grep
7181 7178 1 cfb9caa0 RU 0.0 5472 168 livxen.sh
14882 4471 0 d074caa0 IN 1.4 24180 7296 gnome-session
14923 14882 0 d1ea1000 DE 0.0 0 0 Xsession
14926 14882 1 c0835aa0 IN 0.1 6492 640 ssh-agent
14955 1 0 c62b4aa0 IN 0.1 2784 624 dbus-launch
14956 1 1 c0c11550 IN 0.2 2760 980 dbus-daemon
14962 1 0 c0c20000 IN 0.7 8212 3580 gconfd-2
14975 1 1 c872daa0 IN 0.4 27744 1972 scim-launcher
14978 1 0 d06e0000 IN 0.1 2580 764 gnome-keyring-d
14980 1 0 cb0f5000 IN 1.5 34928 8216 gnome-settings-
14982 1 1 dfd19550 ?? 1.5 34928 8216 gnome-settings-
14997 1 1 de77e000 IN 2.5 28884 13152 metacity
15000 1 0 c8110aa0 IN 0.1 7112 768 scim-helper-man
15001 1 0 d1c3b000 IN 1.5 38420 7816 scim-panel-gtk
15002 1 0 c8110000 IN 1.5 38420 7816 scim-panel-gtk
15003 1 1 c6aac550 IN 0.2 9220 812 scim-launcher
15012 1 1 d066baa0 IN 3.1 58344 16488 gnome-panel
15017 1 1 c0c33550 IN 3.9 96664 20820 nautilus
15021 1 1 cbb9baa0 IN 0.6 39736 3020 bonobo-activati
15023 1 1 c872d550 IN 1.6 45816 8428 eggcups
15025 1 1 c6aac000 IN 0.7 12384 3692 gnome-vfs-daemo
15028 1 1 c0cb8550 IN 0.6 39736 3020 bonobo-activati
15029 1 1 d0120000 IN 0.9 23556 4948 gnome-volume-ma
15039 1 0 de8a6000 IN 0.9 15400 5044 bt-applet
15046 1 0 c6c8a550 IN 4.0 39544 21388 puplet
15048 1 0 de77eaa0 IN 1.9 46208 9976 nm-applet
15060 1 1 cb8bc000 IN 0.9 16216 4692 pam-panel-icon
15062 15060 1 d073e550 IN 0.1 1856 620 pam_timestamp_c
15064 1 0 c6645aa0 IN 0.5 18240 2636 escd
15065 1 1 de979000 IN 0.3 43468 1428 pcscd
15067 1 1 d0720550 IN 1.2 46032 6360 gnome-power-man
15068 1 1 dfd19000 IN 0.5 18240 2636 escd
15111 1 1 c6645550 IN 2.8 57336 14672 wnck-applet
15113 1 1 c0835550 IN 2.6 87652 13928 trashapplet
15141 1 1 de1bdaa0 IN 0.2 2480 880 mapping-daemon
15267 1 1 db67faa0 IN 1.5 24076 7836 notification-ar
15270 1 1 c0c64550 IN 2.6 39740 13976 clock-applet
15272 1 1 d066b000 IN 3.0 59112 15884 mixer_applet2
15414 1 0 dac5baa0 IN 3.8 79896 20144 gnome-terminal
15417 1 1 db070aa0 IN 0.2 8104 1284 scim-bridge
15418 15414 1 c0c1baa0 IN 0.1 2484 712 gnome-pty-helpe
15424 15414 1 c6c8a000 IN 0.3 5476 1516 bash
15425 1 0 db67f000 ?? 3.8 79896 20144 gnome-terminal
15907 1 1 dac5b000 IN 0.9 18012 4944 gnome-screensav
15963 15424 1 cfef5550 IN 1.0 7840 5276 vncviewer
25896 1771 0 d7f31000 RU 2.5 28268 13456 floaters
26928 3975 1 c0c50000 IN 0.1 4652 488 sleep
31836 2980 1 c6645000 IN 2.7 18844 14416 Xvnc
31848 4246 1 c0c64000 IN 0.5 17248 2684 gdm-binary
-------------
Is this problem related to the one that occurs on RHEL5.0, or is it a new problem?

The stack trace looks about the same to me, so it looks like the same crash.

Chris Lalancette

Here is some additional information.
> #0 [c071cd0c] die at c040606e
> #1 [c071cd38] do_page_fault at c060abfc
> #2 [c071cdb0] error_code (via page_fault) at c0405595
What is c040606e?
I displayed it:
----------
crash> kmem c040606e
c040606e (T) die+568 ../debug/kernel-2.6.18/linux-2.6.18.i686/arch/i386/kernel/traps-xen.c: 469
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c10080c0 406000 0 0 1 400
crash>
----------
I examined traps-xen.c.
traps-xen.c, line 469:
----------
if (in_interrupt())
panic("Fatal exception in interrupt"); # <- 469
----------
The following comment is written at line 382:
----------
/* This is gone through when something in the kernel
* has done something bad and is about to be terminated.
*/
----------
"Something bad"?
How can I find out what the "something bad" was?
That's not the interesting part of the trace; that's just showing you that something happened that we didn't like, and now we are going to panic. Where you want to start looking is right before the error_code, namely at stack point #3. The instruction there is the one that caused the crash; you have to look at it and figure out what was going on at that point to cause it.

Chris Lalancette
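As an illustration of that advice: the `Code:` line of the oops marks the faulting instruction with angle brackets, `<13> 46 d8`. On 32-bit x86, opcode `13` is ADC with a register destination and ModRM byte `46` selects EAX and an 8-bit displacement off ESI, so these bytes decode to `adc -0x28(%esi), %eax`, one step of the checksum loop reading packet data. The sketch below decodes only that one encoding and recomputes the address it reads, using the `esi` value from the register dump; the function names are hypothetical and this is not a general disassembler.

```python
def decode_adc_esi_disp8(code: bytes) -> int:
    """Return the signed displacement of `adc disp8(%esi), %eax`.

    Only the single 3-byte encoding seen in the oops is handled:
    opcode 0x13 (ADC r32, r/m32) with ModRM 0x46 (mod=01, reg=EAX, rm=ESI).
    """
    if len(code) != 3 or code[0] != 0x13 or code[1] != 0x46:
        raise ValueError("not the 13 46 xx encoding this sketch understands")
    disp = code[2]
    # Sign-extend the 8-bit displacement.
    return disp - 0x100 if disp & 0x80 else disp


def effective_address(esi: int, disp: int) -> int:
    """Address the instruction reads from, wrapped to 32 bits."""
    return (esi + disp) & 0xFFFFFFFF


faulting_bytes = bytes.fromhex("1346d8")  # "<13> 46 d8" from the Code: line
esi = 0xE071D690                          # esi from the register dump
reported_fault = 0xE071D668               # "virtual address" in the BUG: line

disp = decode_adc_esi_disp8(faulting_bytes)
addr = effective_address(esi, disp)
print(f"adc {disp:#x}(%esi), %eax reads {addr:#010x}")
print("matches reported fault address:", addr == reported_fault)
```

The recomputed address equals the address in the `BUG:` line, so the fault is a plain read of the buffer being checksummed. The same arithmetic holds for the oops in the original report (`esi` e07ba068, fault address e07ba040).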
I understand neither C nor assembler.
Therefore, I am picking out the parts that I guess might be related.
This is the limit of my ability.
My English is also limited.
I apologize for the strange sentences.
-----------------
crash> kmem c04e325a
c04e325a (T) csum_partial+202 include/asm/atomic.h: 165
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c1009c60 4e3000 0 0 1 400
crash>
/**
* atomic_add_negative - add and test if negative
* @v: pointer of type atomic_t
* @i: integer value to add
*
* Atomically adds @i to @v and returns true
* if the result is negative, or false when
* result is greater than or equal to zero.
*/
static __inline__ int atomic_add_negative(int i, atomic_t *v)
{
unsigned char c;
__asm__ __volatile__( #<-165
LOCK_PREFIX "addl %2,%0; sets %1"
:"+m" (v->counter), "=qm" (c)
:"ir" (i) : "memory");
return c;
}
-----------------
-----------------
crash> kmem c060abfc
c060abfc (T) do_page_fault+2688 ../debug/kernel-2.6.18/linux-2.6.18.i686/arch/i386/mm/fault-xen.c: 698
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c100c140 60a000 0 0 1 400
/*
* Oops. The kernel tried to access some bad page. We'll have to
* terminate things with extreme prejudice.
*/
bust_spinlocks(1);
if (oops_may_print()) {
#ifdef CONFIG_X86_PAE
if (error_code & 16) {
pte_t *pte = lookup_address(address);
if (pte && pte_present(*pte) && !pte_exec_kernel(*pte))
printk(KERN_CRIT "kernel tried to execute "
"NX-protected page - exploit attempt? "
"(uid: %d)\n", current->uid);
}
#endif
if (address < PAGE_SIZE)
printk(KERN_ALERT "BUG: unable to handle kernel NULL "
"pointer dereference");
else
printk(KERN_ALERT "BUG: unable to handle kernel paging"
" request");
printk(" at virtual address %08lx\n",address);
printk(KERN_ALERT " printing eip:\n");
printk("%08lx\n", regs->eip);
dump_fault_path(address);
}
tsk->thread.cr2 = address;
tsk->thread.trap_no = 14;
tsk->thread.error_code = error_code;
die("Oops", regs, error_code); #<-698
bust_spinlocks(0);
do_exit(SIGKILL);
-----------------
-----------------
crash> kmem c0405595
c0405595 (t) error_code+41 ../debug/kernel-2.6.18/linux-2.6.18.i686/arch/i386/kernel/entry.S
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c10080a0 405000 0 0 1 400
-----------------
/usr/src/debug/debug/kernel-2.6.18/linux-2.6.18.i686/arch/i386/kernel/entry.S
That file was not found.
However, the following file exists:
/usr/src/debug/kernel-2.6.18/xen/arch/x86/x86_32/entry.S
----------
.Lft16: movl %eax,%gs:8(%esi)
test $TBF_EXCEPTION_ERRCODE,%cl
jz 1f
subl $4,%esi # push error_code onto guest frame
movl TRAPBOUNCE_error_code(%edx),%eax
----------
.Lfx1: sti
SAVE_ALL_GPRS
mov UREGS_error_code(%esp),%esi
pushfl # EFLAGS
movl $__HYPERVISOR_CS,%eax
pushl %eax # CS
movl $.Ldf1,%eax
pushl %eax # EIP
pushl %esi # error_code/entry_vector
jmp handle_exception
----------
exception_with_ints_disabled:
movl UREGS_eflags(%esp),%eax
movb UREGS_cs(%esp),%al
testl $(3|X86_EFLAGS_VM),%eax # interrupts disabled outside Xen?
jnz FATAL_exception_with_ints_disabled
pushl %esp
call search_pre_exception_table
addl $4,%esp
testl %eax,%eax # no fixup code for faulting EIP?
jz 1b
movl %eax,UREGS_eip(%esp)
movl %esp,%esi
subl $4,%esp
movl %esp,%edi
movl $UREGS_kernel_sizeof/4,%ecx
rep; movsl # make room for error_code/entry_vector
movl UREGS_error_code(%esp),%eax # error_code/entry_vector
movl %eax,UREGS_kernel_sizeof(%esp)
jmp restore_all_xen # return to fixup code
----------
Is the information above sufficient?
It seems that csum_partial uses atomic_add_negative.
Is atomic_add_negative used to switch processing between the host OS and the guest OS?
I imagine that the panic occurred because the execution address held an incorrect value after processing switched between the host OS and the guest OS.
If an illegal address is generated by processing in the guest OS, would the EIP reported in the panic point to csum_partial?
So the problem is that someone has unmapped the memory behind the packet that is still being retransmitted. Could you try to determine the socket of the packet (skb->sk) and its IP/port numbers? That should help you find the application (which is probably not the process in which it crashed since it's in softirq context) that owns the socket and perhaps we can have a chance in reproducing it then. Thanks!

Hi, we've been experiencing a similar problem recently with debian etch and a 2.6.18 kernel. For us the workaround was to turn off rx/tx checksumming on the relevant network interface, like so:

ethtool -K eth0 rx off tx off

Cheers,
Matt

p.s. here's our oops message:

BUG: unable to handle kernel paging request at virtual address c081b000
printing eip:
c01bb497
0e5a2000 -> *pde = 00000000:c4871001
0e5a3000 -> *pme = 00000000:06fa3067
00fa3000 -> *pte = 00000000:00000000
Oops: 0000 [#1]
SMP
Modules linked in: netloop button ac battery ip6table_filter ip6_tables iptablen
CPU: 0
EIP: 0061:[<c01bb497>] Not tainted VLI
EFLAGS: 00010282 (2.6.18-6-xen-686 #1)
EIP is at csum_partial+0xd3/0x120
eax: 00000000 ebx: c01bb497 ecx: 0000000b edx: 0000059c
esi: c081b01c edi: 0000059c ebp: 00000040 esp: c88dfd84
ds: 007b es: 007b ss: 0069
Process python (pid: 9966, ti=c88de000 task=cf760000 task.ti=c88de000)
Stack: c081b000 00000040 c022da42 c081b000 0000059c 00000000 00000018 c6b808ac
00000001 0000002c 000005dc cfe33b3c c022e94e cfe33a00 0000059c c6b808ac
ce7b54f8 ce7b550c c88dfe84 c02323fb aedd2abb cdc1dce0 00000003 c88dfe84
Call Trace:
[<c022da42>] skb_checksum+0x112/0x27e
[<c022e94e>] pskb_expand_head+0xce/0x112
[<c02323fb>] skb_checksum_help+0x5d/0xac
[<d13d52ea>] ip_nat_fn+0x42/0x184 [iptable_nat]
[<d13d8092>] ipt_local_hook+0x76/0xcc [iptable_mangle]
[<d13d561e>] ip_nat_local_fn+0x34/0xaa [iptable_nat]
[<c024e3b8>] dst_output+0x0/0x7
[<c02472f0>] nf_iterate+0x30/0x61
[<c024e3b8>] dst_output+0x0/0x7
[<c0247416>] nf_hook_slow+0x3a/0x90
[<c024e3b8>] dst_output+0x0/0x7
[<c02505b0>] ip_queue_xmit+0x35f/0x3b3
[<c024e3b8>] dst_output+0x0/0x7
[<c0155fcd>] kmem_cache_alloc+0x4a/0x54
[<c022edb1>] alloc_skb_from_cache+0x48/0x110
[<c025df78>] tcp_transmit_skb+0x604/0x632
[<c025ecd4>] tcp_retransmit_skb+0x4e2/0x5c7
[<c0257e28>] tcp_enter_loss+0x1a1/0x1fd
[<c0260dab>] tcp_write_timer+0x0/0x5c9
[<c02611a3>] tcp_write_timer+0x3f8/0x5c9
[<c0123376>] run_timer_softirq+0x101/0x15c
[<c011f346>] __do_softirq+0x5e/0xc3
[<c011f3e5>] do_softirq+0x3a/0x4a
[<c0106125>] do_IRQ+0x48/0x53
[<c020c614>] evtchn_do_upcall+0x64/0x9b
[<c0104a51>] hypervisor_callback+0x3d/0x48
Code: a8 13 46 ac 13 46 b0 13 46 b4 13 46 b8 13 46 bc 13 46 c0 13 46 c4 13 46 c
EIP: [<c01bb497>] csum_partial+0xd3/0x120 SS:ESP 0069:c88dfd84
<0>Kernel panic - not syncing: Fatal exception in interrupt
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

Matthew, are you using anything like drbd? That seems to be a common thread in the other two reports. In any case, the culprit here is the owner of the socket. So it would really help if you can pin-point the port number and PID of the socket whose retransmitted packet triggered this. Other things like drbd would be iscsi, NFS, or anything that does TCP in the kernel.

Hi Herbert, yes we are using drbd. We are no longer experiencing the issue and I'm reluctant to remove the workaround as the problem only occurred on production services. I would be happy to help if I can, though. Is there anything I can do in hindsight?

Matt

Based on this information my conclusion is that there is a bug in drbd where it frees pages that are still owned by the TCP socket.

Since this looks like a drbd issue (see comment #18, and the common thread that all of the reported stacks are using drbd), and since we don't support drbd in RHEL-5, I'm going to close this as NOTABUG. If this can be reproduced without drbd, or someone finds other evidence to the contrary, please feel free to re-open the bug.

Chris Lalancette

*** Bug 666005 has been marked as a duplicate of this bug. ***
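For anyone applying the checksum-offload workaround mentioned in the thread, a small sketch of how it might be scripted follows. The `ethtool -K eth0 rx off tx off` command is taken from the comment above; the `rx-checksumming:` / `tx-checksumming:` lines are what `ethtool -k` normally prints, but the sample text here is hand-written for illustration rather than captured from the affected machines, and the helper names are hypothetical.

```python
def workaround_command(interface: str) -> list:
    """Argument list for the workaround quoted in the thread (not executed here)."""
    return ["ethtool", "-K", interface, "rx", "off", "tx", "off"]


def checksum_offload_state(ethtool_k_output: str) -> dict:
    """Extract rx/tx checksumming state from `ethtool -k <iface>` style output."""
    state = {}
    for line in ethtool_k_output.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in ("rx-checksumming", "tx-checksumming"):
            # Values look like "on", "off" or "on [fixed]".
            state[key] = value.strip().startswith("on")
    return state


# Hand-written sample in the usual `ethtool -k` layout (an assumption, not real output).
sample = """\
Offload parameters for eth0:
rx-checksumming: off
tx-checksumming: off
"""

print(" ".join(workaround_command("eth0")))
state = checksum_offload_state(sample)
print("workaround active:", not any(state.values()))
```

The command list can be handed to `subprocess.run` by an administrator; the setting does not persist across reboots by itself, so it would typically be reapplied from an interface start-up script.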
I am not good at English, so I apologize for any strange sentences. ;-)

The following kernel panic occurred.
------------------
BUG: unable to handle kernel paging request at virtual address e07ba040
printing eip:
c04d70a2
1027c000 -> *pde = 00000000:d7f6a001
0896a000 -> *pme = 00000000:030fb067
000fb000 -> *pte = 00000000:00000000
Oops: 0000 [#1]
SMP
last sysfs file: /class/misc/evtchn/dev
Modules linked in: xt_physdev ipt_MASQUERADE ip_conntrack_ftp iptable_nat ip_nat ip_conntrack nfnetlink iptable_filter ip_tables x_tables netloop netbk blktap blkbk drbd(U) autofs4 hidp rfcomm l2cap bluetooth sunrpc bridge dummy 8021q ipv6 dm_mirror dm_mod video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport sg ide_cd i2c_i801 i2c_core cdrom pcspkr tg3 serio_raw 8250_pnp 8250 serial_core r8169 ata_piix libata sd_mod scsi_mod raid1 ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU: 0
EIP: 0061:[<c04d70a2>] Not tainted VLI
EFLAGS: 00210282 (2.6.18-8.1.8.el5xen #1)
EIP is at csum_partial+0xca/0x120
eax: 00000000 ebx: c04d70a2 ecx: 0000000b edx: 000005a8
esi: e07ba068 edi: 000005a8 ebp: 00000034 esp: c06fbdfc
ds: 007b es: 007b ss: 0069
Process at-spi-registry (pid: 11203, ti=c06fb000 task=ce1b1000 task.ti=c9ace000)
Stack: e07ba000 00000034 c059778e e07ba040 000005a8 00000000 00000010 d226d76c
00000000 00000020 000005dc c08d6d14 c0598693 c08d6c00 000005a8 d226d76c
d784bccc d784bce0 c06fbef8 c059c0c3 4c738383 c8eec060 00000003 c06fbef8
Call Trace:
[<c059778e>] skb_checksum+0x111/0x27b
[<c0598693>] pskb_expand_head+0xcf/0x113
[<c059c0c3>] skb_checksum_help+0x64/0xb3
[<e14462ee>] ip_nat_fn+0x42/0x185 [iptable_nat]
[<e1446628>] ip_nat_local_fn+0x34/0xa4 [iptable_nat]
[<c05b89a4>] dst_output+0x0/0x7
[<c05b1738>] nf_iterate+0x30/0x61
[<c05b89a4>] dst_output+0x0/0x7
[<c05b185e>] nf_hook_slow+0x3a/0x90
[<c05b89a4>] dst_output+0x0/0x7
[<c05babc3>] ip_queue_xmit+0x37e/0x3cf
[<c05b89a4>] dst_output+0x0/0x7
[<e1068a97>] scsi_dispatch_cmd+0x21f/0x28c [scsi_mod]
[<c041556a>] enqueue_task+0x29/0x39
[<c045e7cd>] kmem_cache_alloc+0x54/0x5e
[<c05c8527>] tcp_transmit_skb+0x5e4/0x612
[<c05c9265>] tcp_retransmit_skb+0x4b7/0x595
[<c05c247b>] tcp_enter_loss+0x1a2/0x1ff
[<c05cb311>] tcp_write_timer+0x0/0x5d3
[<c05cb710>] tcp_write_timer+0x3ff/0x5d3
[<c0424a25>] run_timer_softirq+0x101/0x15c
[<c041ffa7>] __do_softirq+0x5e/0xc3
[<c040679c>] do_softirq+0x56/0xae
[<c040673d>] do_IRQ+0xa5/0xae
[<c053a0ad>] evtchn_do_upcall+0x64/0x9b
[<c0404ec5>] hypervisor_callback+0x3d/0x48
[<c05399fc>] force_evtchn_callback+0xa/0xc
[<c05ee1fd>] unix_write_space+0x3f/0x69
[<c0596ccb>] sock_wfree+0x21/0x36
[<c059849b>] __kfree_skb+0x97/0xe3
[<c05ecc9f>] unix_stream_recvmsg+0x33f/0x4a4
[<c0592eb4>] do_sock_read+0xae/0xb7
[<c0593411>] sock_aio_read+0x53/0x61
[<c0461fa7>] do_sync_read+0xb6/0xf1
[<c042cc1d>] autoremove_wake_function+0x0/0x2d
[<c059346c>] sock_ioctl+0x0/0x1b3
[<c04628c1>] vfs_read+0xb0/0x141
[<c0462cfe>] sys_read+0x3c/0x63
[<c0404cff>] syscall_call+0x7/0xb
=======================
Code: 9c 13 46 a0 13 46 a4 13 46 a8 13 46 ac 13 46 b0 13 46 b4 13 46 b8 13 46 bc 13 46 c0 13 46 c4 13 46 c8 13 46 cc 13 46 d0 13 46 d4 <13> 46 d8 13 46 dc 13 46 e0 13 46 e4 13 46 e8 13 46 ec 13 46 f0
EIP: [<c04d70a2>] csum_partial+0xca/0x120 SS:ESP 0069:c06fbdfc
<0>Kernel panic - not syncing: Fatal exception in interrupt
BUG: warning at arch/i386/kernel/smp-xen.c:529/smp_call_function() (Not tainted)
[<c040db7f>] smp_call_function+0x59/0xfe
[<c040dc37>] smp_send_stop+0x13/0x1e
[<c041b470>] panic+0x45/0x16d
[<c040595a>] die+0x24e/0x282
[<c05f6812>] do_page_fault+0xa7a/0xbeb
[<c04d70a2>] csum_partial+0xca/0x120
[<c05f5d98>] do_page_fault+0x0/0xbeb
[<c0404e83>] error_code+0x2b/0x30
[<c04d70a2>] csum_partial+0xca/0x120
[<c04d70a2>] csum_partial+0xca/0x120
[<c059778e>] skb_checksum+0x111/0x27b
[<c0598693>] pskb_expand_head+0xcf/0x113
[<c059c0c3>] skb_checksum_help+0x64/0xb3
[<e14462ee>] ip_nat_fn+0x42/0x185 [iptable_nat]
[<e1446628>] ip_nat_local_fn+0x34/0xa4 [iptable_nat]
[<c05b89a4>] dst_output+0x0/0x7
[<c05b1738>] nf_iterate+0x30/0x61
[<c05b89a4>] dst_output+0x0/0x7
[<c05b185e>] nf_hook_slow+0x3a/0x90
[<c05b89a4>] dst_output+0x0/0x7
[<c05babc3>] ip_queue_xmit+0x37e/0x3cf
[<c05b89a4>] dst_output+0x0/0x7
[<e1068a97>] scsi_dispatch_cmd+0x21f/0x28c [scsi_mod]
[<c041556a>] enqueue_task+0x29/0x39
[<c045e7cd>] kmem_cache_alloc+0x54/0x5e
[<c05c8527>] tcp_transmit_skb+0x5e4/0x612
[<c05c9265>] tcp_retransmit_skb+0x4b7/0x595
[<c05c247b>] tcp_enter_loss+0x1a2/0x1ff
[<c05cb311>] tcp_write_timer+0x0/0x5d3
[<c05cb710>] tcp_write_timer+0x3ff/0x5d3
[<c0424a25>] run_timer_softirq+0x101/0x15c
[<c041ffa7>] __do_softirq+0x5e/0xc3
[<c040679c>] do_softirq+0x56/0xae
[<c040673d>] do_IRQ+0xa5/0xae
[<c053a0ad>] evtchn_do_upcall+0x64/0x9b
[<c0404ec5>] hypervisor_callback+0x3d/0x48
[<c05399fc>] force_evtchn_callback+0xa/0xc
[<c05ee1fd>] unix_write_space+0x3f/0x69
[<c0596ccb>] sock_wfree+0x21/0x36
[<c059849b>] __kfree_skb+0x97/0xe3
[<c05ecc9f>] unix_stream_recvmsg+0x33f/0x4a4
[<c0592eb4>] do_sock_read+0xae/0xb7
[<c0593411>] sock_aio_read+0x53/0x61
[<c0461fa7>] do_sync_read+0xb6/0xf1
[<c042cc1d>] autoremove_wake_function+0x0/0x2d
[<c059346c>] sock_ioctl+0x0/0x1b3
[<c04628c1>] vfs_read+0xb0/0x141
[<c0462cfe>] sys_read+0x3c/0x63
[<c0404cff>] syscall_call+0x7/0xb
=======================
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
-----------
Additional info:

The system runs for 20-30 days, and then one day the kernel suddenly panics.
The panic message suggests that the problem occurs in csum_partial, which I find hard to believe.
csum_partial_copy_generic is a function that is used very often. If it had a problem, the kernel would panic far more frequently.
Yet the system runs for 20-30 days. Why?

The environment:
DRBD 8.0.5.1 (compiled from source)
heartbeat 2.1.2 (compiled from source)
Xen hypervisor 3.0.3-25.el5 (included in RHEL5.0)
Linux kernel 2.6.18-8.el5xen (included in RHEL5.0)

The failover cluster was built using these packages.