Bug 443621
Field | Value
---|---
Summary | kernel panic xen cluster.
Product | Red Hat Enterprise Linux 5
Component | kernel-xen
Version | 5.0
Hardware | i386
OS | Linux
Status | CLOSED NOTABUG
Severity | high
Priority | low
Target Milestone | rc
Reporter | makoto nohara <nohara.makoto>
Assignee | Xen Maintainance List <xen-maint>
QA Contact | Martin Jenner <mjenner>
CC | clalance, matt.baker, prickett233, tao, xen-maint
Doc Type | Bug Fix
Last Closed | 2009-05-04 15:46:58 UTC
Bug Blocks | 492568
Description
makoto nohara
2008-04-22 15:49:33 UTC
Hardware environment:
------------------
PowerEdge 860 (CPU: Intel 3040)
Memory: 4 GB
HDD: 80 GB x2 (software RAID1)
-----------------
The failover cluster is built from machines similar to the one above. RHES4.5 runs on Xen on RHEL5.0; in other words, Dom0 = RHEL5.0 and DomU = RHES4.5. When trouble occurs, the VM (DomU) is started on another machine.

The Xeon 3040 installed in the Dell 860 has a microcode bug, but a BIOS version that corrects the bug is in use.

I'm sorry that the necessary information is only being added piecemeal as additional comments. After writing it, I wonder, "Is this information really necessary?"

The kernel/hypervisor versions you are using here:

Xen hypervisor 3.0.3-25.el5 (included in RHEL5.0)
Linux kernel 2.6.18-8.el5xen (included in RHEL5.0)

are still the RHEL-5.0 GA releases. Could you re-test with the RHEL-5.2 errata applied to the system?

The RHEL-5.2 system is now being tested. However, it has not yet been running continuously for long enough. A little more time is needed before there is a result.

*** Bug 479756 has been marked as a duplicate of this bug. ***

Testing on the RHEL-5.2 system: a panic has occurred there as well. This panic might have a different cause from the panic that occurred before.
The environment:

Linux kernel 2.6.18-92.1.1.el5xen (downloaded from Red Hat Network)
drbd-8.2.6-3 (compiled from source)
heartbeat-2.1.3-1 (compiled from source)
xen-3.0.3-64.el5 (included in RHEL5.2)

------------
KERNEL: /usr/lib/debug/lib/modules/2.6.18-92.1.1.el5xen/vmlinux
DUMPFILE: /mnt/127.0.0.1-2009-01-08-12:33:06/vmcore
CPUS: 2
DATE: Thu Jan 8 12:32:46 2009
UPTIME: 22 days, 20:09:27
LOAD AVERAGE: 1.73, 1.23, 1.17
TASKS: 226
NODENAME: XXXXX1
RELEASE: 2.6.18-92.1.1.el5xen
VERSION: #1 SMP Thu May 22 09:31:19 EDT 2008
MACHINE: i686 (1866 Mhz)
MEMORY: 520 MB
PANIC: "Oops: 0000 [#1]" (check log for details)
PID: 979
COMMAND: "nautilus"
TASK: de5d5aa0 [THREAD_INFO: cfefc000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
------------

crash log
-------------
BUG: unable to handle kernel paging request at virtual address e071d668
printing eip:
c04e325a
00ecb000 -> *pde = 00000000:c6a49001
197c7000 -> *pme = 00000000:3e0fc067
000fc000 -> *pte = 00000000:00000000
Oops: 0000 [#1]
SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Modules linked in: xt_physdev ip_conntrack_ftp netloop netbk blktap blkbk ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables drbd(U) autofs4 hidp rfcomm l2cap bluetooth sunrpc bridge dummy 8021q dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport sg i3000_edac edac_mc ide_cd r8169 i2c_i801 i2c_core pcspkr serial_core cdrom tg3 serio_raw ata_piix libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU: 0
EIP: 0061:[<c04e325a>] Tainted: G VLI
EFLAGS: 00010296 (2.6.18-92.1.1.el5xen #1)
EIP is at csum_partial+0xca/0x120
eax: 00000000 ebx: c04e325a ecx: 0000000b edx: 000005a8
esi: e071d690 edi: 000005a8 ebp: 00000034 esp: c071cde8
ds: 007b es: 007b ss: 0069
Process nautilus (pid: 979, ti=c071c000 task=de5d5aa0 task.ti=cfefc000)
Stack: e071d000 00000034 c05a9f12 e071d668 000005a8 00000000 00000010 db73112c
       00000000 00000020 000005dc cbea7f14 c05aae25 cbea7e00 000005a8 db73112c
       c98aa2cc c98aa2e0 c071cef8 c05ae744 c4d5de4c 00000000 00000003 c071cef8
Call Trace:
 [<c05a9f12>] skb_checksum+0x111/0x27b
 [<c05aae25>] pskb_expand_head+0xd6/0x11a
 [<c05ae744>] skb_checksum_help+0x64/0xb3
 [<e14bd2ee>] ip_nat_fn+0x42/0x185 [iptable_nat]
 [<c0469c92>] kmem_cache_alloc+0x54/0x5e
 [<e14bd628>] ip_nat_local_fn+0x34/0xa4 [iptable_nat]
 [<c05cb3c8>] dst_output+0x0/0x7
 [<c05c3e3c>] nf_iterate+0x30/0x61
 [<c05cb3c8>] dst_output+0x0/0x7
 [<c05c3f62>] nf_hook_slow+0x3a/0x90
 [<c05cb3c8>] dst_output+0x0/0x7
 [<c05cd711>] ip_queue_xmit+0x3cd/0x41e
 [<c05cb3c8>] dst_output+0x0/0x7
 [<c041b708>] __activate_task+0x1c/0x29
 [<c041bfd4>] try_to_wake_up+0x309/0x313
 [<e14f3dff>] net_rx_action+0x771/0x7de [netbk]
 [<c05db34d>] tcp_transmit_skb+0x5e4/0x612
 [<c043190c>] autoremove_wake_function+0xd/0x2d
 [<c041a99f>] __wake_up_common+0x2f/0x53
 [<c05dc0a6>] tcp_retransmit_skb+0x4c0/0x59e
 [<c041b567>] __wake_up+0x2a/0x3d
 [<c05d5223>] tcp_enter_loss+0x1a2/0x1ff
 [<c05de19c>] tcp_write_timer+0x0/0x5e4
 [<c05de5a1>] tcp_write_timer+0x405/0x5e4
 [<c0429600>] run_timer_softirq+0x101/0x15c
 [<c042613e>] __do_softirq+0x5e/0xc3
 [<c0406edf>] do_softirq+0x56/0xaf
 [<c0406e80>] do_IRQ+0xa5/0xae
 [<c0549b63>] evtchn_do_upcall+0x64/0x9b
 [<c04055d9>] hypervisor_callback+0x3d/0x48
=======================
Code: 9c 13 46 a0 13 46 a4 13 46 a8 13 46 ac 13 46 b0 13 46 b4 13 46 b8 13 46 bc 13 46 c0 13 46 c4 13 46 c8 13 46 cc 13 46 d0 13 46 d4 <13> 46 d8 13 46 dc 13 46 e0 13 46 e4 13 46 e8 13 46 ec 13 46 f0
EIP: [<c04e325a>] csum_partial+0xca/0x120 SS:ESP 0069:c071cde8
-------------

crash> bt
PID: 979 TASK: de5d5aa0 CPU: 0 COMMAND: "nautilus"
 #0 [c071cd0c] die at c040606e
 #1 [c071cd38] do_page_fault at c060abfc
 #2 [c071cdb0] error_code (via page_fault) at c0405595
    EAX: 00000000 EBX: c04e325a ECX: 0000000b EDX: 000005a8 EBP: 00000034
    DS: 007b ESI: e071d690 ES: 007b EDI: 000005a8
    CS: 0061 EIP: c04e325a ERR: ffffffff EFLAGS: 00010296
 #3 [c071cde4] csum_partial at c04e325a
 #4 [c071cdf0] skb_checksum at c05a9f0d
 #5 [c071ce34] skb_checksum_help at c05ae73f
 #6 [c071ce48] ip_nat_fn at e14bd2e9
 #7 [c071ce6c] ip_nat_local_fn at e14bd623
 #8 [c071ce80] nf_iterate at c05c3e39
 #9 [c071cea0] nf_hook_slow at c05c3f5d
#10 [c071cecc] ip_queue_xmit at c05cd70c
#11 [c071cf60] tcp_transmit_skb at c05db34b
#12 [c071cf94] tcp_retransmit_skb at c05dc0a1
#13 [c071cfbc] tcp_write_timer at c05de59c
#14 [c071cfcc] run_timer_softirq at c04295fe
#15 [c071cfe8] __do_softirq at c042613c
--- <soft IRQ> ---
------------

crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 c066f2c0 RU 0.0 0 0 [swapper]
0 1 1 c0d60550 RU 0.0 0 0 [swapper]
1 0 0 c0d60aa0 IN 0.1 2076 680 init
2 1 0 c0d60000 IN 0.0 0 0 [migration/0]
3 1 0 c0198aa0 IN 0.0 0 0 [ksoftirqd/0]
4 1 0 c0198550 IN 0.0 0 0 [watchdog/0]
5 1 1 c0198000 IN 0.0 0 0 [migration/1]
6 1 1 c5568aa0 IN 0.0 0 0 [ksoftirqd/1]
7 1 1 c5568550 IN 0.0 0 0 [watchdog/1]
8 1 0 c5568000 IN 0.0 0 0 [events/0]
9 1 1 c0cfeaa0 IN 0.0 0 0 [events/1]
10 1 1 c0cfe550 IN 0.0 0 0 [khelper]
11 1 0 c0cfe000 IN 0.0 0 0 [kthread]
13 11 0 c0cc2550 IN 0.0 0 0 [xenwatch]
14 11 0 c0cc2000 IN 0.0 0 0 [xenbus]
17 11 0 c0cb8000 IN 0.0 0 0 [kblockd/0]
18 11 1 c0c9eaa0 IN 0.0 0 0 [kblockd/1]
19 11 0 c0c9e550 IN 0.0 0 0 [kacpid]
98 11 0 c0c0b000 IN 0.0 0 0 [cqueue/0]
99 11 1 c0c11aa0 IN 0.0 0 0 [cqueue/1]
103 11 0 c0c1b550 IN 0.0 0 0 [khubd]
105 11 0 c0c20aa0 IN 0.0 0 0 [kseriod]
170 11 0 c07d7aa0 IN 0.0 0 0 [pdflush]
171 11 1 c07d1000 IN 0.0 0 0 [pdflush]
172 11 0 c07d1550 IN 0.0 0 0 [kswapd0]
173 11 0 c07d1aa0 IN 0.0 0 0 [aio/0]
174 11 1 c0c4a000 IN 0.0 0 0 [aio/1]
320 11 1 c0c6aaa0 IN 0.0 0 0 [kpsmoused]
354 11 0 c0c58aa0 IN 0.0 0 0 [ata/0]
355 11 1 c0c58550 IN 0.0 0 0 [ata/1]
356 11 0 c0c58000 IN 0.0 0 0 [ata_aux]
360 11 0 c0c79550 IN 0.0 0 0 [scsi_eh_0]
361 11 0 c0c89550 IN 0.0 0 0 [scsi_eh_1]
364 11 0 c0c2eaa0 RU 0.0 0 0 [md1_raid1]
367 11 0 c0c28aa0 IN 0.0 0 0 [md0_raid1]
368 11 0 c0c28000 IN 0.0 0 0 [kjournald]
396 11 1 c0c28550 IN 0.0 0 0 [kauditd]
430 1 1 c0c1b000 IN 0.2 2440 888 udevd
675 31848 1 de1bd550 IN 1.3 24024 7176 gnome-session
823 675 0 cb8bc550 DE 0.0 0 0 Xsession
826 675 1 d0120550 IN 0.1 6492 604 ssh-agent
855 1 0 d1536aa0 IN 0.1 2824 788 dbus-launch
857 1 0 d0705aa0 IN 0.2 2756 956 dbus-daemon
875 1 1 d1ea1aa0 IN 0.7 8224 3632 gconfd-2
876 1 1 c8f59000 IN 1.5 38228 7844 scim-panel-gtk
877 1 1 d1536000 IN 1.5 38228 7844 scim-panel-gtk
878 1 1 cb0f5550 IN 0.2 9224 808 scim-launcher
903 11 1 df218000 IN 0.0 0 0 [kedac]
905 1 0 c6fa3aa0 IN 0.1 2576 764 gnome-keyring-d
907 1 1 d64b8aa0 IN 1.5 34872 8088 gnome-settings-
946 1 1 de979aa0 ?? 1.5 34872 8088 gnome-settings-
972 1 1 e0754aa0 IN 2.3 28320 12200 metacity
977 1 1 d1c3baa0 IN 3.0 58048 15960 gnome-panel
> 979 1 0 de5d5aa0 RU 11.6 137020 61616 nautilus
983 1 1 de5d5000 IN 0.6 40764 2996 bonobo-activati
984 1 1 c62b4000 IN 0.6 40764 2996 bonobo-activati
985 1 1 d257eaa0 IN 0.9 23556 4880 gnome-volume-ma
987 1 0 deefc000 IN 1.6 45820 8428 eggcups
989 1 0 c62b4550 IN 0.7 12392 3660 gnome-vfs-daemo
1008 1 0 dfd19aa0 IN 1.0 15412 5068 bt-applet
1016 1 1 d480caa0 IN 5.2 119356 27476 xulrunner-bin
1021 1 1 d64b8550 IN 1.9 46208 9992 nm-applet
1023 1 1 de77e550 IN 0.9 16216 4696 pam-panel-icon
1024 1 0 d257e550 RU 1.2 46028 6276 gnome-power-man
1025 1023 1 de92c000 IN 0.1 1856 620 pam_timestamp_c
1060 1 0 d7f31aa0 IN 2.7 57148 14348 wnck-applet
1062 1 0 dc7c3aa0 IN 1.7 76700 8832 trashapplet
1125 1 1 dc7c3000 IN 5.2 119356 27476 xulrunner-bin
1155 1 1 c6fa3000 IN 0.2 8104 1228 scim-bridge
1197 1 1 c6fa3550 IN 1.5 24076 7876 notification-ar
1199 1 1 ccd97aa0 IN 2.6 39724 13968 clock-applet
1201 1 1 d1536550 IN 2.6 56968 13824 mixer_applet2
1203 1 1 d56f4aa0 IN 5.2 119356 27476 xulrunner-bin
1302 1 1 c8949aa0 IN 0.3 43468 1428 pcscd
1303 1 1 d7f95000 IN 5.2 119356 27476 xulrunner-bin
1487 11 0 df157aa0 IN 0.0 0 0 [kmpathd/0]
1488 11 1 dea3a550 IN 0.0 0 0 [kmpathd/1]
1515 11 0 df9eaaa0 IN 0.0 0 0 [kjournald]
1771 1 1 cd969550 IN 0.9 17996 4676 gnome-screensav
2569 1 1 c07f3aa0 IN 0.2 13188 812 auditd
2570 1 1 c0c4a550 IN 0.2 13188 812 auditd
2571 2569 1 dec66aa0 IN 0.2 14112 980 audispd
2572 2569 1 c07f9550 IN 0.2 14112 980 audispd
2594 1 0 c0c11000 IN 0.1 1732 620 syslogd
2597 1 0 deefc550 RU 0.1 1684 408 klogd
2609 1 1 de2e4000 IN 0.1 2444 368 irqbalance
2630 1 0 c0c33aa0 IN 0.1 1820 548 portmap
2659 1 0 c0c2e000 IN 0.1 1832 740 rpc.statd
2699 1 0 c0c20550 IN 0.1 1848 396 mdadm
2729 1 1 de979550 IN 0.1 5452 572 rpc.idmapd
2789 1 1 df157000 IN 0.2 2888 1104 dbus-daemon
2800 1 0 c0cb8aa0 IN 0.1 2160 780 hcid
2806 1 0 c0c89000 IN 0.1 1752 520 sdpd
2829 1 0 c026caa0 IN 0.0 0 0 [krfcommd]
2870 1 0 c07f9aa0 IN 0.3 43468 1428 pcscd
2885 1 0 c0c4aaa0 IN 0.3 43468 1428 pcscd
2891 1 0 de8a6550 IN 0.1 1924 464 hidd
2907 1 0 c0c6a000 IN 0.2 10852 1320 automount
2908 1 1 de5cdaa0 IN 0.2 10852 1320 automount
2909 1 1 c0c0baa0 IN 0.2 10852 1320 automount
2912 1 1 c0c64aa0 IN 0.2 10852 1320 automount
2915 1 0 df97daa0 IN 0.2 10852 1320 automount
2926 1 0 c0cc2aa0 IN 0.1 1684 544 acpid
2937 1 0 c0c9e000 IN 0.1 5084 764 hpiod
2942 1 1 de1bd000 IN 0.9 14568 4788 python
2957 1 0 df9ea550 IN 0.2 7000 1056 sshd
2968 1 1 de2e4550 IN 0.5 10936 2416 cupsd
2980 1 0 df218aa0 IN 0.2 2736 904 xinetd
2995 1 0 df9ea000 RU 0.0 0 0 [drbd0_worker]
3005 1 0 c0c0b550 RU 0.0 0 0 [drbd0_receiver]
3013 1 0 c07f3550 IN 0.0 0 0 [drbd0_asender]
3030 1 0 de5cd000 IN 0.2 4412 1096 ha_logd
3039 3030 0 de2e4aa0 RU 0.1 4412 796 ha_logd
3079 1 1 de5d5550 IN 2.3 12108 12108 heartbeat
3090 1 1 c0c50aa0 IN 0.1 1908 488 gpm
3101 1 0 c07f9000 IN 0.2 6220 1120 crond
3120 3079 1 c0c2e550 ?? 1.0 5512 5512 heartbeat
3121 3079 1 dea3a000 IN 1.0 5508 5508 heartbeat
3122 3079 1 c07d7000 IN 1.0 5508 5508 heartbeat
3141 1 1 dea3aaa0 IN 0.4 4320 2132 xfs
3162 1 0 c0c50550 IN 0.1 2256 440 atd
3185 1 0 c026c000 IN 0.3 5016 1664 libvirtd
3212 1 0 deefcaa0 IN 0.1 4644 412 rhnsd
3236 1 0 dec66550 IN 0.7 5844 3924 hald
3247 3236 0 df218550 IN 0.2 3148 1084 hald-runner
3337 3185 0 c026c550 IN 0.1 1828 748 dnsmasq
3358 3247 0 d1ea1550 IN 0.2 2008 808 hald-addon-keyb
3360 3247 0 df157550 IN 0.2 2012 812 hald-addon-acpi
3365 3247 0 d232a550 IN 0.1 1968 660 hald-addon-stor
3618 1 1 d074c000 IN 0.2 2252 1112 xenstored
3623 1 1 d1c3b550 IN 0.8 13032 4004 python
3624 3623 0 d066b550 IN 1.0 96184 5552 python
3626 1 0 c0c89aa0 IN 0.1 12212 608 xenconsoled
3627 1 1 e0754550 IN 0.1 13544 792 blktapctrl
3628 1 0 d0120aa0 IN 0.1 12212 608 xenconsoled
3629 1 0 c0c6a550 IN 0.1 13544 792 blktapctrl
3630 3623 0 df97d000 IN 1.0 96184 5552 python
3633 3623 0 d06cd550 IN 1.0 96184 5552 python
3634 3623 1 c0c79000 IN 1.0 96184 5552 python
3900 3623 0 e0754000 IN 1.0 96184 5552 python
3901 3623 1 d06e0550 IN 1.0 96184 5552 python
3902 1 0 d06e0aa0 IN 1.9 25632 10208 yum-updatesd
3907 1 0 d0705000 IN 0.2 2696 1224 gam_server
3974 1 1 d0705550 ?? 0.2 5472 1220 livxen.sh
3975 1 1 d071d550 IN 0.2 5472 1224 xentop-logger.s
3976 1 1 d0720000 IN 0.2 5472 1208 swaps-logger.sh
4211 1 0 d074c550 IN 0.1 1996 520 smartd
4223 1 1 cb792000 IN 0.1 1668 456 mingetty
4224 1 0 cbb9b000 IN 0.1 1668 456 mingetty
4227 1 0 dec66000 IN 0.1 1668 452 mingetty
4236 1 0 cb792550 IN 0.1 1668 456 mingetty
4238 1 1 c07f3000 IN 0.1 1664 452 mingetty
4244 1 1 cb8bcaa0 IN 0.1 1668 456 mingetty
4246 1 0 d0720aa0 IN 0.6 16652 3028 gdm-binary
4471 4246 1 d232aaa0 IN 0.5 17256 2764 gdm-binary
4473 1 0 c8110550 IN 0.8 28332 4200 gdm-rh-security
4482 4471 1 cbb9b550 IN 2.9 38816 15204 Xorg
4578 1 0 d073e000 RU 0.3 43468 1428 pcscd
4584 1 1 cb6da550 IN 0.8 28332 4200 gdm-rh-security
4622 11 1 d071daa0 IN 0.0 0 0 [kjournald]
4745 1 1 df97d550 IN 0.1 15012 724 tapdisk
4746 1 1 de8a6aa0 IN 0.1 15012 724 tapdisk
4751 1 1 c872d000 IN 0.1 15012 720 tapdisk
4753 1 1 d06cdaa0 IN 0.1 15012 720 tapdisk
4786 1 1 c0c33000 IN 0.9 32720 4732 qemu-dm
5014 1 0 c6c8aaa0 IN 0.9 32720 4732 qemu-dm
5033 1 0 cb0f5aa0 IN 0.9 32720 4732 qemu-dm
5083 11 1 c0c79aa0 IN 0.0 0 0 [xvd 1]
5084 11 0 c6aacaa0 IN 0.0 0 0 [xvd 1]
5838 3101 0 d232a000 IN 0.3 6796 1504 crond
5839 5838 0 c72c9000 DE 0.0 0 0 python
5840 5838 0 d64b8000 IN 0.4 7916 2268 sendmail
7168 3976 1 d480c550 IN 0.1 4648 488 sleep
7169 1 0 c92fc000 IN 11.6 137020 61616 nautilus
7178 3974 1 d06cd000 RU 0.1 5472 488 livxen.sh
> 7179 7178 1 c07d7550 RU 1.0 12144 5356 python
7180 7178 1 ca73e550 ?? 0.1 4988 768 grep
7181 7178 1 cfb9caa0 RU 0.0 5472 168 livxen.sh
14882 4471 0 d074caa0 IN 1.4 24180 7296 gnome-session
14923 14882 0 d1ea1000 DE 0.0 0 0 Xsession
14926 14882 1 c0835aa0 IN 0.1 6492 640 ssh-agent
14955 1 0 c62b4aa0 IN 0.1 2784 624 dbus-launch
14956 1 1 c0c11550 IN 0.2 2760 980 dbus-daemon
14962 1 0 c0c20000 IN 0.7 8212 3580 gconfd-2
14975 1 1 c872daa0 IN 0.4 27744 1972 scim-launcher
14978 1 0 d06e0000 IN 0.1 2580 764 gnome-keyring-d
14980 1 0 cb0f5000 IN 1.5 34928 8216 gnome-settings-
14982 1 1 dfd19550 ?? 1.5 34928 8216 gnome-settings-
14997 1 1 de77e000 IN 2.5 28884 13152 metacity
15000 1 0 c8110aa0 IN 0.1 7112 768 scim-helper-man
15001 1 0 d1c3b000 IN 1.5 38420 7816 scim-panel-gtk
15002 1 0 c8110000 IN 1.5 38420 7816 scim-panel-gtk
15003 1 1 c6aac550 IN 0.2 9220 812 scim-launcher
15012 1 1 d066baa0 IN 3.1 58344 16488 gnome-panel
15017 1 1 c0c33550 IN 3.9 96664 20820 nautilus
15021 1 1 cbb9baa0 IN 0.6 39736 3020 bonobo-activati
15023 1 1 c872d550 IN 1.6 45816 8428 eggcups
15025 1 1 c6aac000 IN 0.7 12384 3692 gnome-vfs-daemo
15028 1 1 c0cb8550 IN 0.6 39736 3020 bonobo-activati
15029 1 1 d0120000 IN 0.9 23556 4948 gnome-volume-ma
15039 1 0 de8a6000 IN 0.9 15400 5044 bt-applet
15046 1 0 c6c8a550 IN 4.0 39544 21388 puplet
15048 1 0 de77eaa0 IN 1.9 46208 9976 nm-applet
15060 1 1 cb8bc000 IN 0.9 16216 4692 pam-panel-icon
15062 15060 1 d073e550 IN 0.1 1856 620 pam_timestamp_c
15064 1 0 c6645aa0 IN 0.5 18240 2636 escd
15065 1 1 de979000 IN 0.3 43468 1428 pcscd
15067 1 1 d0720550 IN 1.2 46032 6360 gnome-power-man
15068 1 1 dfd19000 IN 0.5 18240 2636 escd
15111 1 1 c6645550 IN 2.8 57336 14672 wnck-applet
15113 1 1 c0835550 IN 2.6 87652 13928 trashapplet
15141 1 1 de1bdaa0 IN 0.2 2480 880 mapping-daemon
15267 1 1 db67faa0 IN 1.5 24076 7836 notification-ar
15270 1 1 c0c64550 IN 2.6 39740 13976 clock-applet
15272 1 1 d066b000 IN 3.0 59112 15884 mixer_applet2
15414 1 0 dac5baa0 IN 3.8 79896 20144 gnome-terminal
15417 1 1 db070aa0 IN 0.2 8104 1284 scim-bridge
15418 15414 1 c0c1baa0 IN 0.1 2484 712 gnome-pty-helpe
15424 15414 1 c6c8a000 IN 0.3 5476 1516 bash
15425 1 0 db67f000 ?? 3.8 79896 20144 gnome-terminal
15907 1 1 dac5b000 IN 0.9 18012 4944 gnome-screensav
15963 15424 1 cfef5550 IN 1.0 7840 5276 vncviewer
25896 1771 0 d7f31000 RU 2.5 28268 13456 floaters
26928 3975 1 c0c50000 IN 0.1 4652 488 sleep
31836 2980 1 c6645000 IN 2.7 18844 14416 Xvnc
31848 4246 1 c0c64000 IN 0.5 17248 2684 gdm-binary
-------------

Is this problem related to the one that occurs on RHEL5.0, or is it a new problem?

The stack trace looks to be about the same to me, so it looks like the same crash.

Chris Lalancette

Here is some additional information.
> #0 [c071cd0c] die at c040606e
> #1 [c071cd38] do_page_fault at c060abfc
> #2 [c071cdb0] error_code (via page_fault) at c0405595
What is at c040606e?
I displayed it with kmem:
----------
crash> kmem c040606e
c040606e (T) die+568 ../debug/kernel-2.6.18/linux-2.6.18.i686/arch/i386/kernel/traps-xen.c: 469
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c10080c0 406000 0 0 1 400
crash>
----------
I examined traps-xen.c.
traps-xen.c: 469
----------
if (in_interrupt())
        panic("Fatal exception in interrupt"); /* <- line 469 */
----------
The following comment is written at line 382.
----------
/* This is gone through when something in the kernel
* has done something bad and is about to be terminated.
*/
----------
something bad?
How can I examine "something bad"?
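One way to start examining this is to cross-check the register dump against the `Code:` line of the oops quoted earlier. The sketch below is only an illustration under stated assumptions: it assumes the byte marked `<13>` begins an x86 `adc r32, r/m32` instruction (opcode 0x13) followed by a ModRM byte and an 8-bit displacement, and the helper name `decode_modrm_disp8` is hypothetical. It handles only that one addressing form.

```python
# Hypothetical helper (not part of any tool): decode a ModRM byte that uses
# the [reg + disp8] addressing form (mod == 0b01, no SIB byte).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_modrm_disp8(modrm, disp8):
    mod = (modrm >> 6) & 0b11
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only the [reg + disp8] form is handled here")
    # The displacement is a signed byte.
    disp = disp8 - 0x100 if disp8 >= 0x80 else disp8
    return REG32[reg], REG32[rm], disp


# Bytes taken from the oops: <13> 46 d8
dest, base, disp = decode_modrm_disp8(0x46, 0xD8)

# Register value taken from the same oops (esi: e071d690).
regs = {"esi": 0xE071D690}
fault_address = (regs[base] + disp) & 0xFFFFFFFF

print(f"adc {disp:#x}(%{base}),%{dest}")           # adc -0x28(%esi),%eax
print(f"effective address: {fault_address:08x}")   # effective address: e071d668
```

The computed address is the same as the one in the "unable to handle kernel paging request at virtual address e071d668" line, which suggests that the faulting instruction is a plain read from the memory that esi points into.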
That's not the interesting part of the trace; that's just showing you that something happened that we didn't like, and now we are going to panic. Where you want to start looking is right before the error_code, namely at stack point #3. The instruction there is the one that caused the crash; you have to look at it and figure out what was going on at that point to cause it.

Chris Lalancette

I understand neither C nor assembler. Therefore, I am picking out the parts that I expect might be related. This is the limit of my ability. My English is also limited; I'm sorry for the strange sentences.

-----------------
crash> kmem c04e325a
c04e325a (T) csum_partial+202 include/asm/atomic.h: 165
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c1009c60 4e3000 0 0 1 400
crash>

/**
 * atomic_add_negative - add and test if negative
 * @v: pointer of type atomic_t
 * @i: integer value to add
 *
 * Atomically adds @i to @v and returns true
 * if the result is negative, or false when
 * result is greater than or equal to zero.
 */
static __inline__ int atomic_add_negative(int i, atomic_t *v)
{
        unsigned char c;

        __asm__ __volatile__( /* <- line 165 */
                LOCK_PREFIX "addl %2,%0; sets %1"
                :"+m" (v->counter), "=qm" (c)
                :"ir" (i) : "memory");
        return c;
}
-----------------

-----------------
crash> kmem c060abfc
c060abfc (T) do_page_fault+2688 ../debug/kernel-2.6.18/linux-2.6.18.i686/arch/i386/mm/fault-xen.c: 698
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c100c140 60a000 0 0 1 400

/*
 * Oops. The kernel tried to access some bad page. We'll have to
 * terminate things with extreme prejudice.
 */
        bust_spinlocks(1);

        if (oops_may_print()) {
#ifdef CONFIG_X86_PAE
                if (error_code & 16) {
                        pte_t *pte = lookup_address(address);

                        if (pte && pte_present(*pte) && !pte_exec_kernel(*pte))
                                printk(KERN_CRIT "kernel tried to execute "
                                        "NX-protected page - exploit attempt? "
                                        "(uid: %d)\n", current->uid);
                }
#endif
                if (address < PAGE_SIZE)
                        printk(KERN_ALERT "BUG: unable to handle kernel NULL "
                                        "pointer dereference");
                else
                        printk(KERN_ALERT "BUG: unable to handle kernel paging"
                                        " request");
                printk(" at virtual address %08lx\n",address);
                printk(KERN_ALERT " printing eip:\n");
                printk("%08lx\n", regs->eip);
                dump_fault_path(address);
        }
        tsk->thread.cr2 = address;
        tsk->thread.trap_no = 14;
        tsk->thread.error_code = error_code;
        die("Oops", regs, error_code); /* <- line 698 */
        bust_spinlocks(0);
        do_exit(SIGKILL);
-----------------

-----------------
crash> kmem c0405595
c0405595 (t) error_code+41 ../debug/kernel-2.6.18/linux-2.6.18.i686/arch/i386/kernel/entry.S
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c10080a0 405000 0 0 1 400
-----------------

The file
/usr/src/debug/debug/kernel-2.6.18/linux-2.6.18.i686/arch/i386/kernel/entry.S
was not found. However, the following file exists:
/usr/src/debug/kernel-2.6.18/xen/arch/x86/x86_32/entry.S

----------
.Lft16: movl %eax,%gs:8(%esi)
        test $TBF_EXCEPTION_ERRCODE,%cl
        jz 1f
        subl $4,%esi                     # push error_code onto guest frame
        movl TRAPBOUNCE_error_code(%edx),%eax
----------
.Lfx1:
        sti
        SAVE_ALL_GPRS
        mov UREGS_error_code(%esp),%esi
        pushfl                           # EFLAGS
        movl $__HYPERVISOR_CS,%eax
        pushl %eax                       # CS
        movl $.Ldf1,%eax
        pushl %eax                       # EIP
        pushl %esi                       # error_code/entry_vector
        jmp handle_exception
----------
exception_with_ints_disabled:
        movl UREGS_eflags(%esp),%eax
        movb UREGS_cs(%esp),%al
        testl $(3|X86_EFLAGS_VM),%eax    # interrupts disabled outside Xen?
        jnz FATAL_exception_with_ints_disabled
        pushl %esp
        call search_pre_exception_table
        addl $4,%esp
        testl %eax,%eax                  # no fixup code for faulting EIP?
        jz 1b
        movl %eax,UREGS_eip(%esp)
        movl %esp,%esi
        subl $4,%esp
        movl %esp,%edi
        movl $UREGS_kernel_sizeof/4,%ecx
        rep; movsl                       # make room for error_code/entry_vector
        movl UREGS_error_code(%esp),%eax # error_code/entry_vector
        movl %eax,UREGS_kernel_sizeof(%esp)
        jmp restore_all_xen              # return to fixup code
----------

Is the information above sufficient?

It seems that "atomic_add_negative" is used as part of the "csum_partial" function. Is "atomic_add_negative" used to switch processing between the host OS and the guest OS? I imagine that the panic occurred because the execution address had an incorrect value after switching between the host OS and the guest OS. When an illegal address is generated by guest OS processing, does the EIP at the time of the panic become "csum_partial"?

So the problem is that someone has unmapped the memory behind the packet that is still being retransmitted. Could you try to determine the socket of the packet (skb->sk) and its IP/port numbers? That should help you find the application (which is probably not the process in which it crashed since it's in softirq context) that owns the socket and perhaps we can have a chance of reproducing it then. Thanks!

Hi, we've been experiencing a similar problem recently with Debian etch and a 2.6.18 kernel. For us the workaround was to turn off rx/tx checksumming on the relevant network interface, like so:

ethtool -K eth0 rx off tx off

Cheers, Matt

p.s. here's our oops message:

BUG: unable to handle kernel paging request at virtual address c081b000
printing eip:
c01bb497
0e5a2000 -> *pde = 00000000:c4871001
0e5a3000 -> *pme = 00000000:06fa3067
00fa3000 -> *pte = 00000000:00000000
Oops: 0000 [#1]
SMP
Modules linked in: netloop button ac battery ip6table_filter ip6_tables iptablen
CPU: 0
EIP: 0061:[<c01bb497>] Not tainted VLI
EFLAGS: 00010282 (2.6.18-6-xen-686 #1)
EIP is at csum_partial+0xd3/0x120
eax: 00000000 ebx: c01bb497 ecx: 0000000b edx: 0000059c
esi: c081b01c edi: 0000059c ebp: 00000040 esp: c88dfd84
ds: 007b es: 007b ss: 0069
Process python (pid: 9966, ti=c88de000 task=cf760000 task.ti=c88de000)
Stack: c081b000 00000040 c022da42 c081b000 0000059c 00000000 00000018 c6b808ac
       00000001 0000002c 000005dc cfe33b3c c022e94e cfe33a00 0000059c c6b808ac
       ce7b54f8 ce7b550c c88dfe84 c02323fb aedd2abb cdc1dce0 00000003 c88dfe84
Call Trace:
 [<c022da42>] skb_checksum+0x112/0x27e
 [<c022e94e>] pskb_expand_head+0xce/0x112
 [<c02323fb>] skb_checksum_help+0x5d/0xac
 [<d13d52ea>] ip_nat_fn+0x42/0x184 [iptable_nat]
 [<d13d8092>] ipt_local_hook+0x76/0xcc [iptable_mangle]
 [<d13d561e>] ip_nat_local_fn+0x34/0xaa [iptable_nat]
 [<c024e3b8>] dst_output+0x0/0x7
 [<c02472f0>] nf_iterate+0x30/0x61
 [<c024e3b8>] dst_output+0x0/0x7
 [<c0247416>] nf_hook_slow+0x3a/0x90
 [<c024e3b8>] dst_output+0x0/0x7
 [<c02505b0>] ip_queue_xmit+0x35f/0x3b3
 [<c024e3b8>] dst_output+0x0/0x7
 [<c0155fcd>] kmem_cache_alloc+0x4a/0x54
 [<c022edb1>] alloc_skb_from_cache+0x48/0x110
 [<c025df78>] tcp_transmit_skb+0x604/0x632
 [<c025ecd4>] tcp_retransmit_skb+0x4e2/0x5c7
 [<c0257e28>] tcp_enter_loss+0x1a1/0x1fd
 [<c0260dab>] tcp_write_timer+0x0/0x5c9
 [<c02611a3>] tcp_write_timer+0x3f8/0x5c9
 [<c0123376>] run_timer_softirq+0x101/0x15c
 [<c011f346>] __do_softirq+0x5e/0xc3
 [<c011f3e5>] do_softirq+0x3a/0x4a
 [<c0106125>] do_IRQ+0x48/0x53
 [<c020c614>] evtchn_do_upcall+0x64/0x9b
 [<c0104a51>] hypervisor_callback+0x3d/0x48
Code: a8 13 46 ac 13 46 b0 13 46 b4 13 46 b8 13 46 bc 13 46 c0 13 46 c4 13 46 c
EIP: [<c01bb497>] csum_partial+0xd3/0x120 SS:ESP 0069:c88dfd84
<0>Kernel panic - not syncing: Fatal exception in interrupt
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

Matthew, are you using anything like drbd? That seems to be a common thread in the other two reports. In any case, the culprit here is the owner of the socket. So it would really help if you can pinpoint the port number and PID of the socket whose retransmitted packet triggered this. Other things like drbd would be iSCSI, NFS, or anything that does TCP in the kernel.

Hi Herbert, yes we are using drbd. We are no longer experiencing the issue and I'm reluctant to remove the workaround as the problem only occurred on production services. I would be happy to help if I can, though. Is there anything I can do in hindsight?

Matt

Based on this information my conclusion is that there is a bug in drbd where it frees pages that are still owned by the TCP socket.

Since this looks like a drbd issue (see comment #18, and the common thread that all of the reported stacks are using drbd), and since we don't support drbd in RHEL-5, I'm going to close this as NOTABUG. If this can be reproduced without drbd, or someone finds other evidence to the contrary, please feel free to re-open the bug.

Chris Lalancette

*** Bug 666005 has been marked as a duplicate of this bug. ***
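For readers wondering why a checksum routine would fault on freed or unmapped packet memory, the following is a rough sketch of the ones' complement Internet checksum described in RFC 1071. It is not the kernel's `csum_partial`, which works on larger chunks (the `Code:` bytes in the oopses above show a run of `adc` instructions); the function names here are made up for illustration. The point it shows is that the sum has to read every byte of the buffer, so a read will fault if any part of the packet data is no longer mapped.

```python
def ones_complement_sum(data: bytes, initial: int = 0) -> int:
    """Fold the buffer, taken as 16-bit big-endian words, into a 16-bit sum."""
    total = initial
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    # Every byte of the buffer is read here.
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Add the carries back in until the value fits in 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


# Example words from the numerical example in RFC 1071.
payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)
print(f"{checksum:04x}")  # 220d

# Summing the payload together with its checksum gives all ones.
verify = ones_complement_sum(payload + checksum.to_bytes(2, "big"))
print(f"{verify:04x}")  # ffff
```

This also fits the workaround reported above: with rx/tx checksum offload turned off on the interface, the software checksum path taken in these traces was apparently no longer hit on retransmit, although that does not address whoever released the pages.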