Description of problem:
If the Xen Domain 0 kernel or the hypervisor crashes while CPUs are handling IRQs, the generated vmcore cannot be analysed with the bt -a command in Xen hypervisor mode.

crash 4.0-7.2.3
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

NOTE: stdin: not a tty

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...

   KERNEL: /boot/xen-syms-2.6.18-118.el5
DEBUGINFO: /usr/lib/debug/boot/xen-syms-2.6.18-118.el5.debug
 DUMPFILE: /var/crash/127.0.0.1-2008-10-13-04:05:45/vmcore
     CPUS: 8
  DOMAINS: 4
   UPTIME: 00:10:00
  MACHINE: Intel(R) Xeon(TM) CPU 3.73GHz (3724 Mhz)
   MEMORY: 4 GB
  PCPU-ID: 6
     PCPU: ffbeffb4
  VCPU-ID: 6
     VCPU: ffbc1080 (VCPU_RUNNING)
DOMAIN-ID: 0
   DOMAIN: ffbd8080 (DOMAIN_RUNNING)
    STATE: CRASH

crash> bt -a
PCPU: 0 VCPU: ffbc7080
bt: cannot resolve stack trace:
 #0 [ff1d3ebc] elf_core_save_regs at ff10a810
 #1 [ff1d3ec4] common_interrupt at ff1222ed
 #2 [ff1d3ed0] do_nmi at ff1335bb
 #3 [ff1d3ef0] handle_nmi_mce at ff17442e
 #4 [ff1d3f24] csched_tick at ff110aa7
 #5 [ff1d3f80] timer_softirq_action at ff1155d2
 #6 [ff1d3fa0] do_softirq at ff1143fe
 #7 [ff1d3fb0] process_softirqs at ff173f61
bt: text symbols on stack:
bt: invalid structure size: task_struct
    FILE: x86.c LINE: 1576 FUNCTION: x86_eframe_search()
    [ff1d3ebc] disable_local_APIC at ff11db75
    [ff1d3ec0] crash_nmi_callback at ff13cc96
    [ff1d3ec4] common_interrupt at ff1222f2
    [ff1d3ed0] do_nmi at ff1335c1
    [ff1d3ef0] handle_nmi_mce at ff174435
    [ff1d3f18] csched_tick at ff110aa7
    [ff1d3f80] timer_softirq_action at ff1155d4
    [ff1d3fa0] do_softirq at ff114405
    [ff1d3fb0] process_softirqs at ff173f66

[/usr/bin/crash] error trace: 81637af => 816450b => 810c544 => 813eebc

  813eebc: SIZE_verify+126
  810c544: (undetermined)
  816450b: (undetermined)
  81637af: lkcd_x86_back_trace+2370

bt: invalid structure size: task_struct
    FILE: x86.c LINE: 1576 FUNCTION: x86_eframe_search()

Version-Release number of selected component (if applicable):
crash-7.2.3
kernel-xen-2.6.18-118.el5
kernel-PAE-2.6.18-118.el5

How reproducible:
always

Steps to Reproduce:
1. Configure Kdump on Xen with crashkernel=128M@32M.
2. Use a jprobe to trigger BUG() in __do_IRQ().
3. crash xen-syms vmcore
4. bt -a

Actual results:
See errors.

Expected results:
No error.
There is another failure:

   KERNEL: /boot/xen-syms-2.6.18-118.el5
DEBUGINFO: /usr/lib/debug/boot/xen-syms-2.6.18-118.el5.debug
 DUMPFILE: /var/crash/127.0.0.1-2008-10-10-07:08:55/vmcore
     CPUS: 8
  DOMAINS: 4
   UPTIME: 00:02:08
  MACHINE: Dual-Core AMD Opteron(tm) Processor 8216 (2411 Mhz)
   MEMORY: 14 GB
  PCPU-ID: 0
     PCPU: ff1d3fb4
  VCPU-ID: 0
     VCPU: ff2ab080 (VCPU_RUNNING)
DOMAIN-ID: 0
   DOMAIN: ff2bc080 (DOMAIN_RUNNING)
    STATE: CRASH

crash> bt -a
PCPU: 0 VCPU: ffbfc080
 #0 [ff1d3f40] elf_core_save_regs at ff10a810
 #1 [ff1d3f44] crash_nmi_callback at ff13cc91
 #2 [ff1d3f54] do_nmi at ff1335bb
 #3 [ff1d3f74] handle_nmi_mce at ff17442e
 #4 [ff1d3fa8] idle_loop at ff11f975

PCPU: 1 VCPU: ff1b6080
 #0 [ff1bff40] elf_core_save_regs at ff10a810
 #1 [ff1bff44] crash_nmi_callback at ff13cc91
 #2 [ff1bff54] do_nmi at ff1335bb
 #3 [ff1bff74] handle_nmi_mce at ff17442e
 #4 [ff1bffa8] idle_loop at ff11f975

PCPU: 2 VCPU: ff1cc080
 #0 [ff1c7f60] elf_core_save_regs at ff10a810
 #1 [ff1c7f64] kexec_crash at ff10abe0
 #2 [ff1c7f74] do_crashdump_trigger at ff10b388
 #3 [ff1c7f84] keypress_softirq at ff10a024
 #4 [ff1c7f94] do_softirq at ff1143fe
 #5 [ff1c7fa4] idle_loop at ff11f975

PCPU: 3 VCPU: ff1b9080
 #0 [ff1c3f40] elf_core_save_regs at ff10a810
 #1 [ff1c3f44] crash_nmi_callback at ff13cc91
 #2 [ff1c3f54] do_nmi at ff1335bb
 #3 [ff1c3f74] handle_nmi_mce at ff17442e
 #4 [ff1c3fa8] idle_loop at ff11f975

PCPU: 4 VCPU: ff23f080
 #0 [ff23bf40] elf_core_save_regs at ff10a810
 #1 [ff23bf44] crash_nmi_callback at ff13cc91
 #2 [ff23bf54] do_nmi at ff1335bb
 #3 [ff23bf74] handle_nmi_mce at ff17442e
 #4 [ff23bfa8] idle_loop at ff11f975

PCPU: 5 VCPU: ff23d080
 #0 [ff237f40] elf_core_save_regs at ff10a810
 #1 [ff237f44] crash_nmi_callback at ff13cc91
 #2 [ff237f54] do_nmi at ff1335bb
 #3 [ff237f74] handle_nmi_mce at ff17442e
 #4 [ff237fa8] idle_loop at ff11f975

PCPU: 6 VCPU: ff233080
 #0 [ffbeff40] elf_core_save_regs at ff10a810
 #1 [ffbeff44] crash_nmi_callback at ff13cc91
 #2 [ffbeff54] do_nmi at ff1335bb
 #3 [ffbeff74] handle_nmi_mce at ff17442e
 #4 [ffbeffa8] idle_loop at ff11f975

PCPU: 7 VCPU: ff231080
bt: cannot resolve stack trace:
 #0 [ffbebebc] elf_core_save_regs at ff10a810
 #1 [ffbebec0] crash_nmi_callback at ff13cc91
 #2 [ffbebed0] do_nmi at ff1335bb
 #3 [ffbebef0] handle_nmi_mce at ff17442e
 #4 [ffbebf24] ns_read_reg at ff11cb0a
 #5 [ffbebf24] ns16550_interrupt at ff11cc79
 #6 [ffbebf44] do_IRQ at ff1262c1
 #7 [ffbebf74] common_interrupt at ff1222ed
bt: text symbols on stack:
bt: invalid structure size: task_struct
    FILE: x86.c LINE: 1576 FUNCTION: x86_eframe_search()
    [ffbebebc] disable_local_APIC at ff11db75
    [ffbebec0] crash_nmi_callback at ff13cc96
    [ffbebed0] do_nmi at ff1335c1
    [ffbebef0] handle_nmi_mce at ff174435
    [ffbebf18] ns_read_reg at ff11cb0a
    [ffbebf24] ns16550_interrupt at ff11cc7e
    [ffbebf44] do_IRQ at ff1262c3
    [ffbebf74] common_interrupt at ff1222f2
    [ffbebf9c] idle_loop at ff11f975

[/usr/bin/crash] error trace: 81637af => 816450b => 810c544 => 813eebc

  813eebc: SIZE_verify+126
  810c544: (undetermined)
  816450b: (undetermined)
  81637af: lkcd_x86_back_trace+2370

bt: invalid structure size: task_struct
    FILE: x86.c LINE: 1576 FUNCTION: x86_eframe_search()
I don't know if this is related, but I have also seen crash fail to analyse the Xen Domain 0 kernel:

      KERNEL: /usr/lib/debug/lib/modules/2.6.18-118.el5/vmlinux
    DUMPFILE: /var/crash/2008-10-13-06:05/vmcore
        CPUS: 1
        DATE: Mon Oct 13 06:04:21 2008
      UPTIME: 00:01:53
LOAD AVERAGE: 1.10, 0.50, 0.19
       TASKS: 77
    NODENAME: dellgx240.rhts.bos.redhat.com
     RELEASE: 2.6.18-118.el5
     VERSION: #1 SMP Sat Oct 4 00:21:41 EDT 2008
     MACHINE: i686 (1694 Mhz)
      MEMORY: 1 GB
       PANIC: "kernel BUG at /mnt/tests/kernel/kdump/crash-lkdtm/lkdtm/lkdtm.c:258!"
PID: 330 COMMAND: "udevd" TASK: f7f50000 [THREAD_INFO: f7fad000] CPU: 0 STATE: TASK_RUNNING (PANIC) crash> bt -t PID: 330 TASK: f7f50000 CPU: 0 COMMAND: "udevd" START: crash_kexec at c0443d42 [c075dea8] __do_IRQ at c044e22c [c075dec0] lkdtm_handler at f89b40a4 [c075dedc] die at c04064cb [c075df04] do_invalid_op at c0406b98 [c075df0c] do_invalid_op at c0406c29 [c075df2c] lkdtm_handler at f89b40a4 [c075df48] release_console_sem at c042544c [c075df70] __do_IRQ at c044e22c [c075df74] kprobe_exceptions_notify at c06153a9 [c075df90] notifier_call_chain at c0615ebb [c075df9c] __do_IRQ at c044e22c [c075dfbc] error_code at c0405a89 [c075dfd0] __do_IRQ at c044e22c [c075dfe8] lkdtm_handler at f89b40a4 [c075dff8] jp_do_irq at f89b4148 [c075dffc] do_IRQ at c04074ce --- <hard IRQ> --- bt: invalid stack address for this task: f89b4148 (valid range: f7fad000 - f7fae000) START: do_IRQ at c0407435 crash> bt -r PID: 330 TASK: f7f50000 CPU: 0 COMMAND: "udevd" f7fad000: f7f50000 default_exec_domain 00000000 00000000 f7fad010: 00000000 00010000 c0000000 00b0e410 f7fad020: do_no_restart_syscall 00000000 00000000 00000000 f7fad030: 00000000 00000000 00000000 00000000 f7fad040: 00000000 00000000 00000000 00000000 f7fad050: 00000000 00000000 00000000 00000000 f7fad060: 00000000 00000000 00000000 00000000 f7fad070: 00000000 00000000 00000000 00000000 f7fad080: 00000000 00000000 00000000 00000000 f7fad090: 00000000 00000000 00000000 00000000 f7fad0a0: 00000000 00000000 00000000 00000000 f7fad0b0: 00000000 00000000 00000000 00000000 f7fad0c0: 00000000 00000000 00000000 00000000 f7fad0d0: 00000000 00000000 00000000 00000000 f7fad0e0: 00000000 00000000 00000000 00000000 f7fad0f0: 00000000 00000000 00000000 00000000 f7fad100: 00000000 00000000 00000000 00000000 f7fad110: 00000000 00000000 00000000 00000000 f7fad120: 00000000 00000000 00000000 00000000 f7fad130: 00000000 00000000 00000000 00000000 f7fad140: 00000000 00000000 00000000 00000000 f7fad150: 00000000 00000000 00000000 00000000 f7fad160: 
00000000 00000000 00000000 00000000 f7fad170: 00000000 00000000 00000000 00000000 f7fad180: 00000000 00000000 00000000 00000000 f7fad190: 00000000 00000000 00000000 00000000 f7fad1a0: 00000000 00000000 00000000 00000000 f7fad1b0: 00000000 00000000 00000000 00000000 f7fad1c0: 00000000 00000000 00000000 00000000 f7fad1d0: 00000000 00000000 00000000 00000000 f7fad1e0: 00000000 00000000 00000000 00000000 f7fad1f0: 00000000 00000000 00000000 00000000 f7fad200: 00000000 00000000 00000000 00000000 f7fad210: 00000000 00000000 00000000 00000000 f7fad220: 00000000 00000000 00000000 00000000 f7fad230: 00000000 00000000 00000000 00000000 f7fad240: 00000000 00000000 00000000 00000000 f7fad250: 00000000 00000000 00000000 00000000 f7fad260: 00000000 00000000 00000000 00000000 f7fad270: 00000000 00000000 00000000 00000000 f7fad280: 00000000 00000000 00000000 00000000 f7fad290: 00000000 00000000 00000000 00000000 f7fad2a0: 00000000 00000000 00000000 00000000 f7fad2b0: 00000000 00000000 00000000 00000000 f7fad2c0: 00000000 00000000 00000000 00000000 f7fad2d0: 00000000 00000000 00000000 00000000 f7fad2e0: 00000000 00000000 00000000 00000000 f7fad2f0: 00000000 00000000 00000000 00000000 f7fad300: 00000000 00000000 00000000 00000000 f7fad310: 00000000 00000000 00000000 00000000 f7fad320: 00000000 00000000 00000000 00000000 f7fad330: 00000000 00000000 00000000 00000000 f7fad340: 00000000 00000000 00000000 00000000 f7fad350: 00000000 00000000 00000000 00000000 f7fad360: 00000000 00000000 00000000 00000000 f7fad370: 00000000 00000000 00000000 00000000 f7fad380: 00000000 00000000 00000000 00000000 f7fad390: 00000000 00000000 00000000 00000000 f7fad3a0: 00000000 00000000 00000000 00000000 f7fad3b0: 00000000 00000000 00000000 00000000 f7fad3c0: 00000000 00000000 00000000 00000000 f7fad3d0: 00000000 00000000 00000000 00000000 f7fad3e0: 00000000 00000000 00000000 00000000 f7fad3f0: 00000000 00000000 00000000 00000000 f7fad400: 00000000 00000000 00000000 00000000 f7fad410: 00000000 00000000 
00000000 00000000 f7fad420: 00000000 00000000 00000000 00000000 f7fad430: 00000000 00000000 00000000 00000000 f7fad440: 00000000 00000000 00000000 00000000 f7fad450: 00000000 00000000 00000000 00000000 f7fad460: 00000000 00000000 00000000 00000000 f7fad470: 00000000 00000000 00000000 00000000 f7fad480: 00000000 00000000 00000000 00000000 f7fad490: 00000000 00000000 00000000 00000000 f7fad4a0: 00000000 00000000 00000000 00000000 f7fad4b0: 00000000 00000000 00000000 00000000 f7fad4c0: 00000000 00000000 00000000 00000000 f7fad4d0: 00000000 00000000 00000000 00000000 f7fad4e0: 00000000 00000000 00000000 00000000 f7fad4f0: 00000000 00000000 00000000 00000000 f7fad500: 00000000 00000000 00000000 00000000 f7fad510: 00000000 00000000 00000000 00000000 f7fad520: 00000000 00000000 00000000 00000000 f7fad530: 00000000 00000000 00000000 00000000 f7fad540: 00000000 00000000 00000000 00000000 f7fad550: 00000000 00000000 00000000 00000000 f7fad560: 00000000 00000000 00000000 00000000 f7fad570: 00000000 00000000 00000000 00000000 f7fad580: 00000000 00000000 00000000 00000000 f7fad590: 00000000 00000000 00000000 00000000 f7fad5a0: 00000000 00000000 00000000 00000000 f7fad5b0: 00000000 00000000 00000000 00000000 f7fad5c0: 00000000 00000000 00000000 00000000 f7fad5d0: 00000000 00000000 00000000 00000000 f7fad5e0: 00000000 00000000 00000000 00000000 f7fad5f0: 00000000 00000000 00000000 00000000 f7fad600: 00000000 00000000 00000000 00000000 f7fad610: 00000000 00000000 00000000 00000000 f7fad620: 00000000 00000000 00000000 00000000 f7fad630: 00000000 00000000 00000000 00000000 f7fad640: 00000000 00000000 00000000 00000000 f7fad650: 00000000 00000000 00000000 00000000 f7fad660: 00000000 00000000 00000000 00000000 f7fad670: 00000000 00000000 00000000 00000000 f7fad680: 00000000 00000000 00000000 00000000 f7fad690: 00000000 00000000 00000000 00000000 f7fad6a0: 00000000 00000000 00000000 00000000 f7fad6b0: 00000000 00000000 00000000 00000000 f7fad6c0: 00000000 00000000 00000000 00000000 
f7fad6d0: 00000000 00000000 00000000 00000000 f7fad6e0: 00000000 00000000 00000000 00000000 f7fad6f0: 00000000 00000000 00000000 00000000 f7fad700: 00000000 00000000 00000000 00000000 f7fad710: 00000000 00000000 00000000 00000000 f7fad720: 00000000 00000000 00000000 00000000 f7fad730: 00000000 00000000 00000000 00000000 f7fad740: 00000000 00000000 00000000 00000000 f7fad750: 00000000 00000000 00000000 00000000 f7fad760: 00000000 00000000 00000000 00000000 f7fad770: 00000000 00000000 00000000 00000000 f7fad780: 00000000 00000000 00000000 00000000 f7fad790: 00000000 00000000 00000000 00000000 f7fad7a0: 00000000 00000000 00000000 00000000 f7fad7b0: 00000000 00000000 00000000 00000000 f7fad7c0: 00000000 00000000 00000000 00000000 f7fad7d0: 00000000 00000000 00000000 00000000 f7fad7e0: 00000000 00000000 00000000 00000000 f7fad7f0: 00000000 00000000 00000000 00000000 f7fad800: 00000000 00000000 00000000 00000000 f7fad810: 00000000 00000000 00000000 00000000 f7fad820: 00000000 00000000 00000000 00000000 f7fad830: 00000000 00000000 00000000 00000000 f7fad840: 00000000 00000000 00000000 00000000 f7fad850: 00000000 00000000 00000000 00000000 f7fad860: 00000000 00000000 00000000 00000000 f7fad870: 00000000 00000000 00000000 00000000 f7fad880: 00000000 00000000 00000000 00000000 f7fad890: 00000000 00000000 00000000 00000000 f7fad8a0: 00000000 00000000 00000000 00000000 f7fad8b0: 00000000 00000000 00000000 00000000 f7fad8c0: 00000000 00000000 00000000 00000000 f7fad8d0: 00000000 00000000 00000000 00000000 f7fad8e0: 00000000 00000000 00000000 00000000 f7fad8f0: 00000000 00000000 00000000 00000000 f7fad900: 00000000 00000000 00000000 00000000 f7fad910: 00000000 00000000 00000000 00000000 f7fad920: 00000000 00000000 00000000 00000000 f7fad930: 00000000 00000000 00000000 00000000 f7fad940: 00000000 00000000 00000000 00000000 f7fad950: 00000000 00000000 00000000 00000000 f7fad960: 00000000 00000000 00000000 00000000 f7fad970: 00000000 00000000 00000000 00000000 f7fad980: 00000000 
00000000 00000000 00000000 f7fad990: 00000000 00000000 00000000 00000000 f7fad9a0: 00000000 00000000 00000000 00000000 f7fad9b0: 00000000 00000000 00000000 00000000 f7fad9c0: 00000000 00000000 00000000 00000000 f7fad9d0: 00000000 00000000 00000000 00000000 f7fad9e0: 00000000 00000000 00000000 00000000 f7fad9f0: 00000000 00000000 00000000 00000000 f7fada00: 00000000 00000000 00000000 00000000 f7fada10: 00000000 00000000 00000000 00000000 f7fada20: 00000000 00000000 00000000 00000000 f7fada30: 00000000 00000000 00000000 00000000 f7fada40: 00000000 00000000 00000000 00000000 f7fada50: 00000000 00000000 00000000 f7d86204 f7fada60: c9806de0 f7d86200 f7fadaf4 __next_cpu+18 f7fada70: 00000000 find_busiest_group+375 00000031 00000031 f7fada80: f7fadb58 00000000 c9807940 00000031 f7fada90: 00000000 f7d86200 00000000 00000280 f7fadaa0: 00000005 00000005 00000080 00000000 f7fadab0: 00000000 00000000 00000000 00000002 f7fadac0: 00000001 00000000 00000000 ffffffff f7fadad0: 00000000 00000000 ffffffff 00000000 f7fadae0: f7f50000 f4758700 c9803d00 f7f50000 f7fadaf0: f4758550 00000000 00400000 f741b040 f7fadb00: f7fadb68 schedule+2505 88b27e00 0000001a f7fadb10: f7fadb48 00000031 00000031 00000001 f7fadb20: f4758550 init_task 88b58f5e 0000001a f7fadb30: 0003115e 00000000 f7f5010c c9806de0 f7fadb40: 00000001 00000000 f7fadbf0 00000203 f7fadb50: ffffffff 00000000 00000000 7fffffff f7fadb60: f7fadc48 remove_wait_queue+22 f7fadc44 f7fadbe0 f7fadb70: 00000203 free_poll_entry+14 f7fadc44 poll_freewait+24 f7fadb80: 00000001 f7cc35c0 00000008 00000008 f7fadb90: do_select+942 f7fadfa0 f7fadf4c 00000000 f7fadba0: 00000008 f7fade5c f7fade60 f7fade64 f7fadbb0: f7fade50 f7fade54 f7fade58 000000b8 f7fadbc0: 00000000 00000000 000000b8 00000010 f7fadbd0: 00000000 00000000 00000001 00000082 f7fadbe0: __pollwait 00000000 00000000 f7f50000 f7fadbf0: 00000000 00000003 fffd27ae 00000029 f7fadc00: 00000050 00000100 00000029 f7f50000 f7fadc10: 00000000 c9806de0 f7f50000 00000000 f7fadc20: f7fadc4c 
00000000 kprobe_exceptions_nb f7fadc4c f7fadc30: 0000000c notifier_call_chain+25 00000021 f7fadc74 f7fadc40: 00000021 00000021 do_nmi+163 f7fadc74 f7fadc50: __func__.18214+6516 00000021 do_IRQ+181 f7aebf9c f7fadc60: f7a645dc f7aebf9c constraint_expr_eval+934 00000000 f7fadc70: f7aebf84 f7a645c4 00000000 00000001 f7fadc80: 00000001 f7fadd34 f7fadd34 f7fadcc4 f7fadc90: c9908380 f7a446e0 f7fadd34 00000280 f7fadca0: context_struct_compute_av+548 c9908360 000728c0 f7aebf84 f7fadcb0: f7a645c4 f7cbf9a0 f7a32928 f7a312a8 f7fadcc0: 00000700 025606db 00070007 f7fadd34 f7fadcd0: f7a645c4 00000246 f7fadd28 00100000 f7fadce0: f7fadd68 avc_alloc_node+22 00072622 0000005b f7fadcf0: c96d3020 c96d3020 contig_page_data+9600 00000286 f7fadd00: 00000286 contig_page_data+9472 00000000 f7fadd2c f7fadd10: f7faddac 00000001 __pagevec_free+20 c96d3020 f7fadd20: contig_page_data+9472 release_pages+287 00000001 f7fadda8 f7fadd30: f7b299d0 00000000 00000000 f7fadd50 f7fadd40: 0000000e f7fadda8 00000000 f7fadda8 f7fadd50: 00000000 f7b299cc find_get_pages+37 0000000e f7fadd60: 00000000 f7fadda0 shmem_free_blocks+32 20080010 f7fadd70: f7b29924 f7b298c4 shmem_truncate_range+1555 00000000 f7fadd80: 00000000 00000096 f7a312a8 00000000 f7fadd90: 00000000 f7f50000 00000000 00000003 f7fadda0: fffd27af 00000000 kprobe_exceptions_nb f7faddcc f7faddb0: 0000000c notifier_call_chain+25 00000021 f7faddf4 f7faddc0: 00000021 00000021 do_nmi+163 f7faddf4 f7faddd0: __func__.18214+6516 00000021 00000002 f7faddf4 f7fadde0: f7a446e0 f7a44880 f7fadeb4 00000700 f7faddf0: common_interrupt+26 f7a446e0 00000013 00000000 f7fade00: f7a44880 f7fadeb4 00000700 08000000 f7fade10: f7fa007b 0000007b ffffffff context_struct_compute_av+262 f7fade20: 00000060 00000246 00041d25 f7a645c4 f7fade30: f7a645c4 c9aeeb20 f7a32928 f7a32928 f7fade40: 00000253 06db0232 00070004 f7fadeb4 f7fade50: f7a645c4 00000056 00000056 security_compute_av+152 f7fade60: 00200000 f7fadeb4 00040000 f7fadea8 f7fade70: 00000001 00000000 avc_cache+2636 
avc_has_perm_noaudit+282 f7fade80: 00200000 f7fadeb4 00000056 00000056 f7fade90: 00000004 0000010e 00000004 0004ffff f7fadea0: 00000000 f7fadee0 f7b298c4 f7b298c4 f7fadeb0: shmem_swp_entry+42 00000000 ffffffff 00000000 f7fadec0: ffffffff 00000001 00000003 f7a61ac0 f7faded0: 00000001 f7ca1ac0 00000000 selinux_vm_enough_memory_mm+62 f7fadee0: 00200000 00000000 f7d7ce40 f7b298c4 f7fadef0: f7b298dc shmem_getpage+732 f7d77890 00000000 f7fadf00: f7fadf6c 00000000 f7b29924 01a61ac0 f7fadf10: f7b299cc f7fadf30 file_has_perm+127 f7fadf30 f7fadf20: 00000000 00000000 00000000 bffaff78 f7fadf30: 00000000 shmem_file_write+291 00000003 00000000 f7fadf40: f7b29998 f7b29924 00000000 00000000 f7fadf50: 00000000 00000004 48f31d25 3b35eb1e f7fadf60: 00000004 00000000 00000000 00000000 f7fadf70: f7b29924 f7f47ec0 shmem_file_write bffaff78 f7fadf80: 00000004 vfs_write+161 f7fadfa4 f7f47ec0 f7fadf90: fffffff7 0911d410 f7fad000 sys_write+60 f7fadfa0: f7fadfa4 00000000 00000000 00000000 f7fadfb0: 00000008 00000008 syscall_call+7 00000008 f7fadfc0: bffaff78 00000004 00000008 0911d410 f7fadfd0: bffaffa8 00000004 0000007b 0000007b f7fadfe0: 00000004 00b0e402 00000073 00000246 f7fadff0: bffafe44 0000007b 00000000 00000000 crash> bt -T PID: 330 TASK: f7f50000 CPU: 0 COMMAND: "udevd" [c075dbe4] notifier_call_chain at c0615ebb [c075dbf8] do_nmi at c04068a0 [c075dc20] nmi_stack_correct at c0405b2e [c075dc4c] serial_in at c054f115 [c075dc64] notifier_call_chain at c0615ebb [c075dc78] do_nmi at c04068a0 [c075dca0] nmi_stack_correct at c0405b2e [c075dccc] serial_in at c054f115 [c075dcd8] __delay at c04ea970 [c075dcdc] serial8250_console_putchar at c05516e6 [c075dcf0] uart_console_write at c054cb7b [c075dd18] serial8250_console_write at c0551035 [c075dd24] __call_console_drivers at c0425177 [c075dd4c] notifier_call_chain at c0615ebb [c075dd88] nmi_stack_correct at c0405b2e [c075ddb8] vsnprintf at c04ea4db [c075de40] __do_IRQ at c044e22c [c075de50] machine_kexec at c04199c5 [c075de6c] relocate_kernel 
at c041a000 [c075de94] crash_kexec at c0443d42 [c075dea8] __do_IRQ at c044e22c [c075dec0] lkdtm_handler at f89b40a4 [c075dedc] die at c04064cb [c075df04] do_invalid_op at c0406b98 [c075df0c] do_invalid_op at c0406c29 [c075df2c] lkdtm_handler at f89b40a4 [c075df48] release_console_sem at c042544c [c075df70] __do_IRQ at c044e22c [c075df74] kprobe_exceptions_notify at c06153a9 [c075df90] notifier_call_chain at c0615ebb [c075df9c] __do_IRQ at c044e22c [c075dfbc] error_code at c0405a89 [c075dfd0] __do_IRQ at c044e22c [c075dfe8] lkdtm_handler at f89b40a4 [c075dff8] jp_do_irq at f89b4148 [c075dffc] do_IRQ at c04074ce --- <hard IRQ> --- bt: invalid stack address for this task: f89b4148 (valid range: f7fad000 - f7fae000) [f7fada6c] __next_cpu at c04e6c9c [f7fada74] find_busiest_group at c041dfe3 [f7fadb04] schedule at c0613405 [f7fadb64] remove_wait_queue at c0435803 [f7fadb74] free_poll_entry at c0483996 [f7fadb7c] poll_freewait at c04839b6 [f7fadb90] do_select at c0484115 [f7fadbe0] __pollwait at c048465b [f7fadc34] notifier_call_chain at c0615ebb [f7fadc48] do_nmi at c04068a0 [f7fadc58] do_IRQ at c04074ea [f7fadc68] constraint_expr_eval at c04ce7d1 [f7fadca0] context_struct_compute_av at c04cea79 [f7fadce4] avc_alloc_node at c04c2367 [f7fadd18] __pagevec_free at c045baad [f7fadd24] release_pages at c045daf0 [f7fadd58] find_get_pages at c045a011 [f7fadd68] shmem_free_blocks at c046c925 [f7fadd78] shmem_truncate_range at c046d0f3 [f7faddb4] notifier_call_chain at c0615ebb [f7faddc8] do_nmi at c04068a0 [f7faddf0] common_interrupt at c0405946 [f7fade1c] context_struct_compute_av at c04ce95b [f7fade5c] security_compute_av at c04cf311 [f7fade7c] avc_has_perm_noaudit at c04c25bb [f7fadeb0] shmem_swp_entry at c046ca12 [f7fadedc] selinux_vm_enough_memory_mm at c04c33f5 [f7fadef4] shmem_getpage at c046dca1 [f7fadf18] file_has_perm at c04c38f7 [f7fadf34] shmem_file_write at c046e793 [f7fadf78] shmem_file_write at c046e670 [f7fadf84] vfs_write at c0473c7b [f7fadf9c] sys_write at 
c047426d [f7fadfb8] syscall_call at c0404f17

Dave, please let me know whether you think we need to create a new BZ for this case, and whether I should make the vmcore available to you.
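
The "invalid stack address for this task" message in the bt -t and bt -T output above can be read as a plain range check. The following is a minimal sketch of that check, not the crash utility's actual code; the helper name addr_in_task_stack is hypothetical. Note that f89b4148 is the same value the trace prints as the text address of jp_do_irq, which suggests (as an assumption, not something the output proves) that a text address was picked up where a stack address was expected.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of crash): returns true when addr lies
 * inside the half-open task stack range [base, top), mirroring the
 * "valid range: f7fad000 - f7fae000" wording of the error message.
 */
static bool addr_in_task_stack(uint32_t addr, uint32_t base, uint32_t top)
{
    return addr >= base && addr < top;
}
```

With the values from the report, addr_in_task_stack(0xf89b4148, 0xf7fad000, 0xf7fae000) is false, while an address taken from the task stack listing, such as 0xf7fadfb8, passes the check.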
(In reply to comment #3)
> Don't know if this is related, but I have seen crash even failed to analyse Xen
> Domain 0 Kernel,
>

Sorry, I meant that I have seen crash fail to analyse even a bare-metal kernel.
> Dave, please let me know if you think we need to create a new BZ for this
> case and make the vmcore available for you.

Yes, please file a new bugzilla for the vmlinux-related bug.

Also, it appears that you are referencing 3 different vmcore files, so please make them all available.

Also, when making xen hypervisor dumps available, please save all *three* files instead of the two files you have been saving. For example:

> KERNEL: /boot/xen-syms-2.6.18-118.el5
> DEBUGINFO: /usr/lib/debug/boot/xen-syms-2.6.18-118.el5.debug
> DUMPFILE: /var/crash/127.0.0.1-2008-10-10-07:08:55/vmcore

Please copy the vmcore, xen-syms-2.6.18-118.el5 *and* the xen-syms-2.6.18-118.el5.debug to the saved directory.
(In reply to comment #5)
> Yes, please file a new bugzilla for the vmlinux-related bug.
>
> Also, it appears that you are referencing 3 different vmcore files,
> so please make them all available.
>
> Also, when making xen hypervisor dumps available, please save all *three*
> files available instead of the two file like you have been doing. For
> example:
>
> > KERNEL: /boot/xen-syms-2.6.18-118.el5
> > DEBUGINFO: /usr/lib/debug/boot/xen-syms-2.6.18-118.el5.debug
> > DUMPFILE: /var/crash/127.0.0.1-2008-10-10-07:08:55/vmcore
>
> Please copy the vmcore, xen-syms-2.6.18-118.el5 *and* the
> xen-syms-2.6.18-118.el5.debug to the saved directory.

Done copying files. I'll file other bugs shortly.
(In reply to comment #5)
> Also, it appears that you are referencing 3 different vmcore files,
> so please make them all available.
>

The one in comment #2 was generated by an automated test, and I have not yet found a permanent place to save vmcores generated by those tests. In addition, I don't know how to reproduce that one, since it needs the CPUs to be running certain tasks at the moment of the crash. However, I can re-create the other two vmcores reliably.
Also, I am not interested in any crash dump that was generated with a kprobe that in turn purposely generated the crash. The kprobe mechanism itself violates the architecture's calling convention, and therefore the crash utility backtrace code cannot follow the trace. Please do not file bugs against the crash backtrace capability when the crash was generated by a kprobe.
With respect to the "bt: invalid structure size: task_struct" seen on the xen-syms vmcore analysis, it is clear that the backtrace error-handling code is incorrectly using a vmlinux-related function that searches for exception frames. That happens because the backtrace runs into an assembly-language entry point that the upstream maintainers had apparently never seen. However, I don't know whether it is seen in this case only because of the type of trap generated by the bogus kprobe operation done on the dom0 kernel.

In any case, I have posted a suggested patch and a query to the upstream maintainers of the xen hypervisor parts of the crash utility:

[Crash-utility] Question re: xen hypervisor backtrace problem
https://www.redhat.com/archives/crash-utility/2008-October/msg00063.html
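
To illustrate the failure mode described above, here is a minimal, self-contained sketch, assuming a size table in which an entry that was never initialised holds -1. Every name in it (sketch_mode, sketch_size_table, sketch_size_verify, sketch_eframe_search, sketch_handle_trace_error) is hypothetical and only mimics the shape of the problem; it is not the crash utility's code.

```c
#include <stdio.h>

/* All names below are hypothetical stand-ins, not crash utility APIs. */

enum sketch_mode { SKETCH_VMLINUX, SKETCH_XEN_HYPER };

/* Structure sizes learned from debuginfo; -1 means "never initialised". */
struct sketch_size_table {
    long task_struct;
};

/* Return the size if it was initialised; otherwise report it and return -1. */
static long sketch_size_verify(long size, const char *name)
{
    if (size < 0) {
        fprintf(stderr, "invalid structure size: %s\n", name);
        return -1;
    }
    return size;
}

/* Stand-in for a vmlinux-only exception-frame search. */
static int sketch_eframe_search(const struct sketch_size_table *st)
{
    if (sketch_size_verify(st->task_struct, "task_struct") < 0)
        return -1;  /* cannot proceed without the task_struct size */
    return 0;       /* a real search would run here; this sketch finds none */
}

/* Error recovery: run the vmlinux-only search only outside hypervisor mode. */
static int sketch_handle_trace_error(enum sketch_mode mode,
                                     const struct sketch_size_table *st)
{
    if (mode == SKETCH_XEN_HYPER)
        return 0;   /* skipped: a xen-syms session has no task_struct */
    return sketch_eframe_search(st);
}
```

Calling sketch_eframe_search() directly with a table whose task_struct entry is -1 reproduces the error path, whereas sketch_handle_trace_error() in SKETCH_XEN_HYPER mode never reaches it.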
The upstream maintainer has agreed with my fixes. Here is his response, which also contains my full initial query:

* From: Itsuro ODA <oda valinux co jp>
* To: Dave Anderson <anderson redhat com>
* Cc: crash-utility redhat com
* Subject: [Crash-utility] Re: Question re: xen hypervisor backtrace problem
* Date: Wed, 15 Oct 2008 08:22:39 +0900

Hi Dave,

> Do you agree with these changes?

Yes.

Thank you.
Itsuro Oda

On Tue, 14 Oct 2008 16:30:18 -0400 (EDT)
Dave Anderson <anderson redhat com> wrote:

>
> Hello Oda-san,
>
> I have a xen-syms vmcore that finds a path that the hypervisor-related
> changes in lkcd_x86_trace.c cannot handle. When the back trace runs
> into the "process_softirqs" text return address reference from
> "xen/arch/x86/x86_32/entry.S", it cannot go any further. Therefore
> the backtrace fails, and in the recovery code it incorrectly searches
> for a (vmlinux) eframe:
>
> crash> bt -a
> PCPU: 0 VCPU: ffbc7080
> bt: cannot resolve stack trace:
>  #0 [ff1d3ebc] elf_core_save_regs at ff10a810
>  #1 [ff1d3ec4] common_interrupt at ff1222ed
>  #2 [ff1d3ed0] do_nmi at ff1335bb
>  #3 [ff1d3ef0] handle_nmi_mce at ff17442e
>  #4 [ff1d3f24] csched_tick at ff110aa7
>  #5 [ff1d3f80] timer_softirq_action at ff1155d2
>  #6 [ff1d3fa0] do_softirq at ff1143fe
>  #7 [ff1d3fb0] process_softirqs at ff173f61
> bt: text symbols on stack:
>     [ff1d3ebc] disable_local_APIC at ff11db75
>     [ff1d3ec0] crash_nmi_callback at ff13cc96
>     [ff1d3ec4] common_interrupt at ff1222f2
>     [ff1d3ed0] do_nmi at ff1335c1
>     [ff1d3ef0] handle_nmi_mce at ff174435
>     [ff1d3f18] csched_tick at ff110aa7
>     [ff1d3f80] timer_softirq_action at ff1155d4
>     [ff1d3fa0] do_softirq at ff114405
>     [ff1d3fb0] process_softirqs at ff173f66
>
> bt: invalid structure size: task_struct
>     FILE: x86.c LINE: 1576 FUNCTION: x86_eframe_search()
>
> [/usr/bin/crash] error trace: 816373b => 8164497 => 810c40c => 813ed94
>
>   813ed94: SIZE_verify+126
>   810c40c: x86_eframe_search+1075
>   8164497: handle_trace_error+692
>   816373b: lkcd_x86_back_trace+2370
>
> bt: invalid structure size: task_struct
>     FILE: x86.c LINE: 1576 FUNCTION: x86_eframe_search()
>
> crash>
>
> Now, the bogus vmlinux eframe search can be avoided by doing this in
> handle_trace_error():
>
> --- lkcd_x86_trace.c.orig 2008-10-14 15:46:33.000000000 -0400
> +++ lkcd_x86_trace.c 2008-10-14 16:09:26.000000000 -0400
> @@ -2440,12 +2441,14 @@ handle_trace_error(struct bt_info *bt, i
>                 bt->flags |= BT_TEXT_SYMBOLS_PRINT|BT_ERROR_MASK;
>                 back_trace(bt);
>
> -               bt->flags = BT_EFRAME_COUNT;
> -               if ((cnt = machdep->eframe_search(bt))) {
> -                       error(INFO, "possible exception frame%s:\n",
> -                               cnt > 1 ? "s" : "");
> -                       bt->flags &= ~(ulonglong)BT_EFRAME_COUNT;
> -                       machdep->eframe_search(bt);
> +               if (!XEN_HYPER_MODE()) {
> +                       bt->flags = BT_EFRAME_COUNT;
> +                       if ((cnt = machdep->eframe_search(bt))) {
> +                               error(INFO, "possible exception frame%s:\n",
> +                                       cnt > 1 ? "s" : "");
> +                               bt->flags &= ~(ulonglong)BT_EFRAME_COUNT;
> +                               machdep->eframe_search(bt);
> +                       }
>                 }
>         }
>
> After doing the above, the bt -a shows this, and therefore does
> not fail prematurely:
>
> crash> bt -a
> PCPU: 0 VCPU: ffbc7080
> bt: cannot resolve stack trace:
>  #0 [ff1d3ebc] elf_core_save_regs at ff10a810
>  #1 [ff1d3ec4] common_interrupt at ff1222ed
>  #2 [ff1d3ed0] do_nmi at ff1335bb
>  #3 [ff1d3ef0] handle_nmi_mce at ff17442e
>  #4 [ff1d3f24] csched_tick at ff110aa7
>  #5 [ff1d3f80] timer_softirq_action at ff1155d2
>  #6 [ff1d3fa0] do_softirq at ff1143fe
>  #7 [ff1d3fb0] process_softirqs at ff173f61
> bt: text symbols on stack:
>     [ff1d3ebc] disable_local_APIC at ff11db75
>     [ff1d3ec0] crash_nmi_callback at ff13cc96
>     [ff1d3ec4] common_interrupt at ff1222f2
>     [ff1d3ed0] do_nmi at ff1335c1
>     [ff1d3ef0] handle_nmi_mce at ff174435
>     [ff1d3f18] csched_tick at ff110aa7
>     [ff1d3f80] timer_softirq_action at ff1155d4
>     [ff1d3fa0] do_softirq at ff114405
>     [ff1d3fb0] process_softirqs at ff173f66
>
> PCPU: 1 VCPU: ff1b6080
> ...
>
> Carrying it one step further, and given that the relevant part
> of the stack from above looks like this:
>
> crash> rd -s ff1d3ebc 84
> ff1d3ebc: disable_local_APIC+5 crash_nmi_callback+38 common_interrupt+82 cpu0_stack+16076
> ff1d3ecc: 0003d027 do_nmi+49 cpu0_stack+16120 00000000
> ff1d3edc: ffbca000 ffbcbeb0 00000030 cpu0_stack+16308
> ff1d3eec: 0000e010 handle_nmi_mce+91 cpu0_stack+16120 00000100
> ff1d3efc: 00000005 000000ff 000005dc ffbdee88
> ff1d3f0c: 00000000 00000960 00020000 csched_tick+1239
> ff1d3f1c: 0000e008 00000083 ffbc7080 00000030
> ff1d3f2c: 0003d027 80000003 000583a8 per_cpu__schedule_data
> ff1d3f3c: c840ceb2 00000000 ffbfda80 00000000
> ff1d3f4c: 00000000 00000000 00000100 00000960
> ff1d3f5c: ffbdee80 00000246 000000ff csched_priv+4
> ff1d3f6c: 00000000 ffbfda8c __per_cpu_data_end+54972 e4c5d8d9
> ff1d3f7c: 0000008b timer_softirq_action+132 00000000 ffbc7080
> ff1d3f8c: per_cpu__timers 00000000 cpu0_stack+16308 0000007b
> ff1d3f9c: eaed7700 do_softirq+53 00000000 ffbc7080
> ff1d3fac: 0000007b process_softirqs+6 eb396d84 00000002
> ff1d3fbc: c0678470 c0678470 00000002 eaed7700
> ff1d3fcc: 00000000 000d0000 c04011a7 00000061
> ff1d3fdc: 00000202 eb396d48 00000069 0000007b
> ff1d3fec: 0000007b 00000000 00000000 00000000
> ff1d3ffc: ffbc7080 ffffffff ffffffff ffffffff
> crash>
>
> Clearly "process_softirqs" is the last text return address
> reference that the backtrace code can work with. So to try
> to clean up the backtrace, I added this:
>
> --- lkcd_x86_trace.c.orig 2008-10-14 15:46:33.000000000 -0400
> +++ lkcd_x86_trace.c 2008-10-14 16:09:26.000000000 -0400
> @@ -1423,6 +1423,7 @@ find_trace(
>         if (XEN_HYPER_MODE()) {
>                 func_name = kl_funcname(pc);
>                 if (STREQ(func_name, "idle_loop") || STREQ(func_name, "hypercall")
> +                       || STREQ(func_name, "process_softirqs")
>                         || STREQ(func_name, "tracing_off")
>                         || STREQ(func_name, "handle_exception")) {
>                         UPDATE_FRAME(func_name, pc, 0, sp, bp, asp, 0, 0, bp - sp, 0);
>
> which shows:
>
> crash> bt -a
> PCPU: 0 VCPU: ffbc7080
>  #0 [ff1d3ebc] elf_core_save_regs at ff10a810
>  #1 [ff1d3ec4] common_interrupt at ff1222ed
>  #2 [ff1d3ed0] do_nmi at ff1335bb
>  #3 [ff1d3ef0] handle_nmi_mce at ff17442e
>  #4 [ff1d3f24] csched_tick at ff110aa7
>  #5 [ff1d3f80] timer_softirq_action at ff1155d2
>  #6 [ff1d3fa0] do_softirq at ff1143fe
>  #7 [ff1d3fb0] process_softirqs at ff173f61
>
> PCPU: 1 VCPU: ff1b6080
> ...
>
> The patch to avoid eframe search can be avoided entirely by applying
> the second patch, but it seems that it should be left in place for
> other unforeseen possibilities in the future.
>
> Do you agree with these changes?
>
> Thanks,
> Dave
>
--
Itsuro ODA <oda valinux co jp>
With this patch, the backtrace terminates with no complaints:

--- lkcd_x86_trace.c.orig 2008-10-14 15:46:33.000000000 -0400
+++ lkcd_x86_trace.c 2008-10-14 16:09:26.000000000 -0400
@@ -1423,6 +1423,7 @@
         if (XEN_HYPER_MODE()) {
             func_name = kl_funcname(pc);
             if (STREQ(func_name, "idle_loop") || STREQ(func_name, "hypercall")
+                || STREQ(func_name, "process_softirqs")
                 || STREQ(func_name, "tracing_off")
                 || STREQ(func_name, "handle_exception")) {
                 UPDATE_FRAME(func_name, pc, 0, sp, bp, asp, 0, 0, bp - sp, 0);
@@ -2440,12 +2441,14 @@
         bt->flags |= BT_TEXT_SYMBOLS_PRINT|BT_ERROR_MASK;
         back_trace(bt);

-        bt->flags = BT_EFRAME_COUNT;
-        if ((cnt = machdep->eframe_search(bt))) {
-            error(INFO, "possible exception frame%s:\n",
-                cnt > 1 ? "s" : "");
-            bt->flags &= ~(ulonglong)BT_EFRAME_COUNT;
-            machdep->eframe_search(bt);
+        if (!XEN_HYPER_MODE()) {
+            bt->flags = BT_EFRAME_COUNT;
+            if ((cnt = machdep->eframe_search(bt))) {
+                error(INFO, "possible exception frame%s:\n",
+                    cnt > 1 ? "s" : "");
+                bt->flags &= ~(ulonglong)BT_EFRAME_COUNT;
+                machdep->eframe_search(bt);
+            }
         }
     }

With the patch above:

crash> bt -a
PCPU: 0 VCPU: ffbc7080
#0 [ff1d3ebc] elf_core_save_regs at ff10a810
#1 [ff1d3ec4] common_interrupt at ff1222ed
#2 [ff1d3ed0] do_nmi at ff1335bb
#3 [ff1d3ef0] handle_nmi_mce at ff17442e
#4 [ff1d3f24] csched_tick at ff110aa7
#5 [ff1d3f80] timer_softirq_action at ff1155d2
#6 [ff1d3fa0] do_softirq at ff1143fe
#7 [ff1d3fb0] process_softirqs at ff173f61

PCPU: 1 VCPU: ff1b6080
#0 [ff1bff40] elf_core_save_regs at ff10a810
#1 [ff1bff44] crash_nmi_callback at ff13cc91
#2 [ff1bff54] do_nmi at ff1335bb
#3 [ff1bff74] handle_nmi_mce at ff17442e
#4 [ff1bffa8] idle_loop at ff11f975

PCPU: 2 VCPU: ff1cc080
#0 [ff1c7f40] elf_core_save_regs at ff10a810
#1 [ff1c7f44] crash_nmi_callback at ff13cc91
#2 [ff1c7f54] do_nmi at ff1335bb
#3 [ff1c7f74] handle_nmi_mce at ff17442e
#4 [ff1c7fa8] idle_loop at ff11f975

PCPU: 3 VCPU: ffbc4080
#0 [ff1c3f78] elf_core_save_regs at ff10a810
#1 [ff1c3f7c] crash_nmi_callback at ff13cc91
#2 [ff1c3f8c] do_nmi at ff1335bb
#3 [ff1c3fac] handle_nmi_mce at ff17442e

PCPU: 4 VCPU: ffbc3080
#0 [ff23bedc] elf_core_save_regs at ff10a810
#1 [ff23bee0] crash_nmi_callback at ff13cc91
#2 [ff23bef0] do_nmi at ff1335bb
#3 [ff23bf10] handle_nmi_mce at ff17442e
#4 [ff23bf44] get_s_time at ff131cb4
#5 [ff23bf70] reprogram_timer at ff11de46
#6 [ff23bf80] timer_softirq_action at ff115619
#7 [ff23bfa0] do_softirq at ff1143fe
#8 [ff23bfb0] process_softirqs at ff173f61

PCPU: 5 VCPU: ff23d080
#0 [ff237f40] elf_core_save_regs at ff10a810
#1 [ff237f44] crash_nmi_callback at ff13cc91
#2 [ff237f54] do_nmi at ff1335bb
#3 [ff237f74] handle_nmi_mce at ff17442e
#4 [ff237fa8] idle_loop at ff11f975

PCPU: 6 VCPU: ffbc1080
#0 [ffbefee4] elf_core_save_regs at ff10a810
#1 [ffbefee8] kexec_crash at ff10abe0
#2 [ffbefef8] do_kexec_op at ff10adbf
#3 [ffbeff98] hypercall at ff173efb

PCPU: 7 VCPU: ff231080
#0 [ffbebf40] elf_core_save_regs at ff10a810
#1 [ffbebf44] crash_nmi_callback at ff13cc91
#2 [ffbebf54] do_nmi at ff1335bb
#3 [ffbebf74] handle_nmi_mce at ff17442e
#4 [ffbebfa8] idle_loop at ff11f975
crash>
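For readers who want the first hunk in isolation: the backtraces above end at "idle_loop", "hypercall", or "process_softirqs" because the walker treats those names as the last frame it can resolve. Below is a minimal standalone sketch of that stop-list idea. It is not the crash utility's code; the names xen_hyper_terminal_frames, is_terminal_frame, and frames_to_print are hypothetical and exist only for illustration. Only the five entry-point strings come from the patched condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stop list. The strings are the ones compared with STREQ()
 * in the patched condition; the table itself is an illustration only.
 */
static const char *const xen_hyper_terminal_frames[] = {
    "idle_loop",
    "hypercall",
    "process_softirqs",   /* the entry point added by the patch */
    "tracing_off",
    "handle_exception",
};

/* Return true when func_name is an entry point at which the walk stops. */
static bool is_terminal_frame(const char *func_name)
{
    size_t n = sizeof(xen_hyper_terminal_frames) /
               sizeof(xen_hyper_terminal_frames[0]);
    size_t i;

    assert(func_name != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(func_name, xen_hyper_terminal_frames[i]) == 0)
            return true;
    }
    return false;
}

/*
 * Given function names ordered from innermost frame outward, return how
 * many frames to print: everything up to and including the first terminal
 * frame, or all of them if no terminal frame is present.
 */
static size_t frames_to_print(const char *const *names, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (is_terminal_frame(names[i]))
            return i + 1;
    }
    return count;
}
```

Feeding frames_to_print() the eight names from the PCPU 0 trace, followed by anything else, yields 8: the walk stops at "process_softirqs" instead of trying to interpret what lies beyond it on the stack.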
(In reply to comment #9)
> With respect to the "bt: invalid structure size: task_struct" seen on
> the xen-syms vmcore analysis, it is clear that the backtrace error-handling
> code is incorrectly using a vmlinux-related function that searches for
> exception frames. And that is because the backtrace runs into an
> assembly-language entry point that the upstream maintainers had
> apparently never seen. However, I don't know whether the only reason it is
> being seen in this case is because of the type of trap that was generated
> by the bogus kprobe operation done on the dom0 kernel or not.
>

It is not. The same error happens even without using jprobe().

crash 4.0-7.2.3
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

NOTE: stdin: not a tty

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...

KERNEL: /boot/xen-syms-2.6.18-120.el5
DEBUGINFO: /usr/lib/debug/boot/xen-syms-2.6.18-120.el5.debug
DUMPFILE: /var/crash/127.0.0.1-2008-10-22-03:10:54/vmcore
CPUS: 8
DOMAINS: 4
UPTIME: 00:01:33
MACHINE: Intel(R) Xeon(R) CPU E5450 @ 3.00GHz (3000 Mhz)
MEMORY: 4 GB
PCPU-ID: 0
PCPU: ff1d3fb4
VCPU-ID: 0
VCPU: ffbd0080 (VCPU_RUNNING)
DOMAIN-ID: 0
DOMAIN: ffbdc080 (DOMAIN_RUNNING)
STATE: CRASH

crash> bt
PCPU: 0 VCPU: ffbd0080
#0 [ff1d3ee4] elf_core_save_regs at ff10a810
#1 [ff1d3ee8] kexec_crash at ff10abe0
#2 [ff1d3ef8] do_kexec_op at ff10adbf
#3 [ff1d3f98] hypercall at ff173efb
crash> bt -a
PCPU: 0 VCPU: ffbd0080
#0 [ff1d3ee4] elf_core_save_regs at ff10a810
#1 [ff1d3ee8] kexec_crash at ff10abe0
#2 [ff1d3ef8] do_kexec_op at ff10adbf
#3 [ff1d3f98] hypercall at ff173efb

PCPU: 1 VCPU: ff1b7080
#0 [ffbf3f40] elf_core_save_regs at ff10a810
#1 [ffbf3f44] crash_nmi_callback at ff13cc91
#2 [ffbf3f54] do_nmi at ff1335bb
#3 [ffbf3f74] handle_nmi_mce at ff17442e
#4 [ffbf3fa8] idle_loop at ff11f975

PCPU: 2 VCPU: ff1cd080
#0 [ff1bff40] elf_core_save_regs at ff10a810
#1 [ff1bff44] crash_nmi_callback at ff13cc91
#2 [ff1bff54] do_nmi at ff1335bb
#3 [ff1bff74] handle_nmi_mce at ff17442e
#4 [ff1bffa8] idle_loop at ff11f975

PCPU: 3 VCPU: ff1ba080
#0 [ff1c7f40] elf_core_save_regs at ff10a810
#1 [ff1c7f44] crash_nmi_callback at ff13cc91
#2 [ff1c7f54] do_nmi at ff1335bb
#3 [ff1c7f74] handle_nmi_mce at ff17442e
#4 [ff1c7fa8] idle_loop at ff11f975

PCPU: 4 VCPU: ffbc8080
bt: cannot resolve stack trace:
#0 [ff1c3eac] elf_core_save_regs at ff10a810
#1 [ff1c3eb0] crash_nmi_callback at ff13cc91
#2 [ff1c3ec0] do_nmi at ff1335bb
#3 [ff1c3ee0] handle_nmi_mce at ff17442e
#4 [ff1c3f14] read_platform_stime at ff131a08
#5 [ff1c3f20] local_time_calibration at ff1325c2
#6 [ff1c3f80] timer_softirq_action at ff1155d2
#7 [ff1c3fa0] do_softirq at ff1143fe
#8 [ff1c3fb0] process_softirqs at ff173f61
bt: text symbols on stack:
bt: invalid structure size: task_struct
FILE: x86.c LINE: 1576 FUNCTION: x86_eframe_search()
[ff1c3eac] disable_local_APIC at ff11db75
[ff1c3eb0] crash_nmi_callback at ff13cc96
[ff1c3ec0] do_nmi at ff1335c1
[ff1c3ee0] handle_nmi_mce at ff174435
[ff1c3f08] read_platform_stime at ff131a08
[ff1c3f20] local_time_calibration at ff1325c7
[ff1c3f80] timer_softirq_action at ff1155d4
[ff1c3fa0] do_softirq at ff114405
[ff1c3fb0] process_softirqs at ff173f66

[/usr/bin/crash] error trace: 81637af => 816450b => 810c544 => 813eebc

813eebc: SIZE_verify+126
810c544: (undetermined)
816450b: (undetermined)
81637af: lkcd_x86_back_trace+2370

bt: invalid structure size: task_struct
FILE: x86.c LINE: 1576 FUNCTION: x86_eframe_search()

crash> bt -f
PCPU: 0 VCPU: ffbd0080
#0 [ff1d3ee4] elf_core_save_regs at ff10a810
[RA: ff10abe5 SP: ff1d3ee4 FP: ff1d3ee8 SIZE: 8]
ff1d3ee4: 0240a498 ff10abe5
#1 [ff1d3ee8] kexec_crash at ff10abe0
[RA: ff10adc4 SP: ff1d3eec FP: ff1d3ef8 SIZE: 16]
ff1d3eec: ff1dfe58 c0993e30 00000002 ff10adc4
#2 [ff1d3ef8] do_kexec_op at ff10adbf
[RA: ff173f02 SP: ff1d3efc FP: ff1d3f98 SIZE: 160]
ff1d3efc: ff1d3f30 c0993e30 00000004 3a530020
ff1d3f0c: ffffffda ffffffda 0000007b ff1050ab
ff1d3f1c: 0000000d c0993cd8 00000004 3063005d
ff1d3f2c: 30333939 00000001 3d6b7361 32313163
ff1d3f3c: 30303064 73617420 69742e6b 3930633d
ff1d3f4c: 30303339 00002930 ffbd0080 ffbdc080
ff1d3f5c: 00000000 0b80ffff 1da4c067 00000000
ff1d3f6c: ff1ae024 00000000 00000007 04a640db
ff1d3f7c: 0000000d 00000005 00000002 ffbd0080
ff1d3f8c: 0000007b 0000007b 00000000 ff173f02
#3 [ff1d3f98] hypercall at ff173efb
[RA: 0 SP: ff1d3f9c FP: ff1d3fd0 SIZE: 52]
ff1d3f9c: 00000000 c0993e30 c75bec14 c0993f68
ff1d3fac: c0993e78 00000000 00000000 c0993e30
ff1d3fbc: c75bec14 c0993f68 c0993e78 00000000
ff1d3fcc: 00000025
(In reply to comment #12)
> (In reply to comment #9)
> > With respect to the "bt: invalid structure size: task_struct" seen on
> > the xen-syms vmcore analysis, it is clear that the backtrace error-handling
> > code is incorrectly using a vmlinux-related function that searches for
> > exception frames. And that is because the backtrace runs into an
> > assembly-language entry point that the upstream maintainers had
> > apparently never seen. However, I don't know whether the only reason it is
> > being seen in this case is because of the type of trap that was generated
> > by the bogus kprobe operation done on the dom0 kernel or not.
> >
>
> It is not. The same error happens even without using jprobe().

Right -- it has nothing to do with kprobes/jprobes, it's just a matter of
not "stopping" the hypervisor backtrace at the "process_softirqs" entry point.

This was the discussion on the crash utility mailing list:

https://www.redhat.com/archives/crash-utility/2008-October/msg00063.html
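As for the quoted point about the error-handling code calling a vmlinux-only exception-frame search, the guard amounts to "skip the search when analysing the hypervisor". The standalone sketch below shows only that shape. Everything in it is hypothetical (enum analysis_mode, eframe_search_fn, recover_from_trace_error, count_calls); the real change is the "if (!XEN_HYPER_MODE())" wrapper in handle_trace_error().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two kinds of analysis session. */
enum analysis_mode { MODE_VMLINUX, MODE_XEN_HYPER };

/* Hypothetical signature for an exception-frame search callback. */
typedef int (*eframe_search_fn)(void *ctx);

/*
 * Recovery step after a failed backtrace. The exception-frame search only
 * makes sense for a vmlinux session, so it is skipped in hypervisor mode.
 * Returns the number of frames the search reported, or 0 when skipped.
 */
static int recover_from_trace_error(enum analysis_mode mode,
                                    eframe_search_fn search, void *ctx)
{
    assert(search != NULL);

    if (mode == MODE_XEN_HYPER)
        return 0;   /* never call the vmlinux-only search here */

    return search(ctx);
}

/* Illustrative callback: counts invocations and reports one frame. */
static int count_calls(void *ctx)
{
    int *calls = ctx;

    (*calls)++;
    return 1;
}
```

With count_calls as the callback, a MODE_XEN_HYPER call leaves the counter untouched and returns 0, while a MODE_VMLINUX call runs the search once. That mirrors why the guard stays useful even with "process_softirqs" on the stop list: any other unresolvable hypervisor trace would still reach the recovery path.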
Cai,

Do you think that this BZ is necessary for RHEL5.3? (i.e., requiring a respin of the current crash package errata). And if so, why is it any more important than these others that you filed?:

https://bugzilla.redhat.com/show_bug.cgi?id=462819 (rhel5.4.0 +)
https://bugzilla.redhat.com/show_bug.cgi?id=464116 (no flags)
https://bugzilla.redhat.com/show_bug.cgi?id=464288 (no flags)
https://bugzilla.redhat.com/show_bug.cgi?id=466797 (no flags)

Seems like they all could be deferred to RHEL5.4.

Thanks,
Dave
OK. I am fine with deferring those to RHEL5.4, since they are either Xen Domain 0 related or look like corner cases. I'll adjust the flag.
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: When run on a Xen hypervisor in which the backtrace leads to either "process_softirqs" or "page_fault", the "bt" command backtrace would indicate: "bt: cannot resolve stack trace". The recovery code would then terminate the command with the nonsensical error message: "bt: invalid structure size: task_struct". The command now properly terminates the backtrace.
Release note looks fine.
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-1283.html