Description of problem:
The host kernel panicked twice while launching a 5.8 guest. I collected 3 vmcore files from 3 panics.

Version-Release number of selected component (if applicable):
kernel-2.6.18-308.el5.x86_64.rpm
kvm-83-249.el5_8

How reproducible:
Not always

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Will attach the vmcore files later.
(In reply to comment #0)
> Description of problem:
> Host kernel panic happened twice during launching 5.8 guest. I collected 3
> vmcore files when panic happened for 3 times.
>
> Version-Release number of selected component (if applicable):
> kernel-2.6.18-308.el5.x86_64.rpm
> kvm-83-249.el5_8

Sorry, that should be kvm-83-249.el5.x86_64.rpm
This issue is hardware specific, because I can't reproduce this bug on another host with the same command line and scenario.

Reproducible host info:

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU W3520 @ 2.67GHz
stepping        : 5
cpu MHz         : 1596.000
cache size      : 8192 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc ida nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 5333.49
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management: [8]

# dmidecode -s system-product-name
HP Z400 Workstation
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<0000000000000000>]
PGD 1be43b067 PUD 1be43c067 PMD 0
Oops: 0010 [1] SMP
last sysfs file: /class/net/lo/ifindex
CPU 1
Modules linked in: tun nls_utf8 loop autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bridge ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables ip6_tables x_tables be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy ksm(U) kvm_intel(U) kvm(U) snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer sr_mod cdrom snd_page_alloc tg3 snd_hwdep i7core_edac sg snd edac_mc pcspkr serio_raw shpchp soundcore tpm_tis tpm tpm_bios dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 4395, comm: qemu-kvm Tainted: G ---- 2.6.18-308.el5 #1
RIP: 0010:[<0000000000000000>] [<0000000000000000>]
RSP: 0018:ffff8101be293d90 EFLAGS: 00010246
RAX: ffff8101bea3ec68 RBX: ffff8101ed34b3c0 RCX: 0000000000000001
RDX: 0000000000000001 RSI: ffff8101bea3c000 RDI: ffff8101ed34b3c0
RBP: ffff8101bea3c000 R08: ffff8101be293d78 R09: 0000000000000000
R10: ffff81021fce4008 R11: ffffffff8841c65b R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000041790000
FS: 000000004178f940(0063) GS:ffff81021fc097c0(0000) knlGS:0000000000000000
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000001be43a000 CR4: 00000000000026a0
Process qemu-kvm (pid: 4395, threadinfo ffff8101be292000, task ffff8102187c9080)
list_add corruption. next->prev should be ffff81010774e4d0, but was 0000000000000000
Stack: ffffffff883fb6bc ffff81020f648460 ffff8101bea3c000 ffff8101da330000
 0000000000000001 0000000000000010 ffffffff883fa192 ffff8101bdcd0040
 ffff81021e2edac0 ffff8101bdcd0040 ffffffff883eaad4 fffffffe7ffbfeff
Call Trace:
 [<ffffffff883fb6bc>] :kvm:kvm_set_irq+0x65/0xa3
 [<ffffffff883fa192>] :kvm:kvm_inject_pit_timer_irqs+0x8c/0xd7
 [<ffffffff883eaad4>] :kvm:kvm_arch_vcpu_ioctl_run+0x473/0x61e
 [<ffffffff883e5f95>] :kvm:kvm_vcpu_ioctl+0xf2/0x448
 [<ffffffff8008ee72>] default_wake_function+0x0/0xe
 [<ffffffff80041ea0>] do_ioctl+0x21/0x6b
 [<ffffffff8002ff2d>] vfs_ioctl+0x457/0x4b9
 [<ffffffff8004c26c>] sys_ioctl+0x59/0x78
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
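When comparing the three vmcores, it can help to reduce each oops to its call trace frames. The following is only a minimal sketch: the function name `parse_call_trace` and the regular expression are illustrative assumptions, written against the frame format shown in the oops above, and are not part of any Red Hat tooling.

```python
import re

# Matches frames such as "[<ffffffff883fb6bc>] :kvm:kvm_set_irq+0x65/0xa3".
# The ":module:" prefix is optional; core kernel symbols have none.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{16})>\]\s+(?::(\w+):)?(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_call_trace(text):
    """Return one dict per call trace frame found in the oops text."""
    frames = []
    for addr, module, symbol, offset, size in FRAME_RE.findall(text):
        frames.append({
            "address": int(addr, 16),
            "module": module or "kernel",  # hypothetical label for built-in code
            "symbol": symbol,
            "offset": int(offset, 16),
            "size": int(size, 16),
        })
    return frames


# Three frames copied from the call trace above.
trace = (
    "[<ffffffff883fb6bc>] :kvm:kvm_set_irq+0x65/0xa3 "
    "[<ffffffff883fa192>] :kvm:kvm_inject_pit_timer_irqs+0x8c/0xd7 "
    "[<ffffffff8008ee72>] default_wake_function+0x0/0xe"
)
frames = parse_call_trace(trace)
for frame in frames:
    print(frame["module"], frame["symbol"], hex(frame["offset"]))
```

Running the same function over the console output of each panic would show whether all three crashes pass through the same kvm functions.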
Examining the 'struct kvm' pointer in %rbp (copied from %rdi):

crash> x/30x 0xffff8101bea3c000
0xffff8101bea3c000: 0x00000000 0x00000001 0xbea3c008 0xffff8101
0xffff8101bea3c010: 0xbea3c008 0xffff8101 0x00000001 0x00000001
0xffff8101bea3c020: 0x00000001 0x00000001 0xbea3c028 0xffff8101
0xffff8101bea3c030: 0xbea3c028 0xffff8101 0x1cbf9480 0xffff8102
0xffff8101bea3c040: 0x00000023 0x00000000 0x00000000 0x00000000
0xffff8101bea3c050: 0x000000a0 0x00000000 0x00000000 0x00000000
0xffff8101bea3c060: 0x100de000 0xffffc200 0x00000000 0x00000000
0xffff8101bea3c070: 0x100e0000 0xffffc200

Looks corrupted.

It looks like the oops handler encountered its own corruption while showing the oops:

list_add corruption. next->prev should be ffff81010774e4d0, but was 0000000000000000

In short, the machine is totally trashed.
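The dump above is printed as 32-bit words, which makes 64-bit pointers hard to read. As a reading aid only, here is a minimal sketch that joins little-endian word pairs into 64-bit values. The helper name `words_to_qwords` is an illustrative assumption, and nothing here relies on the real layout of 'struct kvm'.

```python
def words_to_qwords(words):
    """Join consecutive little-endian 32-bit words into 64-bit values."""
    if len(words) % 2:
        raise ValueError("need an even number of 32-bit words")
    # On x86_64 the low half of each 64-bit value is stored first.
    return [(hi << 32) | lo for lo, hi in zip(words[0::2], words[1::2])]


base = 0xffff8101bea3c000
# First two rows of the x/30x output above.
first_rows = [
    0x00000000, 0x00000001, 0xbea3c008, 0xffff8101,
    0xbea3c008, 0xffff8101, 0x00000001, 0x00000001,
]
qwords = words_to_qwords(first_rows)
for index, value in enumerate(qwords):
    print(f"{base + 8 * index:#018x}: {value:#018x}")
```

With the values regrouped, kernel addresses (those beginning with ffff81) stand apart from small counters, which makes it easier to compare the dump against the structure definition for the kernel in question.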
Please try booting with the ftrace_dump_on_oops kernel parameter, a serial console or netconsole, and running qemu under trace-cmd:

trace-cmd -p function -e kvm -b 100000 /usr/libexec/qemu-kvm ...

and send the console output that results.
(In reply to comment #6)
> Please try booting with the ftrace_dump_on_oops kernel parameter, a serial
> console or netconsole, and running qemu under trace-cmd:
>
> trace-cmd -p function -e kvm -b 100000 /usr/libexec/qemu-kvm ...
>
> and send the console output that results.

Hi Avi,

The latest trace-cmd I could find in brew is trace-cmd-2.0-2.el5rt, and I installed it on my RHEL 5.8 host. But I don't see the tracing directory under /sys/kernel/debug after mounting debugfs with:

mount -t debugfs nodev /sys/kernel/debug

Any suggestions?
Sorry, tracing isn't supported under RHEL 5. I thought I posted a comment about it but it was for another bug. Are you running with ksm enabled? Please try disabling it.
(In reply to comment #8)
> Sorry, tracing isn't supported under RHEL 5. I thought I posted a comment
> about it but it was for another bug.
>
> Are you running with ksm enabled? Please try disabling it.

Sorry for the late response. As far as I remember, ksm was not turned on on my RHEL 5 host when the crash happened. I have now kept the VM running for several days without a crash, so it seems the problem is no longer easy for me to reproduce.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Not reproducible, only happens on one machine; not opened by a customer. Closing. If it reproduces, please reopen.