Description of problem:
Trying to run openstack inside a KVM VM, but the ssh key insertion into a new VM image fails. I then ran libguestfs-test-tool and that fails as well. The following error is shown at the end of the run:

KVM internal error. Suberror: 2

I am running this inside a KVM VM on top of Fedora 19 (up to date). Inside the guest, RHEL 6.4 and openstack are running (up to date, kernel ). libguestfs-test-tool also fails when using a 'plain' CentOS 6 VM with the latest updates.

Version-Release number of selected component (if applicable):
# rpm -q libguestfs
libguestfs-1.16.34-2.el6.x86_64

How reproducible:
every time

Steps to Reproduce:
1. run libguestfs-test-tool

Actual results:
failure: KVM internal error. Suberror: 2

Expected results:
success

Additional info:
# libguestfs-test-tool
************************************************************
* IMPORTANT NOTICE
*
* When reporting bugs, include the COMPLETE, UNEDITED
* output below in your bug report.
*
************************************************************
===== Test starts here =====
PATH=/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
library version: 1.16.34rhel=6,release=2.el6
guestfs_get_append: (null)
guestfs_get_attach_method: appliance
guestfs_get_autosync: 1
guestfs_get_direct: 0
guestfs_get_memsize: 500
guestfs_get_network: 0
guestfs_get_path: /usr/lib64/guestfs
guestfs_get_pgroup: 0
guestfs_get_qemu: /usr/libexec/qemu-kvm
guestfs_get_recovery_proc: 1
guestfs_get_selinux: 0
guestfs_get_smp: 1
guestfs_get_trace: 0
guestfs_get_verbose: 1
host_cpu: x86_64
Launching appliance, timeout set to 600 seconds.
libguestfs: [00000ms] febootstrap-supermin-helper --verbose -f checksum '/usr/lib64/guestfs/supermin.d' x86_64
supermin helper [00000ms] whitelist = (not specified), host_cpu = x86_64, kernel = (null), initrd = (null), appliance = (null)
supermin helper [00000ms] inputs[0] = /usr/lib64/guestfs/supermin.d
checking modpath /lib/modules/2.6.32-358.el6.x86_64 is a directory
picked vmlinuz-2.6.32-358.el6.x86_64 because modpath /lib/modules/2.6.32-358.el6.x86_64 exists
checking modpath /lib/modules/2.6.32-358.114.1.openstack.el6.x86_64 is a directory
picked vmlinuz-2.6.32-358.114.1.openstack.el6.x86_64 because modpath /lib/modules/2.6.32-358.114.1.openstack.el6.x86_64 exists
supermin helper [00001ms] finished creating kernel
supermin helper [00001ms] visiting /usr/lib64/guestfs/supermin.d
supermin helper [00001ms] visiting /usr/lib64/guestfs/supermin.d/base.img
supermin helper [00001ms] visiting /usr/lib64/guestfs/supermin.d/daemon.img
supermin helper [00001ms] visiting /usr/lib64/guestfs/supermin.d/hostfiles
supermin helper [00033ms] visiting /usr/lib64/guestfs/supermin.d/init.img
supermin helper [00034ms] adding kernel modules
supermin helper [00090ms] finished creating appliance
libguestfs: [00095ms] begin testing qemu features
libguestfs: [00109ms] finished testing qemu features
libguestfs: accept_from_daemon: 0x1d2a690 g->state = 1
[00109ms] /usr/libexec/qemu-kvm \
    -global virtio-blk-pci.scsi=off \
    -nodefconfig \
    -nodefaults \
    -nographic \
    -drive file=/tmp/libguestfs-test-tool-sda-uJwNbR,cache=none,format=raw,if=virtio \
    -nodefconfig \
    -machine accel=kvm:tcg \
    -m 500 \
    -no-reboot \
    -device virtio-serial \
    -serial stdio \
    -device sga \
    -chardev socket,path=/tmp/libguestfsvnx5Fm/guestfsd.sock,id=channel0 \
    -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
    -kernel /var/tmp/.guestfs-0/kernel.19637 \
    -initrd /var/tmp/.guestfs-0/initrd.19637 \
    -append 'panic=1 console=ttyS0 udevtimeout=300 no_timer_check acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_verbose=1 TERM=xterm-256color ' \
    -drive file=/var/tmp/.guestfs-0/root.19637,snapshot=on,if=virtio,cache=unsafe
Google, Inc.
Serial Graphics Adapter 07/26/11
SGABIOS $Id: sgabios.S 8 2010-04-22 00:03:40Z nlaredo $ (mockbuild.redhat.com) Tue Jul 26 15:05:08 UTC 2011
Term: 80x24
4 0
SeaBIOS (version seabios-0.6.1.2-26.el6)
Probing EDD (edd=off to disable)... ok
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-358.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Tue Jan 29 11:47:41 EST 2013
Command line: panic=1 console=ttyS0 udevtimeout=300 no_timer_check acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_verbose=1 TERM=xterm-256color
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
Disabled fast string operations
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009b800 (usable)
 BIOS-e820: 000000000009b800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001f3fd000 (usable)
 BIOS-e820: 000000001f3fd000 - 000000001f400000 (reserved)
 BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
DMI 2.4 present.
SMBIOS version 2.4 @ 0xFDA30
last_pfn = 0x1f3fd max_arch_pfn = 0x400000000
PAT not supported by CPU.
init_memory_mapping: 0000000000000000-000000001f3fd000
RAMDISK: 1f1ab000 - 1f3ef000
No NUMA configuration found
Faking a node at 0000000000000000-000000001f3fd000
Bootmem setup node 0 0000000000000000-000000001f3fd000
  NODE_DATA [0000000000009000 - 000000000003cfff]
  bootmap [000000000003d000 - 0000000000040e7f] pages 4
(7 early reservations) ==> bootmem [0000000000 - 001f3fd000]
  #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
  #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
  #2 [0001000000 - 000201b0a4] TEXT DATA BSS ==> [0001000000 - 000201b0a4]
  #3 [001f1ab000 - 001f3ef000] RAMDISK ==> [001f1ab000 - 001f3ef000]
  #4 [000009b800 - 0000100000] BIOS reserved ==> [000009b800 - 0000100000]
  #5 [000201c000 - 000201c059] BRK ==> [000201c000 - 000201c059]
  #6 [0000008000 - 0000009000] PGTABLE ==> [0000008000 - 0000009000]
found SMP MP-table at [ffff8800000fda50] fda50
kvm-clock: Using msrs 4b564d01 and 4b564d00
kvm-clock: cpu 0, msr 0:1c25681, boot clock
Zone PFN ranges:
  DMA      0x00000001 -> 0x00001000
  DMA32    0x00001000 -> 0x00100000
  Normal   0x00100000 -> 0x00100000
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0: 0x00000001 -> 0x0000009b
    0: 0x00000100 -> 0x0001f3fd
SFI: Simple Firmware Interface v0.7 http://simplefirmware.org
Intel MultiProcessor Specification v1.4
MPTABLE: OEM ID: BOCHSCPU
MPTABLE: Product ID: 0.1
MPTABLE: APIC at: 0xFEE00000
Processor #0 (Bootup-CPU)
I/O APIC #0 Version 17 at 0xFEC00000.
Processors: 1
SMP: Allowing 1 CPUs, 0 hotplug CPUs
PM: Registered nosave memory: 000000000009b000 - 000000000009c000
PM: Registered nosave memory: 000000000009c000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
Allocating PCI resources starting at 1f400000 (gap: 1f400000:e0bbc000)
Booting paravirtualized kernel on KVM
NR_CPUS:4096 nr_cpumask_bits:1 nr_cpu_ids:1 nr_node_ids:1
PERCPU: Embedded 31 pages/cpu @ffff880002200000 s94552 r8192 d24232 u2097152
pcpu-alloc: s94552 r8192 d24232 u2097152 alloc=1*2097152
pcpu-alloc: [0] 0
kvm-clock: cpu 0, msr 0:2216681, primary cpu clock
kvm-stealtime: cpu 0, msr 220e840
Built 1 zonelists in Node order, mobility grouping on. Total pages: 126041
Policy zone: DMA32
Kernel command line: panic=1 console=ttyS0 udevtimeout=300 no_timer_check acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_verbose=1 TERM=xterm-256color
[ 0.000000] Disabling memory control group subsystem
[ 0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Memory: 483960k/511988k available (5220k kernel code, 408k absent, 27620k reserved, 7121k data, 1264k init)
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] NR_IRQS:33024 nr_irqs:256
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [ttyS0] enabled
[ 0.000000] Detected 2194.974 MHz processor.
[ 0.003999] Calibrating delay loop (skipped) preset value.. 4389.94 BogoMIPS (lpj=2194974)
[ 0.007020] pid_max: default: 32768 minimum: 301
[ 0.009204] Security Framework initialized
[ 0.011061] SELinux: Disabled at boot.
[ 0.014302] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.019756] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[ 0.022686] Mount-cache hash table entries: 256
[ 0.026447] Initializing cgroup subsys ns
[ 0.028017] Initializing cgroup subsys cpuacct
[ 0.030021] Initializing cgroup subsys memory
[ 0.032131] Initializing cgroup subsys devices
[ 0.034021] Initializing cgroup subsys freezer
[ 0.036020] Initializing cgroup subsys net_cls
[ 0.039018] Initializing cgroup subsys blkio
[ 0.041093] Initializing cgroup subsys perf_event
[ 0.043027] Initializing cgroup subsys net_prio
[ 0.046592] Disabled fast string operations
[ 0.053609] mce: CPU supports 10 MCE banks
[ 0.055700] alternatives: switching to unfair spinlock
[ 0.102457] SMP alternatives: switching to UP code
[ 1.099466] Freeing SMP alternatives: 35k freed
[ 1.101016] ftrace: converting mcount calls to 0f 1f 44 00 00
[ 1.102945] ftrace: allocating 21428 entries in 85 pages
[ 1.109078] Setting APIC routing to flat
[ 1.126227] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 1.129858] CPU0: Intel QEMU Virtual CPU version (cpu64-rhel6) stepping 03
KVM internal error.
Suberror: 2
extra data[0]: 80000202
extra data[1]: 80000202
rax ffffffff81a96e20 rbx 00000000005bc312 rcx 0000000000000064 rdx 0000000000000001
rsi 000000000000eae9 rdi 0000000000000046 rsp ffff88001e9bbe40 rbp ffff88001e9bbe80
r8 ffffffff81c07720 r9 0000000000000000 r10 0000000000000000 r11 0000000000000003
r12 ffff88000220e0e0 r13 00000000ffffffff r14 ffff880002200000 r15 0000000000000000
rip ffffffff81c392a7 rflags 00000283
cs 0010 (00000000/ffffffff p 1 dpl 0 db 0 s 1 type b l 1 g 1 avl 0)
ds 0018 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
es 0018 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
ss 0018 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
fs 0000 (00000000/ffffffff p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0)
gs 0000 (ffff880002200000/ffffffff p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0)
tr 0040 (ffff880002214280/00002087 p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
ldt 0000 (00000000/ffffffff p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0)
gdt ffff880002204000/7f
idt ffffffff81dde000/fff
cr0 8005003b cr2 0 cr3 1a85000 cr4 6f0 cr8 0 efer d01
^C
I've never seen an error anything like this. Is "KVM internal error. Suberror: 2 [etc]" printed out by the appliance kernel or by the /usr/libexec/qemu-kvm process?
We hit a similar error during guest system_reset, but it is not always reproducible:

Bug 1002794 - KVM internal error. Suberror: 1 when doing system_reset
(In reply to Richard W.M. Jones from comment #1)
> I've never seen an error anything like this. Is
> "KVM internal error. Suberror: 2 [etc]" printed out by the
> appliance kernel or by the /usr/libexec/qemu-kvm process?

qemu outputs the error, but it does so because kvm returned KVM_EXIT_INTERNAL_ERROR as its exit reason. Unfortunately that exit reason can be returned for many different underlying causes. We need to identify a reliable way to reproduce this, and then trace kvm while reproducing it. It appears to reproduce 100% for the reporter, so maybe it's machine-specific?

Klaas, can you please paste the output of /proc/cpuinfo here?
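Tracing kvm while reproducing can be done with the kernel's ftrace events (enable events/kvm/enable under /sys/kernel/debug/tracing, reproduce, then read the trace file). A minimal sketch of the post-processing step; the trace lines in the here-document are made-up placeholders, not data from this bug:

```shell
# Tally kvm_exit reasons from an ftrace capture. In a real session the input
# would be /sys/kernel/debug/tracing/trace, recorded while reproducing with:
#   echo 1 > /sys/kernel/debug/tracing/events/kvm/enable
exit_counts=$(awk '/kvm_exit/ {
    for (i = 1; i <= NF; i++) if ($i == "reason") print $(i + 1)
}' <<'EOF' | sort | uniq -c
 qemu-kvm-1234 [000] 100.000001: kvm_exit: reason EPT_VIOLATION rip 0xffffffff81c392a7 info 181 0
 qemu-kvm-1234 [000] 100.000002: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff81c392a7 info 0 0
 qemu-kvm-1234 [000] 100.000003: kvm_exit: reason EPT_VIOLATION rip 0xffffffff81c392a7 info 181 0
EOF
)
echo "$exit_counts"
```

An unusual exit-reason distribution right before the internal error is often the first clue about which emulation path is failing.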
Here is the cpuinfo of the machine I'm running libguestfs in:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel Xeon E312xx (Sandy Bridge)
stepping        : 1
cpu MHz         : 2195.016
cache size      : 4096 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon rep_good unfair_spinlock pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips        : 4390.03
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

And this is of its host (maybe that is relevant as well):

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 58
model name      : Intel(R) Core(TM) i7-3632QM CPU @ 2.20GHz
stepping        : 9
microcode       : 0x19
cpu MHz         : 2574.000
cache size      : 6144 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips        : 4389.80
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
Is there anything printed by the kernel (dmesg) when the error occurs?

You could also use an alternate qemu, eg. one compiled from upstream sources, and just set LIBGUESTFS_QEMU to point to the alternate qemu:

export LIBGUESTFS_QEMU=/path/to/d/qemu/x86_64-softmmu/qemu-system-x86_64
libguestfs-test-tool
No messages are printed by the kernel on host or guest.

I have tried with both the 1.5.3 and 1.6.0 versions of qemu and they both fail in a similar way. When the problem happens the qemu process seems to be stuck, so I killed it with SIGSEGV to get a core dump of it. It shows the following stacktrace:

Core was generated by `libguestfs-test-tool'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f2b730a7513 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
82	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
Missing separate debuginfos, use: debuginfo-install libidn-1.18-2.el6.x86_64 yajl-1.0.7-3.el6.x86_64
(gdb) bt
#0  0x00007f2b730a7513 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f2b735eb358 in guestfs___recv_from_daemon (g=0x8f0620, size_rtn=0x7fffa6894f2c, buf_rtn=0x7fffa6894ef0) at proto.c:584
#2  0x00007f2b735e7dd4 in launch_appliance (g=0x8f0620) at launch.c:967
#3  0x00007f2b7359113c in guestfs_launch (g=<value optimized out>) at actions.c:1123
#4  0x000000000040210d in ?? ()
#5  0x0000007c00000001 in ?? ()
#6  0x00000000004022f0 in ?? ()
#7  0x0000000000000000 in ?? ()

I am planning on also trying a later version of qemu on the host to see if that makes a difference.
Using the latest version of qemu (1.5.3) on the host also does not seem to make a difference. libguestfs-test-tool is still failing consistently inside the guest.
(In reply to klaas.buist from comment #7)
> No messages are printed by the kernel on host or guest.
>
> I have tried with both 1.5.3 and 1.6.0 versions of qemu and they both fail
> in similar way. When the problem happens the qemu process seems to be stuck
> and so I killed it with a sigsegv to get a core dump of it.
> It shows the following stacktrace:
>
> Core was generated by `libguestfs-test-tool'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007f2b730a7513 in __select_nocancel () at
> ../sysdeps/unix/syscall-template.S:82
> 82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
> Missing separate debuginfos, use: debuginfo-install libidn-1.18-2.el6.x86_64
> yajl-1.0.7-3.el6.x86_64
> (gdb) bt
> #0  0x00007f2b730a7513 in __select_nocancel () at
> ../sysdeps/unix/syscall-template.S:82
> #1  0x00007f2b735eb358 in guestfs___recv_from_daemon (g=0x8f0620,
> size_rtn=0x7fffa6894f2c, buf_rtn=0x7fffa6894ef0) at proto.c:584
> #2  0x00007f2b735e7dd4 in launch_appliance (g=0x8f0620) at launch.c:967
> #3  0x00007f2b7359113c in guestfs_launch (g=<value optimized out>) at
> actions.c:1123
> #4  0x000000000040210d in ?? ()
> #5  0x0000007c00000001 in ?? ()
> #6  0x00000000004022f0 in ?? ()
> #7  0x0000000000000000 in ?? ()

That's the stack trace of libguestfs-test-tool which isn't really telling us anything -- it just says that libguestfs is blocked waiting for an answer from qemu. You need to get a stack trace from qemu itself.
Ahh, here it is. Unfortunately it does not show much info yet, even though the executable is not stripped.

# gdb --core=core.1664 --exec=/usr/local/bin/qemu-system-x86_64
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
warning: core file may not match specified executable file.
[New Thread 1664]
[New Thread 1667]
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/05/4c5697ea4022cf320747aabbf8120fe1246ff6
Reading symbols from /lib64/librt-2.12.so...Reading symbols from /usr/lib/debug/lib64/librt-2.12.so.debug...done.
done.
Loaded symbols for /lib64/librt-2.12.so
Reading symbols from /lib64/libgthread-2.0.so.0.2200.5...Reading symbols from /usr/lib/debug/lib64/libgthread-2.0.so.0.2200.5.debug...done.
done.
Loaded symbols for /lib64/libgthread-2.0.so.0.2200.5
Reading symbols from /lib64/libglib-2.0.so.0.2200.5...Reading symbols from /usr/lib/debug/lib64/libglib-2.0.so.0.2200.5.debug...done.
done.
Loaded symbols for /lib64/libglib-2.0.so.0.2200.5
Reading symbols from /lib64/libutil-2.12.so...Reading symbols from /usr/lib/debug/lib64/libutil-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libutil-2.12.so
Reading symbols from /lib64/libz.so.1.2.3...Reading symbols from /usr/lib/debug/lib64/libz.so.1.2.3.debug...done.
done.
Loaded symbols for /lib64/libz.so.1.2.3
Reading symbols from /lib64/libm-2.12.so...Reading symbols from /usr/lib/debug/lib64/libm-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libm-2.12.so
Reading symbols from /lib64/libpthread-2.12.so...Reading symbols from /usr/lib/debug/lib64/libpthread-2.12.so.debug...done.
[Thread debugging using libthread_db enabled]
done.
Loaded symbols for /lib64/libpthread-2.12.so
Reading symbols from /lib64/libc-2.12.so...Reading symbols from /usr/lib/debug/lib64/libc-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libc-2.12.so
Reading symbols from /lib64/ld-2.12.so...Reading symbols from /usr/lib/debug/lib64/ld-2.12.so.debug...done.
done.
Loaded symbols for /lib64/ld-2.12.so
Core was generated by `/usr/local/bin/qemu-system-x86_64 -global virtio-blk-pci.scsi=off -nodefconfig'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f26a7f26293 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
87	    int result = INLINE_SYSCALL (poll, 3, CHECK_N (fds, nfds), nfds, timeout);
(gdb) bt
#0  0x00007f26a7f26293 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f26a95b41f6 in ?? ()
#2  0x0000001900000003 in ?? ()
#3  0xffffffff00000000 in ?? ()
#4  0x0000001968d41e60 in ?? ()
#5  0xe0869fe4664878a2 in ?? ()
#6  0x00007fff68d41e60 in ?? ()
#7  0x00007f26a95b4299 in ?? ()
#8  0x00007fff68d41e60 in ?? ()
#9  0x00000000a9638e63 in ?? ()
#10 0x00000002ffffffff in ?? ()
#11 0xe0869fe4664878a2 in ?? ()
#12 0x00007fff68d41e80 in ?? ()
#13 0x00007f26a9638ed9 in ?? ()
#14 0x00007f2600000001 in ?? ()
#15 0xe0869fe4664878a2 in ?? ()
#16 0x00007fff68d421e0 in ?? ()
#17 0x00007f26a96401e4 in ?? ()
#18 0x00007f2600000017 in ?? ()
#19 0x00007f26a7e48cec in ?? () from /lib64/libc-2.12.so
#20 0x0000000000000000 in ?? ()
(In reply to klaas.buist from comment #10)
> # gdb --core=core.1664 --exec=/usr/local/bin/qemu-system-x86_64

This is some random version of qemu? TBH I've no idea what this bug is, but it could be something specific to the CentOS kernel. Have you tried looking for similar reports in the CentOS bug tracker, or seeing if a fresh CentOS install can run 'libguestfs-test-tool'?
This is version 1.5.3 of qemu; version 1.6.0 gives similar traces.

During testing with the non-stripped versions I had 1 or 2 occasions of successful libguestfs-test-tool runs, but most of the time the runs would fail. Could this be indicating some timing issue?

I carried out these tests on a freshly installed CentOS VM, and I did not find anything similar in the CentOS bug tracker.
After going back from kernel 3.10 to a 3.9 version on the Fedora 19 host, libguestfs-test-tool runs successfully every time. So it appears something got broken between kernel 3.9.5-301.fc19 and 3.10.10-200.fc19 on the host.
Same problem here (with regular qemu-kvm, not libguestfs):

KVM internal error. Suberror: 2
extra data[0]: 80000202
extra data[1]: 80000202
rax 00000000c3300100 rbx 00000000c33080c0 rcx 00000000c33080c0 rdx 00000000c0408995
rsi 0000000000000001 rdi 00000000ffffffff rsp 00000000f70a1ebc rbp 00000000f7086ab0
r8 0000000000000000 r9 0000000000000000 r10 0000000000000000 r11 0000000000000000
r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000
rip 00000000c0830abb rflags 00000006
cs 0060 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type b l 0 g 1 avl 0)
ds 007b (00000000/ffffffff p 1 dpl 3 db 1 s 1 type 3 l 0 g 1 avl 0)
es 007b (00000000/ffffffff p 1 dpl 3 db 1 s 1 type 3 l 0 g 1 avl 0)
ss 0068 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
fs 00d8 (027f9000/ffffffff p 1 dpl 0 db 0 s 1 type 3 l 0 g 1 avl 0)
gs 00e0 (c3307f80/00000018 p 1 dpl 0 db 1 s 1 type 1 l 0 g 0 avl 0)
tr 0080 (c3305dc0/0000206b p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
ldt 0000 (00000000/ffffffff p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0)
gdt c3300000/ff
idt c0a2b000/7ff
cr0 8005003b cr2 0 cr3 a09000 cr4 6f0 cr8 0 efer 800

Even with a newer kernel: kernel-3.11.0-200.fc19.x86_64
Klaas, thanks for taking the time to enter a bug report with us. We appreciate the feedback and look to use reports such as this to guide our efforts at improving our products. That being said, we're not able to guarantee the timeliness or suitability of a resolution for issues entered here because this is not a mechanism for requesting support. If this issue is critical or in any way time sensitive, please raise a ticket through your regular Red Hat support channels to make certain it receives the proper attention and prioritization to assure a timely resolution. For information on how to contact the Red Hat production support team, please visit: https://www.redhat.com/support/process/production/#howto
Do I understand correctly that you are running a nested guest, and this nested guest fails with an internal error?
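For reference, whether the L0 host exposes nested VMX to its guests at all can be read from the kvm_intel module parameter. A minimal sketch; nested_status is a made-up helper, and the parameter file only exists where kvm_intel is loaded:

```shell
# Interpret the kvm_intel "nested" module parameter: "Y" or "1" means
# nested VMX is available to guests. nested_status is a hypothetical
# helper, not a standard tool.
nested_status() {
    case "$1" in
        Y|y|1) echo "nested VMX: enabled" ;;
        *)     echo "nested VMX: disabled" ;;
    esac
}
param=/sys/module/kvm_intel/parameters/nested
if [ -r "$param" ]; then
    nested_status "$(cat "$param")"
else
    echo "kvm_intel not loaded (on AMD hosts see kvm_amd/parameters/nested)"
fi
```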
(In reply to Gleb Natapov from comment #16)
> Do I understand correctly that you are running nested guest and this nested
> guest fails with internal error?

qemu-kvm in L1 is reporting:

KVM internal error. Suberror: 2
extra data[0]: 80000202
extra data[1]: 80000202
...

L2 just hangs there (frozen). No errors in L0.
(In reply to Gleb Natapov from comment #16)
> Do I understand correctly that you are running nested guest and this nested
> guest fails with internal error?

Yes, but in my case only when using libguestfs. I did not encounter the problem when starting 'normal' KVM VMs (using openstack).
(In reply to klaas.buist from comment #18)
> (In reply to Gleb Natapov from comment #16)
> > Do I understand correctly that you are running nested guest and this nested
> > guest fails with internal error?
>
> Yes, but in my case only when using libguestfs. I did not encounter the
> problem when starting 'normal' KVM VMs (using openstack).

Hi Klaas, maybe you can share here the qemu-kvm command line used by openstack, which might help us to identify what's different there and therefore what's the problem. Thanks.
(In reply to klaas.buist from comment #18)
> (In reply to Gleb Natapov from comment #16)
> > Do I understand correctly that you are running nested guest and this nested
> > guest fails with internal error?
>
> Yes, but in my case only when using libguestfs. I did not encounter the
> problem when starting 'normal' KVM VMs (using openstack).

Presumably the OpenStack VM is not nested, ie. runs on baremetal, and you're running libguestfs inside the OpenStack VM (hence nested)?
(In reply to Federico Simoncelli from comment #19) > (In reply to klaas.buist from comment #18) > > (In reply to Gleb Natapov from comment #16) > > > Do I understand correctly that you are running nested guest and this nested > > > guest fails with internal error? > > > > Yes, but in my case only when using libguestfs. I did not encounter the > > problem when starting 'normal' KVM VMs (using openstack). > > Hi Klaas, maybe you can share here the qemu-kvm command line used by > openstack, which might help us to identify what's different there and > therefore what's the problem. Thanks. Here is the command as used by openstack to lauch a VM. This VM is running fine: qemu 7945 1 2 11:51 ? 00:01:24 /usr/libexec/qemu-kvm -name instance-0000000e -S -M rhel6.4.0 -no-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -uuid 3a06ff96-e21d-4b60-b1f2-d8b7d461cdc4 -smbios type=1,manufacturer=Red Hat,, Inc.,product=Red Hat OpenStack Nova,version=2013.1.3-3.el6ost,serial=6fcbde20-64bb-074d-8788-8778f826b615,uuid=3a06ff96-e21d-4b60-b1f2-d8b7d461cdc4 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-0000000e.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/3a06ff96-e21d-4b60-b1f2-d8b7d461cdc4/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:26:13:32,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/3a06ff96-e21d-4b60-b1f2-d8b7d461cdc4/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 192.168.100.20:0 -k en-us -vga cirrus -incoming fd:22 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 For comparing, this is the (stuck) libguestfs qemu-kvm: root 25275 25093 17 12:46 pts/0 00:00:08 /usr/libexec/qemu-kvm -global virtio-blk-pci.scsi=off -nodefconfig -nodefaults -nographic -drive file=/tmp/libguestfs-test-tool-sda-od2bTz,cache=none,format=raw,if=virtio -nodefconfig -machine accel=kvm:tcg -m 500 -no-reboot -device virtio-serial -serial stdio -device sga -chardev socket,path=/tmp/libguestfsm1fkfw/guestfsd.sock,id=channel0 -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 -kernel /var/tmp/.guestfs-0/kernel.25093 -initrd /var/tmp/.guestfs-0/initrd.25093 -append panic=1 console=ttyS0 udevtimeout=300 no_timer_check acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_verbose=1 TERM=xterm-256color -drive file=/var/tmp/.guestfs-0/root.25093,snapshot=on,if=virtio,cache=unsafe
(In reply to Richard W.M. Jones from comment #20)
> (In reply to klaas.buist from comment #18)
> > (In reply to Gleb Natapov from comment #16)
> > > Do I understand correctly that you are running nested guest and this nested
> > > guest fails with internal error?
> >
> > Yes, but in my case only when using libguestfs. I did not encounter the
> > problem when starting 'normal' KVM VMs (using openstack).
>
> Presumably the OpenStack VM is not nested, ie. runs on
> baremetal, and you're running libguestfs inside the
> OpenStack VM (hence nested)?

I have openstack running inside a VM (for evaluation). libguestfs is run inside that VM where openstack is installed/runs.
(In reply to klaas.buist from comment #21)
> (In reply to Federico Simoncelli from comment #19)
> > Hi Klaas, maybe you can share here the qemu-kvm command line used by
> > openstack, which might help us to identify what's different there and
> > therefore what's the problem. Thanks.
>
> Here is the command as used by openstack to lauch a VM. This VM is running
> fine:
>
> qemu 7945 1 2 11:51 ? 00:01:24 /usr/libexec/qemu-kvm -name
> instance-0000000e -S -M rhel6.4.0 -no-kvm -m 512 -smp

You don't see this error happening in openstack because it's not using kvm, as it uses the -no-kvm flag.
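This kind of difference is easy to miss in a long command line; one way to spot it is to classify each running qemu's command line. A small sketch with a made-up helper (the flag spellings match the RHEL 6-era qemu-kvm shown above; accel_of is not part of qemu):

```shell
# Classify a qemu command line: "tcg" when KVM is explicitly disabled
# (-no-kvm, as in the OpenStack instance above, or accel=tcg), "kvm"
# otherwise. accel_of is a hypothetical helper for illustration.
accel_of() {
    case "$1" in
        *-no-kvm*|*accel=tcg*) echo tcg ;;
        *)                     echo kvm ;;
    esac
}
# Inspect live processes (assumes a Linux /proc filesystem):
for pid in $(pgrep -f qemu 2>/dev/null); do
    cmdline=$(tr '\0' ' ' < "/proc/$pid/cmdline" 2>/dev/null)
    echo "$pid: $(accel_of "$cmdline")"
done
```

Note that `-machine accel=kvm:tcg` (the libguestfs invocation) is classified as kvm: tcg there is only a fallback if /dev/kvm is unavailable.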
(In reply to Federico Simoncelli from comment #23)
> > qemu 7945 1 2 11:51 ? 00:01:24 /usr/libexec/qemu-kvm -name
> > instance-0000000e -S -M rhel6.4.0 -no-kvm -m 512 -smp
>
> You don't see this error happening in openstack because it's not using kvm
> as it uses the -no-kvm flag.

Humm, overlooked that. After changing from qemu to kvm, the VM fails to start with the same error as well.
This is a bug in the Fedora kernel's support for nested virtualization. Changing product for now, but it's probably best moved to the upstream kernel bug tracker.
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.12.6-200.fc19. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20. If you experience different issues, please open a new bug report for those.
(In reply to Justin M. Forbes from comment #27)
> *********** MASS BUG UPDATE **************
>
> We apologize for the inconvenience. There is a large number of bugs to go
> through and several of them have gone stale. Due to this, we are doing a
> mass bug update across all of the Fedora 19 kernel bugs.
>
> Fedora 19 has now been rebased to 3.12.6-200.fc19. Please test this kernel
> update (or newer) and let us know if you issue has been resolved or if it is
> still present with the newer kernel.
>
> If you have moved on to Fedora 20, and are still experiencing this issue,
> please change the version to Fedora 20.
>
> If you experience different issues, please open a new bug report for those.

I am still seeing the issue with the latest Fedora 19 kernel, 3.12.6-200.fc19.x86_64.
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.14.4-200.fc20. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.