Bug 1004167

Summary:	KVM fails with "KVM internal error. Suberror: 2"
Product:	[Fedora] Fedora	Reporter:	klaas.buist
Component:	kernel	Assignee:	Kernel Maintainer List <kernel-maint>
Status:	CLOSED INSUFFICIENT_DATA	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	20	CC:	acathrow, bsarathy, drjones, fsimonce, gansalmon, hhuang, itamar, jonathan, juzhang, kernel-maint, klaas.buist, madhu.chinakonda, marcelo.barbosa, mbooth, mkenneth, ohadlevy, pbonzini, qzhang, rjones, virt-maint
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2014-06-18 14:03:44 UTC	Type:	Bug

Description klaas.buist 2013-09-04 06:49:00 UTC

Description of problem: Trying to run OpenStack inside a KVM VM, but the SSH key insertion into a new VM image fails.
I then ran libguestfs-test-tool, and that fails as well.
The following error is shown at the end of the run:

KVM internal error. Suberror: 2

I am running this inside a KVM VM on top of Fedora 19 (up to date). Inside the guest, RHEL 6.4 and OpenStack are running (up to date, kernel ).
libguestfs-test-tool also fails when using a 'plain' CentOS 6 VM with the latest updates.

Version-Release number of selected component (if applicable):
# rpm -q libguestfs
libguestfs-1.16.34-2.el6.x86_64

How reproducible:
every time


Steps to Reproduce:
1. run libguestfs-test-tool

Actual results: failure KVM internal error. Suberror: 2


Expected results: success


Additional info:

# libguestfs-test-tool
     ************************************************************
     *                    IMPORTANT NOTICE
     *
     * When reporting bugs, include the COMPLETE, UNEDITED
     * output below in your bug report.
     *
     ************************************************************
===== Test starts here =====
PATH=/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
library version: 1.16.34rhel=6,release=2.el6
guestfs_get_append: (null)
guestfs_get_attach_method: appliance
guestfs_get_autosync: 1
guestfs_get_direct: 0
guestfs_get_memsize: 500
guestfs_get_network: 0
guestfs_get_path: /usr/lib64/guestfs
guestfs_get_pgroup: 0
guestfs_get_qemu: /usr/libexec/qemu-kvm
guestfs_get_recovery_proc: 1
guestfs_get_selinux: 0
guestfs_get_smp: 1
guestfs_get_trace: 0
guestfs_get_verbose: 1
host_cpu: x86_64
Launching appliance, timeout set to 600 seconds.
libguestfs: [00000ms] febootstrap-supermin-helper --verbose -f checksum '/usr/lib64/guestfs/supermin.d' x86_64
supermin helper [00000ms] whitelist = (not specified), host_cpu = x86_64, kernel = (null), initrd = (null), appliance = (null)
supermin helper [00000ms] inputs[0] = /usr/lib64/guestfs/supermin.d
checking modpath /lib/modules/2.6.32-358.el6.x86_64 is a directory
picked vmlinuz-2.6.32-358.el6.x86_64 because modpath /lib/modules/2.6.32-358.el6.x86_64 exists
checking modpath /lib/modules/2.6.32-358.114.1.openstack.el6.x86_64 is a directory
picked vmlinuz-2.6.32-358.114.1.openstack.el6.x86_64 because modpath /lib/modules/2.6.32-358.114.1.openstack.el6.x86_64 exists
supermin helper [00001ms] finished creating kernel
supermin helper [00001ms] visiting /usr/lib64/guestfs/supermin.d
supermin helper [00001ms] visiting /usr/lib64/guestfs/supermin.d/base.img
supermin helper [00001ms] visiting /usr/lib64/guestfs/supermin.d/daemon.img
supermin helper [00001ms] visiting /usr/lib64/guestfs/supermin.d/hostfiles
supermin helper [00033ms] visiting /usr/lib64/guestfs/supermin.d/init.img
supermin helper [00034ms] adding kernel modules
supermin helper [00090ms] finished creating appliance
libguestfs: [00095ms] begin testing qemu features
libguestfs: [00109ms] finished testing qemu features
libguestfs: accept_from_daemon: 0x1d2a690 g->state = 1
[00109ms] /usr/libexec/qemu-kvm \
    -global virtio-blk-pci.scsi=off \
    -nodefconfig \
    -nodefaults \
    -nographic \
    -drive file=/tmp/libguestfs-test-tool-sda-uJwNbR,cache=none,format=raw,if=virtio \
    -nodefconfig \
    -machine accel=kvm:tcg \
    -m 500 \
    -no-reboot \
    -device virtio-serial \
    -serial stdio \
    -device sga \
    -chardev socket,path=/tmp/libguestfsvnx5Fm/guestfsd.sock,id=channel0 \
    -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
    -kernel /var/tmp/.guestfs-0/kernel.19637 \
    -initrd /var/tmp/.guestfs-0/initrd.19637 \
    -append 'panic=1 console=ttyS0 udevtimeout=300 no_timer_check acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_verbose=1 TERM=xterm-256color ' \
    -drive file=/var/tmp/.guestfs-0/root.19637,snapshot=on,if=virtio,cache=unsafe\x1b[1;256r\x1b[256;256H\x1b[6n
Google, Inc.
Serial Graphics Adapter 07/26/11
SGABIOS $Id: sgabios.S 8 2010-04-22 00:03:40Z nlaredo $ (mockbuild.redhat.com) Tue Jul 26 15:05:08 UTC 2011
Term: 80x24
4 0
SeaBIOS (version seabios-0.6.1.2-26.el6)

Probing EDD (edd=off to disable)... ok
\x1b[2JInitializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-358.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Tue Jan 29 11:47:41 EST 2013
Command line: panic=1 console=ttyS0 udevtimeout=300 no_timer_check acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_verbose=1 TERM=xterm-256color 
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
Disabled fast string operations
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009b800 (usable)
 BIOS-e820: 000000000009b800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001f3fd000 (usable)
 BIOS-e820: 000000001f3fd000 - 000000001f400000 (reserved)
 BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
DMI 2.4 present.
SMBIOS version 2.4 @ 0xFDA30
last_pfn = 0x1f3fd max_arch_pfn = 0x400000000
PAT not supported by CPU.
init_memory_mapping: 0000000000000000-000000001f3fd000
RAMDISK: 1f1ab000 - 1f3ef000
No NUMA configuration found
Faking a node at 0000000000000000-000000001f3fd000
Bootmem setup node 0 0000000000000000-000000001f3fd000
  NODE_DATA [0000000000009000 - 000000000003cfff]
  bootmap [000000000003d000 -  0000000000040e7f] pages 4
(7 early reservations) ==> bootmem [0000000000 - 001f3fd000]
  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
  #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
  #2 [0001000000 - 000201b0a4]    TEXT DATA BSS ==> [0001000000 - 000201b0a4]
  #3 [001f1ab000 - 001f3ef000]          RAMDISK ==> [001f1ab000 - 001f3ef000]
  #4 [000009b800 - 0000100000]    BIOS reserved ==> [000009b800 - 0000100000]
  #5 [000201c000 - 000201c059]              BRK ==> [000201c000 - 000201c059]
  #6 [0000008000 - 0000009000]          PGTABLE ==> [0000008000 - 0000009000]
found SMP MP-table at [ffff8800000fda50] fda50
kvm-clock: Using msrs 4b564d01 and 4b564d00
kvm-clock: cpu 0, msr 0:1c25681, boot clock
Zone PFN ranges:
  DMA      0x00000001 -> 0x00001000
  DMA32    0x00001000 -> 0x00100000
  Normal   0x00100000 -> 0x00100000
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0: 0x00000001 -> 0x0000009b
    0: 0x00000100 -> 0x0001f3fd
SFI: Simple Firmware Interface v0.7 http://simplefirmware.org
Intel MultiProcessor Specification v1.4
MPTABLE: OEM ID: BOCHSCPU
MPTABLE: Product ID: 0.1         
MPTABLE: APIC at: 0xFEE00000
Processor #0 (Bootup-CPU)
I/O APIC #0 Version 17 at 0xFEC00000.
Processors: 1
SMP: Allowing 1 CPUs, 0 hotplug CPUs
PM: Registered nosave memory: 000000000009b000 - 000000000009c000
PM: Registered nosave memory: 000000000009c000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
Allocating PCI resources starting at 1f400000 (gap: 1f400000:e0bbc000)
Booting paravirtualized kernel on KVM
NR_CPUS:4096 nr_cpumask_bits:1 nr_cpu_ids:1 nr_node_ids:1
PERCPU: Embedded 31 pages/cpu @ffff880002200000 s94552 r8192 d24232 u2097152
pcpu-alloc: s94552 r8192 d24232 u2097152 alloc=1*2097152
pcpu-alloc: [0] 0 
kvm-clock: cpu 0, msr 0:2216681, primary cpu clock
kvm-stealtime: cpu 0, msr 220e840
Built 1 zonelists in Node order, mobility grouping on.  Total pages: 126041
Policy zone: DMA32
Kernel command line: panic=1 console=ttyS0 udevtimeout=300 no_timer_check acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_verbose=1 TERM=xterm-256color 
[    0.000000] Disabling memory control group subsystem
[    0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Memory: 483960k/511988k available (5220k kernel code, 408k absent, 27620k reserved, 7121k data, 1264k init)
[    0.000000] Hierarchical RCU implementation.
[    0.000000] NR_IRQS:33024 nr_irqs:256
[    0.000000] Console: colour dummy device 80x25
[    0.000000] console [ttyS0] enabled
[    0.000000] Detected 2194.974 MHz processor.
[    0.003999] Calibrating delay loop (skipped) preset value.. 4389.94 BogoMIPS (lpj=2194974)
[    0.007020] pid_max: default: 32768 minimum: 301
[    0.009204] Security Framework initialized
[    0.011061] SELinux:  Disabled at boot.
[    0.014302] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.019756] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.022686] Mount-cache hash table entries: 256
[    0.026447] Initializing cgroup subsys ns
[    0.028017] Initializing cgroup subsys cpuacct
[    0.030021] Initializing cgroup subsys memory
[    0.032131] Initializing cgroup subsys devices
[    0.034021] Initializing cgroup subsys freezer
[    0.036020] Initializing cgroup subsys net_cls
[    0.039018] Initializing cgroup subsys blkio
[    0.041093] Initializing cgroup subsys perf_event
[    0.043027] Initializing cgroup subsys net_prio
[    0.046592] Disabled fast string operations
[    0.053609] mce: CPU supports 10 MCE banks
[    0.055700] alternatives: switching to unfair spinlock
[    0.102457] SMP alternatives: switching to UP code
[    1.099466] Freeing SMP alternatives: 35k freed
[    1.101016] ftrace: converting mcount calls to 0f 1f 44 00 00
[    1.102945] ftrace: allocating 21428 entries in 85 pages
[    1.109078] Setting APIC routing to flat
[    1.126227] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    1.129858] CPU0: Intel QEMU Virtual CPU version (cpu64-rhel6) stepping 03
KVM internal error. Suberror: 2
extra data[0]: 80000202
extra data[1]: 80000202
rax ffffffff81a96e20 rbx 00000000005bc312 rcx 0000000000000064 rdx 0000000000000001
rsi 000000000000eae9 rdi 0000000000000046 rsp ffff88001e9bbe40 rbp ffff88001e9bbe80
r8  ffffffff81c07720 r9  0000000000000000 r10 0000000000000000 r11 0000000000000003
r12 ffff88000220e0e0 r13 00000000ffffffff r14 ffff880002200000 r15 0000000000000000
rip ffffffff81c392a7 rflags 00000283
cs 0010 (00000000/ffffffff p 1 dpl 0 db 0 s 1 type b l 1 g 1 avl 0)
ds 0018 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
es 0018 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
ss 0018 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
fs 0000 (00000000/ffffffff p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0)
gs 0000 (ffff880002200000/ffffffff p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0)
tr 0040 (ffff880002214280/00002087 p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
ldt 0000 (00000000/ffffffff p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0)
gdt ffff880002204000/7f
idt ffffffff81dde000/fff
cr0 8005003b cr2 0 cr3 1a85000 cr4 6f0 cr8 0 efer d01
^C

Comment 1 Richard W.M. Jones 2013-09-04 09:24:32 UTC

I've never seen an error anything like this.  Is
"KVM internal error. Suberror: 2 [etc]" printed out by the
appliance kernel or by the /usr/libexec/qemu-kvm process?

Comment 2 Qunfang Zhang 2013-09-04 09:34:35 UTC

We hit a similar error when doing system_reset on a guest, but it does not always reproduce.

Bug 1002794 - KVM internal error. Suberror: 1 when doing system_reset

Comment 3 Andrew Jones 2013-09-04 09:36:09 UTC

(In reply to Richard W.M. Jones from comment #1)
> I've never seen an error anything like this.  Is
> "KVM internal error. Suberror: 2 [etc]" printed out by the
> appliance kernel or by the /usr/libexec/qemu-kvm process?

qemu outputs the error, but it does so due to kvm returning KVM_EXIT_INTERNAL_ERROR for its exit reason. Unfortunately there are many reasons this exit reason could be returned. We need to identify a reliable way to reproduce this, and then trace kvm while reproducing it. It appears to reproduce 100% for the reporter, so maybe it's machine-specific?

Klaas,
can you please paste the output of /proc/cpuinfo here?
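
For reference, the suberror number and the two extra-data words in the dump can be decoded by hand. The sketch below assumes the suberror values from the Linux <linux/kvm.h> header, and it assumes that the extra data words follow the Intel VMX interruption-information layout (vector in bits 7:0, type in bits 10:8, error-code-valid in bit 11, valid in bit 31). The function name decode_event_info is made up for illustration.

```python
# Suberror names as defined in the Linux UAPI header <linux/kvm.h>.
KVM_INTERNAL_ERROR = {
    1: "KVM_INTERNAL_ERROR_EMULATION",
    2: "KVM_INTERNAL_ERROR_SIMUL_EX",
    3: "KVM_INTERNAL_ERROR_DELIVERY_EV",
}

# Event types in the VMX interruption-information format (bits 10:8).
EVENT_TYPES = {
    0: "external interrupt",
    2: "NMI",
    3: "hardware exception",
    4: "software interrupt",
    5: "privileged software exception",
    6: "software exception",
}


def decode_event_info(word):
    """Split a 32-bit interruption-information word into its fields.

    Assumption: the word uses the VMX layout described in the lead-in.
    """
    return {
        "valid": bool((word >> 31) & 1),
        "error_code_valid": bool((word >> 11) & 1),
        "type": EVENT_TYPES.get((word >> 8) & 0x7, "reserved"),
        "vector": word & 0xFF,
    }


if __name__ == "__main__":
    print(KVM_INTERNAL_ERROR[2])
    print(decode_event_info(0x80000202))
```

Under those assumptions, both words (80000202) read as a valid NMI event with vector 2, and suberror 2 is the "simultaneous exceptions" case.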

Comment 5 klaas.buist 2013-09-04 12:00:39 UTC

Here is the cpuinfo of the machine I'm running libguestfs in:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel Xeon E312xx (Sandy Bridge)
stepping	: 1
cpu MHz		: 2195.016
cache size	: 4096 KB
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon rep_good unfair_spinlock pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips	: 4390.03
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

And this is the cpuinfo of its host (maybe that is relevant as well):

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3632QM CPU @ 2.20GHz
stepping	: 9
microcode	: 0x19
cpu MHz		: 2574.000
cache size	: 6144 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips	: 4389.80
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:
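
The two flag lists above are long, so a small script may make the comparison easier. This is only a sketch: cpu_flags and compare_flags are hypothetical helper names, and the input is assumed to be the text of /proc/cpuinfo saved from each machine.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def compare_flags(guest_text, host_text):
    """Summarize how the guest's CPU flags differ from the host's."""
    guest = cpu_flags(guest_text)
    host = cpu_flags(host_text)
    return {
        "guest_has_vmx": "vmx" in guest,            # VMX exposed to the guest
        "guest_is_virtualized": "hypervisor" in guest,
        "host_only": sorted(host - guest),
        "guest_only": sorted(guest - host),
    }


if __name__ == "__main__":
    guest = "processor\t: 0\nflags\t\t: fpu vmx hypervisor\n"
    host = "processor\t: 0\nflags\t\t: fpu vmx ept vpid\n"
    print(compare_flags(guest, host))
```

For the listings above, the guest's flags include both vmx and hypervisor, which is what one would expect when nested virtualization is exposed to the VM.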

Comment 6 Richard W.M. Jones 2013-09-05 12:18:52 UTC

Is there anything printed by the kernel (dmesg) when
the error occurs?

You could also use an alternate qemu, eg. one compiled from
upstream sources, and just set LIBGUESTFS_QEMU to point to
the alternate qemu.
export LIBGUESTFS_QEMU=/path/to/d/qemu/x86_64-softmmu/qemu-system-x86_64
libguestfs-test-tool

Comment 7 klaas.buist 2013-09-09 06:59:41 UTC

No messages are printed by the kernel on host or guest.

I have tried with both the 1.5.3 and 1.6.0 versions of qemu, and they both fail in a similar way. When the problem happens, the qemu process seems to be stuck, so I killed it with SIGSEGV to get a core dump of it.
It shows the following stack trace:

Core was generated by `libguestfs-test-tool'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f2b730a7513 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
82	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
Missing separate debuginfos, use: debuginfo-install libidn-1.18-2.el6.x86_64 yajl-1.0.7-3.el6.x86_64
(gdb) bt
#0  0x00007f2b730a7513 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f2b735eb358 in guestfs___recv_from_daemon (g=0x8f0620, size_rtn=0x7fffa6894f2c, buf_rtn=0x7fffa6894ef0) at proto.c:584
#2  0x00007f2b735e7dd4 in launch_appliance (g=0x8f0620) at launch.c:967
#3  0x00007f2b7359113c in guestfs_launch (g=<value optimized out>) at actions.c:1123
#4  0x000000000040210d in ?? ()
#5  0x0000007c00000001 in ?? ()
#6  0x00000000004022f0 in ?? ()
#7  0x0000000000000000 in ?? ()

I am planning on also trying a later version of qemu on the host to see if that makes a difference.

Comment 8 klaas.buist 2013-09-09 07:36:25 UTC

Using the latest version of qemu (1.5.3) on the host also does not seem to make a difference. libguestfs-test-tool is still failing consistently inside the guest.

Comment 9 Richard W.M. Jones 2013-09-09 07:46:52 UTC

(In reply to klaas.buist from comment #7)
> No messages are printed by the kernel on host or guest.
> 
> I have tried with both 1.5.3 and 1.6.0 versions of qemu and they both fail
> in similar way. When the problem happens the qemu process seems to be stuck
> and so I killed it with a sigsegv to get a core dump of it.
> It shows the following stacktrace:
> 
> Core was generated by `libguestfs-test-tool'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007f2b730a7513 in __select_nocancel () at
> ../sysdeps/unix/syscall-template.S:82
> 82	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
> Missing separate debuginfos, use: debuginfo-install libidn-1.18-2.el6.x86_64
> yajl-1.0.7-3.el6.x86_64
> (gdb) bt
> #0  0x00007f2b730a7513 in __select_nocancel () at
> ../sysdeps/unix/syscall-template.S:82
> #1  0x00007f2b735eb358 in guestfs___recv_from_daemon (g=0x8f0620,
> size_rtn=0x7fffa6894f2c, buf_rtn=0x7fffa6894ef0) at proto.c:584
> #2  0x00007f2b735e7dd4 in launch_appliance (g=0x8f0620) at launch.c:967
> #3  0x00007f2b7359113c in guestfs_launch (g=<value optimized out>) at
> actions.c:1123
> #4  0x000000000040210d in ?? ()
> #5  0x0000007c00000001 in ?? ()
> #6  0x00000000004022f0 in ?? ()
> #7  0x0000000000000000 in ?? ()

That's the stack trace of libguestfs-test-tool which isn't
really telling us anything -- it just says that libguestfs is
blocked waiting for an answer from qemu.

You need to get a stack trace from qemu itself.
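
One way to get that trace without killing the process is to attach gdb to the running qemu. A minimal sketch, assuming gdb is installed and the qemu PID is known (for example from ps). The helper names are hypothetical; -p, -batch and -ex are standard gdb options.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb invocation that attaches to pid and prints every thread's stack."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtrace(pid):
    """Run gdb against a live process and return its combined output as text.

    Attaching needs permission to ptrace the target (typically root or the same user).
    """
    result = subprocess.run(
        gdb_backtrace_command(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call capture_backtrace(pid) to actually attach.
    print(" ".join(gdb_backtrace_command(1664)))
```

Because gdb runs in batch mode, it exits after printing the stacks and the process keeps running.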

Comment 10 klaas.buist 2013-09-09 14:13:01 UTC

Ahh, here it is. Unfortunately it does not show much info yet, even though the executable is not stripped.

# gdb --core=core.1664 --exec=/usr/local/bin/qemu-system-x86_64 
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.

warning: core file may not match specified executable file.
[New Thread 1664]
[New Thread 1667]
Missing separate debuginfo for 
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/05/4c5697ea4022cf320747aabbf8120fe1246ff6
Reading symbols from /lib64/librt-2.12.so...Reading symbols from /usr/lib/debug/lib64/librt-2.12.so.debug...done.
done.
Loaded symbols for /lib64/librt-2.12.so
Reading symbols from /lib64/libgthread-2.0.so.0.2200.5...Reading symbols from /usr/lib/debug/lib64/libgthread-2.0.so.0.2200.5.debug...done.
done.
Loaded symbols for /lib64/libgthread-2.0.so.0.2200.5
Reading symbols from /lib64/libglib-2.0.so.0.2200.5...Reading symbols from /usr/lib/debug/lib64/libglib-2.0.so.0.2200.5.debug...done.
done.
Loaded symbols for /lib64/libglib-2.0.so.0.2200.5
Reading symbols from /lib64/libutil-2.12.so...Reading symbols from /usr/lib/debug/lib64/libutil-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libutil-2.12.so
Reading symbols from /lib64/libz.so.1.2.3...Reading symbols from /usr/lib/debug/lib64/libz.so.1.2.3.debug...done.
done.
Loaded symbols for /lib64/libz.so.1.2.3
Reading symbols from /lib64/libm-2.12.so...Reading symbols from /usr/lib/debug/lib64/libm-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libm-2.12.so
Reading symbols from /lib64/libpthread-2.12.so...Reading symbols from /usr/lib/debug/lib64/libpthread-2.12.so.debug...done.
[Thread debugging using libthread_db enabled]
done.
Loaded symbols for /lib64/libpthread-2.12.so
Reading symbols from /lib64/libc-2.12.so...Reading symbols from /usr/lib/debug/lib64/libc-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libc-2.12.so
Reading symbols from /lib64/ld-2.12.so...Reading symbols from /usr/lib/debug/lib64/ld-2.12.so.debug...done.
done.
Loaded symbols for /lib64/ld-2.12.so
Core was generated by `/usr/local/bin/qemu-system-x86_64 -global virtio-blk-pci.scsi=off -nodefconfig'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f26a7f26293 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:87
87	  int result = INLINE_SYSCALL (poll, 3, CHECK_N (fds, nfds), nfds, timeout);
(gdb) bt
#0  0x00007f26a7f26293 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f26a95b41f6 in ?? ()
#2  0x0000001900000003 in ?? ()
#3  0xffffffff00000000 in ?? ()
#4  0x0000001968d41e60 in ?? ()
#5  0xe0869fe4664878a2 in ?? ()
#6  0x00007fff68d41e60 in ?? ()
#7  0x00007f26a95b4299 in ?? ()
#8  0x00007fff68d41e60 in ?? ()
#9  0x00000000a9638e63 in ?? ()
#10 0x00000002ffffffff in ?? ()
#11 0xe0869fe4664878a2 in ?? ()
#12 0x00007fff68d41e80 in ?? ()
#13 0x00007f26a9638ed9 in ?? ()
#14 0x00007f2600000001 in ?? ()
#15 0xe0869fe4664878a2 in ?? ()
#16 0x00007fff68d421e0 in ?? ()
#17 0x00007f26a96401e4 in ?? ()
#18 0x00007f2600000017 in ?? ()
#19 0x00007f26a7e48cec in ?? () from /lib64/libc-2.12.so
#20 0x0000000000000000 in ?? ()

Comment 11 Richard W.M. Jones 2013-09-09 14:20:15 UTC

(In reply to klaas.buist from comment #10)
> # gdb --core=core.1664 --exec=/usr/local/bin/qemu-system-x86_64 

This is some random version of qemu?

TBH I've no idea what this bug is, but it could be something
specific to the CentOS kernel.  Have you tried looking for
similar reports in the CentOS bug tracker, or seeing if a
fresh CentOS install can run 'libguestfs-test-tool'?

Comment 12 klaas.buist 2013-09-09 14:55:30 UTC

This is version 1.5.3 of qemu. Version 1.6.0 gives similar traces.

During testing with the non-stripped versions I had 1 or 2 successful libguestfs-test-tool runs, but most of the time the runs would fail. Could this indicate a timing issue?

I carried out these tests on a freshly installed CentOS VM, and I did not find anything similar in the CentOS bug tracker.

Comment 13 klaas.buist 2013-09-09 19:20:13 UTC

After going back from kernel 3.10 to a 3.9 version on the Fedora 19 host, libguestfs-test-tool runs successfully every time.
So it appears something got broken between kernel 3.9.5-301.fc19 and 3.10.10-200.fc19 on the host.

Comment 14 Federico Simoncelli 2013-09-14 11:10:22 UTC

Same problem here (with regular qemu-kvm not libguestfs):

KVM internal error. Suberror: 2
extra data[0]: 80000202
extra data[1]: 80000202
rax 00000000c3300100 rbx 00000000c33080c0 rcx 00000000c33080c0 rdx 00000000c0408995
rsi 0000000000000001 rdi 00000000ffffffff rsp 00000000f70a1ebc rbp 00000000f7086ab0
r8  0000000000000000 r9  0000000000000000 r10 0000000000000000 r11 0000000000000000
r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000
rip 00000000c0830abb rflags 00000006
cs 0060 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type b l 0 g 1 avl 0)
ds 007b (00000000/ffffffff p 1 dpl 3 db 1 s 1 type 3 l 0 g 1 avl 0)
es 007b (00000000/ffffffff p 1 dpl 3 db 1 s 1 type 3 l 0 g 1 avl 0)
ss 0068 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
fs 00d8 (027f9000/ffffffff p 1 dpl 0 db 0 s 1 type 3 l 0 g 1 avl 0)
gs 00e0 (c3307f80/00000018 p 1 dpl 0 db 1 s 1 type 1 l 0 g 0 avl 0)
tr 0080 (c3305dc0/0000206b p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
ldt 0000 (00000000/ffffffff p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0)
gdt c3300000/ff
idt c0a2b000/7ff
cr0 8005003b cr2 0 cr3 a09000 cr4 6f0 cr8 0 efer 800


Even with a newer kernel: kernel-3.11.0-200.fc19.x86_64
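
The efer value in each register dump shows which mode the failing guest was in. A small sketch for decoding it, using the architectural EFER bit positions (SCE = bit 0, LME = bit 8, LMA = bit 10, NXE = bit 11); decode_efer is a hypothetical helper name and only these four bits are handled.

```python
# Architectural EFER bits relevant to the dumps in this report.
EFER_BITS = {0: "SCE", 8: "LME", 10: "LMA", 11: "NXE"}


def decode_efer(value):
    """Return the names of the EFER bits set in value, lowest bit first."""
    return [name for bit, name in sorted(EFER_BITS.items()) if (value >> bit) & 1]


if __name__ == "__main__":
    print(decode_efer(0xD01))  # efer from the original report
    print(decode_efer(0x800))  # efer from the dump above
```

The original report's efer d01 has LMA set (a 64-bit guest in long mode), while efer 800 in this dump has only NXE set (a 32-bit guest). That suggests the failure is not tied to one guest mode.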

Comment 15 Ademar Reis 2013-09-16 20:07:04 UTC

Klaas, thanks for taking the time to enter a bug report with us. We appreciate
the feedback and look to use reports such as this to guide our efforts at
improving our products. That being said, we're not able to guarantee the
timeliness or suitability of a resolution for issues entered here because this
is not a mechanism for requesting support.

If this issue is critical or in any way time sensitive, please raise a ticket
through your regular Red Hat support channels to make certain  it receives the
proper attention and prioritization to assure a timely resolution.

For information on how to contact the Red Hat production support team, please
visit: https://www.redhat.com/support/process/production/#howto

Comment 16 Gleb Natapov 2013-09-17 15:00:51 UTC

Do I understand correctly that you are running nested guest and this nested guest fails with internal error?

Comment 17 Federico Simoncelli 2013-09-17 16:12:24 UTC

(In reply to Gleb Natapov from comment #16)
> Do I understand correctly that you are running nested guest and this nested
> guest fails with internal error?

qemu-kvm in L1 is reporting:

KVM internal error. Suberror: 2
extra data[0]: 80000202
extra data[1]: 80000202
...

L2 just hangs there (frozen). No errors in L0.
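
In this notation L0 is the bare-metal host, L1 is the VM running on it, and L2 is the nested guest started inside L1. On an Intel L0 host, whether nested virtualization is switched on can be read from the kvm_intel module parameter. A minimal sketch, assuming the kvm_intel module is loaded; the helper names are hypothetical.

```python
def nested_enabled(param_text):
    """Interpret the contents of the kvm_intel 'nested' parameter file."""
    # Kernels report this parameter as either Y/N or 1/0.
    return param_text.strip() in ("Y", "1")


def read_nested_param(path="/sys/module/kvm_intel/parameters/nested"):
    """Return True/False for the nested setting, or None if it cannot be read."""
    try:
        with open(path) as f:
            return nested_enabled(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_nested_param())
```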

Comment 18 klaas.buist 2013-09-18 06:24:43 UTC

(In reply to Gleb Natapov from comment #16)
> Do I understand correctly that you are running nested guest and this nested
> guest fails with internal error?

Yes, but in my case only when using libguestfs. I did not encounter the problem when starting 'normal' KVM VMs (using openstack).

Comment 19 Federico Simoncelli 2013-09-18 08:29:58 UTC

(In reply to klaas.buist from comment #18)
> (In reply to Gleb Natapov from comment #16)
> > Do I understand correctly that you are running nested guest and this nested
> > guest fails with internal error?
> 
> Yes, but in my case only when using libguestfs. I did not encounter the
> problem when starting 'normal' KVM VMs (using openstack).

Hi Klaas, maybe you can share here the qemu-kvm command line used by openstack, which might help us to identify what's different there and therefore what's the problem. Thanks.

Comment 20 Richard W.M. Jones 2013-09-18 09:42:38 UTC

(In reply to klaas.buist from comment #18)
> (In reply to Gleb Natapov from comment #16)
> > Do I understand correctly that you are running nested guest and this nested
> > guest fails with internal error?
> 
> Yes, but in my case only when using libguestfs. I did not encounter the
> problem when starting 'normal' KVM VMs (using openstack).

Presumably the OpenStack VM is not nested, ie. runs on
baremetal, and you're running libguestfs inside the
OpenStack VM (hence nested)?

Comment 21 klaas.buist 2013-09-18 10:50:59 UTC

(In reply to Federico Simoncelli from comment #19)
> (In reply to klaas.buist from comment #18)
> > (In reply to Gleb Natapov from comment #16)
> > > Do I understand correctly that you are running nested guest and this nested
> > > guest fails with internal error?
> > 
> > Yes, but in my case only when using libguestfs. I did not encounter the
> > problem when starting 'normal' KVM VMs (using openstack).
> 
> Hi Klaas, maybe you can share here the qemu-kvm command line used by
> openstack, which might help us to identify what's different there and
> therefore what's the problem. Thanks.

Here is the command as used by OpenStack to launch a VM. This VM is running fine:

qemu      7945     1  2 11:51 ?        00:01:24 /usr/libexec/qemu-kvm -name instance-0000000e -S -M rhel6.4.0 -no-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -uuid 3a06ff96-e21d-4b60-b1f2-d8b7d461cdc4 -smbios type=1,manufacturer=Red Hat,, Inc.,product=Red Hat OpenStack Nova,version=2013.1.3-3.el6ost,serial=6fcbde20-64bb-074d-8788-8778f826b615,uuid=3a06ff96-e21d-4b60-b1f2-d8b7d461cdc4 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-0000000e.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/3a06ff96-e21d-4b60-b1f2-d8b7d461cdc4/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:26:13:32,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/3a06ff96-e21d-4b60-b1f2-d8b7d461cdc4/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 192.168.100.20:0 -k en-us -vga cirrus -incoming fd:22 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

For comparing, this is the (stuck) libguestfs qemu-kvm:

root     25275 25093 17 12:46 pts/0    00:00:08 /usr/libexec/qemu-kvm -global virtio-blk-pci.scsi=off -nodefconfig -nodefaults -nographic -drive file=/tmp/libguestfs-test-tool-sda-od2bTz,cache=none,format=raw,if=virtio -nodefconfig -machine accel=kvm:tcg -m 500 -no-reboot -device virtio-serial -serial stdio -device sga -chardev socket,path=/tmp/libguestfsm1fkfw/guestfsd.sock,id=channel0 -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 -kernel /var/tmp/.guestfs-0/kernel.25093 -initrd /var/tmp/.guestfs-0/initrd.25093 -append panic=1 console=ttyS0 udevtimeout=300 no_timer_check acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_verbose=1 TERM=xterm-256color  -drive file=/var/tmp/.guestfs-0/root.25093,snapshot=on,if=virtio,cache=unsafe

Comment 22 klaas.buist 2013-09-18 10:54:38 UTC

(In reply to Richard W.M. Jones from comment #20)
> (In reply to klaas.buist from comment #18)
> > (In reply to Gleb Natapov from comment #16)
> > > Do I understand correctly that you are running nested guest and this nested
> > > guest fails with internal error?
> > 
> > Yes, but in my case only when using libguestfs. I did not encounter the
> > problem when starting 'normal' KVM VMs (using openstack).
> 
> Presumably the OpenStack VM is not nested, ie. runs on
> baremetal, and you're running libguestfs inside the
> OpenStack VM (hence nested)?

I have openstack running inside a VM (for evaluation). The libguestfs is run inside that VM where openstack is installed/runs.

Comment 23 Federico Simoncelli 2013-09-18 11:37:28 UTC

(In reply to klaas.buist from comment #21)
> (In reply to Federico Simoncelli from comment #19)
> > (In reply to klaas.buist from comment #18)
> > > (In reply to Gleb Natapov from comment #16)
> > > > Do I understand correctly that you are running nested guest and this nested
> > > > guest fails with internal error?
> > > 
> > > Yes, but in my case only when using libguestfs. I did not encounter the
> > > problem when starting 'normal' KVM VMs (using openstack).
> > 
> > Hi Klaas, maybe you can share here the qemu-kvm command line used by
> > openstack, which might help us to identify what's different there and
> > therefore what's the problem. Thanks.
> 
> Here is the command as used by openstack to lauch a VM. This VM is running
> fine:
> 
> qemu      7945     1  2 11:51 ?        00:01:24 /usr/libexec/qemu-kvm -name
> instance-0000000e -S -M rhel6.4.0 -no-kvm -m 512 -smp

You don't see this error happening in openstack because it's not using kvm as it uses the -no-kvm flag.
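
The difference is easy to miss in a long ps line, so a quick check like the following can help. It is a rough sketch with a hypothetical helper name: it only looks for -no-kvm and for an accel= option on -machine or -M, and it splits on whitespace because ps output has already lost its shell quoting.

```python
def accelerators(cmdline):
    """Return the accelerators a qemu command line asks for, in preference order.

    An empty list means nothing was requested explicitly, so the binary's
    default applies.
    """
    args = cmdline.split()
    if "-no-kvm" in args:
        return ["tcg"]  # KVM disabled, so qemu falls back to emulation
    for flag, value in zip(args, args[1:]):
        if flag in ("-machine", "-M"):
            for opt in value.split(","):
                if opt.startswith("accel="):
                    return opt[len("accel="):].split(":")
    return []


if __name__ == "__main__":
    openstack = "/usr/libexec/qemu-kvm -name instance-0000000e -S -M rhel6.4.0 -no-kvm -m 512"
    libguestfs = "/usr/libexec/qemu-kvm -nodefconfig -nodefaults -machine accel=kvm:tcg -m 500"
    print(accelerators(openstack))
    print(accelerators(libguestfs))
```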

Comment 24 klaas.buist 2013-09-18 12:23:25 UTC

(In reply to Federico Simoncelli from comment #23)

> > 
> > qemu      7945     1  2 11:51 ?        00:01:24 /usr/libexec/qemu-kvm -name
> > instance-0000000e -S -M rhel6.4.0 -no-kvm -m 512 -smp
> 
> You don't see this error happening in openstack because it's not using kvm
> as it uses the -no-kvm flag.

Hmm, I overlooked that. After changing from qemu to kvm, the VM fails to start with the same error as well.

Comment 25 Paolo Bonzini 2013-09-19 21:37:50 UTC

This is a bug in the Fedora kernel's support for nested virtualization.  Changing product for now, but it's probably best moved to the upstream kernel bug tracker.

Comment 27 Justin M. Forbes 2014-01-03 22:10:32 UTC

*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.12.6-200.fc19.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20.

If you experience different issues, please open a new bug report for those.

Comment 28 klaas.buist 2014-01-06 15:18:57 UTC

(In reply to Justin M. Forbes from comment #27)
> *********** MASS BUG UPDATE **************
> 
> We apologize for the inconvenience.  There is a large number of bugs to go
> through and several of them have gone stale.  Due to this, we are doing a
> mass bug update across all of the Fedora 19 kernel bugs.
> 
> Fedora 19 has now been rebased to 3.12.6-200.fc19.  Please test this kernel
> update (or newer) and let us know if you issue has been resolved or if it is
> still present with the newer kernel.
> 
> If you have moved on to Fedora 20, and are still experiencing this issue,
> please change the version to Fedora 20.
> 
> If you experience different issues, please open a new bug report for those.

I am still seeing the issue with the latest fedora 19 kernel 3.12.6-200.fc19.x86_64

Comment 29 Justin M. Forbes 2014-05-21 19:40:20 UTC

*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.14.4-200.fc20.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 30 Josh Boyer 2014-06-18 14:03:44 UTC

This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.