Bug 1565179
Summary: qemu-kvm fails when run with -machine accel=kvm:tcg under nested KVM
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.4
Status: CLOSED DEFERRED
Severity: unspecified
Priority: unspecified
Reporter: Yatin Karel <ykarel>
Assignee: Paolo Bonzini <pbonzini>
QA Contact: Qinghua Cheng <qcheng>
CC: apevec, atodorov, bdas, berrange, chayang, choma, dgilbert, jikortus, jinzhao, juzhang, kchamart, knoel, michen, mkolman, mpitt, pbonzini, rjones, virt-maint, xfu, ykarel
Target Milestone: rc
Target Release: ---
Keywords: Reopened
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-11-06 14:40:38 UTC
Type: Bug
Regression: ---
Description
Yatin Karel
2018-04-09 14:37:06 UTC
Apparently that cat /proc/cpuinfo is from the L1 guest. We also need the CPU information from the host. Please also check dmesg on the host and provide details of the host qemu/kernel, etc.
> [jenkins@undercloud ~]$ tail -f .__repo_setup.sh.log
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008b00
> GDT= 00000000 0000ffff
> IDT= 00000000 0000ffff
> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=00 66 89 d8 66 e8 e5 af ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0
> 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00
> ^C
Do you have the full log from when qemu quits? It seems <ea> is a jmp, so I am wildly guessing that this could be an EPT violation. Do you have the dmesg from L0?
(In reply to Bandan Das from comment #3)
> > [jenkins@undercloud ~]$ tail -f .__repo_setup.sh.log
> > GS =0000 00000000 0000ffff 00009300
> > LDT=0000 00000000 0000ffff 00008200
> > TR =0000 00000000 0000ffff 00008b00
> > GDT= 00000000 0000ffff
> > IDT= 00000000 0000ffff
> > CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> > DR3=0000000000000000
> > DR6=00000000ffff0ff0 DR7=0000000000000400
> > EFER=0000000000000000
> > Code=00 66 89 d8 66 e8 e5 af ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0
> > 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00
> > ^C
>
> Do you have the full log when qemu quits ? It seems <ea> is jmp, so I am
> wildly guessing that could be a EPT violation ? Do you have the dmesg from
> L0 ?

I only have the logs below; virt-customize keeps on running for two hours until the job times out. I noticed that the virt-customize process never finishes:

[jenkins@undercloud ~]$ cat .__repo_setup.sh.log
[ 0.0] Examining the guest ...
libguestfs: trace: set_verbose true
libguestfs: trace: set_verbose = 0
libguestfs: trace: set_network true
libguestfs: trace: set_network = 0
libguestfs: trace: add_drive "overcloud-full.qcow2" "readonly:false" "protocol:file" "discard:besteffort"
libguestfs: trace: add_drive = 0
libguestfs: trace: launch
libguestfs: trace: get_tmpdir
libguestfs: trace: get_tmpdir = "/tmp"
libguestfs: trace: version
libguestfs: trace: version = <struct guestfs_version = major: 1, minor: 36, release: 3, extra: rhel=7,release=6.el7_4.3,libvirt, >
libguestfs: trace: get_backend
libguestfs: trace: get_backend = "direct"
libguestfs: launch: program=virt-customize
libguestfs: launch: version=1.36.3rhel=7,release=6.el7_4.3,libvirt
libguestfs: launch: backend registered: unix
libguestfs: launch: backend registered: uml
libguestfs: launch: backend registered: libvirt
libguestfs: launch: backend registered: direct
libguestfs: launch: backend=direct
libguestfs: launch: tmpdir=/tmp/libguestfs15JLpA
libguestfs: launch: umask=0022
libguestfs: launch: euid=0
libguestfs: trace: get_backend_setting "force_tcg"
libguestfs: trace: get_backend_setting = NULL (error)
libguestfs: trace: get_cachedir
libguestfs: trace: get_cachedir = "/var/tmp"
libguestfs: begin building supermin appliance
libguestfs: run supermin
libguestfs: command: run: /usr/bin/supermin5
libguestfs: command: run: \ --build
libguestfs: command: run: \ --verbose
libguestfs: command: run: \ --if-newer
libguestfs: command: run: \ --lock /var/tmp/.guestfs-0/lock
libguestfs: command: run: \ --copy-kernel
libguestfs: command: run: \ -f ext2
libguestfs: command: run: \ --host-cpu x86_64
libguestfs: command: run: \ /usr/lib64/guestfs/supermin.d
libguestfs: command: run: \ -o /var/tmp/.guestfs-0/appliance.d
supermin: version: 5.1.16
supermin: rpm: detected RPM version 4.11
supermin: package handler: fedora/rpm
supermin: acquiring lock on /var/tmp/.guestfs-0/lock
supermin: build: /usr/lib64/guestfs/supermin.d
supermin: reading the supermin appliance
supermin: build: visiting /usr/lib64/guestfs/supermin.d/base.tar.gz type gzip base image (tar)
supermin: build: visiting /usr/lib64/guestfs/supermin.d/daemon.tar.gz type gzip base image (tar)
supermin: build: visiting /usr/lib64/guestfs/supermin.d/excludefiles type uncompressed excludefiles
supermin: build: visiting /usr/lib64/guestfs/supermin.d/hostfiles type uncompressed hostfiles
supermin: build: visiting /usr/lib64/guestfs/supermin.d/init.tar.gz type gzip base image (tar)
supermin: build: visiting /usr/lib64/guestfs/supermin.d/packages type uncompressed packages
supermin: build: visiting /usr/lib64/guestfs/supermin.d/udev-rules.tar.gz type gzip base image (tar)
supermin: mapping package names to installed packages
supermin: resolving full list of package dependencies
supermin: build: 189 packages, including dependencies
supermin: build: 31303 files
supermin: build: 7545 files, after matching excludefiles
supermin: build: 7554 files, after adding hostfiles
supermin: build: 7544 files, after removing unreadable files
supermin: build: 7574 files, after munging
supermin: kernel: SUPERMIN_KERNEL environment variable /boot/vmlinuz-3.10.0-693.el7.x86_64
supermin: kernel: SUPERMIN_KERNEL_VERSION environment variable 3.10.0-693.el7.x86_64
supermin: kernel: SUPERMIN_KERNEL version 3.10.0-693.el7.x86_64
supermin: kernel: SUPERMIN_MODULES environment variable = /lib/modules/3.10.0-693.el7.x86_64
supermin: kernel: kernel_version 3.10.0-693.el7.x86_64
supermin: kernel: modules /lib/modules/3.10.0-693.el7.x86_64
supermin: ext2: creating empty ext2 filesystem '/var/tmp/.guestfs-0/appliance.d.soljd01f/root'
supermin: ext2: populating from base image
supermin: ext2: copying files from host filesystem
supermin: ext2: copying kernel modules
supermin: ext2: creating minimal initrd '/var/tmp/.guestfs-0/appliance.d.soljd01f/initrd'
supermin: ext2: wrote 31 modules to minimal initrd
supermin: renaming /var/tmp/.guestfs-0/appliance.d.soljd01f to /var/tmp/.guestfs-0/appliance.d
libguestfs: finished building supermin appliance
libguestfs: begin testing qemu features
libguestfs: trace: get_cachedir
libguestfs: trace: get_cachedir = "/var/tmp"
libguestfs: checking for previously cached test results of /usr/libexec/qemu-kvm, in /var/tmp/.guestfs-0
libguestfs: command: run: /usr/libexec/qemu-kvm
libguestfs: command: run: \ -display none
libguestfs: command: run: \ -help
libguestfs: qemu version 2.9
libguestfs: command: run: /usr/libexec/qemu-kvm
libguestfs: command: run: \ -display none
libguestfs: command: run: \ -machine accel=kvm:tcg
libguestfs: command: run: \ -device ?
libguestfs: saving test results
libguestfs: trace: get_sockdir
libguestfs: trace: get_sockdir = "/tmp"
libguestfs: finished testing qemu features
libguestfs: trace: get_backend_setting "gdb"
libguestfs: trace: get_backend_setting = NULL (error)
[15139ms] /usr/libexec/qemu-kvm \
    -global virtio-blk-pci.scsi=off \
    -nodefconfig \
    -enable-fips \
    -nodefaults \
    -display none \
    -machine accel=kvm:tcg \
    -cpu host \
    -m 500 \
    -no-reboot \
    -rtc driftfix=slew \
    -no-hpet \
    -global kvm-pit.lost_tick_policy=discard \
    -kernel /var/tmp/.guestfs-0/appliance.d/kernel \
    -initrd /var/tmp/.guestfs-0/appliance.d/initrd \
    -object rng-random,filename=/dev/urandom,id=rng0 \
    -device virtio-rng-pci,rng=rng0 \
    -device virtio-scsi-pci,id=scsi \
    -drive file=/home/jenkins/overcloud-full.qcow2,cache=writeback,discard=unmap,id=hd0,if=none \
    -device scsi-hd,drive=hd0 \
    -drive file=/var/tmp/.guestfs-0/appliance.d/root,snapshot=on,id=appliance,cache=unsafe,if=none,format=raw \
    -device scsi-hd,drive=appliance \
    -device virtio-serial-pci \
    -serial stdio \
    -device sga \
    -chardev socket,path=/tmp/libguestfs83klHq/guestfsd.sock,id=channel0 \
    -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
    -netdev user,id=usernet,net=169.254.0.0/16 \
    -device virtio-net-pci,netdev=usernet \
    -append 'panic=1 console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=/dev/sdb selinux=0 guestfs_verbose=1 guestfs_network=1 TERM=unknown'
KVM: entry failed, hardware error 0x0
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000506e3
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 66 89 d8 66 e8 e5 af ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

For L0: I don't have access to L0; it is a compute node on RDO Cloud. I have asked people to provide information from the host; they should provide it when they are available.

(In reply to Yatin Karel from comment #4)
> (In reply to Bandan Das from comment #3)
...
> For L0, i don't have access to L0, L0 is compute node on RDO Cloud. I have
> asked people to provide information from the host, they should provide it
> when they are available

Oops! Sorry, I thought this was from L0. So "KVM: entry failed, hardware error 0x0" means the L1 vmlaunch/vmresume failed, but there's no further information in hardware_entry_failure_reason. I think the log from L0, when we have it, will be helpful. I am looking at other bugs where hardware error 0x0 was encountered to see if it rings a bell. Yatin, is it not possible at all to get some information on the failing hosts?
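When chasing information on a failing L0, one basic fact worth capturing first is whether nested virtualization is enabled there at all. A minimal sketch of such a check, assuming a Linux host with the standard kvm_intel/kvm_amd module parameters; the `nested_state` helper name is mine, not from this thread:

```shell
#!/bin/sh
# Hypothetical helper (not part of the bug report): report whether the KVM
# modules on this host expose nested virtualization. On RHEL 7, a value of
# "Y" or "1" means nesting is enabled for that module.
nested_state() {
    for m in kvm_intel kvm_amd; do
        p="/sys/module/$m/parameters/nested"
        if [ -r "$p" ]; then
            echo "$m nested=$(cat "$p")"
        else
            echo "$m not loaded"
        fi
    done
}

nested_state
```

A host that prints `kvm_intel nested=N` cannot start an L2 guest at all, which is a different failure mode from the intermittent "KVM: entry failed" seen here, so this check mainly rules out misconfiguration before digging into the L0 dmesg.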
This is failing at the very first instruction in the guest, so it is likely that the issue is not in your guest, but rather in the bare-metal hypervisor.

I hit the same bug when running a lorax-composer test that in fact uses nested virtualization on a RHEL-7 host (not under my control), so I'm reopening the bug:

KVM: entry failed, hardware error 0x0
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000006d3
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 66 89 d8 66 e8 14 a4 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ATM I don't have any specific information about the host system; however, I can try to gather the required bits if I'm told what I should look for.
(In reply to Jiri Kortus from comment #8)
> I hit the same bug when running lorax-composer test that in fact uses nested
> virtualization on a RHEL-7 host (not under my control), so I'm reopening the
> bug:
>
> KVM: entry failed, hardware error 0x0
> EAX=00000000 EBX=00000000 ECX=00000000 EDX=000006d3
> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
> EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300
> CS =f000 ffff0000 0000ffff 00009b00
> SS =0000 00000000 0000ffff 00009300
> DS =0000 00000000 0000ffff 00009300
> FS =0000 00000000 0000ffff 00009300
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008b00
> GDT= 00000000 0000ffff
> IDT= 00000000 0000ffff
> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=00 66 89 d8 66 e8 14 a4 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0
> 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00
>
> ATM I don't have any specific information about the host system, however I
> can try to gather the required bits if I'm told what I should look for.

From the host, can you find out the version of qemu and kernel it's running, and preferably the /var/log/libvirt/qemu/whatever.log for your VM? Oh, and the /proc/cpuinfo as well. From your guest, can you show how you're running your VM?

Hey, I maintain the bare-metal hosts that run (most of) the cockpit CI. These run RHEL 7.5 (the latest version available in our e2e Satellite -- I asked for something newer, but that's going slow).
> From the host can you find out the version of qemu and kernel it's running

# uname -a
Linux cockpit-8 3.10.0-862.11.6.el7.x86_64 #1 SMP Fri Aug 10 16:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

kernel-3.10.0-862.11.6.el7.x86_64

The host does not have QEMU installed. The tests all run in a Fedora 30 container (cockpit/tasks, see https://github.com/cockpit-project/cockpituous/blob/master/tasks/Dockerfile), and have qemu-kvm-3.1.1-2.fc30.x86_64. They are running session libvirt, so there's no /var/log/libvirt. Logs are in ~/.cache/libvirt/qemu/log/ (or, in the container, /build/libvirt/qemu/log/), but they are transient. @Jiri, for grabbing a meaningful log (both libvirt and kmesg), I would need to observe a current run. Do you have a pointer to a failed PR/log? I can then re-run the test to trigger the failure.

Martin, thank you for providing the info. The related PR was https://github.com/weldr/lorax/pull/823; the resulting log is https://209.132.184.41:8493/logs/pull-823-20191003-093434-e88d22b9-weldr-lorax--rhel-7-7-tar/log. However, after restarting the failed test, the subsequent run finished without any problem: https://209.132.184.41:8493/logs/pull-823-20191003-144747-e88d22b9-weldr-lorax--rhel-7-7-tar/log, so it might need restarting a few times, as it seems that the reproducibility is not 100%.

(In reply to Martin Pitt from comment #10)
> Hey, I maintain the bare-metal hosts that runs (most of) the cockpit CI.
> These run RHEL 7.5 (the latest version available in our e2e Satellite -- I
> asked for something newer, but that's going slow).
>
> > From the host can you find out the version of qemu and kernel it's running
>
> # uname -a
> Linux cockpit-8 3.10.0-862.11.6.el7.x86_64 #1 SMP Fri Aug 10 16:55:11 UTC
> 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> kernel-3.10.0-862.11.6.el7.x86_64
>
> The host does not have QEMU installed.
> The tests all run in a Fedora 30 container (cockpit/tasks, see
> https://github.com/cockpit-project/cockpituous/blob/master/tasks/Dockerfile),
> and have qemu-kvm-3.1.1-2.fc30.x86_64.

That's a pretty frankensteinian mix, so can you just confirm to us the full stack:

L0/actual-host: RHEL 7.5 kernel + f30 qemu
L1: CentOS 7.4 kernel + RHEL7.4-ev qemu (2.9)
L2: ?? Supermin appliance?

Is that about right? I can be pretty sure no one has ever tried that mix before. Running an f30 qemu on a 7.5 kernel is definitely unsupported, and there keep being lots of nesting fixes, so running a 7.5 kernel for nesting is a bad idea. If you're running f30 qemus on the host, why are you bothering running RHEL rather than using an f30 host? An f30 host + f30 qemu would stand a better chance.

> They are running session libvirt, so there's no /var/log/libvirt. Logs are
> in ~/.cache/libvirt/qemu/log/ (or, in the container /build/libvirt/qemu/log/),
> but they are transient. @Jiri, for grabbing a meaningful log (both libvirt
> and kmesg), I would need to observe a current run. Do you have a pointer to
> a failed PR/log? I can then re-run the test to trigger the failure.

I'd watch the host dmesg and L1 dmesg to see if they spit anything out.

(In reply to Dr. David Alan Gilbert from comment #12)
> That's a pretty frankenstienien mix; so can you just confirm to us the full
> stack:
>
> L0/actual-host Rhel 7.5 kernel + f30 qemu

Right, where qemu runs from a fedora-30 container. We use that container across all our CI, e.g. also on CentOS CI, and we can't really rely on the host's qemu for that. RHEL 7.5 is by far too old to be able to run the integration tests for most of our projects (starting with missing Python 3, etc.)

> L1 CentOS 7.4 kernel + RHEL7.4-ev qemu (2.9)

We do have many images, amongst them centos-7, but the lorax tests that Jiri was referring to are running a RHEL 7.7 image, not CentOS. I.e. the "inner" virtualization is using

kernel-3.10.0-1062.el7.x86_64
qemu-kvm-1.5.3-167.el7.x86_64

> L2 ?? Supermin appliance?

That's beyond my control; it depends on what the lorax tests try to boot there. But I can't imagine that it's literally supermin; I figure it's just building a tiny RHEL image and test-booting that?

> Running an f30 qemu on a 7.5 kernel is definitely unsupported; and there keep
> being lots of nesting fixes, so running a 7.5 kernel for nesting is a bad
> idea.

I noticed that it doesn't work so well with nested KVM. Non-nested KVM works just fine, and so far most projects that use our CI don't bother with nested virt (as it's a neverending source of trouble), but apparently the lorax tests do.

> If you're running f30 qemu's on the host, why are you bothering running RHEL
> rather than using an f30 host?
> an f30 host+f30 qemu would stand a better chance.

I know, but as I said, the only OS available on our e2e Satellite is RHEL 7.5 (well, and 7.4). I'd like to get F30 or RHEL 8 there, but that's not something to track in this bz.

> I'd watch on the host dmesg and L1 dmesg to see if they spit anything.

Ack, I'll try to reproduce.

(In reply to Martin Pitt from comment #13)
> Right, where qemu runs from a fedora-30 container. We use that container
> across all our CI, e.g. also on CentOS CI, and we can't really rely on the
> host's qemu for that. RHEL 7.5 is by far too old to be able to run the
> integration tests for most of our projects (starting with missing Python 3,
> etc.)

Yes, please encourage your satellite provider to upgrade to something newer.

> We do have many images, amongst them centos-7, but the lorax tests that Jiri
> was referring to are running a RHEL 7.7 image, not CentOS. I.e. the "inner"
> virtualization is using
>
> kernel-3.10.0-1062.el7.x86_64
> qemu-kvm-1.5.3-167.el7.x86_64

I wouldn't use 1.5.3 for testing unless you're explicitly testing it; it's better to use the qemu-kvm-rhev stream (i.e. the CentOS qemu-kvm-ev stream) to get something more modern.

> That's beyond my control, it depends on what the lorax tests try to boot
> there. But I can't imagine that it's literally supermin, I figure it's just
> building a tiny RHEL image and test-boot that?

Yeh; the L2 is the least critical part.

> I noticed that it doesn't work so well with nested KVM. Non-nested KVM works
> just fine, and so far most projects that use our CI don't bother with nested
> virt (as it's a neverending source of trouble), but apparently the lorax
> tests do.

Yeh; nesting has taken a lot of work, so you really want the newest possible if you're doing nesting; especially the L1 kernel and L1 qemu.

> I know, but as I said the only OS available on our e2e Satellite is RHEL 7.5
> (well, and 7.4). I'd like to get F30 or RHEL 8 there, but that's not
> something to track in this bz.

IMHO the chances of nesting getting fixed for your weird combo setup are pretty slim unless something jumps out as obvious.

> I'd watch on the host dmesg and L1 dmesg to see if they spit anything.

Ack, I'll try to reproduce.

> Yeh; nesting has taken a lot of work, so you really want the newest possible
> if you're doing nesting; especially the L1 kernel and L1 qemu.
Oops, I meant L0 kernel and L0 qemu
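The back-and-forth above amounts to collecting the same three facts at each level of the stack (L0 and L1): kernel release, qemu version, and whether hardware virtualization is exposed to that level's CPU. A minimal collection sketch, assuming a Linux host; the qemu binary paths are the usual RHEL/Fedora locations, and `collect_level_info` is a hypothetical name of my own:

```shell
#!/bin/sh
# Hypothetical collection sketch (not from the thread): gather the facts asked
# for at each virtualization level -- kernel release, qemu version, and whether
# hardware virtualization (vmx/svm) is visible in /proc/cpuinfo at this level.
collect_level_info() {
    echo "kernel: $(uname -r)"
    # qemu-kvm lives in /usr/libexec on RHEL and /usr/bin on Fedora
    for q in /usr/libexec/qemu-kvm /usr/bin/qemu-kvm /usr/bin/qemu-system-x86_64; do
        if [ -x "$q" ]; then
            echo "qemu: $("$q" --version | head -n 1)"
            break
        fi
    done
    if grep -qE 'vmx|svm' /proc/cpuinfo 2>/dev/null; then
        echo "hw virt: vmx/svm exposed at this level"
    else
        echo "hw virt: NOT exposed at this level"
    fi
}

collect_level_info
```

Running this once on L0 and once inside L1 would answer most of the questions in one paste; an L1 that reports "NOT exposed" cannot run KVM for L2 at all.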
(In reply to Jiri Kortus from comment #11)
> thank you for providing the info. The related PR was
> https://github.com/weldr/lorax/pull/823, resulting log is
> https://209.132.184.41:8493/logs/pull-823-20191003-093434-e88d22b9-weldr-lorax--rhel-7-7-tar/log.
> However, after restarting the failed test, the subsequent run finished
> without any problem:
> https://209.132.184.41:8493/logs/pull-823-20191003-144747-e88d22b9-weldr-lorax--rhel-7-7-tar/log,
> so it might need restarting a few times as it seems that the reproducibility
> is not 100%.

Thanks! These two ran on ci-srv-02 and -05, which are identical. The only node that is a bit special is cockpit-7 (it has a slightly newer kernel, from RHEL 7.6). Anyway, I tried to reproduce this and it's quite easy to do. On the RHEL 7.5 host, I went into the cockpit/tasks container (as above) and started a RHEL 7.7 image:

docker exec -it cockpit-tasks-1 /bin/bash
cockpit-project/bots/vm-run rhel-7-7

and inside, I tried to start another VM. Initially on an actual image, but it turns out that this doesn't matter at all: the simplest possible invocation triggers it:

$ /usr/libexec/qemu-kvm -display none
KVM: entry failed, hardware error 0x0
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000006d3
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 66 89 d8 66 e8 14 a4 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

This is pretty well exactly the same error as in the first log [1], which at least takes libvirt and the particular L2 image out of the picture. When that happens, the host (L0) has this to say in dmesg -w:

[6055296.355058] nested_vmx_exit_handled failed vm entry 7

There are no dmesg logs at all from the outer VM (L1).

[1] https://209.132.184.41:8493/logs/pull-823-20191003-093434-e88d22b9-weldr-lorax--rhel-7-7-tar/log

> an f30 host+f30 qemu would stand a better chance.

FTR, I already tried to grab the kernel.rpm from F30 and install it on RHEL 7.5, but that totally doesn't work. The whole grub configuration, dependencies, etc. are just too different. Is there any newer kernel for CentOS/RHEL 7 in some COPR or so that I could try?

> [6055296.355058] nested_vmx_exit_handled failed vm entry 7

Same error as in the ancient https://bugzilla.redhat.com/show_bug.cgi?id=1069089 (supposedly fixed in 7.0) and https://bugzilla.redhat.com/show_bug.cgi?id=1086058 ???

What -cpu option is getting passed to the top-level qemu?

> > an f30 host+f30 qemu would stand a better chance.
> FTR, I already tried to grab the kernel.rpm from F30 and install it on RHEL
> 7.5, but that totally doesn't work. The whole grub configuration,
> dependencies, etc. are just too different. Is there any newer kernel for
> CentOS/RHEL 7 in some COPR or so that I could try?

There was someone internally who used to keep a build set of current kernels that were useful for testing. You could also try a recent CentOS 7 kernel?

> What -cpu option is getting passed to the top level qemu?

We specify <cpu mode='host-passthrough'/> in the libvirt machine, which gets translated to "-cpu host". As far as I understand it, that should still be the most compatible option for nested KVM?

> there was someone internally who used to keep a build set of current kernels
> that were useful for testing.

I'm happy to give that a shot, if you have some pointer?
> You could also try a recent CentOS 7 kernel?

Latest CentOS 7 has kernel 3.10.0-1062.1.1.el7.x86_64 (smells like it is RHEL 7.7 based), i.e. it's slightly newer than what our hosts currently have (RHEL 7.5), but still very old. I installed it on one of our machines, and right after a reboot the above nested qemu-kvm does not crash any more; I tried to boot a CirrOS image and at least got a grub prompt, although I never see anything beyond that (locally, I see the serial console).

However, I rebooted another machine (uptime ~70 days) with the original 3.10.0-862.11.6 kernel, and now nested virt works there as well. So this is more about the kvm subsystem accumulating some cruft over time that breaks nested VMs at some point, and not (necessarily) about the 862 → 1062 kernel update. Even the latter is really old; for testing it would be useful to have a 4.x or 5.x kernel.

@Jiri: I rebooted all our CI machines now, so you are welcome to make another attempt at running your test. At some point it will likely break again, but we can see how often that happens -- if it lasts a few weeks, rebooting is fine; if it only lasts an hour, rebooting becomes too intrusive.

> Latest CentOS 7 has kernel 3.10.0-1062.1.1.el7.x86_64 (smells like RHEL 7.7
> based), i.e. it's slightly newer than what our hosts currently have (RHEL
> 7.5), but still very old.

Remember that we do a lot of backports from more recent kernels. Especially for nested virtualization, a lot of fixes were backported to both 7.7 and 7.6.z.

> So this is more about the kvm subsystem accumulating some cruft over time
> that breaks nested VM at some point

Yes, this was reported in the past (https://bugzilla.redhat.com/show_bug.cgi?id=1565739) and it should be fixed, or at least improved, in kernel-3.10.0-1035.el7 and kernel-3.10.0-957.26.1.el7.

Thanks David and Paolo for your help! @Jiri, please let me know how it goes on the current infra -- if the CentOS 7 kernel helps, I'm happy to roll it out on all machines. I'm happy to run tests manually on the machine that already has it.
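Paolo names the kernel builds that carry the fix for bug 1565739: kernel-3.10.0-1035.el7 and the kernel-3.10.0-957.26.1.el7 z-stream. When auditing a fleet like Martin's, one could check each host's kernel release string against those cutoffs. This is my own heuristic sketch, not an authoritative test — `has_nested_fix` is a hypothetical name, and it only understands the two builds mentioned in the comment above:

```shell
#!/bin/sh
# Hypothetical check (my sketch): does a RHEL 7 kernel release string carry the
# nested-VMX fix from bug 1565739? Per the comment above, the fix landed in
# kernel-3.10.0-1035.el7 and was backported to kernel-3.10.0-957.26.1.el7.
has_nested_fix() {
    build=$(printf '%s\n' "$1" | sed -n 's/^3\.10\.0-\([0-9][0-9]*\).*/\1/p')
    [ -n "$build" ] || return 1            # not a 3.10.0-based el7 kernel
    [ "$build" -ge 1035 ] && return 0      # 7.7-era builds and later
    if [ "$build" -eq 957 ]; then          # 7.6.z backport stream
        z=$(printf '%s\n' "$1" | sed -n 's/^3\.10\.0-957\.\([0-9][0-9]*\).*/\1/p')
        [ -n "$z" ] && [ "$z" -ge 26 ] && return 0
    fi
    return 1
}

# Check the three kernels that appear in this thread:
for k in 3.10.0-862.11.6.el7.x86_64 3.10.0-957.26.1.el7.x86_64 3.10.0-1062.1.1.el7.x86_64; do
    if has_nested_fix "$k"; then
        echo "$k: has the fix"
    else
        echo "$k: predates the fix"
    fi
done
```

On a live host this would be fed `uname -r`; it classifies the 862 kernel (where the bug reproduced) as predating the fix and the 1062 kernel (where it stopped reproducing after a reboot) as carrying it, which matches the behaviour observed in the thread.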