Description of problem:
I tried to create a VM with a guest agent using the following spec:
http://pastebin.test.redhat.com/1059571
datavolume: http://pastebin.test.redhat.com/1059572

but I get this error message in the events of the VMI:

Events:
  Type     Reason            Age                  From                       Message
  ----     ------            ----                 ----                       -------
  Normal   SuccessfulCreate  55m                  virtualmachine-controller  Created virtual machine pod virt-launcher-test-vm-l7fw2
  Normal   Created           55m                  virt-handler               VirtualMachineInstance defined.
  Normal   Started           55m                  virt-handler               VirtualMachineInstance started.
  Warning  SyncFailed        2m9s (x31 over 55m)  virt-handler               server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required')"

I checked the logs (tail -n 200 /var/log/libvirt/qemu/*.log) in the virt-launcher pod and noticed this error:

-msg timestamp=on
KVM: entry failed, hardware error 0x8
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00080661
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=04 66 41 eb f1 66 83 c9 ff 66 89 c8 66 5b 66 5e 66 5f 66 c3 <ea> 5b e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

I also checked the status of the VM on the virt-launcher pod:

[mperetz@mperetz ~]$ oc rsh virt-launcher-simple-vm-kmnbc
sh-4.4# virsh list
 Id   Name                State
----------------------------------
 1    default_simple-vm   paused

sh-4.4# exut\
> ^C
sh-4.4# exit
exit
command terminated with exit code 130
[mperetz@mperetz ~]$ oc get vmi
NAME        AGE     PHASE     IP            NODENAME                          READY
simple-vm   6m59s   Running   10.128.2.40   oadp-12290-wqlcn-worker-0-llq8b   True
[mperetz@mperetz ~]$

Additional details:
lscpu of the worker nodes: http://pastebin.test.redhat.com/1059422
OCP version: 4.10 (OpenStack on PSI). Also tried 4.9.
OpenStack flavor: ci.m1.xlarge
lscpu output:

sh-4.4# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           8
NUMA node(s):        1
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Red Hat
CPU family:          6
Model:               134
Model name:          Intel Xeon Processor (Icelake)
BIOS Model name:     RHEL 7.6.0 PC (i440FX + PIIX, 1996)
Stepping:            0
CPU MHz:             2294.608
BogoMIPS:            4589.21
Virtualization:      VT-x
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
L3 cache:            16384K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 md_clear arch_capabilities

I'm not 
sure what exactly causes the issue based on the error message.

I also tried it on OCP 4.10 with the same CNV version, but with OpenStack flavor ci.standard.xl and a different server for the worker nodes:

sh-4.4# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           8
NUMA node(s):        1
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Red Hat
CPU family:          6
Model:               85
Model name:          Intel Xeon Processor (Skylake, IBRS)
BIOS Model name:     RHEL 7.6.0 PC (i440FX + PIIX, 1996)
Stepping:            4
CPU MHz:             2095.076
BogoMIPS:            4190.15
Virtualization:      VT-x
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
L3 cache:            16384K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke md_clear arch_capabilities

and there it works.

Version-Release number of selected component (if applicable):
CNV version: 4.9/4.10.2 (production)

How reproducible:
100% on the specific platform with the Icelake CPU model

Steps to Reproduce:
As mentioned above, I'm not sure exactly what the root cause is, but this is how I reproduce it:
1. Use the flexy-install job to create an OpenStack cluster on PSI with OCP version 4.10 and flavor ci.m1.xlarge (which usually deploys the worker nodes on a server with the Icelake CPU model).
2. 
Deploy the following data volume and VM (this also happened with other templates, such as alpine, so these exact templates are not necessarily required):
http://pastebin.test.redhat.com/1059571
datavolume: http://pastebin.test.redhat.com/1059572
3. Check the events of the VMI. Note that you eventually get this error:
"LibvirtError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required')"
4. Look for the other logs/statuses mentioned in the problem description.

Actual results:

Expected results:

Additional info:
Hi,
Can you confirm:
a) Is this running directly on the hardware or is it nested? I suspect it is nested, judging from some of the debug output.
b) What host CPUs exactly do you have?
c) Is there anything in dmesg when you try to run the VMs?
d) What is the qemu command line?

As KVM errors go, that's a particularly odd one.
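
Question (a) can be partly answered from the lscpu output already in the report. The sketch below is only a heuristic and not part of the original thread: it assumes that a "Hypervisor vendor" line (or the `hypervisor` CPU flag) means the node is itself a guest, and that the `vmx`/`svm` flag means hardware virtualization is exposed to that guest. The function names `parse_lscpu` and `looks_nested` are hypothetical, and the sample is a trimmed copy of the Icelake worker's lscpu output.

```python
def parse_lscpu(text: str) -> dict:
    """Parse 'Key: value' lines of lscpu output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; values may contain colons or commas.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def looks_nested(fields: dict) -> bool:
    """Heuristic (assumption): node is a guest AND exposes VT-x/AMD-V itself."""
    flags = set(fields.get("Flags", "").split())
    is_guest = "Hypervisor vendor" in fields or "hypervisor" in flags
    has_virt_ext = bool(flags & {"vmx", "svm"})
    return is_guest and has_virt_ext


# Trimmed from the lscpu output of the Icelake worker node in the report.
SAMPLE = """\
Vendor ID:           GenuineIntel
Model name:          Intel Xeon Processor (Icelake)
Virtualization:      VT-x
Hypervisor vendor:   KVM
Flags:               fpu vme vmx hypervisor avx512f
"""

fields = parse_lscpu(SAMPLE)
print(fields["Model name"], "| nested:", looks_nested(fields))
```

Run against the worker's full lscpu output, a positive result would suggest that VMs started on that node are L2 guests, which is the situation question (a) is probing for. It does not say anything about the L0 host version.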
I believe this is a nested-environment bug with an old 8.2 host as L0, the same as bz 2103118. If it is a nested environment like that, then please dupe this to 2103118. A workaround is to configure the L1 with an older CPU type.
This is obviously a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2103118. The crash dump is virtually identical and the configuration also involves OpenStack. @mperetz I don't believe this is a supported use-case anyway, as CNV only supports bare-metal nodes. IIUC https://bugzilla.redhat.com/show_bug.cgi?id=2103118 suggests either updating L0 to 8.6+ or lowering the CPU type in L1. I will close this as a duplicate unless there's a good reason to keep it open.
*** This bug has been marked as a duplicate of bug 2103118 ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days