Description of problem:
Issue reproduced by tobiko test test_reboot_computes_recovery:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko-neutron/43/testReport/tobiko.tests.faults.ha.test_cloud_recovery/DisruptTripleoNodesTest/Tobiko___test_reboot_computes_recovery/

compute-0 is hard rebooted:
2020-11-29 12:02:03,431 INFO tobiko.tests.faults.ha.cloud_disruptions | reboot exec: sudo chmod o+w /proc/sysrq-trigger;sudo echo b > /proc/sysrq-trigger on server: compute-0

All instances are in status SHUTOFF at 12:03:57.

A request to start instance 0cf82ec3-efd1-4f6e-baf4-8f04efe90925 is sent via the nova API at 12:04:07:
2020-11-29 12:04:07,156 DEBUG tobiko.openstack.nova._client | Waiting for server 0cf82ec3-efd1-4f6e-baf4-8f04efe90925 status to get from SHUTOFF to ACTIVE (progress=None%)

The instance status is still not ACTIVE 5 minutes later:
2020-11-29 12:09:03,483 DEBUG tobiko.openstack.nova._client | Waiting for server 0cf82ec3-efd1-4f6e-baf4-8f04efe90925 status to get from SHUTOFF to ACTIVE (progress=None%)

The next test cases fail because that instance never reaches status ACTIVE (the last checks fail at 12:14:00).

Nova compute logs:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko-neutron/43/compute-0/var/log/containers/nova/nova-compute.log.gz
2020-11-29 12:04:09.452 8 DEBUG nova.compute.manager [req-557d9ed5-94f3-466e-9833-a83f1d9a4f96 f653062052794d108ffdf4add40a89d9 9869705add1e4c939e9f055a39d7950b - default default] [instance: 0cf82ec3-efd1-4f6e-baf4-8f04efe90925] No waiting events found dispatching network-vif-plugged-8d6b66ed-587b-4d2b-bd6a-7b02269e1ecd pop_instance_event /usr/lib/python3.6/site-packages/nova/compute/manager.py:361
2020-11-29 12:04:09.452 8 WARNING nova.compute.manager [req-557d9ed5-94f3-466e-9833-a83f1d9a4f96 f653062052794d108ffdf4add40a89d9 9869705add1e4c939e9f055a39d7950b - default default] [instance: 0cf82ec3-efd1-4f6e-baf4-8f04efe90925] Received unexpected event network-vif-plugged-8d6b66ed-587b-4d2b-bd6a-7b02269e1ecd for instance with vm_state stopped and task_state powering-on.

OVN controller logs:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko-neutron/43/compute-0/var/log/containers/openvswitch/ovn-controller.log.gz
2020-11-29T12:02:48.441Z|00013|binding|INFO|Releasing lport 8d6b66ed-587b-4d2b-bd6a-7b02269e1ecd from this chassis.
2020-11-29T12:02:48.442Z|00014|binding|INFO|Releasing lport 741e9af3-ee2b-42a7-9190-a216cb8f7d24 from this chassis.
2020-11-29T12:02:48.442Z|00015|binding|INFO|Releasing lport 9d6b0a2d-2bff-4059-8323-4019a554abc9 from this chassis.
2020-11-29T12:04:09.104Z|00016|binding|INFO|Claiming lport 8d6b66ed-587b-4d2b-bd6a-7b02269e1ecd for this chassis.
2020-11-29T12:04:09.104Z|00017|binding|INFO|8d6b66ed-587b-4d2b-bd6a-7b02269e1ecd: Claiming fa:16:3e:c3:ab:7f 10.100.114.175 2001:db8:0:72b0:f816:3eff:fec3:ab7f
2020-11-29T12:04:09.104Z|00018|binding|INFO|8d6b66ed-587b-4d2b-bd6a-7b02269e1ecd: Claiming unknown

Can this bug be related to https://bugzilla.redhat.com/show_bug.cgi?id=1890895? Both of them are reproduced by test_reboot_computes_recovery, but BZ1890895 happens more often.

Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20201110.n.1

How reproducible:
Infrequently (I only saw it once)

Steps to Reproduce:
1. Run tobiko test test_reboot_computes_recovery

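For anyone re-checking the port timeline in the description, here is a minimal sketch that pulls the "Releasing lport" / "Claiming lport" events for one port out of ovn-controller log text. It assumes the pipe-separated line layout seen in the excerpt above (timestamp|sequence|module|level|message); the function name lport_binding_events and the SAMPLE constant are made up for illustration and are not part of tobiko or OVN.

```python
from datetime import datetime
from typing import List, Tuple

# Two lines copied from the ovn-controller excerpt in this report.
SAMPLE = (
    "2020-11-29T12:02:48.441Z|00013|binding|INFO|Releasing lport "
    "8d6b66ed-587b-4d2b-bd6a-7b02269e1ecd from this chassis.\n"
    "2020-11-29T12:04:09.104Z|00016|binding|INFO|Claiming lport "
    "8d6b66ed-587b-4d2b-bd6a-7b02269e1ecd for this chassis.\n"
)


def lport_binding_events(log_text: str, lport: str) -> List[Tuple[datetime, str]]:
    """Return (timestamp, action) pairs for one logical port (hypothetical helper)."""
    events = []
    for line in log_text.splitlines():
        # Assumed layout: timestamp|sequence|module|level|message
        parts = line.split("|", 4)
        if len(parts) != 5:
            continue
        timestamp, _seq, module, _level, message = parts
        if module != "binding":
            continue
        if message.startswith(f"Releasing lport {lport} "):
            action = "released"
        elif message.startswith(f"Claiming lport {lport} "):
            action = "claimed"
        else:
            continue
        when = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
        events.append((when, action))
    return events


events = lport_binding_events(SAMPLE, "8d6b66ed-587b-4d2b-bd6a-7b02269e1ecd")
for when, action in events:
    print(when.isoformat(), action)
```

On the sample lines this lists the release at 12:02:48 and the claim at 12:04:09, which matches the time of the network-vif-plugged event in the Nova compute log.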
Hi,

I quickly skimmed the logs from Neutron and Nova compute, but I don't see anything specific that I think would cause it to fail. However, I believe the error was in QEMU restarting the VM.

Here's the creation of that VM from the nova logs [0]:

2020-11-29 10:38:39.943 7 DEBUG nova.virt.libvirt.driver [req-1ac3db66-07c2-4882-b177-a56a14b384c6 e678d845e06f46a7b588ccd7cc732eb9 48ce485e440c4ca0b10a4b4b1c4f24fb - default default] [instance: 0cf82ec3-efd1-4f6e-baf4-8f04efe90925] End _get_guest_xml xml=<domain type="kvm">
<uuid>0cf82ec3-efd1-4f6e-baf4-8f04efe90925</uuid>
<name>instance-0000019a</name>
...

So the instance is "instance-0000019a". When I look at the libvirt logs for that particular instance, I see that the VM failed to start [1] with:

2020-11-29T12:04:09.254297Z qemu-kvm: -device cirrus-vga,id=video0,bus=pci.0,addr=0x2: warning: 'cirrus-vga' is deprecated, please use a different VGA card instead
KVM: entry failed, hardware error 0x80000021

If you're running a guest on an Intel machine without unrestricted mode support, the failure can be most likely due to the guest entering an invalid state for Intel VT. For example, the guest maybe running in big real mode which is not supported on less recent Intel processors.

EAX=00000000 EBX=00000000 ECX=00000000 EDX=000006d3
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 00000000 00008000
CS =0000 00000000 00000000 00009b00
SS =0000 00000000 00000000 00009300
DS =0000 00000000 00000000 00008000
FS =0000 00000000 00000000 00008000
GS =0000 00000000 00000000 00008000
LDT=0000 00000000 00000000 00008000
TR =0000 00000000 00000000 00008000
GDT=     00000000 00000000
IDT=     00000000 00000000
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

So maybe the problem is that the VM itself never started due to the error above and therefore never moved to ACTIVE. I don't know exactly what the error above means; maybe someone more familiar with libvirt/QEMU can give us some clues.

[0] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko-neutron/43/compute-0/var/log/containers/nova/nova-compute.log.1.gz
[1] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko-neutron/43/compute-0/var/log/libvirt/qemu/instance-0000019a.log.gz
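
To check other per-instance QEMU logs for the same symptom, a rough sketch along these lines could help. It only searches log text for the "KVM: entry failed, hardware error" line quoted above; the names KvmEntryFailure and find_kvm_entry_failures are hypothetical. The basic_exit_reason property assumes the Intel VT-x exit-reason layout (low 16 bits hold the basic exit reason, bit 31 flags a VM-entry failure), under which 0x80000021 decodes to basic reason 33, the "invalid guest state" entry failure, consistent with the hint QEMU prints. Treat that decoding as my reading, not a confirmed diagnosis.

```python
import re
from typing import List, NamedTuple


class KvmEntryFailure(NamedTuple):
    """One 'KVM: entry failed' line found in a QEMU log (hypothetical type)."""
    line_number: int     # 1-based line number within the log text
    hardware_error: int  # raw value reported by KVM, e.g. 0x80000021

    @property
    def basic_exit_reason(self) -> int:
        # Assumption: Intel VT-x layout, low 16 bits are the basic exit reason.
        return self.hardware_error & 0xFFFF


# Matches lines such as "KVM: entry failed, hardware error 0x80000021".
_ENTRY_FAILED = re.compile(r"KVM: entry failed, hardware error (0x[0-9a-fA-F]+)")

# Two lines copied from the instance-0000019a log quoted above.
SAMPLE = (
    "2020-11-29T12:04:09.254297Z qemu-kvm: -device "
    "cirrus-vga,id=video0,bus=pci.0,addr=0x2: warning: 'cirrus-vga' is "
    "deprecated, please use a different VGA card instead\n"
    "KVM: entry failed, hardware error 0x80000021\n"
)


def find_kvm_entry_failures(log_text: str) -> List[KvmEntryFailure]:
    """Return every KVM entry failure reported in the given log text."""
    failures = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = _ENTRY_FAILED.search(line)
        if match:
            failures.append(KvmEntryFailure(number, int(match.group(1), 16)))
    return failures


failures = find_kvm_entry_failures(SAMPLE)
for failure in failures:
    print(failure.line_number, hex(failure.hardware_error), failure.basic_exit_reason)
```

Running this over the decompressed logs of the other instances on compute-0 would show whether only this guest hit the entry failure after the hard reboot.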
*** This bug has been marked as a duplicate of bug 1890895 ***