Description of problem:

Issue reproduced with this job:
https://rhos-ci-staging-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko-neutron/1/

Some VM instances are running on compute nodes. The compute nodes are rebooted. When the reboots finish successfully, the test waits until all instances are in the SHUTOFF state (via the nova API). Then all instances are started via the nova API and they all reach status ACTIVE. However, some of those instances are paused:

[root@compute-1 ~]# podman exec -it -uroot nova_libvirt virsh list --all
 Id   Name                State
-----------------------------------
 2    instance-00000192   paused
 3    instance-0000018c   running
 4    instance-00000186   running
 5    instance-00000180   paused

The 'openstack server show' command shows this:
OS-EXT-STS:power_state         | Paused

According to the nova-compute logs, these instances were unexpectedly paused immediately after they had been started:

2020-10-22 21:14:43.337 7 DEBUG nova.virt.driver [req-c0716a6a-b6ed-4c25-ac5e-22be9e587f01 - - - - -] Emitting event <LifecycleEvent: 1603401283.228271, 8ab0c045-14b8-4333-bdca-e897086958f5 => Started> emit_event /usr/lib/python3.6/site-packages/nova/virt/driver.py:1708
2020-10-22 21:14:43.337 7 INFO nova.compute.manager [req-c0716a6a-b6ed-4c25-ac5e-22be9e587f01 - - - - -] [instance: 8ab0c045-14b8-4333-bdca-e897086958f5] VM Started (Lifecycle Event)
...
2020-10-22 21:14:50.933 7 DEBUG nova.virt.driver [req-c0716a6a-b6ed-4c25-ac5e-22be9e587f01 - - - - -] Emitting event <LifecycleEvent: 1603401290.9325552, 8ab0c045-14b8-4333-bdca-e897086958f5 => Paused> emit_event /usr/lib/python3.6/site-packages/nova/virt/driver.py:1708
2020-10-22 21:14:50.933 7 INFO nova.compute.manager [req-c0716a6a-b6ed-4c25-ac5e-22be9e587f01 - - - - -] [instance: 8ab0c045-14b8-4333-bdca-e897086958f5] VM Paused (Lifecycle Event)
2020-10-22 21:14:50.987 7 DEBUG nova.compute.manager [req-c0716a6a-b6ed-4c25-ac5e-22be9e587f01 - - - - -] [instance: 8ab0c045-14b8-4333-bdca-e897086958f5] Checking state _get_power_state /usr/lib/python3.6/site-packages/nova/compute/manager.py:1498
2020-10-22 21:14:50.992 7 DEBUG nova.compute.manager [req-c0716a6a-b6ed-4c25-ac5e-22be9e587f01 - - - - -] [instance: 8ab0c045-14b8-4333-bdca-e897086958f5] Synchronizing instance power state after lifecycle event "Paused"; current vm_state: active, current task_state: None, current DB power_state: 1, VM power_state: 3 handle_lifecycle_event /usr/lib/python3.6/site-packages/nova/compute/manager.py:1250
2020-10-22 21:14:51.038 7 INFO nova.compute.manager [req-c0716a6a-b6ed-4c25-ac5e-22be9e587f01 - - - - -] [instance: 8ab0c045-14b8-4333-bdca-e897086958f5] During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (3). Updating power_state in the DB to match the hypervisor.
2020-10-22 21:14:51.233 7 WARNING nova.compute.manager [req-c0716a6a-b6ed-4c25-ac5e-22be9e587f01 - - - - -] [instance: 8ab0c045-14b8-4333-bdca-e897086958f5] Instance is paused unexpectedly. Ignore.

For some reason this only happens with some instances.
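A quick way to confirm the mismatch for a given instance is to compare what the Nova API reports with what libvirt reports on the compute node. A minimal sketch; the UUID and domain name below are the ones from the outputs above (instance-00000180 is the libvirt domain for UUID 8ab0c045-14b8-4333-bdca-e897086958f5):

  # from a host with OpenStack client credentials sourced
  openstack server show 8ab0c045-14b8-4333-bdca-e897086958f5 -f value -c status -c OS-EXT-STS:power_state

  # on the affected compute node
  podman exec -it -uroot nova_libvirt virsh domstate instance-00000180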
Workarounds (both worked):
1) openstack server reboot <server-id>
2) podman exec -it -uroot nova_libvirt virsh reset <domain-id>; podman exec -it -uroot nova_libvirt virsh resume <domain-id>

Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20201021.n.0

[root@compute-1 ~]# podman exec -it -uroot nova_libvirt rpm -qa | grep nova
puppet-nova-15.6.1-1.20200814103355.51a6857.el8ost.noarch
python3-novaclient-15.1.1-0.20200629073413.79959ab.el8ost.noarch
openstack-nova-common-20.4.1-1.20200914172612.el8ost.noarch
openstack-nova-compute-20.4.1-1.20200914172612.el8ost.noarch
openstack-nova-migration-20.4.1-1.20200914172612.el8ost.noarch
python3-nova-20.4.1-1.20200914172612.el8ost.noarch

How reproducible:
Some previous tobiko jobs reproduced the issue too:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko-neutron/38/
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko-neutron/37/

Steps to Reproduce:
1. Create the workload (instances).
2. Reboot the compute nodes.
3. openstack server list (check that the status of the instances is SHUTOFF).
4. openstack server start <vm-ids>
5. All instances become ACTIVE, but some of them are paused (a rough sketch of steps 3-5 follows below).
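For completeness, steps 3-5 can be scripted roughly as follows. This is a hedged sketch, not the exact tobiko test; it assumes admin credentials are sourced and that all listed servers belong to the test:

  # 3. after the compute reboots, repeat until every instance reports SHUTOFF
  openstack server list -f value -c ID -c Status

  # 4. start them all again
  for id in $(openstack server list -f value -c ID); do
      openstack server start "$id"
  done

  # 5. all servers go ACTIVE, but also check the hypervisor power state
  for id in $(openstack server list -f value -c ID); do
      echo "$id $(openstack server show "$id" -f value -c OS-EXT-STS:power_state)"
  done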
Created attachment 1723748 [details]
nova compute logs

Instance 8ab0c045-14b8-4333-bdca-e897086958f5 is paused unexpectedly.
Instance fd8003ba-c528-44c7-bfa5-a167856a2aab is started successfully.
Can you share the QEMU log for the instance from the underlying compute node, /var/log/libvirt/qemu/instance-00000180.log?
Apologies, I didn't think to check the job artifacts for the logs first. The following is seen in the QEMU log:

https://rhos-ci-staging-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko-neutron/1/artifact/compute-1.tar.gz
compute-1/var/log/libvirt/qemu/instance-00000180.log

2020-10-22 21:14:42.680+0000: starting up libvirt version: 6.0.0, package: 25.4.module+el8.2.1+8060+c0c58169 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-09-11-18:58:56, ), qemu version: 4.2.0qemu-kvm-4.2.0-29.module+el8.2.1+7990+27f1e480.4, kernel: 4.18.0-193.28.1.el8_2.x86_64, hostname: compute-1.redhat.local
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
HOME=/var/lib/libvirt/qemu/domain-5-instance-00000180 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-5-instance-00000180/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-5-instance-00000180/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-5-instance-00000180/.config \
QEMU_AUDIO_DRV=none \
/usr/libexec/qemu-kvm \
-name guest=instance-00000180,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-5-instance-00000180/master-key.aes \
-machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \
-cpu Broadwell,vme=on,ss=on,vmx=on,f16c=on,rdrand=on,hypervisor=on,arat=on,tsc-adjust=on,umip=on,arch-capabilities=on,xsaveopt=on,pdpe1gb=on,abm=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,rtm=on,hle=on \
-m 128 \
-overcommit mem-lock=off \
-smp 1,sockets=1,dies=1,cores=1,threads=1 \
-uuid 8ab0c045-14b8-4333-bdca-e897086958f5 \
-smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=20.4.1-1.20200914172612.el8ost,serial=8ab0c045-14b8-4333-bdca-e897086958f5,uuid=8ab0c045-14b8-4333-bdca-e897086958f5,family=Virtual Machine' \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=37,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-boot strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-blockdev '{"driver":"file","filename":"/var/lib/nova/instances/_base/89f7a9e95a22b48fb8452ae2471efa5ae525e746","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/nova/instances/8ab0c045-14b8-4333-bdca-e897086958f5/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=39,id=hostnet0,vhost=on,vhostfd=40 \
-device virtio-net-pci,rx_queue_size=512,host_mtu=1442,netdev=hostnet0,id=net0,mac=fa:16:3e:06:f6:c7,bus=pci.0,addr=0x3 \
-add-fd set=3,fd=42 \
-chardev pty,id=charserial0,logfile=/dev/fdset/3,logappend=on \
-device isa-serial,chardev=charserial0,id=serial0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-vnc 172.17.1.68:4 \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/4 (label charserial0)
2020-10-22T21:14:42.785875Z qemu-kvm: -device cirrus-vga,id=video0,bus=pci.0,addr=0x2: warning: 'cirrus-vga' is deprecated, please use a different VGA card instead
KVM: entry failed, hardware error 0x80000021

If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.

RAX=ffffffff90298110 RBX=0000000000000000 RCX=0000000000000001 RDX=ffff9767c762b4c0
RSI=ffffffff90e03de0 RDI=0000000000000000 RBP=ffffffff90e03e10 RSP=ffffffff90e03e10
R8 =000000018c0dccc1 R9 =0000000000000000 R10=ffffa4528003fd78 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff90298522 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
CS =0000 0000000000000000 00000000 00a09b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 00000000 00c09300 DPL=0 DS   [-WA]
DS =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
FS =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
GS =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
LDT=0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
TR =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
GDT=     0000000000000000 00000000
IDT=     0000000000000000 00000000
CR0=80050033 CR2=00007f734aba0ae8 CR3=000000000286e002 CR4=00360ef0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000fffe0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=00 00 55 48 89 e5 e9 07 00 00 00 0f 00 2d f2 26 57 00 fb f4 <5d> c3 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53
https://bugzilla.kernel.org/show_bug.cgi?id=198991 maybe related?
Moving to RHEL 8 AV qemu-kvm for review of the trace in c#3. As outlined in c#0, the use case here is guests failing to start after a host reboot. Let me know if I can provide any more OpenStack-specific context.
Thanks, Lee. Adding some information on the libvirt/kvm packages installed on the compute node:

[root@compute-1 ~]# podman exec -it -uroot nova_libvirt rpm -qa | grep "libvirt\|qemu\|kvm"
libvirt-daemon-kvm-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-libs-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
qemu-kvm-block-rbd-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
libvirt-daemon-driver-nwfilter-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-storage-disk-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
qemu-kvm-block-curl-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
libvirt-daemon-driver-nodedev-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-storage-iscsi-direct-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-client-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
qemu-kvm-core-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
ipxe-roms-qemu-20181214-5.git133f4c47.el8.noarch
qemu-kvm-block-gluster-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
libvirt-daemon-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-secret-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-storage-gluster-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-storage-scsi-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-interface-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
qemu-img-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
qemu-kvm-block-ssh-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
libvirt-daemon-driver-qemu-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-storage-iscsi-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-storage-mpath-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-network-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
qemu-kvm-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
libvirt-daemon-config-nwfilter-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
qemu-kvm-block-iscsi-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
libvirt-daemon-driver-storage-logical-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-storage-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-bash-completion-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
qemu-kvm-common-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
libvirt-daemon-driver-storage-core-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-storage-rbd-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
I completely forgot to also note that the `compute` host here is nested at L1, with the guests running at L2. The following tarball has logs from the L0 RHEL 8 host:

https://rhos-ci-staging-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko-neutron/1/artifact/hypervisor.tar.gz

@eolivare, you might want to provide a full sosreport of this host if you can. I also note that the underlying host doesn't appear to be using RHEL AV?
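Since the compute node is itself an L1 guest, it may also be worth confirming that nested VMX is enabled end to end before digging further. A minimal check (Intel hosts only; these commands are not taken from the job logs):

  # on the L0 host: nested virtualization must be enabled (expect Y or 1)
  cat /sys/module/kvm_intel/parameters/nested

  # inside the L1 compute node: the vmx flag must be exposed (expect a non-zero count)
  grep -c vmx /proc/cpuinfo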
Please find the sosreport of the bare-metal server at the following link:
http://file.mad.redhat.com/eolivare/BZ1890895/sosreport-panther23-BZ1890895-2020-10-23-dfuycbv.tar.xz
If this is easily reproducible, please set the kvm_intel.dump_invalid_vmcs=1 module option in the L1 compute host and see if there is a dump in the dmesg log right after L2 fails. Thanks!
(In reply to Paolo Bonzini from comment #10)
> If this is easily reproducible, please set the kvm_intel.dump_invalid_vmcs=1
> module option in the L1 compute host and see if there is a dump in the dmesg
> log right after L2 fails. Thanks!

To enable it [NB: the below works only when /dev/kvm is not in use; i.e. VMs must be shut down]:

  $> sudo rmmod kvm-intel
  $> sudo modprobe kvm-intel dump_invalid_vmcs=y
  $> cat /sys/module/kvm_intel/parameters/dump_invalid_vmcs
  Y
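If the option needs to survive a reboot of the L1 compute (reboots are part of the reproduction scenario), one persistent variant is a modprobe.d drop-in. A sketch, with a hypothetical file name:

  $> echo "options kvm_intel dump_invalid_vmcs=1" | sudo tee /etc/modprobe.d/kvm-debug.conf
  $> # then reload kvm_intel as above, or simply reboot the node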
I cannot start my guest with -m 128. Are there any special settings in the guest?
Re-enabling the needinfo on eolivare to answer Paolo's question in Comment 10 about how easily this is reproducible, and to find out whether the setup suggested by Paolo and further elaborated by Kashyap in Comment 13 (to get more details in dmesg) is what was provided in comment 15. It's just not clear.

Also, if comment 4 is any indication, this is perhaps a kernel/KVM bug and not a qemu-kvm/general bug. A solution would thus be in RHEL, not RHEL-AV.
(In reply to John Ferlan from comment #17)
> Re-enabling the needinfo on eolivare to answer Paolo's question in Comment
> 10 about how easily this is reproducible, and to find out whether the setup
> suggested by Paolo and further elaborated by Kashyap in Comment 13 (to get
> more details in dmesg) is what was provided in comment 15. It's just not
> clear.

What is provided in Comment 15 corresponds to Paolo's request, and the commands suggested by Kashyap to enable kvm_intel.dump_invalid_vmcs were used during the test (more specifically, before the test).

Regarding how easy it is to reproduce, the issue is easily reproduced on an OSP13 environment. Check "Steps to Reproduce" in the Description.

> Also, if comment 4 is any indication, this is perhaps a kernel/KVM bug and
> not a qemu-kvm/general bug. A solution would thus be in RHEL, not RHEL-AV.

Regarding this, I can only comment that the error referred to in Comment 4 looks similar to the error I found in the libvirt logs:

KVM: entry failed, hardware error 0x80000021

That error can be found here:
http://file.mad.redhat.com/eolivare/BZ1890895/logs-instance-000001c7.tgz
Paolo - any thoughts related to the traces provided? The guest instance log from libvirt seems to consistently generate the following:

char device redirected to /dev/pts/2 (label charserial0)
2020-10-23T13:30:56.654362Z qemu-kvm: -device cirrus-vga,id=video0,bus=pci.0,addr=0x2: warning: 'cirrus-vga' is deprecated, please use a different VGA card instead
KVM: entry failed, hardware error 0x80000021

If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.

RAX=ffffffff8fe98110 RBX=0000000000000000 RCX=0000000000000001 RDX=ffff9a8d4762b4c0
RSI=ffffffff90a03de0 RDI=0000000000000000 RBP=ffffffff90a03e10 RSP=ffffffff90a03e10
R8 =0000000012e2ae29 R9 =0000000000000000 R10=ffffc0138005fcd8 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff8fe98522 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
CS =0000 0000000000000000 00000000 00a09b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 00000000 00c09300 DPL=0 DS   [-WA]
DS =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
FS =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
GS =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
LDT=0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
TR =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
GDT=     0000000000000000 00000000
IDT=     0000000000000000 00000000
CR0=80050033 CR2=00007fa1d4f89de0 CR3=00000000031a8002 CR4=00360ef0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000fffe0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=00 00 55 48 89 e5 e9 07 00 00 00 0f 00 2d f2 26 57 00 fb f4 <5d> c3 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53
2020-10-23T13:49:02.175587Z qemu-kvm: terminating on signal 15 from pid 3211 (/usr/sbin/libvirtd)
2020-10-23 13:49:02.376+0000: shutting down, reason=destroyed
2020-10-23 13:49:02.918+0000: starting up libvirt version: 6.0.0, package: 25.4.module+el8.2.1+8060+c0c58169 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-09-11-18:58:56, ), qemu version: 4.2.0qemu-kvm-4.2.0-29.module+el8.2.1+7990+27f1e480.4, kernel: 4.18.0-193.28.1.el8_2.x86_64, hostname: compute-0.redhat.local
...

The trace in dmesg.txt has:

[  133.307938] SELinux: mount invalid.
Same superblock, different security settings for (dev mqueue, type mqueue)
[  134.004514] *** Guest State ***
[  134.006828] CR0: actual=0x0000000080050033, shadow=0x0000000080050033, gh_mask=fffffffffffffff7
[  134.008130] CR4: actual=0x00000000000626f0, shadow=0x00000000000606b0, gh_mask=ffffffffffffe871
[  134.009613] CR3 = 0x000000000640a001
[  134.010408] RSP = 0xffffffff86003e18  RIP = 0xffffffff84b8a394
[  134.011644] RFLAGS=0x00000002  DR7 = 0x0000000000000400
[  134.012648] Sysenter RSP=fffffe0000002200 CS:RIP=0010:ffffffff85601700
[  134.013811] CS:   sel=0x0000, attr=0x0a09b, limit=0x00000000, base=0x0000000000000000
[  134.015146] DS:   sel=0x0000, attr=0x00000, limit=0x00000000, base=0x0000000000000000
[  134.016471] SS:   sel=0x0000, attr=0x1c000, limit=0x00000000, base=0x0000000000000000
[  134.017848] ES:   sel=0x0000, attr=0x00000, limit=0x00000000, base=0x0000000000000000
[  134.019041] FS:   sel=0x0000, attr=0x00000, limit=0x00000000, base=0x0000000000000000
[  134.020327] GS:   sel=0x0000, attr=0x00000, limit=0x00000000, base=0x0000000000000000
[  134.021869] GDTR: limit=0x00000000, base=0x0000000000000000
[  134.023217] LDTR: sel=0x0000, attr=0x00000, limit=0x00000000, base=0x0000000000000000
[  134.024584] IDTR: limit=0x00000000, base=0x0000000000000000
[  134.025892] TR:   sel=0x0000, attr=0x00000, limit=0x00000000, base=0x0000000000000000
[  134.027349] EFER = 0x0000000000000500  PAT = 0x0407050600070106
[  134.028373] DebugCtl = 0x0000000000000000  DebugExceptions = 0x0000000000000000
[  134.029516] Interruptibility = 00000000  ActivityState = 00000000
[  134.030638] InterruptStatus = 0000
[  134.031395] *** Host State ***
[  134.031968] RIP = 0xffffffffc0909dc0  RSP = 0xffffa7098427bca0
[  134.032913] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
[  134.033905] FSBase=00007f29624a6700 GSBase=ffff89691fbc0000 TRBase=fffffe0000130000
[  134.035168] GDTBase=fffffe000012e000 IDTBase=fffffe0000000000
[  134.036105] CR0=0000000080050033 CR3=00000007c894c006 CR4=0000000000362ee0
[  134.037318] Sysenter RSP=fffffe000012f200 CS:RIP=0010:ffffffff8d801770
[  134.038325] EFER = 0x0000000000000d01  PAT = 0x0407050600070106
[  134.039206] *** Control State ***
[  134.039813] PinBased=000000ff CPUBased=b5a06dfa SecondaryExec=000233eb
[  134.041194] EntryControls=000053ff ExitControls=000befff
[  134.042064] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
[  134.043197] VMEntry: intr_info=00000b0e errcode=00000000 ilen=00000000
[  134.044397] VMExit: intr_info=00000000 errcode=00000000 ilen=00000000
[  134.045563]         reason=80000021 qualification=0000000000000000
[  134.046624] IDTVectoring: info=00000000 errcode=00000000
[  134.047496] TSC Offset = 0xffff722d436bc143
[  134.048215] SVI|RVI = 00|00 TPR Threshold = 0x00
[  134.048999] APIC-access addr = 0x00000007c5870000 virt-APIC addr = 0x00000007c865f000
[  134.050290] PostedIntrVec = 0xf2
[  134.050999] EPT pointer = 0x00000007c3d9d05e
[  134.052016] Virtual processor ID = 0x0001
*** Bug 1903134 has been marked as a duplicate of this bug. ***
I re-installed the host with RHEL 8.2.0; I still cannot reproduce this bug.

Host:
kernel: 4.18.0-193.el8.x86_64
qemu-kvm: qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64
CPU model name: Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz

L1 guest:
kernel: 4.18.0-193.39.1.el8_2.x86_64
qemu-kvm: qemu-kvm-4.2.0-29.module+el8.2.1+8442+7a3eadf7.5.x86_64
libvirt: libvirt-daemon-6.0.0-25.5.module+el8.2.1+8680+ea98947b.x86_64

Set up 4 L2 guests on L1 and set them to autostart (see the sketch below). After rebooting the L1 guest, all L2 guests started and are in the running state.
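For reference, marking the L2 guests for autostart is just a per-domain libvirt flag; a minimal sketch inside the L1 guest, with a placeholder guest name:

  virsh autostart cirros1          # repeat for each L2 guest
  virsh list --all --autostart     # lists only the domains marked for autostart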
(In reply to eolivare from comment #0)
> Steps to Reproduce:
> 1. create workload (instances)

Hi Eduardo,

Could you please specify what kind of 'workload' you mean here? Thanks.

> 2. reboot compute nodes
> 3. openstack server list (check instances status if SHUTOFF)
> 4. openstack server start <vm-ids>
> 5. all instances are ACTIVE, but some of them are paused
Right now there isn't even a reproducer. :(
I finally reproduced this bug. My environment is:

Host:
CPU model: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Kernel: 4.18.0-193.6.3.el8_2.x86_64
qemu-kvm: qemu-kvm-core-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64

L1 guest:
Kernel: 4.18.0-193.14.3.el8_2.x86_64
qemu-kvm: qemu-kvm-core-4.2.0-29.module+el8.2.1+8442+7a3eadf7.5.x86_64

Reboot L1 with the command:
sudo chmod o+w /proc/sysrq-trigger; sudo echo b > /proc/sysrq-trigger

After the L1 guest reboots, check the L2 status:

# virsh list --all
 Id   Name      State
-------------------------
 1    cirros3   running
 2    cirros1   paused
 3    cirros2   running
 4    cirros4   running

This is not easy to reproduce in my environment.
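Since it only triggers occasionally, one hedged way to hunt for it is to loop the hard reset of L1 and check the L2 domains after each boot. A rough sketch run from the L0 host; the address, timing, and guest autostart setup are assumptions, not taken from the environment above:

  L1=192.0.2.10                                      # hypothetical IP of the L1 compute guest
  while true; do
      ssh root@$L1 'echo b > /proc/sysrq-trigger'    # hard reboot, same sysrq trigger as above
      sleep 300                                      # give L1 and the autostarted L2 guests time to come up
      ssh root@$L1 'virsh list --all | grep -q paused' && break
  done
  ssh root@$L1 'virsh list --all'                    # at least one L2 guest should now show "paused"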
Qinghua, can you reproduce it just by starting and stopping VMs many times?
No, I could not reproduce it just by starting and stopping VMs many times in a loop.
Yes, done.
Closing due to lack of reproducer.