Bug 1890895 - VM instances paused after compute reboots
Summary: VM instances paused after compute reboots
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Paolo Bonzini
QA Contact: Qinghua Cheng
URL:
Whiteboard:
Duplicates: 1903134
Depends On:
Blocks: 1948358
 
Reported: 2020-10-23 08:46 UTC by Eduardo Olivares
Modified: 2021-08-25 01:47 UTC (History)
CC List: 23 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-23 12:11:04 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments
nova compute logs (787.08 KB, application/gzip)
2020-10-23 08:48 UTC, Eduardo Olivares

Description Eduardo Olivares 2020-10-23 08:46:18 UTC
Description of problem:
Issue reproduced with this job: https://rhos-ci-staging-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko-neutron/1/


Some VM instances are running on the compute nodes. The compute nodes are rebooted. When the reboots complete successfully, the test waits (via the nova API) until the state of all instances is SHUTOFF. Then all instances are started via the nova API, and they all reach status ACTIVE.

However, some of those instances are paused:
[root@compute-1 ~]# podman exec -it -uroot nova_libvirt virsh list --all
 Id   Name                State
-----------------------------------
 2    instance-00000192   paused
 3    instance-0000018c   running
 4    instance-00000186   running
 5    instance-00000180   paused

The 'openstack server show' command shows this:
OS-EXT-STS:power_state              | Paused
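
For reference, the power state of a single server can also be queried directly using the client's column filters (a minimal sketch; <server-id> is a placeholder):

    $> openstack server show <server-id> -f value -c OS-EXT-STS:power_state
    Paused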


According to nova-compute logs, these instances were unexpectedly paused immediately after they had been started:
2020-10-22 21:14:43.337 7 DEBUG nova.virt.driver [req-c0716a6a-b6ed-4c25-ac5e-22be9e587f01 - - - - -] Emitting event <LifecycleEvent: 1603401283.228271, 8ab0c045-14b8-4333-bdca-e897086958f5 => Started> emit_event /usr/lib/python3.6/site-packages/nova/virt/driver.py:1708
2020-10-22 21:14:43.337 7 INFO nova.compute.manager [req-c0716a6a-b6ed-4c25-ac5e-22be9e587f01 - - - - -] [instance: 8ab0c045-14b8-4333-bdca-e897086958f5] VM Started (Lifecycle Event)
...
2020-10-22 21:14:50.933 7 DEBUG nova.virt.driver [req-c0716a6a-b6ed-4c25-ac5e-22be9e587f01 - - - - -] Emitting event <LifecycleEvent: 1603401290.9325552, 8ab0c045-14b8-4333-bdca-e897086958f5 => Paused> emit_event /usr/lib/python3.6/site-packages/nova/virt/driver.py:1708
2020-10-22 21:14:50.933 7 INFO nova.compute.manager [req-c0716a6a-b6ed-4c25-ac5e-22be9e587f01 - - - - -] [instance: 8ab0c045-14b8-4333-bdca-e897086958f5] VM Paused (Lifecycle Event)
2020-10-22 21:14:50.987 7 DEBUG nova.compute.manager [req-c0716a6a-b6ed-4c25-ac5e-22be9e587f01 - - - - -] [instance: 8ab0c045-14b8-4333-bdca-e897086958f5] Checking state _get_power_state /usr/lib/python3.6/site-packages/nova/compute/manager.py:1498
2020-10-22 21:14:50.992 7 DEBUG nova.compute.manager [req-c0716a6a-b6ed-4c25-ac5e-22be9e587f01 - - - - -] [instance: 8ab0c045-14b8-4333-bdca-e897086958f5] Synchronizing instance power state after lifecycle event "Paused"; current vm_state: active, current task_state: None, current DB power_state: 1, VM power_state: 3 handle_lifecycle_event /usr/lib/python3.6/site-packages/nova/compute/manager.py:1250
2020-10-22 21:14:51.038 7 INFO nova.compute.manager [req-c0716a6a-b6ed-4c25-ac5e-22be9e587f01 - - - - -] [instance: 8ab0c045-14b8-4333-bdca-e897086958f5] During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (3). Updating power_state in the DB to match the hypervisor.
2020-10-22 21:14:51.233 7 WARNING nova.compute.manager [req-c0716a6a-b6ed-4c25-ac5e-22be9e587f01 - - - - -] [instance: 8ab0c045-14b8-4333-bdca-e897086958f5] Instance is paused unexpectedly. Ignore.


For some reason this only happens with some instances.


Workarounds (both worked):
1) openstack server reboot <server-id>
2) podman exec -it -uroot nova_libvirt virsh reset <domain-id>; podman exec -it -uroot nova_libvirt virsh resume <domain-id>
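
For convenience, a rough sketch that applies workaround 2 to every paused domain on a compute node (assuming the nova_libvirt container name used above):

    for dom in $(podman exec -uroot nova_libvirt virsh list --state-paused --name); do
        podman exec -uroot nova_libvirt virsh reset "$dom"      # hard reset the paused domain
        podman exec -uroot nova_libvirt virsh resume "$dom"     # then resume it
    done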



Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20201021.n.0

[root@compute-1 ~]# podman exec -it -uroot nova_libvirt rpm -qa | grep nova
puppet-nova-15.6.1-1.20200814103355.51a6857.el8ost.noarch
python3-novaclient-15.1.1-0.20200629073413.79959ab.el8ost.noarch
openstack-nova-common-20.4.1-1.20200914172612.el8ost.noarch
openstack-nova-compute-20.4.1-1.20200914172612.el8ost.noarch
openstack-nova-migration-20.4.1-1.20200914172612.el8ost.noarch
python3-nova-20.4.1-1.20200914172612.el8ost.noarch


How reproducible:
Some previous tobiko jobs reproduced the issue too:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko-neutron/38/
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko-neutron/37/


Steps to Reproduce:
1. create workload (instances)
2. reboot compute nodes
3. openstack server list (check that the status of all instances is SHUTOFF)
4. openstack server start <vm-ids>
5. all instances reach status ACTIVE, but some of them are actually paused (see the CLI sketch below)
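
The same steps expressed as a rough CLI sketch (server IDs are placeholders):

    $> openstack server list -f value -c ID -c Status        # wait until every instance reports SHUTOFF
    $> openstack server start <vm-id>                        # repeat for each instance
    $> openstack server list -f value -c ID -c Status        # every instance eventually reports ACTIVE
    $> podman exec -it -uroot nova_libvirt virsh list --all  # on the compute node, some domains show "paused"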

Comment 1 Eduardo Olivares 2020-10-23 08:48:45 UTC
Created attachment 1723748 [details]
nova compute logs

instance 8ab0c045-14b8-4333-bdca-e897086958f5 is paused unexpectedly.
instance fd8003ba-c528-44c7-bfa5-a167856a2aab is started successfully.

Comment 2 Lee Yarwood 2020-10-23 09:30:22 UTC
Can you share the QEMU log for the instance from the underlying compute node, /var/log/libvirt/qemu/instance-00000180?

Comment 3 Lee Yarwood 2020-10-23 09:35:43 UTC
Apologies, I didn't think to check the jobs for the logs first; the following is seen in the QEMU log:

https://rhos-ci-staging-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko-neutron/1/artifact/compute-1.tar.gz

compute-1/var/log/libvirt/qemu/instance-00000180.log

2020-10-22 21:14:42.680+0000: starting up libvirt version: 6.0.0, package: 25.4.module+el8.2.1+8060+c0c58169 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-09-11-18:58:56, ), qemu version: 4.2.0qemu-kvm-4.2.0-29.module+el8.2.1+7990+27f1e480.4, kernel: 4.18.0-193.28.1.el8_2.x86_64, hostname: compute-1.redhat.local
LC_ALL=C \                                                                      
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \             
HOME=/var/lib/libvirt/qemu/domain-5-instance-00000180 \                         
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-5-instance-00000180/.local/share \   
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-5-instance-00000180/.cache \        
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-5-instance-00000180/.config \         
QEMU_AUDIO_DRV=none \                                                           
/usr/libexec/qemu-kvm \                                                         
-name guest=instance-00000180,debug-threads=on \                                
-S \                                                                            
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-5-instance-00000180/master-key.aes \
-machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \            
-cpu Broadwell,vme=on,ss=on,vmx=on,f16c=on,rdrand=on,hypervisor=on,arat=on,tsc-adjust=on,umip=on,arch-capabilities=on,xsaveopt=on,pdpe1gb=on,abm=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,rtm=on,hle=on \
-m 128 \                                                                        
-overcommit mem-lock=off \                                                      
-smp 1,sockets=1,dies=1,cores=1,threads=1 \                                     
-uuid 8ab0c045-14b8-4333-bdca-e897086958f5 \                                    
-smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=20.4.1-1.20200914172612.el8ost,serial=8ab0c045-14b8-4333-bdca-e897086958f5,uuid=8ab0c045-14b8-4333-bdca-e897086958f5,family=Virtual Machine' \
-no-user-config \                                                               
-nodefaults \                                                                   
-chardev socket,id=charmonitor,fd=37,server,nowait \                            
-mon chardev=charmonitor,id=monitor,mode=control \                              
-rtc base=utc,driftfix=slew \                                                   
-global kvm-pit.lost_tick_policy=delay \                                        
-no-hpet \                                                                      
-no-shutdown \                                                                  
-boot strict=on \                                                               
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \                          
-blockdev '{"driver":"file","filename":"/var/lib/nova/instances/_base/89f7a9e95a22b48fb8452ae2471efa5ae525e746","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/nova/instances/8ab0c045-14b8-4333-bdca-e897086958f5/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=39,id=hostnet0,vhost=on,vhostfd=40 \                             
-device virtio-net-pci,rx_queue_size=512,host_mtu=1442,netdev=hostnet0,id=net0,mac=fa:16:3e:06:f6:c7,bus=pci.0,addr=0x3 \
-add-fd set=3,fd=42 \                                                           
-chardev pty,id=charserial0,logfile=/dev/fdset/3,logappend=on \                 
-device isa-serial,chardev=charserial0,id=serial0 \                             
-device usb-tablet,id=input0,bus=usb.0,port=1 \                                 
-vnc 172.17.1.68:4 \                                                            
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \                               
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \                     
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on                                                               
char device redirected to /dev/pts/4 (label charserial0)                        
2020-10-22T21:14:42.785875Z qemu-kvm: -device cirrus-vga,id=video0,bus=pci.0,addr=0x2: warning: 'cirrus-vga' is deprecated, please use a different VGA card instead
KVM: entry failed, hardware error 0x80000021                                    
                                                                                
If you're running a guest on an Intel machine without unrestricted mode         
support, the failure can be most likely due to the guest entering an invalid       
state for Intel VT. For example, the guest maybe running in big real mode          
which is not supported on less recent Intel processors.                         
                                                                                
RAX=ffffffff90298110 RBX=0000000000000000 RCX=0000000000000001 RDX=ffff9767c762b4c0
RSI=ffffffff90e03de0 RDI=0000000000000000 RBP=ffffffff90e03e10 RSP=ffffffff90e03e10
R8 =000000018c0dccc1 R9 =0000000000000000 R10=ffffa4528003fd78 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff90298522 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0        
ES =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>                      
CS =0000 0000000000000000 00000000 00a09b00 DPL=0 CS64 [-RA]                    
SS =0000 0000000000000000 00000000 00c09300 DPL=0 DS   [-WA]                    
DS =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>                      
FS =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>                      
GS =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>                      
LDT=0000 0000000000000000 00000000 00008000 DPL=0 <hiword>                      
TR =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>                      
GDT=     0000000000000000 00000000                                              
IDT=     0000000000000000 00000000                                              
CR0=80050033 CR2=00007f734aba0ae8 CR3=000000000286e002 CR4=00360ef0             
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000fffe0ff0 DR7=0000000000000400                                       
EFER=0000000000000d01                                                           
Code=00 00 55 48 89 e5 e9 07 00 00 00 0f 00 2d f2 26 57 00 fb f4 <5d> c3 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53

Comment 4 Lee Yarwood 2020-10-23 09:37:15 UTC
https://bugzilla.kernel.org/show_bug.cgi?id=198991 may be related?

Comment 5 Lee Yarwood 2020-10-23 09:45:00 UTC
Moving to RHEL 8 AV qemu-kvm for review of the trace in c#3; as outlined in c#0, the use case here is guests failing to start after a host reboot.

Let me know if I can provide any more OpenStack-specific context.

Comment 6 Eduardo Olivares 2020-10-23 09:47:43 UTC
Thanks, Lee.

Adding some information on the libvirt/kvm packages installed on the compute node:

[root@compute-1 ~]# podman exec -it -uroot nova_libvirt rpm -qa | grep "libvirt\|qemu\|kvm"
libvirt-daemon-kvm-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-libs-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
qemu-kvm-block-rbd-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
libvirt-daemon-driver-nwfilter-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-storage-disk-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
qemu-kvm-block-curl-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
libvirt-daemon-driver-nodedev-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-storage-iscsi-direct-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-client-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
qemu-kvm-core-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
ipxe-roms-qemu-20181214-5.git133f4c47.el8.noarch
qemu-kvm-block-gluster-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
libvirt-daemon-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-secret-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-storage-gluster-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-storage-scsi-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-interface-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
qemu-img-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
qemu-kvm-block-ssh-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
libvirt-daemon-driver-qemu-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-storage-iscsi-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-storage-mpath-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-network-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
qemu-kvm-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
libvirt-daemon-config-nwfilter-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
qemu-kvm-block-iscsi-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
libvirt-daemon-driver-storage-logical-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-storage-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-bash-completion-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
qemu-kvm-common-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64
libvirt-daemon-driver-storage-core-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
libvirt-daemon-driver-storage-rbd-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64

Comment 7 Lee Yarwood 2020-10-23 10:02:44 UTC
I completely forgot to also note that the `compute` host here is nested at L1, with the guests running on L2.

The following tarball has logs from the L0 RHEL 8 host:

https://rhos-ci-staging-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko-neutron/1/artifact/hypervisor.tar.gz

@eolivare you might want to provide a full sosreport of this host if you can. 

I also note that the underlying host doesn't appear to be using RHEL AV?

Comment 8 Eduardo Olivares 2020-10-23 10:21:20 UTC
Please find the sosreport of the BM server at the following link:
http://file.mad.redhat.com/eolivare/BZ1890895/sosreport-panther23-BZ1890895-2020-10-23-dfuycbv.tar.xz

Comment 10 Paolo Bonzini 2020-10-23 10:41:10 UTC
If this is easily reproducible, please set the kvm_intel.dump_invalid_vmcs=1 module option on the L1 compute host and see if there is a dump in the dmesg log right after L2 fails.  Thanks!

Comment 13 Kashyap Chamarthy 2020-10-23 13:20:21 UTC
(In reply to Paolo Bonzini from comment #10)
> If this is easily reproducible, please set the kvm_intel.dump_invalid_vmcs=1
> module option in the L1 compute host and see if there is a dump in the dmesg
> log right after L2 fails.  Thanks!

To enable it [NB: the below works only when /dev/kvm is not in use, i.e. VMs must be shut down]:

    $> sudo rmmod kvm-intel
    $> sudo modprobe kvm-intel dump_invalid_vmcs=y
    $> cat /sys/module/kvm_intel/parameters/dump_invalid_vmcs
    Y
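
If the option should survive a reboot of the L1 host, one way (a sketch; the drop-in file name is arbitrary) is a modprobe.d drop-in:

    $> echo "options kvm_intel dump_invalid_vmcs=1" | sudo tee /etc/modprobe.d/kvm-dump-invalid-vmcs.conf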

Comment 16 Qinghua Cheng 2020-10-27 02:26:04 UTC
I cannot start my guest with -m 128. Are there any special settings in the guest?

Comment 17 John Ferlan 2020-11-12 21:55:04 UTC
Re-enabling the needinfo on eolivare to answer Paolo's question in Comment 10 about how easily this is reproducible, and to find out whether the setup suggested by Paolo and further elaborated by Kashyap in Comment 13 (to get more details in dmesg) is what was provided in comment 15. It's just not clear.

Also, if comment 4 is any indication, this is perhaps a kernel/KVM bug and not a qemu-kvm/general bug. A solution would thus be in RHEL, not RHEL-AV.

Comment 18 Eduardo Olivares 2020-11-13 12:45:03 UTC
(In reply to John Ferlan from comment #17)
> re-enabling the needinfo on eolivare to answer Paolo's question in Comment
> 10 related to how easily reproducible and then to find out if the setup
> suggested by Paolo and further elaborated by Kashyap in Comment 13 to get
> more details in dmesg were what was provided in comment 15. It's just not
> clear.
> 
What is provided in Comment 15 corresponds to Paolo's request, and the commands suggested by Kashyap to enable kvm_intel.dump_invalid_vmcs were used during the test (more specifically, before the test).
Regarding how easy it is to reproduce: the issue is easily reproduced on an OSP13 environment. Check "Steps to Reproduce" in the Description.


> Also, if comment 4 is any indication, this perhaps is a kernel/KVM bug and
> not a qemu-kvm/general bug. A solution would thus be in RHEL, not RHEL-AV.
Regarding this, I can only comment that the error referred to in Comment 4 looks similar to the error I found in the libvirt logs:
KVM: entry failed, hardware error 0x80000021
That error can be found here: http://file.mad.redhat.com/eolivare/BZ1890895/logs-instance-000001c7.tgz

Comment 19 John Ferlan 2020-11-17 12:58:27 UTC
Paolo - any thoughts related to the traces provided ... the guest instance log from libvirt seems to consistently generate the following:

char device redirected to /dev/pts/2 (label charserial0)
2020-10-23T13:30:56.654362Z qemu-kvm: -device cirrus-vga,id=video0,bus=pci.0,addr=0x2: warning: 'cirrus-vga' is deprecated, please use a different VGA card instead
KVM: entry failed, hardware error 0x80000021

If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.

RAX=ffffffff8fe98110 RBX=0000000000000000 RCX=0000000000000001 RDX=ffff9a8d4762b4c0
RSI=ffffffff90a03de0 RDI=0000000000000000 RBP=ffffffff90a03e10 RSP=ffffffff90a03e10
R8 =0000000012e2ae29 R9 =0000000000000000 R10=ffffc0138005fcd8 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff8fe98522 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
CS =0000 0000000000000000 00000000 00a09b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 00000000 00c09300 DPL=0 DS   [-WA]
DS =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
FS =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
GS =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
LDT=0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
TR =0000 0000000000000000 00000000 00008000 DPL=0 <hiword>
GDT=     0000000000000000 00000000
IDT=     0000000000000000 00000000
CR0=80050033 CR2=00007fa1d4f89de0 CR3=00000000031a8002 CR4=00360ef0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000fffe0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=00 00 55 48 89 e5 e9 07 00 00 00 0f 00 2d f2 26 57 00 fb f4 <5d> c3 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53
2020-10-23T13:49:02.175587Z qemu-kvm: terminating on signal 15 from pid 3211 (/usr/sbin/libvirtd)
2020-10-23 13:49:02.376+0000: shutting down, reason=destroyed
2020-10-23 13:49:02.918+0000: starting up libvirt version: 6.0.0, package: 25.4.module+el8.2.1+8060+c0c58169 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-09-11-18:58:56, ), qemu version: 4.2.0qemu-kvm-4.2.0-29.module+el8.2.1+7990+27f1e480.4, kernel: 4.18.0-193.28.1.el8_2.x86_64, hostname: compute-0.redhat.local


... The trace in dmesg.txt has:

[  133.307938] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[  134.004514] *** Guest State ***
[  134.006828] CR0: actual=0x0000000080050033, shadow=0x0000000080050033, gh_mask=fffffffffffffff7
[  134.008130] CR4: actual=0x00000000000626f0, shadow=0x00000000000606b0, gh_mask=ffffffffffffe871
[  134.009613] CR3 = 0x000000000640a001
[  134.010408] RSP = 0xffffffff86003e18  RIP = 0xffffffff84b8a394
[  134.011644] RFLAGS=0x00000002         DR7 = 0x0000000000000400
[  134.012648] Sysenter RSP=fffffe0000002200 CS:RIP=0010:ffffffff85601700
[  134.013811] CS:   sel=0x0000, attr=0x0a09b, limit=0x00000000, base=0x0000000000000000
[  134.015146] DS:   sel=0x0000, attr=0x00000, limit=0x00000000, base=0x0000000000000000
[  134.016471] SS:   sel=0x0000, attr=0x1c000, limit=0x00000000, base=0x0000000000000000
[  134.017848] ES:   sel=0x0000, attr=0x00000, limit=0x00000000, base=0x0000000000000000
[  134.019041] FS:   sel=0x0000, attr=0x00000, limit=0x00000000, base=0x0000000000000000
[  134.020327] GS:   sel=0x0000, attr=0x00000, limit=0x00000000, base=0x0000000000000000
[  134.021869] GDTR:                           limit=0x00000000, base=0x0000000000000000
[  134.023217] LDTR: sel=0x0000, attr=0x00000, limit=0x00000000, base=0x0000000000000000
[  134.024584] IDTR:                           limit=0x00000000, base=0x0000000000000000
[  134.025892] TR:   sel=0x0000, attr=0x00000, limit=0x00000000, base=0x0000000000000000
[  134.027349] EFER =     0x0000000000000500  PAT = 0x0407050600070106
[  134.028373] DebugCtl = 0x0000000000000000  DebugExceptions = 0x0000000000000000
[  134.029516] Interruptibility = 00000000  ActivityState = 00000000
[  134.030638] InterruptStatus = 0000
[  134.031395] *** Host State ***
[  134.031968] RIP = 0xffffffffc0909dc0  RSP = 0xffffa7098427bca0
[  134.032913] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
[  134.033905] FSBase=00007f29624a6700 GSBase=ffff89691fbc0000 TRBase=fffffe0000130000
[  134.035168] GDTBase=fffffe000012e000 IDTBase=fffffe0000000000
[  134.036105] CR0=0000000080050033 CR3=00000007c894c006 CR4=0000000000362ee0
[  134.037318] Sysenter RSP=fffffe000012f200 CS:RIP=0010:ffffffff8d801770
[  134.038325] EFER = 0x0000000000000d01  PAT = 0x0407050600070106
[  134.039206] *** Control State ***
[  134.039813] PinBased=000000ff CPUBased=b5a06dfa SecondaryExec=000233eb
[  134.041194] EntryControls=000053ff ExitControls=000befff
[  134.042064] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
[  134.043197] VMEntry: intr_info=00000b0e errcode=00000000 ilen=00000000
[  134.044397] VMExit: intr_info=00000000 errcode=00000000 ilen=00000000
[  134.045563]         reason=80000021 qualification=0000000000000000
[  134.046624] IDTVectoring: info=00000000 errcode=00000000
[  134.047496] TSC Offset = 0xffff722d436bc143
[  134.048215] SVI|RVI = 00|00 TPR Threshold = 0x00
[  134.048999] APIC-access addr = 0x00000007c5870000 virt-APIC addr = 0x00000007c865f000
[  134.050290] PostedIntrVec = 0xf2
[  134.050999] EPT pointer = 0x00000007c3d9d05e
[  134.052016] Virtual processor ID = 0x0001

Comment 22 Jakub Libosvar 2020-12-01 14:14:58 UTC
*** Bug 1903134 has been marked as a duplicate of this bug. ***

Comment 30 Qinghua Cheng 2021-01-07 03:27:04 UTC
I re-installed the host with RHEL 8.2.0 and still cannot reproduce this bug.

Host:
kernel: 4.18.0-193.el8.x86_64
qemu-kvm: qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64

CPU Model name: Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz

L1 guest:

kernel: 4.18.0-193.39.1.el8_2.x86_64
qemu-kvm: qemu-kvm-4.2.0-29.module+el8.2.1+8442+7a3eadf7.5.x86_64
libvirt: libvirt-daemon-6.0.0-25.5.module+el8.2.1+8680+ea98947b.x86_64

Set up 4 L2 guests on L1, and set them to autostart.

After rebooting the L1 guest, all L2 guests started and are in the running state.
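
For reference, per-domain autostart can be set and checked with virsh (a sketch; domain names are examples):

    $> virsh autostart cirros1                       # mark the L2 guest to start with libvirtd
    $> virsh dominfo cirros1 | grep -i autostart     # should report "Autostart: enable"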

Comment 31 CongLi 2021-01-07 03:49:42 UTC
(In reply to eolivare from comment #0)
> Steps to Reproduce:
> 1. create workload (instances)

Hi Eduardo,

Could you please specify what kind of 'workload' you mean here?

Thanks. 

> 2. reboot compute nodes
> 3. openstack server list (check instances status if SHUTOFF)
> 4. openstack server start <vm-ids>
> 5. all instances are ACTIVE, but some of them are paused

Comment 33 Paolo Bonzini 2021-01-08 17:41:48 UTC
Right now there isn't even a reproducer. :(

Comment 47 Qinghua Cheng 2021-01-25 10:25:18 UTC
I finally reproduced this bug.

My environment is: 

Host:

CPU model: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz

Kernel: 4.18.0-193.6.3.el8_2.x86_64
qemu-kvm: qemu-kvm-core-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64

L1 guest: 

Kernel: 4.18.0-193.14.3.el8_2.x86_64
qemu-kvm: qemu-kvm-core-4.2.0-29.module+el8.2.1+8442+7a3eadf7.5.x86_64

Reboot L1 with the command: sudo chmod o+w /proc/sysrq-trigger; sudo echo b > /proc/sysrq-trigger

After rebooting the L1 guest, check the L2 status:

# virsh list --all
 Id   Name      State
-------------------------
 1    cirros3   running
 2    cirros1   paused
 3    cirros2   running
 4    cirros4   running

This is not easy to reproduce in my environment.

Comment 48 Paolo Bonzini 2021-01-25 10:32:00 UTC
Qinghua, can you reproduce it just by starting and stopping VMs many times?

Comment 49 Qinghua Cheng 2021-01-25 10:34:47 UTC
No, I could not reproduce it just by starting and stopping VMs many times in a loop.
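
For context, such an attempt would be a loop roughly like the following (a sketch; the domain name and iteration count are examples):

    for i in $(seq 1 50); do
        virsh start cirros1        # start the L2 guest
        sleep 30
        virsh shutdown cirros1     # ACPI shutdown, then wait before the next iteration
        sleep 30
    done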

Comment 52 Paolo Bonzini 2021-03-25 15:16:54 UTC
Yes, done.

Comment 53 Paolo Bonzini 2021-07-23 12:11:04 UTC
Closing due to lack of reproducer.

