Description of problem:

- Windows instances crash from time to time
- From the Windows OS side, a power-loss event is logged for every one of the Windows crashes
- The server is certified for RHOSP 16 and RHEL 8: https://catalog.redhat.com/hardware/servers/detail/2941651
- Between March and June 2021 the environment had an OSP upgrade from OSP 13 to 16, plus a firmware/BIOS update on the computes
- The issue affects Windows 2012 and 2016 guests; most reports are on 2016

Version-Release number of selected component (if applicable):

[redhat-release] Red Hat Enterprise Linux release 8.2 (Ootpa)
[rhosp-release] Red Hat OpenStack Platform release 16.1.3 GA (Train)

qemu-kvm and libvirtd are containerized, and this host is using:

  "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/rhosp16/openstack-nova-libvirt/images/16.1.3-7.1614767861",

which corresponds to:

  https://catalog.redhat.com/software/containers/rhosp-rhel8/openstack-nova-libvirt/5de6c2ddbed8bd164a0c1bbf?tag=16.1.3-7.1614767861&push_date=1615227731000&container-tabs=packages

So the qemu-kvm and libvirt versions are:

- qemu-kvm-4.2.0-29.module+el8.2.1+9791+7d72b149.6.x86_64
- libvirt-daemon-6.0.0-25.5.module+el8.2.1+8680+ea98947b.x86_64

How reproducible:

No known trigger; the crashes happen randomly.

Additional info:

gdb -e /usr/libexec/qemu-kvm -c ./core.qemu-kvm.107.5c1789ec0e454a61a539f2120495cc87.340182.1644135667000000
BFD: warning: /home/fdelorey/Desktop/./core.qemu-kvm.107.5c1789ec0e454a61a539f2120495cc87.340182.1644135667000000 is truncated: expected core file size >= 34764460032, found: 2147483648
[MANY LINES DELETED]
Failed to read a valid object file image from memory.
Core was generated by `/usr/libexec/qemu-kvm -name guest=instance-00005xxx,debug-threads=on -S -object'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007ff84c48470f in ?? ()
[Current thread is 1 (LWP 340200)]
(gdb) bt
#0  0x00007ff84c48470f in ?? ()
Backtrace stopped: Cannot access memory at address 0x7ff83f7fd110
I tried to reproduce the issue with the following steps, but it did not reproduce:

1. Installed one RHEL 8.2.1 host with the same packages and win2016 guest as used by the customer:

# rpm -q qemu-kvm
qemu-kvm-4.2.0-29.module+el8.2.1+9791+7d72b149.6.x86_64
# uname -r
4.18.0-193.29.1.el8_2.x86_64

2. Booted the win2016 guest for a whole night:

/usr/libexec/qemu-kvm \
  -name guest=instance-00005f1d,debug-threads=on \
  -S \
  -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=on \
  -cpu SandyBridge-IBRS,vme=on,f16c=on,rdrand=on,hypervisor=on,arat=on,xsaveopt=on,abm=on \
  -m 32768 \
  -overcommit mem-lock=off \
  -smp 6,sockets=6,dies=1,cores=1,threads=1 \
  -uuid 773b0d15-a735-43bb-82cb-fdefcad28ea3 \
  -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=20.4.1-1.20200917173450.el8ost,serial=773b0d15-a735-43bb-82cb-fdefcad28ea3,uuid=773b0d15-a735-43bb-82cb-fdefcad28ea3,family=Virtual Machine' \
  -no-user-config \
  -nodefaults \
  -rtc base=utc,driftfix=slew \
  -global kvm-pit.lost_tick_policy=delay \
  -no-hpet \
  -no-shutdown \
  -boot strict=on \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -blockdev '{"driver":"file","filename":"/home/win2016-64-virtio.raw","aio":"native","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
  -blockdev '{"node-name":"libvirt-2-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=libvirt-2-format,id=virtio-disk0,bootindex=1,write-cache=on,serial=9b2c8658-4b54-409d-93eb-f934a8540ceb \
  -netdev tap,id=hostnet0,vhost=on \
  -device virtio-net-pci,rx_queue_size=512,host_mtu=9000,netdev=hostnet0,id=net0,mac=00:16:3e:09:55:49,bus=pci.0,addr=0x3 \
  -device usb-tablet,id=input0,bus=usb.0,port=1 \
  -vnc :0 \
  -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
  -sandbox on \
  -msg timestamp=on -monitor stdio

3. Changed the guest time backwards/forwards, and after that rebooted the guest.
I ran all our Windows timer device cases in the test environment from comment 5, with the same qemu command line as the customer, and still can't reproduce the issue.

Summary: Finished=25, PASS=25

Here is the related code (the mc146818 RTC periodic timer update, where the assert matches the SIGABRT in the core). Does anyone have any suggestions on how to reproduce the bug?

    /*
     * if the periodic timer's update is due to period re-configuration,
     * we should count the clock since last interrupt.
     */
    if (old_period && period_change) {
        int64_t last_periodic_clock, next_periodic_clock;

        next_periodic_clock = muldiv64(s->next_periodic_time,
                                       RTC_CLOCK_RATE, NANOSECONDS_PER_SECOND);
        last_periodic_clock = next_periodic_clock - old_period;
        lost_clock = cur_clock - last_periodic_clock;
        assert(lost_clock >= 0);
    }