Description of problem:
I'm running the Avocado-vt migrate.with_reboot tests and they are failing with the current qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61.aarch64.

Version-Release number of selected component (if applicable):
* qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61.aarch64
* updated latest-RHEL-8 (2019-10-11)

How reproducible:
Always

Steps to Reproduce:
1. Install Avocado-vt: https://avocado-vt.readthedocs.io/en/latest/GetStartedGuide.html
2. Install the guest (replace "$URL" with the path to a RHEL tree):
   avocado --show all run --vt-guest-os RHEL.7.devel unattended_install.url.extra_cdrom_ks.default_install.aio_threads --vt-extra-params url=$URL kernel_params="ks=cdrom nicdelay=60 console=ttyS0 ip=dhcp inst.sshd inst.repo=$URL"
3. Run:
   avocado --show all run migrate..tcp..with_reboot --vt-guest-os RHEL.7.devel.aarch64..arm64-pci

Actual results:
...
root: (monitor avocado-vt-vm1.qmpmonitor1) normal-bytes: 681680896
root: (monitor avocado-vt-vm1.qmpmonitor1) normal: 166426
root: [qemu output] qemu-kvm: /builddir/build/BUILD/qemu-4.1.0/accel/kvm/kvm-all.c:673: kvm_physical_log_clear: Assertion `mem->dirty_bmap' failed.
root: Waiting for migration to complete (22.162233 secs)
root: (monitor avocado-vt-vm1.qmpmonitor1) Sending command 'query-migrate'
root: Send command: {'execute': 'query-migrate', 'id': '6zD5hZai'}
root: [qemu output] qemu-kvm: Unknown combination of migration flags: 0
root: [qemu output] qemu-kvm: error while loading state section id 1(ram)
root: [qemu output] qemu-kvm: load of migration failed: Invalid argument
root: [qemu output] /tmp/aexpect_ConQCoWr/aexpect-d0dlcbbj.sh: line 1: 19204 Aborted (core dumped) MALLOC_PERTURB_=1 /usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -drive file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/images/avocado/avocado-vt/images/rhel7devel-aarch64_AAVMF_VARS.fd,if=pflash,format=raw,unit=1 -machine virt,gic-version=host -nodefaults -device virtio-gpu-pci,bus=pcie.0,addr=0x1 -m 1024 -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 -cpu 'host' -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_xg6u7f37/monitor-qmpmonitor1-20191111-132900-W1vPnKqN,server,nowait -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_xg6u7f37/monitor-catch_monitor-20191111-132900-W1vPnKqN,server,nowait -mon chardev=qmp_id_catch_monitor,mode=control -serial unix:'/var/tmp/avocado_xg6u7f37/serial-serial0-20191111-132900-W1vPnKqN',server,nowait -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-3,addr=0x0 -blockdev node-name=file_image1,driver=file,filename=/var/lib/libvirt/images/avocado/avocado-vt/images/rhel7devel-aarch64.qcow2 -blockdev node-name=drive_image1,driver=qcow2,file=file_image1 -device scsi-hd,id=image1,drive=drive_image1 -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 -device virtio-net-pci,mac=9a:0a:d7:d9:28:3c,rombar=0,id=idxSnogZ,netdev=idKpXcaA,bus=pcie.0-root-port-4,addr=0x0 -netdev tap,id=idKpXcaA,fd=20 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :0 -rtc base=utc,clock=host -enable-kvm -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 -device pcie-root-port,id=pcie_extra_root_port_1,slot=6,chassis=6,addr=0x6,bus=pcie.0

Expected results:
It should PASS.

Additional info:
What the test does:
1. starts the VM
2. ssh to the VM and runs tcpdump in it
3. runs "reboot"
4. immediately starts migrating via tcp
5. keeps migrating until the VM boots

When I add a 2 s (or longer) wait between steps 3 and 4, the test passes.
Also, I forgot to mention that it works well with an upstream qemu version (19bef037fe096b17edda103fd513ce6451da23c8), although I'm using a custom "configure" that might affect things. In any case, I can try finding a matching/broken upstream version and bisect it, if that helps.
Hi,
Can you try with:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=24872526
This looks like it might be the same as bug 1772774.
Slightly newer version: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=24874372
Lukas,

It's reproducible on upstream as well. I'm using the software combination below, and the issue can be reproduced reliably:

[root@amd-seattle-07 ~]# uname -a
Linux amd-seattle-07.khw1.lab.eng.bos.redhat.com 4.18.0-149.el8.aarch64 #1 SMP Wed Nov 13 04:01:26 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
[root@amd-seattle-07 ~]# mkdir /home/gshan; cd /home/gshan
[root@amd-seattle-07 ~]# git clone https://git.qemu.org/git/qemu.git qemu.main; cd qemu.main
[root@amd-seattle-07 ~]# ./configure --target-list=aarch64-softmmu --enable-debug --enable-werror --enable-kvm --disable-xen --disable-vnc
[root@amd-seattle-07 ~]# make -j 8

On amd-seattle-09, the qemu code is cloned and built the same way.

Destination (amd-seattle-09, note the "-incoming" option):

[root@amd-seattle-09 ~]# /home/gshan/qemu.main/aarch64-softmmu/qemu-system-aarch64 \
  -machine virt,accel=kvm,gic-version=2 \
  -cpu host -m 4096 -smp 8,sockets=8,cores=1,threads=1 \
  -monitor none -serial mon:stdio -nographic -s \
  -drive file=/home/gshan/images/vm00.img,format=qcow2,cache=none,if=none,id=disk0 \
  -device virtio-blk-pci,drive=disk0 \
  -netdev tap,script=/home/gshan/scripts/if-up.sh,id=hostnet0 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:83:83:7c \
  -bios /usr/share/AAVMF/AAVMF_CODE.fd \
  -incoming tcp:0:4444

Source (amd-seattle-07):

[root@amd-seattle-07 ~]# /home/gshan/qemu.main/aarch64-softmmu/qemu-system-aarch64 \
  -machine virt,accel=kvm,gic-version=2 \
  -cpu host -m 4096 -smp 8,sockets=8,cores=1,threads=1 \
  -monitor none -serial mon:stdio -nographic -s \
  -drive file=/home/gshan/images/vm00.img,format=qcow2,cache=none,if=none,id=disk0 \
  -device virtio-blk-pci,drive=disk0 \
  -netdev tap,script=/home/gshan/scripts/if-up.sh,id=hostnet0 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:83:83:70 \
  -bios /usr/share/AAVMF/AAVMF_CODE.fd
:
[ OK ] Started Crash recovery kernel arming.
Red Hat Enterprise Linux 8.2 Beta (Ootpa)
Kernel 4.18.0-151.el8.aarch64 on an aarch64

Activate the web console with: systemctl enable --now cockpit.socket

localhost login: root
Password:
Last login: Mon Nov 25 00:33:04 on ttyAMA0
[root@localhost ~]# reboot
:
[ OK ] Stopped System Security Services Daemon.
[ OK ] Stopped Network Manager.
[ OK ] Stopped target Network (Pre).
       Stopping firewalld - dynamic firewall daemon...
QEMU 4.1.92 monitor - type 'help' for more information
(qemu) migrate tcp:10.16.200.171:4444
:
[   34.963763] reboot: Restarting system
UEFI firmware starting.
:
EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable, no randomness supplied
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd072]
[    0.000000] Linux version 4.18.0-151.el8.aarch64 (mockbuild.eng.bos.redhat.com) (gcc version 8.3.1 20190507 (Red Hat 8.3.1-4) (GCC)) #1 SMP Fri Nov 15 19:47:25 UTC 2019
[    0.000000] efi: Getting EFI parameters from FDT:
[    0.000000] efi: EFI v2.70 by EDK II
:
[ OK ] Started VDO volume services.
[ OK ] Started Login Service.
[ OK ] Started firewalld - dynamic firewall daemon.
[ OK ] Reached target Network (Pre).
       Starting Network Manager...
[ OK ] Started Network Manager.
[ OK ] Reached target Network.
       Starting Enable periodic update of entitlement certificates....
       Starting Permit User Sessions...
qemu-system-aarch64: /home/gshan/qemu.main/accel/kvm/kvm-all.c:650: kvm_log_clear_one_slot: Assertion `mem->dirty_bmap' failed.
Aborted (core dumped)
(In reply to Dr. David Alan Gilbert from comment #5)
> Slightly newer version:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=24874372

Thank you, David, that seems to address the issue (it hasn't reproduced in 20 iterations; previously it always failed).
Reran the test case from comment #6 with upstream qemu plus the patch below from David. No crash found after 5 iterations.

https://www.mail-archive.com/qemu-devel@nongnu.org/msg660559.html
("kvm: Reallocate dirty_bmap when we change a slot")