Bug 1771032 - Migration while rebooting results in qemu crash on aarch64 ( Assertion `mem->dirty_bmap' failed )
Keywords:
Status: CLOSED DUPLICATE of bug 1772774
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.1
Hardware: aarch64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: Dr. David Alan Gilbert
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 1772774
Blocks: 1677408
Reported: 2019-11-11 18:32 UTC by Lukáš Doktor
Modified: 2020-01-02 12:08 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-02 12:08:27 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Description Lukáš Doktor 2019-11-11 18:32:52 UTC
Description of problem:
I'm running Avocado-vt migrate.with_reboot tests and they are failing with the current qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61.aarch64.

Version-Release number of selected component (if applicable):
* qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61.aarch64
* updated latest-RHEL-8 (2019-10-11)

How reproducible:
Always

Steps to Reproduce:
1. Install Avocado-vt https://avocado-vt.readthedocs.io/en/latest/GetStartedGuide.html
2. Install guest (replace "$URL" with path to RHEL os): avocado --show all run --vt-guest-os RHEL.7.devel unattended_install.url.extra_cdrom_ks.default_install.aio_threads --vt-extra-params url=$URL kernel_params="ks=cdrom nicdelay=60  console=ttyS0 ip=dhcp inst.sshd inst.repo=$URL"
3. Run "avocado --show all run migrate..tcp..with_reboot --vt-guest-os RHEL.7.devel.aarch64..arm64-pci"

Actual results:
...
root: (monitor avocado-vt-vm1.qmpmonitor1)        normal-bytes: 681680896
root: (monitor avocado-vt-vm1.qmpmonitor1)        normal: 166426
root: [qemu output] qemu-kvm: /builddir/build/BUILD/qemu-4.1.0/accel/kvm/kvm-all.c:673: kvm_physical_log_clear: Assertion `mem->dirty_bmap' failed.
root: Waiting for migration to complete (22.162233 secs)
root: (monitor avocado-vt-vm1.qmpmonitor1) Sending command 'query-migrate'
root: Send command: {'execute': 'query-migrate', 'id': '6zD5hZai'}
root: [qemu output] qemu-kvm: Unknown combination of migration flags: 0
root: [qemu output] qemu-kvm: error while loading state section id 1(ram)
root: [qemu output] qemu-kvm: load of migration failed: Invalid argument
root: [qemu output] /tmp/aexpect_ConQCoWr/aexpect-d0dlcbbj.sh: line 1: 19204 Aborted                 (core dumped) MALLOC_PERTURB_=1 /usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -drive file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/images/avocado/avocado-vt/images/rhel7devel-aarch64_AAVMF_VARS.fd,if=pflash,format=raw,unit=1 -machine virt,gic-version=host -nodefaults -device virtio-gpu-pci,bus=pcie.0,addr=0x1 -m 1024 -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 -cpu 'host' -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_xg6u7f37/monitor-qmpmonitor1-20191111-132900-W1vPnKqN,server,nowait -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_xg6u7f37/monitor-catch_monitor-20191111-132900-W1vPnKqN,server,nowait -mon chardev=qmp_id_catch_monitor,mode=control -serial unix:'/var/tmp/avocado_xg6u7f37/serial-serial0-20191111-132900-W1vPnKqN',server,nowait -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-3,addr=0x0 -blockdev node-name=file_image1,driver=file,filename=/var/lib/libvirt/images/avocado/avocado-vt/images/rhel7devel-aarch64.qcow2 -blockdev node-name=drive_image1,driver=qcow2,file=file_image1 -device scsi-hd,id=image1,drive=drive_image1 -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 -device virtio-net-pci,mac=9a:0a:d7:d9:28:3c,rombar=0,id=idxSnogZ,netdev=idKpXcaA,bus=pcie.0-root-port-4,addr=0x0 -netdev tap,id=idKpXcaA,fd=20 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :0 -rtc base=utc,clock=host -enable-kvm -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 -device 
pcie-root-port,id=pcie_extra_root_port_1,slot=6,chassis=6,addr=0x6,bus=pcie.0


Expected results:
The test should PASS; qemu should not crash.

Additional info:
* What the test does is this:

1. starts VM
2. ssh to the VM and runs tcpdump in it
3. runs "reboot"
4. immediately starts migrating via tcp
5. keeps migrating until the VM boots

When I add a wait of 2 s (or more) between steps 3 and 4, the test passes.
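For anyone who wants to reproduce this without Avocado-vt, steps 3 to 5 can be sketched roughly as below. This is only a minimal sketch, not the Avocado-vt implementation: run_in_guest and send_qmp are hypothetical callables (for example an ssh session and a QMP socket wrapper) that the caller has to supply, and the loop handles a single migration rather than migrating repeatedly until the guest has booted. The QMP commands "migrate" (with a "uri" argument) and "query-migrate" are the same ones that appear in the log above.

```python
import time


def qmp_command(name, **arguments):
    """Build a QMP command dict; 'arguments' is omitted when empty."""
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = arguments
    return cmd


def migrate_during_reboot(run_in_guest, send_qmp, uri, delay=0.0,
                          poll_interval=1.0, sleep=time.sleep):
    """Reboot the guest, start a migration, and poll until it ends.

    run_in_guest and send_qmp are hypothetical helpers provided by the
    caller. Returns the final migration status string.
    """
    run_in_guest("reboot")                      # step 3
    if delay:
        sleep(delay)                            # optional wait before step 4
    send_qmp(qmp_command("migrate", uri=uri))   # step 4
    while True:                                 # step 5, simplified to one migration
        reply = send_qmp(qmp_command("query-migrate"))
        status = reply.get("return", {}).get("status")
        if status in ("completed", "failed", "cancelled"):
            return status
        sleep(poll_interval)
```

Calling it with delay=0 corresponds to the failing case, and delay=2 to the workaround mentioned above.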

Comment 2 Lukáš Doktor 2019-11-18 07:47:57 UTC
I also forgot to mention that it works well with the upstream qemu version (19bef037fe096b17edda103fd513ce6451da23c8), although I'm using a custom "configure" that might affect things. In any case, I can try to find a matching broken upstream version and bisect it, if that helps.

Comment 4 Dr. David Alan Gilbert 2019-11-21 15:23:51 UTC
Hi,
  Can you try with:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=24872526

This looks like it might be the same as bug 1772774.

Comment 5 Dr. David Alan Gilbert 2019-11-21 16:54:04 UTC
Slightly newer version:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=24874372

Comment 6 Guowen Shan 2019-11-25 05:42:39 UTC
Lukas, it's reproducible on upstream as well. I'm using the software combination below, and the issue reproduces reliably:

[root@amd-seattle-07 ~]# uname -a
Linux amd-seattle-07.khw1.lab.eng.bos.redhat.com 4.18.0-149.el8.aarch64 #1 SMP Wed Nov 13 04:01:26 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux


[root@amd-seattle-07 ~]# mkdir /home/gshan; cd /home/gshan
[root@amd-seattle-07 ~]# git clone https://git.qemu.org/git/qemu.git qemu.main; cd qemu.main
[root@amd-seattle-07 ~]# ./configure --target-list=aarch64-softmmu --enable-debug --enable-werror --enable-kvm --disable-xen --disable-vnc
[root@amd-seattle-07 ~]# make -j 8

On amd-seattle-09, the qemu code is cloned and built in the same way.

[root@amd-seattle-09 ~]# /home/gshan/qemu.main/aarch64-softmmu/qemu-system-aarch64 \
-machine virt,accel=kvm,gic-version=2                                              \
-cpu host -m 4096 -smp 8,sockets=8,cores=1,threads=1                               \
-monitor none -serial mon:stdio -nographic -s                                      \
-drive file=/home/gshan/images/vm00.img,format=qcow2,cache=none,if=none,id=disk0   \
-device virtio-blk-pci,drive=disk0                                                 \
-netdev tap,script=/home/gshan/scripts/if-up.sh,id=hostnet0                        \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:83:83:7c               \
-bios /usr/share/AAVMF/AAVMF_CODE.fd                                               \
-incoming tcp:0:4444

[root@amd-seattle-07 ~]# /home/gshan/qemu.main/aarch64-softmmu/qemu-system-aarch64 \
-machine virt,accel=kvm,gic-version=2                                              \
-cpu host -m 4096 -smp 8,sockets=8,cores=1,threads=1                               \
-monitor none -serial mon:stdio -nographic -s                                      \
-drive file=/home/gshan/images/vm00.img,format=qcow2,cache=none,if=none,id=disk0   \
-device virtio-blk-pci,drive=disk0                                                 \
-netdev tap,script=/home/gshan/scripts/if-up.sh,id=hostnet0                        \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:83:83:70               \
-bios /usr/share/AAVMF/AAVMF_CODE.fd
   :
[  OK  ] Started Crash recovery kernel arming.

Red Hat Enterprise Linux 8.2 Beta (Ootpa)
Kernel 4.18.0-151.el8.aarch64 on an aarch64

Activate the web console with: systemctl enable --now cockpit.socket

localhost login: root
Password: 
Last login: Mon Nov 25 00:33:04 on ttyAMA0
[root@localhost ~]# reboot
    :
[  OK  ] Stopped System Security Services Daemon.
[  OK  ] Stopped Network Manager.
[  OK  ] Stopped target Network (Pre).
         Stopping firewalld - dynamic firewall daemon...
QEMU 4.1.92 monitor - type 'help' for more information
(qemu) migrate tcp:10.16.200.171:4444
   :
[   34.963763] reboot: Restarting system
UEFI firmware starting.
   :
EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable, no randomness supplied
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd072]
[    0.000000] Linux version 4.18.0-151.el8.aarch64 (mockbuild.eng.bos.redhat.com) (gcc version 8.3.1 20190507 (Red Hat 8.3.1-4) (GCC)) #1 SMP Fri Nov 15 19:47:25 UTC 2019
[    0.000000] efi: Getting EFI parameters from FDT:
[    0.000000] efi: EFI v2.70 by EDK II
   :
[  OK  ] Started VDO volume services.
[  OK  ] Started Login Service.
[  OK  ] Started firewalld - dynamic firewall daemon.
[  OK  ] Reached target Network (Pre).
         Starting Network Manager...
[  OK  ] Started Network Manager.
[  OK  ] Reached target Network.
         Starting Enable periodic update of entitlement certificates....
         Starting Permit User Sessions...
qemu-system-aarch64: /home/gshan/qemu.main/accel/kvm/kvm-all.c:650: kvm_log_clear_one_slot: Assertion `mem->dirty_bmap' failed.
Aborted (core dumped)

Comment 7 Lukáš Doktor 2019-11-25 16:47:21 UTC
(In reply to Dr. David Alan Gilbert from comment #5)
> Slightly newer version:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=24874372

Thank you, David, it seems to address the issue (it hasn't reproduced in 20 iterations; previously it always failed).

Comment 8 Guowen Shan 2019-11-25 22:15:07 UTC
I reran the test case from comment #6 with upstream qemu plus the patch below from David. No crash was seen in 5 iterations.

https://www.mail-archive.com/qemu-devel@nongnu.org/msg660559.html
("kvm: Reallocate dirty_bmap when we change a slot")
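To illustrate what the patch title suggests, here is a toy Python model. It is an assumption inferred only from the assertion message and the patch subject, not QEMU's actual code: Slot, start_dirty_logging, change_slot and log_clear are hypothetical names. The idea is that a slot changed while dirty logging is active ends up without a bitmap, so the later clear trips the assertion unless the bitmap is reallocated at the time of the change.

```python
class Slot:
    """Toy stand-in for a KVM memory slot (not QEMU's real structure)."""

    def __init__(self, pages, log_dirty):
        self.pages = pages
        self.log_dirty = log_dirty
        self.dirty_bmap = None


def start_dirty_logging(slot):
    """Enable logging and allocate one byte per page as a toy bitmap."""
    slot.log_dirty = True
    slot.dirty_bmap = bytearray(slot.pages)


def change_slot(old, pages, reallocate_bmap):
    """Replace a slot, as might happen while the guest reboots.

    With reallocate_bmap=False the new slot keeps the logging flag but
    has no bitmap; with True a fresh bitmap is allocated with it.
    """
    new = Slot(pages, old.log_dirty)
    if reallocate_bmap and new.log_dirty:
        new.dirty_bmap = bytearray(pages)
    return new


def log_clear(slot):
    """Mimic the failing check: clearing requires a bitmap to exist."""
    assert slot.dirty_bmap is not None, "mem->dirty_bmap"
    for i in range(len(slot.dirty_bmap)):
        slot.dirty_bmap[i] = 0
```

In this model, log_clear(change_slot(s, 8, reallocate_bmap=False)) raises AssertionError for a slot s with logging enabled, while the reallocate_bmap=True variant clears cleanly.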

