Bug 1981207
| Summary: | Qemu core dump when migrate with cdrom in use | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Li Xiaohui <xiaohli> |
| Component: | qemu-kvm | Assignee: | Guowen Shan <gshan> |
| qemu-kvm sub component: | Live Migration | QA Contact: | Li Xiaohui <xiaohli> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | medium | ||
| Priority: | low | CC: | chayang, drjones, gshan, jinzhao, lcapitulino, qzhang, virt-maint, zhenyzha |
| Version: | 9.0 | Keywords: | Triaged |
| Target Milestone: | beta | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | aarch64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-09-20 00:52:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1924294 | ||
Test passed after running with a rhel9 vm on the rhel9.0.0 x86 platform (kernel-5.13.0-0.rc4.33.el9.x86_64 & qemu-img-6.0.0-7.el9.x86_64):

--> Running case(1/1): RHEL7-10063-[migration] Migrate guest while cdrom in use (13 min 16 sec) --- PASS

So this product issue is not hit on x86.

Xiaohui, so this issue is arm64-specific according to comment #1?

(In reply to Guowen Shan from comment #2)
> Xiaohui, so this issue is arm64-specific according to comment #1?

Hi, I haven't tried on ppc since I have no available machines now, but on x86 I didn't hit this issue.

I think it's worth checking whether the latest QEMU build has the same issue:
qemu-img-6.0.0-7.el9.aarch64 # Where the issue was reported
qemu-img-6.0.0-9.el9.aarch64 # last QEMU build
I failed to reproduce the issue with upstream QEMU. What I did:
(1) Create ISO file
dd if=/dev/urandom of=/home/gavin/sandbox/images/cdrom.raw bs=1M count=10240
md5sum /home/gavin/sandbox/images/cdrom.raw
mkisofs -o /home/gavin/sandbox/images/cdrom.iso -max-iso9660-filenames \
-relaxed-filenames -allow-limited-size -D --input-charset iso8859-1 \
/home/gavin/sandbox/images/cdrom.raw
(2) Start source/target VMs with the following parameters
-device virtio-scsi-pci,id=virtio_scsi0 \
-blockdev driver=file,cache.direct=on,read-only=on,cache.no-flush=off,filename=/home/gavin/sandbox/images/cdrom.iso,node-name=cdrom_iso \
-blockdev driver=raw,node-name=cdrom_drive,read-only=on,file=cdrom_iso \
-device scsi-cd,id=cdrom_dev,drive=cdrom_drive
(3) Kick off copying files from CDROM and then start the migration.
The migration finished without error:
mount /dev/sr0 /media
rm -fr /home/gavin/sandbox/data/*
nohup cp -rf /media/* /home/gavin/sandbox/data/ &
(qemu) migrate -d tcp:0:4444
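Since the steps above drive the migration by hand through the HMP monitor, a scripted QMP client can make repeated reproduction attempts less manual. Below is a minimal sketch, assuming QEMU also exposes a QMP socket on localhost port 3333 (as in the reporter's command line, `-qmp tcp:0:3333,server=on,wait=off`); the `qmp_cmd`/`start_migration` helpers and the port are illustrative, not part of the original report.

```python
import json
import socket

QMP_ADDR = ("127.0.0.1", 3333)  # assumption: matches -qmp tcp:0:3333,server=on,wait=off


def qmp_cmd(execute, **arguments):
    """Serialize a QMP command as one newline-terminated JSON line."""
    cmd = {"execute": execute}
    if arguments:
        cmd["arguments"] = arguments
    return json.dumps(cmd) + "\n"


def start_migration(uri="tcp:0:4444"):
    """Negotiate QMP capabilities, then kick off a background migration."""
    with socket.create_connection(QMP_ADDR) as sock:
        f = sock.makefile("rw")
        f.readline()                                   # QMP greeting banner
        f.write(qmp_cmd("qmp_capabilities")); f.flush()
        f.readline()                                   # {"return": {}}
        f.write(qmp_cmd("migrate", uri=uri)); f.flush()
        return json.loads(f.readline())
```

This mirrors the HMP `migrate -d tcp:0:4444` step: the QMP `migrate` command is asynchronous by default, so the call returns while migration is still in flight.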
Hi Gavin,
Tried this bz again on qemu-img-6.0.0-9.el9.aarch64, but the recurrence rate is particularly low: it reproduced in only 3 of 40 runs.
The QEMU core dump info:
[root@ampere-hr350a-04 home]# coredumpctl
TIME PID UID GID SIG COREFILE EXE >
Mon 2021-07-26 04:21:06 EDT 59648 0 0 SIGABRT truncated /usr/libexec/qemu-k>
Mon 2021-07-26 05:22:59 EDT 64226 0 0 SIGABRT truncated /usr/libexec/qemu-k>
Mon 2021-07-26 11:17:25 EDT 80086 0 0 SIGABRT present /usr/libexec/qemu-k>
[root@ampere-hr350a-04 home]# coredumpctl info 80086
PID: 80086 (qemu-kvm)
UID: 0 (root)
GID: 0 (root)
Signal: 6 (ABRT)
Timestamp: Mon 2021-07-26 11:16:52 EDT (11h ago)
Command Line: /usr/libexec/qemu-kvm -name mouse-vm -sandbox on -machine virt,>
Executable: /usr/libexec/qemu-kvm
Control Group: /user.slice/user-0.slice/session-2.scope
Unit: session-2.scope
Slice: user-0.slice
Session: 2
Owner UID: 0 (root)
Boot ID: 7cd0b28d44b04a1081960dc52da7e2f1
Machine ID: d2c8c2a2c98a4a3e84961e3831b4bff9
Hostname: ampere-hr350a-04.khw4.lab.eng.bos.redhat.com
Storage: /var/lib/systemd/coredump/core.qemu-kvm.0.7cd0b28d44b04a1081960>
Disk Size: 2.0G
Message: Process 80086 (qemu-kvm) of user 0 dumped core.
Stack trace of thread 80282:
#0 0x0000ffff88869e28 __pthread_kill_internal (libc.so.6 + 0x8>
#1 0x0000ffff88823b80 raise (libc.so.6 + 0x3db80)
#2 0x0000ffff88810178 abort (libc.so.6 + 0x2a178)
#3 0x0000ffff8881d2c4 __assert_fail_base (libc.so.6 + 0x372c4)
#4 0x0000ffff8881d340 __assert_fail (libc.so.6 + 0x37340)
#5 0x0000aaaadb9af3b0 aio_task_pool_wait_one (qemu-kvm + 0x63f>
#6 0x0000aaaadb9591fc qcow2_co_pwritev_part.lto_priv.0 (qemu-k>
#7 0x0000aaaadb961024 bdrv_driver_pwritev.lto_priv.0 (qemu-kvm>
#8 0x0000aaaadb962304 bdrv_aligned_pwritev.lto_priv.0 (qemu-kv>
#9 0x0000aaaadb963b78 bdrv_co_pwritev_part (qemu-kvm + 0x5f3b7>
#10 0x0000aaaadb9700c0 blk_do_pwritev_part.lto_priv.0 (qemu-kvm>
#11 0x0000aaaadb9703b4 blk_aio_write_entry.lto_priv.0 (qemu-kvm>
#12 0x0000aaaadba41f3c coroutine_trampoline (qemu-kvm + 0x6d1f3>
#13 0x0000ffff88832e00 n/a (libc.so.6 + 0x4ce00)
#14 0x0000ffff88832e00 n/a (libc.so.6 + 0x4ce00)
For the detailed core dump file, please see:
http://kvmqe-tools.qe.lab.eng.nay.redhat.com/logjump.html?target=bos&path=xiaohli/bug/bz_1981207
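When triaging several cores like the ones listed above (two of which are truncated), it can help to check mechanically whether a core matches this bug's signature: an `__assert_fail` abort out of `aio_task_pool_wait_one` in the qcow2 write path. A small sketch that parses `coredumpctl info`-style stack-trace lines; the regex and helper names are illustrative, not an existing tool.

```python
import re

# Matches coredumpctl stack frames like:
#   #5 0x0000aaaadb9af3b0 aio_task_pool_wait_one (qemu-kvm + 0x63f...)
FRAME_RE = re.compile(r"#\d+\s+0x[0-9a-f]+\s+(\S+)")


def frame_functions(trace_text):
    """Return the function names of all stack frames, in order."""
    return [m.group(1) for m in FRAME_RE.finditer(trace_text)]


def matches_bug_signature(trace_text):
    """True if the trace aborts via __assert_fail with aio_task_pool_wait_one on the stack."""
    funcs = frame_functions(trace_text)
    return "__assert_fail" in funcs and "aio_task_pool_wait_one" in funcs
```

For the non-truncated core, `coredumpctl info <PID>` (or `coredumpctl gdb <PID>` for a full backtrace) produces the trace text this expects.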
(In reply to Guowen Shan from comment #5)
> I think it's worth checking whether the latest QEMU build has the same issue:
>
> qemu-img-6.0.0-7.el9.aarch64 # Where the issue was reported
> qemu-img-6.0.0-9.el9.aarch64 # last QEMU build
>
> I failed to reproduce the issue with upstream QEMU. What I did:
>
> (1) Create ISO file
>
> dd if=/dev/urandom of=/home/gavin/sandbox/images/cdrom.raw bs=1M count=10240
> md5sum /home/gavin/sandbox/images/cdrom.raw
> mkisofs -o /home/gavin/sandbox/images/cdrom.iso -max-iso9660-filenames \
>     -relaxed-filenames -allow-limited-size -D --input-charset iso8859-1 \
>     /home/gavin/sandbox/images/cdrom.raw
>
> (2) Start source/target VMs with the following parameters
>
> -device virtio-scsi-pci,id=virtio_scsi0 \
> -blockdev driver=file,cache.direct=on,read-only=on,cache.no-flush=off,filename=/home/gavin/sandbox/images/cdrom.iso,node-name=cdrom_iso \
> -blockdev driver=raw,node-name=cdrom_drive,read-only=on,file=cdrom_iso \
> -device scsi-cd,id=cdrom_dev,drive=cdrom_drive
>
> (3) Kick off copying files from CDROM and then start the migration.
> The migration finished without error:
>
> mount /dev/sr0 /media
> rm -fr /home/gavin/sandbox/data/*
> nohup cp -rf /media/* /home/gavin/sandbox/data/ &
>
> (qemu) migrate -d tcp:0:4444

Did you migrate on the same host? I doubt that is the reason you didn't reproduce the bz, but please also try more times.

Xiaohui, I always try migration on the same machine because I don't have two machines to migrate over the network.

By the way, could you help to try the latest QEMU/kernel image? The LTO compile option was disabled recently; it has been affecting many aspects, including the virtio-block devices.

(In reply to Guowen Shan from comment #8)
> Xiaohui, I always try migration on the same machine because I don't have two
> machines to migrate over the network.
>
> By the way, could you help to try the latest QEMU/kernel image? The LTO compile
> option was disabled recently; it has been affecting many aspects, including
> the virtio-block devices.

OK, I loaned two machines just now. Will update the test results here with the latest qemu-kvm on rhel9.

Tested this bz by repeating the migration 100 times on qemu-kvm-6.1.0-2.el9.aarch64; all runs passed and didn't hit this bz.

Gavin, shall we close this bz as CURRENTRELEASE according to your comment 8?

(In reply to Li Xiaohui from comment #10)
> Tested this bz by repeating the migration 100 times on qemu-kvm-6.1.0-2.el9.aarch64;
> all runs passed and didn't hit this bz.
>
> Gavin, shall we close this bz as CURRENTRELEASE according to your comment 8?

Yes, I agree to close it as CURRENTRELEASE. The LTO compile option sometimes introduces weird problems. Thanks for testing it again; in particular, 100 cycles of migration take a lot of time.

(In reply to Guowen Shan from comment #11)
> Yes, I agree to close it as CURRENTRELEASE. The LTO compile option sometimes
> introduces weird problems. Thanks for testing it again; in particular, 100
> cycles of migration take a lot of time.

It's my pleasure, please go ahead, thanks.
Description of problem:
Qemu core dump when migrate with cdrom in use

Version-Release number of selected component (if applicable):
hosts info: kernel-5.13.0-0.rc7.51.el9.aarch64 & qemu-img-6.0.0-7.el9.aarch64
(the lscpu output of the src and dst hosts is the same)
guest info: kernel-5.13.0-0.rc7.51.el9.aarch64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a 10G iso image on the src host:
[root@host ~]# dd if=/dev/urandom of=/home/ipa/6ObtrK_raw bs=1M count=10240
[root@host ~]# md5sum /home/ipa/6ObtrK_raw
[root@host ~]# mkisofs -o /home/ipa/6ObtrK.iso -max-iso9660-filenames -relaxed-filenames -allow-limited-size -D --input-charset iso8859-1 /home/ipa/6ObtrK_raw
2. Boot a vm on the src host with a cdrom that uses the iso image from step 1, with the qemu cmds [1].
3. Boot a vm on the dst host with the same qemu cmds as the src host, but append "-incoming defer".
4. Copy large files from the cdrom to disk:
[root@guest ~]# mount /dev/sr0 /media
mount: /media: WARNING: source write-protected, mounted read-only.
[root@guest ~]# rm -rf /home/ios_files && mkdir -p /home/ios_files
[root@guest ~]# nohup cp -rf /media/* /home/ios_files/ &
5. Migrate the vm from the src host to the dst host:
(dst qmp):
{"execute": "migrate-incoming", "arguments": {"uri": "tcp:[::]:4000"}, "id": "aHHO6gi9"}
(src qmp):
{"execute": "migrate-set-parameters", "arguments": {"max-bandwidth": 167772160}, "id": "5xPuVkla"}
{"return": {}, "id": "5xPuVkla"}
{"execute": "migrate-set-parameters", "arguments": {"downtime-limit": 30000}, "id": "w3JDN3hu"}
{"return": {}, "id": "w3JDN3hu"}
{"execute": "migrate", "arguments": {"uri": "tcp:$dst_host_ip:4000"}, "id": "vbPf6gnX"}
{"return": {}, "id": "vbPf6gnX"}

Actual results:
Query the migration: it stays active for some seconds, then qemu core dumps on both the src and dst hosts:
{"execute": "query-migrate", "id": "84MvB65h"}
{"return": {"blocked": false, "expected-downtime": 30000, "status": "active", "setup-time": 7, "total-time": 27249, "ram": {"total": 4429512704, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 14344, "page-size": 4096, "remaining": 2083717120, "mbps": 470.97186206896549, "transferred": 1620360460, "duplicate": 178270, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 1615597568, "normal": 394433}}, "id": "84MvB65h"}
(src hmp):
(qemu) qemu-kvm: ../util/qemu-coroutine-lock.c:57: qemu_co_queue_wait_impl: Assertion `qemu_in_coroutine()' failed.
Aborted (core dumped)
(dst hmp):
(qemu) qemu-kvm: check_section_footer: Read section footer failed: -5
qemu-kvm: load of migration failed: Invalid argument

Expected results:
Migration succeeds and the cp works well.

Additional info:
1. Didn't hit such an issue when testing on rhelav-8.5.0: kernel-4.18.0-316.el8.aarch64 & qemu-kvm-6.0.0-21.module+el8.5.0+11555+e0ab0d09.aarch64

qemu cmd [1]:
/usr/libexec/qemu-kvm \
-name "mouse-vm" \
-sandbox on \
-machine virt,gic-version=host,pflash0=drive_aavmf_code,pflash1=drive_aavmf_vars \
-cpu host \
-nodefaults \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server=on,wait=off \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server=on,wait=off \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
-device virtio-gpu-pci,bus=pcie-root-port-1,addr=0x0 \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-2,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-mouse,id=usb-mouse1,bus=usb1.0,port=2 \
-device usb-kbd,id=usb-kbd1,bus=usb1.0,port=3 \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-3,addr=0x0 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0,write-cache=on \
-device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
-device virtio-net-pci,mac=9a:0a:71:f3:69:7d,rombar=0,id=idv2eapv,netdev=tap0,bus=pcie-root-port-4,addr=0x0 \
-device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x2,chassis=6 \
-device pcie-root-port,id=pcie_extra_root_port_1,addr=0x2.0x1,bus=pcie.0,chassis=7 \
-device scsi-cd,id=cd1,drive=drive_cd1,bootindex=2 \
-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/home/kvm_autotest_root/images//rhel900-aarch64-virtio-scsi.qcow2,node-name=drive_sys1 \
-blockdev driver=qcow2,node-name=drive_image1,file=drive_sys1 \
-blockdev node-name=file_aavmf_code,driver=file,filename=/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw,auto-read-only=on,discard=unmap \
-blockdev node-name=drive_aavmf_code,driver=raw,read-only=on,file=file_aavmf_code \
-blockdev node-name=file_aavmf_vars,driver=file,filename=/home/kvm_autotest_root/images//rhel900-aarch64-virtio-scsi.qcow2.fd,auto-read-only=on,discard=unmap \
-blockdev node-name=drive_aavmf_vars,driver=raw,read-only=off,file=file_aavmf_vars \
-blockdev driver=file,cache.direct=on,read-only=on,cache.no-flush=off,filename=/home/ipa/6ObtrK.iso,node-name=drive_iso_pre \
-blockdev driver=raw,node-name=drive_cd1,read-only=on,file=drive_iso_pre \
-netdev tap,id=tap0,vhost=on \
-m 4096 \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-vnc :10 \
-rtc base=utc,clock=host,driftfix=slew \
-enable-kvm \
-qmp tcp:0:3333,server=on,wait=off \
-qmp tcp:0:9999,server=on,wait=off \
-qmp tcp:0:9888,server=on,wait=off \
-serial tcp:0:4444,server=on,wait=off \
-monitor stdio \
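The QMP flow in step 5 can also be polled programmatically to catch the moment the migration (or the source QEMU) dies. Below is a minimal polling sketch, assuming a line-oriented file object over an already-negotiated QMP connection; `migration_status` simply parses one query-migrate reply like the one shown under Actual results, and the helper names are illustrative.

```python
import json
import time


def migration_status(reply_line):
    """Extract the status field from one query-migrate reply line."""
    reply = json.loads(reply_line)
    return reply.get("return", {}).get("status", "unknown")


def wait_for_migration(qmp, timeout=300):
    """Poll query-migrate over an already-negotiated QMP file object until
    the migration leaves the setup/active states; return the final status."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        qmp.write(json.dumps({"execute": "query-migrate"}) + "\n")
        qmp.flush()
        status = migration_status(qmp.readline())
        if status not in ("setup", "active"):
            return status  # e.g. "completed" or "failed"
        time.sleep(1)
    return "timeout"
```

In the failing runs described in this bug, the source QEMU aborts mid-migration, so the poll loop would see the connection drop rather than a "failed" status; handling that (an exception on read) is left out of the sketch.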