Bug 1981207 - Qemu core dump when migrate with cdrom in use
Summary: Qemu core dump when migrate with cdrom in use
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.0
Hardware: aarch64
OS: Linux
Priority: low
Severity: medium
Target Milestone: beta
Target Release: ---
Assignee: Guowen Shan
QA Contact: Li Xiaohui
URL:
Whiteboard:
Depends On:
Blocks: 1924294
 
Reported: 2021-07-12 04:37 UTC by Li Xiaohui
Modified: 2021-09-20 00:52 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-20 00:52:32 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Li Xiaohui 2021-07-12 04:37:12 UTC
Description of problem:
Qemu core dump when migrate with cdrom in use


Version-Release number of selected component (if applicable):
hosts info: kernel-5.13.0-0.rc7.51.el9.aarch64 & qemu-img-6.0.0-7.el9.aarch64
the lscpu output of the src and dst hosts is identical
guest info: kernel-5.13.0-0.rc7.51.el9.aarch64 


How reproducible:
100%


Steps to Reproduce:
1. Prepare a 10G ISO image on the src host:
[root@host ~]# dd if=/dev/urandom of=/home/ipa/6ObtrK_raw bs=1M count=10240
[root@host ~]# md5sum /home/ipa/6ObtrK_raw
[root@host ~]# mkisofs -o /home/ipa/6ObtrK.iso -max-iso9660-filenames -relaxed-filenames -allow-limited-size -D --input-charset iso8859-1 /home/ipa/6ObtrK_raw
2. Boot a VM on the src host with a cdrom backed by the ISO image from step 1, using the qemu cmd [1];
3. Boot a VM on the dst host with the same qemu cmd as the src host, but append "-incoming defer";
4. Copy large files from the cdrom to disk:
[root@guest ~]# mount /dev/sr0 /media
mount: /media: WARNING: source write-protected, mounted read-only.
[root@guest ~]# rm -rf /home/ios_files && mkdir -p /home/ios_files
[root@guest ~]# nohup cp -rf /media/* /home/ios_files/ &
5. Migrate the VM from the src host to the dst host:
(dst qmp):
{"execute": "migrate-incoming", "arguments": {"uri": "tcp:[::]:4000"}, "id": "aHHO6gi9"}
(src qmp)
{"execute": "migrate-set-parameters", "arguments": {"max-bandwidth": 167772160}, "id": "5xPuVkla"}
{"return": {}, "id": "5xPuVkla"}
{"execute": "migrate-set-parameters", "arguments": {"downtime-limit": 30000}, "id": "w3JDN3hu"}
{"return": {}, "id": "w3JDN3hu"}
{"execute": "migrate", "arguments": {"uri": "tcp:$dst_host_ip:4000"}, "id": "vbPf6gnX"}
{"return": {}, "id": "vbPf6gnX"}


Actual results:
Querying migration shows it active for some seconds, then qemu core dumps on both the src and dst hosts:
{"execute": "query-migrate", "id": "84MvB65h"}
{"return": {"blocked": false, "expected-downtime": 30000, "status": "active", "setup-time": 7, "total-time": 27249, "ram": {"total": 4429512704, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 14344, "page-size": 4096, "remaining": 2083717120, "mbps": 470.97186206896549, "transferred": 1620360460, "duplicate": 178270, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 1615597568, "normal": 394433}}, "id": "84MvB65h"}

(src hmp):
(qemu) qemu-kvm: ../util/qemu-coroutine-lock.c:57: qemu_co_queue_wait_impl: Assertion `qemu_in_coroutine()' failed.
Aborted (core dumped)
(dst hmp):
(qemu) qemu-kvm: check_section_footer: Read section footer failed: -5
qemu-kvm: load of migration failed: Invalid argument


Expected results:
Migration succeeds and the cp works well. 


Additional info:
1. Did not hit this issue when testing on rhel-av-8.5.0: kernel-4.18.0-316.el8.aarch64 & qemu-kvm-6.0.0-21.module+el8.5.0+11555+e0ab0d09.aarch64


qemu cmd[1]
/usr/libexec/qemu-kvm  \
-name "mouse-vm" \
-sandbox on \
-machine virt,gic-version=host,pflash0=drive_aavmf_code,pflash1=drive_aavmf_vars \
-cpu host \
-nodefaults  \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server=on,wait=off \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server=on,wait=off \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
-device virtio-gpu-pci,bus=pcie-root-port-1,addr=0x0 \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-2,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-mouse,id=usb-mouse1,bus=usb1.0,port=2 \
-device usb-kbd,id=usb-kbd1,bus=usb1.0,port=3 \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-3,addr=0x0 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0,write-cache=on \
-device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
-device virtio-net-pci,mac=9a:0a:71:f3:69:7d,rombar=0,id=idv2eapv,netdev=tap0,bus=pcie-root-port-4,addr=0x0 \
-device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x2,chassis=6 \
-device pcie-root-port,id=pcie_extra_root_port_1,addr=0x2.0x1,bus=pcie.0,chassis=7 \
-device scsi-cd,id=cd1,drive=drive_cd1,bootindex=2 \
-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/home/kvm_autotest_root/images//rhel900-aarch64-virtio-scsi.qcow2,node-name=drive_sys1 \
-blockdev driver=qcow2,node-name=drive_image1,file=drive_sys1 \
-blockdev node-name=file_aavmf_code,driver=file,filename=/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw,auto-read-only=on,discard=unmap \
-blockdev node-name=drive_aavmf_code,driver=raw,read-only=on,file=file_aavmf_code \
-blockdev node-name=file_aavmf_vars,driver=file,filename=/home/kvm_autotest_root/images//rhel900-aarch64-virtio-scsi.qcow2.fd,auto-read-only=on,discard=unmap \
-blockdev node-name=drive_aavmf_vars,driver=raw,read-only=off,file=file_aavmf_vars \
-blockdev driver=file,cache.direct=on,read-only=on,cache.no-flush=off,filename=/home/ipa/6ObtrK.iso,node-name=drive_iso_pre \
-blockdev driver=raw,node-name=drive_cd1,read-only=on,file=drive_iso_pre \
-netdev tap,id=tap0,vhost=on \
-m 4096 \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-vnc :10 \
-rtc base=utc,clock=host,driftfix=slew \
-enable-kvm  \
-qmp tcp:0:3333,server=on,wait=off \
-qmp tcp:0:9999,server=on,wait=off \
-qmp tcp:0:9888,server=on,wait=off \
-serial tcp:0:4444,server=on,wait=off \
-monitor stdio \

Comment 1 Li Xiaohui 2021-07-12 06:24:50 UTC
The test passes when running a rhel9 VM on the rhel9.0.0 x86 platform (kernel-5.13.0-0.rc4.33.el9.x86_64 & qemu-img-6.0.0-7.el9.x86_64):
--> Running case(1/1): RHEL7-10063-[migration] Migrate guest while cdrom in use (13 min 16 sec)--- PASS.

So this product issue is not hit on x86.

Comment 2 Guowen Shan 2021-07-13 01:23:18 UTC
Xiaohui, So this issue is arm64 specific according to comment#1?

Comment 3 Li Xiaohui 2021-07-13 01:26:41 UTC
(In reply to Guowen Shan from comment #2)
> Xiaohui, So this issue is arm64 specific according to comment#1?

Hi, I haven't tried on ppc as I have no available machines now, but on x86 I didn't hit this issue.

Comment 5 Guowen Shan 2021-07-22 01:59:08 UTC
I think it's worth checking whether the latest QEMU build has the same issue.

   qemu-img-6.0.0-7.el9.aarch64     # Where the issue was reported
   qemu-img-6.0.0-9.el9.aarch64     # last QEMU build

I failed to reproduce the issue with upstream QEMU. What I did:

(1) Create ISO file

    dd if=/dev/urandom of=/home/gavin/sandbox/images/cdrom.raw bs=1M count=10240
    md5sum /home/gavin/sandbox/images/cdrom.raw
    mkisofs -o /home/gavin/sandbox/images/cdrom.iso -max-iso9660-filenames      \
            -relaxed-filenames -allow-limited-size -D --input-charset iso8859-1 \
            /home/gavin/sandbox/images/cdrom.raw

(2) Start source/target VMs with the following parameters

    -device virtio-scsi-pci,id=virtio_scsi0 \
    -blockdev driver=file,cache.direct=on,read-only=on,cache.no-flush=off,filename=/home/gavin/sandbox/images/cdrom.iso,node-name=cdrom_iso                              \
    -blockdev driver=raw,node-name=cdrom_drive,read-only=on,file=cdrom_iso -device scsi-cd,id=cdrom_dev,drive=cdrom_drive

(3) Kick off copying files from CDROM and then start the migration.
    The migration finishes without error.

    mount /dev/sr0 /media
    rm -fr /home/gavin/sandbox/data/*
    nohup cp -rf /media/* /home/gavin/sandbox/data/ &

    (qemu) migrate -d tcp:0:4444
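
As a sanity check on the copied data (paths as in the steps above; the exact file name under /media depends on mkisofs name mangling), the result can be compared against the checksum recorded when the ISO was created:

    # in the guest, once the background cp has finished
    wait
    md5sum /home/gavin/sandbox/data/*
    # should match the host-side checksum taken earlier:
    #   md5sum /home/gavin/sandbox/images/cdrom.raw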

Comment 6 Li Xiaohui 2021-07-27 07:25:58 UTC
Hi Gavin,
Tried this bz again on qemu-img-6.0.0-9.el9.aarch64, but the reproduction rate is particularly low: it reproduced in 3 of 40 runs.

The qemu coredump info:
[root@ampere-hr350a-04 home]# coredumpctl
TIME                          PID UID GID SIG     COREFILE  EXE                >
Mon 2021-07-26 04:21:06 EDT 59648   0   0 SIGABRT truncated /usr/libexec/qemu-k>
Mon 2021-07-26 05:22:59 EDT 64226   0   0 SIGABRT truncated /usr/libexec/qemu-k>
Mon 2021-07-26 11:17:25 EDT 80086   0   0 SIGABRT present   /usr/libexec/qemu-k>
[root@ampere-hr350a-04 home]# coredumpctl info 80086
           PID: 80086 (qemu-kvm)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Mon 2021-07-26 11:16:52 EDT (11h ago)
  Command Line: /usr/libexec/qemu-kvm -name mouse-vm -sandbox on -machine virt,>
    Executable: /usr/libexec/qemu-kvm
 Control Group: /user.slice/user-0.slice/session-2.scope
          Unit: session-2.scope
         Slice: user-0.slice
       Session: 2
     Owner UID: 0 (root)
       Boot ID: 7cd0b28d44b04a1081960dc52da7e2f1
    Machine ID: d2c8c2a2c98a4a3e84961e3831b4bff9
      Hostname: ampere-hr350a-04.khw4.lab.eng.bos.redhat.com
       Storage: /var/lib/systemd/coredump/core.qemu-kvm.0.7cd0b28d44b04a1081960>
     Disk Size: 2.0G
       Message: Process 80086 (qemu-kvm) of user 0 dumped core.
                
                Stack trace of thread 80282:
                #0  0x0000ffff88869e28 __pthread_kill_internal (libc.so.6 + 0x8>
                #1  0x0000ffff88823b80 raise (libc.so.6 + 0x3db80)
                #2  0x0000ffff88810178 abort (libc.so.6 + 0x2a178)
                #3  0x0000ffff8881d2c4 __assert_fail_base (libc.so.6 + 0x372c4)
                #4  0x0000ffff8881d340 __assert_fail (libc.so.6 + 0x37340)
                #5  0x0000aaaadb9af3b0 aio_task_pool_wait_one (qemu-kvm + 0x63f>
                #6  0x0000aaaadb9591fc qcow2_co_pwritev_part.lto_priv.0 (qemu-k>
                #7  0x0000aaaadb961024 bdrv_driver_pwritev.lto_priv.0 (qemu-kvm>
                #8  0x0000aaaadb962304 bdrv_aligned_pwritev.lto_priv.0 (qemu-kv>
                #9  0x0000aaaadb963b78 bdrv_co_pwritev_part (qemu-kvm + 0x5f3b7>
                #10 0x0000aaaadb9700c0 blk_do_pwritev_part.lto_priv.0 (qemu-kvm>
                #11 0x0000aaaadb9703b4 blk_aio_write_entry.lto_priv.0 (qemu-kvm>
                #12 0x0000aaaadba41f3c coroutine_trampoline (qemu-kvm + 0x6d1f3>
                #13 0x0000ffff88832e00 n/a (libc.so.6 + 0x4ce00)
                #14 0x0000ffff88832e00 n/a (libc.so.6 + 0x4ce00)

For the detailed core dump file, please see:
http://kvmqe-tools.qe.lab.eng.nay.redhat.com/logjump.html?target=bos&path=xiaohli/bug/bz_1981207
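
To dig further than the truncated stack above, the stored core can be opened in gdb via coredumpctl (assuming gdb and the matching qemu-kvm debuginfo packages are installed):

    [root@ampere-hr350a-04 home]# coredumpctl gdb 80086
    (gdb) bt             # backtrace of the crashing thread
    (gdb) info threads   # other vCPU/IO threads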

Comment 7 Li Xiaohui 2021-07-27 07:29:05 UTC
(In reply to Guowen Shan from comment #5)
> I think it's worth checking whether the latest QEMU build has the same issue.
> 
>    qemu-img-6.0.0-7.el9.aarch64     # Where the issue was reported
>    qemu-img-6.0.0-9.el9.aarch64     # last QEMU build
> 
> I failed to reproduce the issue with upstream QEMU. What I did:
> 
> (1) Create ISO file
> 
>     dd if=/dev/urandom of=/home/gavin/sandbox/images/cdrom.raw bs=1M
> count=10240
>     md5sum /home/gavin/sandbox/images/cdrom.raw
>     mkisofs -o /home/gavin/sandbox/images/cdrom.iso -max-iso9660-filenames  
> \
>             -relaxed-filenames -allow-limited-size -D --input-charset
> iso8859-1 \
>             /home/gavin/sandbox/images/cdrom.raw
> 
> (2) Start source/target VMs with the following parameters
> 
>     -device virtio-scsi-pci,id=virtio_scsi0 \
>     -blockdev
> driver=file,cache.direct=on,read-only=on,cache.no-flush=off,filename=/home/
> gavin/sandbox/images/cdrom.iso,node-name=cdrom_iso                          
> \
>     -blockdev driver=raw,node-name=cdrom_drive,read-only=on,file=cdrom_iso
> -device scsi-cd,id=cdrom_dev,drive=cdrom_drive
> 
> (3) Kick off copying files from CDROM and then start the migration.
>     The migration finishes without error.
> 
>     mount /dev/sr0 /media
>     rm -fr /home/gavin/sandbox/data/*
>     nohup cp -rf /media/* /home/gavin/sandbox/data/ &
> 
>     (qemu) migrate -d tcp:0:4444

Did you migrate on the same host? I suspect that is the reason you didn't reproduce the bz. But please also try more times.

Comment 8 Guowen Shan 2021-09-13 23:34:39 UTC
Xiaohui, I always try migration on the same machine because I don't have two
machines to migrate over the network.

By the way, could you help to try the latest QEMU/kernel image? The LTO compile
option was disabled recently. It had been affecting many areas, including
the virtio-block devices.

Comment 9 Li Xiaohui 2021-09-15 08:38:32 UTC
(In reply to Guowen Shan from comment #8)
> Xiaohui, I always try migration on the same machine because I don't have two
> machines to migrate over the network.
> 
> By the way, could you help to try the latest QEMU/kernel image? The LTO compile
> option was disabled recently. It had been affecting many areas, including
> the virtio-block devices.

Ok, I have borrowed two machines just now. I will update the test results here with the latest qemu-kvm on rhel9.

Comment 10 Li Xiaohui 2021-09-16 06:25:31 UTC
Tested this bz by repeating it 100 times on qemu-kvm-6.1.0-2.el9.aarch64; all runs passed and the bz was not hit.
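
For reference, a rough sketch of the kind of loop that can drive such a repeated run (run_one_migration.sh is a hypothetical wrapper around the steps in the description, not part of this report):

    for i in $(seq 1 100); do
        ./run_one_migration.sh || { echo "run $i failed"; break; }
        # stop as soon as systemd-coredump records a new qemu-kvm core
        if coredumpctl list --since=-10min /usr/libexec/qemu-kvm >/dev/null 2>&1; then
            echo "core dump seen in run $i"; break
        fi
    done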


Gavin, shall we close this bz as currentrelease according to your comment 8?

Comment 11 Guowen Shan 2021-09-16 23:46:28 UTC
(In reply to Li Xiaohui from comment #10)
> Tested this bz by repeating it 100 times on qemu-kvm-6.1.0-2.el9.aarch64; all
> runs passed and the bz was not hit.
> 
> 
> Gavin, shall we close this bz as currentrelease according to your comment 8?
>

Yes, I agree to close it as currentrelease. The LTO compile option sometimes
introduces weird problems. Thanks for testing it again, particularly since 100
cycles of migration take a lot of time.

Comment 12 Li Xiaohui 2021-09-17 06:49:55 UTC
(In reply to Guowen Shan from comment #11)
> (In reply to Li Xiaohui from comment #10)
> > Tested this bz by repeating it 100 times on qemu-kvm-6.1.0-2.el9.aarch64; all
> > runs passed and the bz was not hit.
> > 
> > 
> > Gavin, shall we close this bz as currentrelease according to your comment 8?
> >
> 
> Yes, I agree to close it as currentrelease. The LTO compile option sometimes
> introduces weird problems. Thanks for testing it again, particularly since 100
> cycles of migration take a lot of time.

It's my pleasure, please go ahead, thanks.

