Bug 2186181 - Qemu core dump when query-jobs after creating target mirror node (iothread enabled)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Stefan Hajnoczi
QA Contact: aihua liang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-04-12 10:34 UTC by aihua liang
Modified: 2023-07-03 08:31 UTC (History)
7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-30 06:38:46 UTC
Type: ---
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-154513 0 None None None 2023-04-12 10:35:46 UTC

Description aihua liang 2023-04-12 10:34:17 UTC
Description of problem:
Qemu core dump when query-jobs after creating target mirror node


Version-Release number of selected component (if applicable):
kernel version:5.14.0-295.el9.x86_64
qemu-kvm version:qemu-kvm-8.0.0-0.rc1.el9.candidate


How reproducible:
2/5

Steps to Reproduce:
1. Start guest with qemu cmdline:
/usr/libexec/qemu-kvm \
 	-S  \
 	-name 'avocado-vt-vm1'  \
 	-sandbox on  \
 	-blockdev '{"node-name": "file_ovmf_code", "driver": "file", "filename": "/usr/share/OVMF/OVMF_CODE.secboot.fd", "auto-read-only": true, "discard": "unmap"}' \
 	-blockdev '{"node-name": "drive_ovmf_code", "driver": "raw", "read-only": true, "file": "file_ovmf_code"}' \
 	-blockdev '{"node-name": "file_ovmf_vars", "driver": "file", "filename": "/root/avocado/data/avocado-vt/avocado-vt-vm1_rhel930-64-virtio_qcow2_filesystem_VARS.fd", "auto-read-only": true, "discard": "unmap"}' \
 	-blockdev '{"node-name": "drive_ovmf_vars", "driver": "raw", "read-only": false, "file": "file_ovmf_vars"}' \
 	-machine q35,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \
 	-device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' \
 	-device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}'  \
 	-nodefaults \
 	-device '{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' \
 	-m 30720 \
 	-object '{"size": 32212254720, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}'  \
 	-smp 12,maxcpus=12,cores=6,threads=1,dies=1,sockets=2  \
 	-cpu 'Cascadelake-Server-noTSX',+kvm_pv_unhalt \
 	-chardev socket,path=/var/tmp/avocado_22sdgjjv/monitor-qmpmonitor1-20230412-035446-fRTermpy,id=qmp_id_qmpmonitor1,wait=off,server=on  \
 	-mon chardev=qmp_id_qmpmonitor1,mode=control \
 	-chardev socket,path=/var/tmp/avocado_22sdgjjv/monitor-catch_monitor-20230412-035446-fRTermpy,id=qmp_id_catch_monitor,wait=off,server=on  \
 	-mon chardev=qmp_id_catch_monitor,mode=control \
 	-device '{"ioport": 1285, "driver": "pvpanic", "id": "idXYf3vj"}' \
 	-chardev socket,path=/var/tmp/avocado_22sdgjjv/serial-serial0-20230412-035446-fRTermpy,id=chardev_serial0,wait=off,server=on \
 	-device '{"id": "serial0", "driver": "isa-serial", "chardev": "chardev_serial0"}'  \
 	-chardev socket,id=seabioslog_id_20230412-035446-fRTermpy,path=/var/tmp/avocado_22sdgjjv/seabios-20230412-035446-fRTermpy,server=on,wait=off \
 	-device isa-debugcon,chardev=seabioslog_id_20230412-035446-fRTermpy,iobase=0x402 \
 	-device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \
 	-device '{"driver": "qemu-xhci", "id": "usb1", "bus": "pcie-root-port-1", "addr": "0x0"}' \
 	-blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/rhel930-64-virtio.qcow2", "cache": {"direct": true, "no-flush": false}}' \
 	-object '{"qom-type": "iothread", "id": "iothread0"}' \
 	-blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
 	-device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
 	-device '{"driver": "virtio-blk-pci", "id": "image1", "drive": "drive_image1", "bootindex": 0, "write-cache": "on", "bus": "pcie-root-port-2", "addr": "0x0", "iothread": "iothread0"}' \
 	-device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
 	-device '{"driver": "virtio-net-pci", "mac": "9a:af:02:d9:a7:c4", "id": "idHizN18", "netdev": "idSWxZ3M", "bus": "pcie-root-port-3", "addr": "0x0"}'  \
 	-netdev tap,id=idSWxZ3M,vhost=on,vhostfd=16,fd=12  \
 	-vnc :0  \
 	-rtc base=utc,clock=host,driftfix=slew  \
 	-boot menu=off,order=cdn,once=c,strict=off \
 	-chardev socket,id=char_vtpm_avocado-vt-vm1_tpm0,path=/root/avocado/data/avocado-vt/swtpm/avocado-vt-vm1_tpm0_swtpm.sock \
 	-tpmdev emulator,chardev=char_vtpm_avocado-vt-vm1_tpm0,id=emulator_vtpm_avocado-vt-vm1_tpm0 \
 	-device '{"id": "tpm-crb_vtpm_avocado-vt-vm1_tpm0", "tpmdev": "emulator_vtpm_avocado-vt-vm1_tpm0", "driver": "tpm-crb"}' \
 	-enable-kvm \
 	-device '{"id": "pcie_extra_root_port_0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x3", "chassis": 5}'

2. Continue the VM
 {'execute': 'cont', 'id': 'XKDhbUzA'}

3. Create a file in the guest
 (guest)#dd if=/dev/urandom of=/var/tmp/IA2q bs=1M count=10 oflag=direct
    	#md5sum /var/tmp/IA2q > /var/tmp/IA2q.md5 && sync

4. Create mirror target file node
  {'execute': 'blockdev-create', 'arguments': {'options': {'driver': 'file', 'filename': '/tmp/tmp_target_path/mirror1.qcow2', 'size': 21474836480}, 'job-id': 'file_mirror1'}, 'id': 'IJIXcNXa'}

5. Check job status
  {'execute': 'query-jobs', 'id': 'LubWCdSf'}

6. Dismiss job
  {'execute': 'job-dismiss', 'arguments': {'id': 'file_mirror1'}, 'id': 'KOupRvUY'}

7. Add mirror target file node
  {'execute': 'blockdev-add', 'arguments': {'node-name': 'file_mirror1', 'driver': 'file', 'filename': '/tmp/tmp_target_path/mirror1.qcow2', 'aio': 'threads', 'auto-read-only': True, 'discard': 'unmap'}, 'id': 'nzSslZfn'}

8. Create mirror target format node
  {'execute': 'blockdev-create', 'arguments': {'options': {'driver': 'qcow2', 'file': 'file_mirror1', 'size': 21474836480}, 'job-id': 'drive_mirror1'}, 'id': 'sIHz0l2u'}
{"timestamp": {"seconds": 1681286134, "microseconds": 135983}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "drive_mirror1"}}
{"timestamp": {"seconds": 1681286134, "microseconds": 136017}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "drive_mirror1"}}
 {"return": {}, "id": "sIHz0l2u"}

9. Check block job status
  {'execute': 'query-jobs', 'id': 'XFqDVXUh'}
{"return": [{"current-progress": 0, "status": "running", "total-progress": 1, "type": "create", "id": "drive_mirror1"}], "id": "XFqDVXUh"}

10. Check job status again
  {'execute': 'query-jobs', 'id': 'RNNblrXT'}
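
For reference, the QMP exchange in steps 4-10 can be scripted directly against the monitor socket. This is a sketch only: the function names are invented, the socket path must be adjusted to your environment (the monitor path from step 1 is one option), and the commands are copied verbatim from the steps above. It requires a guest already running as in steps 1-3.

```python
# Sketch of a QMP client replaying steps 4-10 (helper names are hypothetical).
import json
import socket
import sys

def qmp_commands():
    """Return the QMP commands from steps 4-10, in order."""
    return [
        {"execute": "blockdev-create",                       # step 4
         "arguments": {"options": {"driver": "file",
                                   "filename": "/tmp/tmp_target_path/mirror1.qcow2",
                                   "size": 21474836480},
                       "job-id": "file_mirror1"}},
        {"execute": "query-jobs"},                           # step 5
        {"execute": "job-dismiss",                           # step 6
         "arguments": {"id": "file_mirror1"}},
        {"execute": "blockdev-add",                          # step 7
         "arguments": {"node-name": "file_mirror1", "driver": "file",
                       "filename": "/tmp/tmp_target_path/mirror1.qcow2",
                       "aio": "threads", "auto-read-only": True,
                       "discard": "unmap"}},
        {"execute": "blockdev-create",                       # step 8
         "arguments": {"options": {"driver": "qcow2", "file": "file_mirror1",
                                   "size": 21474836480},
                       "job-id": "drive_mirror1"}},
        {"execute": "query-jobs"},                           # step 9
        {"execute": "query-jobs"},                           # step 10
    ]

def run(sock_path):
    """Connect to the QMP unix socket, negotiate capabilities, replay commands."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(sock_path)
    f = s.makefile("rw")
    f.readline()                                             # QMP greeting
    f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
    f.flush()
    f.readline()                                             # {"return": {}}
    for cmd in qmp_commands():
        f.write(json.dumps(cmd) + "\n")
        f.flush()
        print(f.readline().strip())   # the last query-jobs may never answer (the bug)

if __name__ == "__main__":
    if len(sys.argv) > 1:
        run(sys.argv[1])
```

On an affected build, the final `query-jobs` gets no response and qemu segfaults, matching the "Actual results" below.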

Actual results:
After step 10, query-jobs gets no response and qemu core dumps with the following info:
 [qemu output] /tmp/aexpect_j9SQ86Ot/aexpect-5jxdpzjj.sh: line 1: 307497 Segmentation fault  	(core dumped) MALLOC_PERTURB_=1 /usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -sandbox on -blockdev '{"node-name": "file_ovmf_code", "driver": "file", "filename": "/usr/share/OVMF/OVMF_CODE.secboot.fd", "auto-read-only": true, "discard": "unmap"}' -blockdev '{"node-name": "drive_ovmf_code", "driver": "raw", "read-only": true, "file": "file_ovmf_code"}' -blockdev '{"node-name": "file_ovmf_vars", "driver": "file", "filename": "/root/avocado/data/avocado-vt/avocado-vt-vm1_rhel930-64-virtio_qcow2_filesystem_VARS.fd", "auto-read-only": true, "discard": "unmap"}' -blockdev '{"node-name": "drive_ovmf_vars", "driver": "raw", "read-only": false, "file": "file_ovmf_vars"}' -machine q35,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars -device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' -device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}' -nodefaults -device '{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' -m 30720 -object '{"size": 32212254720, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}' -smp 12,maxcpus=12,cores=6,threads=1,dies=1,sockets=2 -cpu 'Cascadelake-Server-noTSX',+kvm_pv_unhalt -chardev socket,path=/var/tmp/avocado_22sdgjjv/monitor-qmpmonitor1-20230412-035446-fRTermpy,id=qmp_id_qmpmonitor1,wait=off,server=on -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,path=/var/tmp/avocado_22sdgjjv/monitor-catch_monitor-20230412-035446-fRTermpy,id=qmp_id_catch_monitor,wait=off,server=on -mon chardev=qmp_id_catch_monitor,mode=control ...

Expected results:
Mirror nodes can be created/added successfully, and the mirror executes successfully.

Additional info:
Coredump info:
 coredumpctl debug 307497
       	PID: 307497 (qemu-kvm)
       	UID: 0 (root)
       	GID: 0 (root)
    	Signal: 11 (SEGV)
 	Timestamp: Wed 2023-04-12 03:55:34 EDT (19min ago)
  Command Line: /usr/libexec/qemu-kvm -S -name avocado-vt-vm1 -sandbox on -blockdev $'{"node-name": "file_ovmf_code", "driver": "file", "filename": "/usr/share/OVMF/OVMF_CODE.secboot.fd", "auto-read-only": true, "discard": "unmap"}' -blockdev $'{"node-name": "drive_ovmf_code", "driver": "raw", "read-only": true, "file": "file_ovmf_code"}' -blockdev $'{"node-name": "file_ovmf_vars", "driver": "file", "filename": "/root/avocado/data/avocado-vt/avocado-vt-vm1_rhel930-64-virtio_qcow2_filesystem_VARS.fd", "auto-read-only": true, "discard": "unmap"}' -blockdev $'{"node-name": "drive_ovmf_vars", "driver": "raw", "read-only": false, "file": "file_ovmf_vars"}' -machine q35,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars -device $'{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' -device $'{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}' -nodefaults -device $'{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' -m 30720 -object $'{"size": 32212254720, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}' -smp 12,maxcpus=12,cores=6,threads=1,dies=1,sockets=2 -cpu Cascadelake-Server-noTSX,+kvm_pv_unhalt -chardev socket,path=/var/tmp/avocado_22sdgjjv/monitor-qmpmonitor1-20230412-035446-fRTermpy,id=qmp_id_qmpmonitor1,wait=off,server=on -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,path=/var/tmp/avocado_22sdgjjv/monitor-catch_monitor-20230412-035446-fRTermpy,id=qmp_id_catch_monitor,wait=off,server=on -mon chardev=qmp_id_catch_monitor,mode=control -device $'{"ioport": 1285, "driver": "pvpanic", "id": "idXYf3vj"}' -chardev socket,path=/var/tmp/avocado_22sdgjjv/serial-serial0-20230412-035446-fRTermpy,id=chardev_serial0,wait=off,server=on -device $'{"id": "serial0", "driver": "isa-serial", "chardev": "chardev_serial0"}' -chardev 
socket,id=seabioslog_id_20230412-035446-fRTermpy,path=/var/tmp/avocado_22sdgjjv/seabios-20230412-035446-fRTermpy,server=on,wait=off -device isa-debugcon,chardev=seabioslog_id_20230412-035446-fRTermpy,iobase=0x402 -device $'{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' -device $'{"driver": "qemu-xhci", "id": "usb1", "bus": "pcie-root-port-1", "addr": "0x0"}' -device $'{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb1.0", "port": "1"}' -blockdev $'{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/rhel930-64-virtio.qcow2", "cache": {"direct": true, "no-flush": false}}' -object $'{"qom-type": "iothread", "id": "iothread0"}' -blockdev $'{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' -device $'{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' -device $'{"driver": "virtio-blk-pci", "id": "image1", "drive": "drive_image1", "bootindex": 0, "write-cache": "on", "bus": "pcie-root-port-2", "addr": "0x0", "iothread": "iothread0"}' -device $'{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' -device $'{"driver": "virtio-net-pci", "mac": "9a:af:02:d9:a7:c4", "id": "idHizN18", "netdev": "idSWxZ3M", "bus": "pcie-root-port-3", "addr": "0x0"}' -netdev tap,id=idSWxZ3M,vhost=on,vhostfd=16,fd=12 -vnc :0 -rtc base=utc,clock=host,driftfix=slew -boot menu=off,order=cdn,once=c,strict=off -chardev socket,id=char_vtpm_avocado-vt-vm1_tpm0,path=/root/avocado/data/avocado-vt/swtpm/avocado-vt-vm1_tpm0_swtpm.sock -tpmdev emulator,chardev=char_vtpm_avocado-vt-vm1_tpm0,id=emulator_vtpm_avocado-vt-vm1_tpm0 -device $'{"id": "tpm-crb_vtpm_avocado-vt-vm1_tpm0", "tpmdev": 
"emulator_vtpm_avocado-vt-vm1_tpm0", "driver": "tpm-crb"}' -enable-kvm -device $'{"id": "pcie_extra_root_port_0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x3", "chassis": 5}'
	Executable: /usr/libexec/qemu-kvm
 Control Group: /user.slice/user-0.slice/session-11.scope
      	Unit: session-11.scope
     	Slice: user-0.slice
   	Session: 11
 	Owner UID: 0 (root)
   	Boot ID: 9e198a9379fe4bf09c651e58127fa82a
	Machine ID: 17118f5bdfe643f79903478703dce496
  	Hostname: dell-per440-27.lab.eng.pek2.redhat.com
   	Storage: /var/lib/systemd/coredump/core.qemu-kvm.0.9e198a9379fe4bf09c651e58127fa82a.307497.1681286134000000.zst (present)
  Size on Disk: 557.5M
   	Message: Process 307497 (qemu-kvm) of user 0 dumped core.
           	 
            	Stack trace of thread 307499:
            	#0  0x00007fe1a00ae1c3 _int_malloc (libc.so.6 + 0xae1c3)
            	#1  0x00007fe1a00b012e __libc_calloc (libc.so.6 + 0xb012e)
            	#2  0x00007fe1a06dfb41 g_malloc0 (libglib-2.0.so.0 + 0x5ab41)
            	#3  0x0000560428975c47 qemu_coroutine_new (qemu-kvm + 0xa02c47)
            	#4  0x0000560428974500 qemu_coroutine_create (qemu-kvm + 0xa01500)
            	#5  0x0000560428762a71 blk_io_plug (qemu-kvm + 0x7efa71)
            	#6  0x00005604285de750 virtio_blk_handle_vq (qemu-kvm + 0x66b750)
            	#7  0x00005604286243a4 virtio_queue_host_notifier_aio_poll_ready (qemu-kvm + 0x6b13a4)
            	#8  0x00005604289578c1 aio_dispatch_handler (qemu-kvm + 0x9e48c1)
            	#9  0x0000560428958613 aio_poll (qemu-kvm + 0x9e5613)
            	#10 0x000056042876631d bdrv_poll_co (qemu-kvm + 0x7f331d)
            	#11 0x0000560428762a7d blk_io_plug (qemu-kvm + 0x7efa7d)
            	#12 0x00005604285de750 virtio_blk_handle_vq (qemu-kvm + 0x66b750)
            	#13 0x00005604286243a4 virtio_queue_host_notifier_aio_poll_ready (qemu-kvm + 0x6b13a4)
            	#14 0x00005604289578c1 aio_dispatch_handler (qemu-kvm + 0x9e48c1)
            	#15 0x0000560428958613 aio_poll (qemu-kvm + 0x9e5613)
            	#16 0x000056042876631d bdrv_poll_co (qemu-kvm + 0x7f331d)
            	#17 0x0000560428762a7d blk_io_plug (qemu-kvm + 0x7efa7d)
            	#18 0x00005604285de750 virtio_blk_handle_vq (qemu-kvm + 0x66b750)
            	#19 0x00005604286243a4 virtio_queue_host_notifier_aio_poll_ready (qemu-kvm + 0x6b13a4)
            	#20 0x00005604289578c1 aio_dispatch_handler (qemu-kvm + 0x9e48c1)
            	#21 0x0000560428958613 aio_poll (qemu-kvm + 0x9e5613)
            	#22 0x000056042876631d bdrv_poll_co (qemu-kvm + 0x7f331d)
            	#23 0x0000560428762a7d blk_io_plug (qemu-kvm + 0x7efa7d)
            	#24 0x00005604285de750 virtio_blk_handle_vq (qemu-kvm + 0x66b750)
            	#25 0x00005604286243a4 virtio_queue_host_notifier_aio_poll_ready (qemu-kvm + 0x6b13a4)
            	#26 0x00005604289578c1 aio_dispatch_handler (qemu-kvm + 0x9e48c1)
            	#27 0x0000560428958613 aio_poll (qemu-kvm + 0x9e5613)
            	#28 0x000056042876631d bdrv_poll_co (qemu-kvm + 0x7f331d)
            	#29 0x0000560428762a7d blk_io_plug (qemu-kvm + 0x7efa7d)
            	#30 0x00005604285de750 virtio_blk_handle_vq (qemu-kvm + 0x66b750)
            	#31 0x00005604286243a4 virtio_queue_host_notifier_aio_poll_ready (qemu-kvm + 0x6b13a4)
            	#32 0x00005604289578c1 aio_dispatch_handler (qemu-kvm + 0x9e48c1)
            	#33 0x0000560428958613 aio_poll (qemu-kvm + 0x9e5613)
            	#34 0x000056042876631d bdrv_poll_co (qemu-kvm + 0x7f331d)
            	#35 0x0000560428762a7d blk_io_plug (qemu-kvm + 0x7efa7d)
            	#36 0x00005604285de750 virtio_blk_handle_vq (qemu-kvm + 0x66b750)
            	#37 0x00005604286243a4 virtio_queue_host_notifier_aio_poll_ready (qemu-kvm + 0x6b13a4)
            	#38 0x00005604289578c1 aio_dispatch_handler (qemu-kvm + 0x9e48c1)
            	#39 0x0000560428958613 aio_poll (qemu-kvm + 0x9e5613)
            	#40 0x000056042876631d bdrv_poll_co (qemu-kvm + 0x7f331d)
            	#41 0x0000560428762a7d blk_io_plug (qemu-kvm + 0x7efa7d)
            	#42 0x00005604285de750 virtio_blk_handle_vq (qemu-kvm + 0x66b750)
            	#43 0x00005604286243a4 virtio_queue_host_notifier_aio_poll_ready (qemu-kvm + 0x6b13a4)
            	#44 0x00005604289578c1 aio_dispatch_handler (qemu-kvm + 0x9e48c1)
            	#45 0x0000560428958613 aio_poll (qemu-kvm + 0x9e5613)
            	#46 0x000056042876631d bdrv_poll_co (qemu-kvm + 0x7f331d)
            	#47 0x0000560428762a7d blk_io_plug (qemu-kvm + 0x7efa7d)
            	#48 0x00005604285de750 virtio_blk_handle_vq (qemu-kvm + 0x66b750)
            	#49 0x00005604286243a4 virtio_queue_host_notifier_aio_poll_ready (qemu-kvm + 0x6b13a4)
            	#50 0x00005604289578c1 aio_dispatch_handler (qemu-kvm + 0x9e48c1)
            	#51 0x0000560428958613 aio_poll (qemu-kvm + 0x9e5613)
            	#52 0x000056042876631d bdrv_poll_co (qemu-kvm + 0x7f331d)
            	#53 0x0000560428762a7d blk_io_plug (qemu-kvm + 0x7efa7d)
            	#54 0x00005604285de750 virtio_blk_handle_vq (qemu-kvm + 0x66b750)
            	#55 0x00005604286243a4 virtio_queue_host_notifier_aio_poll_ready (qemu-kvm + 0x6b13a4)
            	#56 0x00005604289578c1 aio_dispatch_handler (qemu-kvm + 0x9e48c1)
            	#57 0x0000560428958613 aio_poll (qemu-kvm + 0x9e5613)
            	#58 0x000056042876631d bdrv_poll_co (qemu-kvm + 0x7f331d)
            	#59 0x0000560428762a7d blk_io_plug (qemu-kvm + 0x7efa7d)
            	#60 0x00005604285de750 virtio_blk_handle_vq (qemu-kvm + 0x66b750)
            	#61 0x00005604286243a4 virtio_queue_host_notifier_aio_poll_ready (qemu-kvm + 0x6b13a4)
            	#62 0x00005604289578c1 aio_dispatch_handler (qemu-kvm + 0x9e48c1)
            	#63 0x0000560428958613 aio_poll (qemu-kvm + 0x9e5613)
           	 
            	Stack trace of thread 307498:
            	#0  0x00007fe1a003ee5d syscall (libc.so.6 + 0x3ee5d)
            	#1  0x000056042895bc5f qemu_event_wait (qemu-kvm + 0x9e8c5f)
            	#2  0x000056042896821b call_rcu_thread (qemu-kvm + 0x9f521b)
            	#3  0x000056042895bf0a qemu_thread_start (qemu-kvm + 0x9e8f0a)
            	#4  0x00007fe1a009f832 start_thread (libc.so.6 + 0x9f832)
            	#5  0x00007fe1a003f450 __clone3 (libc.so.6 + 0x3f450)
           	 
            	Stack trace of thread 307505:
            	#0  0x00007fe1a009c590 __GI___lll_lock_wait (libc.so.6 + 0x9c590)
            	#1  0x00007fe1a00a2c52 __pthread_mutex_lock.5 (libc.so.6 + 0xa2c52)
            	#2  0x000056042895af6f qemu_mutex_lock_impl (qemu-kvm + 0x9e7f6f)
            	#3  0x000056042865af9d flatview_write_continue (qemu-kvm + 0x6e7f9d)
            	#4  0x000056042865ad11 flatview_write (qemu-kvm + 0x6e7d11)
            	#5  0x000056042865f14c address_space_write (qemu-kvm + 0x6ec14c)
            	#6  0x000056042870323e kvm_cpu_exec (qemu-kvm + 0x79023e)
            	#7  0x00005604287057aa kvm_vcpu_thread_fn (qemu-kvm + 0x7927aa)
            	#8  0x000056042895bf0a qemu_thread_start (qemu-kvm + 0x9e8f0a)
            	#9  0x00007fe1a009f832 start_thread (libc.so.6 + 0x9f832)
            	#10 0x00007fe1a003f450 __clone3 (libc.so.6 + 0x3f450)
           	 
            	Stack trace of thread 307497:
            	#0  0x00007fe1a009c590 __GI___lll_lock_wait (libc.so.6 + 0x9c590)
            	#1  0x00007fe1a00a2cad __pthread_mutex_lock.5 (libc.so.6 + 0xa2cad)
            	#2  0x000056042895af6f qemu_mutex_lock_impl (qemu-kvm + 0x9e7f6f)
            	#3  0x00005604287afe68 bdrv_co_yield_to_drain (qemu-kvm + 0x83ce68)
            	#4  0x00005604287b57a9 bdrv_drain_all_end (qemu-kvm + 0x8427a9)
            	#5  0x000056042876d16a bdrv_replace_child_noperm (qemu-kvm + 0x7fa16a)
            	#6  0x000056042876d044 bdrv_root_unref_child (qemu-kvm + 0x7fa044)
            	#7  0x000056042879b366 blk_unref (qemu-kvm + 0x828366)
            	#8  0x00005604287e1ea7 qcow2_co_create (qemu-kvm + 0x86eea7)
            	#9  0x00005604287a7a21 blockdev_create_run (qemu-kvm + 0x834a21)
            	#10 0x000056042877d011 job_co_entry (qemu-kvm + 0x80a011)
            	#11 0x0000560428975e26 coroutine_trampoline (qemu-kvm + 0xa02e26)
            	#12 0x00007fe1a002a360 n/a (libc.so.6 + 0x2a360)
            	#13 0x0000000000000000 n/a (n/a + 0x0)
            	ELF object binary architecture: AMD x86-64

Note:
 Hit this issue via automation; will investigate further to determine whether it's a regression and whether it's iothread-related.

Comment 5 Kevin Wolf 2023-04-20 08:03:47 UTC
I have no idea why malloc() segfaults, this sounds a bit scary. I hope it's not memory corruption. Maybe a stack overflow that just coincidentally hits there?

Anyway, I don't think the recursion shown in the stack trace is intended. Stefan, can you have a look?

Comment 8 Stefan Hajnoczi 2023-04-27 21:52:26 UTC
The recursion is not intended. QEMU's virtqueue polling sees that there are new requests waiting to be processed, but blk_io_plug() makes a nested aio_poll() call. This creates an infinite call chain that will cause stack exhaustion.

I have patches that eliminate the blk_io_plug() API. They were not intended as a bug fix, instead they are necessary for upcoming QEMU multi-block layer changes. However, they are not a complete solution because they only eliminate blk_io_plug(). There may be other nested aio_poll() calls.

Two solutions come to mind:
1. Disable polling in nested aio_poll() calls because it is level-triggered instead of edge-triggered. If we read() the ioeventfd instead of relying on virtqueue memory polling, then this infinite call chain cannot happen.
2. Always pop virtqueue requests before calling aio_poll().

#1 fixes the entire class of bugs.
#2 is a per-device solution and there is no way to verify that all cases have been fixed other than auditing the whole codebase.
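
A toy model (not QEMU code; all names are invented) can illustrate why solution #1 breaks the cycle: a level-triggered handler sees the virtqueue condition still true on every nested poll, so each poll re-enters the handler, while consuming the event first (reading the ioeventfd, edge-triggered) means the nested poll finds nothing pending.

```python
# Toy model of the recursion in the stack trace above (invented names).
def nested_poll(vq, consume_first, depth=0):
    """One aio_poll() iteration whose handler itself polls again,
    like blk_io_plug() -> bdrv_poll_co() -> aio_poll() in the trace."""
    if depth > 50:                       # stand-in for stack exhaustion
        raise RecursionError("stack exhausted")
    if not vq["pending"]:
        return depth                     # nothing to dispatch; poll returns
    if consume_first:
        vq["pending"] = False            # read() the ioeventfd: edge-triggered
    # Handler runs and polls again (the nested aio_poll() call).
    return nested_poll(vq, consume_first, depth + 1)

# Edge-triggered: one handler invocation, then the nested poll returns.
print(nested_poll({"pending": True}, consume_first=True))    # -> 1

# Level-triggered: the condition is still true inside every nested poll.
try:
    nested_poll({"pending": True}, consume_first=False)
except RecursionError as e:
    print("level-triggered:", e)         # unbounded call chain, as in the dump
```

The depth cap stands in for the finite thread stack; in the real core dump the same blk_io_plug/aio_poll frames repeat until the stack is exhausted.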

Comment 17 Stefan Hajnoczi 2023-05-24 20:05:18 UTC
I will post a backport for testing that includes Kevin's blk_co_unref() fix and my aio_poll() fix.

Comment 18 Stefan Hajnoczi 2023-05-26 11:02:39 UTC
Please test this rpm that includes Kevin's recent blk_co_unref() fix:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=52881197

Comment 22 aihua liang 2023-06-20 01:58:19 UTC
Tested on qemu-kvm-8.0.0-5.el9 with case blockdev_full_backup; all pass.
 (12/16) Host_RHEL.m9.u3.ovmf.qcow2.virtio_blk.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.blockdev_full_backup.simple_test.manual_completed.auto_compress.src_cluster_size_2M.q35: PASS (141.43 s)
 (13/16) Host_RHEL.m9.u3.ovmf.qcow2.virtio_blk.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.blockdev_full_backup.during_reboot.with_data_plane.q35: PASS (187.11 s)
 (14/16) Host_RHEL.m9.u3.ovmf.qcow2.virtio_blk.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.blockdev_full_backup.during_reboot.q35: PASS (187.11 s)
 (15/16) Host_RHEL.m9.u3.ovmf.qcow2.virtio_blk.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.blockdev_full_backup.during_stress.with_data_plane.q35: PASS (155.82 s)
 (16/16) Host_RHEL.m9.u3.ovmf.qcow2.virtio_blk.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.blockdev_full_backup.during_stress.q35: PASS (161.15 s)

Tested on qemu-kvm-8.0.0-5.el9 with cases differential_backup, blockdev_mirror_vm_reboot, blockdev_mirror_stress, blockdev_commit_install, blockdev_mirror_no_space, blockdev_mirror_vm_stop_cont, and blockdev_snapshot_chains for a total of 100 runs; all pass.
 (96/100) repeat7.Host_RHEL.m9.u3.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.blockdev_mirror_vm_stop_cont.q35: PASS (482.48 s)
 (97/100) repeat7.Host_RHEL.m9.u3.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.differential_backup.q35: PASS (107.34 s)
 (96/100) repeat7.Host_RHEL.m9.u3.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (441.93 s)
 (97/100) repeat7.Host_RHEL.m9.u3.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.blockdev_mirror_vm_reboot.q35: PASS (387.34 s)
 (98/100) repeat7.Host_RHEL.m9.u3.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.blockdev_commit_install.q35: PASS (840.45 s)
 (99/100) repeat7.Host_RHEL.m9.u3.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.blockdev_mirror_no_space.q35: PASS (74.00 s)
 (100/100) Host_RHEL.m9.u3.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.blockdev_mirror_stress.q35: PASS (508.58 s)


Also ran the regression test; all cases pass.

