Bug 2029980

Summary: Failed assertion in IDE emulation with Ceph backend
Product: Red Hat Enterprise Linux 8 Reporter: Hanna Czenczek <hreitz>
Component: qemu-kvmAssignee: Hanna Czenczek <hreitz>
qemu-kvm sub component: Storage QA Contact: Tingting Mao <timao>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: coli, jinzhao, jsnow, juzhang, kkiwi, qinwang, timao, virt-maint, yfu
Version: 8.4Keywords: Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-6.2.0-15.module+el8.7.0+15644+189a21f6 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-11-08 09:18:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Reproducer, attach as IDE-HD image to x86 PC machine
none
Reproducer source code (FASM) none

Description Hanna Czenczek 2021-12-07 17:48:57 UTC
As reported in BZ 2026706 comment 45 and BZ 2026706 comment 46, Tingting Mao encountered a failed assertion in our IDE emulation (at least when using a block device driven by block/rbd.c).

(QEMU version: qemu-kvm-5.2.0-16.module+el8.4.0+12393+838d9165.8.x86_64)

This is the top part of the stack trace:

#0  0x00007f49fb0bf37f in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f49fb0a9db5 in __GI_abort () at abort.c:79
#2  0x00007f49fb0a9c89 in __assert_fail_base
    (fmt=0x7f49fb2125f8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x558556a50b03 "s->bus->dma->aiocb == NULL", file=0x558556a50ada "../hw/ide/core.c", line=730, function=<optimized out>)
    at assert.c:92
#3  0x00007f49fb0b7a76 in __GI___assert_fail
    (assertion=assertion@entry=0x558556a50b03 "s->bus->dma->aiocb == NULL", file=file@entry=0x558556a50ada "../hw/ide/core.c", line=line@entry=730, function=function@entry=0x558556a51040 <__PRETTY_FUNCTION__.32556> "ide_cancel_dma_sync") at assert.c:101
#4  0x00005585566a417e in ide_cancel_dma_sync (s=0x558559957698) at ../hw/ide/core.c:730
#5  0x00005585566a417e in ide_cancel_dma_sync (s=0x558559957698) at ../hw/ide/core.c:699
#6  0x00005585567337c2 in bmdma_cmd_writeb (bm=0x5585599587f0, val=<optimized out>)
    at /usr/src/debug/qemu-kvm-5.2.0-16.module+el8.4.0+12393+838d9165.8.x86_64/include/hw/ide/internal.h:570
#7  0x0000558556840f68 in memory_region_write_accessor
    (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at ../softmmu/memory.c:491

Unfortunately, I can’t access the core dump linked in BZ 2026706 comment 46 (the link gives me a 403 error), and so I don’t know what led to this.  Perhaps some new request comes in between the blk_drain() and the assertion?  (Again unfortunately, the IDE emulation code isn’t quite my home.)

Comment 2 John Ferlan 2021-12-07 19:08:36 UTC
Moving this to RHEL for resolution. If we need a RHEL-AV bug, then we will have to use a z-stream process.

Comment 3 Tingting Mao 2021-12-08 02:38:07 UTC
Correct the reproduce steps(The qemu will core dumped after writing a larger data size than the free size in the guest.)


1. Boot from the windows guest with IDE emulation
/usr/libexec/qemu-kvm \
-name guest=instance-0000026f,debug-threads=on \
-S \
-machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram \
-cpu Icelake-Server,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,sha-ni=on,rdpid=on,fsrm=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,mpx=off,intel-pt=off,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff \
-m 1024 \
-object memory-backend-ram,id=pc.ram,size=1073741824 \
-overcommit mem-lock=off \
-smp 4,sockets=4,dies=1,cores=1,threads=1 \
-uuid 75d92da9-2652-4632-ac9f-497677076cf5 \
-smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=20.6.2-2.20210607104828.el8ost.4,serial=75d92da9-2652-4632-ac9f-497677076cf5,uuid=75d92da9-2652-4632-ac9f-497677076cf5,family=Virtual Machine' \
-no-user-config \
-nodefaults \
-rtc base=localtime,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-boot strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
-blockdev '{"driver":"rbd","pool":"rbd","image":"win2016.img","cache":{"direct":false,"no-flush":false},"node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","cache":{"direct":false,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}' \
-device ide-hd,bus=ide.0,unit=0,drive=libvirt-1-format,id=ide0-0-0,bootindex=1,write-cache=on \
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,rx_queue_size=512,host_mtu=1500,netdev=hostnet0,id=net0,mac=fa:16:3e:e9:f3:ab,bus=pci.0,addr=0x3 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-vnc :0 \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 \
-sandbox on,obsolete=deny,elevateprivileges=deny,resourcecontrol=deny \
-msg timestamp=on \
-monitor stdio \
-chardev socket,server=on,path=/var/tmp/monitor-qmpmonitor1-20210721-024113-AsZ7KYro,id=qmp_id_qmpmonitor1,wait=off  \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-blockdev node-name=file_winutils,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/iso/windows/winutils.iso,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_winutils,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_winutils \
-device ide-cd,id=winutils,drive=drive_winutils,write-cache=on,bus=ide.1,unit=0 \

2. Write data in the guest over the free size in the guest.
(guest)# fio.exe -name=test --filename=C:\User\Administrator\Destop\fio --size=100G  --rw=write


Result:
After step2, the qemu coredumped.
(gdb) bt
#0  0x00007f8591a2f37f in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f8591a19db5 in __GI_abort () at abort.c:79
#2  0x00007f8591a19c89 in __assert_fail_base
    (fmt=0x7f8591b825f8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x559aabfaab03 "s->bus->dma->aiocb == NULL", file=0x559aabfaaada "../hw/ide/core.c", line=730, function=<optimized out>)
    at assert.c:92
#3  0x00007f8591a27a76 in __GI___assert_fail
    (assertion=assertion@entry=0x559aabfaab03 "s->bus->dma->aiocb == NULL", file=file@entry=0x559aabfaaada "../hw/ide/core.c", line=line@entry=730, function=function@entry=0x559aabfab040 <__PRETTY_FUNCTION__.32556> "ide_cancel_dma_sync") at assert.c:101
#4  0x0000559aabbfe17e in ide_cancel_dma_sync (s=0x559aad883be8) at ../hw/ide/core.c:730
#5  0x0000559aabbfe17e in ide_cancel_dma_sync (s=0x559aad883be8) at ../hw/ide/core.c:699
#6  0x0000559aabc8d7c2 in bmdma_cmd_writeb (bm=0x559aad884d40, val=<optimized out>) at /usr/src/debug/qemu-kvm-5.2.0-16.module+el8.4.0+12393+838d9165.8.x86_64/include/hw/ide/internal.h:570
#7  0x0000559aabd9af68 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...)
    at ../softmmu/memory.c:491
#8  0x0000559aabd99b5e in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7f850edfc518, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=
    0x559aabd9af10 <memory_region_write_accessor>, mr=0x559aad884e90, attrs=...) at ../softmmu/memory.c:552
#9  0x0000559aabd9cf03 in memory_region_dispatch_write (mr=mr@entry=0x559aad884e90, addr=0, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...) at ../softmmu/memory.c:1501
#10 0x0000559aabdfb2c7 in flatview_write_continue
    (fv=fv@entry=0x7f8504275090, addr=addr@entry=49376, attrs=..., ptr=ptr@entry=0x7f85957b7000, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x559aad884e90)
    at /usr/src/debug/qemu-kvm-5.2.0-16.module+el8.4.0+12393+838d9165.8.x86_64/include/qemu/host-utils.h:164
#11 0x0000559aabdfb4e6 in flatview_write (fv=0x7f8504275090, addr=49376, attrs=..., buf=0x7f85957b7000, len=1) at ../softmmu/physmem.c:2799
#12 0x0000559aabdff07f in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at ../softmmu/physmem.c:2891
#13 0x0000559aabe11fe4 in kvm_handle_io (count=1, size=1, direction=<optimized out>, data=<optimized out>, attrs=..., port=49376) at ../accel/kvm/kvm-all.c:2297
#14 0x0000559aabe11fe4 in kvm_cpu_exec (cpu=cpu@entry=0x559aad053e60) at ../accel/kvm/kvm-all.c:2543
#15 0x0000559aabdd1445 in kvm_vcpu_thread_fn (arg=0x559aad053e60) at ../accel/kvm/kvm-cpus.c:49
#16 0x0000559aabf46a24 in qemu_thread_start (args=0x559aad07b330) at ../util/qemu-thread-posix.c:521
#17 0x00007f8591dc514a in start_thread (arg=<optimized out>) at pthread_create.c:479
#18 0x00007f8591af4dc3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


The complete coredumped file: 
http://fileshare.englab.nay.redhat.com/pub/section2/kvm/timao/bugs/coredump/core.qemu-kvm.0.1f07eca2af2a43a6bcf3a3e00cd68c56.211561.1638859852000000.lz4

Comment 4 Klaus Heinrich Kiwi 2021-12-09 17:57:00 UTC
Assigning to Hanna for now, but we can move that to John or someone else if that ends up being a better fit.

Comment 5 Hanna Czenczek 2021-12-14 12:38:18 UTC
Here’s what I found from the core dump:

Some TRIM request is going on.  I’m not sure exactly what the request entails, because from what I can see, the I/O vector only contains metadata (64 separate trim requests, each stored as a u64, consisting of a 48 bit sector number and a 16 bit sector count), but it only contains zeroes, so no discard is actually done.

Because the I/O vector only contains zeroes and no discard is done, ide_issue_trim_cb() is done, and has scheduled ide_trim_bh_cb() to run in a bottom half (this BH is the second entry in qemu_aio_context->bh_list).

However, ide_trim_bh_cb() has not been run yet.  If it were, it would have called dma_blk_cb(), which would have called ide_dma_cb(), which would have seen the end of transfer and thus called ide_set_inactive(), which would have cleared s->bus->dma->aiocb.

I believe the BH hasn’t been run because we haven’t polled for it.  blk_drain() just drains the BB, but there are no in-flight requests, and so blk_drain() returns basically immediately (the BDRV_POLL_WHILE() loop in bdrv_do_drained_begin() sees in_flight == 0 everywhere, and so does the AIO_WAIT_WHILE() loop in blk_drain()).

So my best guess right now is that we need ide_cancel_dma_sync() needs another pool loop to wait for all scheduled BHs to settle.

Comment 7 Hanna Czenczek 2022-01-04 18:58:32 UTC
Created attachment 1848924 [details]
Reproducer, attach as IDE-HD image to x86 PC machine

Got a reproducer, works as follows:

$ ./qemu-system-x86_64 -drive if=ide,file=test.bin,format=raw 
qemu-system-x86_64: ../hw/ide/core.c:734: ide_cancel_dma_sync: Assertion `s->bus->dma->aiocb == NULL' failed.
[1]    1447352 abort (core dumped)  ./qemu-system-x86_64 -drive if=ide,file=test.bin,format=raw

Couldn’t get this to reproduce via qtest, because that means the system emulator is in a separate process from the testing process, and so QEMU gets a chance to poll between each I/O access, which means ide_trim_bh_cb() will always be invoked before we can cancel the ongoing request.  Therefore, the reproducer is an x86 boot sector, which:
(1) looks up the IDE PCI device via BIOS functions
(2) sets up DMA
(3) sends a DSM TRIM command
(4) starts the DMA transfer for it (one completely zeroed 512 byte buffer)
(5) immediately cancels it by setting SRST

(Yes, its backtrace goes ide_perform_srst() -> ide_cancel_dma_sync() instead of bmdma_cmd_writeb() -> ide_cancel_dma_sync(), but I don’t think that matters.)

Hanna

Comment 8 Hanna Czenczek 2022-01-04 18:59:30 UTC
Created attachment 1848925 [details]
Reproducer source code (FASM)

Comment 9 Hanna Czenczek 2022-03-08 13:52:04 UTC
Fix has been merged upstream (7e5cdb345f77d76cb4877fe6230c4e17a7d0d0ca, “ide: Increment BB in-flight counter for TRIM BH”), and so will be in the qemu 7.0 release.

Comment 15 Tingting Mao 2022-06-24 07:04:01 UTC
Verified this bug as below.


Tested with:
qemu-kvm-6.2.0-15.module+el8.7.0+15644+189a21f6
kernel-4.18.0-400.el8.x86_64


Steps:
1. Boot from the windows guest with IDE emulation
# /usr/libexec/qemu-kvm \
-name guest=instance-0000026f,debug-threads=on \
-S \
-machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram \
-cpu Icelake-Server,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,sha-ni=on,rdpid=on,fsrm=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,mpx=off,intel-pt=off,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff \
-m 1024 \
-object memory-backend-ram,id=pc.ram,size=1073741824 \
-overcommit mem-lock=off \
-smp 4,sockets=4,dies=1,cores=1,threads=1 \
-uuid 75d92da9-2652-4632-ac9f-497677076cf5 \
-smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=20.6.2-2.20210607104828.el8ost.4,serial=75d92da9-2652-4632-ac9f-497677076cf5,uuid=75d92da9-2652-4632-ac9f-497677076cf5,family=Virtual Machine' \
-no-user-config \
-nodefaults \
-rtc base=localtime,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-boot strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
-blockdev '{"driver":"rbd","pool":"rbd","image":"win2022.img","cache":{"direct":false,"no-flush":false},"node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","cache":{"direct":false,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}' \
-device ide-hd,bus=ide.0,unit=0,drive=libvirt-1-format,id=ide0-0-0,bootindex=1,write-cache=on \
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,rx_queue_size=512,host_mtu=1500,netdev=hostnet0,id=net0,mac=fa:16:3e:e9:f3:ab,bus=pci.0,addr=0x3 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-vnc :0 \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 \
-sandbox on,obsolete=deny,elevateprivileges=deny,resourcecontrol=deny \
-msg timestamp=on \
-monitor stdio \
-chardev socket,server=on,path=/var/tmp/monitor-qmpmonitor1-20210721-024113-AsZ7KYro,id=qmp_id_qmpmonitor1,wait=off  \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-blockdev node-name=file_winutils,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/iso/windows/winutils.iso,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_winutils,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_winutils \
-device ide-cd,id=winutils,drive=drive_winutils,write-cache=on,bus=ide.1,unit=0 \

2. Write data in the guest over the free size in the guest.
(guest)# fio.exe --name=test --filename=C:\fiotest --size=100G  --rw=write


Result:
Steps finished without any errors, and the guest works well.


Also, tried the steps in comment7, there is no core dumped.
# /usr/libexec/qemu-kvm -drive if=ide,file=test.img,format=raw 
VNC server running on ::1:5900

Comment 18 Tingting Mao 2022-06-27 01:44:53 UTC
According to comment15, set this bug as verified.

Comment 20 errata-xmlrpc 2022-11-08 09:18:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7472