Bug 2036178
Summary: | Qemu core dumped when do block-stream to a snapshot node on non-enough space storage | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Gu Nini <ngu> | |
Component: | qemu-kvm | Assignee: | Stefan Hajnoczi <stefanha> | |
qemu-kvm sub component: | Block Jobs | QA Contact: | aihua liang <aliang> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | medium | |||
Priority: | medium | CC: | aliang, coli, hreitz, jinzhao, kkiwi, kwolf, smitterl, stefanha, thuth, virt-maint, virt-qe-z | |
Version: | 8.6 | Keywords: | Triaged | |
Target Milestone: | rc | |||
Target Release: | 8.6 | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | qemu-kvm-6.2.0-6.module+el8.6.0+14165+5e5e76ac | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
Clones: | 2040123 (view as bug list) | Environment: | |
Last Closed: | 2022-05-10 13:25:20 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 2040123 |
Description
Gu Nini
2021-12-30 09:07:48 UTC
I'd normally classify a crash as high severity, but since it apparently happens on an error path (short disk space) I feel medium is adequate.

Submitter, do you have evidence that this is exclusive to s390x? I.e., does the same test succeed on other arches?

Thomas & Kevin, FYI for now.

-Klaus

*** Bug 2036289 has been marked as a duplicate of this bug. ***

(In reply to Klaus Heinrich Kiwi from comment #3)
> I'd normally classify a crash as high severity, but since it apparently
> happens on an error path (short disk space) I feel medium is adequate.
>
> Submitter, do you have evidence that this is exclusive to s390x? i.e., does
> the same test succeed on other arches? Thomas & Kevin, fyi for now.
>
> -Klaus

Agreed. From the gdb debug log, the crash does not relate to the block-stream feature, although it is triggered in that scenario. I have tried the same test on both ppc64le and x86_64 with the same host kernel and QEMU version, but failed to reproduce the bug.

*** Bug 2036801 has been marked as a duplicate of this bug. ***

Thomas, I'm a bit confused in the next steps for the virt-storage-sst team to assist. Is there anyone who we should tag-team to reproduce this on s390x?

Hanna, can you take this one?

From the backtrace and the QMP log, it looks to me like the block job has completed before/while the scsi-hd (attached via virtio-scsi) is being reset, which drains it. It could be that this has nothing to do with ENOSPC, and the problem simply is that the block job completes and vanishes in the blk_drain() invoked through virtio_reset() -> scsi_disk_reset(). (ENOSPC is just the reason why the job completes so quickly and can coincide with the virtio-scsi reset, which also comes very quickly after `cont`.)

I would have liked to say that this could have something to do with what was fixed in the “block: Attempt on fixing 030-reported errors” series (because that also concerned itself with a stream job completing and thereby breaking the block graph for another user)... But that series is part of 6.2.0, so it should be included in the version for which the bug was reported. :/

Regarding s390x, the only connection I can think of is CCW vs. PCI. I wouldn’t put this as the culprit, but maybe CCW does make reproducing the crash easier.

Hanna

I believe I’ve been able to come up with a reproducer (on x86) that works as follows:

- Create the backing configuration as reported (one base image with some data, and one top image)
- Add them to the VM, but ensure that the stream job will encounter an error when it writes to the top[1]
- Attach the virtio-scsi scsi-hd device to either of those images[2]
- Trigger a virtio-scsi reset[3] after the job has started, but before it has completed[4]

[1] Comment 0 described an ENOSPC case, but a blkdebug node that returns EIO on the first write access works just as well.
[2] Comment 0 described attaching it to the base node, but attaching it to the top node is a bit simpler, because then you can save yourself the blockdev-snapshot command and attach all nodes at VM startup.
[3] I believe in comment 0 this was triggered by the firmware resetting the device (after `cont`), but you can also just reset the whole system, so there’s one timing window less to worry about.
[4] To hit the other timing window, you can throttle the base image, so that there is some time to get in between the stream job having started and it encountering the error when writing to the target.

I’ve written an iotest for this:
https://gitlab.com/hreitz/qemu/-/blob/stefans-fix-and-a-test/tests/qemu-iotests/tests/stream-error-on-reset

At least this test no longer segfaults after applying Stefan’s upstream patch “[PATCH v2] block-backend: prevent dangling BDS pointers across aio_poll()”:
https://lists.nongnu.org/archive/html/qemu-block/2021-12/msg00249.html

I’ll try to provide a brew build with this patch included for testing tomorrow.
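To make the mechanism described above concrete, here is a toy model in Python. It is not QEMU code: every class and function name is hypothetical and only echoes QEMU's naming. It assumes that the node blk_drain() works on is a temporary filter node owned by the block job, that the job drops its last reference to that node when it completes, and that this completion can run inside the event-loop polling done while entering the drained section. The "fixed" variant holds an extra reference across the drain, which is one way to keep a BDS pointer from dangling across aio_poll(); it is an illustration, not a transcription of Stefan's patch.

```python
"""Toy model of the blk_drain() problem; NOT QEMU code, all names hypothetical.

Assumptions: the BlockBackend's root is a temporary filter node owned by the
block job; the job drops that node when it completes; completion can run inside
the polling done by drained_begin(). A freed node's counter is reset so the stale
access fails the same assertion as in this bug. In real QEMU this is a
use-after-free with undefined behaviour (it may equally segfault).
"""
from typing import Callable, Optional


class Node:
    """Stand-in for a BlockDriverState: a refcount and a quiesce counter."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.refcnt = 1
        self.quiesce_counter = 0
        self.freed = False

    def ref(self) -> None:
        self.refcnt += 1

    def unref(self) -> None:
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True
            self.quiesce_counter = 0  # model: freed memory no longer holds the count

    def drained_begin(self, poll: Callable[[], None]) -> None:
        self.quiesce_counter += 1
        poll()  # draining runs the event loop, so other callbacks can fire here

    def drained_end(self) -> None:
        assert self.quiesce_counter > 0, "bs->quiesce_counter > 0"
        self.quiesce_counter -= 1


class Backend:
    """Stand-in for a BlockBackend whose root can be swapped while polling."""

    def __init__(self, root: Optional[Node]) -> None:
        self.root = root


def job_completes_during_poll(blk: Backend, filter_node: Node,
                              below: Node) -> Callable[[], None]:
    """Poll callback in which the job finishes and removes its filter node."""
    def poll() -> None:
        if blk.root is filter_node:
            blk.root = below      # the filter is taken out of the graph...
            filter_node.unref()   # ...and the job drops its last reference
    return poll


def blk_drain(blk: Backend, poll: Callable[[], None], hold_ref: bool) -> None:
    bs = blk.root                 # pointer fetched before polling
    if bs is None:
        return
    if hold_ref:
        bs.ref()                  # keep bs alive across the polling below
    bs.drained_begin(poll)
    bs.drained_end()              # same bs, even if blk.root changed meanwhile
    if hold_ref:
        bs.unref()
```

With `hold_ref=False` and the job-completion callback, the final `drained_end()` fails the `quiesce_counter > 0` assertion, mirroring the assertion reported in this bug; with `hold_ref=True` the drained section ends cleanly and the filter is released only afterwards. Re-reading `blk.root` after polling would not help in this model either, because the new root never had `drained_begin()` called on it.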
Hanna

Hi,

I’ve run a brew build with Stefan’s patch included; the resulting repo is here:
http://brew-task-repos.usersys.redhat.com/repos/scratch/hreitz/qemu-kvm/6.2.0/2.el8.hreitz202201111001/

Can you please try whether the bug still appears with that build?

Thanks!
Hanna

(In reply to Hanna Reitz from comment #12)
> Hi,
>
> I’ve run a brew build with Stefan’s patch included; the resulting repo is
> here:
>
> http://brew-task-repos.usersys.redhat.com/repos/scratch/hreitz/qemu-kvm/6.2.0/2.el8.hreitz202201111001/
>
> Can you please try whether the bug still appears with that build?
>
> Thanks!
> Hanna

OK, I will try to test the original scenario on s390x.

I have sent v3 of the QEMU patch series with Hanna's test case for this BZ:
https://patchew.org/QEMU/20220111153613.25453-1-stefanha@redhat.com/

Hanna: Feel free to assign this BZ to me if you want. I think a single backport will handle both bz2021778 and this BZ.

Can reproduce this issue with Hanna's iotest script.

Test Env:
kernel version: 4.18.0-357.el8.x86_64
qemu-kvm version: qemu-kvm-6.2.0-2.module+el8.6.0+13738+17338784

Test steps:
1. Create base.img and top.img
#qemu-img create -f qcow2 base.img 2G
#qemu-img create -f qcow2 top.img 2G

2. Write data to base.img
#qemu-io -c 'write 0 2147483136' base.img

3. Start VM with accel=tcg
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine q35,memory-backend=mem-machine_mem,accel=tcg \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 30720 \
    -object memory-backend-ram,size=30720M,id=mem-machine_mem \
    -smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2 \
    -cpu 'Cascadelake-Server',ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,kvm_pv_unhalt=on \
    -chardev socket,wait=off,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20220105-012140-9igI6qZZ,server=on \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,wait=off,id=qmp_id_catch_monitor,path=/tmp/monitor-catch_monitor-20220105-012140-9igI6qZZ,server=on \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idsULHUn \
    -chardev socket,wait=off,id=chardev_serial0,path=/tmp/serial-serial0-20220105-012140-9igI6qZZ,server=on \
    -device isa-serial,id=serial0,chardev=chardev_serial0 \
    -chardev socket,id=seabioslog_id_20220105-012140-9igI6qZZ,path=/tmp/seabios-20220105-012140-9igI6qZZ,server=on,wait=off \
    -device isa-debugcon,chardev=seabioslog_id_20220105-012140-9igI6qZZ,iobase=0x402 \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:d5:a4:20:d5:09,id=idVy3YIg,netdev=idXjRF2L,bus=pcie-root-port-3,addr=0x0 \
    -netdev tap,id=idXjRF2L,vhost=on \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5 \
    -monitor stdio \

4. Add throttle group
{ "execute" : "object-add", "arguments": { "qom-type" : "throttle-group","id": "thrgr","x-bps-total":16384}}

5. Add base node and top node
{'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'base','file':{'driver':'throttle','throttle-group':'thrgr','file':{'driver':'file','filename':'/home/base.img'}}}}
{"return": {}}
{'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'top','file':{'driver':'blkdebug','inject-error':[{'event':'pwritev','immediately':true,'once':true}],'image':{'driver':'file','filename':'/home/top.img'}},'backing':'base'}}
{"return": {}}

6. Add SCSI device
{'execute':'device_add','arguments':{'driver': 'virtio-scsi-pci','id':'vscsi','bus':'pcie-root-port-2'}}
{"return": {}}
{'execute':'device_add','arguments':{'driver': 'scsi-hd','bus':'vscsi.0','drive':'top'}}
{"return": {}}

7. Do stream
{'execute':'block-stream','arguments':{'device':'top','job-id':'j1'}}
{"timestamp": {"seconds": 1641980367, "microseconds": 374744}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1641980367, "microseconds": 374806}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}

8. During step 7, reset the VM
{'execute':'system_reset'}

After step 8, the VM core dumped with:
(qemu) qemu-kvm: ../block/io.c:516: bdrv_do_drained_end: Assertion `bs->quiesce_counter > 0' failed.
bug.txt: line 35: 747344 Aborted (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' -sandbox on -machine q35,memory-backend=mem-machine_mem,accel=tcg -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 -nodefaults -device VGA,bus=pcie.0,addr=0x2 -m 30720 -object memory-backend-ram,size=30720M,id=mem-machine_mem -smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2 -cpu 'Cascadelake-Server',ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,kvm_pv_unhalt=on ...

And the output of step 7 is:
{"timestamp": {"seconds": 1641980375, "microseconds": 277732}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1641980375, "microseconds": 277777}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "j1"}}
{"timestamp": {"seconds": 1641980375, "microseconds": 277830}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 2147483648, "offset": 0, "speed": 0, "type": "stream", "error": "Input/output error"}}
{"timestamp": {"seconds": 1641980375, "microseconds": 277858}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1641980375, "microseconds": 277878}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}

Coredump info:
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fea9868ddb5 in __GI_abort () at abort.c:79
#2  0x00007fea9868dc89 in __assert_fail_base (fmt=0x7fea987f67d8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x56171a20d532 "bs->quiesce_counter > 0", file=0x56171a2178f6 "../block/io.c", line=516, function=<optimized out>) at assert.c:92
#3  0x00007fea986b33a6 in __GI___assert_fail (assertion=assertion@entry=0x56171a20d532 "bs->quiesce_counter > 0", file=file@entry=0x56171a2178f6 "../block/io.c", line=line@entry=516, function=function@entry=0x56171a218700 <__PRETTY_FUNCTION__.30959> "bdrv_do_drained_end") at assert.c:101
#4  0x000056171a022dcd in bdrv_do_drained_end (bs=bs@entry=0x56171c41cc60, recursive=recursive@entry=false, parent=parent@entry=0x0, ignore_bds_parents=ignore_bds_parents@entry=false, drained_end_counter=drained_end_counter@entry=0x7ffda2f859a4) at ../block/io.c:516
#5  0x000056171a0239b5 in bdrv_drained_end (bs=0x56171c41cc60) at ../block/io.c:541
#6  0x000056171a019eb4 in blk_drain (blk=<optimized out>) at ../block/block-backend.c:1716
#7  0x0000561719e46157 in scsi_device_purge_requests (sdev=sdev@entry=0x56171c8a2550, sense=...) at ../hw/scsi/scsi-bus.c:1638
#8  0x0000561719e47c34 in scsi_disk_reset (dev=0x56171c8a2550) at ../hw/scsi/scsi-disk.c:2254
#9  0x0000561719fcf78b in resettable_phase_hold (obj=0x56171c8a2550, opaque=<optimized out>, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:182
#10 0x0000561719fcbf74 in bus_reset_child_foreach (obj=<optimized out>, cb=0x561719fcf6e0 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
#11 0x0000561719fcf758 in resettable_child_foreach (rc=0x56171bea1b50, type=RESET_TYPE_COLD, opaque=0x0, cb=0x561719fcf6e0 <resettable_phase_hold>, obj=0x56171bf8b898) at ../hw/core/resettable.c:96
#12 resettable_phase_hold (obj=0x56171bf8b898, opaque=<optimized out>, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:173
#13 0x0000561719fcda9b in device_reset_child_foreach (obj=<optimized out>, cb=0x561719fcf6e0 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/qdev.c:317
#14 0x0000561719fcf758 in resettable_child_foreach (rc=0x56171be0f9c0, type=RESET_TYPE_COLD, opaque=0x0, cb=0x561719fcf6e0 <resettable_phase_hold>, obj=0x56171bf8b640) at ../hw/core/resettable.c:96
#15 resettable_phase_hold (obj=0x56171bf8b640, opaque=<optimized out>, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:173
#16 0x0000561719fcbf74 in bus_reset_child_foreach (obj=<optimized out>, cb=0x561719fcf6e0 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
#17 0x0000561719fcf758 in resettable_child_foreach (rc=0x56171be4a6b0, type=RESET_TYPE_COLD, opaque=0x0, cb=0x561719fcf6e0 <resettable_phase_hold>, obj=0x56171bf8b5b8) at ../hw/core/resettable.c:96
#18 resettable_phase_hold (obj=0x56171bf8b5b8, opaque=<optimized out>, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:173
#19 0x0000561719fcda9b in device_reset_child_foreach (obj=<optimized out>, cb=0x561719fcf6e0 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/qdev.c:317
#20 0x0000561719fcf758 in resettable_child_foreach (rc=0x56171be4b830, type=RESET_TYPE_COLD, opaque=0x0, cb=0x561719fcf6e0 <resettable_phase_hold>, obj=0x56171bf833b0) at ../hw/core/resettable.c:96
#21 resettable_phase_hold (obj=0x56171bf833b0, opaque=<optimized out>, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:173
#22 0x0000561719fcbf74 in bus_reset_child_foreach (obj=<optimized out>, cb=0x561719fcf6e0 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
#23 0x0000561719fcf758 in resettable_child_foreach (rc=0x56171be11760, type=RESET_TYPE_COLD, opaque=0x0, cb=0x561719fcf6e0 <resettable_phase_hold>, obj=0x56171d08e240) at ../hw/core/resettable.c:96
#24 resettable_phase_hold (obj=0x56171d08e240, opaque=<optimized out>, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:173
#25 0x0000561719fcda9b in device_reset_child_foreach (obj=<optimized out>, cb=0x561719fcf6e0 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/qdev.c:317
#26 0x0000561719fcf758 in resettable_child_foreach (rc=0x56171bf27af0, type=RESET_TYPE_COLD, opaque=0x0, cb=0x561719fcf6e0 <resettable_phase_hold>, obj=0x56171d08d8d0) at ../hw/core/resettable.c:96
#27 resettable_phase_hold (obj=0x56171d08d8d0, opaque=<optimized out>, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:173
#28 0x0000561719fcbf74 in bus_reset_child_foreach (obj=<optimized out>, cb=0x561719fcf6e0 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
#29 0x0000561719fcf758 in resettable_child_foreach (rc=0x56171be11760, type=RESET_TYPE_COLD, opaque=0x0, cb=0x561719fcf6e0 <resettable_phase_hold>, obj=0x56171c17a700) at ../hw/core/resettable.c:96
#30 resettable_phase_hold (obj=0x56171c17a700, opaque=<optimized out>, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:173
#31 0x0000561719fcda9b in device_reset_child_foreach (obj=<optimized out>, cb=0x561719fcf6e0 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/qdev.c:317
#32 0x0000561719fcf758 in resettable_child_foreach (rc=0x56171be8e9d0, type=RESET_TYPE_COLD, opaque=0x0, cb=0x561719fcf6e0 <resettable_phase_hold>, obj=0x56171c13d840) at ../hw/core/resettable.c:96
#33 resettable_phase_hold (obj=0x56171c13d840, opaque=<optimized out>, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:173
#34 0x0000561719fcbf74 in bus_reset_child_foreach (obj=<optimized out>, cb=0x561719fcf6e0 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
#35 0x0000561719fcf758 in resettable_child_foreach (rc=0x56171bec28b0, type=RESET_TYPE_COLD, opaque=0x0, cb=0x561719fcf6e0 <resettable_phase_hold>, obj=0x56171bf4bbe0) at ../hw/core/resettable.c:96
#36 resettable_phase_hold (obj=obj@entry=0x56171bf4bbe0, opaque=opaque@entry=0x0, type=type@entry=RESET_TYPE_COLD) at ../hw/core/resettable.c:173
#37 0x0000561719fcf939 in resettable_assert_reset (obj=0x56171bf4bbe0, type=<optimized out>) at ../hw/core/resettable.c:60
#38 0x0000561719fcfa05 in resettable_reset (obj=0x56171bf4bbe0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:45
#39 0x0000561719fcf532 in qemu_devices_reset () at ../hw/core/reset.c:69
#40 0x0000561719e9890f in pc_machine_reset (machine=<optimized out>) at ../hw/i386/pc.c:1948
#41 0x0000561719f02271 in qemu_system_reset (reason=reason@entry=SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET) at ../softmmu/runstate.c:443
#42 0x0000561719f02948 in main_loop_should_exit () at ../softmmu/runstate.c:688
#43 qemu_main_loop () at ../softmmu/runstate.c:722
#44 0x0000561719d365d2 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50

Tested on qemu-kvm-6.2.0-2.el8.hreitz202201111001; this issue is no longer hit.

Test Steps:
1. Create base.img and top.img
#qemu-img create -f qcow2 base.img 2G
#qemu-img create -f qcow2 top.img 2G

2. Write data to base.img
#qemu-io -c 'write 0 2147483136' base.img

3. Start VM with accel=tcg
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine q35,memory-backend=mem-machine_mem,accel=tcg \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 30720 \
    -object memory-backend-ram,size=30720M,id=mem-machine_mem \
    -smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2 \
    -cpu 'Cascadelake-Server',ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,kvm_pv_unhalt=on \
    -chardev socket,wait=off,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20220105-012140-9igI6qZZ,server=on \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,wait=off,id=qmp_id_catch_monitor,path=/tmp/monitor-catch_monitor-20220105-012140-9igI6qZZ,server=on \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idsULHUn \
    -chardev socket,wait=off,id=chardev_serial0,path=/tmp/serial-serial0-20220105-012140-9igI6qZZ,server=on \
    -device isa-serial,id=serial0,chardev=chardev_serial0 \
    -chardev socket,id=seabioslog_id_20220105-012140-9igI6qZZ,path=/tmp/seabios-20220105-012140-9igI6qZZ,server=on,wait=off \
    -device isa-debugcon,chardev=seabioslog_id_20220105-012140-9igI6qZZ,iobase=0x402 \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:d5:a4:20:d5:09,id=idVy3YIg,netdev=idXjRF2L,bus=pcie-root-port-3,addr=0x0 \
    -netdev tap,id=idXjRF2L,vhost=on \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5 \
    -monitor stdio \

4. Add throttle group
{ "execute" : "object-add", "arguments": { "qom-type" : "throttle-group","id": "thrgr","x-bps-total":16384}}

5. Add base node and top node
{'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'base','file':{'driver':'throttle','throttle-group':'thrgr','file':{'driver':'file','filename':'/home/base.img'}}}}
{"return": {}}
{'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'top','file':{'driver':'blkdebug','inject-error':[{'event':'pwritev','immediately':true,'once':true}],'image':{'driver':'file','filename':'/home/top.img'}},'backing':'base'}}
{"return": {}}

6. Add SCSI device
{'execute':'device_add','arguments':{'driver': 'virtio-scsi-pci','id':'vscsi','bus':'pcie-root-port-2'}}
{"return": {}}
{'execute':'device_add','arguments':{'driver': 'scsi-hd','bus':'vscsi.0','drive':'top'}}
{"return": {}}

7. Do stream
{'execute':'block-stream','arguments':{'device':'top','job-id':'j1'}}
{"timestamp": {"seconds": 1641980367, "microseconds": 374744}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1641980367, "microseconds": 374806}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}

8. During step 7, reset the VM
{'execute':'system_reset'}

After step 8, the block job completed with "Input/output error", and then the system_reset command was executed:
{"timestamp": {"seconds": 1642042558, "microseconds": 778476}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1642042558, "microseconds": 778526}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "j1"}}
{"timestamp": {"seconds": 1642042558, "microseconds": 778587}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 2147483648, "offset": 9961472, "speed": 0, "type": "stream", "error": "Input/output error"}}
{"timestamp": {"seconds": 1642042558, "microseconds": 778615}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1642042558, "microseconds": 778634}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}
{"timestamp": {"seconds": 1642042558, "microseconds": 783202}, "event": "RESET", "data": {"guest": false, "reason": "host-qmp-system-reset"}}
{"timestamp": {"seconds": 1642042558, "microseconds": 785442}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}

(In reply to Klaus Heinrich Kiwi from comment #9)
> Thomas, I'm a bit confused in the next steps for the virt-storage-sst team
> to assist. Is there anyone who we should tag-team to reproduce this on s390x?

Sorry for the late reply; I've been on vacation during the last few days. Anyway, it seems like Hanna reproduced this on x86 as well, so I'm clearing the NeedInfo on myself.

I'm also setting ITR to 8.6.0 now, since it sounds like there is a fix in the works and we will be able to provide it in this release, right? Hanna, Stefan, could you please adjust DTM as well?
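The QMP part of the QE reproduction above (steps 4-8) can also be replayed with a short script instead of being typed by hand. The sketch below is hypothetical and not one of this bug's attachments: the function names are illustrative; the node names, image paths, 16 KiB/s throttle limit and blkdebug rule are copied from the steps; and the monitor socket path (the `path=` of a QMP `-chardev socket`) has to be passed in. It relies only on the basic QMP exchange: read the greeting, send `qmp_capabilities`, then write one JSON command per line and read replies, setting aside asynchronous events.

```python
"""Hypothetical helper that replays steps 4-8 of the QE reproduction over QMP.

Not part of QEMU or of this bug's attachments. Node names, image paths, the
throttle limit and the blkdebug rule are copied from the steps; the monitor
socket path must be supplied by the caller.
"""
import json
import socket
from typing import Any, Dict, List, Tuple

Msg = Dict[str, Any]


def reproducer_commands(base_img: str = "/home/base.img",
                        top_img: str = "/home/top.img") -> List[Msg]:
    """QMP commands for steps 4-8, in order."""
    return [
        # Step 4: throttle group so the stream job stays busy reading base.
        {"execute": "object-add",
         "arguments": {"qom-type": "throttle-group", "id": "thrgr",
                       "x-bps-total": 16384}},
        # Step 5: throttled base node; top node whose first write fails (blkdebug).
        {"execute": "blockdev-add",
         "arguments": {"driver": "qcow2", "node-name": "base",
                       "file": {"driver": "throttle", "throttle-group": "thrgr",
                                "file": {"driver": "file", "filename": base_img}}}},
        {"execute": "blockdev-add",
         "arguments": {"driver": "qcow2", "node-name": "top",
                       "file": {"driver": "blkdebug",
                                "inject-error": [{"event": "pwritev",
                                                  "immediately": True,
                                                  "once": True}],
                                "image": {"driver": "file", "filename": top_img}},
                       "backing": "base"}},
        # Step 6: virtio-scsi controller plus a scsi-hd on the top node.
        {"execute": "device_add",
         "arguments": {"driver": "virtio-scsi-pci", "id": "vscsi",
                       "bus": "pcie-root-port-2"}},
        {"execute": "device_add",
         "arguments": {"driver": "scsi-hd", "bus": "vscsi.0", "drive": "top"}},
        # Step 7: stream base into top.
        {"execute": "block-stream", "arguments": {"device": "top", "job-id": "j1"}},
        # Step 8: reset the VM while the job is still running.
        {"execute": "system_reset"},
    ]


def qmp_command(stream, cmd: Msg) -> Tuple[Msg, List[Msg]]:
    """Send one command; return its reply plus any events that arrived first."""
    stream.write(json.dumps(cmd) + "\n")
    stream.flush()
    events: List[Msg] = []
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP connection closed (QEMU may have aborted)")
        msg = json.loads(line)
        if "event" in msg:
            events.append(msg)
        else:
            return msg, events


def replay(socket_path: str) -> bool:
    """Run steps 4-8; True if a RESET event arrives, False if QEMU goes away."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        with sock.makefile("rw", encoding="utf-8") as stream:
            json.loads(stream.readline())  # greeting: {"QMP": {...}}
            qmp_command(stream, {"execute": "qmp_capabilities"})
            events: List[Msg] = []
            for cmd in reproducer_commands():
                reply, new_events = qmp_command(stream, cmd)
                events += new_events
                if "error" in reply:
                    raise RuntimeError(f"{cmd['execute']} failed: {reply['error']}")
            # The reset itself runs later from the main loop, so wait for it.
            while not any(e.get("event") == "RESET" for e in events):
                line = stream.readline()
                if not line:
                    return False  # connection dropped: QEMU died during the reset
                msg = json.loads(line)
                if "event" in msg:
                    events.append(msg)
            return True
```

For example, `replay("/tmp/monitor-qmpmonitor1-20220105-012140-9igI6qZZ")` against the VM started in step 3. The backtrace shows qemu_system_reset() being called from main_loop_should_exit(), i.e. after the system_reset command has been dispatched, which is why the script keeps reading until a RESET event or end-of-file instead of stopping at the command's reply. Sending system_reset right after block-stream returns relies on the throttled base node to keep the job running, as footnote [4] of Hanna's recipe describes.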
After deliberating for much too long, I think it’s indeed better if you do the backport, Stefan, just because it’ll be better if I do a downstream review of your fix rather than you reviewing your own patch. O:) (And so I’ll also leave the DTM to you.)

QE bot (pre-verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 tests pass.

Tested with qemu-kvm-6.2.0-6.module+el8.6.0+14165+5e5e76ac; the problem has been resolved.

Test Steps:
1. Create base.img and top.img
#qemu-img create -f qcow2 base.img 2G
#qemu-img create -f qcow2 top.img 2G

2. Write data to base.img
#qemu-io -c 'write 0 2147483136' base.img

3. Start VM with accel=tcg
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine q35,memory-backend=mem-machine_mem,accel=tcg \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 30720 \
    -object memory-backend-ram,size=30720M,id=mem-machine_mem \
    -smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2 \
    -cpu 'Cascadelake-Server',ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,kvm_pv_unhalt=on \
    -chardev socket,wait=off,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20220105-012140-9igI6qZZ,server=on \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,wait=off,id=qmp_id_catch_monitor,path=/tmp/monitor-catch_monitor-20220105-012140-9igI6qZZ,server=on \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idsULHUn \
    -chardev socket,wait=off,id=chardev_serial0,path=/tmp/serial-serial0-20220105-012140-9igI6qZZ,server=on \
    -device isa-serial,id=serial0,chardev=chardev_serial0 \
    -chardev socket,id=seabioslog_id_20220105-012140-9igI6qZZ,path=/tmp/seabios-20220105-012140-9igI6qZZ,server=on,wait=off \
    -device isa-debugcon,chardev=seabioslog_id_20220105-012140-9igI6qZZ,iobase=0x402 \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:d5:a4:20:d5:09,id=idVy3YIg,netdev=idXjRF2L,bus=pcie-root-port-3,addr=0x0 \
    -netdev tap,id=idXjRF2L,vhost=on \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5 \
    -monitor stdio \

4. Add throttle group
{ "execute" : "object-add", "arguments": { "qom-type" : "throttle-group","id": "thrgr","x-bps-total":16384}}

5. Add base node and top node
{'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'base','file':{'driver':'throttle','throttle-group':'thrgr','file':{'driver':'file','filename':'/home/base.img'}}}}
{"return": {}}
{'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'top','file':{'driver':'blkdebug','inject-error':[{'event':'pwritev','immediately':true,'once':true}],'image':{'driver':'file','filename':'/home/top.img'}},'backing':'base'}}
{"return": {}}

6. Add SCSI device
{'execute':'device_add','arguments':{'driver': 'virtio-scsi-pci','id':'vscsi','bus':'pcie-root-port-2'}}
{"return": {}}
{'execute':'device_add','arguments':{'driver': 'scsi-hd','bus':'vscsi.0','drive':'top'}}
{"return": {}}

7. Do stream
{'execute':'block-stream','arguments':{'device':'top','job-id':'j1'}}
{"timestamp": {"seconds": 1644393766, "microseconds": 506482}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1644393766, "microseconds": 506544}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}

8. During step 7, reset the VM
{'execute':'system_reset'}

After step 8, the block job completed with an input/output error:
{"timestamp": {"seconds": 1644393774, "microseconds": 411343}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1644393774, "microseconds": 411386}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "j1"}}
{"timestamp": {"seconds": 1644393774, "microseconds": 411439}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 2147483648, "offset": 0, "speed": 0, "type": "stream", "error": "Input/output error"}}
{"timestamp": {"seconds": 1644393774, "microseconds": 411466}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1644393774, "microseconds": 411486}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}
{"timestamp": {"seconds": 1644393774, "microseconds": 415731}, "event": "RESET", "data": {"guest": false, "reason": "host-qmp-system-reset"}}
{"timestamp": {"seconds": 1644393774, "microseconds": 418562}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}

Per comment 24 and comment 25, setting the bug's status to "VERIFIED".

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1759