Bug 1650975

Summary: hw/scsi/scsi-bus.c:1374: scsi_req_complete: Assertion `req->status == -1' failed.
Product: Fedora
Component: qemu
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Reporter: Richard W.M. Jones <rjones>
Assignee: Fedora Virtualization Maintainers <virt-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: amit, berrange, cfergeau, crobinso, dwmw2, itamar, pbonzini, rjones, virt-maint
Type: Bug
Last Closed: 2019-03-07 07:52:31 UTC
Clones: 1665903
Bug Blocks: 910269, 1665903
Attachments: build.log (flags: none)

Description Richard W.M. Jones 2018-11-18 11:37:05 UTC
Description of problem:

qemu in Rawhide fails when I test injecting EIO errors into requests
using nbdkit:

nbdkit: memory[1]: debug: error: pread count=1024 offset=102400 flags=0x0
nbdkit: memory[1]: error: injecting EIO error into pread
nbdkit: memory[1]: debug: sending error reply: Input/output error
qemu-system-x86_64: /builddir/build/BUILD/qemu-3.1.0-rc1/hw/scsi/scsi-bus.c:1374: scsi_req_complete: Assertion `req->status == -1' failed.

Version-Release number of selected component (if applicable):

qemu 2:3.1.0-0.1.rc1.fc30

How reproducible:

Unknown.

Steps to Reproduce:
1. Unknown; I'll try to come up with a reproducer if I can make one work locally.

Comment 1 Richard W.M. Jones 2018-11-18 11:37:52 UTC
Created attachment 1506918: build.log

build.log from Koji showing the failure

Comment 2 Richard W.M. Jones 2018-11-18 11:53:50 UTC
Actually, yes, this is easily reproducible with qemu from git.

(1) nbdkit -f -v --filter=error memory size=64M error-rate=100%

(2) x86_64-softmmu/qemu-system-x86_64 -device virtio-scsi,id=scsi -drive file=nbd:localhost:10809,format=raw,id=hd0,if=none -device scsi-hd,drive=hd0

qemu-system-x86_64: hw/scsi/scsi-bus.c:1374: scsi_req_complete: Assertion `req->status == -1' failed.
Aborted (core dumped)

Stack trace:

(gdb) bt
#0  0x00007f7f18d4253f in raise () at /lib64/libc.so.6
#1  0x00007f7f18d2c895 in abort () at /lib64/libc.so.6
#2  0x00007f7f18d2c769 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3  0x00007f7f18d3a9f6 in .annobin_assert.c_end () at /lib64/libc.so.6
#4  0x000055ce0f920fb0 in scsi_req_complete (req=<optimized out>, status=<optimized out>) at hw/scsi/scsi-bus.c:1374
#5  0x000055ce0f91b850 in scsi_dma_complete_noio (r=0x55ce116ea090, ret=<optimized out>) at hw/scsi/scsi-disk.c:281
#6  0x000055ce0f91b8ff in scsi_dma_complete (opaque=0x55ce116ea090, ret=-5)
    at hw/scsi/scsi-disk.c:302
#7  0x000055ce0f8103c7 in dma_complete (ret=-5, dbs=0x55ce11d36c00)
    at dma-helpers.c:116
#8  0x000055ce0f8103c7 in dma_blk_cb (opaque=0x55ce11d36c00, ret=-5)
    at dma-helpers.c:138
#9  0x000055ce0fa42cce in blk_aio_complete (acb=0x55ce10d36300)
    at block/block-backend.c:1345
#10 0x000055ce0fafce6b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116
#11 0x00007f7f18d58200 in __start_context () at /lib64/libc.so.6
#12 0x00007fff50b87130 in  ()
#13 0x0000000000000000 in  ()

Comment 3 Richard W.M. Jones 2018-11-18 15:11:57 UTC
40dce4ee61c68395f6d463fae792f61b7c003bce is the first bad commit
commit 40dce4ee61c68395f6d463fae792f61b7c003bce
Author: Paolo Bonzini <pbonzini>
Date:   Sat Oct 13 11:52:34 2018 +0200

    scsi-disk: fix rerror/werror=ignore
    
    rerror=ignore was returning true from scsi_handle_rw_error but the callers were not
    calling scsi_req_complete when rerror=ignore returns true (this is the correct thing
    to do when true is returned after executing a passthrough command).  Fix this by
    calling it in scsi_handle_rw_error.
    
    Signed-off-by: Paolo Bonzini <pbonzini>

:040000 040000 311386b9b91d77840a849459ab6ae41a37fd7f42 8adcda67d7487bcc18966f096c9923da3b8dc0b9 M	hw
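
One plausible reading of the backtrace together with this commit message is a double completion: the error handler completes the request itself, but its return value does not stop the caller, which then completes the same request again and trips the `req->status == -1` assertion. The sketch below is a toy model of that pattern only. It is not QEMU code: `ToyReq`, `toy_req_complete`, `toy_handle_error`, `toy_io_done` and the `TOY_*` constants are hypothetical names, and it assumes that a status of -1 means "not completed yet" and that completing a request overwrites that value.

```c
#include <assert.h>   /* for callers that want to assert on the results */
#include <stdbool.h>

/* Hypothetical stand-in for a SCSI request; not QEMU's SCSIRequest. */
typedef struct {
    int status;   /* assumed: -1 until the request has been completed */
} ToyReq;

/* Hypothetical status values used only by this model. */
enum { TOY_GOOD = 0, TOY_CHECK_CONDITION = 2 };

/*
 * Complete a request once. Returns false in the situation where the
 * reported code would fail assert(req->status == -1), i.e. when the
 * request has already been completed.
 */
static bool toy_req_complete(ToyReq *req, int status)
{
    if (req->status != -1) {
        return false;   /* second completion of the same request */
    }
    req->status = status;
    return true;
}

/*
 * Hypothetical error handler: it completes the request itself and
 * returns whether the caller should stop processing.
 */
static bool toy_handle_error(ToyReq *req, bool tell_caller_to_stop)
{
    toy_req_complete(req, TOY_CHECK_CONDITION);
    return tell_caller_to_stop;
}

/*
 * Caller pattern: on error, ask the handler; if it does not say "stop",
 * fall through and complete the request as successful.
 * Returns true if the request was completed exactly once.
 */
static bool toy_io_done(ToyReq *req, int ret, bool handler_stops_caller)
{
    if (ret < 0 && toy_handle_error(req, handler_stops_caller)) {
        return true;
    }
    return toy_req_complete(req, TOY_GOOD);
}
```

In this model, calling `toy_io_done` with `ret = -5` (the EIO value seen in the trace) and a handler that does not stop the caller returns false, which corresponds to the point where the real assertion would abort. With a handler that stops the caller, the request is completed exactly once.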

Comment 4 Cole Robinson 2018-11-19 15:00:22 UTC
Rich reported this upstream: https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg03508.html

Comment 5 Richard W.M. Jones 2018-11-20 22:16:43 UTC
Reported upstream:
https://bugs.launchpad.net/qemu/+bug/1804323

Comment 6 Richard W.M. Jones 2019-03-07 07:52:31 UTC
This was fixed in 3.1.0-rc3.  Since 3.1.0 (final) was released a few
months ago and is present in Fedora 30 and Rawhide, I'm going to close
this bug now.