Bug 1869994 - qemu-kvm is crashing with error "virtio_scsi_ctx_check: Assertion `blk_get_aio_context(d->conf.blk) == s->ctx' failed."
Keywords:
Status: CLOSED DUPLICATE of bug 1844343
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.2
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: 8.3
Assignee: Sergio Lopez
QA Contact: qing.wang
URL:
Whiteboard:
Depends On: 1812399 1888131
Blocks:
Reported: 2020-08-19 08:02 UTC by nijin ashok
Modified: 2020-10-28 01:35 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-21 05:57:00 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 5341841 | 0 | None | None | None | 2020-08-24 07:35:52 UTC

Description nijin ashok 2020-08-19 08:02:10 UTC
Description of problem:

I created a loop that repeatedly detaches and re-attaches a SCSI disk on the VM.

===
# a=1;while true;do let a=a+1;echo "a=$a";virsh  -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf detach-device centos_2 disk.xml; virsh  -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf attach-device centos_2 disk.xml;done

# cat disk.xml 

<disk type='block' device='disk' snapshot='no'>
  <driver name='qemu' type='raw' cache='none' io='threads'/>
  <source dev='/rhev/data-center/mnt/blockSD/16533a2b-532f-40e6-9321-fb3fad7b35aa/images/e24ea41a-ac7d-44b9-975c-31696d234c67/8a4421cf-b446-4e26-a9b9-10c96a4f13ad' index='3'>
    <seclabel model='dac' relabel='no'/>
  </source>
  <target dev='sdb' bus='scsi'/>
  <serial>e24ea41a-ac7d-44b9-975c-31696d234c67</serial>
  <alias name='ua-e24ea41a-ac7d-44b9-975c-31696d234c67'/>
  <address type='drive' controller='0' bus='0' target='0' unit='9'/>
</disk>
===

The VM crashed with the error below on the 15th attempt.

===
a=15
Device detached successfully

error: Failed to attach device from disk.xml
error: Unable to read from monitor: Connection reset by peer

===

Backtrace:

===
(gdb) bt
#0  0x00007f835b85570f in raise () from /lib64/libc.so.6
#1  0x00007f835b83fb25 in abort () from /lib64/libc.so.6
#2  0x00007f835b83f9f9 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007f835b84dcc6 in __assert_fail () from /lib64/libc.so.6
#4  0x000055f494a78df8 in virtio_scsi_ctx_check (d=0x55f4977a6aa0, s=<optimized out>, s=<optimized out>)
    at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/hw/scsi/virtio-scsi.c:250
#5  virtio_scsi_ctx_check (s=0x55f498e96180, s=0x55f498e96180, d=0x55f4977a6aa0) at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/hw/scsi/virtio-scsi.c:247
#6  virtio_scsi_handle_cmd_req_prepare (req=0x7f8344016d10, s=0x55f498e96180) at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/hw/scsi/virtio-scsi.c:569
#7  virtio_scsi_handle_cmd_vq (s=s@entry=0x55f498e96180, vq=vq@entry=0x7f83500d1140) at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/hw/scsi/virtio-scsi.c:612
#8  0x000055f494a79abe in virtio_scsi_data_plane_handle_cmd (vdev=<optimized out>, vq=0x7f83500d1140)
    at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/hw/scsi/virtio-scsi-dataplane.c:60
#9  0x000055f494a8764e in virtio_queue_notify_aio_vq (vq=<optimized out>) at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/hw/virtio/virtio.c:2243
#10 0x000055f494d4bb12 in aio_dispatch_handlers (ctx=ctx@entry=0x55f4975a25e0) at util/aio-posix.c:429
#11 0x000055f494d4c717 in aio_poll (ctx=0x55f4975a25e0, blocking=blocking@entry=true) at util/aio-posix.c:731
#12 0x000055f494b25744 in iothread_run (opaque=0x55f49756d660) at iothread.c:75
#13 0x000055f494d4e734 in qemu_thread_start (args=0x55f49759f5f0) at util/qemu-thread-posix.c:519
#14 0x00007f835bbe82de in start_thread () from /lib64/libpthread.so.0
#15 0x00007f835b919e83 in clone () from /lib64/libc.so.6
===

The issue was first observed in a customer environment, on a Commvault agent VM in RHV, where many disks are plugged into and unplugged from the agent VM while other VMs are being backed up. The customer is running qemu-kvm-rhev-2.12.0-33.el7_7.4.x86_64; however, I can consistently reproduce the issue on qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64, which contains the fix for bug 1764120.


Version-Release number of selected component (if applicable):

qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64

# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.2 (Ootpa)

# uname -r
4.18.0-193.14.3.el8_2.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create 10 or more disks and attach them to the VM.
2. Detach and re-attach any one of the disks in a loop, as shown above.
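
For anyone re-running step 2, here is a more readable sketch of the same detach/attach loop as a small shell function. It is not the command that was run for this report. The iteration cap, the VIRSH and URI override variables, the function name hotplug_loop, and stopping at the first failed attach are all assumptions added for illustration. The connection URI, domain name and disk.xml are taken from the description above.

```shell
#!/usr/bin/env bash
# Sketch of the hot-unplug/hot-plug stress loop from the description.
# Assumptions (not in the original report): VIRSH/URI override variables,
# the iteration cap, and returning non-zero on the first failed attach.

VIRSH=${VIRSH:-virsh}
URI=${URI:-'qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'}

hotplug_loop() {
    local domain="$1" xml="$2" max_iter="${3:-100}" i=1
    while [ "$i" -le "$max_iter" ]; do
        echo "iteration=$i"
        # A failed detach is tolerated: the disk may already be unplugged.
        "$VIRSH" -c "$URI" detach-device "$domain" "$xml" || true
        # A failed attach is where the report saw the monitor connection reset.
        if ! "$VIRSH" -c "$URI" attach-device "$domain" "$xml"; then
            echo "attach failed at iteration $i" >&2
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}

# Example (not executed here): hotplug_loop centos_2 disk.xml 50
```

Stopping at the first failed attach, rather than looping forever as the original one-liner does, makes it easier to see at which iteration the qemu-kvm process went away.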


Actual results:

qemu-kvm crashes with the error "virtio_scsi_ctx_check: Assertion `blk_get_aio_context(d->conf.blk) == s->ctx' failed."

Expected results:

The VM should not crash.

Additional info:

Comment 4 CongLi 2020-08-19 08:13:11 UTC
From the dump info, seems a dup to BZ1844343.

Comment 5 CongLi 2020-08-19 08:17:01 UTC
(In reply to CongLi from comment #4)
> From the dump info, seems a dup to BZ1844343.

Hi John,

Could a developer please check whether this is the same issue as BZ1844343?

Thanks.

Comment 6 John Ferlan 2020-08-19 11:01:42 UTC
Sergio, could you take a look?

The upstream series referenced in https://bugzilla.redhat.com/show_bug.cgi?id=1844343#c11 is ready for merge from https://bugzilla.redhat.com/show_bug.cgi?id=1812399#c12 into qemu-5.2. In bug 1844343 you note we may want to keep both bugs, but perhaps that isn't necessary. I've added Maxim as a CC here just so he's aware of this. 

To set expectations: a backport into RHEL-AV 8.2.z could be rather tricky, because the change in 1812399 is applied after a rather large, disruptive series addressing device instantiation for qemu-5.1.

Comment 7 CongLi 2020-08-24 02:10:02 UTC
(In reply to John Ferlan from comment #6)

Hi Sergio,

Could you please help confirm this? It is a SEV 1 bug.

Thanks.

Comment 8 Sergio Lopez 2020-08-24 07:28:01 UTC
Yes, this is the same issue as BZ1844343, which shares the root cause with BZ1812399, so we need Maxim's patch series to fix all three BZs.

As John pointed out, backporting this won't be easy. We'd either have to backport a very long series of dependencies, or aim for a dirty backport with lots of contextual changes.

Thanks,
Sergio.

