Bug 1854811
Summary: scsi-bus.c: use-after-free due to race between device unplug and I/O operation causes guest crash
Product: Red Hat Enterprise Linux Advanced Virtualization
Component: qemu-kvm (sub components: virtio-blk, scsi)
Version: 8.2
Target Release: 8.3
Target Milestone: rc
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Keywords: Triaged
Type: Bug
Reporter: Prasad Pandit <ppandit>
Assignee: Maxim Levitsky <mlevitsk>
QA Contact: qing.wang <qinwang>
CC: coli, hhan, jinzhao, juzhang, mlevitsk, qinwang, virt-maint
Fixed In Version: qemu-kvm-5.2.0-6.module+el8.4.0+9871+53903be9
Last Closed: 2021-05-25 06:42:26 UTC
Description (Prasad Pandit, 2020-07-08 09:41:16 UTC)
Proposed fix patch from Paolo Bonzini:

> I think this is simpler than the issue that Maxim is working on.
> Wenxiang, would this fix your PoC?

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 1c980cab38..1b0cf91532 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -137,6 +137,7 @@ static void scsi_dma_restart_bh(void *opaque)
         scsi_req_unref(req);
     }
     aio_context_release(blk_get_aio_context(s->conf.blk));
+    object_unref(OBJECT(s));
 }
 
 void scsi_req_retry(SCSIRequest *req)
@@ -155,6 +156,8 @@ static void scsi_dma_restart_cb(void *opaque, int running, RunState state)
     }
     if (!s->bh) {
         AioContext *ctx = blk_get_aio_context(s->conf.blk);
+        /* The reference is dropped in scsi_dma_restart_bh. */
+        object_ref(OBJECT(s));
         s->bh = aio_bh_new(ctx, scsi_dma_restart_bh, s);
         qemu_bh_schedule(s->bh);
     }

> Thanks,
> Paolo

qing.wang:
Hi, could you please describe how the customer reproduces this issue? For example, the qemu command line used to create the VM, and which operations may trigger it.

Prasad Pandit:
Hello Qing,

(In reply to qing.wang from comment #2)
The attachment here contains 3 files: d.xml, disk1.xml and poc.sh. d.xml has the guest configuration and command line parameters in it. The guest starts with

  $ virsh create --console d.xml

We need to edit d.xml and disk1.xml to set the local guest image and qemu paths as described above. Hope it helps. Thank you.

Maxim Levitsky:
I tried to reproduce this with the latest qemu (which contains my and Paolo's scsi/rcu work), and I haven't been able to hit it yet; however, I do think that, at least in theory, the race is still there. For the use-after-free to happen, this sequence of events should still be possible:

1. The 'vm continue' event schedules scsi_dma_restart_bh. (This has to happen before the scsi device is unrealized, because the first thing scsi_qdev_unrealize does is remove the VM state change callback that schedules scsi_dma_restart_bh.)
2. The scsi device is unrealized, dropped off the bus, and scheduled for removal by the RCU callback.
3. The RCU thread callback frees the scsi device.
4. For some reason, the bottom half only runs now.

I'll send Paolo's patch upstream to discuss it there.

Best regards,
Maxim Levitsky
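To make this ordering concrete, the sequence below is a minimal sketch of a trigger, not the customer's poc.sh. It assumes the QMP socket (localhost:5955) and the hotplugged device id scsi_disk1 from the verification setup further down, and that the scsi-hd drive was configured with werror=stop,rerror=stop and has already hit an I/O error, so the VM is stopped with requests queued for retry:

# Hedged sketch: resuming the guest schedules scsi_dma_restart_bh (step 1);
# unplugging the device right away lets unrealize and the RCU free (steps
# 2-3) win the race if the bottom half only runs afterwards (step 4).
exec 3<>/dev/tcp/localhost/5955
read greeting <&3                                   # QMP greeting banner
echo '{"execute":"qmp_capabilities"}' >&3 ; read r <&3
echo '{"execute":"cont"}' >&3 ; read r <&3
echo '{"execute":"device_del","arguments":{"id":"scsi_disk1"}}' >&3
read r <&3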
Resolved by qemu-kvm upstream commit cfd4e36352d4426221aa94da44a172da1aaa741b.

Setting ITM=13 under the assumption Maxim will be able to post the downstream patch soon. We will need a qa_ack+ too, please. Feel free to move the ITM I chose to a later value.

Yep.

qing.wang:
Tested on Red Hat Enterprise Linux release 8.4 Beta (Ootpa)
4.18.0-287.el8.x86_64
qemu-kvm-common-5.2.0-6.module+el8.4.0+9871+53903be9.x86_64

Test steps refer to https://bugzilla.redhat.com/show_bug.cgi?id=1812399#c27

Scenario 1:

1. Boot the VM:

virsh define pc.xml; virsh start pc

2. Hotplug and unplug a disk repeatedly:

while true ; do virsh attach-device pc disk.xml ; virsh detach-device pc disk.xml ; done

Ran for over 10 hours; no crash observed.

Scenario 2:

1. Create image files stg0 through stg40:

qemu-img create -f qcow2 /home/kvm_autotest_root/images/stg0.qcow2 1G
...
qemu-img create -f qcow2 /home/kvm_autotest_root/images/stg40.qcow2 1G
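The elided commands follow the same pattern; an equivalent loop (a sketch assuming bash and the same paths) is:

# Create all 41 1G qcow2 images used below in one go.
for i in $(seq 0 40) ; do
    qemu-img create -f qcow2 /home/kvm_autotest_root/images/stg$i.qcow2 1G
done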
2. Boot the VM:

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine pc \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pci.0,addr=0x2,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x3 \
    -m 2048 \
    -smp 12,maxcpus=12,cores=6,threads=1,sockets=2 \
    -device pcie-root-port,id=pcie-root-port-1,bus=pci.0,chassis=2 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,bus=pci.0,chassis=3 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -object iothread,id=iothread1 \
    -device virtio-scsi,id=scsi0 \
    -device virtio-scsi,id=scsi1,iothread=iothread1 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel831-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
    -blockdev node-name=test_disk0,driver=file,filename=/home/kvm_autotest_root/images/stg0.qcow2 \
    -device scsi-hd,drive=test_disk0,bus=scsi1.0,bootindex=-1,id=scsi_disk0,channel=0,scsi-id=0,lun=0,share-rw \
    -blockdev node-name=test_disk1,driver=file,filename=/home/kvm_autotest_root/images/stg1.qcow2 \
    -blockdev node-name=test_disk2,driver=file,filename=/home/kvm_autotest_root/images/stg2.qcow2 \
    -blockdev node-name=test_disk3,driver=file,filename=/home/kvm_autotest_root/images/stg3.qcow2 \
    -blockdev node-name=test_disk4,driver=file,filename=/home/kvm_autotest_root/images/stg4.qcow2 \
    -blockdev node-name=test_disk5,driver=file,filename=/home/kvm_autotest_root/images/stg5.qcow2 \
    -blockdev node-name=test_disk6,driver=file,filename=/home/kvm_autotest_root/images/stg6.qcow2 \
    -blockdev node-name=test_disk7,driver=file,filename=/home/kvm_autotest_root/images/stg7.qcow2 \
    -blockdev node-name=test_disk8,driver=file,filename=/home/kvm_autotest_root/images/stg8.qcow2 \
    -blockdev node-name=test_disk9,driver=file,filename=/home/kvm_autotest_root/images/stg9.qcow2 \
    -blockdev node-name=test_disk10,driver=file,filename=/home/kvm_autotest_root/images/stg10.qcow2 \
    -blockdev node-name=test_disk11,driver=file,filename=/home/kvm_autotest_root/images/stg11.qcow2 \
    -blockdev node-name=test_disk12,driver=file,filename=/home/kvm_autotest_root/images/stg12.qcow2 \
    -blockdev node-name=test_disk13,driver=file,filename=/home/kvm_autotest_root/images/stg13.qcow2 \
    -blockdev node-name=test_disk14,driver=file,filename=/home/kvm_autotest_root/images/stg14.qcow2 \
    -blockdev node-name=test_disk15,driver=file,filename=/home/kvm_autotest_root/images/stg15.qcow2 \
    -blockdev node-name=test_disk16,driver=file,filename=/home/kvm_autotest_root/images/stg16.qcow2 \
    -blockdev node-name=test_disk17,driver=file,filename=/home/kvm_autotest_root/images/stg17.qcow2 \
    -blockdev node-name=test_disk18,driver=file,filename=/home/kvm_autotest_root/images/stg18.qcow2 \
    -blockdev node-name=test_disk19,driver=file,filename=/home/kvm_autotest_root/images/stg19.qcow2 \
    -blockdev node-name=test_disk20,driver=file,filename=/home/kvm_autotest_root/images/stg20.qcow2 \
    -blockdev node-name=test_disk21,driver=file,filename=/home/kvm_autotest_root/images/stg21.qcow2 \
    -blockdev node-name=test_disk22,driver=file,filename=/home/kvm_autotest_root/images/stg22.qcow2 \
    -blockdev node-name=test_disk23,driver=file,filename=/home/kvm_autotest_root/images/stg23.qcow2 \
    -blockdev node-name=test_disk24,driver=file,filename=/home/kvm_autotest_root/images/stg24.qcow2 \
    -blockdev node-name=test_disk25,driver=file,filename=/home/kvm_autotest_root/images/stg25.qcow2 \
    -blockdev node-name=test_disk26,driver=file,filename=/home/kvm_autotest_root/images/stg26.qcow2 \
    -blockdev node-name=test_disk27,driver=file,filename=/home/kvm_autotest_root/images/stg27.qcow2 \
    -blockdev node-name=test_disk28,driver=file,filename=/home/kvm_autotest_root/images/stg28.qcow2 \
    -blockdev node-name=test_disk29,driver=file,filename=/home/kvm_autotest_root/images/stg29.qcow2 \
    -blockdev node-name=test_disk30,driver=file,filename=/home/kvm_autotest_root/images/stg30.qcow2 \
    -blockdev node-name=test_disk31,driver=file,filename=/home/kvm_autotest_root/images/stg31.qcow2 \
    -blockdev node-name=test_disk32,driver=file,filename=/home/kvm_autotest_root/images/stg32.qcow2 \
    -blockdev node-name=test_disk33,driver=file,filename=/home/kvm_autotest_root/images/stg33.qcow2 \
    -blockdev node-name=test_disk34,driver=file,filename=/home/kvm_autotest_root/images/stg34.qcow2 \
    -blockdev node-name=test_disk35,driver=file,filename=/home/kvm_autotest_root/images/stg35.qcow2 \
    -blockdev node-name=test_disk36,driver=file,filename=/home/kvm_autotest_root/images/stg36.qcow2 \
    -blockdev node-name=test_disk37,driver=file,filename=/home/kvm_autotest_root/images/stg37.qcow2 \
    -blockdev node-name=test_disk38,driver=file,filename=/home/kvm_autotest_root/images/stg38.qcow2 \
    -blockdev node-name=test_disk39,driver=file,filename=/home/kvm_autotest_root/images/stg39.qcow2 \
    -blockdev node-name=test_disk40,driver=file,filename=/home/kvm_autotest_root/images/stg40.qcow2 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,bus=pci.0,chassis=4 \
    -device virtio-net-pci,mac=9a:21:f7:4a:1e:bd,id=idRuZxfv,netdev=idOpPVAe,bus=pcie-root-port-3,addr=0x0 \
    -netdev tap,id=idOpPVAe,vhost=on \
    -rtc base=localtime,clock=host,driftfix=slew \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -vnc :5 \
    -device pcie-root-port,id=pcie_extra_root_port_0,bus=pci.0 \
    -monitor stdio \
    -chardev file,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpdbg.log,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -qmp tcp:0:5955,server,nowait \
    -chardev file,path=/var/tmp/monitor-serialdbg.log,id=serial_id_serial0 \
    -device isa-serial,chardev=serial_id_serial0

3. Log into the guest and run sg_luns in multiple parallel instances:

trap 'kill $(jobs -p)' EXIT SIGINT
for i in `seq 0 32` ; do
    while true ; do
        # sg_luns /dev/sdb > /dev/null 2>&1
        sg_luns /dev/sdb
    done &
done
echo "wait"
wait

4. Hotplug and unplug the disks repeatedly, every 3 seconds (see the note on confirming removals after the script):

NUM_LUNS=40

add_devices() {
    exec 3<>/dev/tcp/localhost/5955
    echo "$@"
    echo -e "{'execute':'qmp_capabilities'}" >&3
    read response <&3
    echo $response
    for i in $(seq 1 $NUM_LUNS) ; do
        cmd="{'execute':'device_add', 'arguments': {'driver':'scsi-hd','drive':'test_disk$i','id':'scsi_disk$i','bus':'scsi1.0','lun':$i}}"
        echo "$cmd"
        echo -e "$cmd" >&3
        read response <&3
        echo "$response"
    done
}

remove_devices() {
    exec 3<>/dev/tcp/localhost/5955
    echo "$@"
    echo -e "{'execute':'qmp_capabilities'}" >&3
    read response <&3
    echo $response
    for i in $(seq 1 $NUM_LUNS) ; do
        cmd="{'execute':'device_del', 'arguments': {'id':'scsi_disk$i'}}"
        echo "$cmd"
        echo -e "$cmd" >&3
        read response <&3
        echo "$response"
    done
}

while true ; do
    echo "adding devices"
    add_devices
    sleep 3
    echo "removing devices"
    remove_devices
    sleep 3
done
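Note that device_del only requests removal; the removal itself is asynchronous and completes with a DEVICE_DELETED event. A hypothetical helper (not part of the original test) that drains QMP traffic until the deletion is confirmed could look like:

wait_deleted() {
    # fd 3 is the QMP socket opened by remove_devices above; read lines
    # until the asynchronous DEVICE_DELETED event arrives, or give up
    # after 5 seconds per read.
    while read -t 5 line <&3 ; do
        case "$line" in
            *DEVICE_DELETED*) echo "$line" ; break ;;
        esac
    done
}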
Ran for over 10 hours; no crash observed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098