Bug 1043069 - CVE-2013-6458 libvirtd crashes when swapping disks in qemu guest multiple times - qemuMonitorJSONGetBlockStatsInfo segfault [rhel-6.6]
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.4
Hardware: x86_64 Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Jiri Denemark
QA Contact: Virtualization Bugs
Keywords: Security, SecurityTracking, Upstream
Depends On:
Blocks: CVE-2013-6458 1113828
Reported: 2013-12-13 16:49 EST by Alexandre M
Modified: 2014-10-14 00:19 EDT
CC List: 11 users

See Also:
Fixed In Version: libvirt-0.10.2-30.el6
Doc Type: Release Note
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-14 00:19:13 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments
contains 4 files (7.02 KB, application/x-gzip) - 2013-12-13 16:49 EST, Alexandre M
linux_qemu.log (8.69 KB, text/plain) - 2013-12-16 04:56 EST, Jiri Denemark
linux_vm_gdb.txt (11.22 KB, text/plain) - 2013-12-16 04:58 EST, Jiri Denemark
win7_mv_gdb.txt (13.76 KB, text/plain) - 2013-12-16 04:58 EST, Jiri Denemark
win7_qemu.log (5.19 KB, text/plain) - 2013-12-16 04:59 EST, Jiri Denemark
libvirtd.log file (102.42 KB, text/plain) - 2013-12-16 15:21 EST, Alexandre M
Python scripts for reproducing bug (10.00 KB, application/x-tar) - 2014-02-11 22:08 EST, Hu Jianwei

Description Alexandre M 2013-12-13 16:49:49 EST
Created attachment 836510 [details]
contains 4 files

Description of problem:

We are facing a common issue here where libvirtd constantly crashes after attaching and detaching a disk multiple times on a qemu guest (Windows 7 or Linux).

The problem arises in our production OpenStack cluster, where we have scripts in place that continuously test EBS attachments on running instances.

The problem seems to happen in qemuDomainBlockStats() of qemu/qemu_driver.c, where the disk parameter (dev_name) passed to qemuMonitorJSONGetBlockStatsInfo becomes NULL.

I will let you take a deeper look at the segfault information for analysis.

I have attached 2 separate GDB log sessions with backtraces after the segfault. One is for a Windows guest and the other for a Linux guest.

I have also attached the qemu logs of both VMs.

Version-Release number of selected component (if applicable):

# cat /etc/redhat-release
CentOS release 6.4 (Final)
# uname -a
Linux msr-ostck-cmp39.xxx.xxx 2.6.32-358.6.2.el6.x86_64 #1 SMP Thu May 16 20:59:36 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

# rpm -qa | grep libvirt
libvirt-0.10.2-29.el6.1.x86_64
libvirt-debuginfo-0.10.2-29.el6.1.x86_64
libvirt-client-0.10.2-29.el6.1.x86_64
libvirt-python-0.10.2-29.el6.1.x86_64
libvirt-devel-0.10.2-29.el6.1.x86_64


How reproducible:

Attach and detach an EBS disk in a loop on an OpenStack instance. Eventually libvirtd will crash with a segfault on the disk operation.
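
For illustration, a condensed sketch of this kind of workload written directly against the libvirt python bindings (rather than through the OpenStack/EBS layer) is shown below. This is not the original test script: the domain name is taken from the domain XML in this report, the disk XML mirrors the one used later in comment 25, and the paths are only examples.

import threading, time
import libvirt

DISK_XML = """
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/var/lib/libvirt/images/disk1.img'/>
  <target dev='vdb' bus='virtio'/>
</disk>
"""

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("instance-msr-0002eb77")   # domain name from this report

def attach_detach():
    # Thread 1: keep hot-plugging and hot-unplugging the disk on the running domain.
    while True:
        for op in (dom.attachDeviceFlags, dom.detachDeviceFlags):
            try:
                op(DISK_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)
            except libvirt.libvirtError:
                pass                    # keep cycling even if one operation fails
            time.sleep(1)

def poll_stats():
    # Thread 2: keep asking for block statistics of the same disk.  This is the
    # call that ends up in qemuDomainBlockStats()/qemuMonitorJSONGetBlockStatsInfo().
    while True:
        try:
            print dom.blockStats("vdb")
        except libvirt.libvirtError:
            pass                        # expected while the disk is detached

worker = threading.Thread(target=attach_detach)
worker.daemon = True
worker.start()
poll_stats()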


QEMU DOMAIN INFO:

<domain type='kvm' id='3'>
  <name>instance-msr-0002eb77</name>
  <uuid>40ad52d1-22be-4359-8bfd-5cb3ea3c9b23</uuid>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>Red Hat Inc.</entry>
      <entry name='product'>OpenStack Nova</entry>
      <entry name='version'>2013.1.2-4.el6</entry>
      <entry name='serial'>c14b272b-34a4-ea25-1e75-5f60847ca93b</entry>
      <entry name='uuid'>40ad52d1-22be-4359-8bfd-5cb3ea3c9b23</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='rhel6.4.0'>hvm</type>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model'>
    <model fallback='allow'/>
    <topology sockets='1' cores='1' threads='1'/>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/nova/instances/40ad52d1-22be-4359-8bfd-5cb3ea3c9b23/disk'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/disk/by-path/ip-192.168.0.11:3260-iscsi-iqn.2010-10.org.openstack:volume-8d2304a2-a67c-477e-9c26-9c4587a8bebf-lun-1'/>
      <target dev='vdxdgapd' bus='virtio'/>
      <serial>8d2304a2-a67c-477e-9c26-9c4587a8bebf</serial>
      <alias name='virtio-disk287105055'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x18' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='fa:16:3e:05:78:6d'/>
      <source bridge='br100'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <filterref filter='nova-instance-instance-msr-0002eb77-fa163e05786d'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='file'>
      <source path='/var/lib/nova/instances/40ad52d1-22be-4359-8bfd-5cb3ea3c9b23/console.log'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target port='1'/>
      <alias name='serial1'/>
    </serial>
    <console type='file'>
      <source path='/var/lib/nova/instances/40ad52d1-22be-4359-8bfd-5cb3ea3c9b23/console.log'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5900' autoport='yes' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>unconfined_u:system_r:svirt_t:s0:c650,c921</label>
    <imagelabel>unconfined_u:object_r:svirt_image_t:s0:c650,c921</imagelabel>
  </seclabel>
</domain>
Comment 1 Jiri Denemark 2013-12-16 04:56:09 EST
Created attachment 837179 [details]
linux_qemu.log
Comment 2 Jiri Denemark 2013-12-16 04:58:32 EST
Created attachment 837180 [details]
linux_vm_gdb.txt
Comment 3 Jiri Denemark 2013-12-16 04:58:52 EST
Created attachment 837181 [details]
win7_mv_gdb.txt
Comment 4 Jiri Denemark 2013-12-16 04:59:08 EST
Created attachment 837182 [details]
win7_qemu.log
Comment 5 Jiri Denemark 2013-12-16 05:37:16 EST
Since you can reproduce this bug, could you turn on debug logs for libvirtd (see http://wiki.libvirt.org/page/DebugLogs for details), try again, and attach the logs?
Comment 6 Alexandre M 2013-12-16 09:51:54 EST
Hi Jiri,
The log level was already at debug with this line in libvirtd.conf:

log_level = 1

However, the log_filters was left with default values.

Is there anything missing? If the filters make a difference, I will change the options and launch the test again.
Comment 7 Jiri Denemark 2013-12-16 10:04:36 EST
Well, the logs are missing :-) We need /var/log/libvirt/libvirtd.log (or wherever you told libvirt to store debug logs); the qemu logs only contain the error output of the qemu process and the libvirt messages generated between fork and exec.
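
For reference, a minimal libvirtd.conf fragment that sends full debug output to that file (illustrative values, following the DebugLogs wiki page linked above) looks like this; libvirtd has to be restarted to pick up the change:

log_level = 1
log_outputs = "1:file:/var/log/libvirt/libvirtd.log"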
Comment 8 Alexandre M 2013-12-16 10:56:46 EST
You are totally correct! Missed that one ;)

The problem is, it's very big. We had an automatic loop test which ran for a few hours, so the log file has become extremely long. Fortunately, we had a dedicated machine for the test and only one VM was running with libvirtd.

I'm trying to slim it down to keep only the meaningful information from when the test started, but it's still around 4GB at the moment.

Where should I put this file?
Comment 9 Alexandre M 2013-12-16 15:21:03 EST
Created attachment 837398 [details]
libvirtd.log file

Contains debug logs of libvirtd with segfault just before crash
Comment 10 Alexandre M 2013-12-16 15:22:03 EST
Comment on attachment 837398 [details]
libvirtd.log file

I extracted just the important messages before crash.
Comment 11 Jiri Denemark 2013-12-17 17:16:46 EST
The log file covers too small a period of time. Could you try removing all lines that contain virEvent.* from the huge debug log file and xz-compressing it? I believe the result should be pretty small even if the original log file was huge.
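
For example, a generic command along these lines (not one taken from this bug) would do it:

grep -v virEvent /var/log/libvirt/libvirtd.log | xz > libvirtd-noevents.log.xz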
Comment 12 Alexandre M 2013-12-17 17:45:23 EST
Hi Jiri,
you can find the cleansed logs at:

https://gist.github.com/alexandrem/8013892/download

I removed all event lines starting with virEvent*.

The file weighs 50MB compressed, but 845MB uncompressed.

This should cover most of the event logs during that day when we launched the test.
Comment 13 Jiri Denemark 2013-12-19 08:59:10 EST
Perfect, thanks for the logs. I found the bug and I'm working on a patch.
Comment 14 Eric Blake 2013-12-20 17:01:44 EST
This bug needs a CVE assigned.  A read-only client should not be able to crash libvirtd.
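
For context, virDomainBlockStats is one of the APIs that can be called over a read-only connection (see also comment 16 below), which is what makes the crash security relevant. A minimal sketch with the python bindings, reusing the domain name from this report:

import libvirt

conn = libvirt.openReadOnly("qemu:///system")     # unprivileged, read-only connection
dom = conn.lookupByName("instance-msr-0002eb77")
print dom.blockStats("vda")                       # read-only API that reaches the vulnerable code path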
Comment 15 Jiri Denemark 2014-01-07 10:21:11 EST
This is now fixed upstream by v1.2.0-232-gdb86da5:

commit db86da5ca2109e4006c286a09b6c75bfe10676ad
Author: Jiri Denemark <jdenemar@redhat.com>
Date:   Thu Dec 19 22:10:04 2013 +0100

    qemu: Do not access stale data in virDomainBlockStats
    
    CVE-2013-6458
    https://bugzilla.redhat.com/show_bug.cgi?id=1043069
    
    When virDomainDetachDeviceFlags is called concurrently to
    virDomainBlockStats: libvirtd may crash because qemuDomainBlockStats
    finds a disk in vm->def before getting a job on a domain and uses the
    disk pointer after getting the job. However, the domain is unlocked
    while waiting on a job condition and thus data behind the disk pointer
    may disappear. This happens when thread 1 runs
    virDomainDetachDeviceFlags and enters monitor to actually remove the
    disk. Then another thread starts running virDomainBlockStats, finds the
    disk in vm->def, and while it's waiting on the job condition (owned by
    the first thread), the first thread finishes the disk removal. When the
    second thread gets the job, the memory pointed to by the disk pointer is
    already gone.
    
    That said, every API that is going to begin a job should do that before
    fetching data from vm->def.

I found similar patterns in several other APIs and fixed them by the following commits: v1.2.0-233-gb799259, v1.2.0-234-gf93d2ca, v1.2.0-235-gff5f30b, v1.2.0-236-g3b56425.
Comment 16 Eric Blake 2014-01-15 15:05:03 EST
Of the five similar patches, virDomainBlockStats, virDomainGetBlockInfo, qemuDomainBlockJobImpl, and virDomainGetBlockIoTune were all present in upstream 0.10.2.  qemuDomainBlockCopy is not a vulnerability like the other four (since it is the only one of the five that can't be executed on a read-only connection), and while it is not in upstream 0.10.2, it was backported into RHEL 6.3.
Comment 17 Alexandre M 2014-01-15 16:37:02 EST
Hi,
just wanted to let you know that we've run our attachment test script several times with the latest libvirtd trunk version and this fix has indeed corrected the crash we had been encountering.

Thanks a lot.
Comment 19 Hu Jianwei 2014-01-30 02:43:04 EST
(In reply to Jiri Denemark from comment #15)
> This is now fixed upstream by v1.2.0-232-gdb86da5:
> 
> commit db86da5ca2109e4006c286a09b6c75bfe10676ad
> Author: Jiri Denemark <jdenemar@redhat.com>
> Date:   Thu Dec 19 22:10:04 2013 +0100
> 
>     qemu: Do not access stale data in virDomainBlockStats
>     
>     CVE-2013-6458
>     https://bugzilla.redhat.com/show_bug.cgi?id=1043069
>     
>     When virDomainDetachDeviceFlags is called concurrently to
>     virDomainBlockStats: libvirtd may crash because qemuDomainBlockStats
>     finds a disk in vm->def before getting a job on a domain and uses the
>     disk pointer after getting the job. However, the domain in unlocked
>     while waiting on a job condition and thus data behind the disk pointer
>     may disappear. This happens when thread 1 runs
>     virDomainDetachDeviceFlags and enters monitor to actually remove the
>     disk. Then another thread starts running virDomainBlockStats, finds the
>     disk in vm->def, and while it's waiting on the job condition (owned by
>     the first thread), the first thread finishes the disk removal. When the
>     second thread gets the job, the memory pointed to be the disk pointer is
>     already gone.
>     
>     That said, every API that is going to begin a job should do that before
>     fetching data from vm->def.
> 
> I found similar patterns in several other APIs and fixed them by the
> following commits: v1.2.0-233-gb799259, v1.2.0-234-gf93d2ca,
> v1.2.0-235-gff5f30b, v1.2.0-236-g3b56425.
I tried to reproduce it this month, but could not. I used virsh detach-device with --config and virsh domblkstat in two consoles at the same time, repeatedly.

Is there any way to reproduce it using a combination of virsh commands?

Thanks.
Comment 20 Jiri Denemark 2014-01-30 03:38:46 EST
(In reply to Hu Jianwei from comment #19)
> I tried to reproduce it this month, but could not. I used virsh
> detach-device with --config and virsh domblkstat in two console at the same
> time repeatedly. 

I wonder where you got the --config from, but it's wrong. You need to hot-unplug the device from a running domain, so use either no option or --live (but definitely not --config). Anyway, you can check bug 1054804 for a python reproducer which seems to hit the bug more reliably.
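
In plain virsh terms the two concurrent loops could look roughly like this (the domain name r6, disk.xml, and target vdb are the ones used later in comment 25; this is an illustration, not a verified reproducer):

# terminal 1: cycle the disk in and out of the running guest
while true; do virsh attach-device r6 disk.xml --live; virsh detach-device r6 disk.xml --live; done
# terminal 2: hammer the block stats API
while true; do virsh domblkstat r6 vdb; done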
Comment 21 Eric Blake 2014-01-30 07:41:28 EST
(In reply to Jiri Denemark from comment #20)
> (In reply to Hu Jianwei from comment #19)
> > I tried to reproduce it this month, but could not. I used virsh
> > detach-device with --config and virsh domblkstat in two console at the same
> > time repeatedly. 
> 
> I wonder where you got the --config from but it's wrong. You need to
> hot-unplug the device from a running domain so either no option or --live
> (but definitely not --config). Anyway, you can check bug 1054804 for a
> python reproducer which seems to hit the bug more reliably.

Furthermore, the bug is only observable if one thread is hotplugging/hot-unplugging disks while the other thread attempts one of the vulnerable APIs.  In other words, to fully reproduce it, you need one thread that is repeatedly cycling a disk in and back out of a running guest, while the other thread is repeatedly trying the problematic API.  The problem that the patches fixed is a use-after-free, so running under valgrind may show the bug more reliably than waiting for a libvirtd crash; or it may slow things down to the point that you don't hit the race as frequently.  The only reliable way to reproduce the problem is to insert strategic sleep() into the hotplug code and recompile.
Comment 22 Hu Jianwei 2014-02-11 22:06:11 EST
Thanks for Jiri's and Eric's comments; it's easy to reproduce with the two attached python scripts below on libvirt-0.10.2-29.el6.1.x86_64.
Comment 23 Hu Jianwei 2014-02-11 22:08:55 EST
Created attachment 862098 [details]
Python scripts for reproducing bug
Comment 25 Hu Jianwei 2014-04-11 03:38:28 EDT
I can reproduce it on libvirt-0.10.2-29.el6, but cannot reproduce it on the versions below:

libvirt-0.10.2-31.el6.x86_64
kernel-2.6.32-456.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.423.el6.x86_64

Steps:
1. Download the python scripts from the attachments and create disk.xml and its image file:
[root@intel-5205-32-2 1043069]# cat disk.xml 
<disk type='file' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source file='/var/lib/libvirt/images/disk1.img'/>
<target dev='vdb' bus='virtio'/>
</disk>
[root@intel-5205-32-2 1043069]# ll /var/lib/libvirt/images/disk1.img
-rw-r--r--. 1 qemu qemu 10485760 Apr 11 13:36 /var/lib/libvirt/images/disk1.img

2. Execute the attached python scripts in two different terminals at the same time; after running for around 1 hour, no crash occurred.

The first terminal:
[root@intel-5205-32-2 1043069]# time ./attach_detach.py
attach disk return: 0
detach disk return: 0
attach disk return: 0
detach disk return: 0
...(clipped)
detach disk return: 0
attach disk return: 0
^Cattach/detach disk fail

real	77m8.458s
user	0m1.613s
sys	0m0.497s

The second terminal:
[root@intel-5205-32-2 1043069]# time ./domstate.py 
(41L, 167936L, 0L, 0L, -1L)
...(clipped)
(41L, 167936L, 0L, 0L, -1L)
libvirt: QEMU Driver error : invalid argument: invalid path: /var/lib/libvirt/images/disk1.img
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(2L, 8192L, 0L, 0L, -1L)
(2L, 8192L, 0L, 0L, -1L)
(2L, 8192L, 0L, 0L, -1L)
(9L, 36864L, 0L, 0L, -1L)
(21L, 86016L, 0L, 0L, -1L)
(21L, 86016L, 0L, 0L, -1L)
(21L, 86016L, 0L, 0L, -1L)
(21L, 86016L, 0L, 0L, -1L)
(22L, 90112L, 0L, 0L, -1L)
(22L, 90112L, 0L, 0L, -1L)
(22L, 90112L, 0L, 0L, -1L)
(34L, 139264L, 0L, 0L, -1L)
(41L, 167936L, 0L, 0L, -1L)
(41L, 167936L, 0L, 0L, -1L)
...(clipped)
(41L, 167936L, 0L, 0L, -1L)
(41L, 167936L, 0L, 0L, -1L)
(41L, 167936L, 0L, 0L, -1L)
^CTraceback (most recent call last):
  File "./domstate.py", line 21, in <module>
    print dom.blockStats(path)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1837, in blockStats
    ret = libvirtmod.virDomainBlockStats(self._o, path)
KeyboardInterrupt

real	77m7.032s
user	1m36.901s
sys	0m25.276s

[root@intel-5205-32-2 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     r6                             running
[root@intel-5205-32-2 1043069]# service libvirtd status
libvirtd (pid  11167) is running...

We got the expected results, so moving the bug to Verified.
Comment 27 errata-xmlrpc 2014-10-14 00:19:13 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1374.html
