Bug 1043069 - CVE-2013-6458 libvirtd crashes when swapping disks in qemu guest multiple times - qemuMonitorJSONGetBlockStatsInfo segfault [rhel-6.6]
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.4
Hardware: x86_64 Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Jiri Denemark
QA Contact: Virtualization Bugs
Keywords: Security, SecurityTracking, Upstream
Depends On:
Blocks: CVE-2013-6458 1113828
Reported: 2013-12-13 16:49 EST by Alexandre M
Modified: 2014-10-14 00:19 EDT
CC List: 11 users

See Also:
Fixed In Version: libvirt-0.10.2-30.el6
Doc Type: Release Note
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-14 00:19:13 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments
contains 4 files (7.02 KB, application/x-gzip) - 2013-12-13 16:49 EST, Alexandre M
linux_qemu.log (8.69 KB, text/plain) - 2013-12-16 04:56 EST, Jiri Denemark
linux_vm_gdb.txt (11.22 KB, text/plain) - 2013-12-16 04:58 EST, Jiri Denemark
win7_mv_gdb.txt (13.76 KB, text/plain) - 2013-12-16 04:58 EST, Jiri Denemark
win7_qemu.log (5.19 KB, text/plain) - 2013-12-16 04:59 EST, Jiri Denemark
libvirtd.log file (102.42 KB, text/plain) - 2013-12-16 15:21 EST, Alexandre M
Python scripts for reproducing bug (10.00 KB, application/x-tar) - 2014-02-11 22:08 EST, Hu Jianwei

Description Alexandre M 2013-12-13 16:49:49 EST
Created attachment 836510 [details]
contains 4 files

Description of problem:

We are facing a common issue here where libvirtd constantly crashes after attaching and detaching a disk multiple times on a qemu guest (Windows 7 or Linux).

The problem arises in our production OpenStack cluster, where we have scripts in place that continuously test EBS attachments on running instances.

The problem seems to happen in qemuDomainBlockStats() of qemu/qemu_driver.c, where the disk parameter (dev_name) passed to qemuMonitorJSONGetBlockStatsInfo becomes NULL.

I will let you take a deeper look at the segfault information for analysis.

I have attached 2 separate GDB log sessions with backtraces after the segfault. One is for a Windows guest and the other for a Linux guest.

I have also attached the qemu logs of both VMs.

Version-Release number of selected component (if applicable):

# cat /etc/redhat-release
CentOS release 6.4 (Final)
# uname -a
Linux msr-ostck-cmp39.xxx.xxx 2.6.32-358.6.2.el6.x86_64 #1 SMP Thu May 16 20:59:36 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

# rpm -qa | grep libvirt
libvirt-0.10.2-29.el6.1.x86_64
libvirt-debuginfo-0.10.2-29.el6.1.x86_64
libvirt-client-0.10.2-29.el6.1.x86_64
libvirt-python-0.10.2-29.el6.1.x86_64
libvirt-devel-0.10.2-29.el6.1.x86_64


How reproducible:

Attach and detach an EBS disk in a loop on an OpenStack instance. Eventually libvirtd will crash with a segfault on the disk operation.
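
For illustration, a condensed sketch of this kind of workload written directly against the libvirt python bindings (rather than through the OpenStack/EBS layer) is shown below. This is not the original test script: the domain name is taken from the domain XML in this report, the disk XML mirrors the one used later in comment 25, and the paths are only examples.

import threading, time
import libvirt

DISK_XML = """
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/var/lib/libvirt/images/disk1.img'/>
  <target dev='vdb' bus='virtio'/>
</disk>
"""

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("instance-msr-0002eb77")   # domain name from this report

def attach_detach():
    # Thread 1: keep hot-plugging and hot-unplugging the disk on the running domain.
    while True:
        for op in (dom.attachDeviceFlags, dom.detachDeviceFlags):
            try:
                op(DISK_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)
            except libvirt.libvirtError:
                pass                    # keep cycling even if one operation fails
            time.sleep(1)

def poll_stats():
    # Thread 2: keep asking for block statistics of the same disk.  This is the
    # call that ends up in qemuDomainBlockStats()/qemuMonitorJSONGetBlockStatsInfo().
    while True:
        try:
            print dom.blockStats("vdb")
        except libvirt.libvirtError:
            pass                        # expected while the disk is detached

worker = threading.Thread(target=attach_detach)
worker.daemon = True
worker.start()
poll_stats()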


QEMU DOMAIN INFO:

<domain type='kvm' id='3'>
  <name>instance-msr-0002eb77</name>
  <uuid>40ad52d1-22be-4359-8bfd-5cb3ea3c9b23</uuid>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>Red Hat Inc.</entry>
      <entry name='product'>OpenStack Nova</entry>
      <entry name='version'>2013.1.2-4.el6</entry>
      <entry name='serial'>c14b272b-34a4-ea25-1e75-5f60847ca93b</entry>
      <entry name='uuid'>40ad52d1-22be-4359-8bfd-5cb3ea3c9b23</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='rhel6.4.0'>hvm</type>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model'>
    <model fallback='allow'/>
    <topology sockets='1' cores='1' threads='1'/>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/nova/instances/40ad52d1-22be-4359-8bfd-5cb3ea3c9b23/disk'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/disk/by-path/ip-192.168.0.11:3260-iscsi-iqn.2010-10.org.openstack:volume-8d2304a2-a67c-477e-9c26-9c4587a8bebf-lun-1'/>
      <target dev='vdxdgapd' bus='virtio'/>
      <serial>8d2304a2-a67c-477e-9c26-9c4587a8bebf</serial>
      <alias name='virtio-disk287105055'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x18' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='fa:16:3e:05:78:6d'/>
      <source bridge='br100'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <filterref filter='nova-instance-instance-msr-0002eb77-fa163e05786d'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='file'>
      <source path='/var/lib/nova/instances/40ad52d1-22be-4359-8bfd-5cb3ea3c9b23/console.log'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target port='1'/>
      <alias name='serial1'/>
    </serial>
    <console type='file'>
      <source path='/var/lib/nova/instances/40ad52d1-22be-4359-8bfd-5cb3ea3c9b23/console.log'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5900' autoport='yes' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>unconfined_u:system_r:svirt_t:s0:c650,c921</label>
    <imagelabel>unconfined_u:object_r:svirt_image_t:s0:c650,c921</imagelabel>
  </seclabel>
</domain>
Comment 1 Jiri Denemark 2013-12-16 04:56:09 EST
Created attachment 837179 [details]
linux_qemu.log
Comment 2 Jiri Denemark 2013-12-16 04:58:32 EST
Created attachment 837180 [details]
linux_vm_gdb.txt
Comment 3 Jiri Denemark 2013-12-16 04:58:52 EST
Created attachment 837181 [details]
win7_mv_gdb.txt
Comment 4 Jiri Denemark 2013-12-16 04:59:08 EST
Created attachment 837182 [details]
win7_qemu.log
Comment 5 Jiri Denemark 2013-12-16 05:37:16 EST
Since you can reproduce this bug, could you turn on debug logs for libvirtd (see http://wiki.libvirt.org/page/DebugLogs for details), try again, and attach the logs?
Comment 6 Alexandre M 2013-12-16 09:51:54 EST
Hi Jiri,
The log level was already at debug with this line in libvirtd.conf:

log_level = 1

However, the log_filters was left with default values.

Is there anything missing? If the filters make a difference, I will change the options and launch the test again.
Comment 7 Jiri Denemark 2013-12-16 10:04:36 EST
Well, the logs are missing :-) We need /var/log/libvirt/libvirtd.log (or wherever you told libvirt to store debug logs); the qemu logs only contain the error output of the qemu process and the libvirt messages generated between fork and exec.
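
For reference, a minimal libvirtd.conf fragment that sends full debug output to that file (illustrative values, following the DebugLogs wiki page linked above) looks like this; libvirtd has to be restarted to pick up the change:

log_level = 1
log_outputs = "1:file:/var/log/libvirt/libvirtd.log"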
Comment 8 Alexandre M 2013-12-16 10:56:46 EST
You are totally correct! Missed that one ;)

The problem is, it's very big. We had an automatic loop test which ran for a few hours, so the log file has become extremely long. Fortunately, we had a dedicated machine for the test and only one VM was running with libvirtd.

I'm trying to slim it down to keep only the meaningful information from when the test started, but it's still around 4GB at the moment.

Where should I put this file?
Comment 9 Alexandre M 2013-12-16 15:21:03 EST
Created attachment 837398 [details]
libvirtd.log file

Contains debug logs of libvirtd with segfault just before crash
Comment 10 Alexandre M 2013-12-16 15:22:03 EST
Comment on attachment 837398 [details]
libvirtd.log file

I extracted just the important messages before crash.
Comment 11 Jiri Denemark 2013-12-17 17:16:46 EST
The log file covers too small a period of time. Could you try removing all lines that contain virEvent.* from the huge debug log file and xz-compressing it? I believe the result should be pretty small even if the original log file was huge.
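
For example, a generic command along these lines (not one taken from this bug) would do it:

grep -v virEvent /var/log/libvirt/libvirtd.log | xz > libvirtd-noevents.log.xz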
Comment 12 Alexandre M 2013-12-17 17:45:23 EST
Hi Jiri,
you can find the cleansed logs at:

https://gist.github.com/alexandrem/8013892/download

I removed all event lines starting with virEvent*.

The file weighs 50MB compressed, but 845MB uncompressed.

This should cover most of the event logs during that day when we launched the test.
Comment 13 Jiri Denemark 2013-12-19 08:59:10 EST
Perfect, thanks for the logs. I found the bug and I'm working on a patch.
Comment 14 Eric Blake 2013-12-20 17:01:44 EST
This bug needs a CVE assigned.  A read-only client should not be able to crash libvirtd.
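
For context, virDomainBlockStats is one of the APIs that can be called over a read-only connection (see also comment 16 below), which is what makes the crash security relevant. A minimal sketch with the python bindings, reusing the domain name from this report:

import libvirt

conn = libvirt.openReadOnly("qemu:///system")     # unprivileged, read-only connection
dom = conn.lookupByName("instance-msr-0002eb77")
print dom.blockStats("vda")                       # read-only API that reaches the vulnerable code path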
Comment 15 Jiri Denemark 2014-01-07 10:21:11 EST
This is now fixed upstream by v1.2.0-232-gdb86da5:

commit db86da5ca2109e4006c286a09b6c75bfe10676ad
Author: Jiri Denemark <jdenemar@redhat.com>
Date:   Thu Dec 19 22:10:04 2013 +0100

    qemu: Do not access stale data in virDomainBlockStats
    
    CVE-2013-6458
    https://bugzilla.redhat.com/show_bug.cgi?id=1043069
    
    When virDomainDetachDeviceFlags is called concurrently to
    virDomainBlockStats: libvirtd may crash because qemuDomainBlockStats
    finds a disk in vm->def before getting a job on a domain and uses the
    disk pointer after getting the job. However, the domain is unlocked
    while waiting on a job condition and thus data behind the disk pointer
    may disappear. This happens when thread 1 runs
    virDomainDetachDeviceFlags and enters monitor to actually remove the
    disk. Then another thread starts running virDomainBlockStats, finds the
    disk in vm->def, and while it's waiting on the job condition (owned by
    the first thread), the first thread finishes the disk removal. When the
    second thread gets the job, the memory pointed to by the disk pointer is
    already gone.
    
    That said, every API that is going to begin a job should do that before
    fetching data from vm->def.

I found similar patterns in several other APIs and fixed them by the following commits: v1.2.0-233-gb799259, v1.2.0-234-gf93d2ca, v1.2.0-235-gff5f30b, v1.2.0-236-g3b56425.
Comment 16 Eric Blake 2014-01-15 15:05:03 EST
Of the five similar patches, virDomainBlockStats, virDomainGetBlockInfo, qemuDomainBlockJobImpl, and virDomainGetBlockIoTune were all present in upstream 0.10.2.  qemuDomainBlockCopy is not a vulnerability like the other four (since it is the only one of the five that can't be executed on a read-only connection), and while it is not in upstream 0.10.2, it was backported into RHEL 6.3.
Comment 17 Alexandre M 2014-01-15 16:37:02 EST
Hi,
just wanted to let you know that we've run our attachment test script several times with the latest libvirtd trunk version and this fix has indeed corrected the crash we had been encountering.

Thanks a lot.
Comment 19 Hu Jianwei 2014-01-30 02:43:04 EST
(In reply to Jiri Denemark from comment #15)
> This is now fixed upstream by v1.2.0-232-gdb86da5:
> 
> commit db86da5ca2109e4006c286a09b6c75bfe10676ad
> Author: Jiri Denemark <jdenemar@redhat.com>
> Date:   Thu Dec 19 22:10:04 2013 +0100
> 
>     qemu: Do not access stale data in virDomainBlockStats
>     
>     CVE-2013-6458
>     https://bugzilla.redhat.com/show_bug.cgi?id=1043069
>     
>     When virDomainDetachDeviceFlags is called concurrently to
>     virDomainBlockStats: libvirtd may crash because qemuDomainBlockStats
>     finds a disk in vm->def before getting a job on a domain and uses the
>     disk pointer after getting the job. However, the domain in unlocked
>     while waiting on a job condition and thus data behind the disk pointer
>     may disappear. This happens when thread 1 runs
>     virDomainDetachDeviceFlags and enters monitor to actually remove the
>     disk. Then another thread starts running virDomainBlockStats, finds the
>     disk in vm->def, and while it's waiting on the job condition (owned by
>     the first thread), the first thread finishes the disk removal. When the
>     second thread gets the job, the memory pointed to be the disk pointer is
>     already gone.
>     
>     That said, every API that is going to begin a job should do that before
>     fetching data from vm->def.
> 
> I found similar patterns in several other APIs and fixed them by the
> following commits: v1.2.0-233-gb799259, v1.2.0-234-gf93d2ca,
> v1.2.0-235-gff5f30b, v1.2.0-236-g3b56425.
I tried to reproduce it this month, but could not. I used virsh detach-device with --config and virsh domblkstat in two consoles at the same time, repeatedly.

Is there any way to reproduce it using a combination of virsh commands?

Thanks.
Comment 20 Jiri Denemark 2014-01-30 03:38:46 EST
(In reply to Hu Jianwei from comment #19)
> I tried to reproduce it this month, but could not. I used virsh
> detach-device with --config and virsh domblkstat in two console at the same
> time repeatedly. 

I wonder where you got the --config from, but it's wrong. You need to hot-unplug the device from a running domain, so use either no option or --live (but definitely not --config). Anyway, you can check bug 1054804 for a python reproducer which seems to hit the bug more reliably.
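
In plain virsh terms the two concurrent loops could look roughly like this (the domain name r6, disk.xml, and target vdb are the ones used later in comment 25; this is an illustration, not a verified reproducer):

# terminal 1: cycle the disk in and out of the running guest
while true; do virsh attach-device r6 disk.xml --live; virsh detach-device r6 disk.xml --live; done
# terminal 2: hammer the block stats API
while true; do virsh domblkstat r6 vdb; done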
Comment 21 Eric Blake 2014-01-30 07:41:28 EST
(In reply to Jiri Denemark from comment #20)
> (In reply to Hu Jianwei from comment #19)
> > I tried to reproduce it this month, but could not. I used virsh
> > detach-device with --config and virsh domblkstat in two console at the same
> > time repeatedly. 
> 
> I wonder where you got the --config from but it's wrong. You need to
> hot-unplug the device from a running domain so either no option or --live
> (but definitely not --config). Anyway, you can check bug 1054804 for a
> python reproducer which seems to hit the bug more reliably.

Furthermore, the bug is only observable if one thread is hotplugging/hot-unplugging disks while the other thread attempts one of the vulnerable APIs.  In other words, to fully reproduce it, you need one thread that is repeatedly cycling a disk in and back out of a running guest, while the other thread is repeatedly trying the problematic API.  The problem that the patches fixed is a use-after-free, so running under valgrind may show the bug more reliably than waiting for a libvirtd crash; or it may slow things down to the point that you don't hit the race as frequently.  The only reliable way to reproduce the problem is to insert strategic sleep() into the hotplug code and recompile.
Comment 22 Hu Jianwei 2014-02-11 22:06:11 EST
Thanks for Jiri's and Eric's comments; it's easy to reproduce with the two attached python scripts below on libvirt-0.10.2-29.el6.1.x86_64.
Comment 23 Hu Jianwei 2014-02-11 22:08:55 EST
Created attachment 862098 [details]
Python scripts for reproducing bug
Comment 25 Hu Jianwei 2014-04-11 03:38:28 EDT
I can reproduce it on libvirt-0.10.2-29.el6, but cannot reproduce it on the versions below:

libvirt-0.10.2-31.el6.x86_64
kernel-2.6.32-456.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.423.el6.x86_64

Steps:
1. Download the python scripts from the attachments and create disk.xml and its image file:
[root@intel-5205-32-2 1043069]# cat disk.xml 
<disk type='file' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source file='/var/lib/libvirt/images/disk1.img'/>
<target dev='vdb' bus='virtio'/>
</disk>
[root@intel-5205-32-2 1043069]# ll /var/lib/libvirt/images/disk1.img
-rw-r--r--. 1 qemu qemu 10485760 Apr 11 13:36 /var/lib/libvirt/images/disk1.img

2. Execute the attached python scripts in two different terminals at the same time; after running for around 1 hour, no crash occurred.

The first terminal:
[root@intel-5205-32-2 1043069]# time ./attach_detach.py
attach disk return: 0
detach disk return: 0
attach disk return: 0
detach disk return: 0
...(clipped)
detach disk return: 0
attach disk return: 0
^Cattach/detach disk fail

real	77m8.458s
user	0m1.613s
sys	0m0.497s

The second terminal:
[root@intel-5205-32-2 1043069]# time ./domstate.py 
(41L, 167936L, 0L, 0L, -1L)
...(clipped)
(41L, 167936L, 0L, 0L, -1L)
libvirt: QEMU Driver error : invalid argument: invalid path: /var/lib/libvirt/images/disk1.img
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(0L, 0L, 0L, 0L, -1L)
(2L, 8192L, 0L, 0L, -1L)
(2L, 8192L, 0L, 0L, -1L)
(2L, 8192L, 0L, 0L, -1L)
(9L, 36864L, 0L, 0L, -1L)
(21L, 86016L, 0L, 0L, -1L)
(21L, 86016L, 0L, 0L, -1L)
(21L, 86016L, 0L, 0L, -1L)
(21L, 86016L, 0L, 0L, -1L)
(22L, 90112L, 0L, 0L, -1L)
(22L, 90112L, 0L, 0L, -1L)
(22L, 90112L, 0L, 0L, -1L)
(34L, 139264L, 0L, 0L, -1L)
(41L, 167936L, 0L, 0L, -1L)
(41L, 167936L, 0L, 0L, -1L)
...(clipped)
(41L, 167936L, 0L, 0L, -1L)
(41L, 167936L, 0L, 0L, -1L)
(41L, 167936L, 0L, 0L, -1L)
^CTraceback (most recent call last):
  File "./domstate.py", line 21, in <module>
    print dom.blockStats(path)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1837, in blockStats
    ret = libvirtmod.virDomainBlockStats(self._o, path)
KeyboardInterrupt

real	77m7.032s
user	1m36.901s
sys	0m25.276s

[root@intel-5205-32-2 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     r6                             running
[root@intel-5205-32-2 1043069]# service libvirtd status
libvirtd (pid  11167) is running...

We got the expected results, so moving the bug to Verified.
Comment 27 errata-xmlrpc 2014-10-14 00:19:13 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1374.html
