Bug 908073 - Crash changing CDROM device media
Crash changing CDROM device media
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt (Show other bugs)
6.4
Unspecified Unspecified
urgent Severity urgent
: rc
: ---
Assigned To: Osier Yang
Virtualization Bugs
: Regression, ZStream
: 925977 (view as bug list)
Depends On:
Blocks: 919504
  Show dependency treegraph
 
Reported: 2013-02-05 14:18 EST by Daniel Berrange
Modified: 2013-11-21 03:44 EST (History)
12 users (show)

See Also:
Fixed In Version: libvirt-0.10.2-19.el6
Doc Type: Bug Fix
Doc Text:
Cause: Dereference pointers (E.g. disk def) which might be already freed. Consequence: Regressions like libvirtd crashing Fix: Add various checking to avoid dereferencing NULL pointers. And also copy the disk def before changing CD-ROM or Floppy medium. Result: The crash is fixed.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-21 03:44:55 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Daniel Berrange 2013-02-05 14:18:22 EST
Description of problem:

The SGIO changes in bug 878578 can cause a crash of libvirtd when changing ejectable media. The stack trace (against upstream GIT) is

(gdb) bt
#0  0x00007ffff371fde8 in _IO_vfprintf_internal (s=s@entry=0x7fffe7fa3db0, format=<optimized out>, 
    format@entry=0x7ffff770a66e "Unable to get device ID '%s'", ap=ap@entry=0x7fffe7fa3f98) at vfprintf.c:1615
#1  0x00007ffff37dfeb0 in ___vsnprintf_chk (s=s@entry=0x7fffe7fa43b0 "Unable to get device ID 'G\372\347\377\177", maxlen=<optimized out>, 
    maxlen@entry=1024, flags=flags@entry=1, slen=slen@entry=1024, format=format@entry=0x7ffff770a66e "Unable to get device ID '%s'", 
    args=args@entry=0x7fffe7fa3f98) at vsnprintf_chk.c:63
#2  0x00007ffff7499d7d in vsnprintf (__ap=0x7fffe7fa3f98, __fmt=0x7ffff770a66e "Unable to get device ID '%s'", __n=1024, 
    __s=0x7fffe7fa43b0 "Unable to get device ID 'G\372\347\377\177") at /usr/include/bits/stdio2.h:77
#3  virReportSystemErrorFull (domcode=domcode@entry=0, theerrno=theerrno@entry=14, filename=filename@entry=0x7ffff770a3c3 "util/virutil.c", 
    funcname=funcname@entry=0x7ffff770ace0 <__FUNCTION__.13608> "virGetUnprivSGIOSysfsPath", linenr=linenr@entry=3184, 
    fmt=0x7ffff770a66e "Unable to get device ID '%s'") at util/virerror.c:1317
#4  0x00007ffff74cdcb3 in virGetUnprivSGIOSysfsPath (path=0xbcbcbcbcbcbcbcbc <Address 0xbcbcbcbcbcbcbcbc out of bounds>, sysfs_dir=0x0)
    at util/virutil.c:3182
#5  0x00007ffff74cdd11 in virSetDeviceUnprivSGIO (path=<optimized out>, sysfs_dir=sysfs_dir@entry=0x0, unpriv_sgio=0) at util/virutil.c:3208
#6  0x00007fffdf135785 in qemuSetUnprivSGIO (disk=disk@entry=0x7fffc8002a90) at qemu/qemu_process.c:3473
#7  0x00007fffdf17c2dd in qemuDomainAttachDeviceDiskLive (dev=0x7fffc8000b30, vm=0x7fffd0009aa0, driver=0x7fffd8062140, conn=0x7fffcc000ae0)
    at qemu/qemu_driver.c:5880
#8  qemuDomainAttachDeviceLive (dom=<optimized out>, dev=0x7fffc8000b30, vm=0x7fffd0009aa0) at qemu/qemu_driver.c:5922
#9  qemuDomainModifyDeviceFlags (dom=<optimized out>, xml=<optimized out>, flags=1, action=0) at qemu/qemu_driver.c:6527
#10 0x00007ffff75562d5 in virDomainAttachDevice (domain=domain@entry=0x7fffc80008c0, 
    xml=0x7fffc8000a40 "<disk type='file' device='cdrom'>\n  <source file='/home/berrange/tmp/libvirt-tck/207-disk-media-change/extra2.img'/>\n  <target dev='hdc'/>\n</disk>\n") at libvirt.c:9778
#11 0x000000000042cbb2 in remoteDispatchDomainAttachDevice (server=<optimized out>, msg=<optimized out>, args=0x7fffc8000900, 
    rerr=0x7fffe7fa4c50, client=<optimized out>) at remote_dispatch.h:474
#12 remoteDispatchDomainAttachDeviceHelper (server=<optimized out>, client=<optimized out>, msg=<optimized out>, rerr=0x7fffe7fa4c50, 
    args=0x7fffc8000900, ret=<optimized out>) at remote_dispatch.h:452
#13 0x00007ffff75b06a2 in virNetServerProgramDispatchCall (msg=0x693900, client=0x693e60, server=0x66b490, prog=0x690bd0)
    at rpc/virnetserverprogram.c:432
#14 virNetServerProgramDispatch (prog=0x690bd0, server=server@entry=0x66b490, client=0x693e60, msg=0x693900) at rpc/virnetserverprogram.c:305
#15 0x00007ffff75aa8d8 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x66b490)
    at rpc/virnetserver.c:162
#16 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x66b490) at rpc/virnetserver.c:183
#17 0x00007ffff74c3f3e in virThreadPoolWorker (opaque=opaque@entry=0x65adb0) at util/virthreadpool.c:144
#18 0x00007ffff74c35f6 in virThreadHelper (data=<optimized out>) at util/virthreadpthread.c:161
#19 0x00007ffff3e9ed15 in start_thread (arg=0x7fffe7fa5700) at pthread_create.c:308
#20 0x00007ffff37ca46d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:114


Valgrind shows this:


==5738== Invalid read of size 4
==5738==    at 0x1DC6F2BF: qemuDomainModifyDeviceFlags (qemu_driver.c:5874)
==5738==    by 0x53652D4: virDomainAttachDevice (libvirt.c:9778)
==5738==    by 0x42CBB1: remoteDispatchDomainAttachDeviceHelper (remote_dispatch.h:474)
==5738==    by 0x53BF6A1: virNetServerProgramDispatch (virnetserverprogram.c:432)
==5738==    by 0x53B98D7: virNetServerHandleJob (virnetserver.c:162)
==5738==    by 0x52D2F3D: virThreadPoolWorker (virthreadpool.c:144)
==5738==    by 0x52D25F5: virThreadHelper (virthreadpthread.c:161)
==5738==    by 0x8B60D14: start_thread (pthread_create.c:308)
==5738==    by 0x926E46C: clone (clone.S:114)
==5738==  Address 0x202cb0f0 is 0 bytes inside a block of size 384 free'd
==5738==    at 0x4C297A6: free (vg_replace_malloc.c:446)
==5738==    by 0x52942F8: virFree (viralloc.c:419)
==5738==    by 0x52F0A9F: virDomainDiskDefFree (domain_conf.c:1049)
==5738==    by 0x1DC1ACF4: qemuDomainChangeEjectableMedia (qemu_hotplug.c:167)
==5738==    by 0x1DC6F2B2: qemuDomainModifyDeviceFlags (qemu_driver.c:5838)
==5738==    by 0x53652D4: virDomainAttachDevice (libvirt.c:9778)
==5738==    by 0x42CBB1: remoteDispatchDomainAttachDeviceHelper (remote_dispatch.h:474)
==5738==    by 0x53BF6A1: virNetServerProgramDispatch (virnetserverprogram.c:432)
==5738==    by 0x53B98D7: virNetServerHandleJob (virnetserver.c:162)
==5738==    by 0x52D2F3D: virThreadPoolWorker (virthreadpool.c:144)
==5738==    by 0x52D25F5: virThreadHelper (virthreadpthread.c:161)
==5738==    by 0x8B60D14: start_thread (pthread_create.c:308)
==5738== 
==5738== Invalid read of size 4
==5738==    at 0x1DC28768: qemuSetUnprivSGIO (qemu_process.c:3456)
==5738==    by 0x1DC6F2DC: qemuDomainModifyDeviceFlags (qemu_driver.c:5880)
==5738==    by 0x53652D4: virDomainAttachDevice (libvirt.c:9778)
==5738==    by 0x42CBB1: remoteDispatchDomainAttachDeviceHelper (remote_dispatch.h:474)
==5738==    by 0x53BF6A1: virNetServerProgramDispatch (virnetserverprogram.c:432)
==5738==    by 0x53B98D7: virNetServerHandleJob (virnetserver.c:162)
==5738==    by 0x52D2F3D: virThreadPoolWorker (virthreadpool.c:144)
==5738==    by 0x52D25F5: virThreadHelper (virthreadpthread.c:161)
==5738==    by 0x8B60D14: start_thread (pthread_create.c:308)
==5738==    by 0x926E46C: clone (clone.S:114)
==5738==  Address 0x202cb258 is 360 bytes inside a block of size 384 free'd
==5738==    at 0x4C297A6: free (vg_replace_malloc.c:446)
==5738==    by 0x52942F8: virFree (viralloc.c:419)
==5738==    by 0x52F0A9F: virDomainDiskDefFree (domain_conf.c:1049)
==5738==    by 0x1DC1ACF4: qemuDomainChangeEjectableMedia (qemu_hotplug.c:167)
==5738==    by 0x1DC6F2B2: qemuDomainModifyDeviceFlags (qemu_driver.c:5838)
==5738==    by 0x53652D4: virDomainAttachDevice (libvirt.c:9778)
==5738==    by 0x42CBB1: remoteDispatchDomainAttachDeviceHelper (remote_dispatch.h:474)
==5738==    by 0x53BF6A1: virNetServerProgramDispatch (virnetserverprogram.c:432)
==5738==    by 0x53B98D7: virNetServerHandleJob (virnetserver.c:162)
==5738==    by 0x52D2F3D: virThreadPoolWorker (virthreadpool.c:144)
==5738==    by 0x52D25F5: virThreadHelper (virthreadpthread.c:161)
==5738==    by 0x8B60D14: start_thread (pthread_create.c:308)
==5738== 
==5738== Invalid read of size 4
==5738==    at 0x1DC287A0: qemuSetUnprivSGIO (qemu_process.c:3463)
==5738==    by 0x1DC6F2DC: qemuDomainModifyDeviceFlags (qemu_driver.c:5880)
==5738==    by 0x53652D4: virDomainAttachDevice (libvirt.c:9778)
==5738==    by 0x42CBB1: remoteDispatchDomainAttachDeviceHelper (remote_dispatch.h:474)
==5738==    by 0x53BF6A1: virNetServerProgramDispatch (virnetserverprogram.c:432)
==5738==    by 0x53B98D7: virNetServerHandleJob (virnetserver.c:162)
==5738==    by 0x52D2F3D: virThreadPoolWorker (virthreadpool.c:144)
==5738==    by 0x52D25F5: virThreadHelper (virthreadpthread.c:161)
==5738==    by 0x8B60D14: start_thread (pthread_create.c:308)
==5738==    by 0x926E46C: clone (clone.S:114)
==5738==  Address 0x202cb0f4 is 4 bytes inside a block of size 384 free'd
==5738==    at 0x4C297A6: free (vg_replace_malloc.c:446)
==5738==    by 0x52942F8: virFree (viralloc.c:419)
==5738==    by 0x52F0A9F: virDomainDiskDefFree (domain_conf.c:1049)
==5738==    by 0x1DC1ACF4: qemuDomainChangeEjectableMedia (qemu_hotplug.c:167)
==5738==    by 0x1DC6F2B2: qemuDomainModifyDeviceFlags (qemu_driver.c:5838)
==5738==    by 0x53652D4: virDomainAttachDevice (libvirt.c:9778)
==5738==    by 0x42CBB1: remoteDispatchDomainAttachDeviceHelper (remote_dispatch.h:474)
==5738==    by 0x53BF6A1: virNetServerProgramDispatch (virnetserverprogram.c:432)
==5738==    by 0x53B98D7: virNetServerHandleJob (virnetserver.c:162)
==5738==    by 0x52D2F3D: virThreadPoolWorker (virthreadpool.c:144)
==5738==    by 0x52D25F5: virThreadHelper (virthreadpthread.c:161)
==5738==    by 0x8B60D14: start_thread (pthread_create.c:308)


The problem is this code in qemuDomainAttachDeviceDiskLive:

    if (ret == 0) {
        if (disk->type == VIR_DOMAIN_DISK_TYPE_BLOCK && disk->shared) {
            if (qemuAddSharedDisk(driver->sharedDisks, disk->src) < 0)
                VIR_WARN("Failed to add disk '%s' to shared disk table",
                         disk->src);
        }

        if (qemuSetUnprivSGIO(disk) < 0)
            VIR_WARN("Failed to set unpriv_sgio of disk '%s'", disk->src);
    }

because at the point this runs 'disk' can already have been free'd by the qemuDomainChangeEjectableMedia() method.

I've not directly tested the RHEL libvirt build, but from code inspection it shares the same design flaw as upstream, so I've no reason to thing it won't also crash.

Version-Release number of selected component (if applicable):
0.10.2-18

How reproducible:
Always

Steps to Reproduce:
1. Start a guest with CDROM device
2. Issue a media change
3.
  
Actual results:
See above

Expected results:


Additional info:
Comment 3 dyuan 2013-02-06 00:36:42 EST
libvirtd crash when I start the guest with shareable block cdrom.

<disk type='block' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <shareable/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>


(gdb) bt
#0  virFileResolveLinkHelper (linkpath=0x0, intermediatePaths=false, resultpath=0x7ffff13171d8) at util/util.c:557
#1  0x0000003f0a6670a8 in virGetDeviceID (path=<value optimized out>, maj=0x7ffff131721c, min=0x7ffff1317218) at util/util.c:3149
#2  0x0000000000477da1 in qemuGetSharedDiskKey (disk_path=0x0) at qemu/qemu_conf.c:757
#3  0x0000000000477ef9 in qemuAddSharedDisk (sharedDisks=0x7fffe401c870, disk_path=<value optimized out>) at qemu/qemu_conf.c:782
#4  0x0000000000481dc4 in qemuProcessStart (conn=0x7fffcc001300, driver=0x7fffe40ba120, vm=0x7fffe4012cc0, migrateFrom=0x0, stdin_fd=-1, 
    stdin_path=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=1) at qemu/qemu_process.c:3856
#5  0x000000000046a9de in qemuDomainObjStart (conn=0x7fffcc001300, driver=0x7fffe40ba120, vm=0x7fffe4012cc0, flags=<value optimized out>)
    at qemu/qemu_driver.c:5610
#6  0x000000000046afd2 in qemuDomainStartWithFlags (dom=0x7fffd00008c0, flags=0) at qemu/qemu_driver.c:5667
#7  0x0000003f0a6f2970 in virDomainCreate (domain=0x7fffd00008c0) at libvirt.c:8303
#8  0x000000000043fbe2 in remoteDispatchDomainCreate (server=<value optimized out>, client=<value optimized out>, msg=<value optimized out>, 
    rerr=0x7ffff1317b80, args=<value optimized out>, ret=<value optimized out>) at remote_dispatch.h:1066
#9  remoteDispatchDomainCreateHelper (server=<value optimized out>, client=<value optimized out>, msg=<value optimized out>, 
    rerr=0x7ffff1317b80, args=<value optimized out>, ret=<value optimized out>) at remote_dispatch.h:1044
#10 0x0000003f0a73f162 in virNetServerProgramDispatchCall (prog=0x7a13f0, server=0x796010, client=0x79e220, msg=0x79de30)
    at rpc/virnetserverprogram.c:431
#11 virNetServerProgramDispatch (prog=0x7a13f0, server=0x796010, client=0x79e220, msg=0x79de30) at rpc/virnetserverprogram.c:304
#12 0x0000003f0a73fdfe in virNetServerProcessMsg (srv=<value optimized out>, client=0x79e220, prog=<value optimized out>, msg=0x79de30)
    at rpc/virnetserver.c:170
#13 0x0000003f0a74049c in virNetServerHandleJob (jobOpaque=<value optimized out>, opaque=<value optimized out>) at rpc/virnetserver.c:191
#14 0x0000003f0a662c4c in virThreadPoolWorker (opaque=<value optimized out>) at util/threadpool.c:144
#15 0x0000003f0a662539 in virThreadHelper (data=<value optimized out>) at util/threads-pthread.c:161
#16 0x0000003ef2807851 in start_thread () from /lib64/libpthread.so.0
#17 0x0000003ef20e890d in clone () from /lib64/libc.so.6
Comment 4 dyuan 2013-02-06 00:43:23 EST
And it works fine with libvirt-0.10.2-12.el6.
Comment 5 Daniel Berrange 2013-02-06 04:47:48 EST
The further problem in this code in qemuDomainAttachDeviceDiskLive:

    if (ret == 0) {
        if (disk->type == VIR_DOMAIN_DISK_TYPE_BLOCK && disk->shared) {
            if (qemuAddSharedDisk(driver->sharedDisks, disk->src) < 0)
                VIR_WARN("Failed to add disk '%s' to shared disk table",
                         disk->src);
        }

        if (qemuSetUnprivSGIO(disk) < 0)
            VIR_WARN("Failed to set unpriv_sgio of disk '%s'", disk->src);
    }

is that it is racy with the guest. By the time we get to this point, we have already assigned the device to the guest, and it can be doing I/O.  We need to do all setup work *prior* to assigning the device to the guest.
Comment 6 Osier Yang 2013-03-07 05:42:52 EST
commit 02b9097274d1330c2e1dca7f598880e09b5c2aa0
Author: Daniel P. Berrange <berrange@redhat.com>
Date:   Mon Feb 25 17:22:14 2013 +0000

    Fix crash changing CDROM media
    
    This change tried to fix a crash with changing CDROM media but
    failed to actually do so
    
      commit d0172d2b1b5d865aaa042070d7c2d00effb2ff8c
      Author: Osier Yang <jyang@redhat.com>
      Date:   Tue Feb 19 20:27:45 2013 +0800
    
        qemu: Remove the shared disk entry if the operation is ejecting or updating
    
    It was still accessing disk->src, when the entire 'disk' object
    has been free'd already. Even if it weren't free'd, accessing
    the 'src' value of virDomainDiskDef is not allowed without
    first validating disk->type is file or block. Just remove the
    broken code entirely.
    
    Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

commit 5c9034bf055d043672be5da8af453e5ddc3906d3
Author: Osier Yang <jyang@redhat.com>
Date:   Thu Feb 21 10:32:15 2013 +0800

    qemu: Fix the memory leak
    
    Found by John Ferlan (coverity script)

commit d0172d2b1b5d865aaa042070d7c2d00effb2ff8c
Author: Osier Yang <jyang@redhat.com>
Date:   Tue Feb 19 20:27:45 2013 +0800

    qemu: Remove the shared disk entry if the operation is ejecting or updating
    
    For both AttachDevice and UpdateDevice APIs, if the disk device
    is 'cdrom' or 'floppy', the operations could be ejecting, updating,
    and inserting. For either ejecting or updating, the shared disk
    entry of the original disk src has to be removed, because it's
    not useful anymore.
    
    And since the original disk def will be changed, new disk def passed
    as argument will be free'ed in qemuDomainChangeEjectableMedia, so
    we need to copy the orignal disk def before
    qemuDomainChangeEjectableMedia, to use it for qemuRemoveSharedDisk.

commit 0db7ff59cc419d9859925e1324f021f59e2fe260
Author: Osier Yang <jyang@redhat.com>
Date:   Tue Feb 19 20:27:44 2013 +0800

    qemu: Move the shared disk adding and sgio setting prior to attaching
    
    The disk def could be free'ed by qemuDomainChangeEjectableMedia,
    which can thus cause crash if we reference the disk pointer. On
    the other hand, we have to remove the added shared disk entry from
    the table on error codepath.

commit d0e4b762042e8f71c24ee93312ee3a131dcd0335
Author: Osier Yang <jyang@redhat.com>
Date:   Tue Feb 19 20:27:43 2013 +0800

    qemu: Update shared disk table when reconnecting qemu process

commit a4504ac184c8fa5c30856a58e2f26f1a5db3fb90
Author: Osier Yang <jyang@redhat.com>
Date:   Wed Feb 20 15:43:55 2013 +0800

    qemu: Record names of domain which uses the shared disk in hash table
    
    The hash entry is changed from "ref" to {ref, @domains}. With this, the
    caller can simply call qemuRemoveSharedDisk, without afraid of removing
    the entry belongs to other domains. qemuProcessStart will obviously
    benifit from it on error codepath (which calls qemuProcessStop to do
    the cleanup).

commit 371df778ebe53e649640d6cd27027856c852e5cc
Author: Osier Yang <jyang@redhat.com>
Date:   Tue Feb 19 20:27:41 2013 +0800

    qemu: Merge qemuCheckSharedDisk into qemuAddSharedDisk
    
    Based on moving various checking into qemuAddSharedDisk, this
    avoids the caller using it in wrong ways. Also this adds two
    new checking for qemuCheckSharedDisk (disk device not 'lun'
    and kernel doesn't support unpriv_sgio simply returns 0).

commit dab878a861cdcdd7bade2fdac672fec7c128cdb7
Author: Osier Yang <jyang@redhat.com>
Date:   Tue Feb 19 20:27:40 2013 +0800

    qemu: Add checking in helpers for sgio setting
    
    This moves the various checking into the helpers, to avoid the
    callers missing the checking.

Commits are now in upstream, move to POST.
Comment 8 Jiri Denemark 2013-03-25 06:22:25 EDT
*** Bug 925977 has been marked as a duplicate of this bug. ***
Comment 10 Luwen Su 2013-07-09 04:33:22 EDT
packages:
libvirt-0.10.2-19.el6.x86_64
kernel-2.6.32-396.el6.x86_64

1.
Start a guest that have

<disk type='block' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <shareable/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
</disk>

2.
in libvirt-0.10.2-18.el6.x86_64 , the libvirtd crashed
in libvirt-0.10.2-19.el6.x86_64 , everything works fine , guest started , 
libvirtd is running , no error log.

So set this bug VERIFIED
Comment 12 errata-xmlrpc 2013-11-21 03:44:55 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1581.html

Note You need to log in before you can comment on or make changes to this bug.