Bug 1316370 - libvirtd crash when guest has a authentication volume cdrom with startupPolicy.
Status: ON_QA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.3
Hardware: Unspecified Unspecified
Priority: medium Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Michal Privoznik
QA Contact: lijuan men
Keywords: Upstream
Depends On:
Blocks: 1401400 1473046
 
Reported: 2016-03-10 01:11 EST by Pei Zhang
Modified: 2017-09-05 05:31 EDT (History)

See Also:
Fixed In Version: libvirt-3.7.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Pei Zhang 2016-03-10 01:11:24 EST
Description of problem:
Normally, when an authenticated volume disk is used, there is no need to add <auth> when defining a guest. It is added automatically after the domain starts.
I defined a guest with an authenticated volume cdrom with startupPolicy, configured <auth> in <disk> manually, and then started the guest: libvirtd crashed.

Version-Release number of selected component (if applicable):
libvirt-1.3.2-1.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare an authenticated iSCSI pool with an ISO volume.

2. Define a guest with <auth> in the disk XML, as follows:
#virsh dumpxml r72|grep disk -A 9

<disk type='volume' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <auth username='libvirt'>
        <secret type='iscsi' usage='libvirtiscsi'/>
      </auth>
      <source pool='iscsi-secret-pool' volume='unit:0:0:1' mode='direct' startupPolicy='optional'/>
      <target dev='sda' bus='scsi'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

Then start the guest; libvirtd crashes.
 
# virsh start r72
error: Disconnected from qemu:///system due to I/O error
error: Failed to start domain r72
error: End of file while reading data: Input/output error

error: One or more references were leaked after disconnect from the hypervisor


3. Delete startupPolicy; the guest starts successfully and the cdrom is usable in the guest.
<disk type='volume' device='cdrom'>
      <driver name='qemu' type='raw'/>
          <auth username='libvirt'>
        <secret type='iscsi' usage='libvirtiscsi'/>
      </auth>
      <source pool='iscsi-secret-pool' volume='unit:0:0:1' mode='direct'/>
      <target dev='sda' bus='scsi'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

# virsh start r72
Domain r72 started

Check the cdrom in the guest; it is usable.

4. Delete the <auth> configuration, then define and start the guest; it reports an error.
<disk type='volume' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source pool='iscsi-secret-pool' volume='unit:0:0:1' mode='direct' startupPolicy='optional'/>
      <target dev='sda' bus='scsi'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

error: Failed to start domain r72
error: XML error: 'startupPolicy' is only valid for 'file' type volume

Actual results:
As in step 2, libvirtd crashes.

Expected results:
Perhaps it should report an error as in step 4, since this is an invalid configuration.

Additional info:
Program received signal SIGABRT, Aborted.
0x00007f0bd1e2c5f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f0bd1e2c5f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f0bd1e2dce8 in __GI_abort () at abort.c:90
#2  0x00007f0bd1e6c317 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f0bd1f75a28 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007f0bd1e74023 in malloc_printerr (ar_ptr=0x7f0bd21b1760 <main_arena>, ptr=<optimized out>, str=0x7f0bd1f73114 "free(): invalid pointer", action=3) at malloc.c:5018
#4  _int_free (av=0x7f0bd21b1760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3842
#5  0x00007f0bd4af626a in virFree (ptrptr=0x7f0bac008400) at util/viralloc.c:582
#6  0x00007f0bd4b4efe3 in virStorageAuthDefFree (authdef=0x7f0bac008400) at util/virstoragefile.c:1504
#7  0x00007f0bd4b508bd in virStorageSourceClear (def=0x7f0bac008670) at util/virstoragefile.c:2053
#8  0x00007f0bd4b4fd63 in virStorageSourceFree (def=0x7f0bac008670) at util/virstoragefile.c:2065
#9  0x00007f0bd4b75e5a in virDomainDiskDefFree (def=0x7f0bac008500) at conf/domain_conf.c:1422
#10 0x00007f0bd4b87d54 in virDomainDefFree (def=0x7f0bac007a40) at conf/domain_conf.c:2448
#11 0x00007f0b8fbabdc8 in qemuProcessStop (driver=driver@entry=0x7f0b701b5db0, vm=vm@entry=0x7f0b70223ac0, reason=reason@entry=VIR_DOMAIN_SHUTOFF_FAILED, flags=flags@entry=0) at qemu/qemu_process.c:5512
#12 0x00007f0b8fbad96f in qemuProcessStart (conn=conn@entry=0x7f0bac00ba70, driver=driver@entry=0x7f0b701b5db0, vm=vm@entry=0x7f0b70223ac0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, 
    migrateFd=migrateFd@entry=-1, migratePath=migratePath@entry=0x0, snapshot=snapshot@entry=0x0, vmop=vmop@entry=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=flags@entry=1) at qemu/qemu_process.c:5191
#13 0x00007f0b8fc09b58 in qemuDomainObjStart (conn=0x7f0bac00ba70, driver=driver@entry=0x7f0b701b5db0, vm=0x7f0b70223ac0, flags=flags@entry=0, asyncJob=QEMU_ASYNC_JOB_START) at qemu/qemu_driver.c:7409
#14 0x00007f0b8fc0a296 in qemuDomainCreateWithFlags (dom=0x7f0bb4000fe0, flags=0) at qemu/qemu_driver.c:7463
#15 0x00007f0bd4bfb84c in virDomainCreate (domain=domain@entry=0x7f0bb4000fe0) at libvirt-domain.c:6753
#16 0x00007f0bd58505fb in remoteDispatchDomainCreate (server=0x7f0bd63e2ff0, msg=0x7f0bd6403440, args=<optimized out>, rerr=0x7f0bc4f4ac30, client=0x7f0bd6403200) at remote_dispatch.h:3613
#17 remoteDispatchDomainCreateHelper (server=0x7f0bd63e2ff0, client=0x7f0bd6403200, msg=0x7f0bd6403440, rerr=0x7f0bc4f4ac30, args=<optimized out>, ret=0x7f0bb400d5f0) at remote_dispatch.h:3589
#18 0x00007f0bd4c64842 in virNetServerProgramDispatchCall (msg=0x7f0bd6403440, client=0x7f0bd6403200, server=0x7f0bd63e2ff0, prog=0x7f0bd63fe300) at rpc/virnetserverprogram.c:437
#19 virNetServerProgramDispatch (prog=0x7f0bd63fe300, server=server@entry=0x7f0bd63e2ff0, client=0x7f0bd6403200, msg=0x7f0bd6403440) at rpc/virnetserverprogram.c:307
#20 0x00007f0bd4c5fa6d in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7f0bd63e2ff0) at rpc/virnetserver.c:135
#21 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7f0bd63e2ff0) at rpc/virnetserver.c:156
#22 0x00007f0bd4b58b05 in virThreadPoolWorker (opaque=opaque@entry=0x7f0bd63e2a80) at util/virthreadpool.c:145
#23 0x00007f0bd4b58028 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#24 0x00007f0bd21bfdc5 in start_thread (arg=0x7f0bc4f4b700) at pthread_create.c:308
#25 0x00007f0bd1eed28d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Comment 2 Michal Privoznik 2016-06-28 09:46:29 EDT
I've just pushed the fix upstream:

commit ca5d51df27567ef8d77c126815d01c484deb359f
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Tue Jun 28 14:44:57 2016 +0200
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Tue Jun 28 15:02:16 2016 +0200

    virStorageTranslateDiskSourcePool: Avoid double free
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1316370
    
    Consider the following disk for a domain:
    
        <disk type='volume' device='cdrom'>
          <driver name='qemu' type='raw'/>
          <auth username='libvirt'>
            <secret type='iscsi' usage='libvirtiscsi'/>
          </auth>
          <source pool='iscsi-secret-pool' volume='unit:0:0:0' mode='direct' startupPolicy='optional'/>
          <target dev='sda' bus='scsi'/>
          <readonly/>
          <address type='drive' controller='0' bus='0' target='0' unit='0'/>
        </disk>
    
    Now, startupPolicy is currently not allowed for iscsi disks, so
    one would expect an error message to be thrown. But what a
    surprise is waiting for users if they try to start up such
    domain:
    
    ==15724== Invalid free() / delete / delete[] / realloc()
    ==15724==    at 0x4C2B1F0: free (vg_replace_malloc.c:473)
    ==15724==    by 0x54B7A69: virFree (viralloc.c:582)
    ==15724==    by 0x552DC90: virStorageAuthDefFree (virstoragefile.c:1549)
    ==15724==    by 0x552F023: virStorageSourceClear (virstoragefile.c:2055)
    ==15724==    by 0x552F054: virStorageSourceFree (virstoragefile.c:2067)
    ==15724==    by 0x55556AA: virDomainDiskDefFree (domain_conf.c:1562)
    ==15724==    by 0x5557ABE: virDomainDefFree (domain_conf.c:2547)
    ==15724==    by 0x1B43CC42: qemuProcessStop (qemu_process.c:5918)
    ==15724==    by 0x1B43BA2E: qemuProcessStart (qemu_process.c:5511)
    ==15724==    by 0x1B48993E: qemuDomainObjStart (qemu_driver.c:7050)
    ==15724==    by 0x1B489B9A: qemuDomainCreateWithFlags (qemu_driver.c:7104)
    ==15724==    by 0x1B489C01: qemuDomainCreate (qemu_driver.c:7122)
    ==15724==  Address 0x21cfbb90 is 0 bytes inside a block of size 48 free'd
    ==15724==    at 0x4C2B1F0: free (vg_replace_malloc.c:473)
    ==15724==    by 0x54B7A69: virFree (viralloc.c:582)
    ==15724==    by 0x552DC90: virStorageAuthDefFree (virstoragefile.c:1549)
    ==15724==    by 0x12D1C8D4: virStorageTranslateDiskSourcePool (storage_driver.c:3475)
    ==15724==    by 0x1B4396E4: qemuProcessPrepareDomain (qemu_process.c:4896)
    ==15724==    by 0x1B43B880: qemuProcessStart (qemu_process.c:5466)
    ==15724==    by 0x1B48993E: qemuDomainObjStart (qemu_driver.c:7050)
    ==15724==    by 0x1B489B9A: qemuDomainCreateWithFlags (qemu_driver.c:7104)
    ==15724==    by 0x1B489C01: qemuDomainCreate (qemu_driver.c:7122)
    ==15724==    by 0x561CA97: virDomainCreate (libvirt-domain.c:6787)
    ==15724==    by 0x12B6FD: remoteDispatchDomainCreate (remote_dispatch.h:4116)
    ==15724==    by 0x12B61A: remoteDispatchDomainCreateHelper (remote_dispatch.h:4092)
    
    The problem is, in virStorageTranslateDiskSourcePool disk
    def->src->auth is freed, but the pointer is not set to NULL. So
    later, when qemuProcessStop starts to free the domain definition,
    virStorageAuthDefFree() tries to free the memory again, instead
    of jumping out immediately.
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>

v2.0.0-rc1-40-gca5d51d
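
The double free described in the commit message above comes down to freeing a member without resetting the owner's pointer. The following is a minimal sketch of the "free and set to NULL" pattern, not the actual libvirt patch: AuthDef, DiskSource and all function names here are hypothetical stand-ins for def->src->auth and its helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the disk source's auth definition. */
typedef struct {
    char *username;
} AuthDef;

/* Hypothetical stand-in for a disk source that owns an AuthDef. */
typedef struct {
    AuthDef *auth;
} DiskSource;

/* Free an AuthDef; a NULL argument is a no-op, like free(NULL). */
void auth_def_free(AuthDef *auth)
{
    if (!auth)
        return;
    free(auth->username);
    free(auth);
}

/* Free the auth definition AND reset the owner's pointer, so that a
 * later cleanup pass sees NULL instead of a dangling pointer. */
void disk_source_clear_auth(DiskSource *src)
{
    assert(src != NULL);
    auth_def_free(src->auth);
    src->auth = NULL;
}

/* Allocate a source with an auth definition for the given username.
 * Returns NULL on allocation failure. */
DiskSource *disk_source_new(const char *username)
{
    DiskSource *src = calloc(1, sizeof(*src));
    if (!src)
        return NULL;
    src->auth = calloc(1, sizeof(*src->auth));
    if (!src->auth) {
        free(src);
        return NULL;
    }
    src->auth->username = malloc(strlen(username) + 1);
    if (!src->auth->username) {
        free(src->auth);
        free(src);
        return NULL;
    }
    strcpy(src->auth->username, username);
    return src;
}

/* Final cleanup, analogous to freeing the whole domain definition. */
void disk_source_free(DiskSource *src)
{
    if (!src)
        return;
    disk_source_clear_auth(src);
    free(src);
}
```

With this shape, an early disk_source_clear_auth() during start-up followed by disk_source_free() on the failure path frees the auth block only once. If the pointer were left untouched after the first free, the second pass would hand the same address to free() again, which is the "free(): invalid pointer" abort in the backtrace.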
Comment 4 lijuan men 2016-07-18 23:06:19 EDT
Verifying the bug.

version:
libvirt-2.0.0-2.el7.x86_64
qemu-kvm-rhev-2.6.0-13.el7.x86_64
kernel-3.10.0-470.el7.x86_64

steps:
1. Prepare an authenticated iSCSI pool with an ISO volume.

2. Define a guest with <auth> in the disk XML, as follows:
#virsh dumpxml bios|grep disk -A 9
 <disk type='volume' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <auth username='rhat'>
        <secret type='iscsi' usage='libvirtiscsi'/>
      </auth>
      <source pool='iscsi' volume='unit:0:0:0' mode='direct' startupPolicy='optional'/>
      <target dev='sda' bus='scsi'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
3. Start the guest:
[root@localhost ~]# virsh start bios
error: Failed to start domain bios
error: XML error: 'startupPolicy' is only valid for 'file' type volume
Comment 5 lijuan men 2016-09-22 05:27:23 EDT
Now the result is not the same as in comment 4.

Summary:
If startupPolicy='requisite'/'mandatory' is used with a 'block' type volume, the guest cannot be booted.
If startupPolicy='optional' is used with a 'block' type volume, the guest boots successfully.


test version:
libvirt-2.0.0-10.el7.x86_64
qemu-kvm-rhev-2.6.0-26.el7.x86_64

steps:

Scenario 1: use startupPolicy='mandatory' with a 'block' type volume

1. Prepare an authenticated iSCSI pool with an ISO volume.

2. Define a guest with XML like the following:
  <disk type='volume' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source pool='iscsi' volume='unit:0:0:0' mode='direct' ***startupPolicy='mandatory'***/>
      <target dev='sda' bus='scsi'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

[root@intel-e5530-8-1 ~]# virsh start r73
error: Failed to start domain r73
error: XML error: 'startupPolicy' is only valid for 'file' type volume


Scenario 2: use startupPolicy='optional' with a 'block' type volume

1. Define a guest with XML like the following:
 <disk type='volume' device='cdrom'>
      <driver name='qemu' type='raw'/>
     *** <auth username='redhat'>    ***
    ***     <secret type='iscsi' usage='libvirtiscsi'/> ***
     *** </auth>  ***
      <source pool='iscsi' volume='unit:0:0:0' mode='direct'  ***startupPolicy='optional' ***/>
      <target dev='sda' bus='scsi'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

[root@intel-e5530-8-1 ~]# virsh start r73    --->boot up successfully
Domain r73 started

[root@intel-e5530-8-1 ~]# virsh dumpxml r73 | grep disk -A 9

<disk type='volume' device='cdrom'>
      <driver name='qemu' type='raw'/>        ---->there is no <auth>...</auth> info,is it normal?
      <source pool='iscsi' volume='unit:0:0:0' mode='direct' startupPolicy='optional'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <readonly/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

I think scenario 2 is not right. Should I file a new bug to track it, or continue to track it in this bug?
Comment 6 Xuesong Zhang 2016-09-23 02:07:49 EDT
Changed back to ASSIGNED per comment 5 above, and moved to RHEL 7.4. Please don't hesitate to move it back to ON_QA if it works as designed, or if you prefer to fix it in a new bug. Thanks.
Comment 7 Michal Privoznik 2016-09-23 02:43:45 EDT
(In reply to lijuan men from comment #5)
> Now the result is not the same as comment4.
> 
> summary:
> if startupPolicy='requisite'/'mandatory' is used with 'block' type
> volume,the guest can not be booted up.
> if startupPolicy='optional' is used with 'block' type volume,the guest will
> boot up successfully. 
> 
> 
> test version:
> libvirt-2.0.0-10.el7.x86_64
> qemu-kvm-rhev-2.6.0-26.el7.x86_64
> 
> steps:
> 
> scenario1:use startupPolicy='mandatory' with 'block' type volume
> 
> 1.Prepare a authentication iscsi pool with an iso volume
> 
> 2.define a guest with  xml like following
>   <disk type='volume' device='cdrom'>
>       <driver name='qemu' type='raw'/>
>       <source pool='iscsi' volume='unit:0:0:0' mode='direct'
> ***startupPolicy='mandatory'***/>
>       <target dev='sda' bus='scsi'/>
>       <readonly/>
>       <address type='drive' controller='0' bus='0' target='0' unit='0'/>
>     </disk>
> 
> [root@intel-e5530-8-1 ~]# virsh start r73
> error: Failed to start domain r73
> error: XML error: 'startupPolicy' is only valid for 'file' type volume

This is expected. Our documentation says that startupPolicy is valid only for file based disks:

        (NB, <code>startupPolicy</code> is not valid for "volume" disk unless
         the specified storage volume is of "file" type).


> 
> 
> scenario2:use startupPolicy='optional' with 'block' type volume
> 
> 1.define a guest with  xml like following
>  <disk type='volume' device='cdrom'>
>       <driver name='qemu' type='raw'/>
>      *** <auth username='redhat'>    ***
>     ***     <secret type='iscsi' usage='libvirtiscsi'/> ***
>      *** </auth>  ***
>       <source pool='iscsi' volume='unit:0:0:0' mode='direct' 
> ***startupPolicy='optional' ***/>
>       <target dev='sda' bus='scsi'/>
>       <readonly/>
>       <address type='drive' controller='0' bus='0' target='0' unit='0'/>
>     </disk>

Yeah, this should report an error to match our documentation.

> 
> [root@intel-e5530-8-1 ~]# virsh start r73    --->boot up successfully
> Domain r73 started
> 
> [root@intel-e5530-8-1 ~]# virsh dumpxml r73 | grep disk -A 9
> 
> <disk type='volume' device='cdrom'>
>       <driver name='qemu' type='raw'/>        ---->there is no
> <auth>...</auth> info,is it normal?

Yes. You need to use 'virsh dumpxml --security-info' to display security sensitive info.

>       <source pool='iscsi' volume='unit:0:0:0' mode='direct'
> startupPolicy='optional'/>
>       <backingStore/>
>       <target dev='sda' bus='scsi'/>
>       <readonly/>
>       <alias name='scsi0-0-0-0'/>
>       <address type='drive' controller='0' bus='0' target='0' unit='0'/>
>     </disk>
> 
> I think the scenario2 is not right. Will I file a new bug to track it? or
> continue to track it in this bug?

Yeah, we can track it here.
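
The rule quoted from the documentation in this comment (startupPolicy is only valid for a "volume" disk when the storage volume is of "file" type) can be sketched as a small validation check. This is an illustration only, not libvirt's code: the enums and the function name are hypothetical, and only the error string is taken from the virsh output in this report.

```c
#include <stddef.h>

/* Hypothetical volume types; only "file" may carry a startupPolicy. */
typedef enum {
    VOL_TYPE_FILE,
    VOL_TYPE_BLOCK,
    VOL_TYPE_DIR,
    VOL_TYPE_NETWORK
} VolType;

/* Hypothetical policy values; VOL_POLICY_UNSET means the attribute
 * was not given in the XML at all. */
typedef enum {
    VOL_POLICY_UNSET,
    VOL_POLICY_MANDATORY,
    VOL_POLICY_REQUISITE,
    VOL_POLICY_OPTIONAL
} VolStartupPolicy;

/* Return NULL when the combination is acceptable, otherwise the
 * error message to report. Every explicit policy value is treated
 * the same way, including 'optional'. */
const char *check_volume_startup_policy(VolType vol, VolStartupPolicy policy)
{
    if (policy != VOL_POLICY_UNSET && vol != VOL_TYPE_FILE)
        return "'startupPolicy' is only valid for 'file' type volume";
    return NULL;
}
```

The point of the sketch is that the check does not depend on which policy value was requested, so scenario 2 ('optional' on a block volume) would be rejected just like scenario 1 ('mandatory').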
Comment 9 Michal Privoznik 2017-03-23 10:30:59 EDT
On second thought, our documentation also states that:

Since 1.1.2 the startupPolicy is extended to support hard disks besides cdrom and floppy. On guest cold bootup, if a certain disk is not accessible or its disk chain is broken, with startupPolicy 'optional' the guest will drop this disk. 

Therefore I don't think there is anything to fix, is there?
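
For reference, the drop-or-fail decision that startupPolicy describes can be sketched as below. Only the 'optional' behaviour is quoted in this bug. The 'mandatory' and 'requisite' rows follow the libvirt domain XML documentation as I understand it (mandatory: fail if missing for any reason; requisite: fail if missing on boot up, drop if missing on migrate/restore/revert) and should be treated as an assumption. All type and function names are hypothetical.

```c
/* Hypothetical policy values mirroring the startupPolicy attribute. */
typedef enum {
    POLICY_MANDATORY,
    POLICY_REQUISITE,
    POLICY_OPTIONAL
} StartupPolicy;

/* How the domain is being started. */
typedef enum {
    START_COLD_BOOT,
    START_MIGRATE_RESTORE_REVERT
} StartKind;

/* What to do when the disk source turns out to be missing. */
typedef enum {
    ACTION_FAIL,
    ACTION_DROP_SOURCE
} MissingAction;

/* Decide how to react to a missing source for a given policy. */
MissingAction on_missing_source(StartupPolicy policy, StartKind kind)
{
    switch (policy) {
    case POLICY_OPTIONAL:
        /* drop if missing at any start attempt */
        return ACTION_DROP_SOURCE;
    case POLICY_REQUISITE:
        /* fail on cold boot, drop on migrate/restore/revert */
        return kind == START_COLD_BOOT ? ACTION_FAIL : ACTION_DROP_SOURCE;
    case POLICY_MANDATORY:
    default:
        /* fail if missing for any reason */
        return ACTION_FAIL;
    }
}
```

Note that this decision only applies when the source is actually missing; it says nothing about whether the policy is valid for the disk type in the first place.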
Comment 11 lijuan men 2017-05-10 03:55:11 EDT
(In reply to Michal Privoznik from comment #9)
> On a second thought, our documentation also states that:
> 
> Since 1.1.2 the startupPolicy is extended to support hard disks besides
> cdrom and floppy. On guest cold bootup, if a certain disk is not accessible
> or its disk chain is broken, with startupPolicy 'optional' the guest will
> drop this disk. 


The text above relates to hard disks, not cdrom/floppy.

For cdrom/floppy, libvirt.org says:
1. optional: drop if **missing** at any start attempt
2. startupPolicy is not valid for "volume" disk unless the specified storage volume is of "file" type

In our scenario the volume is not missing; it exists. But startupPolicy is invalid for it. So I think reporting an error message when starting the guest is more appropriate.

However, if, as you said, the test scenario (the guest can be started with startupPolicy='optional') is expected, then the dumpxml output is not right:

1. Start the guest with the following XML:
 <disk type='volume' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <auth username='redhat'>
        <secret type='iscsi' usage='libvirtiscsi'/>
      </auth>
      <source pool='iscsipool' volume='unit:0:0:0' mode='direct' startupPolicy='optional'/>
      <target dev='sda' bus='scsi'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

[root@localhost ~]# virsh start test
Domain test started

2. Check the dumpxml output:
[root@localhost ~]# virsh dumpxml test
...
 <disk type='volume' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source pool='iscsipool' volume='unit:0:0:0' mode='direct'/>   --> the source is not dropped
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <readonly/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
...
Comment 12 Xuesong Zhang 2017-06-22 04:08:07 EDT
Changed back to ASSIGNED per comment 11 above, and moved to RHEL 7.5. Please don't hesitate to move it back to ON_QA if it works as designed, or if you prefer to fix it in a new bug. Thanks.
Comment 14 Michal Privoznik 2017-08-14 10:35:28 EDT
The patch has been upstream for a while:

commit 462c4b66fa70a93a548c4ad4a1103ac9a32b9faf
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Fri Mar 31 15:59:54 2017 +0200
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Mon Apr 3 08:35:57 2017 +0200

    Introduce and use virDomainDiskEmptySource
    
    Currently, if we want to zero out disk source (e,g, due to
    startupPolicy when starting up a domain) we use
    virDomainDiskSetSource(disk, NULL). This works well for file
    based storage (storage type file, dir, or block). But it doesn't
    work at all for other types like volume and network.
    
    So imagine that you have a domain that has a CDROM configured
    which source is a volume from an inactive pool. Because it is
    startupPolicy='optional', the CDROM is empty when the domain
    starts. However, the source element is not cleared out in the
    status XML and thus when the daemon restarts and tries to
    reconnect to the domain it refreshes the disks (which fails - the
    storage pool is still not running) and thus the domain is killed.
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>


This should be part of the 3.3.0 release.
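
The commit message above contrasts clearing only a path (enough for file, dir or block sources) with emptying a source of any type. The sketch below illustrates that difference under stated assumptions. It is not the implementation of virDomainDiskEmptySource: the DiskSrc structure, its fields and every function name are hypothetical, and resetting the type to "file" is an assumption about how an empty drive might be represented.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical source types, following the ones named in the commit. */
typedef enum { SRC_FILE, SRC_BLOCK, SRC_DIR, SRC_VOLUME, SRC_NETWORK } SrcType;

/* Hypothetical disk source: which fields are used depends on type. */
typedef struct {
    SrcType type;
    char *path;    /* file, block, dir */
    char *pool;    /* volume */
    char *volume;  /* volume */
    char *host;    /* network */
} DiskSrc;

/* Small helper: duplicate a string, returning NULL on failure. */
char *dup_str(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy)
        strcpy(copy, s);
    return copy;
}

/* Path-only setter: replaces src->path and nothing else, so passing
 * NULL empties file-like sources but leaves pool/volume/host alone. */
void disk_set_path(DiskSrc *src, const char *path)
{
    free(src->path);
    src->path = path ? dup_str(path) : NULL;
}

/* Type-agnostic emptying: release every type-specific field. */
void disk_empty_source(DiskSrc *src)
{
    free(src->path);   src->path = NULL;
    free(src->pool);   src->pool = NULL;
    free(src->volume); src->volume = NULL;
    free(src->host);   src->host = NULL;
    src->type = SRC_FILE; /* assumption: empty drive shown as file type */
}

/* True when no source information of any kind remains. */
bool disk_source_is_empty(const DiskSrc *src)
{
    return !src->path && !src->pool && !src->volume && !src->host;
}
```

For a volume-backed CDROM, disk_set_path(&src, NULL) leaves the pool and volume names in place, which matches the stale <source pool=... volume=.../> seen in comment 11, while disk_empty_source(&src) removes them.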
