Bug 1316370
Summary: | libvirtd crash when guest has a authentication volume cdrom with startupPolicy. | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Pei Zhang <pzhang> |
Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
Status: | CLOSED ERRATA | QA Contact: | lijuan men <lmen> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 7.3 | CC: | dyuan, jferlan, lmen, mprivozn, mzhan, rbalakri, xuzhang |
Target Milestone: | rc | Keywords: | Upstream |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | libvirt-3.7.0-1.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-04-10 10:36:45 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1401400, 1473046 |
Description
Pei Zhang
2016-03-10 06:11:24 UTC
I've just pushed the fix usptream: commit ca5d51df27567ef8d77c126815d01c484deb359f Author: Michal Privoznik <mprivozn> AuthorDate: Tue Jun 28 14:44:57 2016 +0200 Commit: Michal Privoznik <mprivozn> CommitDate: Tue Jun 28 15:02:16 2016 +0200 virStorageTranslateDiskSourcePool: Avoid double free https://bugzilla.redhat.com/show_bug.cgi?id=1316370 Consider the following disk for a domain: <disk type='volume' device='cdrom'> <driver name='qemu' type='raw'/> <auth username='libvirt'> <secret type='iscsi' usage='libvirtiscsi'/> </auth> <source pool='iscsi-secret-pool' volume='unit:0:0:0' mode='direct' startupPolicy='optional'/> <target dev='sda' bus='scsi'/> <readonly/> <address type='drive' controller='0' bus='0' target='0' unit='0'/> </disk> Now, startupPolicy is currently not allowed for iscsi disks, so one would expect an error message to be thrown. But what a surprise is waiting for users if they try to start up such domain: ==15724== Invalid free() / delete / delete[] / realloc() ==15724== at 0x4C2B1F0: free (vg_replace_malloc.c:473) ==15724== by 0x54B7A69: virFree (viralloc.c:582) ==15724== by 0x552DC90: virStorageAuthDefFree (virstoragefile.c:1549) ==15724== by 0x552F023: virStorageSourceClear (virstoragefile.c:2055) ==15724== by 0x552F054: virStorageSourceFree (virstoragefile.c:2067) ==15724== by 0x55556AA: virDomainDiskDefFree (domain_conf.c:1562) ==15724== by 0x5557ABE: virDomainDefFree (domain_conf.c:2547) ==15724== by 0x1B43CC42: qemuProcessStop (qemu_process.c:5918) ==15724== by 0x1B43BA2E: qemuProcessStart (qemu_process.c:5511) ==15724== by 0x1B48993E: qemuDomainObjStart (qemu_driver.c:7050) ==15724== by 0x1B489B9A: qemuDomainCreateWithFlags (qemu_driver.c:7104) ==15724== by 0x1B489C01: qemuDomainCreate (qemu_driver.c:7122) ==15724== Address 0x21cfbb90 is 0 bytes inside a block of size 48 free'd ==15724== at 0x4C2B1F0: free (vg_replace_malloc.c:473) ==15724== by 0x54B7A69: virFree (viralloc.c:582) ==15724== by 0x552DC90: virStorageAuthDefFree (virstoragefile.c:1549) ==15724== by 0x12D1C8D4: virStorageTranslateDiskSourcePool (storage_driver.c:3475) ==15724== by 0x1B4396E4: qemuProcessPrepareDomain (qemu_process.c:4896) ==15724== by 0x1B43B880: qemuProcessStart (qemu_process.c:5466) ==15724== by 0x1B48993E: qemuDomainObjStart (qemu_driver.c:7050) ==15724== by 0x1B489B9A: qemuDomainCreateWithFlags (qemu_driver.c:7104) ==15724== by 0x1B489C01: qemuDomainCreate (qemu_driver.c:7122) ==15724== by 0x561CA97: virDomainCreate (libvirt-domain.c:6787) ==15724== by 0x12B6FD: remoteDispatchDomainCreate (remote_dispatch.h:4116) ==15724== by 0x12B61A: remoteDispatchDomainCreateHelper (remote_dispatch.h:4092) The problem is, in virStorageTranslateDiskSourcePool disk def->src->auth is freed, but the pointer is not set to NULL. So later, when qemuProcessStop starts to free the domain definition, virStorageAuthDefFree() tries to free the memory again, instead of jumping out immediately. Signed-off-by: Michal Privoznik <mprivozn> v2.0.0-rc1-40-gca5d51d verify the bug version: libvirt-2.0.0-2.el7.x86_64 qemu-kvm-rhev-2.6.0-13.el7.x86_64 kernel-3.10.0-470.el7.x86_64 steps: 1.Prepare a authentication iscsi pool with an iso volume 2.define a guest with <auth> in disk xml like following #virsh dumpxml bios|grep disk -A 9 <disk type='volume' device='cdrom'> <driver name='qemu' type='raw'/> <auth username='rhat'> <secret type='iscsi' usage='libvirtiscsi'/> </auth> <source pool='iscsi' volume='unit:0:0:0' mode='direct' startupPolicy='optional'/> <target dev='sda' bus='scsi'/> <readonly/> <address type='drive' controller='0' bus='0' target='0' unit='0'/> </disk> 3.start the guest [root@localhost ~]# virsh start bios error: Failed to start domain bios error: XML error: 'startupPolicy' is only valid for 'file' type volume Now the result is not the same as comment4. summary: if startupPolicy='requisite'/'mandatory' is used with 'block' type volume,the guest can not be booted up. if startupPolicy='optional' is used with 'block' type volume,the guest will boot up successfully. test version: libvirt-2.0.0-10.el7.x86_64 qemu-kvm-rhev-2.6.0-26.el7.x86_64 steps: scenario1:use startupPolicy='mandatory' with 'block' type volume 1.Prepare a authentication iscsi pool with an iso volume 2.define a guest with xml like following <disk type='volume' device='cdrom'> <driver name='qemu' type='raw'/> <source pool='iscsi' volume='unit:0:0:0' mode='direct' ***startupPolicy='mandatory'***/> <target dev='sda' bus='scsi'/> <readonly/> <address type='drive' controller='0' bus='0' target='0' unit='0'/> </disk> [root@intel-e5530-8-1 ~]# virsh start r73 error: Failed to start domain r73 error: XML error: 'startupPolicy' is only valid for 'file' type volume scenario2:use startupPolicy='optional' with 'block' type volume 1.define a guest with xml like following <disk type='volume' device='cdrom'> <driver name='qemu' type='raw'/> *** <auth username='redhat'> *** *** <secret type='iscsi' usage='libvirtiscsi'/> *** *** </auth> *** <source pool='iscsi' volume='unit:0:0:0' mode='direct' ***startupPolicy='optional' ***/> <target dev='sda' bus='scsi'/> <readonly/> <address type='drive' controller='0' bus='0' target='0' unit='0'/> </disk> [root@intel-e5530-8-1 ~]# virsh start r73 --->boot up successfully Domain r73 started [root@intel-e5530-8-1 ~]# virsh dumpxml r73 | grep disk -A 9 <disk type='volume' device='cdrom'> <driver name='qemu' type='raw'/> ---->there is no <auth>...</auth> info,is it normal? <source pool='iscsi' volume='unit:0:0:0' mode='direct' startupPolicy='optional'/> <backingStore/> <target dev='sda' bus='scsi'/> <readonly/> <alias name='scsi0-0-0-0'/> <address type='drive' controller='0' bus='0' target='0' unit='0'/> </disk> I think the scenario2 is not right. Will I file a new bug to track it? or continue to track it in this bug? Change back to Assign per above comment 5, and move to RHEL7.4. Please don't hesitate to move it back to ON_QA if it works as design or prefer to fix it in a new bug. Thanks. (In reply to lijuan men from comment #5) > Now the result is not the same as comment4. > > summary: > if startupPolicy='requisite'/'mandatory' is used with 'block' type > volume,the guest can not be booted up. > if startupPolicy='optional' is used with 'block' type volume,the guest will > boot up successfully. > > > test version: > libvirt-2.0.0-10.el7.x86_64 > qemu-kvm-rhev-2.6.0-26.el7.x86_64 > > steps: > > scenario1:use startupPolicy='mandatory' with 'block' type volume > > 1.Prepare a authentication iscsi pool with an iso volume > > 2.define a guest with xml like following > <disk type='volume' device='cdrom'> > <driver name='qemu' type='raw'/> > <source pool='iscsi' volume='unit:0:0:0' mode='direct' > ***startupPolicy='mandatory'***/> > <target dev='sda' bus='scsi'/> > <readonly/> > <address type='drive' controller='0' bus='0' target='0' unit='0'/> > </disk> > > [root@intel-e5530-8-1 ~]# virsh start r73 > error: Failed to start domain r73 > error: XML error: 'startupPolicy' is only valid for 'file' type volume This is expected. Our documentation says that startupPolicy is valid only for file based disks: (NB, <code>startupPolicy</code> is not valid for "volume" disk unless the specified storage volume is of "file" type). > > > scenario2:use startupPolicy='optional' with 'block' type volume > > 1.define a guest with xml like following > <disk type='volume' device='cdrom'> > <driver name='qemu' type='raw'/> > *** <auth username='redhat'> *** > *** <secret type='iscsi' usage='libvirtiscsi'/> *** > *** </auth> *** > <source pool='iscsi' volume='unit:0:0:0' mode='direct' > ***startupPolicy='optional' ***/> > <target dev='sda' bus='scsi'/> > <readonly/> > <address type='drive' controller='0' bus='0' target='0' unit='0'/> > </disk> Yeah, this should report an error to match our documentation. > > [root@intel-e5530-8-1 ~]# virsh start r73 --->boot up successfully > Domain r73 started > > [root@intel-e5530-8-1 ~]# virsh dumpxml r73 | grep disk -A 9 > > <disk type='volume' device='cdrom'> > <driver name='qemu' type='raw'/> ---->there is no > <auth>...</auth> info,is it normal? Yes. You need to use 'virsh dumpxml --security-info' to display security sensitive info. > <source pool='iscsi' volume='unit:0:0:0' mode='direct' > startupPolicy='optional'/> > <backingStore/> > <target dev='sda' bus='scsi'/> > <readonly/> > <alias name='scsi0-0-0-0'/> > <address type='drive' controller='0' bus='0' target='0' unit='0'/> > </disk> > > I think the scenario2 is not right. Will I file a new bug to track it? or > continue to track it in this bug? Yeah, we can track it here. On a second thought, our documentation also states that: Since 1.1.2 the startupPolicy is extended to support hard disks besides cdrom and floppy. On guest cold bootup, if a certain disk is not accessible or its disk chain is broken, with startupPolicy 'optional' the guest will drop this disk. Therefore I don't think there is something to fix, is there? (In reply to Michal Privoznik from comment #9) > On a second thought, our documentation also states that: > > Since 1.1.2 the startupPolicy is extended to support hard disks besides > cdrom and floppy. On guest cold bootup, if a certain disk is not accessible > or its disk chain is broken, with startupPolicy 'optional' the guest will > drop this disk. the above words are related to the disk,not cdrom/floppy for cdrom/floppy, libvirt.org said: 1.optional: drop if **missing** at any start attempt 2.startupPolicy is not valid for "volume" disk unless the specified storage volume is of "file" type for our scenario,the volume is not missing,it exists. But it is invalid for startupPolicy. So I think ,when starting the guest, outputting some error message is more appropriate however,if as you said,the test scenario(guest can be started up with startupPolicy='optional') is expected,the dumpxml info is not right: 1.start the guest with the following xml: <disk type='volume' device='cdrom'> <driver name='qemu' type='raw'/> <auth username='redhat'> <secret type='iscsi' usage='libvirtiscsi'/> </auth> <source pool='iscsipool' volume='unit:0:0:0' mode='direct' startupPolicy='optional'/> <target dev='sda' bus='scsi'/> <readonly/> <address type='drive' controller='0' bus='0' target='0' unit='0'/> </disk> [root@localhost ~]# virsh start test Domain test started 2.check the dumpxml info: [root@localhost ~]# virsh dumpxml test ... <disk type='volume' device='cdrom'> <driver name='qemu' type='raw'/> <source pool='iscsipool' volume='unit:0:0:0' mode='direct'/> --> the source is not dropped <backingStore/> <target dev='sda' bus='scsi'/> <readonly/> <alias name='scsi0-0-0-0'/> <address type='drive' controller='0' bus='0' target='0' unit='0'/> </disk> ... Change back to Assign per above comment 11, and move to RHEL7.5. Please don't hesitate to move it back to ON_QA if it works as design or prefer to fix it in a new bug. Thanks. Patch is upstream for a while: commit 462c4b66fa70a93a548c4ad4a1103ac9a32b9faf Author: Michal Privoznik <mprivozn> AuthorDate: Fri Mar 31 15:59:54 2017 +0200 Commit: Michal Privoznik <mprivozn> CommitDate: Mon Apr 3 08:35:57 2017 +0200 Introduce and use virDomainDiskEmptySource Currently, if we want to zero out disk source (e,g, due to startupPolicy when starting up a domain) we use virDomainDiskSetSource(disk, NULL). This works well for file based storage (storage type file, dir, or block). But it doesn't work at all for other types like volume and network. So imagine that you have a domain that has a CDROM configured which source is a volume from an inactive pool. Because it is startupPolicy='optional', the CDROM is empty when the domain starts. However, the source element is not cleared out in the status XML and thus when the daemon restarts and tries to reconnect to the domain it refreshes the disks (which fails - the storage pool is still not running) and thus the domain is killed. Signed-off-by: Michal Privoznik <mprivozn> This should be part of the 3.3.0 release. Is the following result expected? Summary: Start a guest which has a authentication volume cdrom with startupPolicy(optional),the source element will be dropped,and the type='volume' will also be updated as type='file'. detailed info: version: libvirt-3.9.0-1.el7.x86_64 qemu-kvm-rhev-2.10.0-5.el7.x86_64 steps: 1.Prepare a authentication iscsi pool with an iso volume 2.start a guest with the xml: <disk type='volume' device='cdrom'> <driver name='qemu' type='raw'/> <auth type='chap' username='redhat1'> -->with auth info <secret type='iscsi' usage='libvirtiscsi'/> </auth> <source pool='iscsi1' volume='unit:0:0:0' startupPolicy=**'optional'**/> <target dev='sdc' bus='scsi'/> <readonly/> <address type='drive' controller='0' bus='0' target='0' unit='2'/> </disk> [root@localhost ~]# virsh start test Domain test started 3.check the xml about the cdrom [root@localhost ~]# virsh dumpxml test | grep cdrom -A 8 <disk type='file' device='cdrom'> -->file type now <target dev='sdc' bus='scsi'/> -->source info is dropped <readonly/> <alias name='scsi0-0-0-2'/> <address type='drive' controller='0' bus='0' target='0' unit='2'/> </disk> 4.start the guest with the xml: <disk type='volume' device='cdrom'> <driver name='qemu' type='raw'/> -->no auth info <source pool='iscsi1' volume='unit:0:0:0' startupPolicy=**'optional'**/> <target dev='sdc' bus='scsi'/> <readonly/> <address type='drive' controller='0' bus='0' target='0' unit='2'/> </disk> [root@localhost ~]# virsh destroy test;virsh start test Domain test destroyed Domain test started 5.check the xml about the cdrom [root@localhost ~]# virsh dumpxml test | grep cdrom -A 8 <disk type='file' device='cdrom'> -->file type now <target dev='sdc' bus='scsi'/> -->source info is dropped <readonly/> <alias name='scsi0-0-0-2'/> <address type='drive' controller='0' bus='0' target='0' unit='2'/> </disk> (In reply to lijuan men from comment #16) > Is the following result expected? Yes. (In reply to Michal Privoznik from comment #17) > (In reply to lijuan men from comment #16) > > Is the following result expected? > > Yes. thanks for your reply I also test other scenarios,verify the bug the other scenarios: 1.use iscsi pool without authentication 1)Start a guest which has a volume cdrom with startupPolicy(mandatory),start failed 2)Start a guest which has a volume cdrom with startupPolicy(optional),start successfully,the source element will be dropped,and the type='volume' will also be updated as type='file'. 2.use iscsi pool with authentication 1)Start a guest which has a volume cdrom without startupPolicy,start successfully 2)Start a guest with/without auth element ,which has a volume cdrom with startupPolicy(mandatory),start failed 3)Start a guest with/without auth element ,which has a volume cdrom with startupPolicy(requisite),start failed Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:0704 |