Bug 874860

Summary: libvirt fails to start if storage pool contains image with missing backing file
Product: Red Hat Enterprise Linux 6
Reporter: Eric Blake <eblake>
Component: libvirt
Assignee: Eric Blake <eblake>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Priority: unspecified
Version: 6.4
CC: acathrow, dallan, dyasny, dyuan, eblake, honzhang, mzhan, pkrempa, rwu, zpeng
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-0.10.2-9.el6
Doc Type: Bug Fix
Last Closed: 2013-02-21 07:26:28 UTC
Type: Bug
Bug Depends On: 772088
Bug Blocks: 881827

Description Eric Blake 2012-11-08 23:13:56 UTC
Description of problem:
The fix for bug 772088 introduced a regression: libvirt can fail to start if it cannot find the backing file of an image in a storage pool.

Version-Release number of selected component (if applicable):
v0.10.2-7.el6

How reproducible:
100%

Steps to Reproduce:
1. Create a qcow2 image with a named backing file inside a libvirt storage pool
2. Remove the backing file
3. Start libvirt (or run virsh pool-refresh if libvirt is already running)
  
Actual results:
     2012-11-07 12:43:33.279+0000: 22175: info : libvirt version: 1.0.0
     2012-11-07 12:43:33.279+0000: 22175: error : absolutePathFromBaseFile:542 : Can't canonicalize path '/var/lib/libvirt/images/base.qcow2': No such file or directory
     2012-11-07 12:43:33.280+0000: 22175: error : storageDriverAutostart:115 : Failed to autostart storage pool 'default': Can't canonicalize path '/var/lib/libvirt/images/base.qcow2': No such file or directory

Expected results:
Libvirt should still use the pool, and just treat the broken image as though it had no backing file after all.

Additional info:
Fixed by upstream commit:
commit e0c469e58b93f852a72265919703cb6abd3779f8
Author: Philipp Hahn <hahn>
Date:   Wed Nov 7 14:53:49 2012 +0100

    storage: fix broken backing chain
    
    82507838 refactored the code to keep both the raw and canonicalized form
    of the backingStore, which breaks badly when the storage pool contains a
    storage volume, which is missing its backing store file:
     # ./daemon/libvirtd -l
     2012-11-07 12:43:33.279+0000: 22175: info : libvirt version: 1.0.0
     2012-11-07 12:43:33.279+0000: 22175: error : absolutePathFromBaseFile:542 : Can't canonicalize path '/var/lib/libvirt/images/base.qcow2': No such file or directory
     2012-11-07 12:43:33.280+0000: 22175: error : storageDriverAutostart:115 : Failed to autostart storage pool 'default': Can't canonicalize path '/var/lib/libvirt/images/base.qcow2': No such file or directory
    
    This is because virStorageFileGetMetadataFromBuf() aborts with -1 if the
    filename of the backingStore can not be canonicalized:
     #0  absolutePathFromBaseFile () at util/storage_file.c:541
     #1  virStorageFileGetMetadataFromBuf () at util/storage_file.c:728
     #2  virStorageFileGetMetadataFromFD () at util/storage_file.c:932
     #3  virStorageBackendProbeTarget () at storage/storage_backend_fs.c:94
     #4  virStorageBackendFileSystemRefresh () at storage/storage_backend_fs.c:849
     #5  storagePoolStart () at storage/storage_driver.c:700
     #6  virStoragePoolCreate () at libvirt.c:12471
     ...
    
    Treat files which miss their backing file as standalone files.
    
    Signed-off-by: Philipp Hahn <hahn>
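
The behaviour described in the commit message ("treat files which miss their backing file as standalone files") can be illustrated with a minimal sketch. This is not the libvirt patch: resolve_backing_path() is a hypothetical helper, and the choice to treat only ENOENT from POSIX realpath() as "no backing file" is an assumption made for illustration.

```c
#define _XOPEN_SOURCE 700   /* for realpath() under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not libvirt code.
 *
 * Resolve backing_name relative to the directory of image_path.
 * Returns  0 with *out set to a malloc'd canonical path on success,
 *          0 with *out == NULL if the backing file does not exist
 *            (caller treats the image as standalone),
 *         -1 on any other error (errno is set). */
int
resolve_backing_path(const char *image_path, const char *backing_name,
                     char **out)
{
    char joined[PATH_MAX];
    const char *slash;
    int n;

    assert(image_path && backing_name && out);
    *out = NULL;

    slash = strrchr(image_path, '/');
    if (backing_name[0] == '/' || !slash) {
        /* Absolute backing name, or image path without a directory part. */
        n = snprintf(joined, sizeof(joined), "%s", backing_name);
    } else {
        /* Relative backing name: look next to the image itself. */
        n = snprintf(joined, sizeof(joined), "%.*s/%s",
                     (int)(slash - image_path), image_path, backing_name);
    }
    if (n < 0 || (size_t)n >= sizeof(joined)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    *out = realpath(joined, NULL);
    if (!*out) {
        if (errno == ENOENT)
            return 0;   /* missing backing file: not fatal for the pool */
        return -1;
    }
    return 0;
}
```

With this shape, a pool refresh that probes each volume could keep going when one volume names a backing file that no longer exists, instead of failing the whole pool.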

Comment 2 Peter Krempa 2012-11-12 23:30:52 UTC
*** Bug 871701 has been marked as a duplicate of this bug. ***

Comment 5 zhe peng 2012-11-21 03:17:59 UTC
I can reproduce this with libvirt-0.10.2-8.el6, which reports an out of memory error.

Tested on libvirt-0.10.2-9.el6.

Steps:
 1: Create an image with a backing file
 # qemu-img create -f qcow2 base.img 100M
 # qemu-img create -f qcow2 -b base.img leaf.img
   Check leaf.img
 # qemu-img info leaf.img
image: leaf.img
file format: qcow2
virtual size: 256K (262144 bytes)
disk size: 136K
cluster_size: 65536
backing file: base.img

  2: Remove base.img, restart libvirtd, and check the log
 2012-11-21 17:09:23.868+0000: 16379: info : libvirt version: 0.10.2, package: 9.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2012-11-19-08:43:39, x86-001.build.bos.redhat.com)
2012-11-21 17:09:23.868+0000: 16379: error : absolutePathFromBaseFile:560 : Can't canonicalize path 'base.img': No such file or directory

  3: Add this image to a guest
......
 <disk type='file' device='disk'>
   <driver name='qemu' type='qcow2' cache='none'/>
   <source file='/var/lib/libvirt/images/leaf.img'/>
   <target dev='vdb' bus='virtio'/>
   <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
 </disk>
.....

   4: The guest cannot be started; error message:
  error: Failed to start domain rhel6.4
  error: Unable to allow access for disk path (null): Bad address

 Hi Peter,
   Is this right? If an image that is missing its backing file is treated as a standalone file, I thought it could be used in a guest as normal. Please help confirm this, thanks in advance.

Comment 6 Peter Krempa 2012-11-21 10:13:14 UTC
The image cannot be used standalone when you specify that it is a qcow2 image. Libvirt sets permissions and labels on the complete image chain, so when an image in the chain is missing, the chain cannot be brought up.
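
For context on why the chain matters: each qcow2 image records the name of its backing file in its own header, so every link in the chain has to be found before the chain can be used. The sketch below reads that field from a header buffer. It is a hypothetical helper, not libvirt's virStorageFileGetMetadataFromBuf(); the field offsets follow the published qcow2 header layout (magic at byte 0, backing_file_offset at byte 8, backing_file_size at byte 16, all big-endian).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode big-endian integers from a byte buffer. */
static uint32_t
be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t
be64(const unsigned char *p)
{
    return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/* Hypothetical helper, not libvirt code.
 *
 * Copy the backing file name recorded in a qcow2 header into name.
 * Returns  1 if a backing file name was found,
 *          0 if the header records no backing file,
 *         -1 if buf is not a qcow2 header or the fields are out of range. */
int
qcow2_backing_name(const unsigned char *buf, size_t len,
                   char *name, size_t name_len)
{
    static const unsigned char magic[4] = { 'Q', 'F', 'I', 0xfb };
    uint64_t off;
    uint32_t size;

    assert(buf && name && name_len > 0);
    name[0] = '\0';

    if (len < 20 || memcmp(buf, magic, sizeof(magic)) != 0)
        return -1;

    off = be64(buf + 8);     /* backing_file_offset */
    size = be32(buf + 16);   /* backing_file_size, name is not NUL-terminated */
    if (off == 0 || size == 0)
        return 0;            /* no backing file recorded */
    if (off > len || size > len - off || size >= name_len)
        return -1;

    memcpy(name, buf + off, size);
    name[size] = '\0';
    return 1;
}
```

A caller walking a chain would repeat this for each image it opens; the header only gives the name, so whether the named file still exists has to be checked separately.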

On the other hand, we don't check for NULL pointers in some places, and we're lucky that stat() tolerates being passed a NULL pointer (that is the reason for the "Bad address" error message). But this is a separate issue.
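
The kind of NULL check mentioned here could look like the sketch below. It is a hypothetical wrapper, not libvirt code: stat_checked() and the choice of EINVAL are assumptions for illustration. On Linux, passing a NULL path to stat() fails with EFAULT, which strerror() renders as "Bad address".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical wrapper, not libvirt code.
 *
 * Reject a NULL path up front with a clear error instead of handing it
 * to stat() and surfacing a confusing "Bad address" (EFAULT) to the user.
 * Returns 0 on success, -1 on failure with errno set. */
int
stat_checked(const char *path, struct stat *sb)
{
    assert(sb != NULL);

    if (path == NULL) {
        errno = EINVAL;   /* assumed error code for "no path given" */
        return -1;
    }
    return stat(path, sb);
}
```

A caller can then report "no path for disk" separately from a genuine stat() failure such as ENOENT.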

Comment 7 zhe peng 2012-11-21 11:45:42 UTC
Thanks for the help, Peter. Per comment 5, moving to verified.

Comment 8 errata-xmlrpc 2013-02-21 07:26:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html