Bug 1333627

Summary: Growing backing file length in qcow2 header causes 'Backing file name too long' error.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: vdsm
Assignee: Adam Litke <alitke>
Status: CLOSED ERRATA
QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.6.5
CC: alitke, bazulay, eblake, gklein, gveitmic, gwatson, inetkach, jsuchane, lsurette, mkalinin, nsoffer, pkrempa, ratamir, srevivo, tnisan, trichard, ycui, ykaul, ylavi
Target Milestone: ovirt-4.0.0-beta
Keywords: ZStream
Target Release: 4.0.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, when creating snapshots, the backing volume reference included an image directory component. Each time a snapshot was merged, the image directory component of the backing volume reference was duplicated, until the reference became too long and the image could no longer be used. Since all volumes of an image always reside in the same directory, the image directory component has now been removed from new backing volume references, and they no longer grow in length after snapshots are merged.
Story Points: ---
Clone Of:
: 1336367 (view as bug list)
Environment:
Last Closed: 2016-08-23 20:15:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1336367
Attachments:
Test case (flags: none)

Description Germano Veit Michel 2016-05-06 04:20:58 UTC
Description of problem:

Certain actions can result in the qcow2 backing file metadata growing until it exceeds the 1023- or 4095-character limit. When this happens, qemu is unable to open the file (BZ 1321744) and all operations with the image fail.

Note that this is NOT the resulting path after concatenating the strings; it is the exact string stored in the qcow2 header under certain conditions.

Is there any reason to have this as the backing file path in the qcow2 header?
../<DISK ID>/../<DISK ID>/../<DISK ID>/../<DISK ID>/<PARENT ID> ?

Why not simply:
../<DISK ID>/<PARENT ID>

Can't we do something on this side to prevent that? There will always be a limit (1023, 4095 or something else), so this condition can always be reached and is not desirable.

Version-Release number of selected component (if applicable):
qemu-img-rhev-2.3.0-31.el7_2.10.x86_64
vdsm-4.17.26-0.el7ev.noarch
libvirt-daemon-1.2.17-13.el7_2.4.x86_64

How reproducible:
100%, but I am quite sure there are other flows that reach the same condition. 

Steps to Reproduce:
1. Create a VM with 1 Disk, Raw
2. Snapshot it 3 times
   (Base) [A] <- [B] <- [C] <- [D] (Leaf)
3. Remove B *

* Actually, C is merged into B and then C is removed.

Actual results:
QCOW2 Backing File Metadata for D:
../<DISK ID>/../<DISK ID>/B

Expected results:
QCOW2 Backing File Metadata for D:
../<DISK ID>/B

Additional info:
Every time this happens, the backing file string gets a bit longer, until the limit in BZ 1321744 is reached and the VM fails to start/migrate/snapshot... And there is no way to fix it without using a patched qemu-img.

Real World Example:

[717cb382-5a8b-4bc4-9447-19b84978d755] {Base} 
<- [cb234ed0-df1c-453c-b70f-396837a679f2] {S1 - Snapshot to be Removed}
<- [bbca024b-b9a5-4e10-a7ea-280d04329ca3] {S2}
<- [d1a23fbd-9fb2-4a6b-91b8-974f65b50907] {Leaf}

Resulting Chain:

[717cb382-5a8b-4bc4-9447-19b84978d755] {Base} 
<- [cb234ed0-df1c-453c-b70f-396837a679f2] {S2}
<- [d1a23fbd-9fb2-4a6b-91b8-974f65b50907] {Leaf}

d1a23fbd now contains '../DISK/../DISK/cb234ed0'

I would expect it to contain '../DISK/cb234ed0'

Before Removal:

# su vdsm -s /bin/sh -c 'qemu-img info --backing-chain d1a23fbd-9fb2-4a6b-91b8-974f65b50907'
image: d1a23fbd-9fb2-4a6b-91b8-974f65b50907
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 4.6M
cluster_size: 65536
backing file: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/bbca024b-b9a5-4e10-a7ea-280d04329ca3
backing file format: qcow2
Format specific information:
    compat: 0.10
    refcount bits: 16

image: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/bbca024b-b9a5-4e10-a7ea-280d04329ca3
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 1.2M
cluster_size: 65536
backing file: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/cb234ed0-df1c-453c-b70f-396837a679f2 (actual path: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/cb234ed0-df1c-453c-b70f-396837a679f2)
backing file format: qcow2
Format specific information:
    compat: 0.10
    refcount bits: 16

image: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/cb234ed0-df1c-453c-b70f-396837a679f2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 1.0M
cluster_size: 65536
backing file: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/717cb382-5a8b-4bc4-9447-19b84978d755 (actual path: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/717cb382-5a8b-4bc4-9447-19b84978d755)
backing file format: raw
Format specific information:
    compat: 0.10
    refcount bits: 16

image: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/717cb382-5a8b-4bc4-9447-19b84978d755
file format: raw
virtual size: 10G (10737418240 bytes)
disk size: 1.4G




After Removal:

# su vdsm -s /bin/sh -c 'qemu-img info --backing-chain d1a23fbd-9fb2-4a6b-91b8-974f65b50907'
image: d1a23fbd-9fb2-4a6b-91b8-974f65b50907
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 4.6M
cluster_size: 65536
backing file: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/cb234ed0-df1c-453c-b70f-396837a679f2
backing file format: qcow2
Format specific information:
    compat: 0.10
    refcount bits: 16

image: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/cb234ed0-df1c-453c-b70f-396837a679f2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 1.9M
cluster_size: 65536
backing file: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/717cb382-5a8b-4bc4-9447-19b84978d755 (actual path: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/717cb382-5a8b-4bc4-9447-19b84978d755)
backing file format: raw
Format specific information:
    compat: 0.10
    refcount bits: 16

image: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/717cb382-5a8b-4bc4-9447-19b84978d755
file format: raw
virtual size: 10G (10737418240 bytes)
disk size: 1.4G

Comment 1 Germano Veit Michel 2016-05-06 04:38:32 UTC
See also: BZ 1082630

Comment 2 Germano Veit Michel 2016-05-06 04:39:40 UTC
See also: BZ 1321744 (already linked in description)

Comment 3 Adam Litke 2016-05-06 20:53:42 UTC
Created attachment 1154739 [details]
Test case

This test case illustrates the problem by creating and merging many snapshots until the backing file field in the qcow volume overflows.  Restarting the VM seems to resolve the problem (at least in this test case).

Comment 4 Adam Litke 2016-05-06 20:54:40 UTC
Does restarting the VM fix it in the customer environment? (I realize this isn't a permanent fix.)

Comment 5 Germano Veit Michel 2016-05-08 23:00:58 UTC
Hmm. Well, all the cases I have seen until now were running scripts (create a new snapshot, remove the older one) daily. By the time they noticed it was not working, there were already a bunch of failed creations and removals on top of the overflows. The chain in these cases was in quite an ugly shape (qcow != storage domain != database), so it's hard to say whether restarting the VM early on could have fixed it.

Comment 6 Germano Veit Michel 2016-05-08 23:35:40 UTC
I just did a quick test in my env. It does not look like it made any difference as the backing file field remains the same.

Adam, by fixing it, do you mean the backing file field goes back to the initial length?

----%<------
Steps to Reproduce:
1. Create a VM with 1 Disk, Raw
2. Snapshot it 3 times
   (Base) [A] <- [B] <- [C] <- [D] (Leaf)
3. Remove B
---->%------

D info after B Removal:

backing file: ../3fa44096-c261-435f-8e96-b89a47f05e09/../3fa44096-c261-435f-8e96-b89a47f05e09/62eae983-9b49-4cae-b743-37ad2de7d21b

D info after Shutdown:

backing file: ../3fa44096-c261-435f-8e96-b89a47f05e09/../3fa44096-c261-435f-8e96-b89a47f05e09/62eae983-9b49-4cae-b743-37ad2de7d21b

D info after Startup:

backing file: ../3fa44096-c261-435f-8e96-b89a47f05e09/../3fa44096-c261-435f-8e96-b89a47f05e09/62eae983-9b49-4cae-b743-37ad2de7d21b

Comment 7 Adam Litke 2016-05-09 13:20:13 UTC
Thanks for the info.  Until we can get a fix for this in qemu, I could probably write a script to fix the qcow chain using 'qemu-img rebase -u'.  The main issues with this are that it can only be run when the VM is off, and that it won't work if the chain has already gotten too long.
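
For illustration only, a minimal sketch of what such a repair could look like (all UUIDs and the directory are placeholders, and whether the unsafe rebase is appropriate for a given chain has to be confirmed case by case):

  # Run only while the VM is powered off. 'rebase -u' rewrites just the
  # header string, so the backing file given here must already be the
  # real parent (use -F raw if the parent is the raw base volume).
  cd <storage-domain-path>/images/<image-uuid>
  qemu-img rebase -u -b ../<image-uuid>/<parent-volume-uuid> -F qcow2 <child-volume-uuid>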

If it makes sense, we could run this process automatically in vdsm when preparing an image to run a VM.

To avoid the chain getting too long, we could add a check in vdsm and not allow any merge operations on the VM once the backing chain field gets close to the known limit.  At that point we can return an error saying that the VM must be powered off and back on again before any additional merge operations will be allowed.
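
As a rough illustration of that check (shell only, not actual vdsm code; the leaf volume path and the 1000-byte threshold are assumptions):

  # Extract the backing file string from the leaf's qcow2 header and refuse
  # the merge if it is getting close to the 1023-byte limit.
  backing=$(qemu-img info <leaf-volume> | sed -n 's/^backing file: //p' | cut -d' ' -f1)
  if [ "${#backing}" -ge 1000 ]; then
      echo "backing file string is ${#backing} bytes; restart the VM before merging" >&2
      exit 1
  fi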

I wish we could just have a real fix in qemu...

Comment 8 Peter Krempa 2016-05-09 14:59:38 UTC
(In reply to Adam Litke from comment #7)
> I wish we could just have a real fix in qemu...

The growth of the length of the relative backing file path is not really caused by qemu itself (at least when used via libvirt). Qemu can only work around the issue by allowing even longer strings, but the string will still grow.

The problem is caused by these factors:

1) oVirt always puts the directory of the image into the relative path while doing a snapshot. As seen above, the directory name is present multiple times in the path which indicates that the files are actually all in the same directory.

2) When doing a block commit, qemu requires libvirt to pass the actual relative path between the images. Libvirt does this by checking whether all images have only relative links, then removing the last component of the path and concatenating it with the next image link in the chain. This is done to keep the actual relative relationship between the images as qemu would re-create it. The qemu tools take the same approach.

3) Symbolic links. If a component of a path is a symbolic link, the link is expanded in place and the next "../" is then taken from the directory the link points to. As a result, /A/../A/B may not be simplified into /A/B: if /A is a symlink to /C/D/E/F, the actual resolved path is /C/D/E/A/B (a small shell illustration follows after this list).

4) NFS and root squashing, plus other storage technologies. In some cases it is not trivial to check that a file is not a symlink.
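
A tiny standalone illustration of point 3, using throwaway paths under /tmp (nothing to do with the real storage layout):

  mkdir -p /tmp/demo/C/D/E/F
  ln -s C/D/E/F /tmp/demo/A
  readlink -f /tmp/demo/A/../A    # prints /tmp/demo/C/D/E/A, not /tmp/demo/A

So "/A/../A" only collapses to "/A" when /A is a real directory; once a symlink is involved, the "../" is resolved inside the link target.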

So a simple solution would be for oVirt to stop using silly paths with the directory in the relative backing file link especially since they are in the same directory.

The harder one would be to check whether a backing store component is a symlink and drop it if it is not, but even that may not be possible in some cases (e.g. spawning a process with the correct uid/gid for every link component to check on NFS is not really desirable).

Comment 9 Nir Soffer 2016-05-09 19:38:35 UTC
(In reply to Peter Krempa from comment #8)
Stopping using silly relative paths would be the best solution.

One issue is that on block storage, the base image is not always in the same directory. On file storage, we create a hardlink for template volumes so the
base image is always in the same directory as the other volumes.

In block storage, there are no directories - all the volumes are just lvs, but
to keep uniform paths for file and block disks, we create a directory for 
each image in the host where the image is located, and create a symlink to 
the lv (/dev/vgname/lvname) in the directory.

To remove the ../image-uuid/ part of the relative path, we need to create
additional symlink for templates in the temporary directories for prepared
block based images, so the relative path to the base image is always at
the same directory.
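
For illustration, the block-storage layout described above looks roughly like this (the parent directory and all names are placeholders, not the exact paths vdsm uses):

  ls -l <prepared-images-dir>/<image-uuid>/
  <volume-uuid> -> /dev/<vg-name>/<volume-uuid>

With an additional symlink for the template volume in the same directory, the backing file string could be just the bare parent volume UUID, without the ../<image-uuid>/ prefix.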

Comment 10 Adam Litke 2016-05-09 21:06:56 UTC
Possible simple solution to avoid backing chain growth posted for review.

Comment 11 Adam Litke 2016-05-11 13:28:05 UTC
(In reply to Nir Soffer from comment #9)
> (In reply to Peter Krempa from comment #8)
> Stopping using silly relative paths would be the best solution.
> 
> One issue is that on block storage, the base image is not always in the same
> directory. On file storage, we create a hardlink for template volumes so the
> base image is always in the same directory as the other volumes.
> 
> In block storage, there are no directories - all the volumes are just lvs,
> but
> to keep uniform paths for file and block disks, we create a directory for 
> each image in the host where the image is located, and create a symlink to 
> the lv (/dev/vgname/lvname) in the directory.
> 
> To remove the ../image-uuid/ part of the relative path, we need to create
> additional symlink for templates in the temporary directories for prepared
> block based images, so the relative path to the base image is always at
> the same directory.

Actually it seems we already do this.  I am now testing the fix with all the flows I am aware of.  Hopefully we can resolve this as easily as that one-liner patch.

Comment 12 Yaniv Lavi 2016-05-16 06:52:32 UTC
Should this be moved to MODIFIED?

Comment 13 Germano Veit Michel 2016-05-16 06:59:18 UTC
Adam, I was looking at your patch.

Does it mean we can actually fix (unsafe rebase) a customer's overflown chain directly to '<volID>' instead of '../<imgID>/<volID>'?

Would the same be true for 3.5?

Thanks all for your work on this.

Comment 16 Adam Litke 2016-05-25 13:30:18 UTC
(In reply to Germano Veit Michel from comment #13)
> Adam, I was looking at your patch.
> 
> Does it mean we can actually fix (unsafe rebase) customer's overflown chain
> directly to '<volID>' instead of '../<imgID>/<volID>' ?

Yes, in 3.6 and beyond this is safe to do.

> 
> Would the same be true for 3.5?

I have not tested this on 3.5 so would not be comfortable recommending it there without more research.

> 
> Thanks all for your work on this.

Comment 20 Kevin Alon Goldblatt 2016-07-28 12:25:13 UTC
Tested with the following code:
-------------------------------------
rhevm-4.0.2-0.1.rc.el7ev.noarch
vdsm-4.18.8-1.el7ev.x86_64

Verified with the following scenario:
-------------------------------------

Steps to Reproduce:
1. Create a VM with 1 Disk, Raw
2. Snapshot it 3 times
   (Base) [A] <- [B] <- [C] <- [D] (Leaf)
3. Remove B *


Actual results:
Neither cold merge nor live merge of snapshots results in the qcow2 backing file metadata growing.

Examples after cold and then live merge

After Cold Merge
------------------
backing file: 79b08653-9db2-4edf-ab2f-6267549bb76d (actual path: /dev/29c00ce2-d434-43a8-910f-69daf79b833a/79b08653-9db2-4edf-ab2f-6267549bb76d)



After Live Merge
-----------------
backing file: 79b08653-9db2-4edf-ab2f-6267549bb76d (actual path: /dev/29c00ce2-d434-43a8-910f-69daf79b833a/79b08653-9db2-4edf-ab2f-6267549bb76d)

Moving to VERIFIED!

Comment 22 errata-xmlrpc 2016-08-23 20:15:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1671.html