Description of problem: Certain actions can result in the qcow2 backing file metadata growing and exceeding 1023 or 4095 character limit. When this happens qemu is unable to open the file (BZ 1321744) and all operations with the image fail. By the way, this is NOT the resulting path after concatenating the strings. It's the exact string in the qcow2 header under certain conditions. Is there any reason to have this, as the backing file path in the qcow2 header? ../<DISK ID>/../<DISK ID>/../<DISK ID>/../<DISK ID>/<PARENT ID> ? Why not a simple? ../<DISK ID>/<PARENT ID> Can't we do something on this side to prevent that? There will always be a limit (1023, 4095 or something else), so this condition can always be reached and is not desirable. Version-Release number of selected component (if applicable): qemu-img-rhev-2.3.0-31.el7_2.10.x86_64 vdsm-4.17.26-0.el7ev.noarch libvirt-daemon-1.2.17-13.el7_2.4.x86_64 How reproducible: 100%, but I am quite sure there are other flows that reach the same condition. Steps to Reproduce: 1. Create a VM with 1 Disk, Raw 2. Snapshot it 3 times (Base) [A] <- [B] <- [C] <- [D] (Leaf) 3. Remove B * * actually C is merged into B and C is removed. Actual results: QCOW2 Backing File QCOW2 Metadata for D: ../<DISK ID>/../<DISK ID>/B Expected results: QCOW2 Backing File Metadata for D: ../<DISK ID>/B Additional info: Every time this happens, the backing file string length gets a big longer. Until BZ 1321744 is reached and VM will fails to start/migrate/snapshot... And there is no way to fix it without using a patched qemu-img. Real World Example: [717cb382-5a8b-4bc4-9447-19b84978d755] {Base} <- [cb234ed0-df1c-453c-b70f-396837a679f2] {S1 - Snapshot to be Removed} <- [bbca024b-b9a5-4e10-a7ea-280d04329ca3] {S2} <- [d1a23fbd-9fb2-4a6b-91b8-974f65b50907] {Leaf} Resulting Chain: [717cb382-5a8b-4bc4-9447-19b84978d755] {Base} <- [cb234ed0-df1c-453c-b70f-396837a679f2] {S2} <- [d1a23fbd-9fb2-4a6b-91b8-974f65b50907] {Leaf} d1a23fdb now contains '../DISK/../DISK/cb234ed0' I would expect it to contain '../DISK/cb234ed0' Before Removal: # su vdsm -s /bin/sh -c 'qemu-img info --backing-chain d1a23fbd-9fb2-4a6b-91b8-974f65b50907' image: d1a23fbd-9fb2-4a6b-91b8-974f65b50907 file format: qcow2 virtual size: 10G (10737418240 bytes) disk size: 4.6M cluster_size: 65536 backing file: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/bbca024b-b9a5-4e10-a7ea-280d04329ca3 backing file format: qcow2 Format specific information: compat: 0.10 refcount bits: 16 image: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/bbca024b-b9a5-4e10-a7ea-280d04329ca3 file format: qcow2 virtual size: 10G (10737418240 bytes) disk size: 1.2M cluster_size: 65536 backing file: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/cb234ed0-df1c-453c-b70f-396837a679f2 (actual path: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/cb234ed0-df1c-453c-b70f-396837a679f2) backing file format: qcow2 Format specific information: compat: 0.10 refcount bits: 16 image: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/cb234ed0-df1c-453c-b70f-396837a679f2 file format: qcow2 virtual size: 10G (10737418240 bytes) disk size: 1.0M cluster_size: 65536 backing file: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/717cb382-5a8b-4bc4-9447-19b84978d755 (actual path: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/717cb382-5a8b-4bc4-9447-19b84978d755) backing file format: raw Format specific information: compat: 0.10 refcount bits: 16 image: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/717cb382-5a8b-4bc4-9447-19b84978d755 file format: raw virtual size: 10G (10737418240 bytes) disk size: 1.4G After Removal: # su vdsm -s /bin/sh -c 'qemu-img info --backing-chain d1a23fbd-9fb2-4a6b-91b8-974f65b50907' image: d1a23fbd-9fb2-4a6b-91b8-974f65b50907 file format: qcow2 virtual size: 10G (10737418240 bytes) disk size: 4.6M cluster_size: 65536 backing file: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/cb234ed0-df1c-453c-b70f-396837a679f2 backing file format: qcow2 Format specific information: compat: 0.10 refcount bits: 16 image: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/cb234ed0-df1c-453c-b70f-396837a679f2 file format: qcow2 virtual size: 10G (10737418240 bytes) disk size: 1.9M cluster_size: 65536 backing file: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/717cb382-5a8b-4bc4-9447-19b84978d755 (actual path: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/717cb382-5a8b-4bc4-9447-19b84978d755) backing file format: raw Format specific information: compat: 0.10 refcount bits: 16 image: ../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/../a5a7b896-4003-4971-ad6f-f34c5a708df4/717cb382-5a8b-4bc4-9447-19b84978d755 file format: raw virtual size: 10G (10737418240 bytes) disk size: 1.4G
See also: BZ 1082630
See also: BZ 1321744 (already linked in description)
Created attachment 1154739 [details] Test case This test case illustrates the problem by creating and merging many snapshots until the backing file field in the qcow volume overflows. Restarting the VM seems to resolve the problem (at least in this test case).
Does restarting the VM fix the vm in the customer environment? (I realize this isn't a permanent fix.)
Hmm. Well, all the cases I have seen until know were running scripts (create new snapshot, remove older one) running daily. When they noticed it was not working there was already a bunch of failed creations and removals on top of the overflows. The chain in these cases was in a quite ugly shape (qcow != storage domain != database) so it's hard to say if restarting the VM in the first place could have fixed it in the first place.
I just did a quick test in my env. It does not look like it made any difference as the backing file field remains the same. Adam, by fixing it, do you mean the backing file field goes back to the initial length? ----%<------ Steps to Reproduce: 1. Create a VM with 1 Disk, Raw 2. Snapshot it 3 times (Base) [A] <- [B] <- [C] <- [D] (Leaf) 3. Remove B ---->%------ D info after B Removal: backing file: ../3fa44096-c261-435f-8e96-b89a47f05e09/../3fa44096-c261-435f-8e96-b89a47f05e09/62eae983-9b49-4cae-b743-37ad2de7d21b D info after Shutdown: backing file: ../3fa44096-c261-435f-8e96-b89a47f05e09/../3fa44096-c261-435f-8e96-b89a47f05e09/62eae983-9b49-4cae-b743-37ad2de7d21b D info after Startup: backing file: ../3fa44096-c261-435f-8e96-b89a47f05e09/../3fa44096-c261-435f-8e96-b89a47f05e09/62eae983-9b49-4cae-b743-37ad2de7d21b
Thanks for the info. Until we can get a fix for this in qemu I could probably write a script to fix the qcow chain using 'qemu-img rebase -u'. The main issue with this is it can only be run when a VM is off and it won't work if the chain has already gotten too long. If it makes sense, we could run this process automatically in vdsm when preparing an image to run a VM. To avoid allowing the chain getting too long we could add a check in vdsm and not allow any merge operations to be run on the vm once the backing chain field gets close to the known limit. At that time we can return an error that the VM must be powered off and back on again before any additional merge operations will be allowed. I wish we could just have a real fix in qemu...
(In reply to Adam Litke from comment #7) > I wish we could just have a real fix in qemu... The growth of the length of the relative backing file path is not really caused by qemu itself (at least when used via libvirt). Qemu can just work the issue around by allowing even longer strings but the string will still grow. The problem is caused by these factors: 1) oVirt always puts the directory of the image into the relative path while doing a snapshot. As seen above, the directory name is present multiple times in the path which indicates that the files are actually all in the same directory. 2) When doing a block commit qemu requires libvirt to pass the actual relative path between the images. Libvirt does this by lookin whether all images have only relative links and then removes the last component of the path and concatenates it with the next image link in chain. This is done to keep the actual relative relationship between the images as qemu would re-create it. Also the qemu tools are doing the same approach. 3) Symbolic links. If a component of a path leads to a symbolic link the link is expanded in the place. The next "../" is then taken from the point where the link leads to. This results into the fact that /A/../A/B may not be simplified into /A/B as /A is a symlink to C/D/E/F the actual resolved path is /C/D/E/A/B 4) NFS and root squashing + other storage technologies. It's not trivial to check that a file is not a symlink in some cases. So a simple solution would be for oVirt to stop using silly paths with the directory in the relative backing file link especially since they are in the same directory. The harder one would be to check if a backing store component is a symlink or not and drop it if not, but that still may not be possible in some cases. (e.g. spawning a process with correct uid/gid for every link component to check on NFS is not really desirable).
(In reply to Peter Krempa from comment #8) Stopping using silly relative paths would be the best solution. One issue is that on block storage, the base image is not always in the same directory. On file storage, we create a hardlink for template volumes so the base image is always in the same directory as the other volumes. In block storage, there are no directories - all the volumes are just lvs, but to keep uniform paths for file and block disks, we create a directory for each image in the host where the image is located, and create a symlink to the lv (/dev/vgname/lvname) in the directory. To remove the ../image-uuid/ part of the relative path, we need to create additional symlink for templates in the temporary directories for prepared block based images, so the relative path to the base image is always at the same directory.
Possible simple solution to avoid backing chain growth posted for review.
(In reply to Nir Soffer from comment #9) > (In reply to Peter Krempa from comment #8) > Stopping using silly relative paths would be the best solution. > > One issue is that on block storage, the base image is not always in the same > directory. On file storage, we create a hardlink for template volumes so the > base image is always in the same directory as the other volumes. > > In block storage, there are no directories - all the volumes are just lvs, > but > to keep uniform paths for file and block disks, we create a directory for > each image in the host where the image is located, and create a symlink to > the lv (/dev/vgname/lvname) in the directory. > > To remove the ../image-uuid/ part of the relative path, we need to create > additional symlink for templates in the temporary directories for prepared > block based images, so the relative path to the base image is always at > the same directory. Actually it seems we already do this. I am now testing the fix with all the flows I am aware of. Hopefully we can resolve this as easily as that one-liner patch.
should this be modified?
Adam, I was looking at your patch. Does it mean we can actually fix (unsafe rebase) customer's overflown chain directly to '<volID>' instead of '../<imgID>/<volID>' ? Would the same be true for 3.5? Thanks all for your work on this.
(In reply to Germano Veit Michel from comment #13) > Adam, I was looking at your patch. > > Does it mean we can actually fix (unsafe rebase) customer's overflown chain > directly to '<volID>' instead of '../<imgID>/<volID>' ? Yes, in 3.6 and beyond this is safe to do. > > Would the same be true for 3.5? I have not tested this on 3.5 so would not be comfortable recommending it there without more research. > > Thanks all for your work on this.
Tested with the following code: ------------------------------------- rhevm-4.0.2-0.1.rc.el7ev.noarch vdsm-4.18.8-1.el7ev.x86_64 Verified with the following scenario: ------------------------------------- Steps to Reproduce: 1. Create a VM with 1 Disk, Raw 2. Snapshot it 3 times (Base) [A] <- [B] <- [C] <- [D] (Leaf) 3. Remove B * Actual results: Both cold merge and live merge of snapshots do not result in the qcow2 backing file metadata growing Examples after cold and then live merge After Cold Merge ------------------ backing file: 79b08653-9db2-4edf-ab2f-6267549bb76d (actual path: /dev/29c00ce2-d434-43a8-910f-69daf79b833a/79b08653-9db2-4edf-ab2f-6267549bb76d) After Live Merge ----------------- backing file: 79b08653-9db2-4edf-ab2f-6267549bb76d (actual path: /dev/29c00ce2-d434-43a8-910f-69daf79b833a/79b08653-9db2-4edf-ab2f-6267549bb76d) Moving to VERIFIED!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-1671.html