Bug 530134 - RFE - In-place backing file format change
Summary: RFE - In-place backing file format change
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.4.z
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
: ---
Assignee: Kevin Wolf
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 528977 533951 556459 556823
Reported: 2009-10-21 16:40 UTC by Igor Lvovsky
Modified: 2013-03-01 04:51 UTC (History)

Fixed In Version: kvm-83-147.el5
Doc Type: Enhancement
Doc Text:
Clone Of:
: 556459 (view as bug list)
Environment:
Last Closed: 2010-03-30 07:55:10 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0271 0 normal SHIPPED_LIVE Important: kvm security, bug fix and enhancement update 2010-03-29 13:19:48 UTC

Description Igor Lvovsky 2009-10-21 16:40:10 UTC
Description of problem:

First we need to create a raw image with:
                qemu-img create -f raw base_img
After that we create a chain of external qcow2 snapshots with:
                qemu-img create -f qcow2 -F raw -b base_img sn1
                qemu-img create -f qcow2 -F qcow2 -b sn1 sn2 
So now we have the chain:
     base_img(raw)->sn1(qcow2)->sn2(qcow2)

Now I want to remove 'base_img' without breaking the whole chain. I can do this by committing 'sn1' into 'base_img', deleting the 'sn1' snapshot, and then renaming 'base_img' to 'sn1'. In that case we get a new chain:
            sn1(raw)->sn2(qcow2)

This looks OK, because the metadata of snapshot 'sn2' still contains the right pointer to the backing file ('sn1'). But now we have a different problem with 'sn2'. Besides the pointer to the backing file, its metadata also contains the format (raw/qcow2) of that file, and we have no way to change it. This means that 'sn2' thinks its backing file 'sn1' is a qcow2 file when 'sn1' is actually raw.
At this point the chain is broken.
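
The mismatch described above can be made visible by looking at the first bytes of the backing file. The following is a minimal sketch, assuming only that a qcow2 image starts with the four magic bytes "QFI" followed by 0xfb and that a raw image carries no such signature. The helper name is_qcow2 is hypothetical and is not part of qemu-img.

```shell
# Hypothetical helper: succeed if file $1 starts with the qcow2 magic.
# Assumption: qcow2 images begin with the bytes 'Q' 'F' 'I' 0xfb.
is_qcow2() {
    # od prints the first four bytes as hex; tr strips the separators.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "514649fb" ]
}
```

After the rename workaround, a call such as `is_qcow2 sn1` would be expected to fail, even though the metadata of sn2 still declares its backing file to be qcow2.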


Comment 1 Dor Laor 2009-10-22 13:18:50 UTC
Does the problem exist on NFS too or only raw partition?

Comment 2 Kevin Wolf 2009-10-23 13:13:24 UTC
Well, then obviously your method of merging base_img and sn1 wasn't right. You're not supposed to change the backing file. If you really want to do this, you either need to convert base_img to qcow2 or you need to convert sn2 and put the right backing file in (the latter would be the "official" way).

But where is the bug? It's a missing feature at best, and not a very commonly needed one, I think.

Comment 3 Igor Lvovsky 2009-10-26 13:26:38 UTC
Yes, the problem also exists on NFS.

Actually, you are right: it's probably not a bug but a missing feature.

However, we have a feature in RHEV-M that allows the user to remove a snapshot from any place in the chain. This means we must be able to change the backing file.
I know that our way is not 100% right, but it was a kind of workaround.
We can't use 'qemu-img convert' as you advised; it is very expensive for us from an I/O perspective.

What we actually need is some kind of 'qemu-img merge' operation, or at least a verb to change the backing file format in the snapshot's metadata without converting the whole snapshot.

In addition, I think we should add a '-F' flag to 'qemu-img convert' beside the '-B' flag (if we allow changing the backing file of a snapshot, we should allow changing its format too).

Comment 4 Kevin Wolf 2009-10-26 13:56:15 UTC
(In reply to comment #3)
> Yes, the problem exist also in NFS.
> 
> Actually, you right probably it's not a bug, but missing feature.
> 
> However, we have a feature in RHEV-M that allow to user remove the snapshot
> from any place in the chain. It's mean we must be able to change backing file.
> I know that our way it's not a 100% right, but it was a kind of workaround.
> We can't use 'qemu-img convert' as you advised, it's very expensive from IO
> perspective for us.

But currently it's the only way to do it. You can't expose a feature that doesn't exist.

> Actually we need some 'qemu-img merge' operation or at least verb to change
> backing file format in snapshot's metadata without convert a whole snapshot.

This topic has been discussed upstream, and basically we can't do much about it because changing the backing file means that the qcow2 header size changes. And we can't simply extend the header if there is already some data directly after the header. The qcow2 file format isn't designed to allow this.

> In addition I think we should add to 'qemu-img convert -F' flag beside '-B'
> flag (if we allow to change backing file of snapshot, we should allow to change
> its format too)  

Yes, I think this makes sense. I'll leave the bug open for this one (even though it isn't a bug either), but I'm afraid your main feature wish isn't going to be fulfilled.

Comment 5 Kevin Wolf 2009-10-27 11:44:37 UTC
(In reply to comment #4)
> > In addition I think we should add to 'qemu-img convert -F' flag beside '-B'
> > flag (if we allow to change backing file of snapshot, we should allow to change
> > its format too)  
> 
> Yes, I think this makes sense.

Well, no, it doesn't. qemu-img convert -o backing_fmt=qcow2 already exists and provides the desired functionality.
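
For reference, an invocation along those lines could look like the sketch below. The wrapper name convert_with_backing and the QEMU_IMG override variable are hypothetical, and the accepted options depend on the qemu-img build in use. Note that this still copies the snapshot's data into a new file, which is the I/O cost objected to in comment 3.

```shell
# Hypothetical wrapper around "qemu-img convert".
# Arguments: source image, destination image, new backing file,
# format of the new backing file.
# Assumption: the installed qemu-img accepts -B and -o backing_fmt=...
convert_with_backing() {
    src=$1 dst=$2 backing=$3 backing_fmt=$4
    "${QEMU_IMG:-qemu-img}" convert -O qcow2 -B "$backing" \
        -o backing_fmt="$backing_fmt" "$src" "$dst"
}
```

A call such as `convert_with_backing sn2 sn2.new sn1 raw` would write a new qcow2 file sn2.new that records sn1 as a raw backing file.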

Comment 6 Ayal Baron 2009-10-28 09:27:27 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > Yes, the problem exist also in NFS.
> > 
> > Actually, you right probably it's not a bug, but missing feature.
> > 
> > However, we have a feature in RHEV-M that allow to user remove the snapshot
> > from any place in the chain. It's mean we must be able to change backing file.
> > I know that our way it's not a 100% right, but it was a kind of workaround.
> > We can't use 'qemu-img convert' as you advised, it's very expensive from IO
> > perspective for us.
> 
> But currently it's the only way to do it. You can't expose a feature that
> doesn't exist.
> 
> > Actually we need some 'qemu-img merge' operation or at least verb to change
> > backing file format in snapshot's metadata without convert a whole snapshot.
> 
> This topic has been discussed upstream, and basically we can't do much about it
> because changing the backing file means that the qcow2 header size changes. And
> we can't simply extend the header if there is already some data directly after
> the header. The qcow2 file format isn't designed to allow this.

From what I understand this is not quite correct: there are 8 bytes reserved for the backing file format and only 3 are used (RAW). Changing this to 'qcow2' still stays within the 8-byte limit. In fact, this will always work for any format name of at most 8 bytes.
In addition, IIUC the space allocated for the header is one cluster, which is almost always larger than the header itself, so we can even change the backing file name within certain limits (and/or support format names of 8 bytes or more).
The easy way to avoid problems is simply to fail if there isn't enough space, in which case we will have to fall back to convert.
Using convert might entail long downtime for the VM. When we are talking about production servers, users will not accept this.
Avoiding snapshot collapse means suffering a performance hit, which is also unreasonable as far as the users are concerned.
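
To check how much room the backing format actually occupies, the header can be read directly. The sketch below is not taken from this thread. It assumes the layout in the qcow2 specification: big-endian fields, header extensions starting at byte 72 in version 2 or at header_length (offset 100) in version 3, each extension consisting of a 32-bit type, a 32-bit length, and data padded to 8 bytes, with type 0xE2792ACA holding the backing format name. The function names be32 and qcow2_backing_fmt are hypothetical.

```shell
# Read a big-endian 32-bit integer from file $1 at byte offset $2.
be32() {
    # The command substitution is expanded first, so $1/$2 are still
    # the function arguments when od runs.
    set -- $(od -An -tu1 -j "$2" -N4 "$1")
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the backing format name stored in qcow2 image $1, if any.
# Assumption: layout as described in the qcow2 specification.
qcow2_backing_fmt() {
    img=$1
    if [ "$(be32 "$img" 4)" -ge 3 ]; then
        off=$(be32 "$img" 100)      # version 3: header_length field
    else
        off=72                      # version 2: fixed header size
    fi
    # Low 32 bits of backing_file_offset; extensions end before the name.
    end=$(be32 "$img" 12)
    while [ "$end" -eq 0 ] || [ "$off" -lt "$end" ]; do
        ext_type=$(be32 "$img" "$off") || return 1
        ext_len=$(be32 "$img" $((off + 4))) || return 1
        # Type 0 marks the end of the extension area.
        [ "$ext_type" -eq 0 ] && return 1
        if [ "$ext_type" -eq $((0xE2792ACA)) ]; then
            dd if="$img" bs=1 skip=$((off + 8)) count="$ext_len" 2>/dev/null
            echo
            return 0
        fi
        # Extension data is padded to a multiple of 8 bytes.
        off=$((off + 8 + (ext_len + 7) / 8 * 8))
    done
    return 1
}
```

Run against sn2 from the description, this would be expected to print the format that sn2 believes its backing file has. The sketch only reads the header and does not attempt the in-place change discussed here.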



> 
> > In addition I think we should add to 'qemu-img convert -F' flag beside '-B'
> > flag (if we allow to change backing file of snapshot, we should allow to change
> > its format too)  
> 
> Yes, I think this makes sense. I'll leave the bug open for this one (even
> though it isn't a bug either), but I'm afraid you main feature wish isn't going
> to be fulfilled.

Comment 11 Miki Kenneth 2009-11-05 15:33:32 UTC
As 2.2 is now aligned with 5.5, we can skip the z stream, but we need it to be fixed.

Comment 16 lihuang 2010-02-07 11:43:34 UTC
Based on my understanding of "deleting a snapshot from a chain" and the rebase patch, I tested 3 test cases:
        base ---> sn1 ---> sn2
1) no rebase          : commit sn1 to base, delete sn1, and rename base to sn1
2) rebase safe mode   : commit sn1 to base, rebase sn2, delete sn1 (w/o -u)
3) rebase unsafe mode : commit sn1 to base, delete sn1, rebase sn2 (w/ -u)

Depending on the format of the base image, I tested 3 scenarios:
1) raw ---> sn1 ---> sn2
2) qcow2 ---> sn1 ---> sn2
3) base0 --...--> snapshot ---> sn1 ---> sn2

result :
+----------------------------------------------------------------------+
|           | pre-patched |               patched (kvm 157)            |
|           |-------------+------------+--------------+----------------|
|           | no rebase   | no rebase  | rebase(safe) | rebase(unsafe) |
|-----------+-------------+------------+--------------+----------------|
|1 raw      | FAILED      | FAILED     | PASS         | PASS           |
|-----------+-------------+------------+--------------+----------------|
|2 qcow2    | PASS        | PASS       | PASS         | PASS           |
|-----------+-------------+------------+--------------+----------------|
|3 snapshot | PASS        | PASS       | PASS         | PASS           |
+----------------------------------------------------------------------+
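
Test cases 2 and 3 above can be sketched as a single helper. This is only an illustration of the order of operations listed in this comment, not the code that was tested. The function name drop_middle and the QEMU_IMG override variable are hypothetical, and the helper deletes the middle image with rm, so it should only be tried on scratch images.

```shell
# Hypothetical helper: remove "mid" from the chain base <- mid <- top.
# Arguments: mode (safe|unsafe), base image, base format, mid, top.
drop_middle() {
    mode=$1 base=$2 base_fmt=$3 mid=$4 top=$5
    q=${QEMU_IMG:-qemu-img}

    # Fold the data of mid into its backing file (base).
    "$q" commit -f qcow2 "$mid" || return 1

    case $mode in
    safe)
        # Safe rebase is run while the old backing file still exists.
        "$q" rebase -b "$base" -F "$base_fmt" "$top" || return 1
        rm -f -- "$mid"
        ;;
    unsafe)
        # Unsafe rebase (-u) only rewrites the backing reference in top,
        # so mid can already be gone.
        rm -f -- "$mid"
        "$q" rebase -u -b "$base" -F "$base_fmt" "$top" || return 1
        ;;
    *)
        echo "unknown mode: $mode" >&2
        return 2
        ;;
    esac
}
```

For the raw scenario in row 1 of the table, the corresponding call would be `drop_middle unsafe base raw sn1 sn2` or `drop_middle safe base raw sn1 sn2`.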

Comment 17 lihuang 2010-02-07 11:48:17 UTC
script using qemu-io for this bug.

------------------------------------------------------
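#sh delete.sh raw|qcow2|snapshot [rebase|urebase]
# fmt is the format of "base", recorded later as the backing format of sn1
fmt=qcow2
rebase=0
urebase=0

# Create the base image for the chosen scenario in a clean working directory
if [ "$1" = "raw" ]; then
   mkdir -p raw
   cd raw || exit 1
   rm -rf ./*
   qemu-img create -f raw base 3G
   fmt=raw
elif [ "$1" = "qcow2" ]; then
   mkdir -p qcow2
   cd qcow2 || exit 1
   rm -rf ./*
   qemu-img create -f qcow2 base 3G
elif [ "$1" = "snapshot" ]; then
   mkdir -p snapshot
   cd snapshot || exit 1
   rm -rf ./*
   qemu-img create -f raw base0 3G
   qemu-img create -F raw -f qcow2 -b base0 base
else
   echo "usage: sh delete.sh raw|qcow2|snapshot [rebase|urebase]" >&2
   exit 1
fi

# The second argument selects safe rebase, unsafe rebase, or (default) plain rename
if [ "$2" = "rebase" ]; then
   rebase=1
elif [ "$2" = "urebase" ]; then
   urebase=1
fi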
   
qemu-io base <<EOF
write 0 10M -P 65
write 10M 10M -P 97
quit
EOF

qemu-img create -F $fmt -f qcow2 -b base sn1
qemu-io sn1 <<EOF
write 20M 10M -P 61
write 30M 10M -P 83
quit
EOF

qemu-img create -f qcow2 -F qcow2 -b sn1 sn2
qemu-io sn2 <<EOF
write 40M 10M -P 42
write 50M 10M -P 81
quit
EOF

qemu-img commit -f qcow2 sn1 
mkdir -p tmp

if [ $rebase -ne 0 ]; then
  qemu-img rebase -b base -F $fmt sn2
  mv sn1 tmp
elif [ $urebase -ne 0 ]; then
  mv sn1 tmp
  qemu-img rebase -u -b base -F $fmt sn2
else
  mv sn1 tmp
  mv base sn1
fi

qemu-io sn2 <<EOF
read   0 10M -P 65
read 10M 10M -P 97
read 20M 10M -P 61
read 30M 10M -P 83
read 40M 10M -P 42
read 50M 10M -P 81
quit
EOF
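
The thread does not say how the nine combinations from comment 16 were driven. One possible way is a small loop like the sketch below, where run_matrix is a hypothetical name and the script path is passed in by the caller. Each run's output goes to its own log file, because the exit status of the script alone may not reflect a pattern mismatch reported by qemu-io.

```shell
# Hypothetical driver: run script $1 once per (scenario, mode) pair.
# Output of each run is kept in log.<scenario>.<mode> for inspection.
run_matrix() {
    script=$1
    for scenario in raw qcow2 snapshot; do
        for mode in none rebase urebase; do
            log="log.$scenario.$mode"
            # "none" means no second argument (the plain rename case).
            if [ "$mode" = none ]; then
                sh "$script" "$scenario" > "$log" 2>&1
            else
                sh "$script" "$scenario" "$mode" > "$log" 2>&1
            fi
            echo "$scenario $mode exit=$?"
        done
    done
}
```

With the script above saved as delete.sh, `run_matrix delete.sh` would leave one log per table cell in the current directory.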

Comment 18 lihuang 2010-02-07 12:02:33 UTC
All tests that PASSED in comment #16 were also run with an installed guest (RHEL 5.4, 32-bit).

steps :
1. start vm with base image. run #dd if=/dev/urandom of=dd.base bs=1M count=512 ; cksum dd.base.

2. shutdown vm, and create snapshot sn1. qemu-img create -F $fmt -f qcow2 -b base sn1

3. start vm with sn1 image. run #dd if=/dev/urandom of=dd.sn1 bs=1M count=512 ; cksum dd.sn1

4. shutdown vm, and create snapshot sn2. qemu-img create -F qcow2 -f qcow2 -b sn1 sn2

5. start vm with sn2 image. run #dd if=/dev/urandom of=dd.sn2 bs=1M count=512 ; cksum dd.sn2

6. shutdown vm.

7. commit sn1 to base with #qemu-img commit -f qcow2 sn1.

8. "rebase" sn2 using one of "rename / safe rebase / unsafe rebase" as described in comment #16.

9. start vm with sn2.
   --> the VM starts.
   --> dd.base/dd.sn1/dd.sn2 are still present and their cksum values are unchanged.
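
The checksum comparison in steps 1 to 9 could be scripted inside the guest roughly as follows. This is a sketch only: the helper names record_cksums and verify_cksums and the list file name are hypothetical, and the thread does not state how the values were actually compared.

```shell
# Hypothetical helpers for comparing cksum output before and after.
# record_cksums LIST FILE...  stores the checksums of FILE... in LIST.
record_cksums() {
    list=$1; shift
    cksum "$@" > "$list"
}

# verify_cksums LIST FILE...  succeeds only if the checksums of FILE...
# are identical to those recorded in LIST.
verify_cksums() {
    list=$1; shift
    cksum "$@" | cmp -s - "$list"
}
```

For example, `record_cksums sums dd.base dd.sn1 dd.sn2` before step 6 and `verify_cksums sums dd.base dd.sn1 dd.sn2` after step 9, assuming the list file is kept on the same guest disk.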

Comment 19 lihuang 2010-02-07 12:09:09 UTC
additional test for PATCH :Introduce BDRV_O_NO_BACKING/
->PASS.

Comment 22 errata-xmlrpc 2010-03-30 07:55:10 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0271.html

