Bug 1139707

Summary: qcow2 images created on rhel7 hosts are unreadable on rhel6 hosts
Product: Red Hat Enterprise Virtualization Manager
Component: vdsm
Reporter: Yedidyah Bar David <didi>
Assignee: Nir Soffer <nsoffer>
QA Contact: Elad <ebenahar>
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Version: 3.4.1-1
Target Release: 3.5.0
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Doc Type: Bug Fix
Type: Bug
oVirt Team: Storage
Last Closed: 2015-02-11 21:12:30 UTC
CC: acanan, amureini, areis, bazulay, didi, djasa, eblake, ecohen, fsimonce, gklein, iheim, kwolf, lpeer, mkalinin, nsoffer, pdwyer, rjones, scohen, yeylon
Bug Blocks: 1142691, 1147536, 1164308, 1164311

Description Yedidyah Bar David 2014-09-09 13:27:37 UTC
Description of problem:

In a dc/cluster with a rhel7 spm and other rhel6 hosts, if a disk image is created by rhel7, snapshot creation attempts fail with:

2014-09-09 12:57:11.098+0000: 28325: error : qemuMonitorJSONCheckError:357 : internal error unable to execute QEMU command 'transaction': '' uses a qcow2 feature which is not supported by this qemu version: QCOW version 3

Version-Release number of selected component (if applicable):

On rhel7:
qemu-kvm-1.5.3-60.el7_0.5.x86_64
vdsm-4.14.13-1.el7ev.x86_64

On rhel6:
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.14.x86_64
vdsm-4.14.11-1.el6ev.x86_64

How reproducible:

Always, I think

Steps to Reproduce:
1. Create a VM with rhel7 host as spm
2. Start the VM on a rhel6 host
3. Try to create a snapshot (tried both with and without memory, both failed)

Actual results:

snapshot creation fails

Expected results:

snapshot creation succeeds :-)

Either that (somehow make the engine realize that there are also rhel6 hosts and pass vdsm/libvirt/qemu an option to create the image in a rhel6-compatible format), or at least prevent this situation.

Additional info:

Comment 2 Federico Simoncelli 2014-09-09 13:48:20 UTC
It seems that since rhel7 we'll have to use compat=0.10 on image creation.

Since the "compat" option is not present in rhel6, either we decide to use it only with newer qemu-img binaries, or we request a backport of the option to rhel6 as well.

Ademar, what do you think? Can you clone this bz to qemu-kvm rhel6 to add support for the option?

Comment 5 Ademar Reis 2014-09-09 14:34:03 UTC
(In reply to Federico Simoncelli from comment #2)
> It seems that since rhel7 we'll have to use compat=0.10 on image creation.
> 
> Since the "compat" option is not present in rhel6 either we decide to use it
> only with newer qemu-img binaries or we request to backport the option to
> rhel6 as well.
> 
> Ademar what do you think, can you clone this bz to qemu-kvm rhel6 to add the
> support of the option?

Can you check what qemu-img version is present and then use compat=0.10 only when it's a new version (qemu-1.1 or later)? This is the libvirt way of dealing with old qemu versions and that's what we encourage.

Old qemu-img versions create images in the 0.10 format anyway, so a command line option would be a NOP and will be confusing for users who run it by hand (as it won't support compat=1.1).

Comment 6 Yedidyah Bar David 2014-09-10 08:38:02 UTC
Now verified a workaround:

Locate the problematic image file (sadly this isn't that easy to do - I had to search logs for that), then convert it to the 0.10 format, on the rhel7 host, with:

qemu-img convert -O qcow2 -o compat=0.10 "$src" "$tempdest" &&
mv "$tempdest" "$src"

Comment 8 Kevin Wolf 2014-09-11 06:38:14 UTC
(In reply to Yedidyah Bar David from comment #6)
> qemu-img convert -O qcow2 -o compat=0.10 $src $tempdest
> mv $tempdest $src

You can also use 'qemu-img amend -f qcow2 -o compat=0.10 $src', which is a much
quicker operation and doesn't involve copying data.

(In reply to Ademar Reis from comment #5)
> (In reply to Federico Simoncelli from comment #2)
> Can you check what qemu-img version is present and then use compat=0.10 only
> when it's a new version (qemu-1.1 or later)? This is the libvirt way of
> dealing with old qemu versions and that's what we encourage.

Actually, libvirt wouldn't check the version number, but check whether compat
is in the list printed by 'qemu-img create -f qcow2 -o "?" dummy', which works
more reliably in downstreams that backport random features.
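
A minimal sketch of such a probe, written as a POSIX shell helper. The function names and the QEMU_IMG override variable are hypothetical and only serve the illustration; the probe command itself is the one quoted above. This is not the code libvirt or vdsm actually uses.

```shell
# Sketch: decide whether to pass "-o compat=0.10" by probing the option
# list of qemu-img instead of comparing version numbers.
# QEMU_IMG is an assumption of this sketch; it lets the binary be overridden.
QEMU_IMG=${QEMU_IMG:-qemu-img}

# Succeeds if some line of the option listing starts with the word "compat".
supports_compat() {
    "$QEMU_IMG" create -f qcow2 -o "?" dummy 2>/dev/null |
        awk '$1 == "compat" { found = 1 } END { exit !found }'
}

# Prints the extra arguments to add to "qemu-img create -f qcow2",
# or nothing when the installed qemu-img does not know "compat".
compat_args() {
    if supports_compat; then
        printf '%s\n' "-o compat=0.10"
    fi
}
```

A caller would splice the output of compat_args into its own create command line. On a qemu-img without the option nothing is added, and the image is in the old format anyway.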

Comment 9 Nir Soffer 2014-09-11 08:29:56 UTC
(In reply to Kevin Wolf from comment #8)
> Actually, libvirt wouldn't check the version number, but check whether compat
> is in the list printed by 'qemu-img create -f qcow2 -o "?" dummy', which
> works
> more reliably in downstreams that backport random features.

Can you promise that this API will be more stable than the image format?

Comment 10 Allon Mureinik 2014-09-11 08:43:30 UTC
(In reply to Kevin Wolf from comment #8)
> (In reply to Yedidyah Bar David from comment #6)
> > qemu-img convert -O qcow2 -o compat=0.10 $src $tempdest
> > mv $tempdest $src
> 
> You can also use 'qemu-img amend -f qcow2 -o compat=0.10 $src', which is a
> much
> quicker operation and doesn't involve copying data.
> 
> (In reply to Ademar Reis from comment #5)
> > (In reply to Federico Simoncelli from comment #2)
> > Can you check what qemu-img version is present and then use compat=0.10 only
> > when it's a new version (qemu-1.1 or later)? This is the libvirt way of
> > dealing with old qemu versions and that's what we encourage.
> 
> Actually, libvirt wouldn't check the version number, but check whether compat
> is in the list printed by 'qemu-img create -f qcow2 -o "?" dummy', which
> works
> more reliably in downstreams that backport random features.

Nir, can we bang out a quick fix based on this?

Comment 11 Nir Soffer 2014-09-11 08:47:25 UTC
(In reply to Allon Mureinik from comment #10)
> (In reply to Kevin Wolf from comment #8)
Waiting for confirmation from Kevin about the stability of this api.

Comment 12 Nir Soffer 2014-09-11 09:05:09 UTC
Kevin, the qemu-img manual says:

       options
           is a comma separated list of format specific options in a
           name=value format. Use "-o ?" for an overview of the options
           supported by the used format or see the format descriptions
           below for details.

Running it produces this output:

$ qemu-img create -f qcow2 -o "?" dummy
Supported options:
size             Virtual disk size
compat           Compatibility level (0.10 or 1.1)
backing_file     File name of a base image
backing_fmt      Image format of the base image
encryption       Encrypt the image
cluster_size     qcow2 cluster size
preallocation    Preallocation mode (allowed values: off, metadata)
lazy_refcounts   Postpone refcount updates

But there is no specification of the output of this command ("-o ?"). This means that we depend on parsing an undocumented format to understand the capabilities of qemu-img.

For example, this does not give the list of available compat values (they are listed in a user-readable form, not in a machine-readable form).

I don't think we can depend on it.

Comment 13 Kevin Wolf 2014-09-11 11:02:37 UTC
(In reply to Nir Soffer from comment #9)
> Can you promise that this API will be more stable than the image format?

We're generally very careful to only extend APIs and not take things away. And
we're aware that in some places, output originally meant for humans is parsed
by management tools because it's the only way to get information. In such cases
we treat this as a stable API.

libvirt is doing a lot of things like this. If you're not confident enough to
have such checks in RHEV-M code, perhaps libvirt can offer the functionality
that you need and will abstract away any changes in qemu. I'm copying Eric
Blake for this.

Comment 14 Eric Blake 2014-09-11 12:19:15 UTC
(In reply to Kevin Wolf from comment #8)

> (In reply to Ademar Reis from comment #5)
> > (In reply to Federico Simoncelli from comment #2)
> > Can you check what qemu-img version is present and then use compat=0.10 only
> > when it's a new version (qemu-1.1 or later)? This is the libvirt way of
> > dealing with old qemu versions and that's what we encourage.
> 
> Actually, libvirt wouldn't check the version number, but check whether compat
> is in the list printed by 'qemu-img create -f qcow2 -o "?" dummy', which
> works
> more reliably in downstreams that backport random features.

If you use libvirt to create the files, libvirt ALREADY supports the notion of requesting a compat level of 0.10 in a way that works with both old qemu (the option is omitted from the qemu-img command line, because it was unsupported but 0.10 was the default) and new qemu (the option is supplied to qemu-img, to force back-compat behavior).

http://libvirt.org/formatstorage.html#StorageVolTarget documents the <compat> element accepted by virStorageVolCreateXML
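
As an illustration of that element, here is a hedged sketch of a shell helper that prints a volume definition with the compat level set. The helper name, the volume name, and the capacity are hypothetical example values; only the element layout follows the libvirt storage volume format.

```shell
# Prints a libvirt storage volume definition requesting qcow2 compat 0.10.
# $1: volume name, $2: capacity in GiB (both are example inputs).
volume_xml() {
    cat <<EOF
<volume>
  <name>$1</name>
  <capacity unit='GiB'>$2</capacity>
  <target>
    <format type='qcow2'/>
    <compat>0.10</compat>
  </target>
</volume>
EOF
}
```

The output could be saved to a file and passed to "virsh vol-create POOL FILE", where POOL is whatever storage pool exists on the host.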

Comment 15 Eric Blake 2014-09-11 12:24:28 UTC
Here's how libvirt portably probes for whether -o compat works:

http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/storage/storage_backend.c;h=00cfe74c8824912811ff153b5d34082b06d578c7;hb=HEAD#l672

Comment 16 Allon Mureinik 2014-09-11 12:40:19 UTC
If this is good enough for libvirt, it should be good enough for VDSM.

Comment 17 Nir Soffer 2014-09-11 18:01:16 UTC
Thanks Kevin and Eric, we will use the same check as libvirt does.

Comment 18 Nir Soffer 2014-09-11 23:52:14 UTC
Didi, can you check the attached patch on your environment?

Comment 19 Nir Soffer 2014-09-13 18:49:36 UTC
Kevin, would you like to review the attached patch?
http://gerrit.ovirt.org/32836

Comment 20 Kevin Wolf 2014-09-15 08:15:51 UTC
Looks good to me.

Comment 22 Nir Soffer 2014-09-18 08:38:09 UTC
How to verify:

1. Set up a cluster with rhel7 and rhel6.5 hosts
2. Activate the rhel7 host
3. Create a vm with one disk using thin provisioning
4. Put the rhel7 host into maintenance
5. Start the rhel6 host
6. Start the vm
7. Create a live snapshot

Additional verification:
Check that the image is using qcow2 compat=0.10 (on the rhel7 machine):
qemu-img info /path/to/volume

Comment 23 Federico Simoncelli 2014-09-23 12:31:58 UTC
Kevin, do you know if "create" is the only command affected by this?

For example what about "convert" and "rebase"? E.g.:

$ qemu-img convert -f qcow2 -O qcow2 input_0.10_image.qcow2 output.qcow2

Is "convert" going to use compat=0.10, like the source, or is it going to use 1.1 (when available)?

What about rebase?

Is there any guarantee that the compat version in input is always maintained in the output?

Thanks.

Comment 25 Kevin Wolf 2014-09-23 14:11:26 UTC
(In reply to Federico Simoncelli from comment #23)
> Kevin do you know if "create" is the only command affected by this?

The rule of thumb is that anything that creates a new image gets the same
defaults as qemu-img create. This includes convert (the target image is newly
created, except with -n), but not rebase (all images already exist and are only
modified). Consequently, convert does have an -o option and rebase doesn't.

It also potentially includes things like live snapshots, though I think you
don't make use of qemu's functionality to create a new image there, but rather
invoke qemu-img create manually.

Comment 26 Federico Simoncelli 2014-09-23 17:43:03 UTC
(In reply to Kevin Wolf from comment #25)
> (In reply to Federico Simoncelli from comment #23)
> > Kevin do you know if "create" is the only command affected by this?
> 
> The rule of thumb is that anything that creates a new image gets the same
> defaults as qemu-img create. This includes convert (the target image is newly
> created, except with -n), but not rebase (all images already exist and are
> only
> modified). Consequently, convert does have an -o option and rebase doesn't.

Nir, you may want to check this, because it could be that this bz is not fully addressed by the current patch ON_QA.

Even if, by any chance, it is fixed in practice (glitch in "convert")... I trust Kevin when he says that "anything that creates a new image gets the same defaults as qemu-img create".

Comment 27 Nir Soffer 2014-09-23 20:32:46 UTC
I checked qemu-img on Fedora 20. According to the online help, only create, convert and amend have an -o options parameter.

$ qemu-img convert -f qcow2 -o ?
Supported options:
size             Virtual disk size

$ qemu-img amend -f qcow2 -o ?
Supported options:
size             Virtual disk size
compat           Compatibility level (0.10 or 1.1)
backing_file     File name of a base image
backing_fmt      Image format of the base image
encryption       Encrypt the image
cluster_size     qcow2 cluster size
preallocation    Preallocation mode (allowed values: off, metadata)
lazy_refcounts   Postpone refcount updates

$ rpm -q qemu-img
qemu-img-2.1.1-1.fc20.x86_64

So I think that we should use the same logic when running these commands, and add a "-o compat=0.10" if compat is supported by the command. It seems that convert does not support this option currently, but if it does support this option, we will be covered.

Kevin, can you confirm that this is the right way to handle this?

Comment 28 Nir Soffer 2014-09-23 20:38:02 UTC
Let me correct myself: convert does support compat if -O qcow2 is specified:

$ qemu-img convert -f qcow2 -O qcow2 -o ?
Supported options:
size             Virtual disk size
compat           Compatibility level (0.10 or 1.1)
backing_file     File name of a base image
backing_fmt      Image format of the base image
encryption       Encrypt the image
cluster_size     qcow2 cluster size
preallocation    Preallocation mode (allowed values: off, metadata)
lazy_refcounts   Postpone refcount updates

Comment 29 Kevin Wolf 2014-09-24 09:09:36 UTC
(In reply to Nir Soffer from comment #27)
> So I think that we should use the same logic when running these commands,
> and add a "-o compat=0.10" if compat is supported by the command. It seems
> that convert does not support this option currently, but if it does support
> this option, we will be covered.
> 
> Kevin, can you confirm that this is the right way to handle this?

Yes for create and convert (you noticed -O in convert yourself).

The case is a bit different with amend, because that doesn't create a new
image, but rather updates an existing one. If you don't specify compat there,
it will leave the image format version as it is. If you do specify it, you can
upgrade/downgrade existing images.
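
A rough sketch of that rule applied to convert: probe the option list with -O qcow2, then add "-o compat=0.10" only when the option is listed. The function names and the QEMU_IMG override are hypothetical, and the probe assumes that "convert -f qcow2 -O qcow2 -o ?" works without file names, as in the invocation from comment 28. Amend is left out on purpose, since it only changes the version when compat is given explicitly.

```shell
# QEMU_IMG is an assumption of this sketch; it lets the binary be overridden.
QEMU_IMG=${QEMU_IMG:-qemu-img}

# Succeeds if "qemu-img convert" to qcow2 lists a "compat" option.
convert_supports_compat() {
    "$QEMU_IMG" convert -f qcow2 -O qcow2 -o "?" 2>/dev/null |
        awk '$1 == "compat" { found = 1 } END { exit !found }'
}

# Converts qcow2 image $1 into a new qcow2 image $2, forcing compat=0.10
# whenever the installed qemu-img knows the option.
convert_compat_010() {
    if convert_supports_compat; then
        "$QEMU_IMG" convert -f qcow2 -O qcow2 -o compat=0.10 "$1" "$2"
    else
        "$QEMU_IMG" convert -f qcow2 -O qcow2 "$1" "$2"
    fi
}
```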

Comment 30 Elad 2014-09-30 15:33:56 UTC
I managed to create a snapshot for a VM that was started on a RHEL6.5 host with 1 disk created while the SPM was a RHEL7 host. (according to comment #22)

Verified using vt4

Comment 32 Allon Mureinik 2014-11-29 07:17:14 UTC
*** Bug 1168958 has been marked as a duplicate of this bug. ***

Comment 33 David Jaša 2014-12-01 00:55:29 UTC
(In reply to Kevin Wolf from comment #8)
> ...
> You can also use 'qemu-img amend -f qcow2 -o compat=0.10 $src', which is a
> much
> quicker operation and doesn't involve copying data.
> 

Thanks! So the way to fix the images (on iSCSI SD) for me was to:
1) shut down all VMs, put hosts into maintenance
2) on the .el7 host, activate all LVs on the respective domain:
    lvchange -ay SD_UUID
3) look at the images' info:
    for LV in /rhev/data-center/<DC_UUID>/<SD_UUID>/images/*/* ; do qemu-img info "$LV" ; done
4) amend the images from the previous step that have qcow2 format and "compat: 1.1":
    for LV in LV1 ... LVi ; do qemu-img amend -f qcow2 -o compat=0.10 "$LV" ; done
5) verify the format:
    for LV in LV1 ... LVi ; do qemu-img info "$LV" ; done
6) deactivate the LVs again:
    lvchange -an /dev/f3897d4d-b938-45f1-b8ac-ff6a1dd8b509/[0-9a-f]*
7) activate the hosts again. Now the VMs can start on RHEL 6 hosts again

It would be nice to automate this procedure though.
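
One possible shape for automating steps 3 to 5, as a sketch only. It assumes that "qemu-img info" prints "file format:" and "compat:" lines as the rhel7 qemu-img does, and that the LVs are already active with all VMs down. The function name and the QEMU_IMG override are hypothetical.

```shell
# QEMU_IMG is an assumption of this sketch; it lets the binary be overridden.
QEMU_IMG=${QEMU_IMG:-qemu-img}

# Amends every given image that is qcow2 with compat 1.1 down to compat 0.10.
# Images in other formats, or already at 0.10, are left untouched.
downgrade_images() {
    for img in "$@"; do
        info=$("$QEMU_IMG" info "$img") || return 1
        fmt=$(printf '%s\n' "$info" | awk -F': ' '$1 == "file format" { print $2 }')
        compat=$(printf '%s\n' "$info" | awk '$1 == "compat:" { print $2 }')
        if [ "$fmt" = "qcow2" ] && [ "$compat" = "1.1" ]; then
            echo "amending $img"
            "$QEMU_IMG" amend -f qcow2 -o compat=0.10 "$img" || return 1
        fi
    done
}
```

It would be called with the image paths from step 3, for example "downgrade_images /rhev/data-center/<DC_UUID>/<SD_UUID>/images/*/*", and it stops at the first image that qemu-img cannot read or amend.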

Comment 34 Allon Mureinik 2014-12-01 08:36:21 UTC
(In reply to David Jaša from comment #33)
> (In reply to Kevin Wolf from comment #8)
> > ...
> > You can also use 'qemu-img amend -f qcow2 -o compat=0.10 $src', which is a
> > much
> > quicker operation and doesn't involve copying data.
> > 
> 
> Thanks! So the way to fix the images (on iSCSI SD) for me was to:
> 1) shutdown all VMs, put hosts to maintenance
> 2) on .el7 host, activate all LVs on the respective domain:
>     lvchange -ay SD_UUID
> 3) look at images info:
>     for LV in /rhev/data-center/<DC_UUID>/<SD_UUID>/images/*/* ; do qemu-img
> info $LV ; done
> 4) convert images from previous point with qcow2 format and "compat: 1.1"
>     for LV in LV1 ... LVi ; do qemu-img amend -f qcow2 -o compat=0.10 $LV ;
> done
> 5) verify format:
>     for LV in LV1 ... LVi ; do qemu-img info $LV ; done
> 6) deactivate LVs again:
>     lvchange -an /dev/f3897d4d-b938-45f1-b8ac-ff6a1dd8b509/[0-9a-f]*
> 7) activate the hosts again. Now the VMs can start on RHEL 6 hosts again
> 
> It would be nice to automate this procedure though.
Agreed.

Marina - Customers who used RHEL 7 with RHEV 3.4 prior to 3.4.4 may have encountered this issue. 
Perhaps GSS could provide a tool/kbase for this?

Comment 35 Marina Kalinin 2014-12-01 23:21:05 UTC
Allon, thank you.
Will do.

Comment 36 Marina Kalinin 2014-12-02 16:32:31 UTC
https://access.redhat.com/solutions/1284573

Comment 39 errata-xmlrpc 2015-02-11 21:12:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0159.html