Bug 1284580 - Cannot export VM with RAM snapshots
Summary: Cannot export VM with RAM snapshots
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ovirt-3.6.1
Target Release: 3.6.1
Assignee: Daniel Erez
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2015-11-23 15:50 UTC by Daniel Erez
Modified: 2016-02-10 17:29 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1282239
Environment:
Last Closed: 2016-01-13 14:39:11 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-3.6.z+
rule-engine: blocker+
ylavi: planning_ack+
tnisan: devel_ack+
acanan: testing_ack+




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 48768 0 master MERGED core: live snapshot - create VM metadata in RAW Never
oVirt gerrit 49021 0 ovirt-engine-3.6 MERGED core: live snapshot - create VM metadata in RAW Never

Description Daniel Erez 2015-11-23 15:50:57 UTC
+++ This bug was initially created as a clone of Bug #1282239 +++

Description of problem:
VM with snapshot that contains memory state cannot be exported.

Version-Release number of selected component (if applicable):
3.6 (7a891290ffac4bcc4a0e119481b2c1b7ac0254e0)

How reproducible:
100%

Steps to Reproduce:
1. Create a VM
2. Take snapshot with memory
3. Export the VM (without collapse snapshots)

Actual results:
Export fails

Expected results:
VM is exported to the export domain

Additional info:
This is a regression caused by the addition of Cinder support.
In CopyImageGroupCommand#canDoAction we fetch the disk to be exported from the DB in order to validate the disk storage type. The problem is that in 3.6 memory snapshots are not represented as disks in the DB, therefore the canDoAction method returns false (without reporting any reason).

--- Additional comment from Arik on 2015-11-15 17:16:18 EST ---

The posted patch eliminates the can-do-action failure for memory volumes, but the problem remains. This time the error appears to be on the host: for some reason the metadata volume is not created in qcow2 format:

2bba5363-25ea-4a23-afdd-7bb96e6e10b9::ERROR::2015-11-15 23:56:39,282::image::490::Storage.Image::(_interImagesCopy) Copy image error: image=45f5db63-bf54-4631-bf8a-c5f1fb099796, src domain=1118b4b4-828a-41b6-95fc-c79c3e4d27dd, dst domain=6f94de72-f824-4512-b7d9-8e5abe2d88b6
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 481, in _interImagesCopy
    self._wait_for_qemuimg_operation(operation)
  File "/usr/share/vdsm/storage/image.py", line 138, in _wait_for_qemuimg_operation
    operation.wait(self._QEMU_LOGGING_INTERVAL)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 283, in wait
    raise QImgError(self._command.returncode, "", self.error)
QImgError: ecode=1, stdout=, stderr=qemu-img: Could not open '/rhev/data-center/00000001-0001-0001-0001-00000000011a/1118b4b4-828a-41b6-95fc-c79c3e4d27dd/images/45f5db63-bf54-4631-bf8a-c5f1fb099796/9aa782de-2f18-4a0a-865f-e573f0605453': Image is not in qcow2 format
, message=None

--- Additional comment from Arik on 2015-11-15 17:20 EST ---



--- Additional comment from Daniel Erez on 2015-11-16 08:25:03 EST ---

Hi Arik,

Which versions of vdsm/qemu are you using? I tried to reproduce the issue and got a different error:
"
222fc76c-de1b-4220-86b8-4fe629768fa1::ERROR::2015-11-16 14:59:44,479::task::866::Storage.TaskManager.Task::(_setError) Task=`222fc76c-de1b-4220-86b8-4fe629768fa1`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 332, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1557, in moveImage
    vmUUID, op, postZero, force)
  File "/usr/share/vdsm/storage/image.py", line 507, in move
    self._interImagesCopy(destDom, srcSdUUID, imgUUID, chains)
  File "/usr/share/vdsm/storage/image.py", line 463, in _interImagesCopy
    raise se.CopyImageError()
CopyImageError: low level Image copy failed: ()
222fc76c-de1b-4220-86b8-4fe629768fa1::DEBUG::2015-11-16 14:59:44,479::task::885::Storage.TaskManager.Task::(_run) Task=`222fc76c-de1b-4220-86b8-4fe629768fa1`::Task._run: 222fc76c-de1b-4220-86b8-4fe629768fa1 () {} failed - stopping task
"

Version:
vdsm-4.17.10.1-0.el7ev.noarch
qemu-img-rhev-2.3.0-31.el7_2.1.x86_64

@Nir - what do you think? An issue in qemu?

--- Additional comment from Arik on 2015-11-16 08:33:04 EST ---

(In reply to Daniel Erez from comment #3)
It may be the same problem, since this error appears in my log as well. Check a few lines above it to see whether the errors I quoted appear in your log too.

The versions I'm using:
vdsm-4.17.999-50.git67f4b2b.f22
qemu-img-2.4.0-2.fc22

--- Additional comment from Nir Soffer on 2015-11-16 12:53:21 EST ---

(In reply to Daniel Erez from comment #3)
> Hi Arik,
> 
> Which versions of vdsm/qemu are you using? I've tried to reproduce the issue
> and getting a different error:
...
> CopyImageError: low level Image copy failed: ()
...

> @Nir - what do you think? An issue in qemu?

There is not enough info in this error to tell anything. Check
the error of the qemu-img command.

--- Additional comment from Daniel Erez on 2015-11-17 05:55:55 EST ---

Hi Kevin,

It seems we're getting an 'Image is not in qcow2 format' error [1] after invoking 'qemu-img convert' [2]. The file format is indeed raw ('qemu-img info' shows that). However, the same operation worked fine on an earlier version (qemu-img-rhev-0.12.1.2-2.448.el6_6.x86_64), which probably just silently ignored the mismatch, while it fails on a newer version (qemu-img-rhev-2.3.0-31.el7_2.1.x86_64). Was there any change between those versions that might explain it?

Thanks!

[1]
QImgError: ecode=1, stdout=[], stderr=["qemu-img: Could not open '/rhev/data-center/00000001-0001-0001-0001-00000000001d/b4889450-5b53-481a-ae9a-f63decac46de/images/258ff498-e006-4982-b981-2fa2595d6604/8ce3a35b-3b22-4deb-ab57-3bb56b49c6fa': Image is not in qcow2 format"], message=None

[2]
/usr/bin/qemu-img convert -t none -T none -f qcow2 /rhev/data-center/00000001-0001-0001-0001-00000000001d/b4889450-5b53-481a-ae9a-f63decac46de/images/258ff498-e006-4982-b981-2fa2595d6604/8ce3a35b-3b22-4deb-ab57-3bb56b49c6fa -O qcow2 -o compat=0.10 /rhev/data-center/mnt/derez1.usersys:_home_data_export1/12133f75-b1ab-4f00-b7ef-87c2c0f235be/images/258ff498-e006-4982-b981-2fa2595d6604/8ce3a35b-3b22-4deb-ab57-3bb56b49c6fa

--- Additional comment from Kevin Wolf on 2015-11-17 09:57:17 EST ---

Let me see if I understand this correctly:

1. qemu-img info shows format qcow2 for the source image
2. qemu-img convert is used to copy from source to destination, both qcow2
3. qemu-img info shows format raw for the source image

Note that qemu-img convert opens the source file read-only, so if you see what
I described above, can you confirm that no other action was performed between
1. and 3.? This looks rather unlikely.

Is the destination image detected as raw as well or is it correctly qcow2?

Can you post a hexdump of the first 512 bytes? (hexdump -C -n 512 <path>)

--- Additional comment from Arik on 2015-11-17 11:28:34 EST ---

(In reply to Kevin Wolf from comment #7)
Our question is different.

We try to convert an image using qemu-img convert. We specify that the image is qcow2 although qemu-img info shows that it is actually raw.

qemu-img-rhev-0.12.1.2-2.448.el6_6.x86_64 is willing to convert the image.
qemu-img-rhev-2.3.0-31.el7_2.1.x86_64 fails with (a correct) error of "Image is not in qcow2 format".

We wonder why the previous version succeeded while the newer one fails. Did you add a previously missing validation to qemu-img to check that?

--- Additional comment from Daniel Erez on 2015-11-17 11:37:46 EST ---

(In reply to Arik from comment #8)
> (In reply to Kevin Wolf from comment #7)
> Our question is different.
> 
> We try to convert an image using qemu-img convert. We specify that the image
> is qcow2 although qemu-img info shows that it is actually raw.
> 
> qemu-img-rhev-0.12.1.2-2.448.el6_6.x86_64 is willing to convert the image.
> qemu-img-rhev-2.3.0-31.el7_2.1.x86_64 fails with (a correct) error of "Image
> is not in qcow2 format".
> 
> We wonder how comes that the previous version succeeded and the newer fails.
> Did you add a missing validation in qemu-img to check that?

It turns out that in older versions (3.5) we simply used 'dd' to copy qcow2 images, hence there was no error (the image had mistakenly been identified as cow). In new versions, we use 'qemu-img convert' for both raw and cow. So, is it fine to simply remove the source format parameter ('-f qcow2'), i.e. so that qemu-img identifies the source format automatically?

--- Additional comment from Daniel Erez on 2015-11-17 17:34:52 EST ---

Hi Kevin,

I tried to run qemu-img convert on the VM metadata file, which is a 10 KB raw image, and the process just hangs.

I.e.
Input:
/usr/bin/nice -n 19 /usr/bin/ionice -c 3 /usr/bin/qemu-img convert -p -t none -T none
/rhev/data-center/47cb1d79-872b-4c78-bd62-8179069b85c2/7956f54a-1a71-4a5e-8229-0edef6b175e0/images/f00f914c-3d78-48b6-bb91-6735cfcf5eb1/166a5d45-b022-47d5-a811-02f2cb3dffcf
-O raw /rhev/data-center/mnt/derez1.usersys:_home_data_export2/82b34343-5b4c-47a9-b2ca-3d54f9cca2c3/images/f00f914c-3d78-48b6-bb91-6735cfcf5eb1/166a5d45-b022-47d5-a811-02f2cb3dffcf

Output:
    (0.00/100%)

* strace on the process is an infinite loop of:
lseek(7, 5120, SEEK_DATA)               = 5120
lseek(7, 5120, SEEK_HOLE)               = 5474
lseek(7, 5120, SEEK_DATA)               = 5120
lseek(7, 5120, SEEK_HOLE)               = 5474
lseek(7, 5120, SEEK_DATA)               = 5120
lseek(7, 5120, SEEK_HOLE)               = 5474
lseek(7, 5120, SEEK_DATA)               = 5120
lseek(7, 5120, SEEK_HOLE)               = 5474
lseek(7, 5120, SEEK_DATA)               = 5120
lseek(7, 5120, SEEK_HOLE)               = 5474
lseek(7, 5120, SEEK_DATA)               = 5120
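
For reference, the SEEK_DATA/SEEK_HOLE pair seen in the trace is the usual way to walk the allocated extents of a file; in the trace the same offset (5120) is queried again and again, so the walk never advances. A minimal Python sketch of a walk that does advance past each extent is shown here. The helper name data_extents is hypothetical (it is not qemu code), and the sketch assumes a platform where os.SEEK_DATA and os.SEEK_HOLE are available, such as Linux.

```python
import errno
import os


def data_extents(path):
    """Return a list of (start, end) byte ranges that contain data.

    Hypothetical helper for illustration only; it is not how qemu-img
    is implemented.
    """
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Find the next byte at or after `offset` that holds data.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:
                    break  # only a hole remains until end of file
                raise
            # Find the end of that data run (end of file counts as a hole).
            end = os.lseek(fd, start, os.SEEK_HOLE)
            if end <= start:
                break  # defensive: never loop without making progress
            extents.append((start, end))
            offset = end  # continue after this extent
    finally:
        os.close(fd)
    return extents
```

The important detail is the last line of the loop body: the next query starts at the end of the extent just found, not at the offset that was already examined.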

BTW, I guessed it was related to [1], but a 101 KB file leads to the same result.

What do you think?

[1] https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1410288

--- Additional comment from Daniel Erez on 2015-11-17 17:53 EST ---



--- Additional comment from Daniel Erez on 2015-11-17 17:54 EST ---



--- Additional comment from Daniel Erez on 2015-11-17 17:57:29 EST ---

After some further testing: convert does work, but only when executed directly from the source location, i.e.:

/usr/bin/nice -n 19 /usr/bin/ionice -c 3 /usr/bin/qemu-img convert -p -t none -T none
166a5d45-b022-47d5-a811-02f2cb3dffcf
-O raw /rhev/data-center/mnt/derez1.usersys:_home_data_export2/82b34343-5b4c-47a9-b2ca-3d54f9cca2c3/images/f00f914c-3d78-48b6-bb91-6735cfcf5eb1/166a5d45-b022-47d5-a811-02f2cb3dffcf
    (100.00/100%)

* strace logs of both scenarios are attached.

--- Additional comment from Kevin Wolf on 2015-11-18 04:57:25 EST ---

(In reply to Daniel Erez from comment #9)
> Turns out that on older versions (3.5), we simply used 'dd' to copy qcow2
> images, hence, there was no error (since the image was mistakenly been
> identified as cow...). Now, on new versions, we use 'qemu-img convert' both
> for raw and cow. So, is it fine to simply remove the source format parameter
> ('-f qcow2')? I.e. so qemu-img could identify the source format
> automatically.

No, you always need to specify -f. For the occasional manual use case where you
know the image, omitting it and relying on probing is fine, but you must never
do that in management software. A raw image could start with a qcow2 header
(after all, the guest can write anything it wants to it) and you still want it
to be treated as raw.
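
The point about probing can be illustrated with a small Python sketch. It assumes only that qcow2 images begin with the magic bytes "QFI\xfb"; the helper names looks_like_qcow2 and convert_command are hypothetical and do not come from vdsm or qemu. The command builder uses only the qemu-img options already quoted in this report (-t, -T, -f, -O) and does not run anything.

```python
QCOW2_MAGIC = b"QFI\xfb"  # first four bytes of a qcow2 image


def looks_like_qcow2(header):
    """Naive probe: trust whatever the first bytes of the file say."""
    return header.startswith(QCOW2_MAGIC)


def convert_command(src, dst, src_format, dst_format):
    """Build a qemu-img convert argv that always states the source format.

    Hypothetical helper: the caller must know the format from its own
    records instead of asking the image.
    """
    if src_format not in ("raw", "qcow2"):
        raise ValueError("unexpected source format: %r" % (src_format,))
    return ["qemu-img", "convert", "-t", "none", "-T", "none",
            "-f", src_format, "-O", dst_format, src, dst]


# First sector of a raw disk after a guest wrote a qcow2 header into it.
guest_written_sector = QCOW2_MAGIC + b"\x00" * 508

# Probing would misidentify this raw disk as qcow2 ...
probed_as_qcow2 = looks_like_qcow2(guest_written_sector)

# ... whereas an explicit -f raw keeps it treated as raw.
cmd = convert_command("src.img", "dst.img", "raw", "raw")
```

In management software the source format would come from the stored volume metadata, which the guest cannot modify.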

(In reply to Daniel Erez from comment #10)
> BTW, I guessed it's related to [1], but a 101kb lead to the same result...
> 
> What do you think?
> 
> [1] https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1410288

The Ubuntu bug says that it's fixed in qemu 2.2, so it's probably different.
I tried to reproduce and indeed this hangs:

$ dd if=/dev/zero of=/tmp/test.raw bs=5474 count=1
$ ./qemu-img convert -p -t none -T none -O raw /tmp/test.raw /tmp/dest.raw

The reason seems to be that the source file size isn't aligned to a sector
boundary like a valid raw image would be. In upstream it does work (with the
destination image size rounded up to the next full sector), but the lesson to
learn is that you should only use qemu-img convert with disk images, never with
random other files.
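
As a quick check of the alignment argument, here is a minimal Python sketch assuming the usual 512-byte sector size. The function names are hypothetical, chosen for illustration.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes


def is_sector_aligned(size):
    """True if a file of `size` bytes ends on a sector boundary."""
    return size % SECTOR_SIZE == 0


def round_up_to_sector(size):
    """Smallest multiple of SECTOR_SIZE that is >= size."""
    return -(-size // SECTOR_SIZE) * SECTOR_SIZE


metadata_size = 5474  # size of the file in the reproducer above
aligned = is_sector_aligned(metadata_size)
rounded = round_up_to_sector(metadata_size)
```

For the 5474-byte reproducer, the file is not aligned, and rounding up to the next full sector gives 5632 bytes (11 sectors).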

We will automatically get the upstream fix in qemu-kvm-rhev 7.3 as we rebase,
but if you need it in 7.2.z, please clone the Fedora bug (bug 1229394). I'll
already clone the bug for plain RHEL qemu-kvm because the bug exists there as
well and I can't rely on a rebase there.

--- Additional comment from Nir Soffer on 2015-11-18 05:17:59 EST ---

(In reply to Kevin Wolf from comment #14)
> $ dd if=/dev/zero of=/tmp/test.raw bs=5474 count=1
> $ ./qemu-img convert -p -t none -T none -O raw /tmp/test.raw /tmp/dest.raw
> 
> The reason seems to be that the source file size isn't aligned to a sector
> boundary like a valid raw image would be. In upstream it does work (with the
> destination image size rounded up to the next full sector), but the lesson to
> learn is that you should only use qemu-img convert with disk images, never
> with
> random other files.

This image was created using qemu-img create:

  qemu-img create -f qcow2 foo 10240

Then its contents were replaced by writing vdsm metadata:

  with open('foo', 'w') as f:
      f.write(data)

This call truncates the file and writes new data. Does it change the
alignment of the original image?

Should we use direct I/O instead when writing raw image data?

For example (using pseudo code):

    data += padding # make it multiple of 512 bytes

    cat data | dd of=foo oflag=direct
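
The padding step from the pseudo code could be written in Python roughly as follows. This is only a sketch: pad_to_sector is a hypothetical name, the 512-byte sector size is taken from the comment in the pseudo code, and zero bytes are an assumed choice of padding.

```python
def pad_to_sector(data, sector_size=512):
    """Append zero bytes so that len(data) is a multiple of sector_size."""
    remainder = len(data) % sector_size
    if remainder == 0:
        return data  # already aligned, nothing to add
    return data + b"\0" * (sector_size - remainder)


padded = pad_to_sector(b"x" * 5474)
```

Whether trailing zero bytes are acceptable depends on how the metadata is parsed when it is read back.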

I know that creating a qcow2 image when we want a raw image is lame;
we are fixing this.

--- Additional comment from Kevin Wolf on 2015-11-18 05:26:35 EST ---

Do you actually pass this as a disk to a guest? If not, you shouldn't be using
qemu-img at all, because it's a tool for disk images, not for random files.

But if you must, just make sure that you don't change the file size, i.e. open
the file in a mode that doesn't truncate ("r+" for Python's open(), I guess;
conv=notrunc for dd) and make sure that the written data isn't larger than the
image file already is.
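
A minimal Python sketch of that advice follows: open the existing file without truncating it and refuse data that would grow it. The helper name write_in_place is hypothetical; this is not the vdsm implementation.

```python
import os


def write_in_place(path, data):
    """Overwrite the start of an existing file without changing its size."""
    size = os.path.getsize(path)
    if len(data) > size:
        raise ValueError(
            "data (%d bytes) is larger than the file (%d bytes)"
            % (len(data), size))
    # "r+b" opens for reading and writing without truncating the file.
    with open(path, "r+b") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
```

Bytes beyond the written data keep their previous contents, so the reader needs a way to tell where the metadata ends.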

--- Additional comment from Nir Soffer on 2015-11-18 05:52:43 EST ---

(In reply to Kevin Wolf from comment #16)
> Do you actually pass this as a disk to a guest? 

No, this is a metadata file that the guest will never see.

--- Additional comment from Daniel Erez on 2015-11-18 10:23:02 EST ---

Thanks Kevin! I've cloned the bug to qemu-kvm-rhev: https://bugzilla.redhat.com/show_bug.cgi?id=1283278

Comment 1 Daniel Erez 2015-11-23 15:53:39 UTC
This bug is for addressing the issue of VM metadata creation by engine. The file should be created as RAW/Preallocated.

Comment 2 Red Hat Bugzilla Rules Engine 2015-11-23 16:30:48 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 3 Red Hat Bugzilla Rules Engine 2015-11-23 16:30:48 UTC
This bug is not marked for z-stream, yet the milestone is for a z-stream version, therefore the milestone has been reset.
Please set the correct milestone or add the z-stream flag.

Comment 4 Red Hat Bugzilla Rules Engine 2015-11-23 18:21:44 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 5 Aharon Canan 2016-01-03 15:17:38 UTC
Verified both export and import VM with RAM snapshot - both work.


vdsm-4.17.15-0.el7ev.noarch
rhevm-3.6.2-0.1.el6.noarch
qemu-img-rhev-2.3.0-31.el7_2.5.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.5.x86_64

Comment 6 Sandro Bonazzola 2016-01-13 14:39:11 UTC
oVirt 3.6.1 has been released, closing current release

