Bug 1282239 - Cannot export VM with RAM snapshots
Summary: Cannot export VM with RAM snapshots
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.6.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-3.6.1
Target Release: 3.6.0
Assignee: Daniel Erez
QA Contact: Aharon Canan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-11-15 17:49 UTC by Arik
Modified: 2016-03-10 12:02 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1284580
Environment:
Last Closed:
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
vdsm log (16.04 MB, text/plain)
2015-11-15 22:20 UTC, Arik
convert stuck strace (708.45 KB, text/plain)
2015-11-17 22:53 UTC, Daniel Erez
convert works strace (115.24 KB, text/plain)
2015-11-17 22:54 UTC, Daniel Erez


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 48580 0 master MERGED core: eliminate CDA on export RAM snapshot 2020-03-03 10:44:28 UTC
oVirt gerrit 48666 0 ovirt-engine-3.6 MERGED core: eliminate CDA on export RAM snapshot 2020-03-03 10:44:28 UTC
oVirt gerrit 48768 0 master MERGED core: live snapshot - create VM metadata in RAW 2020-03-03 10:44:27 UTC
oVirt gerrit 48776 0 master MERGED vm: snapshot - use r+ to open vm conf file 2020-03-03 10:44:27 UTC
oVirt gerrit 48907 0 master ABANDONED image: fall-back to dd - workaround for bz#1282239 2020-03-03 10:44:27 UTC
oVirt gerrit 48912 0 ovirt-3.6 MERGED vm: snapshot - use r+ to open vm conf file 2020-03-03 10:44:27 UTC
oVirt gerrit 49002 0 master MERGED image: copy - set VM metadata images format to RAW 2020-03-03 10:44:27 UTC
oVirt gerrit 49144 0 ovirt-3.6 MERGED image: copy - set VM metadata images format to RAW 2020-03-03 10:44:27 UTC

Description Arik 2015-11-15 17:49:20 UTC
Description of problem:
VM with snapshot that contains memory state cannot be exported.

Version-Release number of selected component (if applicable):
3.6 (7a891290ffac4bcc4a0e119481b2c1b7ac0254e0)

How reproducible:
100%

Steps to Reproduce:
1. Create a VM
2. Take snapshot with memory
3. Export the VM (without collapse snapshots)

Actual results:
Export fails

Expected results:
VM is exported to the export domain

Additional info:
This is a regression caused by the addition of Cinder support.
In CopyImageGroupCommand#canDoAction we fetch the disk to be exported from the DB in order to validate the disk storage type. The problem is that in 3.6 memory snapshots are not represented as disks in the DB, therefore the canDoAction method returns false (without giving any reason).
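
A minimal sketch of the failure mode described above, written in Python rather than the engine's Java. The table, the function name and the `is_memory_volume` flag are all hypothetical; they only illustrate why a validation that assumes every exported image is a disk rejects memory volumes. The actual engine code and the actual fix may look different.

```python
# Hypothetical stand-in for the engine's disk table:
# image group id -> disk storage type.
DISKS_BY_IMAGE_GROUP = {
    "disk-1": "IMAGE",
    "disk-2": "CINDER",
}


def can_copy_image_group(image_group_id, is_memory_volume=False):
    """Return (ok, reason) for a copy request. Illustration only."""
    if is_memory_volume:
        # Memory volumes have no disk row, so the disk type check is skipped.
        return True, None
    storage_type = DISKS_BY_IMAGE_GROUP.get(image_group_id)
    if storage_type is None:
        # This is the branch a memory volume falls into when it is
        # treated like a disk.
        return False, "disk not found"
    if storage_type != "IMAGE":
        return False, "unsupported disk storage type: %s" % storage_type
    return True, None
```

With this shape, a memory volume fails validation unless the caller states that it is not a disk, and returning a reason string makes the failure visible instead of silent.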

Comment 1 Arik 2015-11-15 22:16:18 UTC
The posted patch eliminates the can-do-action failure for memory volumes, but the problem remains. This time the error seems to be on the host side: for some reason the metadata volume is not created in qcow2 format:

2bba5363-25ea-4a23-afdd-7bb96e6e10b9::ERROR::2015-11-15 23:56:39,282::image::490::Storage.Image::(_interImagesCopy) Copy image error: image=45f5db63-bf54-4631-bf8a-c5f1fb099796, src domain=1118b4b4-828a-41b6-95fc-c79c3e4d27dd, dst domain=6f94de72-f824-4512-b7d9-8e5abe2d88b6
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 481, in _interImagesCopy
    self._wait_for_qemuimg_operation(operation)
  File "/usr/share/vdsm/storage/image.py", line 138, in _wait_for_qemuimg_operation
    operation.wait(self._QEMU_LOGGING_INTERVAL)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 283, in wait
    raise QImgError(self._command.returncode, "", self.error)
QImgError: ecode=1, stdout=, stderr=qemu-img: Could not open '/rhev/data-center/00000001-0001-0001-0001-00000000011a/1118b4b4-828a-41b6-95fc-c79c3e4d27dd/images/45f5db63-bf54-4631-bf8a-c5f1fb099796/9aa782de-2f18-4a0a-865f-e573f0605453': Image is not in qcow2 format
, message=None

Comment 2 Arik 2015-11-15 22:20:41 UTC
Created attachment 1094656 [details]
vdsm log

Comment 3 Daniel Erez 2015-11-16 13:25:03 UTC
Hi Arik,

Which versions of vdsm/qemu are you using? I've tried to reproduce the issue and I'm getting a different error:
"
222fc76c-de1b-4220-86b8-4fe629768fa1::ERROR::2015-11-16 14:59:44,479::task::866::Storage.TaskManager.Task::(_setError) Task=`222fc76c-de1b-4220-86b8-4fe629768fa1`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 332, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1557, in moveImage
    vmUUID, op, postZero, force)
  File "/usr/share/vdsm/storage/image.py", line 507, in move
    self._interImagesCopy(destDom, srcSdUUID, imgUUID, chains)
  File "/usr/share/vdsm/storage/image.py", line 463, in _interImagesCopy
    raise se.CopyImageError()
CopyImageError: low level Image copy failed: ()
222fc76c-de1b-4220-86b8-4fe629768fa1::DEBUG::2015-11-16 14:59:44,479::task::885::Storage.TaskManager.Task::(_run) Task=`222fc76c-de1b-4220-86b8-4fe629768fa1`::Task._run: 222fc76c-de1b-4220-86b8-4fe629768fa1 () {} failed - stopping task
"

Version:
vdsm-4.17.10.1-0.el7ev.noarch
qemu-img-rhev-2.3.0-31.el7_2.1.x86_64

@Nir - what do you think? An issue in qemu?

Comment 4 Arik 2015-11-16 13:33:04 UTC
(In reply to Daniel Erez from comment #3)
It may be the same problem, since this error appears in my log as well. Check a few lines above it to see whether the errors I quoted appear too.

The versions I'm using:
vdsm-4.17.999-50.git67f4b2b.f22
qemu-img-2.4.0-2.fc22

Comment 5 Nir Soffer 2015-11-16 17:53:21 UTC
(In reply to Daniel Erez from comment #3)
> Hi Arik,
> 
> Which versions of vdsm/qemu are you using? I've tried to reproduce the issue
> and getting a different error:
...
> CopyImageError: low level Image copy failed: ()
...

> @Nir - what do you think? An issue in qemu?

There is not enough info in this error to tell anything. Check
the error of the qemu-img command.

Comment 6 Daniel Erez 2015-11-17 10:55:55 UTC
Hi Kevin,

It seems we're getting an 'Image is not in qcow2 format' error [1] after invoking 'qemu-img convert' [2]. The file format is indeed raw ('qemu-img info' shows that). The thing is that it worked fine on an earlier version (qemu-img-rhev-0.12.1.2-2.448.el6_6.x86_64); was the mismatch probably just silently ignored? It fails on a newer version (qemu-img-rhev-2.3.0-31.el7_2.1.x86_64). Was there any change between those versions that might lead to this?

Thanks!

[1]
QImgError: ecode=1, stdout=[], stderr=["qemu-img: Could not open '/rhev/data-center/00000001-0001-0001-0001-00000000001d/b4889450-5b53-481a-ae9a-f63decac46de/images/258ff498-e006-4982-b981-2fa2595d6604/8ce3a35b-3b22-4deb-ab57-3bb56b49c6fa': Image is not in qcow2 format"], message=None

[2]
/usr/bin/qemu-img convert -t none -T none -f qcow2 /rhev/data-center/00000001-0001-0001-0001-00000000001d/b4889450-5b53-481a-ae9a-f63decac46de/images/258ff498-e006-4982-b981-2fa2595d6604/8ce3a35b-3b22-4deb-ab57-3bb56b49c6fa -O qcow2 -o compat=0.10 /rhev/data-center/mnt/derez1.usersys:_home_data_export1/12133f75-b1ab-4f00-b7ef-87c2c0f235be/images/258ff498-e006-4982-b981-2fa2595d6604/8ce3a35b-3b22-4deb-ab57-3bb56b49c6fa
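
For illustration, the invocation in [2] could be assembled as an argument list like this. The helper name and its parameters are hypothetical; only the qemu-img options are taken from the command above. The point of the sketch is that the source format is an input supplied by the caller, so it has to match what the volume really contains.

```python
def build_convert_cmd(src, dst, src_format, dst_format, compat=None):
    """Build a 'qemu-img convert' argument list (hypothetical helper)."""
    cmd = [
        "/usr/bin/qemu-img", "convert",
        "-t", "none",        # cache mode for the destination
        "-T", "none",        # cache mode for the source
        "-f", src_format,    # declared source format
        src,
        "-O", dst_format,    # requested destination format
    ]
    if compat is not None:
        cmd += ["-o", "compat=" + compat]
    cmd.append(dst)
    return cmd
```

Passing `src_format="qcow2"` for a volume that actually holds raw data reproduces the situation in [1].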

Comment 7 Kevin Wolf 2015-11-17 14:57:17 UTC
Let me see if I understand this correctly:

1. qemu-img info shows format qcow2 for the source image
2. qemu-img convert is used to copy from source to destination, both qcow2
3. qemu-img info shows format raw for the source image

Note that qemu-img convert opens the source file read-only, so if you see what
I described above, can you confirm that no other action was performed between
1. and 3.? This looks rather unlikely.

Is the destination image detected as raw as well or is it correctly qcow2?

Can you post a hexdump of the first 512 bytes? (hexdump -C -n 512 <path>)

Comment 8 Arik 2015-11-17 16:28:34 UTC
(In reply to Kevin Wolf from comment #7)
Our question is different.

We try to convert an image using qemu-img convert. We specify that the image is qcow2 although qemu-img info shows that it is actually raw.

qemu-img-rhev-0.12.1.2-2.448.el6_6.x86_64 is willing to convert the image.
qemu-img-rhev-2.3.0-31.el7_2.1.x86_64 fails with (a correct) error of "Image is not in qcow2 format".

We wonder how come the previous version succeeded and the newer one fails. Did you add a previously missing validation to qemu-img that checks this?

Comment 9 Daniel Erez 2015-11-17 16:37:46 UTC
(In reply to Arik from comment #8)
> (In reply to Kevin Wolf from comment #7)
> Our question is different.
> 
> We try to convert an image using qemu-img convert. We specify that the image
> is qcow2 although qemu-img info shows that it is actually raw.
> 
> qemu-img-rhev-0.12.1.2-2.448.el6_6.x86_64 is willing to convert the image.
> qemu-img-rhev-2.3.0-31.el7_2.1.x86_64 fails with (a correct) error of "Image
> is not in qcow2 format".
> 
> We wonder how comes that the previous version succeeded and the newer fails.
> Did you add a missing validation in qemu-img to check that?

It turns out that on older versions (3.5) we simply used 'dd' to copy qcow2 images, hence there was no error (since the image had mistakenly been identified as cow...). Now, on new versions, we use 'qemu-img convert' for both raw and cow. So, is it fine to simply remove the source format parameter ('-f qcow2'), i.e. so that qemu-img identifies the source format automatically?

Comment 10 Daniel Erez 2015-11-17 22:34:52 UTC
Hi Kevin,

I've tried to run qemu-img convert on the VM metadata file, which is a 10kb raw image, and the process just hangs.

I.e.
Input:
/usr/bin/nice -n 19 /usr/bin/ionice -c 3 /usr/bin/qemu-img convert -p -t none -T none
/rhev/data-center/47cb1d79-872b-4c78-bd62-8179069b85c2/7956f54a-1a71-4a5e-8229-0edef6b175e0/images/f00f914c-3d78-48b6-bb91-6735cfcf5eb1/166a5d45-b022-47d5-a811-02f2cb3dffcf
-O raw /rhev/data-center/mnt/derez1.usersys:_home_data_export2/82b34343-5b4c-47a9-b2ca-3d54f9cca2c3/images/f00f914c-3d78-48b6-bb91-6735cfcf5eb1/166a5d45-b022-47d5-a811-02f2cb3dffcf

Output:
    (0.00/100%)

* strace on the process is an infinite loop of:
lseek(7, 5120, SEEK_DATA)               = 5120
lseek(7, 5120, SEEK_HOLE)               = 5474
lseek(7, 5120, SEEK_DATA)               = 5120
lseek(7, 5120, SEEK_HOLE)               = 5474
lseek(7, 5120, SEEK_DATA)               = 5120
lseek(7, 5120, SEEK_HOLE)               = 5474
lseek(7, 5120, SEEK_DATA)               = 5120
lseek(7, 5120, SEEK_HOLE)               = 5474
lseek(7, 5120, SEEK_DATA)               = 5120
lseek(7, 5120, SEEK_HOLE)               = 5474
lseek(7, 5120, SEEK_DATA)               = 5120
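
The two calls in the trace are the standard way to walk the data regions of a file. A minimal Python 3 sketch of such a walk, assuming a platform and filesystem that support SEEK_DATA/SEEK_HOLE (the function name is made up for this example, and this is not qemu's code):

```python
import errno
import os


def data_extents(path):
    """Return [(start, end), ...] for the data regions the filesystem reports."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            try:
                # Next offset >= `offset` that contains data.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:
                    break  # no data at or after `offset`
                raise
            # End of that data region (end of file counts as a hole).
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            offset = end  # always continue after the extent just found
    finally:
        os.close(fd)
    return extents
```

A walk like this terminates because each iteration continues from the end of the extent it just found; the trace above instead keeps querying offset 5120.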

BTW, I guessed it's related to [1], but a 101 KB file leads to the same result...

What do you think?

[1] https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1410288

Comment 11 Daniel Erez 2015-11-17 22:53:56 UTC
Created attachment 1095737 [details]
convert stuck strace

Comment 12 Daniel Erez 2015-11-17 22:54:31 UTC
Created attachment 1095738 [details]
convert works strace

Comment 13 Daniel Erez 2015-11-17 22:57:29 UTC
After some further testing, it turns out that convert works only when it is executed directly from the source location. I.e.

/usr/bin/nice -n 19 /usr/bin/ionice -c 3 /usr/bin/qemu-img convert -p -t none -T none
166a5d45-b022-47d5-a811-02f2cb3dffcf
-O raw /rhev/data-center/mnt/derez1.usersys:_home_data_export2/82b34343-5b4c-47a9-b2ca-3d54f9cca2c3/images/f00f914c-3d78-48b6-bb91-6735cfcf5eb1/166a5d45-b022-47d5-a811-02f2cb3dffcf
    (100.00/100%)

* strace logs of both scenarios are attached.

Comment 14 Kevin Wolf 2015-11-18 09:57:25 UTC
(In reply to Daniel Erez from comment #9)
> Turns out that on older versions (3.5), we simply used 'dd' to copy qcow2
> images, hence, there was no error (since the image was mistakenly been
> identified as cow...). Now, on new versions, we use 'qemu-img convert' both
> for raw and cow. So, is it fine to simply remove the source format parameter
> ('-f qcow2')? I.e. so qemu-img could identify the source format
> automatically.

No, you always need to specify -f. For the occasional manual use case where you
know the image, omitting it and relying on probing is fine, but you must never
do that in management software. A raw image could start with a qcow2 header
(after all, the guest can write anything it wants to it) and you still want it
to be treated as raw.
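
A small sketch of why probing is unsafe for management software. The probe below is a deliberately simplified stand-in that looks only at the qcow2 magic bytes; it is not qemu's actual probing logic, and the function name is made up for this example.

```python
QCOW2_MAGIC = b"QFI\xfb"  # first four bytes of a qcow2 header


def naive_probe(path):
    """Guess the image format from the first bytes of the file.

    Illustration only: a raw image whose guest wrote these bytes to the
    start of its disk is reported as qcow2.
    """
    with open(path, "rb") as f:
        header = f.read(len(QCOW2_MAGIC))
    return "qcow2" if header == QCOW2_MAGIC else "raw"
```

A raw volume that happens to begin with `QFI\xfb` is classified as qcow2 by such a probe, which is why the format should come from the management layer's own records and be passed explicitly with -f.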

(In reply to Daniel Erez from comment #10)
> BTW, I guessed it's related to [1], but a 101kb lead to the same result...
> 
> What do you think?
> 
> [1] https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1410288

The Ubuntu bug says that it's fixed in qemu 2.2, so it's probably different.
I tried to reproduce and indeed this hangs:

$ dd if=/dev/zero of=/tmp/test.raw bs=5474 count=1
$ ./qemu-img convert -p -t none -T none -O raw /tmp/test.raw /tmp/dest.raw

The reason seems to be that the source file size isn't aligned to a sector
boundary like a valid raw image would be. In upstream it does work (with the
destination image size rounded up to the next full sector), but the lesson to
learn is that you should only use qemu-img convert with disk images, never with
random other files.
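
The alignment condition mentioned above can be checked with a few lines of arithmetic. This is a sketch that assumes 512-byte sectors; the helper names are made up for this example.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes


def is_sector_aligned(size, sector=SECTOR_SIZE):
    """True if `size` is a whole number of sectors."""
    return size % sector == 0


def round_up_to_sector(size, sector=SECTOR_SIZE):
    """Round `size` up to the next full sector."""
    return (size + sector - 1) // sector * sector
```

For the 5474-byte test file, `is_sector_aligned(5474)` is false and `round_up_to_sector(5474)` gives 5632 (11 sectors), while a 10240-byte file is already aligned.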

We will automatically get the upstream fix in qemu-kvm-rhev 7.3 as we rebase,
but if you need it in 7.2.z, please clone the Fedora bug (bug 1229394). I'll
already clone the bug for plain RHEL qemu-kvm because the bug exists there as
well and I can't rely on a rebase there.

Comment 15 Nir Soffer 2015-11-18 10:17:59 UTC
(In reply to Kevin Wolf from comment #14)
> $ dd if=/dev/zero of=/tmp/test.raw bs=5474 count=1
> $ ./qemu-img convert -p -t none -T none -O raw /tmp/test.raw /tmp/dest.raw
> 
> The reason seems to be that the source file size isn't aligned to a sector
> boundary like a valid raw image would be. In upstream it does work (with the
> destination image size rounded up to the next full sector), but the lesson to
> learn is that you should only use qemu-img convert with disk images, never
> with
> random other files.

This image was created using qemu-img create:

  qemu-img create -f qcow2 foo 10240

Then its contents were replaced by writing vdsm metadata:

  with open('foo', 'w') as f:
      f.write(data)

This call truncates the file and writes the new data. Does it change the
alignment of the original image?

Should we use direct I/O instead when writing raw image data?

For example (using pseudo code):

    data += padding # make it multiple of 512 bytes

    cat data | dd of=foo oflag=direct

I know that creating a qcow2 image when we want a raw image is lame;
we are fixing this.

Comment 16 Kevin Wolf 2015-11-18 10:26:35 UTC
Do you actually pass this as a disk to a guest? If no, you shouldn't be using
qemu-img at all, because it's a tool for disk images, not for random files.

But if you must, just make sure that you don't change the file size, i.e. open
the file in a mode that doesn't truncate ("r+" for Python's open(), I guess;
conv=notrunc for dd) and make sure that the written data isn't larger than the
image file already is.
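
A minimal Python sketch of that advice, assuming the file was already created at its final size. The function name is hypothetical and this is not the vdsm patch itself.

```python
import os


def overwrite_in_place(path, data):
    """Write `data` at offset 0 without changing the size of the file."""
    size = os.path.getsize(path)
    if len(data) > size:
        raise ValueError(
            "data (%d bytes) does not fit in %s (%d bytes)"
            % (len(data), path, size)
        )
    # "r+b" opens for update and keeps the existing length;
    # "w" would truncate the file to zero before writing.
    with open(path, "r+b") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
```

Note that any bytes beyond `len(data)` keep their previous contents, so a reader of the file needs some way to know where the payload ends.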

Comment 17 Nir Soffer 2015-11-18 10:52:43 UTC
(In reply to Kevin Wolf from comment #16)
> Do you actually pass this as a disk to a guest? 

No, this is a metadata file that the guest will never see.

Comment 18 Daniel Erez 2015-11-18 15:23:02 UTC
Thanks Kevin! I've cloned the bug to qemu-kvm-rhev: https://bugzilla.redhat.com/show_bug.cgi?id=1283278

Comment 19 Aharon Canan 2016-01-03 15:17:31 UTC
Verified both export and import VM with RAM snapshot - both work.


vdsm-4.17.15-0.el7ev.noarch
rhevm-3.6.2-0.1.el6.noarch
qemu-img-rhev-2.3.0-31.el7_2.5.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.5.x86_64

Comment 20 Allon Mureinik 2016-03-10 10:40:30 UTC
RHEV 3.6.0 has been released, setting status to CLOSED CURRENTRELEASE

