Bug 1596619 - Snapshot of Cinder (Ceph RBD) disk fails
Summary: Snapshot of Cinder (Ceph RBD) disk fails
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 4.2.6.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.3.5
Target Release: ---
Assignee: Daniel Erez
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-29 09:58 UTC by matthias.leopold
Modified: 2019-06-23 11:55 UTC (History)
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-23 11:55:17 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.3+


Attachments
- engine log (12.08 KB, text/plain), 2018-06-29 09:58 UTC, matthias.leopold
- engine log (12.29 KB, text/plain), 2018-07-03 15:26 UTC, matthias.leopold
- cinder API log (12.05 KB, text/plain), 2018-07-03 15:27 UTC, matthias.leopold
- cinder volume log (6.49 KB, text/plain), 2018-07-03 15:28 UTC, matthias.leopold
- VDSM log (97.78 KB, text/plain), 2018-07-03 15:28 UTC, matthias.leopold
- libvirtd log (1.57 MB, application/x-gzip), 2018-07-04 10:01 UTC, matthias.leopold
- qemu log (27.62 KB, text/plain), 2018-07-04 10:02 UTC, matthias.leopold
- engine log 20180704 (11.68 KB, text/plain), 2018-07-04 19:46 UTC, matthias.leopold
- cinder API 20180704 (14.04 KB, text/plain), 2018-07-04 19:47 UTC, matthias.leopold
- cinder volume 20180704 (10.80 KB, text/plain), 2018-07-04 19:48 UTC, matthias.leopold
- VDSM log 20180704 (486.93 KB, application/x-xz), 2018-07-04 19:50 UTC, matthias.leopold

Description matthias.leopold 2018-06-29 09:58:55 UTC
Created attachment 1455487 [details]
engine log

Description of problem:
A snapshot of a VM with Cinder (Ceph RBD) disks fails. In Cinder the snapshot seems to be created successfully, but is immediately deleted.

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.4.5-1.el7.noarch
openstack-cinder-11.1.1-1.el7.noarch
librbd1-12.2.5-0.el7.x86_64

How reproducible:
Set up oVirt 4.2.4 with Enterprise Linux (CentOS) hosts and the ovirt-engine hosted in VMware; primary storage is iSCSI.
Set up Cinder in an oVirt VM and configure a backend with rbd.RBDDriver connected to a Ceph 12.2.5 cluster (a cinder.conf sketch follows below).
Set up an OpenStack Block Storage Provider pointing to the Cinder VM in oVirt.
Create a VM with a disk in the Cinder storage domain.
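
For reference, the Cinder backend configuration meant above looks roughly like the following cinder.conf sketch (the pool and user names match those visible in the qemu command line quoted later in this report; the backend section name and the secret UUID are placeholders):

[rbd-ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-ceph
rbd_pool = ovirt-test
rbd_user = ovirt-test_user_rbd
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_secret_uuid = <libvirt secret UUID, placeholder>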

Steps to Reproduce:
1. Navigate to Snapshots Tab of VM
2. Create Snapshot (with or without memory)
3.

Actual results:
Snapshot is visible during execution of the command in the tab, but disappears immediately. Event messages:
"Failed to complete snapshot 'hallo' creation for VM 'ovirt-test01.srv-ubuntu'."
"Failed to create live snapshot 'hallo' for VM 'ovirt-test01.srv-ubuntu'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency."



Expected results:
Snapshot succeeds.


Additional info:
Cinder snapshots did work in oVirt 4.1.x.
In oVirt 4.2.2 the Cinder snapshot failed with the same error messages in the logs, but it didn't get deleted and was actually usable.
Snapshots also fail for VMs running a Red Hat OS, not only Ubuntu.

Comment 1 matthias.leopold 2018-07-03 07:52:50 UTC
Can you confirm this is a bug in oVirt?
If yes: will it be fixed in oVirt 4.2.x?
Is there a workaround?
Or is this a result of misconfiguration (Cinder?) on my side?

Due to the described behaviour I'm forced to stay with oVirt 4.2.2 at the moment, because Cinder snapshots are working "somehow" there. This is a very unpleasant situation, which is why I'm asking.

Thanks a lot
Matthias

Comment 2 Daniel Erez 2018-07-03 08:55:40 UTC
(In reply to matthias.leopold from comment #1)
> Can you confirm this is a bug in oVirt?
> If yes: will it be fixed in oVirt 4.2.x?
> Is there a workaround?
> Or is this a result of misconfiguration (Cinder?) on my side?
> 
> Due to the described behaviour I'm forced to stay with oVirt 4.2.2 at the
> moment, because Cinder snapshots are working "somehow" there. This is a very
> unpleasant situation, that's why I'm asking.
> 
> Thanks a lot
> Matthias

According to the engine log [1], the failure occurs during live snapshot creation. Is it working as expected when creating a snapshot of a non-running VM? Is it working when creating a snapshot directly from Cinder (see the CLI sketch below the log excerpt)? Can you please attach the vdsm log, and also cinder/ceph logs if available?

2018-06-29 11:02:00,677+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-27) [5fc77d3f-6262-441b-ad01-4b2d5faf5cc4] Command 'SnapshotVDSCommand(HostName = ov-test-04-01, SnapshotVDSCommandParameters:{hostId='d8794e95-3f89-4b1a-9bec-12ccf6db0cb1', vmId='da77bcdd-f2ea-40f6-af00-8245ac477b08'})' execution failed: VDSGenericException: VDSErrorException: Failed to SnapshotVDS, error = Snapshot failed, code = 48
2018-06-29 11:02:00,677+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-27) [5fc77d3f-6262-441b-ad01-4b2d5faf5cc4] FINISH, SnapshotVDSCommand, log id: 40f0d3a3
2018-06-29 11:02:00,689+02 WARN  [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-27) [5fc77d3f-6262-441b-ad01-4b2d5faf5cc4] Could not perform live snapshot due to error, VM will still be configured to the new created snapshot: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to SnapshotVDS, error = Snapshot failed, code = 48 (Failed with error SNAPSHOT_FAILED and code 48)
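
For reference, a snapshot of an in-use volume can be requested directly from Cinder with something like the following (a sketch with placeholder values; --force is needed while the volume is attached to a running VM):

cinder snapshot-create --force True --name test-snap <volume-id>
cinder snapshot-list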

Comment 3 matthias.leopold 2018-07-03 15:26:37 UTC
Created attachment 1456264 [details]
engine log

Comment 4 matthias.leopold 2018-07-03 15:27:30 UTC
Created attachment 1456266 [details]
cinder API log

Comment 5 matthias.leopold 2018-07-03 15:28:12 UTC
Created attachment 1456267 [details]
cinder volume log

Comment 6 matthias.leopold 2018-07-03 15:28:46 UTC
Created attachment 1456268 [details]
VDSM log

Comment 7 matthias.leopold 2018-07-03 15:42:17 UTC
> Is it working as expected when creating a snapshot for a non-running VM?
Creating the snapshot works as expected. Deleting the snapshot also appeared to work (no error in oVirt), but later I discovered that the snapshot didn't actually get deleted in Cinder (engine log: "Snapshot '<UNKNOWN>' deletion for VM 'ovirt-test01.srv-ubuntu' has been completed."). This would probably be a case for another bug report (commands for double-checking this on the Cinder/Ceph side are sketched at the end of this comment).

> Is it working when creating a snapshot directly from Cinder?
Yes, I took a snapshot while the VM was running.

> Can you please attach the vdsm log, and also cinder/ceph logs if available.
I attached the engine log, vdsm log, and cinder api/volume logs from another attempt, this time with a VM running CentOS (to be sure there are no issues with ovirt-guest-agent). The attempt was with a single Cinder disk (there are other disks in the VM) and without memory.
I hadn't thought of looking at the vdsm log; there you can see an error, although it doesn't give me a clue.
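
For completeness, one way to check on the Cinder/Ceph side whether a snapshot really exists or was deleted (a sketch; the pool name and the volume-<id> image naming follow the Cinder RBD driver convention and the values seen in the qemu command line later in this report):

# snapshots known to Cinder for a given volume
cinder snapshot-list --volume-id <volume-id>
# RBD snapshots actually present on the Ceph cluster for that volume's image
rbd snap ls ovirt-test/volume-<volume-id>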

Comment 8 Daniel Erez 2018-07-03 16:02:46 UTC
Looks like an error at a lower level [1].
Can you please attach the libvirt and qemu logs?
By the way, is live snapshot working without Cinder disks?

[1] 2018-07-03 16:31:29,001+0200 ERROR (jsonrpc/6) [virt.vm] (vmId='4a8c9902-f9ab-490f-b1dd-82d9aee63b5f') Unable to take snapshot (vm:4565)

Comment 9 matthias.leopold 2018-07-04 10:01:28 UTC
Created attachment 1456461 [details]
libvirtd log

Comment 10 matthias.leopold 2018-07-04 10:02:00 UTC
Created attachment 1456462 [details]
qemu log

Comment 11 matthias.leopold 2018-07-04 10:10:01 UTC
I attached libvirtd and qemu logs from a new snapshot attempt.

Live snapshots for disks on iSCSI do work.

Comment 12 Daniel Erez 2018-07-04 11:51:06 UTC
(In reply to matthias.leopold from comment #10)
> Created attachment 1456462 [details]
> qemu log

It seems that the ceph.conf file cannot be found [1]. Are the ceph monitors reachable from the host? What is Ceph's health status?

[1] 
2018-07-04 11:41:12.705482 7fd335e30040 -1 Errors while parsing config file!
2018-07-04 11:41:12.705485 7fd335e30040 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2018-07-04 11:41:12.705485 7fd335e30040 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2018-07-04 11:41:12.705485 7fd335e30040 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2018-07-04T09:41:12.777352Z qemu-kvm: -drive file=/rhev/data-center/mnt/blockSD/ce86cad9-67d2-44e2-b4ab-f445f856e274/images/7d8b4f11-9f6a-490c-b847-a4b9f7a70c31/4c27d800-b040-4616-b2d5-e07a2476d51e,format=qcow2,if=none,id=drive-ua-7d8b4f11-9f6a-490c-b847-a4b9f7a70c31,serial=7d8b4f11-9f6a-490c-b847-a4b9f7a70c31,cache=none,discard=unmap,werror=stop,rerror=stop,aio=native: 'serial' is deprecated, please use the corresponding option of '-device' instead
2018-07-04T09:41:12.778804Z qemu-kvm: -drive file=rbd:ovirt-test/volume-afd7eebc-b488-4385-9f8b-cdfe590a48ee:id=ovirt-test_user_rbd:auth_supported=cephx\;none:mon_host=xxx.yyy.216.45\:6789\;xxx.yyy.216.54\:6789\;xxx.yyy.216.55\:6789,file.password-secret=ua-afd7eebc-b488-4385-9f8b-cdfe590a48ee-secret0,format=raw,if=none,id=drive-ua-afd7eebc-b488-4385-9f8b-cdfe590a48ee,serial=afd7eebc-b488-4385-9f8b-cdfe590a48ee,cache=writeback,discard=unmap,werror=stop,rerror=stop,aio=threads: 'serial' is deprecated, please use the corresponding option of '-device' instead

Comment 13 matthias.leopold 2018-07-04 13:31:55 UTC
Ceph health status is OK.
The Ceph monitors are of course reachable, otherwise qemu-kvm couldn't use any Ceph disk.
These error messages about the missing ceph.conf file have been there since oVirt 4.1 (where we began), when snapshotting Cinder disks still worked, so they don't seem to matter. The Ceph configuration hasn't changed in the meantime. Still, I created a basic /etc/ceph/ceph.conf; the error message is gone from the qemu log, but snapshots still fail.
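
For reference, a minimal client-side ceph.conf of the kind meant here could look like the sketch below (the monitor addresses are the ones visible in the qemu command line above; the fsid is a placeholder):

[global]
fsid = <cluster fsid, placeholder>
mon_host = xxx.yyy.216.45,xxx.yyy.216.54,xxx.yyy.216.55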

Comment 14 Daniel Erez 2018-07-04 15:02:18 UTC
(In reply to matthias.leopold from comment #13)
> Ceph health status is OK.
> Ceph monitors are reachable of course, else qemu-kvm couldn't use any ceph
> disk.
> These error messages about missing ceph.conf files have always been there
> since oVirt 4.1 (where we began), when snapshotting Cinder disks still
> worked. They don't seem to matter. Ceph configuration hasn't changed in the
> meantime. Still i created a basic /etc/ceph/ceph.conf, error message is gone
> from qemu log, but snapshots still fail.

OK, so we need to understand why the snapshot failed in libvirt. Can you please attach the libvirt log from around '2018-07-03 16:31:29' (which is when it failed in vdsm [1])? The attached libvirt log starts at '2018-07-04 09:38:42'.

[1] 2018-07-03 16:31:29,001+0200 INFO  (jsonrpc/6) [virt.vm] (vmId='4a8c9902-f9ab-490f-b1dd-82d9aee63b5f') Enabling drive monitoring (drivemonitor:54)
2018-07-03 16:31:29,001+0200 DEBUG (jsonrpc/6) [api] FINISH snapshot response={'status': {'message': 'Snapshot failed', 'code': 48}} (api:135)
2018-07-03 16:31:29,001+0200 INFO  (jsonrpc/6) [api.virt] FINISH snapshot return={'status': {'message': 'Snapshot failed', 'code': 48}} from=::ffff:xxx.yyy.215.6,41368, flow_id=501bfa6c-c6fb-42c5-b650-0d772d9476f1, vmId=4a8c9902-f9ab-490f-b1dd-82d9aee63b5f (api:52)

Comment 15 matthias.leopold 2018-07-04 19:46:34 UTC
Created attachment 1456600 [details]
engine log 20180704

Comment 16 matthias.leopold 2018-07-04 19:47:38 UTC
Created attachment 1456601 [details]
cinder API 20180704

Comment 17 matthias.leopold 2018-07-04 19:48:51 UTC
Created attachment 1456602 [details]
cinder volume 20180704

Comment 18 matthias.leopold 2018-07-04 19:50:12 UTC
Created attachment 1456603 [details]
VDSM log 20180704

Comment 19 matthias.leopold 2018-07-04 19:53:18 UTC
I uploaded
- engine.log
- vdsm.log
- cinder volume/api log
corresponding to the previous libvirtd.log.

Thanks for your efforts.

Comment 20 Daniel Erez 2018-07-04 20:07:07 UTC
So according to the vdsm log [1], the libvirt error during live snapshot is "unsupported configuration: nothing selected for snapshot".
@Francesco - do you perhaps know what could trigger this? (Maybe related to bug 1106419?)

[1]
2018-07-04 11:57:44,490+0200 INFO  (jsonrpc/1) [virt.vm] (vmId='4a8c9902-f9ab-490f-b1dd-82d9aee63b5f') Taking a live snapshot (drives=, memory=True) (vm:4561)
2018-07-04 11:57:44,499+0200 ERROR (jsonrpc/1) [virt.vm] (vmId='4a8c9902-f9ab-490f-b1dd-82d9aee63b5f') Unable to take snapshot (vm:4565)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4562, in snapshot
    self._dom.snapshotCreateXML(snapxml, snapFlags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2585, in snapshotCreateXML
    if ret is None:raise libvirtError('virDomainSnapshotCreateXML() failed', dom=self)
libvirtError: unsupported configuration: nothing selected for snapshot
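
For illustration only, here is a simplified, hypothetical sketch (not the actual vdsm code) of how an empty drive list with no memory element produces exactly the kind of <domainsnapshot> XML that libvirt rejects here:

import xml.etree.ElementTree as ET

def build_snapshot_xml(drive_names, memory_file=None):
    # vdsm builds (roughly) a <domainsnapshot> with one <disk> element per
    # drive that libvirt itself should snapshot, plus an optional <memory>.
    snap = ET.Element('domainsnapshot')
    disks = ET.SubElement(snap, 'disks')
    for name in drive_names:
        ET.SubElement(disks, 'disk', name=name, snapshot='external')
    if memory_file:
        ET.SubElement(snap, 'memory', snapshot='external', file=memory_file)
    return ET.tostring(snap).decode()

# With only Cinder/Ceph disks attached, no drives are handed to libvirt:
print(build_snapshot_xml([]))
# -> <domainsnapshot><disks /></domainsnapshot>
# which libvirt then refuses with "nothing selected for snapshot".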

Comment 21 matthias.leopold 2018-07-04 20:16:04 UTC
By the way, googling for "libvirtError: unsupported configuration: nothing selected for snapshot" I found this (without a solution):
https://lists.ovirt.org/pipermail/users/2016-April/072477.html

Comment 22 matthias.leopold 2018-07-04 20:22:47 UTC
In my case
"2018-07-04 11:42:58,026+0200 INFO  (jsonrpc/7) [virt.vm] (vmId='4a8c9902-f9ab-490f-b1dd-82d9aee63b5f') Taking a live snapshot (drives=, memory=True) (vm:4561)"
doesn't sound right: "drives=,"?

Comment 23 Francesco Romani 2018-07-16 15:19:37 UTC
(In reply to Daniel Erez from comment #20)
> So according to vdsm log [1], the libvirt error during live snapshot is
> "unsupported configuration: nothing selected for snapshot".
> @Francesco - do you perhaps know what could trigger this? (maybe related to
> bug 1106419 ?)
> 
> [1]
> 2018-07-04 11:57:44,490+0200 INFO  (jsonrpc/1) [virt.vm]
> (vmId='4a8c9902-f9ab-490f-b1dd-82d9aee63b5f') Taking a live snapshot
> (drives=, memory=True) (vm:4561)
> 2018-07-04 11:57:44,499+0200 ERROR (jsonrpc/1) [virt.vm]
> (vmId='4a8c9902-f9ab-490f-b1dd-82d9aee63b5f') Unable to take snapshot
> (vm:4565)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4562, in
> snapshot
>     self._dom.snapshotCreateXML(snapxml, snapFlags)
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98,
> in f
>     ret = attr(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py",
> line 130, in wrapper
>     ret = f(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92,
> in wrapper
>     return func(inst, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2585, in
> snapshotCreateXML
>     if ret is None:raise libvirtError('virDomainSnapshotCreateXML() failed',
> dom=self)
> libvirtError: unsupported configuration: nothing selected for snapshot

Nope, I don't remember anything about issues like this.

Comment 24 Francesco Romani 2018-07-16 15:22:49 UTC
(In reply to matthias.leopold from comment #22)
> in my case 
> "2018-07-04 11:42:58,026+0200 INFO  (jsonrpc/7) [virt.vm]
> (vmId='4a8c9902-f9ab-490f-b1dd-82d9aee63b5f') Taking a live snapshot
> (drives=, memory=True) (vm:4561)"
> doesn't sound right, "drives=," ??

This *may be* OK if someone just requested a snapshot of the VM memory, which is a legal request AFAIR.

Comment 25 matthias.leopold 2018-09-20 10:26:09 UTC
After some time, and after upgrading to oVirt 4.2.6, I looked at the snapshot behaviour for Cinder/Ceph disks in my installations again. The situation is the same: a live snapshot of a VM with Cinder/Ceph disks fails with the following in vdsm.log:

2018-09-20 10:46:15,880+0200 INFO  (jsonrpc/6) [api.virt] START snapshot(snapDrives=[], snapMemory=None, frozen=True) from=::ffff:149.148.215.6,56044, flow_id=4cb7705d-2260-44ba-b2c8-cc8e1e2400a3, vmId=da77bcdd-f2ea-40f6-af00-8245ac477b08 (api:46)
2018-09-20 10:46:15,880+0200 INFO  (jsonrpc/6) [virt.vm] (vmId='da77bcdd-f2ea-40f6-af00-8245ac477b08') <?xml version='1.0' encoding='utf-8'?>
<domainsnapshot><disks /></domainsnapshot> (vm:4598)
2018-09-20 10:46:15,880+0200 INFO  (jsonrpc/6) [virt.vm] (vmId='da77bcdd-f2ea-40f6-af00-8245ac477b08') Disabling drive monitoring (drivemonitor:58)
2018-09-20 10:46:15,880+0200 INFO  (jsonrpc/6) [virt.vm] (vmId='da77bcdd-f2ea-40f6-af00-8245ac477b08') Taking a live snapshot (drives=, memory=True) (vm:4614)
2018-09-20 10:46:15,887+0200 ERROR (jsonrpc/6) [virt.vm] (vmId='da77bcdd-f2ea-40f6-af00-8245ac477b08') Unable to take snapshot (vm:4618)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4615, in snapshot
    self._dom.snapshotCreateXML(snapxml, snapFlags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2585, in snapshotCreateXML
    if ret is None:raise libvirtError('virDomainSnapshotCreateXML() failed', dom=self)
libvirtError: unsupported configuration: nothing selected for snapshot
2018-09-20 10:46:15,887+0200 INFO  (jsonrpc/6) [virt.vm] (vmId='da77bcdd-f2ea-40f6-af00-8245ac477b08') Enabling drive monitoring (drivemonitor:54)
2018-09-20 10:46:15,887+0200 INFO  (jsonrpc/6) [api.virt] FINISH snapshot return={'status': {'message': 'Snapshot failed', 'code': 48}} from=::ffff:149.148.215.6,56044, flow_id=4cb7705d-2260-44ba-b2c8-cc8e1e2400a3, vmId=da77bcdd-f2ea-40f6-af00-8245ac477b08 (api:52)
2018-09-20 10:46:15,887+0200 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call VM.snapshot failed (error 48) in 0.01 seconds (__init__:573)

I have three additional observations:
- this only happens for snapshots without memory (this was already the case with 4.2.2)
- "drives=" in vdsm.log seems to be OK; it indicates that only Cinder/Ceph disks are involved
- "memory=True" in vdsm.log is always set, regardless of what I choose in the web GUI; this is also true for snapshots on storage domains of other types

All this happens in both my DEV and PROD oVirt installation.

Can you reproduce this?
Do you need additional logs?

Comment 26 Sandro Bonazzola 2019-01-28 09:40:40 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 27 Tal Nisan 2019-03-17 13:00:35 UTC
Hi Matthias, in 4.3 we're introducing a tech preview of Cinder support through Cinderlib, which is tested on Ceph as well. Is it possible for you to upgrade?

Comment 28 matthias.leopold 2019-03-19 09:56:25 UTC
I've been following the changelogs of the 4.3.x series concerning cinderlib with much interest.
I'm willing to upgrade our test environment to 4.3.x; the question is when. Should I wait for ovirt-4.3.4 (the Target Milestone) or can I do it before?
I expect that if I upgrade now I will lose the existing Cinder disk definitions and will have to create the Cinder disks from scratch; is that correct? That would still be OK for the test environment.
Before I can upgrade the production environment I need a rock-solid path for migrating the current Cinder configuration data, but I know that will be at some distant point in the future.

Thanks a lot
Matthias

Comment 29 matthias.leopold 2019-04-01 09:57:52 UTC
I upgraded my test environment to 4.3.2 (and realized that it's still possible to use old-style "OpenStack Block Storage" domains).
I have now started to test cinderlib in oVirt and just posted my experiences to the mailing list. This bug can be closed; please answer my questions on the mailing list.

Thanks again
Matthias

Comment 30 Tal Nisan 2019-06-23 11:55:17 UTC
Closing due to comment #29.

