Bug 1697496 - "attach_volume error=Managed Volume is already attached." when migrating VM with Managed Block Storage (Ceph RBD)
Summary: "attach_volume error=Managed Volume is already attached." when migrating VM w...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 4.3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.3.5
Target Release: 4.3.5.2
Assignee: Benny Zlotnik
QA Contact: Shir Fishbain
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-08 14:07 UTC by matthias.leopold
Modified: 2019-07-30 14:08 UTC
CC List: 4 users

Fixed In Version: ovirt-engine-4.3.5.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-30 14:08:26 UTC
oVirt Team: Storage
pm-rhel: ovirt-4.3+


Attachments
engine, vdsm, supervdsm logs (1.03 MB, application/x-tar)
2019-04-08 14:07 UTC, matthias.leopold


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 99436 None MERGED core: detach MBS disks upon live migration failure 2019-12-03 10:43:18 UTC
oVirt gerrit 100910 None MERGED core: detach MBS disks upon live migration failure 2019-12-03 10:43:18 UTC

Description matthias.leopold 2019-04-08 14:07:54 UTC
Created attachment 1553646 [details]
engine, vdsm, supervdsm logs

Description of problem:
Migration of a VM with a Managed Block Storage disk (Ceph RBD, non-OS disk) fails with "EngineException: java.lang.NullPointerException (Failed with error ENGINE and code 5001)" in engine.log.
The reason seems to be "error=Managed Volume is already attached." in vdsm.log on the receiving host.

Version-Release number of selected component (if applicable):
vdsm-4.30.11-1.el7.x86_64

How reproducible:
Start migration of VM with Managed Block Storage disk in oVirt 4.3.2

Steps to Reproduce:
1. set up oVirt 4.3.2 with ManagedBlockDomainSupported=true (see the sketch after these steps)
2. install openstack-cinder + cinderlib on engine host
3. install python2-os-brick on hypervisor hosts
4. create "Managed Block Storage" domain
5. create VM with OS disk on iSCSI storage and secondary disk on Managed Block Storage
6. start VM
7. try to migrate the VM
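
For reference, steps 1-3 map roughly to the following commands. This is a minimal sketch assuming the CentOS 7 setup from "Additional info" below; the exact package names and the engine restart are assumptions, not taken from this report:

# on the engine host: enable the Managed Block Storage feature flag and install the cinder/cinderlib stack
$ engine-config -s ManagedBlockDomainSupported=true
$ yum install openstack-cinder python2-cinderlib      # package names may differ per repository
$ systemctl restart ovirt-engine

# on each hypervisor host: install os-brick so vdsm can attach managed volumes
$ yum install python2-os-brick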

Actual results:
Migration fails

Expected results:
Migration succeeds

Additional info:
CentOS 7
cinderlib (0.3.9)
openstack-cinder-13.0.4-1.el7.noarch
ceph 12.2.11

Comment 1 Benny Zlotnik 2019-04-08 14:27:40 UTC
Thanks for the report!

We already fixed some of the error handling to provide a clearer error, instead of NPEs


If you need to work around this, you can manually detach the volume by running the following on the relevant host:

$ vdsm-client ManagedVolume detach_volume vol_id=<vol_id>
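
For example, on the target host, using the volume ID discussed later in this comment (rbd showmapped is only an assumed way to confirm that no kernel RBD mapping is left behind; it is not part of the vdsm-client API):

# detach the leftover managed volume, then confirm no kernel RBD mapping remains
$ vdsm-client ManagedVolume detach_volume vol_id=2f053070-f5b7-4f04-856c-87a56d70cd75
$ rbd showmapped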


I couldn't find how the 2f053070-f5b7-4f04-856c-87a56d70cd75 volume was already attached to the target ov-test-04-01; was it attached previously?

Comment 2 matthias.leopold 2019-04-08 14:35:37 UTC
Yes, the log starts when VM is already running. I didn't think of that (just used timestamp 15:** because it was handy at that time). Do you need logs for the whole process? I can only provide them tomorrow.

Comment 3 Benny Zlotnik 2019-04-08 14:42:07 UTC
(In reply to matthias.leopold from comment #2)
> Yes, the log starts when VM is already running. I didn't think of that (just
> used timestamp 15:** because it was handy at that time). Do you need logs
> for the whole process? I can only provide them tomorrow.

Yes, it will be useful.

Comment 4 matthias.leopold 2019-04-09 12:16:06 UTC
It turns out that the original migration error was

libvirtError: Unsafe migration: Migration may lead to data corruption if disks use cache != none or cache != directsync

This stems from our use of the "viodiskcache=writeback" custom VM property. If I understand correctly, this isn't needed anymore with kernel rbd devices as used by cinderlib.
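
For context, libvirt's check here rejects migration of any disk whose cache mode is neither none nor directsync. A quick way to see which cache mode the running VM's disks actually got (a sketch; <vm-name> is a placeholder, and the read-only virsh connection is an assumption about host access):

# on the source host: print the cache mode of each disk of the running VM
$ virsh -r dumpxml <vm-name> | grep "cache="
# viodiskcache=writeback should yield cache='writeback', which libvirt treats as unsafe to migrate;
# without the custom property oVirt's default is cache='none', which passes the check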

The error "attach_volume error=Managed Volume is already attached." is a follow up after the first failed migration, when there is a leftover rbdmapped device.
The problem is resolved, this ticket can be closed.

thank you
Matthias

Comment 5 Benny Zlotnik 2019-04-10 08:30:51 UTC
Thanks Matthias!

Feel free to report any issue you encounter

Comment 6 Nir Soffer 2019-04-12 15:31:08 UTC
Benny, why did we have a leftover managed volume after the migration failure?

Smells like a bug in engine cleanup after migration.

Comment 7 Benny Zlotnik 2019-04-15 12:24:16 UTC
(In reply to Nir Soffer from comment #6)
> Benny, why did we have leftover managed volume after migration failure?
> 
> Smells like a bug in engine cleanup after migration.
We have a bug for this issue.

Comment 8 Benny Zlotnik 2019-05-05 11:21:34 UTC
Reopening; I must have confused this with another bug, since I can't find it.

Comment 9 RHEL Program Management 2019-06-18 11:10:21 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 10 Shir Fishbain 2019-07-03 16:32:04 UTC
I can't verify this bug because it's impossible to start a VM with the Ceph driver. I opened a new bug for this issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1726758

Comment 11 Benny Zlotnik 2019-07-04 08:39:21 UTC
As Freddy stated in comment #2, it's a host configuration issue.
I posted a fix for the error handling, but it should not block the verification of this bug.

Comment 12 Shir Fishbain 2019-07-04 12:06:34 UTC
Verified - Migration succeeds

The storage domain was missing the rbd_ceph_conf property. Benny fixed it in the ceph.conf file of the QE environment (driver options sketched below).

ovirt-engine-4.3.5.2-0.1.el7.noarch
vdsm-4.30.22-1.el7ev.x86_64
cinderlib version: 0.9.0
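
For anyone hitting the same gap: the Managed Block Storage domain's driver options have to point cinderlib at the cluster's ceph.conf. A minimal sketch of the RBD driver options (the option names come from the cinder RBD driver; values in angle brackets are placeholders, and rbd_keyring_conf is an assumption about how the keyring was supplied here):

volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_pool = <ceph-pool-name>
rbd_user = <cinder-client-user>
rbd_keyring_conf = /etc/ceph/ceph.client.<cinder-client-user>.keyring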

Comment 13 Sandro Bonazzola 2019-07-30 14:08:26 UTC
This bugzilla is included in the oVirt 4.3.5 release, published on July 30th 2019.

Since the problem described in this bug report should be resolved in the oVirt 4.3.5 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

