Bug 1697496

Summary: "attach_volume error=Managed Volume is already attached." when migrating VM with Managed Block Storage (Ceph RBD)
Product: [oVirt] ovirt-engine
Component: General
Assignee: Benny Zlotnik <bzlotnik>
Reporter: matthias.leopold
QA Contact: Shir Fishbain <sfishbai>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: unspecified
Version: 4.3.2
CC: bugs, bzlotnik, nsoffer, tnisan
Keywords: Reopened
Target Milestone: ovirt-4.3.5
Target Release: 4.3.5.2
Flags: pm-rhel: ovirt-4.3+
Hardware: x86_64
OS: Linux
Fixed In Version: ovirt-engine-4.3.5.2
Doc Type: If docs needed, set a value
oVirt Team: Storage
Type: Bug
Last Closed: 2019-07-30 14:08:26 UTC
Attachments: engine, vdsm, supervdsm logs

Description matthias.leopold 2019-04-08 14:07:54 UTC
Created attachment 1553646 [details]
engine, vdsm, supervdsm logs

Description of problem:
Migration of a VM with a Managed Block Storage disk (Ceph RBD, non-OS disk) fails with "EngineException: java.lang.NullPointerException (Failed with error ENGINE and code 5001)" in engine.log.
The reason seems to be "error=Managed Volume is already attached." in vdsm.log on the receiving host.

Version-Release number of selected component (if applicable):
vdsm-4.30.11-1.el7.x86_64

How reproducible:
Start migration of a VM with a Managed Block Storage disk in oVirt 4.3.2

Steps to Reproduce:
1. set up oVirt 4.3.2 with ManagedBlockDomainSupported=true (see the engine-config sketch after this list)
2. install openstack-cinder + cinderlib on engine host
3. install python2-os-brick on hypervisor hosts
4. create "Managed Block Storage" domain
5. create VM with OS disk on iSCSI storage and secondary disk on Managed Block Storage
6. start the VM
7. try to migrate the VM
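
A minimal sketch of step 1, assuming the flag is set with engine-config on the engine host (whether the key is cluster-version scoped, i.e. whether --cver is needed, is an assumption here):

# run on the engine host; an ovirt-engine restart afterwards is assumed to be required
$ engine-config -s ManagedBlockDomainSupported=true --cver=4.3
$ systemctl restart ovirt-engine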

Actual results:
Migration fails

Expected results:
Migration succeeds

Additional info:
CentOS 7
cinderlib (0.3.9)
openstack-cinder-13.0.4-1.el7.noarch
ceph 12.2.11

Comment 1 Benny Zlotnik 2019-04-08 14:27:40 UTC
Thanks for the report!

We have already fixed some of the error handling to provide a clearer error instead of NPEs.


If you need to work around this, you can manually detach the volume by running the following on the relevant host:

$ vdsm-client ManagedVolume detach_volume vol_id=<vol_id>
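
For example, for the volume and host referenced below in this report, the call would look like this (assuming ov-test-04-01 is the host holding the stale attachment):

$ vdsm-client ManagedVolume detach_volume vol_id=2f053070-f5b7-4f04-856c-87a56d70cd75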


I couldn't find how the 2f053070-f5b7-4f04-856c-87a56d70cd75 volume came to be attached to the target host ov-test-04-01. Was it attached previously?

Comment 2 matthias.leopold 2019-04-08 14:35:37 UTC
Yes, the log starts when the VM is already running. I didn't think of that (I just used timestamp 15:** because it was handy at the time). Do you need logs for the whole process? I can only provide them tomorrow.

Comment 3 Benny Zlotnik 2019-04-08 14:42:07 UTC
(In reply to matthias.leopold from comment #2)
> Yes, the log starts when VM is already running. I didn't think of that (just
> used timestamp 15:** because it was handy at that time). Do you need logs
> for the whole process? I can only provide them tomorrow.

Yes, that would be useful.

Comment 4 matthias.leopold 2019-04-09 12:16:06 UTC
It turns out that the original migration error was:

libvirtError: Unsafe migration: Migration may lead to data corruption if disks use cache != none or cache != directsync

This stems from our use of the "viodiskcache=writeback" custom VM property. If I understand correctly, this isn't needed anymore with kernel RBD devices as used with cinderlib.
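
A quick way to check which cache mode the disks actually end up with (a sketch; the VM name is a placeholder, and virsh is used read-only so no libvirt credentials are needed):

# inspect the cache attribute of each <driver> element in the running domain XML
$ virsh -r dumpxml <vm-name> | grep 'cache='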

The error "attach_volume error=Managed Volume is already attached." is a follow up after the first failed migration, when there is a leftover rbdmapped device.
The problem is resolved, this ticket can be closed.
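
For anyone else hitting the leftover mapping, a sketch of how to inspect and clean it up on the target host (the device path is illustrative; detaching through vdsm-client as shown in comment 1 is the cleaner route):

# list kernel RBD devices still mapped on the host
$ rbd showmapped
# unmap a stale device directly if nothing else cleans it up
$ rbd unmap /dev/rbd0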

thank you
Matthias

Comment 5 Benny Zlotnik 2019-04-10 08:30:51 UTC
Thanks Matthias!

Feel free to report any issue you encounter

Comment 6 Nir Soffer 2019-04-12 15:31:08 UTC
Benny, why did we have leftover managed volume after migration failure?

Smells like a bug in engine cleanup after migration.

Comment 7 Benny Zlotnik 2019-04-15 12:24:16 UTC
(In reply to Nir Soffer from comment #6)
> Benny, why did we have leftover managed volume after migration failure?
> 
> Smells like a bug in engine cleanup after migration.
We have a bug for this issue.

Comment 8 Benny Zlotnik 2019-05-05 11:21:34 UTC
Reopening; I must have confused this with another bug, since I can't find it.

Comment 9 RHEL Program Management 2019-06-18 11:10:21 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 10 Shir Fishbain 2019-07-03 16:32:04 UTC
I can't verify this bug because it's impossible to start a VM with the Ceph driver. I opened a new bug for this issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1726758

Comment 11 Benny Zlotnik 2019-07-04 08:39:21 UTC
As Freddy stated in comment #2, it's a host configuration issue.
I posted a fix for the error handling, but it should not block the verification of this bug.

Comment 12 Shir Fishbain 2019-07-04 12:06:34 UTC
Verified: migration succeeds.

The storage domain was missing the rbd_ceph_conf property. Benny fixed it in the ceph.conf setup of the QE environment (an illustrative set of driver options is sketched below).

ovirt-engine-4.3.5.2-0.1.el7.noarch
vdsm-4.30.22-1.el7ev.x86_64
Cinderlib version: 0.9.0
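
For reference, a sketch of driver options for a Ceph RBD Managed Block Storage domain (pool and user names are placeholders, not taken from this report; rbd_ceph_conf is the property that was missing here):

volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_ceph_conf=/etc/ceph/ceph.conf
rbd_pool=<pool-name>
rbd_user=<cinder-user>

A keyring for the Ceph user is typically needed on the engine host as well; the exact option name for pointing at it varies between cinder releases.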

Comment 13 Sandro Bonazzola 2019-07-30 14:08:26 UTC
This bug is included in the oVirt 4.3.5 release, published on July 30th 2019.

Since the problem described in this bug report should be
resolved in the oVirt 4.3.5 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.