Bug 1161021

Summary: [BLOCKED] Attach of an imported Storage Domain (still attached in the metadata) fails when calling force detach with JSON-RPC
Product: [Retired] oVirt Reporter: Maor <mlipchuk>
Component: ovirt-engine-core    Assignee: Maor <mlipchuk>
Status: CLOSED WORKSFORME QA Contact: Pavel Stehlik <pstehlik>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.5    CC: amureini, bazulay, bugs, ecohen, gklein, iheim, lsurette, mlipchuk, raul.laansoo, rbalakri, s.kieske, yeylon
Target Milestone: ---   
Target Release: 3.5.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: storage
Fixed In Version:    Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-11-18 07:33:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage    RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1162283, 1193195    
Attachments:
Description    Flags
engine log     none
vdsm.log       none
dmesg.log      none
node logs      none

Description Maor 2014-11-06 07:45:50 UTC
Created attachment 954332 [details]
engine log

Description of problem:
The user tried to import an existing FC storage domain, and the operation failed with the following errors in the engine:

[DetachStorageDomainVDSCommand] (ajp--127.0.0.1-8702-2) [69fba16c] Could not force detach domain 46243ce5-face-483e-9a40-7daea77d82a3 on pool 4e14574d-9472-4e4a-a44a-140acbb790bb. error: org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSGenericException: IRSErrorException: Failed to DetachStorageDomainVDS, error = detach() takes exactly 5 arguments (3 given), code = -32603
2014-11-04 09:47:31,980 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DetachStorageDomainVDSCommand] (ajp--127.0.0.1-8702-2) [69fba16c] FINISH, DetachStorageDomainVDSCommand, log id: 5b4d7f9a
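
The failure is an argument-count mismatch on the vdsm side: the JSON-RPC dispatcher invokes detach() with fewer positional arguments than the handler expects, and the resulting TypeError gets wrapped into the generic error above. A minimal sketch of that failure shape (the function and parameter names below are illustrative, not the real vdsm signature):

    # Illustrative sketch only -- not the actual vdsm code. It reproduces the
    # shape of the failure: a dispatcher forwarding fewer positional arguments
    # than the target method requires.
    def detach(sd_uuid, sp_uuid, msd_uuid, master_version, force):
        # hypothetical handler expecting five arguments
        return {"sd": sd_uuid, "pool": sp_uuid}

    def dispatch(method, params):
        # minimal JSON-RPC-style dispatcher: unpacks params positionally
        try:
            return method(*params)
        except TypeError as e:
            # vdsm wraps such errors into a JSON-RPC internal error (code -32603)
            print("error = %s, code = -32603" % e)

    # Only three positional parameters are forwarded, so the call fails the
    # same way as "detach() takes exactly 5 arguments (3 given)" in the log.
    dispatch(detach, ["46243ce5-face-483e-9a40-7daea77d82a3",
                      "4e14574d-9472-4e4a-a44a-140acbb790bb",
                      True])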

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create an FC Storage Domain on a Data Center, using a host with JSON-RPC enabled
2. Destroy the domain while it is still attached to the Data Center
3. Try to import the Storage Domain to another Data Center

Actual results:
Attaching the Storage Domain fails, since the force detach fails in VDSM with the error mentioned above.

Expected results:
The Storage Domain should be attached.

Additional info:
The error only reproduces when the host uses JSON-RPC.
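
For context, -32603 is the standard JSON-RPC 2.0 "Internal error" code, which is why the engine only sees a generic IRSErrorException rather than a more specific failure. A rough sketch of the request/error pair on the wire (the method and parameter names are assumptions for illustration and may not match the exact vdsm verbs):

    import json

    # Hypothetical force-detach request from the engine; the real vdsm verb
    # and parameter names may differ.
    request = {
        "jsonrpc": "2.0",
        "id": "69fba16c",
        "method": "StorageDomain.detach",
        "params": {
            "storagedomainID": "46243ce5-face-483e-9a40-7daea77d82a3",
            "storagepoolID": "4e14574d-9472-4e4a-a44a-140acbb790bb",
        },
    }

    # Shape of the error response when the server-side handler raises:
    # -32603 is the JSON-RPC 2.0 "Internal error" code seen in the engine log.
    response = {
        "jsonrpc": "2.0",
        "id": "69fba16c",
        "error": {
            "code": -32603,
            "message": "detach() takes exactly 5 arguments (3 given)",
        },
    }

    print(json.dumps(request, indent=2))
    print(json.dumps(response, indent=2))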

Comment 1 Barak 2014-11-06 12:51:52 UTC
Maor,

1 - I assume it works with XML-RPC?
2 - What is the Destroy operation mentioned in step 2 of the description?
3 - Was it detached first from the initial oVirt setup it was attached to?

Comment 2 Raul Laansoo 2014-11-06 14:01:27 UTC
1. Yes, I can import an existing FC storage domain when I de-select 'Use JSON protocol' in the host settings. Without this, I get the error.
2. It basically means that I take a live snapshot of all LUNs on the NetApp and then attach those LUNs to some other FC data center.
3. No, it's a recovery scenario. There was no orderly detach from the initial oVirt data center.

Comment 3 Allon Mureinik 2014-11-06 15:56:14 UTC
Maor, can you add VDSM's log too please?

Comment 4 Piotr Kliczewski 2014-11-07 11:28:32 UTC
Please provide vdsm logs and specify the vdsm version that you used for testing.

Comment 5 Raul Laansoo 2014-11-07 11:44:34 UTC
Created attachment 954905 [details]
vdsm.log

Attached logs.

vdsm-4.16.0-3.git601f786.el6

Comment 6 Piotr Kliczewski 2014-11-07 13:19:48 UTC
It seems that you are using quite an old version of vdsm. Can you please update and retest?

Comment 7 Raul Laansoo 2014-11-07 13:59:11 UTC
Created attachment 954930 [details]
dmesg.log

Yes, I know. This is because of another problem I experience. Every time I configure a node with the latest vdsm packages (either installing them on bare CentOS or using the oVirt Node image), after adding the node to the engine all my storage devices go offline -- all multipath devices fail, or disks get corrupted after the vdsm service restart. I have attached the dmesg output of one such case.

Comment 8 Piotr Kliczewski 2014-11-07 14:19:13 UTC
I compared the code with the logs that you provided, and I can see that patches from the beginning of August are missing, so your vdsm is older than that. In the meantime we ran into the issue that you found, but it was fixed.

I think that we need to fix your issue with the latest vdsm, and then the problem that you reported in this bug will go away.

Allon, can you help solve the multipath issue?

Comment 9 Raul Laansoo 2014-11-07 15:52:42 UTC
Created attachment 954997 [details]
node logs

I have attached all the logs from the failed node.

Comment 10 Allon Mureinik 2014-11-10 15:32:19 UTC
(In reply to Raul Laansoo from comment #7)
> Created attachment 954930 [details]
> dmesg.log
> 
> Yes, I know. This is because another problem I experience. Every time I
> configure node with latest vdsm packages (either install them on bare CentOS
> or use oVirt node image), after adding node to the engine, all my storage
> devices go offline -- all multipath devices fail or disks got corrupted
> after vdsm service restart. I have attached the output of dmesg of one such
> case.

(In reply to Piotr Kliczewski from comment #8)
> Allon can you help to solve multipath issue?

Maor, please take this one.

Comment 11 Maor 2014-11-10 15:51:07 UTC
(In reply to Allon Mureinik from comment #10)
> (In reply to Raul Laansoo from comment #7)
> > Created attachment 954930 [details]
> > dmesg.log
> > 
> > Yes, I know. This is because another problem I experience. Every time I
> > configure node with latest vdsm packages (either install them on bare CentOS
> > or use oVirt node image), after adding node to the engine, all my storage
> > devices go offline -- all multipath devices fail or disks got corrupted
> > after vdsm service restart. I have attached the output of dmesg of one such
> > case.
> 
> (In reply to Piotr Kliczewski from comment #8)
> > Allon can you help to solve multipath issue?
> 
> Maor, please take this one.

I prefer to open a new bug for that; this bug is only relevant to the JSON-RPC scenario.
Raul, can you please open a new bug describing the steps to reproduce?

Comment 13 Barak 2014-11-12 16:40:26 UTC
Changed the whiteboard to storage - the infra (json-rpc) related issues were already fixed.

Comment 14 Raul Laansoo 2014-11-18 07:00:28 UTC
I have now fixed the problem with multipath, and attaching the storage domain works. I see only one error:

2014-11-18 08:51:25,711 ERROR [org.ovirt.engine.core.bll.GetUnregisteredDisksQuery] (ajp--127.0.0.1-8702-1) [48368e54] Could not get populated disk, reason: null

Is it at all possible to recover VMs from a v3.4 storage domain imported into a 3.5 data center, or is this only possible with v3.5 storage domains?

Comment 15 Maor 2014-11-18 07:31:28 UTC
(In reply to Raul Laansoo from comment #14)
> I have now fixed the problem with multipath and attaching storage domain
> works. I see only one error:
> 
> 2014-11-18 08:51:25,711 ERROR [org.ovirt.engine.core.bll.GetUnregisteredDisksQuery] (ajp--127.0.0.1-8702-1) [48368e54] Could not get populated disk, reason: null

Great!
This is a redundant warning; you can ignore it, as it should not influence the process.
There is already an open bug on this: https://bugzilla.redhat.com/1136808
and also a patch for fixing it: http://gerrit.ovirt.org/#/c/33154/

> 
> Is it at all possible to recover VMs from v3.4 storage domain imported into
> 3.5 datacenter or is this only possible with v3.5 storage domains?

You can only recover Storage Domains which contain an OVF_STORE disk; currently this is only supported for 3.5 Data Centers.

I've changed the wiki to indicate this:
http://www.ovirt.org/Features/ImportStorageDomain#General_Functionality

Comment 16 Maor 2014-11-18 07:33:40 UTC
Since it seems that attaching the storage domain works with JSON-RPC and the issue does not reproduce, I'm closing the bug.

Please reopen if you see it reproduce again.