Bug 1037439 - [engine-backend] Engine copyImage request to vdsm is transmitted with sdUUID value of an inactive SD
Summary: [engine-backend] Engine copyImage request to vdsm is transmitted with sdUUID ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.4.0
Assignee: Liron Aravot
QA Contact: Ori Gofen
URL:
Whiteboard: storage
Depends On: 1084789
Blocks: 1053100
Reported: 2013-12-03 07:36 UTC by Elad
Modified: 2016-05-26 01:48 UTC (History)

Fixed In Version: org.ovirt.engine-root-3.4.0-16
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-12 14:06:17 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
amureini: Triaged+


Attachments
logs and screenshot (814.98 KB, application/x-gzip)
2013-12-03 07:36 UTC, Elad


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 26262 0 master MERGED core: handle 'source' domains when adding a vm from template 2020-05-06 18:40:54 UTC
oVirt gerrit 27111 0 ovirt-engine-3.4 MERGED core: handle 'source' domains when adding a vm from template 2020-05-06 18:40:54 UTC

Description Elad 2013-12-03 07:36:47 UTC
Created attachment 831931 [details]
logs and screenshot

Description of problem:
I tried to create a VM from a template that has one disk residing on two storage domains. One of the domains is in maintenance and the other is active.
The engine's copyImage request to vdsm was sent with the sdUUID of the inactive domain.


Version-Release number of selected component (if applicable):
rhevm-3.3.0-0.37.beta1.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a template and copy its disk to another storage domain
2. Put the domain that holds the original template disk into maintenance
3. Create a VM from the template

Actual results:

The engine sends the sdUUID of the inactive domain as part of the copyImage command:

2013-12-02 17:48:16,915 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CopyImageVDSCommand] (ajp-/127.0.0.1:8702-7) [104393f0] -- copyImage parameters:
                sdUUID=81da9f74-8c9f-46da-abcf-09cb1a77e230
                spUUID=c9fcc2ba-c091-4e29-861e-db132ea6e4b8
                vmGUID=2971ae1f-ce1b-4309-9854-590f0e290b17
                srcImageGUID=d6516ce8-64f8-48d0-9674-a91f8d57e658
                srcVolUUID=3177ebfc-3f47-4419-9f70-c72ea83842de
                dstImageGUID=8349f1b5-30f6-4de6-89f1-52362780e17f
                dstVolUUID=7e1b76b4-88a3-41d0-87d2-2c5026e8c47d
                descr=
                dstSdUUID=cdcc8f69-a462-4532-93d6-68520a70d9ff


Domain 81da9f74-8c9f-46da-abcf-09cb1a77e230 is in maintenance (status 6 in the query output below):

[root@elad-is ~]# su - postgres -c "psql -U postgres engine -c  'select status,id from storage_domains;'"  | less -S

 status |                  id
--------+--------------------------------------
      0 | 39485c04-1786-4828-85b5-c16fc79b385f
      6 | 4390fcd2-b4fb-4c40-aa9b-ad864a786095
      0 | 2734a84d-47b4-4062-9e29-c53f223602dc
      6 | e1673e6e-c722-4c03-b0e9-cad1ea4dd49e
      3 | cdcc8f69-a462-4532-93d6-68520a70d9ff
      0 | d941cb4c-bb64-4e65-a7d0-b58939bfb190
        | 952fb609-ce68-4b37-95bd-9eb84efaf5fd
      3 | 3bbf19c4-3a0d-4a91-883a-9824245659ee
      0 | 87195d7a-f6de-4e3b-81d1-653e6f5485e9
      0 | 1be37dd3-2d0a-4f88-b362-7e488988ea35
      0 | 49830799-1362-462b-89b9-f980d4375d38
      3 | 228e9d0b-2ca3-4111-83c5-91d8443164fd
        | 71f8337f-bd06-42f6-844e-32a371d65663
      3 | 7233a711-98e8-4c3c-bcfa-44c4bcc4f6c6
      0 | 7233a711-98e8-4c3c-bcfa-44c4bcc4f6c6
        | 91924d83-1f46-4f36-9b79-42943f799ed7
      6 | 81da9f74-8c9f-46da-abcf-09cb1a77e230
(17 rows)

vdsm responds to the engine with error code 205:

2013-12-02 17:48:17,183 ERROR [org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand] (ajp-/127.0.0.1:8702-7) [104393f0] Command org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand throw Vdc Bll exception. With error message VdcBLLException: VolumeCreationError (Failed with error VolumeCreationError and code 205)


The operation fails with:

2013-12-02 17:48:17,208 ERROR [org.ovirt.engine.core.bll.AddVmFromTemplateCommand] (ajp-/127.0.0.1:8702-7) [104393f0] Command org.ovirt.engine.core.bll.AddVmFromTemplateCommand throw exception: javax.ejb.EJBTransactionRolledbackException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000460: Error checking for a transaction

Expected results:
Engine should choose to copy the image from a valid domain.

Additional info: logs and screenshot

Comment 1 Vered Volansky 2013-12-03 19:33:30 UTC
AddVmFromTemplateCommand.buildCreateCloneOfTemplateParameters() calls
params.setStorageDomainId(disk.getStorageIds().get(0)).
It needs to verify the storage domain status and possibly iterate further through the list.
It should also fail if there is no active domain.

Comment 2 Allon Mureinik 2013-12-04 06:25:00 UTC
Both canDoAction and execute should go over all the SDs the disk is present on and select the first(?) active one.
If there is none, CDA should fail.
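
A minimal sketch of the selection described above, not the merged fix. The class name, the DomainStatus enum and the status map are hypothetical stand-ins for whatever the engine actually uses; only the idea of walking the disk's storage IDs and taking the first active one comes from this bug. The two UUIDs are the sdUUID and dstSdUUID from the log in the description, reused here purely as sample values.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

public class ActiveSourceDomainSelector {

    /** Hypothetical stand-in for the engine's storage domain status type. */
    public enum DomainStatus { ACTIVE, MAINTENANCE, UNKNOWN }

    /**
     * Returns the first domain in diskStorageIds whose status is ACTIVE,
     * or an empty Optional if the disk has no copy on an active domain.
     * Domains missing from statusById are treated as not active.
     */
    public static Optional<UUID> firstActiveDomain(List<UUID> diskStorageIds,
                                                   Map<UUID, DomainStatus> statusById) {
        return diskStorageIds.stream()
                .filter(id -> statusById.get(id) == DomainStatus.ACTIVE)
                .findFirst();
    }

    public static void main(String[] args) {
        // Sample values only: IDs borrowed from the copyImage log in the description.
        UUID inMaintenance = UUID.fromString("81da9f74-8c9f-46da-abcf-09cb1a77e230");
        UUID active = UUID.fromString("cdcc8f69-a462-4532-93d6-68520a70d9ff");
        Map<UUID, DomainStatus> statusById = Map.of(
                inMaintenance, DomainStatus.MAINTENANCE,
                active, DomainStatus.ACTIVE);

        Optional<UUID> source = firstActiveDomain(List.of(inMaintenance, active), statusById);
        // An empty result is the case where validation (CDA) should fail.
        System.out.println(source.map(UUID::toString).orElse("no active source domain"));
    }
}
```

Returning an Optional keeps the "no active domain" case explicit, so the same helper could back both the validation step and the execution step.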

As a second step, we can consider smarter logic here, like making sure all disks are copied from the same domain (fewer points of failure), or from different domains (poor man's striping).
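
The "same domain for all disks" variant could look roughly like the sketch below. This is an illustration under assumptions, not engine code: the class and method names are hypothetical, domain IDs are plain strings, and the set of active domains is assumed to be supplied by the caller.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class CommonSourceDomainSelector {

    /**
     * Returns one active domain that holds a copy of every disk, if such a
     * domain exists. Candidates are tried in the order of the first disk's list.
     */
    public static Optional<String> commonActiveDomain(List<List<String>> storageIdsPerDisk,
                                                      Set<String> activeDomainIds) {
        if (storageIdsPerDisk.isEmpty()) {
            return Optional.empty();
        }
        // Start from the first disk's domains, keeping their original order.
        Set<String> candidates = new LinkedHashSet<>(storageIdsPerDisk.get(0));
        // Drop domains that are not active.
        candidates.retainAll(activeDomainIds);
        // Drop domains that do not hold every other disk.
        for (List<String> ids : storageIdsPerDisk) {
            candidates.retainAll(ids);
        }
        return candidates.stream().findFirst();
    }
}
```

If no common domain exists, a caller could fall back to per-disk selection of the first active domain.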

Comment 3 Ayal Baron 2014-02-16 09:58:24 UTC
Tal, any update on this issue?

Comment 4 Tal Nisan 2014-02-16 10:24:30 UTC
Not yet, will look into it this week

Comment 5 Liron Aravot 2014-03-27 09:21:00 UTC
Seems like the issue is virtish: in the related add VM (from template) flows/hierarchy, there are checks/initializations that should be added in the right places in that hierarchy, in cooperation with the virt feature/supported resource allocation selections.

Adding needinfo? on Michal before moving it.

Comment 6 Michal Skrivanek 2014-03-27 13:55:28 UTC
Anything related to disks is still "storage", even in create from template flows. Resource allocation is SLA :)
If you already have a fix, please add a virt maintainer to the review; if not, please check with Omer. AFAIK it should not be related to resource allocation, but I may be wrong.

Comment 7 Ori Gofen 2014-05-12 13:53:35 UTC
blocked due to https://bugzilla.redhat.com/show_bug.cgi?id=1084789

Comment 8 Allon Mureinik 2014-05-14 13:47:21 UTC
(In reply to Ori from comment #7)
> blocked due to https://bugzilla.redhat.com/show_bug.cgi?id=1084789

Please use the "Depends On" field for this kind of update.
BTW, bug 1084789 is also ON_QA, so this one is probably up for verification too.

Comment 9 Ori Gofen 2014-05-18 08:31:42 UTC
Verified on av9.1

Comment 10 Itamar Heim 2014-06-12 14:06:17 UTC
Closing as part of 3.4.0

