Created attachment 831931
logs and screenshot

Description of problem:
I tried to create a VM from a template that has one disk existing on two storage domains. One of the domains is in maintenance and the other is active. The engine's copyImage request to VDSM was sent with the sdUUID of the inactive domain.

Version-Release number of selected component (if applicable):
rhevm-3.3.0-0.37.beta1.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Have a template and copy its disk to another storage domain.
2. Move the storage domain that holds the original template disk to maintenance.
3. Create a VM from the template.

Actual results:
The engine sends the sdUUID of the inactive domain as part of the copyImage command:

2013-12-02 17:48:16,915 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CopyImageVDSCommand] (ajp-/127.0.0.1:8702-7) [104393f0] -- copyImage parameters:
    sdUUID=81da9f74-8c9f-46da-abcf-09cb1a77e230
    spUUID=c9fcc2ba-c091-4e29-861e-db132ea6e4b8
    vmGUID=2971ae1f-ce1b-4309-9854-590f0e290b17
    srcImageGUID=d6516ce8-64f8-48d0-9674-a91f8d57e658
    srcVolUUID=3177ebfc-3f47-4419-9f70-c72ea83842de
    dstImageGUID=8349f1b5-30f6-4de6-89f1-52362780e17f
    dstVolUUID=7e1b76b4-88a3-41d0-87d2-2c5026e8c47d
    descr=
    dstSdUUID=cdcc8f69-a462-4532-93d6-68520a70d9ff

Domain 81da9f74-8c9f-46da-abcf-09cb1a77e230 is in maintenance:

[root@elad-is ~]# su - postgres -c "psql -U postgres engine -c 'select status,id from storage_domains;'" | less -S
 status |                  id
--------+--------------------------------------
      0 | 39485c04-1786-4828-85b5-c16fc79b385f
      6 | 4390fcd2-b4fb-4c40-aa9b-ad864a786095
      0 | 2734a84d-47b4-4062-9e29-c53f223602dc
      6 | e1673e6e-c722-4c03-b0e9-cad1ea4dd49e
      3 | cdcc8f69-a462-4532-93d6-68520a70d9ff
      0 | d941cb4c-bb64-4e65-a7d0-b58939bfb190
        | 952fb609-ce68-4b37-95bd-9eb84efaf5fd
      3 | 3bbf19c4-3a0d-4a91-883a-9824245659ee
      0 | 87195d7a-f6de-4e3b-81d1-653e6f5485e9
      0 | 1be37dd3-2d0a-4f88-b362-7e488988ea35
      0 | 49830799-1362-462b-89b9-f980d4375d38
      3 | 228e9d0b-2ca3-4111-83c5-91d8443164fd
        | 71f8337f-bd06-42f6-844e-32a371d65663
      3 | 7233a711-98e8-4c3c-bcfa-44c4bcc4f6c6
      0 | 7233a711-98e8-4c3c-bcfa-44c4bcc4f6c6
        | 91924d83-1f46-4f36-9b79-42943f799ed7
      6 | 81da9f74-8c9f-46da-abcf-09cb1a77e230
(17 rows)

VDSM responds to the engine with error code 205:

2013-12-02 17:48:17,183 ERROR [org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand] (ajp-/127.0.0.1:8702-7) [104393f0] Command org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand throw Vdc Bll exception. With error message VdcBLLException: VolumeCreationError (Failed with error VolumeCreationError and code 205)

The operation fails with:

2013-12-02 17:48:17,208 ERROR [org.ovirt.engine.core.bll.AddVmFromTemplateCommand] (ajp-/127.0.0.1:8702-7) [104393f0] Command org.ovirt.engine.core.bll.AddVmFromTemplateCommand throw exception: javax.ejb.EJBTransactionRolledbackException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000460: Error checking for a transaction

Expected results:
The engine should copy the image from a valid (active) domain.

Additional info:
logs and screenshot
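The problem is in AddVmFromTemplateCommand.buildCreateCloneOfTemplateParameters(), which calls params.setStorageDomainId(disk.getStorageIds().get(0)), i.e. it always uses the first storage domain the disk is on, regardless of its status. It needs to verify the storage domain status and, if that domain is not usable, iterate over the remaining ones. It should also fail if there is no active domain.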
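A minimal sketch of the check described above, used in place of `disk.getStorageIds().get(0)`. `DomainStatus`, `firstActiveDomain` and `sourceDomainFor` are hypothetical names, not the engine's API; the sketch assumes the statuses are available as a plain map from domain id to status, and that the disk's storage ids list the original (maintenance) domain first, as in this report.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

final class SourceDomainSelector {

    // Hypothetical subset of storage domain states; the engine's real status enum has more values.
    enum DomainStatus { ACTIVE, MAINTENANCE, INACTIVE }

    // Returns the first domain in storageIds whose status is ACTIVE, or Optional.empty() if none is.
    // Domains with no known status are treated as unusable.
    static Optional<UUID> firstActiveDomain(List<UUID> storageIds, Map<UUID, DomainStatus> statusById) {
        for (UUID id : storageIds) {
            if (statusById.get(id) == DomainStatus.ACTIVE) {
                return Optional.of(id);
            }
        }
        return Optional.empty();
    }

    // Hypothetical stand-in for the get(0) call: pick an active source domain or fail the command.
    static UUID sourceDomainFor(UUID diskId, List<UUID> storageIds, Map<UUID, DomainStatus> statusById) {
        return firstActiveDomain(storageIds, statusById)
                .orElseThrow(() -> new IllegalStateException(
                        "disk " + diskId + " has no copy on an active storage domain"));
    }

    public static void main(String[] args) {
        // Original template domain, now in maintenance (id taken from the copyImage log in this report).
        UUID original = UUID.fromString("81da9f74-8c9f-46da-abcf-09cb1a77e230");
        // Domain holding the template copy; random id for illustration only.
        UUID copy = UUID.randomUUID();
        Map<UUID, DomainStatus> status = Map.of(original, DomainStatus.MAINTENANCE, copy, DomainStatus.ACTIVE);

        UUID disk = UUID.randomUUID();
        // get(0) would return `original`; the selector skips it and returns `copy`.
        System.out.println(sourceDomainFor(disk, List.of(original, copy), status).equals(copy)); // true
    }
}
```

Returning an `Optional` keeps the "no active domain" case explicit, so the same helper can back both the validation (fail the command) and the parameter building (use the chosen id).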
Both canDoAction and execute should go over all the storage domains (SDs) the disk is present on and select the first(?) active one. If there is none, canDoAction (CDA) should fail. As a second step, we can consider smarter logic here, such as making sure all disks are copied from the same domain (fewer points of failure) or from different domains (poor man's striping).
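A rough sketch of the canDoAction-style check and the two second-step strategies mentioned above. All names here (`CopySourcePlanner`, `DomainStatus`, the `domainsByDisk` map) are hypothetical, not engine code; the sketch assumes the engine can supply, per disk, the ordered list of domains holding it and a status for each domain. "Striping" is approximated by picking the least-used active domain for each disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

final class CopySourcePlanner {

    // Hypothetical subset of storage domain states.
    enum DomainStatus { ACTIVE, MAINTENANCE, INACTIVE }

    // Active domains for one disk, keeping the order in which the disk lists them.
    static List<UUID> activeDomains(List<UUID> storageIds, Map<UUID, DomainStatus> statusById) {
        List<UUID> result = new ArrayList<>();
        for (UUID id : storageIds) {
            if (statusById.get(id) == DomainStatus.ACTIVE) {
                result.add(id);
            }
        }
        return result;
    }

    // canDoAction-style check: disks with no active copy; the command should fail if this is non-empty.
    static List<UUID> disksWithoutActiveDomain(Map<UUID, List<UUID>> domainsByDisk, Map<UUID, DomainStatus> statusById) {
        List<UUID> missing = new ArrayList<>();
        for (Map.Entry<UUID, List<UUID>> e : domainsByDisk.entrySet()) {
            if (activeDomains(e.getValue(), statusById).isEmpty()) {
                missing.add(e.getKey());
            }
        }
        return missing;
    }

    // Strategy A (fewer points of failure): if one active domain holds every disk, copy all disks
    // from it; otherwise fall back to the first active domain of each disk.
    static Map<UUID, UUID> sameDomainPlan(Map<UUID, List<UUID>> domainsByDisk, Map<UUID, DomainStatus> statusById) {
        Set<UUID> common = null;
        for (List<UUID> ids : domainsByDisk.values()) {
            Set<UUID> active = new LinkedHashSet<>(activeDomains(ids, statusById));
            if (common == null) common = active; else common.retainAll(active);
        }
        Map<UUID, UUID> plan = new LinkedHashMap<>();
        for (Map.Entry<UUID, List<UUID>> e : domainsByDisk.entrySet()) {
            List<UUID> active = activeDomains(e.getValue(), statusById);
            if (active.isEmpty()) throw new IllegalStateException("disk " + e.getKey() + " has no active domain");
            plan.put(e.getKey(), (common != null && !common.isEmpty()) ? common.iterator().next() : active.get(0));
        }
        return plan;
    }

    // Strategy B ("poor man's striping"): for each disk, pick the active domain chosen least often
    // so far; ties go to the domain the disk lists first.
    static Map<UUID, UUID> stripedPlan(Map<UUID, List<UUID>> domainsByDisk, Map<UUID, DomainStatus> statusById) {
        Map<UUID, Integer> load = new HashMap<>();
        Map<UUID, UUID> plan = new LinkedHashMap<>();
        for (Map.Entry<UUID, List<UUID>> e : domainsByDisk.entrySet()) {
            UUID best = null;
            for (UUID id : activeDomains(e.getValue(), statusById)) {
                if (best == null || load.getOrDefault(id, 0) < load.getOrDefault(best, 0)) {
                    best = id;
                }
            }
            if (best == null) throw new IllegalStateException("disk " + e.getKey() + " has no active domain");
            load.merge(best, 1, Integer::sum);
            plan.put(e.getKey(), best);
        }
        return plan;
    }

    public static void main(String[] args) {
        UUID sdA = UUID.randomUUID(), sdB = UUID.randomUUID(), sdC = UUID.randomUUID();
        Map<UUID, DomainStatus> status = Map.of(sdA, DomainStatus.MAINTENANCE, sdB, DomainStatus.ACTIVE, sdC, DomainStatus.ACTIVE);
        Map<UUID, List<UUID>> disks = new LinkedHashMap<>();
        disks.put(UUID.randomUUID(), List.of(sdA, sdB, sdC));
        disks.put(UUID.randomUUID(), List.of(sdA, sdB, sdC));
        System.out.println("missing: " + disksWithoutActiveDomain(disks, status));
        System.out.println("same domain: " + sameDomainPlan(disks, status));
        System.out.println("striped: " + stripedPlan(disks, status));
    }
}
```

In the demo both disks live on the same three domains: strategy A copies both from the first shared active domain, while strategy B sends the second disk to the other active domain. The trade-off is the one named in the comment: A depends on fewer domains, B spreads the read load but depends on more of them.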
Tal, any update on this issue?
Not yet, will look into it this week
It seems the issue is virt-related: in the related add-VM (from template) flows/hierarchy, there are checks/initializations that should be added in the right places in that hierarchy, in cooperation with the virt feature/supported resource allocation selections. Adding needinfo? on Michal before moving it.
Anything related to disks is still "storage", even in create-from-template flows. Resource allocation is SLA :) If you already have a fix, please add a virt maintainer to the review; if not, please check with Omer. AFAIK it should not be related to resource allocation, but I may be wrong.
blocked due to https://bugzilla.redhat.com/show_bug.cgi?id=1084789
(In reply to Ori from comment #7)
> blocked due to https://bugzilla.redhat.com/show_bug.cgi?id=1084789

Please use the "Depends On" field for these kinds of updates. BTW, bug 1084789 is also ON_QA, so this one is probably up for verification too.
verified on av9.1
Closing as part of 3.4.0