Created attachment 956342 [details]
logs from Jenkins, includes logs from host and from engine

Description of problem:
One of our Jenkins jobs fails with:

04:49:22 2014-11-08 04:49:23,240 - MainThread - api_utils - ERROR - Failed to delete element:
04:49:22 Status: 400
04:49:22 Reason: Bad Request
04:49:22 Detail: [Cannot remove VM: Storage Domain cannot be accessed.
04:49:22 -Please check that at least one Host is operational and Data Center state is up.]
04:49:22
04:49:22 2014-11-08 04:49:23,242 - MainThread - vms - ERROR - Response code is not valid, expected is: [200, 202, 204], actual is: 400

In engine.log we see:

2014-11-08 04:49:14,900 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (ajp-/127.0.0.1:8702-6) Operation Failed: [Cannot attach Storage. There is no active Host in the Data Center.]
2014-11-08 04:49:17,034 ERROR [org.ovirt.engine.core.bll.AddDiskCommand] (ajp-/127.0.0.1:8702-2) [disks_create_758c5dcd-fc85-4371] Error during CanDoActionFailure.: java.lang.NullPointerException
	at org.ovirt.engine.core.bll.AddDiskCommand.setAndValidateDiskProfiles(AddDiskCommand.java:601) [bll.jar:]
	at org.ovirt.engine.core.bll.AddDiskCommand.canDoAction(AddDiskCommand.java:121) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.internalCanDoAction(CommandBase.java:744) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:338) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:430) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:411) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:369) [bll.jar:]
	at sun.reflect.GeneratedMethodAccessor237.invoke(Unknown Source) [:1.7.0_71]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_71]
	at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_71]

Version-Release number of selected component (if applicable):
rhev 3.5 vt9

How reproducible:
Always
Failing job: http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.5/job/3.5-storage_read_only_disks-nfs/9/consoleFull

Steps to Reproduce:
Not sure about the exact steps; according to engine.log and the console output from Jenkins, the failure seems to happen while adding a disk: AddVmFromScratchCommand. The job is supposed to test the RO disk feature.

Actual results:
The failure causes the automation job to fail.

Additional info:
Logs from Jenkins, includes logs from host and from engine.
I hit this same error on another automated test, run on 3.4 (av13): http://jenkins.qa.lab.tlv.redhat.com:8080/view/Compute/view/3.4-git/view/Virt/job/3.4-git-compute-virt-reg_vms-nfs/60/consoleFull It failed VM removal.
Also hit this same problem on another 3.4 test (av13): http://jenkins.qa.lab.tlv.redhat.com:8080/view/Compute/view/3.4-git/view/Virt/job/3.4-git-compute-virt-templates-nfs/71/consoleFull (see "18:16:35 Detail: [Cannot remove VM: Storage Domain cannot be accessed."), which failed VM removal. On the next run, however, the test passed: http://jenkins.qa.lab.tlv.redhat.com:8080/view/Compute/view/3.4-git/view/Virt/job/3.4-git-compute-virt-templates-nfs/72/
setAndValidateDiskProfiles is an SLA flow; moving to sla so the subject matter experts can investigate.
Vered, the NPE in disk profiles was caused by no SD being provided; this was solved in bug 1168525. Ilanit mentioned in comment 1 and comment 2 that the same issue also occurs on 3.4. I can close this bug as a duplicate, but even once the NPE is cleared, I sense there's a storage issue or a problem with the test. What do you say?
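For context, a minimal Python sketch of the kind of guard that prevents this class of NPE (all names here are hypothetical; the actual engine fix in bug 1168525 is in the Java bll code, not this sketch):

```python
def set_and_validate_disk_profiles(disk, storage_domain_id):
    """Validate a disk's profile against its storage domain.

    Sketch only: mirrors the failure mode where a missing storage
    domain id led to a NullPointerException deep inside validation,
    instead of a clear CanDoAction failure message.
    """
    if storage_domain_id is None:
        # Fail validation explicitly rather than dereferencing a null SD
        return (False, "Cannot add disk: no storage domain was provided.")
    # ... real profile validation against the storage domain would go here ...
    return (True, "")
```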
Hi Gilad, you can go ahead and close. This bug doesn't have enough info regarding the flow / reproduction before the NPE as it is. So even though you're probably right, we'll just have to get to it when we get a clear bug or stumble on it ourselves. Thanks for the heads up.
These are the steps that were executed as part of this automation job:
- Create a VM and install an OS
- Attach a second RO disk to the VM and hotplug it
- Kill the qemu process of the VM
- Start the VM again

Link to the test case in TCMS: https://tcms.engineering.redhat.com/case/334921/

The disk creation operation failed, as we can learn from the job console log:

04:49:01 DiskNotFound: Disk virtio_cow_True_disk was not found in vm's Global_vm_1 disk collection

Console log: http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.5/job/3.5-storage_read_only_disks-nfs/9/consoleFull
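The DiskNotFound failure above is the pattern where the test looks up a disk in the VM's disk collection before the add has completed. A hedged sketch of the polling pattern such automation typically needs (get_vm_disks is a hypothetical callable standing in for whatever the test framework uses to list a VM's disk aliases):

```python
import time

def wait_for_disk(get_vm_disks, disk_alias, timeout=60, interval=5,
                  sleep=time.sleep):
    """Poll a VM's disk collection until a disk with the given alias appears.

    Sketch only: returns True once the alias shows up, False on timeout,
    instead of failing immediately with DiskNotFound.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if disk_alias in get_vm_disks():
            return True
        sleep(interval)
    return False
```

Injecting `sleep` makes the helper easy to test without real delays.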
According to Elad's comment, moving back to storage.
The automation tries to deactivate an inactive disk:

04:41:22 2014-11-08 04:41:23,189 - MainThread - api_utils - ERROR - Failed to syncAction element:
04:41:22 Status: 409
04:41:22 Reason: Conflict
04:41:22 Detail: [Disk is already deactivated.]
04:41:22

In any event, this has nothing to do with the NPE that was originally reported. Elad - if there's a real, consistent failure in the automation, please open a new bug and supply the logs. Closing this one as a dup, as Gilad suggested.

*** This bug has been marked as a duplicate of bug 1168525 ***
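For the 409 above, a minimal sketch of making the deactivation step idempotent on the automation side (hypothetical names; `disk` is a plain dict with an `active` flag and `send_deactivate` stands in for the REST call):

```python
def deactivate_disk(disk, send_deactivate):
    """Deactivate a disk only if it is currently active.

    Sketch only: skips the REST call when the disk is already inactive,
    avoiding the 409 Conflict ("Disk is already deactivated.") seen here.
    """
    if not disk.get("active", False):
        return "already-inactive"   # nothing to do, no request sent
    send_deactivate(disk)
    disk["active"] = False
    return "deactivated"
```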