Bug 1162756

Summary: [engine-backend] [automation] AddDiskCommand throws a NullPointerException
Product: Red Hat Enterprise Virtualization Manager
Reporter: Elad <ebenahar>
Component: ovirt-engine
Assignee: Nobody <nobody>
Status: CLOSED DUPLICATE
QA Contact:
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 3.5.0
CC: amureini, dfediuck, ecohen, gchaplik, gklein, iheim, istein, lpeer, lsurette, mavital, rbalakri, Rhev-m-bugs, tnisan, vered, yeylon
Target Milestone: ---
Keywords: Triaged
Target Release: 3.5.0
Hardware: x86_64
OS: Unspecified
Whiteboard: sla
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-12-09 12:33:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: logs from Jenkins, includes logs from host and from engine
Flags: none

Description Elad 2014-11-11 16:06:26 UTC
Created attachment 956342
logs from Jenkins, includes logs from host and from engine

Description of problem:
One of our Jenkins jobs fails with:

04:49:22 2014-11-08 04:49:23,240 - MainThread - api_utils - ERROR - Failed to delete element:
04:49:22 	Status: 400
04:49:22 	Reason: Bad Request
04:49:22 	Detail: [Cannot remove VM: Storage Domain cannot be accessed.
04:49:22 -Please check that at least one Host is operational and Data Center state is up.]
04:49:22 
04:49:22 2014-11-08 04:49:23,242 - MainThread - vms - ERROR - Response code is not valid, expected is: [200, 202, 204], actual is: 400 

In engine.log we see:

2014-11-08 04:49:14,900 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (ajp-/127.0.0.1:8702-6) Operation Failed: [Cannot attach Storage. There is no active Host in the Data Center.]
2014-11-08 04:49:17,034 ERROR [org.ovirt.engine.core.bll.AddDiskCommand] (ajp-/127.0.0.1:8702-2) [disks_create_758c5dcd-fc85-4371] Error during CanDoActionFailure.: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.AddDiskCommand.setAndValidateDiskProfiles(AddDiskCommand.java:601) [bll.jar:]
        at org.ovirt.engine.core.bll.AddDiskCommand.canDoAction(AddDiskCommand.java:121) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.internalCanDoAction(CommandBase.java:744) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:338) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:430) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:411) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:369) [bll.jar:]
        at sun.reflect.GeneratedMethodAccessor237.invoke(Unknown Source) [:1.7.0_71]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_71]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_71]

 

Version-Release number of selected component (if applicable):
rhev 3.5 vt9

How reproducible:
Always. This job fails every time:
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.5/job/3.5-storage_read_only_disks-nfs/9/consoleFull

Steps to Reproduce:
Not sure about the exact steps. According to engine.log and the Jenkins console output, the failure seems to happen while adding a disk:

AddVmFromScratchCommand

The job is supposed to test the RO (read-only) disk feature.

Actual results:
The failure causes the automation job to fail.


Additional info:
Logs from Jenkins (attached), including logs from the host and from the engine.

Comment 1 Ilanit Stein 2014-11-11 16:43:52 UTC
I hit the same error in another automated test, run on 3.4 (av13):
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Compute/view/3.4-git/view/Virt/job/3.4-git-compute-virt-reg_vms-nfs/60/consoleFull

It caused VM removal to fail.

Comment 2 Ilanit Stein 2014-11-11 17:11:53 UTC
I also had the same problem in another 3.4 test (av13): http://jenkins.qa.lab.tlv.redhat.com:8080/view/Compute/view/3.4-git/view/Virt/job/3.4-git-compute-virt-templates-nfs/71/consoleFull
(see 18:16:35 	Detail: [Cannot remove VM: Storage Domain cannot be accessed.)
which caused VM removal to fail.
However, on the next run the test passed:
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Compute/view/3.4-git/view/Virt/job/3.4-git-compute-virt-templates-nfs/72/

Comment 3 Vered Volansky 2014-11-19 12:02:07 UTC
setAndValidateDiskProfiles is an SLA flow; moving to SLA so the subject matter experts can investigate.

Comment 4 Gilad Chaplik 2014-12-08 16:15:45 UTC
Vered, the NPE in disk profiles was caused because no SD (storage domain) was provided; this was solved in bug 1168525. Ilanit mentioned in comment 1 and comment 2 that the same issue also occurs in 3.4.

I can close the bug as a duplicate, but once you clear the NPE, I sense there's a storage issue or a problem with the test. What do you say?

Comment 5 Vered Volansky 2014-12-09 05:21:04 UTC
Hi Gilad, you can go ahead and close it. As it stands, this bug doesn't have enough info about the flow / reproduction leading up to the NPE. So even though you're probably right, we'll have to get to it when we have a clear bug or stumble on it ourselves. Thanks for the heads up.

Comment 6 Elad 2014-12-09 07:16:14 UTC
These are the steps that were executed as part of this automation job:

- Create a VM and install an OS
- Attach a second RO disk to the VM and hotplug it
- Kill qemu process of the VM
- Start the VM again

link to the test case in TCMS:
https://tcms.engineering.redhat.com/case/334921/

The disk creation operation failed, as we can see from the job console log:

04:49:01 DiskNotFound: Disk virtio_cow_True_disk was not found in vm's Global_vm_1 disk collection

Console log:
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.5/job/3.5-storage_read_only_disks-nfs/9/consoleFull

Comment 7 Gilad Chaplik 2014-12-09 12:10:06 UTC
According to Elad's comment, moving back to storage.

Comment 8 Allon Mureinik 2014-12-09 12:33:44 UTC
The automation tries to deactivate an inactive disk:

04:41:22 2014-11-08 04:41:23,189 - MainThread - api_utils - ERROR - Failed to syncAction element:
04:41:22 	Status: 409
04:41:22 	Reason: Conflict
04:41:22 	Detail: [Disk is already deactivated.]
04:41:22 

In any event, this has nothing to do with the NPE that was originally reported.
Elad, if there's a real, consistent failure in the automation, please open a new bug and supply the logs.

Closing this one as a dup, as Gilad suggested.

*** This bug has been marked as a duplicate of bug 1168525 ***