Created attachment 956342 [details]
logs from Jenkins, includes logs from host and from engine

Description of problem:
One of our Jenkins jobs fails with:

04:49:22 2014-11-08 04:49:23,240 - MainThread - api_utils - ERROR - Failed to delete element:
04:49:22 Status: 400
04:49:22 Reason: Bad Request
04:49:22 Detail: [Cannot remove VM: Storage Domain cannot be accessed.
04:49:22 -Please check that at least one Host is operational and Data Center state is up.]
04:49:22
04:49:22 2014-11-08 04:49:23,242 - MainThread - vms - ERROR - Response code is not valid, expected is: [200, 202, 204], actual is: 400

In engine.log we see:

2014-11-08 04:49:14,900 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (ajp-/127.0.0.1:8702-6) Operation Failed: [Cannot attach Storage. There is no active Host in the Data Center.]
2014-11-08 04:49:17,034 ERROR [org.ovirt.engine.core.bll.AddDiskCommand] (ajp-/127.0.0.1:8702-2) [disks_create_758c5dcd-fc85-4371] Error during CanDoActionFailure.: java.lang.NullPointerException
	at org.ovirt.engine.core.bll.AddDiskCommand.setAndValidateDiskProfiles(AddDiskCommand.java:601) [bll.jar:]
	at org.ovirt.engine.core.bll.AddDiskCommand.canDoAction(AddDiskCommand.java:121) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.internalCanDoAction(CommandBase.java:744) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:338) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:430) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:411) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:369) [bll.jar:]
	at sun.reflect.GeneratedMethodAccessor237.invoke(Unknown Source) [:1.7.0_71]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_71]
	at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_71]

Version-Release number of selected component (if applicable):
rhev 3.5 vt9

How reproducible:
Always
Failing job: http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.5/job/3.5-storage_read_only_disks-nfs/9/consoleFull

Steps to Reproduce:
Not sure about the exact steps; according to engine.log and the console output from Jenkins, the failure seems to happen while adding a disk: AddVmFromScratchCommand. The job is supposed to test the RO disk feature.

Actual results:
The failure causes the automation job to fail.

Additional info:
Logs from Jenkins, includes logs from host and from engine.
I hit this same error on another automated test, run on 3.4 (av13): http://jenkins.qa.lab.tlv.redhat.com:8080/view/Compute/view/3.4-git/view/Virt/job/3.4-git-compute-virt-reg_vms-nfs/60/consoleFull It failed VM removal.
Also hit this same problem on another 3.4 test (av13): http://jenkins.qa.lab.tlv.redhat.com:8080/view/Compute/view/3.4-git/view/Virt/job/3.4-git-compute-virt-templates-nfs/71/consoleFull (see "18:16:35 Detail: [Cannot remove VM: Storage Domain cannot be accessed."), which failed VM removal. On the next run, however, the test passed: http://jenkins.qa.lab.tlv.redhat.com:8080/view/Compute/view/3.4-git/view/Virt/job/3.4-git-compute-virt-templates-nfs/72/
setAndValidateDiskProfiles is an SLA flow; moving to sla so the subject matter experts can investigate.
Vered, the NPE in disk profiles was caused by no SD being provided; this was solved in bug 1168525. Ilanit mentioned in comment 1 and comment 2 that the same issue also occurs on 3.4. I can close this bug as a duplicate, but even once the NPE is cleared, I sense there's a storage issue or a problem with the test. What do you say?
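For context, a minimal Python sketch of the kind of guard that prevents this class of NPE (all names here are hypothetical; the actual engine fix in bug 1168525 is in the Java bll code, not this sketch):

```python
def set_and_validate_disk_profiles(disk, storage_domain_id):
    """Validate a disk's profile against its storage domain.

    Sketch only: mirrors the failure mode where a missing storage
    domain id led to a NullPointerException deep inside validation,
    instead of a clear CanDoAction failure message.
    """
    if storage_domain_id is None:
        # Fail validation explicitly rather than dereferencing a null SD
        return (False, "Cannot add disk: no storage domain was provided.")
    # ... real profile validation against the storage domain would go here ...
    return (True, "")
```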
Hi Gilad, you can go ahead and close. This bug doesn't have enough info regarding the flow / reproduction before the NPE as it is. So even though you're probably right, we'll just have to get to it when we get a clear bug or stumble on it ourselves. Thanks for the heads up.
These are the steps that were executed as part of this automation job:
- Create a VM and install an OS
- Attach a second RO disk to the VM and hotplug it
- Kill the qemu process of the VM
- Start the VM again

Link to the test case in TCMS: https://tcms.engineering.redhat.com/case/334921/

The disk creation operation failed, as we can learn from the job console log:

04:49:01 DiskNotFound: Disk virtio_cow_True_disk was not found in vm's Global_vm_1 disk collection

Console log: http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.5/job/3.5-storage_read_only_disks-nfs/9/consoleFull
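The DiskNotFound failure above is the pattern where the test looks up a disk in the VM's disk collection before the add has completed. A hedged sketch of the polling pattern such automation typically needs (get_vm_disks is a hypothetical callable standing in for whatever the test framework uses to list a VM's disk aliases):

```python
import time

def wait_for_disk(get_vm_disks, disk_alias, timeout=60, interval=5,
                  sleep=time.sleep):
    """Poll a VM's disk collection until a disk with the given alias appears.

    Sketch only: returns True once the alias shows up, False on timeout,
    instead of failing immediately with DiskNotFound.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if disk_alias in get_vm_disks():
            return True
        sleep(interval)
    return False
```

Injecting `sleep` makes the helper easy to test without real delays.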
According to Elad's comment, moving back to storage.
The automation tries to deactivate an inactive disk:

04:41:22 2014-11-08 04:41:23,189 - MainThread - api_utils - ERROR - Failed to syncAction element:
04:41:22 Status: 409
04:41:22 Reason: Conflict
04:41:22 Detail: [Disk is already deactivated.]
04:41:22

In any event, this has nothing to do with the NPE that was originally reported. Elad - if there's a real, consistent failure in the automation, please open a new bug and supply the logs. Closing this one as a dup, as Gilad suggested.

*** This bug has been marked as a duplicate of bug 1168525 ***
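For the 409 above, a minimal sketch of making the deactivation step idempotent on the automation side (hypothetical names; `disk` is a plain dict with an `active` flag and `send_deactivate` stands in for the REST call):

```python
def deactivate_disk(disk, send_deactivate):
    """Deactivate a disk only if it is currently active.

    Sketch only: skips the REST call when the disk is already inactive,
    avoiding the 409 Conflict ("Disk is already deactivated.") seen here.
    """
    if not disk.get("active", False):
        return "already-inactive"   # nothing to do, no request sent
    send_deactivate(disk)
    disk["active"] = False
    return "deactivated"
```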