Bug 783083

Summary: NPE during SD removal
Product: [Retired] oVirt
Reporter: Jakub Libosvar <jlibosva>
Component: ovirt-engine-core
Assignee: Allon Mureinik <amureini>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Version: unspecified
CC: acathrow, hateya, iheim, lpeer, ykaul, yzaslavs
Target Milestone: ---
Keywords: EasyFix
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2012-08-09 08:05:30 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Attachments:
Backend log (flags: none)

Description Jakub Libosvar 2012-01-19 09:54:13 UTC
Created attachment 556233
Backend log

Description of problem:
After the data-center was removed, I wanted to remove the iSCSI storage domain. I clicked Remove and the storage domain didn't disappear; I clicked again and got an NPE in the log. This might be caused by a late refresh of the frontend.

Version-Release number of selected component (if applicable):
ovirt-engine-backend-3.0.0_0001-1.2.fc16.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create iSCSI data-center with storage domain
2. Change DC to maintenance and remove it
3. Remove SD from oVirt and then try to remove SD again and again
  
Actual results:
NPE

Expected results:
This situation should be prevented in the frontend, or an error should report that the SD no longer exists in oVirt.

Additional info:

2012-01-19 10:56:20,221 ERROR [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (http--0.0.0.0-8443-7) Error during CanDoActionFailure.: java.lang.NullPointerException
	at org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand.isLocalFs(RemoveStorageDomainCommand.java:148) [engine-bll.jar:]
	at org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand.canDoAction(RemoveStorageDomainCommand.java:78) [engine-bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.InternalCanDoAction(CommandBase.java:385) [engine-bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.ExecuteAction(CommandBase.java:205) [engine-bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:262) [engine-bll.jar:]
	at org.ovirt.engine.core.bll.Backend.RunAction(Backend.java:244) [engine-bll.jar:]
	at sun.reflect.GeneratedMethodAccessor139.invoke(Unknown Source) [:1.6.0_22]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [:1.6.0_22]
	at java.lang.reflect.Method.invoke(Method.java:616) [:1.6.0_22]
	at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72) [jboss-as-ee-7.1.0.Beta1b.jar:]

Comment 1 Haim 2012-01-30 15:18:32 UTC
Actually, the problem is that the backend tries to formatStorageDomain twice on removal (as we can see in the attached log). On the second attempt, vdsm fails to "format" the storage domain (as it no longer exists) and returns an unexpected exception, which is translated to a general storage action failure on the backend, which triggers the NPE.

Two patches should be sent here: one to fix the flow, and a second to fix the NPE. I wouldn't rule out the possibility that both frontend and backend send identical commands in this flow.

2012-01-30 15:58:21,329 INFO  [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (http--0.0.0.0-8443-5) Running command: RemoveStorageDomainCommand internal: false. Entities affected :  ID: a87e3dc4-c530-4398-b6cf-2df995bcb327 Type: Storage
2012-01-30 15:58:21,334 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (http--0.0.0.0-8443-5) START, ConnectStorageServerVDSCommand(vdsId = 7da8822e-4b37-11e1-a59c-001a4a013f0a, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = ISCSI, connectionList = [{ id: c95dff6a-978e-492e-9b33-28fc6b3664dc, connection: 10.34.63.204 };]), log id: 699f5c1d
2012-01-30 15:58:21,942 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (http--0.0.0.0-8443-5) FINISH, ConnectStorageServerVDSCommand, return: {c95dff6a-978e-492e-9b33-28fc6b3664dc=0}, log id: 699f5c1d
2012-01-30 15:58:21,946 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (http--0.0.0.0-8443-5) START, FormatStorageDomainVDSCommand(vdsId = 7da8822e-4b37-11e1-a59c-001a4a013f0a, storageDomainId=a87e3dc4-c530-4398-b6cf-2df995bcb327), log id: eef1716
2012-01-30 15:58:23,000 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (http--0.0.0.0-8443-5) FINISH, FormatStorageDomainVDSCommand, log id: eef1716
2012-01-30 15:58:23,007 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (http--0.0.0.0-8443-5) START, DisconnectStorageServerVDSCommand(vdsId = 7da8822e-4b37-11e1-a59c-001a4a013f0a, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = ISCSI, connectionList = [{ id: c95dff6a-978e-492e-9b33-28fc6b3664dc, connection: 10.34.63.204 };]), log id: 39fbebee
2012-01-30 15:58:24,550 INFO  [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (http--0.0.0.0-8443-3) Running command: RemoveStorageDomainCommand internal: false. Entities affected :  ID: a87e3dc4-c530-4398-b6cf-2df995bcb327 Type: Storage
2012-01-30 15:58:24,558 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (http--0.0.0.0-8443-3) START, ConnectStorageServerVDSCommand(vdsId = 7da8822e-4b37-11e1-a59c-001a4a013f0a, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = ISCSI, connectionList = [{ id: c95dff6a-978e-492e-9b33-28fc6b3664dc, connection: 10.34.63.204 };]), log id: b8ef351
2012-01-30 15:58:25,158 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (http--0.0.0.0-8443-3) FINISH, ConnectStorageServerVDSCommand, return: {c95dff6a-978e-492e-9b33-28fc6b3664dc=0}, log id: b8ef351
2012-01-30 15:58:25,162 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (http--0.0.0.0-8443-3) START, FormatStorageDomainVDSCommand(vdsId = 7da8822e-4b37-11e1-a59c-001a4a013f0a, storageDomainId=a87e3dc4-c530-4398-b6cf-2df995bcb327), log id: 67e7882b
2012-01-30 15:58:25,504 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http--0.0.0.0-8443-3) Failed in FormatStorageDomainVDS method
2012-01-30 15:58:25,504 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http--0.0.0.0-8443-3) Error code StorageDomainActionError and error message VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Error in storage domain action: ('sdUUID=a87e3dc4-c530-4398-b6cf-2df995bcb327',)
2012-01-30 15:58:25,505 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http--0.0.0.0-8443-3) Command org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         350
mMessage                      Error in storage domain action: ('sdUUID=a87e3dc4-c530-4398-b6cf-2df995bcb327',)
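
The log above shows two RemoveStorageDomainCommand runs for the same domain ID on different HTTP threads, roughly three seconds apart. One possible way to keep an overlapping duplicate request from reaching FormatStorageDomainVDSCommand is to track removals that are still in flight. This is only a minimal sketch of that idea, not the change that was merged for this bug; the class InFlightRemovalSketch and its methods tryBegin and end are hypothetical names and not part of ovirt-engine.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: not ovirt-engine code.
public class InFlightRemovalSketch {
    // IDs of storage domains whose removal is currently running.
    private static final Set<UUID> IN_FLIGHT = ConcurrentHashMap.newKeySet();

    // Returns true only for the first caller; Set.add is atomic here,
    // so two concurrent requests cannot both get true for the same ID.
    static boolean tryBegin(UUID domainId) {
        return IN_FLIGHT.add(domainId);
    }

    // Must be called when the removal finishes, successfully or not.
    static void end(UUID domainId) {
        IN_FLIGHT.remove(domainId);
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("a87e3dc4-c530-4398-b6cf-2df995bcb327");
        System.out.println(tryBegin(id)); // first request proceeds: true
        System.out.println(tryBegin(id)); // duplicate is rejected: false
        end(id);
        System.out.println(tryBegin(id)); // allowed again after end(): true
        end(id);
    }
}
```

A guard like this only covers requests that overlap in time. A request that arrives after the first removal has completed still needs a check that the domain exists.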

Comment 2 Jakub Libosvar 2012-01-30 15:28:51 UTC
(In reply to comment #1)
> Actually, the problem is that the backend tries to formatStorageDomain twice on removal
> (as we can see in the attached log)

Actually, I manually tried to remove the storage domain several times (while it still existed in the UI), and the backend correctly calls it just once. I just wanted to point out the NPE in these steps. So RemoveStorageDomainCommand was called on a non-existent object.
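
The NPE in the original trace is thrown from isLocalFs, called by canDoAction, which is consistent with the command looking up a domain that has already been removed. Below is a minimal sketch of the kind of guard that would turn this into a validation failure, assuming a null lookup result is the cause. The merged patch may differ, and every name here (RemoveDomainGuardSketch, DOMAINS, canRemove, the result strings) is a hypothetical stand-in rather than ovirt-engine code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: not ovirt-engine code.
public class RemoveDomainGuardSketch {
    enum StorageType { ISCSI, LOCALFS }

    static final class StorageDomain {
        final UUID id;
        final StorageType type;

        StorageDomain(UUID id, StorageType type) {
            this.id = id;
            this.type = type;
        }
    }

    // In-memory stand-in for the engine's storage domain lookup.
    static final Map<UUID, StorageDomain> DOMAINS = new HashMap<>();

    // Validation step: check existence before touching any domain field.
    static String canRemove(UUID domainId) {
        StorageDomain domain = DOMAINS.get(domainId);
        if (domain == null) {
            // Without this check, domain.type below would throw an NPE.
            return "STORAGE_DOMAIN_NOT_EXIST";
        }
        boolean localFs = domain.type == StorageType.LOCALFS;
        return localFs ? "OK_LOCALFS" : "OK";
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        DOMAINS.put(id, new StorageDomain(id, StorageType.ISCSI));
        System.out.println(canRemove(id)); // domain exists
        DOMAINS.remove(id);                // first removal succeeded
        System.out.println(canRemove(id)); // second request fails validation
    }
}
```

With a check like this, a repeated Remove click on a stale UI row would produce a "no longer exists" validation message rather than a stack trace.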

Comment 3 Allon Mureinik 2012-04-01 06:18:39 UTC
The problem was recreated in a unit-test (without UI).

A patch fixing it was submitted to gerrit, pending approval.

Comment 4 Yair Zaslavsky 2012-04-01 06:24:22 UTC
Can you please submit a link to gerrit here?

Comment 5 Allon Mureinik 2012-04-01 06:28:30 UTC
(In reply to comment #4)
> Can you please submit a link to gerrit here?

By all means:
http://gerrit.ovirt.org/#change,3221

Comment 6 Allon Mureinik 2012-04-01 12:13:08 UTC
Change was merged.

Comment 7 Allon Mureinik 2012-04-01 12:28:58 UTC
GitWeb for the relevant commit hash:
http://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=commit;h=1b57ae4414c239becfed6396ed984533dfb9f0c9

Comment 8 Itamar Heim 2012-08-09 08:05:30 UTC
Closing ON_QA bugs as oVirt 3.1 was released:
http://www.ovirt.org/get-ovirt/