Bug 1460195

Summary: [engine-backend] ExtendSANStorageDomain fails with NullPointerException in case the host is unreachable
Product: [oVirt] ovirt-engine
Reporter: Elad <ebenahar>
Component: BLL.Storage
Assignee: Allon Mureinik <amureini>
Status: CLOSED CURRENTRELEASE
QA Contact: Elad <ebenahar>
Severity: medium
Priority: unspecified
Version: 4.2.0
CC: bugs, ebenahar
Target Milestone: ovirt-4.2.0
Keywords: Regression
Flags: rule-engine: ovirt-4.2+, rule-engine: blocker+
Hardware: x86_64
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2017-12-20 11:44:20 UTC
Type: Bug
oVirt Team: Storage
Attachments:
engine.log and vdsm.log (flags: none)

Description Elad 2017-06-09 10:59:36 UTC
Created attachment 1286374
engine.log and vdsm.log

Description of problem:
Tried to extend a block (FC) storage domain from the Webadmin while the host selected to perform the extension was unreachable from the engine. The operation failed with a NullPointerException thrown from ExtendSANStorageDomainCommand.validate().


Version-Release number of selected component (if applicable):
ovirt-engine-4.2.0-0.0.master.20170605153216.gita063574.el7.centos.noarch
vdsm-4.20.0-999.gitc3e1239.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Try to extend a block-based storage domain while the host that is supposed to perform the operation is unreachable from the engine. This can be achieved by rebooting the host and, immediately afterwards, extending the domain using this host.


Actual results:

2017-06-09 13:37:17,577+03 INFO  [org.ovirt.engine.core.bll.storage.connection.ConnectAllHostsToLunCommand] (default task-10) [65454d46] Running command: ConnectAllHostsToLunCommand internal: true. Entities affected :  ID: d9fed7e3-72ad-4c59-83cb-6108389fb853 Type: Storage


2017-06-09 13:37:17,590+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-10) [65454d46] START, GetDeviceListVDSCommand(HostName = host_mixed_1, GetDeviceListVDSCommandParameters:{runAsync='true', hostId='d5369c8d-4b9b-43ea-87f5-6634313079df', storageType='FCP', checkStatus='false', lunIds='null'}), log id: 4293a40d



2017-06-09 13:37:18,264+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-10) [65454d46] EVENT_ID: USER_CONNECT_HOSTS_TO_LUN_FAILED(988), Failed to connect Host host_mixed_1 to device. (User: admin@internal-authz)
2017-06-09 13:37:18,274+03 INFO  [org.ovirt.engine.core.bll.storage.connection.ConnectAllHostsToLunCommand] (default task-10) [65454d46] Lock freed to object 'EngineLock:{exclusiveLocks='[d9fed7e3-72ad-4c59-83cb-6108389fb853=<STORAGE, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'


2017-06-09 13:37:18,274+03 ERROR [org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand] (default task-10) [65454d46] Error during ValidateFailure.: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand.validate(ExtendSANStorageDomainCommand.java:119) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.internalValidate(CommandBase.java:849) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:403) [bll.jar:]
        at org.ovirt.engine.core.bll.executor.DefaultBackendActionExecutor.execute(DefaultBackendActionExecutor.java:13) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:495) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:477) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:430) [bll.jar:]
        at sun.reflect.GeneratedMethodAccessor349.invoke(Unknown Source) [:1.8.0_131]




Webadmin:
Cannot extend Storage Domain. Storage device ${lun} is unreachable from ${hostName}.
General command validation failure.
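The dialog shows the literal placeholders ${lun} and ${hostName}, which suggests the message key reached the UI without its accompanying variables. The engine logs in this report carry such variables as "$name value" entries (for example "$hostName host_mixed_2"). The Java sketch below assumes that convention and shows how a resolver that leaves unknown placeholders untouched yields exactly this dialog text; MessageResolverSketch, variables() and resolve() are hypothetical names, not the oVirt frontend code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from ovirt-engine.
public class MessageResolverSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Collects "$name value" reason entries into a name -> value map. */
    static Map<String, String> variables(List<String> reasons) {
        Map<String, String> vars = new HashMap<>();
        for (String reason : reasons) {
            if (reason.startsWith("$")) {
                int space = reason.indexOf(' ');
                String name = space < 0 ? reason.substring(1) : reason.substring(1, space);
                String value = space < 0 ? "" : reason.substring(space + 1);
                vars.put(name, value);
            }
        }
        return vars;
    }

    /** Replaces each ${name}; placeholders without a value are left as-is. */
    static String resolve(String template, Map<String, String> vars) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "Storage device ${lun} is unreachable from ${hostName}.";
        // No variables supplied: the placeholders stay literal, as in the dialog above.
        System.out.println(resolve(template, variables(List.of())));
        // Variables as they appear in the log line of Comment 6.
        System.out.println(resolve(template, variables(List.of(
                "$hostName host_mixed_3", "$lun 3514f0c5a516008d7"))));
    }
}
```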



Expected results:
If the host is unreachable from the engine, extending the storage domain should fail gracefully with a proper validation error instead of a NullPointerException.
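A minimal Java sketch of the expected behavior, assuming a hypothetical ConnectResult type whose failing host or LUN may be null when the host never answered. It is not the actual ExtendSANStorageDomainCommand code or the posted patch; it only illustrates a validation that reports ERROR_CANNOT_EXTEND_CONNECTION_FAILED with $hostName and $lun entries (the reasons format seen in the 4.1 log in Comment 3) instead of dereferencing null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names do not come from ovirt-engine.
public class ExtendValidationSketch {

    // Hypothetical result of connecting all hosts to the new LUNs. On failure,
    // the failing host and LUN may be unknown (null).
    record ConnectResult(boolean succeeded, String failedHostName, String failedLunId) {}

    /** Returns validation reasons; an empty list means validation passed. */
    static List<String> validateExtend(ConnectResult result) {
        List<String> reasons = new ArrayList<>();
        if (result != null && result.succeeded()) {
            return reasons; // every host sees the LUNs, nothing to report
        }
        reasons.add("VAR__TYPE__STORAGE__DOMAIN");
        reasons.add("VAR__ACTION__EXTEND");
        reasons.add("ERROR_CANNOT_EXTEND_CONNECTION_FAILED");
        // Fall back to empty values rather than throwing a NullPointerException.
        String host = (result == null || result.failedHostName() == null) ? "" : result.failedHostName();
        String lun = (result == null || result.failedLunId() == null) ? "" : result.failedLunId();
        reasons.add(("$hostName " + host).trim());
        reasons.add(("$lun " + lun).trim());
        return reasons;
    }

    public static void main(String[] args) {
        // Unknown LUN: reasons end with "$hostName host_mixed_3" and a bare "$lun".
        System.out.println(validateExtend(new ConnectResult(false, "host_mixed_3", null)));
        // No result at all: still a validation failure, not an exception.
        System.out.println(validateExtend(null));
    }
}
```

Returning a list of reasons rather than throwing keeps the failure on the normal validation path, so the caller can log it as a WARN and show a message instead of reporting "General command validation failure."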

Additional info:
engine.log and vdsm.log are attached.

Comment 1 Allon Mureinik 2017-06-11 02:39:39 UTC
I believe this is a regression caused by 05ceb0dfd3bcbedaf24556d2718da6b797749786. Elad, can you confirm this does not happen in 4.1.z engines?

Comment 2 Allon Mureinik 2017-06-11 02:57:10 UTC
(In reply to Allon Mureinik from comment #1)
> I believe this is a regression caused by
> 05ceb0dfd3bcbedaf24556d2718da6b797749786. Elad, can you confirm this does
> not happen in 4.1.z engines?

After reviewing the code again, there definitely is a regression there. Patch posted and bug tentatively targeted to 4.2, pending Elad's reply to see if there's another issue in the 4.1.z branch too.

Comment 3 Elad 2017-06-11 08:02:51 UTC
Doesn't happen in 4.1.3.1-0.1

2017-06-11 10:57:47,832+03 WARN  [org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand] (default task-16) [754cf43b] Validation of action 'ExtendSANStorageDomain' failed for user admin@internal-authz. Reasons: VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__EXTEND,ERROR_CANNOT_EXTEND_CONNECTION_FAILED,$hostName host_mixed_2,$lun

Comment 4 Allon Mureinik 2017-06-11 08:09:51 UTC
(In reply to Elad from comment #3)
> Doesn't happen in 4.1.3.1-0.1
> 
> 2017-06-11 10:57:47,832+03 WARN 
> [org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand]
> (default task-16) [754cf43b] Validation of action 'ExtendSANStorageDomain'
> failed for user admin@internal-authz. Reasons:
> VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__EXTEND,
> ERROR_CANNOT_EXTEND_CONNECTION_FAILED,$hostName host_mixed_2,$lun

Thanks Elad. So this is indeed a regression caused by 05ceb0dfd3bcbedaf24556d2718da6b797749786, and the 4.2 targeting is indeed correct.
Patch is already submitted, should probably be merged soon.

Comment 5 Red Hat Bugzilla Rules Engine 2017-06-11 08:09:55 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 6 Elad 2017-07-19 08:47:53 UTC
When the host is unreachable from the engine, extending the storage domain now fails gracefully:

2017-07-19 11:45:18,553+03 WARN  [org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand] (default task-28) [6a4a9f9a] Validation of action 'ExtendSANStorageDomain' failed for user admin@internal-authz. Reasons: VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__EXTEND,ERROR_CANNOT_EXTEND_CONNECTION_FAILED,$hostName host_mixed_3,$lun 3514f0c5a516008d7
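When verifying the fix, a WARN line like the one above can be checked mechanically. A small Java sketch, assuming only the "Reasons: " prefix and the comma-separated layout visible in these log lines; ReasonsCheckSketch and its methods are hypothetical helpers, not part of oVirt.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical log-checking helper; not part of ovirt-engine.
public class ReasonsCheckSketch {

    private static final String PREFIX = "Reasons: ";

    /** Returns the comma-separated entries following "Reasons: " in an engine.log line. */
    static List<String> reasons(String logLine) {
        int idx = logLine.indexOf(PREFIX);
        if (idx < 0) {
            return List.of();
        }
        return Arrays.stream(logLine.substring(idx + PREFIX.length()).split(","))
                .map(String::trim)
                .collect(Collectors.toList());
    }

    /** True if the line reports the connection-failed validation error with a host name. */
    static boolean isGracefulExtendFailure(String logLine) {
        List<String> r = reasons(logLine);
        return r.contains("ERROR_CANNOT_EXTEND_CONNECTION_FAILED")
                && r.stream().anyMatch(s -> s.startsWith("$hostName "));
    }

    public static void main(String[] args) {
        String line = "Validation of action 'ExtendSANStorageDomain' failed for user admin@internal-authz. "
                + "Reasons: VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__EXTEND,"
                + "ERROR_CANNOT_EXTEND_CONNECTION_FAILED,$hostName host_mixed_3,$lun 3514f0c5a516008d7";
        System.out.println(reasons(line));
        System.out.println(isGracefulExtendFailure(line));
    }
}
```

Splitting on commas is enough for these messages, whose entries contain no commas; a variable value containing a comma would need a stricter parser.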


Tested using:
ovirt-engine-4.2.0-0.0.master.20170717104433.gita1ba045.el7.centos.noarch
vdsm-4.20.1-202.git9f953f3.el7.centos.x86_64

Comment 7 Sandro Bonazzola 2017-12-20 11:44:20 UTC
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be
resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.