Bug 1460195 - [engine-backend] ExtendSANStorageDomain fails with NullPointerException in case the host is unreachable
Status: VERIFIED
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.2.0
Hardware: x86_64 Unspecified
Priority: unspecified Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: ---
Assigned To: Allon Mureinik
QA Contact: Elad
Keywords: Regression
Depends On:
Blocks:
Reported: 2017-06-09 06:59 EDT by Elad
Modified: 2017-07-19 04:47 EDT

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑4.2+
rule-engine: blocker+


Attachments
engine.log and vdsm.log (1.46 MB, application/x-gzip)
2017-06-09 06:59 EDT, Elad


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 78052 master MERGED core: Correct getting ConnectAllHostsToLunResult 2017-06-11 11:31 EDT

Description Elad 2017-06-09 06:59:36 EDT
Created attachment 1286374
engine.log and vdsm.log

Description of problem:
Tried to extend a block (FC) storage domain using the Webadmin while the host that was intended to perform the extension was unreachable from the engine. The operation failed with a NullPointerException in ExtendSANStorageDomainCommand.


Version-Release number of selected component (if applicable):
ovirt-engine-4.2.0-0.0.master.20170605153216.gita063574.el7.centos.noarch
vdsm-4.20.0-999.gitc3e1239.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Try to extend a block-based storage domain while the host that is supposed to perform the operation is unreachable from the engine. This can be achieved by rebooting the host and, immediately afterwards, extending the domain using this host.


Actual results:

2017-06-09 13:37:17,577+03 INFO  [org.ovirt.engine.core.bll.storage.connection.ConnectAllHostsToLunCommand] (default task-10) [65454d46] Running command: ConnectAllHostsToLunCommand internal: true. Entities affected :  ID: d9fed7e3-72ad-4c59-83cb-6108389fb853 Type: Storage


2017-06-09 13:37:17,590+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-10) [65454d46] START, GetDeviceListVDSCommand(HostName = host_mixed_1, GetDeviceListVDSCommandParameters:{runAsync='true', hostId='d5369c8d-4b9b-43ea-87f5-6634313079df', storageType='FCP', checkStatus='false', lunIds='null'}), log id: 4293a40d



2017-06-09 13:37:18,264+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-10) [65454d46] EVENT_ID: USER_CONNECT_HOSTS_TO_LUN_FAILED(988), Failed to connect Host host_mixed_1 to device. (User: admin@internal-authz)
2017-06-09 13:37:18,274+03 INFO  [org.ovirt.engine.core.bll.storage.connection.ConnectAllHostsToLunCommand] (default task-10) [65454d46] Lock freed to object 'EngineLock:{exclusiveLocks='[d9fed7e3-72ad-4c59-83cb-6108389fb853=<STORAGE, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'


2017-06-09 13:37:18,274+03 ERROR [org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand] (default task-10) [65454d46] Error during ValidateFailure.: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand.validate(ExtendSANStorageDomainCommand.java:119) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.internalValidate(CommandBase.java:849) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:403) [bll.jar:]
        at org.ovirt.engine.core.bll.executor.DefaultBackendActionExecutor.execute(DefaultBackendActionExecutor.java:13) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:495) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:477) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:430) [bll.jar:]
        at sun.reflect.GeneratedMethodAccessor349.invoke(Unknown Source) [:1.8.0_131]




Webadmin:
Cannot extend Storage Domain. Storage device ${lun} is unreachable from ${hostName}.
General command validation failure.



Expected results:
If the host is unreachable from the engine, the storage domain extension should fail gracefully with a proper validation message.

Additional info:
engine.log and vdsm.log
Comment 1 Allon Mureinik 2017-06-10 22:39:39 EDT
I believe this is a regression caused by 05ceb0dfd3bcbedaf24556d2718da6b797749786. Elad, can you confirm this does not happen in 4.1.z engines?
Comment 2 Allon Mureinik 2017-06-10 22:57:10 EDT
(In reply to Allon Mureinik from comment #1)
> I believe this is a regression caused by
> 05ceb0dfd3bcbedaf24556d2718da6b797749786. Elad, can you confirm this does
> not happen in 4.1.z engines?

After reviewing the code again, there definitely is a regression there. Patch posted and bug tentatively targeted to 4.2, pending Elad's reply to see if there's another issue in the 4.1.z branch too.
Comment 3 Elad 2017-06-11 04:02:51 EDT
Doesn't happen in 4.1.3.1-0.1

2017-06-11 10:57:47,832+03 WARN  [org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand] (default task-16) [754cf43b] Validation of action 'ExtendSANStorageDomain' failed for user admin@internal-authz. Reasons: VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__EXTEND,ERROR_CANNOT_EXTEND_CONNECTION_FAILED,$hostName host_mixed_2,$lun
Comment 4 Allon Mureinik 2017-06-11 04:09:51 EDT
(In reply to Elad from comment #3)
> Doesn't happen in 4.1.3.1-0.1
> 
> 2017-06-11 10:57:47,832+03 WARN 
> [org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand]
> (default task-16) [754cf43b] Validation of action 'ExtendSANStorageDomain'
> failed for user admin@internal-authz. Reasons:
> VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__EXTEND,
> ERROR_CANNOT_EXTEND_CONNECTION_FAILED,$hostName host_mixed_2,$lun

Thanks Elad. So this is indeed a regression caused by 05ceb0dfd3bcbedaf24556d2718da6b797749786, and the 4.2 targeting is indeed correct.
The patch is already submitted and should probably be merged soon.
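
For readers following the regression discussion, here is a rough sketch of the general failure mode: a validation step that reads details from a failed inner step without guarding against missing data. This is not the merged patch (gerrit 78052) and not the real engine code. The class ExtendValidationSketch, the ConnectResult type, and the variable helper are all hypothetical names invented for illustration. Only the message key ERROR_CANNOT_EXTEND_CONNECTION_FAILED and the $hostName / $lun variable names are taken from the logs in this report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: these types only imitate the shape of the problem.
// They are NOT the real ovirt-engine classes and NOT the merged fix.
class ExtendValidationSketch {

    // Stand-in for whatever the inner "connect hosts to LUN" step returns.
    static final class ConnectResult {
        final boolean succeeded;
        final String failedHost; // null when unknown
        final String failedLun;  // null when the step failed before reaching a LUN

        ConnectResult(boolean succeeded, String failedHost, String failedLun) {
            this.succeeded = succeeded;
            this.failedHost = failedHost;
            this.failedLun = failedLun;
        }
    }

    // Returns validation messages; an empty list means validation passed.
    static List<String> validate(ConnectResult result) {
        List<String> messages = new ArrayList<>();
        if (result != null && result.succeeded) {
            return messages;
        }
        // Message key copied from the engine.log lines in this report.
        messages.add("ERROR_CANNOT_EXTEND_CONNECTION_FAILED");
        // Guard every dereference: a failed or missing result may carry no details.
        messages.add(variable("hostName", result == null ? null : result.failedHost));
        messages.add(variable("lun", result == null ? null : result.failedLun));
        return messages;
    }

    // Formats "$name value"; a null value yields just "$name".
    private static String variable(String name, String value) {
        return ("$" + name + " " + Objects.toString(value, "")).trim();
    }

    public static void main(String[] args) {
        // Simulates an unreachable host: the inner step failed and no LUN is known.
        ConnectResult unreachable = new ConnectResult(false, "host_mixed_1", null);
        System.out.println(String.join(",", validate(unreachable)));
    }
}
```

With a guard like this, an unreachable host produces a list of validation reasons similar in shape to the 4.1.3.1 log line in comment 3, instead of an exception escaping from validate().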
Comment 5 Red Hat Bugzilla Rules Engine 2017-06-11 04:09:55 EDT
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Comment 6 Elad 2017-07-19 04:47:53 EDT
In case the host is unreachable to the engine, extend storage domain fails nicely:

2017-07-19 11:45:18,553+03 WARN  [org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand] (default task-28) [6a4a9f9a] Validation of action 'ExtendSANStorageDomain' failed for user admin@internal-authz. Reasons: VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__EXTEND,ERROR_CANNOT_EXTEND_CONNECTION_FAILED,$hostName host_mixed_3,$lun 3514f0c5a516008d7


Tested using:
ovirt-engine-4.2.0-0.0.master.20170717104433.gita1ba045.el7.centos.noarch
vdsm-4.20.1-202.git9f953f3.el7.centos.x86_64
