Bug 1460195 - [engine-backend] ExtendSANStorageDomain fails with NullPointerException in case the host is unreachable
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.2.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: ---
Assignee: Allon Mureinik
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-06-09 10:59 UTC by Elad
Modified: 2017-12-20 11:44 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-20 11:44:20 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: blocker+


Attachments
engine.log and vdsm.log (1.46 MB, application/x-gzip)
2017-06-09 10:59 UTC, Elad


Links
System       | ID    | Private | Priority | Status | Summary                                          | Last Updated
oVirt gerrit | 78052 | 0       | master   | MERGED | core: Correct getting ConnectAllHostsToLunResult | 2020-09-10 12:54:06 UTC

Description Elad 2017-06-09 10:59:36 UTC
Created attachment 1286374
engine.log and vdsm.log

Description of problem:
Tried to extend a block (FC) storage domain using the Webadmin while the host that was intended to perform the extension was unreachable from the engine. The operation failed with a NullPointerException in ExtendSANStorageDomainCommand.


Version-Release number of selected component (if applicable):
ovirt-engine-4.2.0-0.0.master.20170605153216.gita063574.el7.centos.noarch
vdsm-4.20.0-999.gitc3e1239.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Try to extend a block-based storage domain while the host that is supposed to perform the operation is unreachable from the engine. This can be achieved by rebooting the host and, immediately afterwards, extending the domain using that host.


Actual results:

2017-06-09 13:37:17,577+03 INFO  [org.ovirt.engine.core.bll.storage.connection.ConnectAllHostsToLunCommand] (default task-10) [65454d46] Running command: ConnectAllHostsToLunCommand internal: true. Entities affected :  ID: d9fed7e3-72ad-4c59-83cb-6108389fb853 Type: Storage


2017-06-09 13:37:17,590+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-10) [65454d46] START, GetDeviceListVDSCommand(HostName = host_mixed_1, GetDeviceListVDSCommandParameters:{runAsync='true', hostId='d5369c8d-4b9b-43ea-87f5-6634313079df', storageType='FCP', checkStatus='false', lunIds='null'}), log id: 4293a40d



2017-06-09 13:37:18,264+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-10) [65454d46] EVENT_ID: USER_CONNECT_HOSTS_TO_LUN_FAILED(988), Failed to connect Host host_mixed_1 to device. (User: admin@internal-authz)
2017-06-09 13:37:18,274+03 INFO  [org.ovirt.engine.core.bll.storage.connection.ConnectAllHostsToLunCommand] (default task-10) [65454d46] Lock freed to object 'EngineLock:{exclusiveLocks='[d9fed7e3-72ad-4c59-83cb-6108389fb853=<STORAGE, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'


2017-06-09 13:37:18,274+03 ERROR [org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand] (default task-10) [65454d46] Error during ValidateFailure.: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand.validate(ExtendSANStorageDomainCommand.java:119) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.internalValidate(CommandBase.java:849) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:403) [bll.jar:]
        at org.ovirt.engine.core.bll.executor.DefaultBackendActionExecutor.execute(DefaultBackendActionExecutor.java:13) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:495) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:477) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:430) [bll.jar:]
        at sun.reflect.GeneratedMethodAccessor349.invoke(Unknown Source) [:1.8.0_131]




Webadmin:
Cannot extend Storage Domain. Storage device ${lun} is unreachable from ${hostName}.
General command validation failure.



Expected results:
If the host is unreachable from the engine, the storage domain extension should fail gracefully, with a validation error rather than a NullPointerException.
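The patch title in the Links table ("Correct getting ConnectAllHostsToLunResult") suggests the NullPointerException came from how the result of the internal connect step was read inside validate(). The sketch below illustrates the null-safe pattern the expected result calls for: when the internal step fails and its result object is missing, validation returns failure reasons instead of dereferencing null. It is not the actual fix from gerrit 78052; `ExtendValidationSketch`, `ConnectResult`, `InternalReturnValue` and `validate` are hypothetical stand-ins, and only the reason strings are copied from the engine logs in this report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are the real oVirt engine classes.
class ExtendValidationSketch {

    /** Hypothetical stand-in for the outcome of connecting hosts to the new LUNs. */
    static final class ConnectResult {
        final String failedHostName; // host that could not see the device, may be null
        final String failedLun;      // LUN that could not be reached, may be null

        ConnectResult(String failedHostName, String failedLun) {
            this.failedHostName = failedHostName;
            this.failedLun = failedLun;
        }
    }

    /** Hypothetical stand-in for an internal command's return value. */
    static final class InternalReturnValue {
        final boolean succeeded;
        final ConnectResult result; // may be null, e.g. when the host is unreachable

        InternalReturnValue(boolean succeeded, ConnectResult result) {
            this.succeeded = succeeded;
            this.result = result;
        }
    }

    /**
     * Returns validation failure reasons; an empty list means validation passed.
     * The result object is never dereferenced without a null check.
     */
    static List<String> validate(InternalReturnValue rv) {
        List<String> reasons = new ArrayList<>();
        if (rv != null && rv.succeeded) {
            return reasons;
        }
        ConnectResult result = rv == null ? null : rv.result;
        String host = result == null || result.failedHostName == null ? "" : result.failedHostName;
        String lun = result == null || result.failedLun == null ? "" : result.failedLun;

        reasons.add("VAR__TYPE__STORAGE__DOMAIN");
        reasons.add("VAR__ACTION__EXTEND");
        reasons.add("ERROR_CANNOT_EXTEND_CONNECTION_FAILED");
        // Variables use the same "$name value" shape seen in the engine log.
        reasons.add(("$hostName " + host).trim());
        reasons.add(("$lun " + lun).trim());
        return reasons;
    }
}
```

With a missing result the sketch still yields `ERROR_CANNOT_EXTEND_CONNECTION_FAILED` plus empty `$hostName`/`$lun` variables, so the caller sees a validation failure instead of an exception.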

Additional info:
engine.log and vdsm.log

Comment 1 Allon Mureinik 2017-06-11 02:39:39 UTC
I believe this is a regression caused by 05ceb0dfd3bcbedaf24556d2718da6b797749786. Elad, can you confirm this does not happen in 4.1.z engines?

Comment 2 Allon Mureinik 2017-06-11 02:57:10 UTC
(In reply to Allon Mureinik from comment #1)
> I believe this is a regression caused by
> 05ceb0dfd3bcbedaf24556d2718da6b797749786. Elad, can you confirm this does
> not happen in 4.1.z engines?

After reviewing the code again, there is definitely a regression there. A patch has been posted and the bug tentatively targeted to 4.2, pending Elad's reply on whether there is another issue in the 4.1.z branch too.

Comment 3 Elad 2017-06-11 08:02:51 UTC
Doesn't happen in 4.1.3.1-0.1

2017-06-11 10:57:47,832+03 WARN  [org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand] (default task-16) [754cf43b] Validation of action 'ExtendSANStorageDomain' failed for user admin@internal-authz. Reasons: VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__EXTEND,ERROR_CANNOT_EXTEND_CONNECTION_FAILED,$hostName host_mixed_2,$lun
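The Webadmin message in the original report still showed the literal placeholders `${lun}` and `${hostName}`, whereas the 4.1.z log above carries the values as `$hostName host_mixed_2` and `$lun` entries in the reasons list. The sketch below shows one plausible way such entries could be substituted into a message template, leaving unknown placeholders untouched, which reproduces the unresolved text seen when no variables are supplied. This is an assumption about the mechanism, not the oVirt frontend's actual code; `ReasonMessageSketch`, `variables` and `render` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "$name value" reason variables being rendered into a message.
class ReasonMessageSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Collects "$name value" entries from a reasons list; other entries are ignored. */
    static Map<String, String> variables(List<String> reasons) {
        Map<String, String> vars = new HashMap<>();
        for (String reason : reasons) {
            if (!reason.startsWith("$")) {
                continue;
            }
            int space = reason.indexOf(' ');
            String name = space < 0 ? reason.substring(1) : reason.substring(1, space);
            String value = space < 0 ? "" : reason.substring(space + 1);
            vars.put(name, value);
        }
        return vars;
    }

    /** Replaces ${name} placeholders; names without a value are left as-is. */
    static String render(String template, Map<String, String> vars) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            String replacement = value != null ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Leaving unmatched placeholders in place (rather than blanking them) makes a missing variable visible in the UI, which is consistent with the `${lun}`/`${hostName}` text the reporter saw.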

Comment 4 Allon Mureinik 2017-06-11 08:09:51 UTC
(In reply to Elad from comment #3)
> Doesn't happen in 4.1.3.1-0.1
> 
> 2017-06-11 10:57:47,832+03 WARN 
> [org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand]
> (default task-16) [754cf43b] Validation of action 'ExtendSANStorageDomain'
> failed for user admin@internal-authz. Reasons:
> VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__EXTEND,
> ERROR_CANNOT_EXTEND_CONNECTION_FAILED,$hostName host_mixed_2,$lun

Thanks Elad. So this is indeed a regression caused by 05ceb0dfd3bcbedaf24556d2718da6b797749786, and the 4.2 targeting is correct.
The patch has already been submitted and should probably be merged soon.

Comment 5 Red Hat Bugzilla Rules Engine 2017-06-11 08:09:55 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 6 Elad 2017-07-19 08:47:53 UTC
When the host is unreachable from the engine, extending the storage domain now fails gracefully:

2017-07-19 11:45:18,553+03 WARN  [org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand] (default task-28) [6a4a9f9a] Validation of action 'ExtendSANStorageDomain' failed for user admin@internal-authz. Reasons: VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__EXTEND,ERROR_CANNOT_EXTEND_CONNECTION_FAILED,$hostName host_mixed_3,$lun 3514f0c5a516008d7


Tested using:
ovirt-engine-4.2.0-0.0.master.20170717104433.gita1ba045.el7.centos.noarch
vdsm-4.20.1-202.git9f953f3.el7.centos.x86_64

Comment 7 Sandro Bonazzola 2017-12-20 11:44:20 UTC
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

