Created attachment 1286374
engine.log and vdsm.log

Description of problem:
Tried to extend a block (FC) storage domain using the Webadmin while the host that was intended to perform the extension was unreachable to the engine. The operation failed with a NullPointerException in ExtendSANStorageDomainCommand.

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.0-0.0.master.20170605153216.gita063574.el7.centos.noarch
vdsm-4.20.0-999.gitc3e1239.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Try to extend a block-based storage domain while the host that is supposed to perform the operation is unreachable to the engine. This can be achieved by rebooting the host and, immediately afterwards, extending the domain with this host.

Actual results:

2017-06-09 13:37:17,577+03 INFO [org.ovirt.engine.core.bll.storage.connection.ConnectAllHostsToLunCommand] (default task-10) [65454d46] Running command: ConnectAllHostsToLunCommand internal: true. Entities affected : ID: d9fed7e3-72ad-4c59-83cb-6108389fb853 Type: Storage
2017-06-09 13:37:17,590+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-10) [65454d46] START, GetDeviceListVDSCommand(HostName = host_mixed_1, GetDeviceListVDSCommandParameters:{runAsync='true', hostId='d5369c8d-4b9b-43ea-87f5-6634313079df', storageType='FCP', checkStatus='false', lunIds='null'}), log id: 4293a40d
2017-06-09 13:37:18,264+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-10) [65454d46] EVENT_ID: USER_CONNECT_HOSTS_TO_LUN_FAILED(988), Failed to connect Host host_mixed_1 to device. (User: admin@internal-authz)
2017-06-09 13:37:18,274+03 INFO [org.ovirt.engine.core.bll.storage.connection.ConnectAllHostsToLunCommand] (default task-10) [65454d46] Lock freed to object 'EngineLock:{exclusiveLocks='[d9fed7e3-72ad-4c59-83cb-6108389fb853=<STORAGE, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-06-09 13:37:18,274+03 ERROR [org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand] (default task-10) [65454d46] Error during ValidateFailure.: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand.validate(ExtendSANStorageDomainCommand.java:119) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.internalValidate(CommandBase.java:849) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:403) [bll.jar:]
        at org.ovirt.engine.core.bll.executor.DefaultBackendActionExecutor.execute(DefaultBackendActionExecutor.java:13) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:495) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:477) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:430) [bll.jar:]
        at sun.reflect.GeneratedMethodAccessor349.invoke(Unknown Source) [:1.8.0_131]

Webadmin:
Cannot extend Storage Domain. Storage device ${lun} is unreachable from ${hostName}.
General command validation failure.

Expected results:
In case the host is unreachable to the engine, storage domain extension should fail nicely.

Additional info:
engine.log and vdsm.log
I believe this is a regression caused by 05ceb0dfd3bcbedaf24556d2718da6b797749786. Elad, can you confirm this does not happen in 4.1.z engines?
(In reply to Allon Mureinik from comment #1)
> I believe this is a regression caused by
> 05ceb0dfd3bcbedaf24556d2718da6b797749786. Elad, can you confirm this does
> not happen in 4.1.z engines?

After reviewing the code again, there definitely is a regression there.
Patch posted and bug tentatively targeted to 4.2, pending Elad's reply to see if there's another issue in the 4.1.z branch too.
Doesn't happen in 4.1.3.1-0.1

2017-06-11 10:57:47,832+03 WARN [org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand] (default task-16) [754cf43b] Validation of action 'ExtendSANStorageDomain' failed for user admin@internal-authz. Reasons: VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__EXTEND,ERROR_CANNOT_EXTEND_CONNECTION_FAILED,$hostName host_mixed_2,$lun
(In reply to Elad from comment #3)
> Doesn't happen in 4.1.3.1-0.1
>
> 2017-06-11 10:57:47,832+03 WARN
> [org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand]
> (default task-16) [754cf43b] Validation of action 'ExtendSANStorageDomain'
> failed for user admin@internal-authz. Reasons:
> VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__EXTEND,
> ERROR_CANNOT_EXTEND_CONNECTION_FAILED,$hostName host_mixed_2,$lun

Thanks Elad.
So this is indeed a regression caused by 05ceb0dfd3bcbedaf24556d2718da6b797749786, and the 4.2 targeting is indeed correct.
Patch is already submitted and should probably be merged soon.
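For readers unfamiliar with this class of failure, here is a minimal, self-contained sketch of a validation step that reports ERROR_CANNOT_EXTEND_CONNECTION_FAILED instead of throwing when the connect step returns nothing usable. It is not the engine code and not the submitted patch: the class ExtendValidationSketch, the ConnectResult type, and its fields are hypothetical names invented for illustration. Only the reason key and the $hostName / $lun variable names are taken from the logs in this bug.

```java
import java.util.ArrayList;
import java.util.List;

public class ExtendValidationSketch {

    /** Hypothetical stand-in for the outcome of the internal connect-hosts-to-LUN step. */
    static final class ConnectResult {
        final boolean succeeded;
        final String failedHostName; // may be null if the host never answered
        final String failedLunId;    // may be null if no device list was returned

        ConnectResult(boolean succeeded, String failedHostName, String failedLunId) {
            this.succeeded = succeeded;
            this.failedHostName = failedHostName;
            this.failedLunId = failedLunId;
        }
    }

    /**
     * Returns the validation failure reasons; an empty list means validation passed.
     * Every field that may be missing is checked before use, so an unreachable
     * host yields a regular validation failure rather than a NullPointerException.
     */
    static List<String> validate(ConnectResult result) {
        List<String> reasons = new ArrayList<>();
        if (result != null && result.succeeded) {
            return reasons;
        }
        String host = (result == null || result.failedHostName == null) ? "" : result.failedHostName;
        String lun = (result == null || result.failedLunId == null) ? "" : result.failedLunId;

        reasons.add("ERROR_CANNOT_EXTEND_CONNECTION_FAILED");
        reasons.add(("$hostName " + host).trim());
        reasons.add(("$lun " + lun).trim());
        return reasons;
    }

    public static void main(String[] args) {
        // No result at all, e.g. the host was rebooting when the command ran.
        System.out.println(validate(null));
        // Host known, but no LUN information came back.
        System.out.println(validate(new ConnectResult(false, "host_mixed_1", null)));
        // Successful connection: no failure reasons.
        System.out.println(validate(new ConnectResult(true, null, null)));
    }
}
```

The design point is only that missing host or LUN data is treated as an expected input to validation, which matches the "fail nicely" behaviour requested in the description.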
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
In case the host is unreachable to the engine, extend storage domain fails nicely:

2017-07-19 11:45:18,553+03 WARN [org.ovirt.engine.core.bll.storage.domain.ExtendSANStorageDomainCommand] (default task-28) [6a4a9f9a] Validation of action 'ExtendSANStorageDomain' failed for user admin@internal-authz. Reasons: VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__EXTEND,ERROR_CANNOT_EXTEND_CONNECTION_FAILED,$hostName host_mixed_3,$lun 3514f0c5a516008d7

Tested using:
ovirt-engine-4.2.0-0.0.master.20170717104433.gita1ba045.el7.centos.noarch
vdsm-4.20.1-202.git9f953f3.el7.centos.x86_64
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017. Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.