Created attachment 1521961: logs

Description of problem:
Sometimes, DisconnectStorageServerVDSCommand fails with a NullPointerException when the engine asks the host to disconnect an iSCSI connection.

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.8.2-0.1.el7ev.noarch
vdsm-4.20.46-1.el7ev.x86_64

How reproducible:
Executed ~10 times, reproduced ~5 times, all for disconnection from iSCSI.

Steps to Reproduce:
Storage pool with iSCSI:
1. Set the SPM priority of a host (not the current SPM) to 5.
2. Put the host in maintenance.

Actual results:
Sometimes DisconnectStorageServerVDSCommand fails with a NullPointerException:
2019-01-12 09:41:09,496+02 ERROR [org.ovirt.engine.core.bll.storage.pool.DisconnectHostFromStoragePoolServersCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-83) [11570294] Command 'org.ovirt.engine.core.bll.storage.pool.DisconnectHostFromStoragePoolServersCommand' failed: EngineException: java.lang.NullPointerException (Failed with error ENGINE and code 5001)

This NPE does not seem to affect the host maintenance procedure.

Expected results:
No NPE.
This bug has not been marked as a blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
More logs are needed the next time this issue reproduces.
I'll provide the logs when I see the issue again, but they won't be any different from the ones that are already attached.
Some notes on the issue: setting the SPM priority isn't really relevant here. The issue is that the host is being removed while it is being disconnected from the storage servers.

Host moved to maintenance:
2019-01-12 09:41:03,746+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-83) [] Updated host status from 'Preparing for Maintenance' to 'Maintenance' in database, host 'host_mixed_1'(51bfacf7-177b-481b-95f7-426e335c040e)

Then we disconnect it from the storage pool and storage servers (this is done asynchronously by the VdsEventListener):
2019-01-12 09:41:03,822+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-83) [] START, DisconnectStoragePoolVDSCommand(HostName = host_mixed_1, DisconnectStoragePoolVDSCommandParameters:{hostId='51bfacf7-177b-481b-95f7-426e335c040e', storagePoolId='e027b568-15b6-11e9-8bfd-001a4a168bfc', vds_spm_id='1'}), log id: 5a9592bf

While the host was being disconnected, a remove command was issued on that host:
2019-01-12 09:41:06,041+02 INFO [org.ovirt.engine.core.bll.RemoveVdsCommand] (default task-39) [hosts_delete_a1d39f63-0337-4f24] Lock Acquired to object 'EngineLock:{exclusiveLocks='[51bfacf7-177b-481b-95f7-426e335c040e=VDS]', sharedLocks=''}'

However, DisconnectStoragePoolVDS had not finished at that time:
2019-01-12 09:41:09,437+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-83) [] FINISH, DisconnectStoragePoolVDSCommand, log id: 5a9592bf

As a result, DisconnectStorageServerVDSCommand failed because RemoveVdsCommand was already in motion.
As Benny mentioned, the issue is that the host is being removed while it is being disconnected from the storage servers as part of moving it from Active to Maintenance. The root cause is that the remove command and the maintenance command use different locking groups, so they do not block each other. In the patch, both have been set to use the same locking group.
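To illustrate why different locking groups let the two flows overlap, here is a minimal Java sketch under stated assumptions. It models a simplified exclusive-lock table keyed by (host id, locking group); the class `LockGroupSketch`, its methods and the group names `GROUP_A`/`GROUP_B` are hypothetical and are not the oVirt engine's lock manager API. The logs above only show that RemoveVdsCommand locks the host under the `VDS` group; which group the patch actually chose for both flows is not stated here.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/**
 * Hypothetical, simplified exclusive-lock table keyed by (entity id, locking group).
 * This is NOT the oVirt engine lock manager API; it only models why two commands that
 * lock the same host under different groups do not exclude each other.
 */
public class LockGroupSketch {

    /** Hypothetical locking groups, for illustration only. */
    public enum Group { GROUP_A, GROUP_B }

    // Each held lock is stored as "entityId:group", so the same host id under two
    // different groups produces two distinct, non-conflicting entries.
    private final Set<String> held = new HashSet<>();

    private static String key(UUID entityId, Group group) {
        return entityId + ":" + group.name();
    }

    /** Tries to take an exclusive lock; returns false if that (id, group) pair is already held. */
    public synchronized boolean tryAcquire(UUID entityId, Group group) {
        return held.add(key(entityId, group));
    }

    /** Releases the exclusive lock for the given (id, group) pair, if held. */
    public synchronized void release(UUID entityId, Group group) {
        held.remove(key(entityId, group));
    }

    public static void main(String[] args) {
        UUID host = UUID.fromString("51bfacf7-177b-481b-95f7-426e335c040e");

        // Before the fix (modeled): the maintenance/disconnect flow and the remove flow
        // lock the host under different groups, so both acquisitions succeed.
        LockGroupSketch before = new LockGroupSketch();
        boolean maintenance = before.tryAcquire(host, Group.GROUP_B);
        boolean remove = before.tryAcquire(host, Group.GROUP_A);
        System.out.println("before fix: maintenance=" + maintenance + ", remove=" + remove);

        // After the fix (modeled): both flows lock the host under the same group, so the
        // remove request is rejected while the disconnect flow still holds the lock.
        LockGroupSketch after = new LockGroupSketch();
        maintenance = after.tryAcquire(host, Group.GROUP_A);
        remove = after.tryAcquire(host, Group.GROUP_A);
        System.out.println("after fix: maintenance=" + maintenance + ", remove=" + remove);
    }
}
```

Running `main` prints `before fix: maintenance=true, remove=true` followed by `after fix: maintenance=true, remove=false`. The point of the model is only the key design: an exclusive lock conflicts solely with another request for the same (id, group) pair, so placing both commands in one group is what makes them mutually exclusive.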
Verified on ovirt-engine 4.3.4.2-0.1.el7 with the same scenario as in the initial description; no NPE/ERROR occurred.
This bugzilla is included in oVirt 4.3.4 release, published on June 11th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.4 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
*** Bug 1656742 has been marked as a duplicate of this bug. ***