Bug 1578763
| Summary: | [downstream clone - 4.2.4] Unreachable ISO/Export SD prevents hosts from activating | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | RHV bug bot <rhv-bugzilla-bot> |
| Component: | vdsm | Assignee: | Fred Rolland <frolland> |
| Status: | CLOSED ERRATA | QA Contact: | Kevin Alon Goldblatt <kgoldbla> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.1.3 | CC: | aperotti, danken, ebenahar, frolland, gscott, gveitmic, jbryant, kgoldbla, kshukla, ktordeur, lsurette, milee, mkalinin, mklika, nsoffer, ratamir, srevivo, tnisan, ycui, ykaul, ylavi |
| Target Milestone: | ovirt-4.2.4 | Keywords: | Regression, ZStream |
| Target Release: | --- | Flags: | lsvaty: testing_plan_complete- |
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | ovirt-engine-4.2.4 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 1499056 | Environment: | |
| Last Closed: | 2018-06-27 10:02:46 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1499056 | ||
| Bug Blocks: | | | |
Description
RHV bug bot
2018-05-16 11:29:23 UTC
This is the same behavior as any other domain, pushing out to 4.3 to rethink. (Originally by amureini)

(In reply to Allon Mureinik from comment #2)
> This is the same behavior as any other domain, pushing out to 4.3 to rethink.

Is it? I thought a vdsTimeout exception for connectStorageServer on a data domain would move the host to Non-Operational. Might be wrong though... (Originally by Germano Veit Michel)

A stuck mount can be solved by letting someone else wait for the mount and returning a timeout early. Engine can modify the domain or host status considering the domain's status on other hosts.

One option is to let systemd do the mount using mount units. If the mount gets stuck, it is now systemd's problem. Vdsm can wait for mount events and report to Engine when the mount has completed. See https://gerrit.ovirt.org/#/c/56516/ for an example using mount events.

Using persistent mount units, a host can recover its mounts automatically after a reboot - this will simplify hosted-engine flows. (Originally by Nir Soffer)

(In reply to Nir Soffer from comment #11)
> Stuck mount can be solved by letting someone else wait for the mount, and
> returning a timeout early. [...]

Allon, should we move it back to 4.3? (Originally by ylavi)

Returning old needinfo (Originally by Raz Tamir)

Tested with the following code:
-----------------------------------------
rhevm-3.6.13.1-0.1.el6.noarch
vdsm-4.17.44-2.el7ev.noarch
Tested with the following scenario:
---------------------------------------
1. On a system with nfs, iscsi and ISO domain
2. Ran "iptables -I OUTPUT 1 -d <NFS_IP> -j DROP" on the host to block the connection to the ISO domain >>>> The data center was set to Non Responsive
Engine logs:
-------------------------------
2018-03-03 22:33:44,459 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-65) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM command failed: Client close
2018-03-03 22:33:44,467 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-65) [] ERROR, org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand, exception: VDSGenericException: VDSNetworkException: Client close, log id: 8478bdf
2018-03-03 22:33:44,467 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-65) [] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Client close
at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand.executeIrsBrokerCommand(GetStoragePoolInfoVDSCommand.java:32) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand.executeVDSCommand(IrsBrokerCommand.java:161) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:74) [vdsbroker.jar:]
at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData.proceedStoragePoolStats(IrsProxyData.java:336) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData._updatingTimer_Elapsed(IrsProxyData.java:238) [vdsbroker.jar:]
at sun.reflect.GeneratedMethodAccessor134.invoke(Unknown Source) [:1.8.0_121]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_121]
at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_121]
at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81) [scheduler.jar:]
at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52) [scheduler.jar:]
at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
2018-03-03 22:33:44,483 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (DefaultQuartzScheduler_Worker-65) [] IRS failover failed - can't allocate vds server
2018-03-03 22:33:44,498 INFO [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (org.ovirt.thread.pool-6-thread-6) [4b9f84fe] Lock freed to object 'EngineLock:{exclusiveLocks='null', sharedLocks='[dea07dac-f1a7-4a94-ac7b-ed697b635276=<POOL, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2018-03-03 22:33:44,511 INFO [org.ovirt.engine.core.bll.MaintenanceVdsCommand] (org.ovirt.thread.pool-6-thread-6) [4b9f84fe] Running command: MaintenanceVdsCommand internal: true. Entities affected : ID: 1fed50d9-ed6c-4adf-bcbc-37487de67bb0 Type: VDS
.
.
.
2018-03-03 22:33:54,498 INFO [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (DefaultQuartzScheduler_Worker-64) [113f6f10] Storage Pool 'dea07dac-f1a7-4a94-ac7b-ed697b635276' - Updating Storage Domain 'fb39c7f3-7bed-4946-970a-ff3c0656726d' status from 'Active' to 'Unknown', reason: null
2018-03-03 22:33:54,503 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-64) [113f6f10] Correlation ID: 113f6f10, Call Stack: null, Custom Event ID: -1, Message: Invalid status on Data Center golden_env_mixed. Setting status to Non Responsive.
2018-03-03 22:33:54,518 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (DefaultQuartzScheduler_Worker-64) [113f6f10] START, ConnectStoragePoolVDSCommand(HostName = host_mixed_1, ConnectStoragePoolVDSCommandParameters:{runAsync='true', hostId='5c9876d4-d68d-4eac-8f5a-10557b5f319a', vdsId='5c9876d4-d68d-4eac-8f5a-10557b5f319a', storagePoolId='dea07dac-f1a7-4a94-ac7b-ed697b635276', masterVersion='1'}), log id: 671a00e8
.
.
.
2018-03-03 22:36:20,618 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler_Worker-65) [] Command 'GetAllVmStatsVDSCommand(HostName = host_mixed_1, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='5c9876d4-d68d-4eac-8f5a-10557b5f319a', vds='Host[host_mixed_1,5c9876d4-d68d-4eac-8f5a-10557b5f319a]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exeeded
2018-03-03 22:36:20,621 INFO [org.ovirt.engine.core.vdsbroker.PollVmStatsRefresher] (DefaultQuartzScheduler_Worker-65) [] Failed to fetch vms info for host 'host_mixed_1' - skipping VMs monitoring.
2018-03-03 22:36:20,623 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-54) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM host_mixed_1 command failed: Heartbeat exeeded
2018-03-03 22:36:20,623 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] (DefaultQuartzScheduler_Worker-54) [] Command 'GetStatsVDSCommand(HostName = host_mixed_1, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='5c9876d4-d68d-4eac-8f5a-10557b5f319a', vds='Host[host_mixed_1,5c9876d4-d68d-4eac-8f5a-10557b5f319a]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exeeded
2018-03-03 22:36:20,623 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-54) [] Failed getting vds stats, vds='host_mixed_1'(5c9876d4-d68d-4eac-8f5a-10557b5f319a): org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exeeded
2018-03-03 22:36:20,623 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-54) [] Failure to refresh Vds runtime info: VDSGenericException: VDSNetworkException: Heartbeat exeeded
2018-03-03 22:36:20,623 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-54) [] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exeeded
at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand.executeVdsBrokerCommand(GetStatsVDSCommand.java:21) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:74) [vdsbroker.jar:]
at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsStats(HostMonitoring.java:472) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:114) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.HostMonitoring.refresh(HostMonitoring.java:84) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:227) [vdsbroker.jar:]
at sun.reflect.GeneratedMethodAccessor107.invoke(Unknown Source) [:1.8.0_121]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_121]
at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_121]
at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81) [scheduler.jar:]
at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52) [scheduler.jar:]
at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
2018-03-03 22:36:20,624 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-54) [] Failed to refresh VDS, network error, continuing, vds='host_mixed_1'(5c9876d4-d68d-4eac-8f5a-10557b5f319a): VDSGenericException: VDSNetworkException: Heartbeat exeeded
2018-03-03 22:36:20,624 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-6-thread-22) [] Host 'host_mixed_1' is not responding.
2018-03-03 22:36:20,624 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-6-thread-24) [] Host 'host_mixed_1' is not responding.
2018-03-03 22:36:20,625 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-63) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM host_mixed_1 command failed: Heartbeat exeeded
2018-03-03 22:36:20,625 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler_Worker-63) [] Command 'SpmStatusVDSCommand(HostName = host_mixed_1, SpmStatusVDSCommandParameters:{runAsync='true', hostId='5c9876d4-d68d-4eac-8f5a-10557b5f319a', storagePoolId='dea07dac-f1a7-4a94-ac7b-ed697b635276'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exeeded
2018-03-03 22:36:20,627 INFO [org.ovirt.engine.core.bll.storage.SetStoragePoolStatusCommand] (DefaultQuartzScheduler_Worker-63) [39df4684] Running command: SetStoragePoolStatusCommand internal: true. Entities affected : ID: dea07dac-f1a7-4a94-ac7b-ed697b635276 Type: StoragePool
(Originally by Kevin Goldblatt)
Created attachment 1403592 [details]
engine and vdsm logs
Added logs
(Originally by Kevin Goldblatt)
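As a sketch of Nir's earlier suggestion to delegate mounting to systemd: a persistent mount unit for an NFS ISO domain might look roughly like the following. The server address, export path, mount options, and unit naming here are illustrative assumptions, not values taken from this bug.

```ini
# Hypothetical unit for an ISO domain mount. systemd requires the unit file
# name to match the escaped mount path (see systemd-escape --path --suffix=mount).
[Unit]
Description=oVirt ISO domain mount (illustrative sketch)

[Mount]
What=nfs-server.example.com:/exports/iso
Where=/rhev/data-center/mnt/nfs-server.example.com:_exports_iso
Type=nfs
# "soft" makes a dead server return an error instead of hanging forever;
# whether soft mounts are appropriate for a given domain is a design decision.
Options=soft,timeo=100,retrans=3

[Install]
WantedBy=multi-user.target
```

With such a unit, a stuck mount becomes systemd's problem rather than blocking a vdsm thread, and the unit can be re-activated automatically after reboot.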
According to the reproduction from Kevin, which is not the same issue as the bug itself, this seems like a regression. (Originally by Raz Tamir)

The ISO domain used in the test reported in comment #22 was on the engine, so Non Responsive is the desired behavior. Upon blocking the connection to the storage server where the ISO domain resides, only the ISO domain becomes inactive. (Originally by Elad Ben Aharon)

(In reply to Elad from comment #25)
> Upon blocking connection to the storage server where the iso domain resides,
> only the iso domain becomes inactive

Logs? (Originally by amureini)

Created attachment 1403741 [details]
logs-3.6-only_iso_domain_inactive
vdsm.log.1.xz:
Thread-52183::ERROR::2018-03-04 11:38:55,686::monitor::276::Storage.Monitor::(_monitorDomain) Error monitoring domain 8c5e702c-486e-4bc4-9bbb-0155218abf14
Traceback (most recent call last):
File "/usr/share/vdsm/storage/monitor.py", line 272, in _monitorDomain
self._performDomainSelftest()
File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 769, in wrapper
value = meth(self, *a, **kw)
File "/usr/share/vdsm/storage/monitor.py", line 339, in _performDomainSelftest
self.domain.selftest()
File "/usr/share/vdsm/storage/fileSD.py", line 648, in selftest
self.oop.os.statvfs(self.domaindir)
File "/usr/share/vdsm/storage/outOfProcess.py", line 274, in statvfs
return self._iop.statvfs(path)
File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 452, in statvfs
resdict = self._sendCommand("statvfs", {"path": path}, self.timeout)
File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 419, in _sendCommand
raise Timeout(os.strerror(errno.ETIMEDOUT))
Timeout: Connection timed out
Thread-52183::INFO::2018-03-04 11:38:55,687::monitor::299::Storage.Monitor::(_notifyStatusChanges) Domain 8c5e702c-486e-4bc4-9bbb-0155218abf14 became INVALID
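The traceback above shows ioprocess converting a blocking statvfs() into a Timeout, which is the "let someone else wait and return a timeout early" pattern discussed earlier in this bug. A minimal sketch of that pattern, using only the standard library (the names and the 60-second default are illustrative, not vdsm's actual API):

```python
# Sketch: run a potentially-blocking filesystem call in a worker thread and
# give up early instead of hanging the caller (e.g. a domain monitor thread).
import concurrent.futures
import errno
import os


class Timeout(Exception):
    """Raised when the underlying call did not finish in time."""


_pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)


def statvfs_with_timeout(path, timeout=60):
    """os.statvfs() that raises Timeout instead of blocking forever."""
    future = _pool.submit(os.statvfs, path)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The worker thread stays stuck on the dead mount, but the caller
        # moves on - mirroring the "Connection timed out" path above.
        raise Timeout(os.strerror(errno.ETIMEDOUT))
```

Note the trade-off this bug is about: the blocked worker is not killed, so repeated timeouts can pile up stuck threads unless the pool is bounded.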
engine.log:
ISO domain is unreachable:
2018-03-04 11:42:01,809 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetFileStatsVDSCommand] (ajp-/127.0.0.1:8702-1) [] FINISH, GetFileStatsVDSCommand, log id: d97256f
2018-03-04 11:42:01,809 WARN [org.ovirt.engine.core.bll.IsoDomainListSyncronizer] (ajp-/127.0.0.1:8702-1) [] The refresh process for pattern * failed: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IrsOperationFailedNoFailoverException: IRSGenericException: IRSErrorException: Cannot get file stats: ('8c5e702c-486e-4bc4-9bbb-0155218abf14',) (Failed with error GET_FILE_LIST_ERROR and code 330)
ISO domain is being deactivated:
2018-03-04 11:43:22,542 INFO [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (org.ovirt.thread.pool-6-thread-35) [762c9a06] Running command: DeactivateStorageDomainCommand internal: true. Entities affected : ID: 8c5e702c-486e-4bc4-9bbb-0155218abf14 Type: StorageAction group MANIPULATE_STORAGE_DOMAIN with role type ADMIN
The data center remains up, only the ISO domain is affected and becomes inactive.
There seems to be a bug in 3.6 in GetImagesListQuery; an NPE is thrown. Not sure if it's a known issue.
2018-03-04 11:42:01,809 WARN [org.ovirt.engine.core.bll.IsoDomainListSyncronizer] (ajp-/127.0.0.1:8702-1) [] The refresh process for pattern * failed: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IrsOperationFailedNoFailoverException: IRSGenericException: IRSErrorException: Cannot get file stats: ('8c5e702c-486e-4bc4-9bbb-0155218abf14',) (Failed with error GET_FILE_LIST_ERROR and code 330)
2018-03-04 11:42:01,810 ERROR [org.ovirt.engine.core.bll.GetImagesListQuery] (ajp-/127.0.0.1:8702-1) [] Query 'GetImagesListQuery' failed: null
2018-03-04 11:42:01,810 ERROR [org.ovirt.engine.core.bll.GetImagesListQuery] (ajp-/127.0.0.1:8702-1) [] Exception: java.lang.NullPointerException
at org.ovirt.engine.core.bll.IsoDomainListSyncronizer.removeFileStatsForComplyingFileNames(IsoDomainListSyncronizer.java:699) [bll.jar:]
at org.ovirt.engine.core.bll.IsoDomainListSyncronizer.updateAllFileListFromVDSM(IsoDomainListSyncronizer.java:679) [bll.jar:]
at org.ovirt.engine.core.bll.IsoDomainListSyncronizer.updateFileList(IsoDomainListSyncronizer.java:305) [bll.jar:]
at org.ovirt.engine.core.bll.IsoDomainListSyncronizer.refreshIsoDomainFileForStoragePool(IsoDomainListSyncronizer.java:289) [bll.jar:]
at org.ovirt.engine.core.bll.IsoDomainListSyncronizer.refreshIsoDomain(IsoDomainListSyncronizer.java:346) [bll.jar:]
at org.ovirt.engine.core.bll.IsoDomainListSyncronizer.refreshRepos(IsoDomainListSyncronizer.java:212) [bll.jar:]
(Originally by Elad Ben Aharon)
I don't see the regression here. The bug is about the behavior when the mount command for the ISO domain gets stuck for more than 180 seconds. Did we reproduce a stuck mount in 3.6? Do we have different behavior when mounting the ISO domain is stuck? (Originally by Nir Soffer)

(In reply to Elad from comment #28)
> Thread-52183::ERROR::2018-03-04
> 11:38:55,686::monitor::276::Storage.Monitor::(_monitorDomain) Error
> monitoring domain 8c5e702c-486e-4bc4-9bbb-0155218abf14
> Traceback (most recent call last):

This means connectStorageServer succeeded, and we are monitoring the domain. This bug is about the case when connectStorageServer timed out. (Originally by Nir Soffer)

Nir, how can we simulate this? Isn't blocking the connection to the NFS server enough? (Originally by Elad Ben Aharon)

(In reply to Elad from comment #31)
> Nir, how can we simulate this?
> Isn't blocking connection to the NFS server enough?

It should, but if the mount command did not get stuck, we did not reproduce the issue reported in this bug. (Originally by Nir Soffer)

Nir, maybe Germano can provide us a way of reproducing a stuck mount. (Originally by Elad Ben Aharon)

I see the exact same behavior in 4.2 as I saw in 4.1 in comment #0. No heartbeat exceeded, just vdsTimeout and an up/down dance forever. The host cannot activate.

1. Engine sends connectStorageServer:
2018-03-05 12:42:16,037+11 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand]
2. vdsTimeout on connectStorageServer (3 min):
2018-03-05 12:45:16,966+11 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] ... execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
3. Engine reconnects to the host:
2018-03-05 12:45:17,021+11 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to ...
4. Host is set to Up:
2018-03-05 12:45:18,422+11 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-62) [773ff1ff] EVENT_ID: VDS_DETECTED(13), Status of host ... was set to Up.
5. Then it quickly switches to not responding:
2018-03-05 12:45:18,658+11 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (EE-ManagedThreadFactory-engine-Thread-91979) [675de76a] Host '....' is not responding.
6. Engine connects again and sends another connectStorageServer. The Up/Not-Responding dance continues; steps 1-5 repeat again and again.

The only improvement is that now I can clearly see the blocked worker on the vdsm side trying to do the mount (thanks!), but that only helps diagnose the issue; it doesn't solve it, of course.

Versions used:
rhvm-4.2.0-0.6.el7.noarch
vdsm-4.20.17-1.el7ev.x86_64
(Originally by Germano Veit Michel)

INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [Project 'ovirt-engine'/Component 'vdsm' mismatch] For more info please contact: rhv-devops

Verified with the following code:
-----------------------------------------
ovirt-engine-4.2.4-0.1.el7.noarch
vdsm-4.20.28-1.el7ev.x86_64

Verified with the following scenario:
-----------------------------------------
Steps:
- Data center with at least two hosts
- Configure an ISO storage domain
- Move one of the hosts to Maintenance
- Block, with iptables, the connectivity to the ISO domain from the host in maintenance only
- Activate the host >>>>> The host moves to the Up state

Moving to VERIFIED

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2072