Description of problem:
Setting maintenance mode from the engine disconnects all the configured storage domains, including the HE one. In the past, ovirt-ha-agent was able to detect that and restart itself in order to reconnect the HE storage domain. In 4.2, the storage domain connection has been moved to ovirt-ha-broker, which doesn't handle the storage reconnection.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha 2.2.12

How reproducible:
100%

Steps to Reproduce:
1. Deploy hosted-engine with 2 or more HE hosts.
2. From the engine, move the host where the engine VM is running into maintenance mode.

Actual results:
The engine VM correctly migrates to another host. The host is moved into maintenance mode, which implies HE local maintenance mode. All the storage domains, including the HE one, are disconnected, and nothing currently reconnects them until the user manually restarts ovirt-ha-broker.

In broker.log we see something like:

Thread-7::INFO::2018-05-29 13:54:30,895::engine_health::191::engine_health.EngineHealth::(_result_from_stats) VM successfully migrated away from this host.
Thread-3::INFO::2018-05-29 13:54:30,903::mem_free::51::mem_free.MemFree::(action) memFree: 5191
Thread-5::INFO::2018-05-29 13:54:37,700::engine_health::94::engine_health.EngineHealth::(action) VM not on this host
StatusStorageThread::ERROR::2018-05-29 13:54:38,925::storage_broker::161::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats) Failed to read metadata from /var/run/vdsm/storage/3e80ea57-56f2-4b97-9723-331d37319678/9b572739-1f4c-40ca-89e6-5c0b337f96af/d43a872f-239a-48cf-b667-ba68758bb02b
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 156, in get_raw_stats
    fin.readinto(direct_io_buffer)
IOError: [Errno 5] Input/output error
StatusStorageThread::ERROR::2018-05-29 13:54:38,941::status_broker::92::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to read state.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 88, in run
    self._storage_broker.get_raw_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 163, in get_raw_stats
    .format(str(e)))
RequestError: failed to read metadata: [Errno 5] Input/output error
Listener::INFO::2018-05-29 13:54:39,132::storage_broker::304::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(_get_domain_monitor_status) VDSM domain monitor status: NONE
StatusStorageThread::ERROR::2018-05-29 13:54:39,159::storage_broker::211::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(put_stats) Failed to write metadata for host 1 to /var/run/vdsm/storage/3e80ea57-56f2-4b97-9723-331d37319678/9b572739-1f4c-40ca-89e6-5c0b337f96af/d43a872f-239a-48cf-b667-ba68758bb02b
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 207, in put_stats
    uninterruptible(os.write, f, direct_io_buffer)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 183, in uninterruptible
    return method(*args, **kwargs)
OSError: [Errno 5] Input/output error
StatusStorageThread::ERROR::2018-05-29 13:54:39,160::status_broker::85::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to update state.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 81, in run
    entry.data
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 213, in put_stats
    .format(str(e)))
RequestError: failed to write metadata: [Errno 5] Input/output error
StatusStorageThread::ERROR::2018-05-29 13:54:39,162::storage_broker::161::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats) Failed to read metadata from /var/run/vdsm/storage/3e80ea57-56f2-4b97-9723-331d37319678/9b572739-1f4c-40ca-89e6-5c0b337f96af/d43a872f-239a-48cf-b667-ba68758bb02b
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 156, in get_raw_stats
    fin.readinto(direct_io_buffer)
IOError: [Errno 5] Input/output error
StatusStorageThread::ERROR::2018-05-29 13:54:39,163::status_broker::92::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to read state.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 88, in run
    self._storage_broker.get_raw_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 163, in get_raw_stats
    .format(str(e)))
RequestError: failed to read metadata: [Errno 5] Input/output error
Thread-8::WARNING::2018-05-29 13:54:39,431::storage_domain::60::storage_domain.EngineHealth::(action) Hosted-engine storage domain is in invalid state
Thread-3::INFO::2018-05-29 13:54:39,924::mem_free::51::mem_free.MemFree::(action) memFree: 5193

hosted-engine --vm-status on the host set into maintenance mode reports itself as '{"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}' and all the other hosts as 'unknown stale-data'. All the other hosts report the host in maintenance mode as 'unknown stale-data'.

Expected results:
The hosted-engine storage domain is correctly reconnected on the host set into maintenance mode, and hosted-engine --vm-status works correctly on all the hosts.

Additional info:
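To illustrate the behaviour expected here, the following is a minimal sketch of a read that reconnects storage when it hits [Errno 5] and then retries. It is not the ovirt-hosted-engine-ha implementation: read_with_reconnect, read_metadata and reconnect_storage are hypothetical names, and the two callables stand in for whatever the broker would actually use to read the metadata and to reconnect the HE storage domain.

```python
import errno


def read_with_reconnect(read_metadata, reconnect_storage, max_attempts=3):
    """Call read_metadata(); on EIO, call reconnect_storage() and retry.

    Both callables are hypothetical stand-ins, not real broker APIs.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return read_metadata()
        except (IOError, OSError) as e:
            # Only an I/O error hints at a disconnected storage domain;
            # any other error, or EIO on the last attempt, is re-raised.
            if e.errno != errno.EIO or attempt == max_attempts:
                raise
            reconnect_storage()
```

With max_attempts=3 the read is tried at most three times and the reconnect callable is invoked at most twice, so a domain that never comes back still surfaces the original error instead of looping forever.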
This bug was discovered by automation; adding keyword.
Works for me on these components:
ovirt-hosted-engine-ha-2.2.13-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.22-1.el7ev.noarch
rhvm-appliance-4.2-20180601.0.el7.noarch
ovirt-engine-setup-4.2.4.1-0.1.el7.noarch
Linux 3.10.0-862.6.1.el7.x86_64 #1 SMP Mon Jun 4 15:33:25 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

No errors in broker.log were discovered and migrations passed without any issues. Tested over NFS deployment.
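A check like "no errors in broker.log" can be scripted. The sketch below assumes the line layout seen in the broker.log excerpt in the description (thread::LEVEL::timestamp::module::line::logger::(function) message); LOG_LINE and find_errors are hypothetical names, and traceback lines are skipped because they do not match that layout.

```python
import re

# Layout inferred from the broker.log excerpt in this report; an assumption,
# not a documented format.
LOG_LINE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[^:]+)::(?P<lineno>\d+)::(?P<logger>[^:]+)::"
    r"\((?P<func>[^)]+)\) (?P<message>.*)$"
)


def find_errors(lines):
    """Return (timestamp, logger, message) for every ERROR-level entry."""
    errors = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            errors.append(
                (match.group("timestamp"),
                 match.group("logger"),
                 match.group("message"))
            )
    return errors
```

Passing an open file object for broker.log as lines works as well, since the function only iterates over it; an empty result corresponds to the "no errors" outcome reported above.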
This bugzilla is included in oVirt 4.2.4 release, published on June 26th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.4 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
*** Bug 1599110 has been marked as a duplicate of this bug. ***