Description of problem:
During Hosted-Engine undeploy on host re-install, the hosted-engine.conf file is apparently removed first and only then is the HA daemon stopped. The shutdown (undeploy) is not clean: the agent goes into a failed state (systemd) and fills the logs with unnecessary errors.

Deploy logs:
2018-01-11 21:11:08 DEBUG otopi.filetransaction filetransaction.prepare:219 backup '/etc/ovirt-hosted-engine/hosted-engine.conf'->'/etc/ovirt-hosted-engine/hosted-engine.conf.20180111211108'
2018-01-11 21:11:24 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/bin/systemctl', 'stop', 'ovirt-ha-agent.service'), executable='None', cwd='None', env=None

From agent side:
MainThread::ERROR::2018-01-11 21:11:26,620::config::163::ovirt_hosted_engine_ha.lib.storage_server.StorageServer.config::(_load_single_conf_file) Configuration file '/etc/ovirt-hosted-engine/hosted-engine.conf' not available [[Errno 2] No such file or directory: '/etc/ovirt-hosted-engine/hosted-engine.conf']
MainThread::WARNING::2018-01-11 21:11:26,620::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: 'Configuration value not found: file=/etc/ovirt-hosted-engine/hosted-engine.conf, key=domainType'
MainThread::WARNING::2018-01-11 21:11:26,620::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 437, in start_monitoring
    self._initialize_storage_images()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 631, in _initialize_storage_images
    sserver = storage_server.StorageServer()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_server.py", line 39, in __init__
    self._domain_type = self._config.get(config.ENGINE, config.DOMAIN_TYPE)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/env/config.py", line 226, in get
    key
KeyError: 'Configuration value not found: file=/etc/ovirt-hosted-engine/hosted-engine.conf, key=domainType'
MainThread::INFO::2018-01-11 21:12:31,841::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down

With several errors in between.

Version-Release number of selected component (if applicable):
rhevm-4.1.8.2-0.1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Undeploy Hosted-Engine
2. Check ha-agent status and logs

Actual results:
ha-agent is in failed state. Logs show errors.

Expected results:
ha-agent stopped, clean logs with graceful shutdown.
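For illustration, the ordering that the description asks for (stop the HA daemons first, touch hosted-engine.conf only afterwards) might look roughly like the sketch below. This is not the otopi plugin or the ansible role code; the function names `undeploy` and `run_systemctl` are hypothetical, stopping the broker alongside the agent is an assumption, and the timestamp suffix only mimics the backup name seen in the deploy log.

```python
import os
import shutil
import subprocess
import time

# Unit names and path are taken from the logs in this report.
HA_SERVICES = ("ovirt-ha-agent.service", "ovirt-ha-broker.service")
CONF_PATH = "/etc/ovirt-hosted-engine/hosted-engine.conf"


def run_systemctl(*args):
    """Run systemctl with the given arguments and return its exit code."""
    return subprocess.call(("/bin/systemctl",) + args)


def undeploy(conf_path=CONF_PATH, services=HA_SERVICES, systemctl=run_systemctl):
    """Hypothetical undeploy step: stop services, then move the config aside.

    The services are stopped while the configuration file still exists,
    so the agent never has to reload a missing file during shutdown.
    """
    for service in services:
        rc = systemctl("stop", service)
        if rc != 0:
            raise RuntimeError("failed to stop %s (rc=%d)" % (service, rc))

    # Only now is it safe to back up and remove the configuration.
    if os.path.exists(conf_path):
        backup = conf_path + "." + time.strftime("%Y%m%d%H%M%S")
        shutil.move(conf_path, backup)
        return backup
    return None
```

The `systemctl` parameter is injectable so the ordering can be exercised without touching real units.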
Is this still reproducible?
Moving to QE and marking as test only. If this is reproducible we'll dig into fresh data.
Moving to the engine since ovirt-host-deploy is not going to be shipped in 4.4 and this should work with the ansible deployment as well.
ovirt-ha-broker.service was still running after undeployment of secondary ha-host.

ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-04-06 20:56:43 IDT; 19min ago
 Main PID: 38446 (ovirt-ha-broker)
    Tasks: 13 (limit: 178310)
   Memory: 43.2M
   CGroup: /system.slice/ovirt-ha-broker.service
           └─38446 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker

● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Mon 2020-04-06 21:14:39 IDT; 2min 0s ago
  Process: 38618 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent (code=exited, status=0/SUCCESS)
 Main PID: 38618 (code=exited, status=0/SUCCESS)

MainThread::INFO::2020-04-06 21:14:39,282::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Stopped VDSM domain monitor
MainThread::INFO::2020-04-06 21:14:39,283::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down

Agent was shut down gracefully without any errors.
Tested on:
ovirt-hosted-engine-ha-2.4.2-1.el8ev.noarch
ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch
rhvm-appliance.x86_64 2:4.4-20200403.0.el8ev
Linux 4.18.0-193.el8.x86_64 #1 SMP Fri Mar 27 14:35:58 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.2 (Ootpa)
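The manual status check from the verification above (agent inactive, broker still active) could be scripted along these lines. This is only a sketch: `query_state` and `unclean_services` are hypothetical helper names, and treating every state other than "inactive" as unclean is an assumption about what a clean undeploy should look like.

```python
import subprocess

# Unit names as they appear in the systemctl output in this report.
HA_SERVICES = ("ovirt-ha-agent.service", "ovirt-ha-broker.service")


def query_state(service):
    """Return the state printed by `systemctl is-active` (e.g. active, inactive, failed)."""
    proc = subprocess.run(
        ["systemctl", "is-active", service],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.stdout.strip()


def unclean_services(services=HA_SERVICES, query=query_state):
    """Map each service that is not cleanly stopped to its reported state."""
    states = {}
    for service in services:
        state = query(service)
        if state != "inactive":
            states[service] = state
    return states
```

With the states reported in the verification comment, this would flag only ovirt-ha-broker.service; on the originally reported 4.1 setup the agent would presumably show up as failed.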
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: RHV Manager (ovirt-engine) 4.4 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:3247