Bug 1535796

Summary: Undeployment of HE is not graceful
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-engine
Assignee: Evgeny Slutsky <eslutsky>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.1.8
CC: dougsland, eslutsky, lsurette, lsvaty, mavital, pelauter, rdlugyhe, srevivo
Target Milestone: ovirt-4.4.0
Keywords: TestOnly, Triaged
Target Release: ---
Flags: lsvaty: testing_plan_complete-
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: rhv-4.4.0-28
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-08-04 13:16:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Germano Veit Michel 2018-01-18 06:45:52 UTC
Description of problem:

During Hosted-Engine undeploy on host reinstall, the hosted-engine.conf file is apparently removed first, and only then is the HA daemon stopped.

The shutdown (undeploy) is not clean: the agent goes into a failed state (systemd) and fills the logs with unnecessary errors.

Deploy logs:
2018-01-11 21:11:08 DEBUG otopi.filetransaction filetransaction.prepare:219 backup '/etc/ovirt-hosted-engine/hosted-engine.conf'->'/etc/ovirt-hosted-engine/hosted-engine.conf.20180111211108'                                                                 
2018-01-11 21:11:24 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/bin/systemctl', 'stop', 'ovirt-ha-agent.service'), executable='None', cwd='None', env=None                                                                    
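
For illustration only, the ordering that would avoid the errors can be sketched as follows: stop the HA services while the configuration is still readable, and only then move the file aside. This is not the otopi or ansible code. The function name undeploy_gracefully, the injectable run parameter, and stopping ovirt-ha-broker along with the agent are assumptions made for this sketch; the paths, the unit names, and the timestamped backup suffix follow the logs in this report.

```python
import os
import subprocess
import time

# Unit names and the configuration path are taken from the logs in this report.
HA_UNITS = ("ovirt-ha-agent.service", "ovirt-ha-broker.service")
CONF = "/etc/ovirt-hosted-engine/hosted-engine.conf"


def undeploy_gracefully(conf_path=CONF, run=subprocess.check_call):
    """Hypothetical helper: stop the HA services, then move the config aside.

    `run` is injectable so the ordering can be exercised without systemd.
    """
    for unit in HA_UNITS:
        # The configuration still exists here, so the agent can shut down
        # without hitting "Configuration value not found".
        run(["/bin/systemctl", "stop", unit])

    backup = None
    if os.path.exists(conf_path):
        # Same naming pattern as the backup seen in the deploy log.
        backup = conf_path + "." + time.strftime("%Y%m%d%H%M%S")
        os.rename(conf_path, backup)
    return backup
```

The point of the sketch is only the order of the two steps; how the real deployment flow backs up or removes the file may differ.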

From agent side:

MainThread::ERROR::2018-01-11 21:11:26,620::config::163::ovirt_hosted_engine_ha.lib.storage_server.StorageServer.config::(_load_single_conf_file) Configuration file '/etc/ovirt-hosted-engine/hosted-engine.conf' not available [[Errno 2] No such file or directory: '/etc/ovirt-hosted-engine/hosted-engine.conf']                          
MainThread::WARNING::2018-01-11 21:11:26,620::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: 'Configuration value not found: file=/etc/ovirt-hosted-engine/hosted-engine.conf, key=domainType'                                                                 
MainThread::WARNING::2018-01-11 21:11:26,620::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error                                                                                                 
Traceback (most recent call last):                                                                                                                                                                                                                             
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 437, in start_monitoring                                                                                                                                         
    self._initialize_storage_images()                                                                                                                                                                                                                          
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 631, in _initialize_storage_images                                                                                                                               
    sserver = storage_server.StorageServer()                                                                                                                                                                                                                   
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_server.py", line 39, in __init__                                                                                                                                                   
    self._domain_type = self._config.get(config.ENGINE, config.DOMAIN_TYPE)                                                                                                                                                                                    
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/env/config.py", line 226, in get                                                                                                                                                               
    key                                                                                                                                                                                                                                                        
KeyError: 'Configuration value not found: file=/etc/ovirt-hosted-engine/hosted-engine.conf, key=domainType'  

MainThread::INFO::2018-01-11 21:12:31,841::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down                                                                                                                                     

With several errors in between.
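
To quantify "several errors" when checking a host, a small parser along these lines could count the WARNING and ERROR records in the agent log. It is a sketch that assumes the "::"-separated layout visible in the excerpt above (thread, level, timestamp, module, line number, logger, message); Record, parse_agent_log and noisy_levels are names made up for this example, and the sample lines are abbreviated copies of the ones above.

```python
from collections import Counter, namedtuple

# Field order as seen in the agent log excerpt above (assumed stable).
Record = namedtuple(
    "Record", "thread level timestamp module lineno logger message"
)


def parse_agent_log(lines):
    """Return a Record for every line that matches the '::' layout."""
    records = []
    for line in lines:
        parts = line.rstrip().split("::", 6)
        # Traceback and continuation lines do not split into seven fields.
        if len(parts) != 7 or not parts[4].isdigit():
            continue
        records.append(Record(*parts))
    return records


def noisy_levels(records):
    """Count WARNING and ERROR records."""
    return Counter(r.level for r in records if r.level in ("WARNING", "ERROR"))


SAMPLE = [
    "MainThread::ERROR::2018-01-11 21:11:26,620::config::163::"
    "ovirt_hosted_engine_ha.lib.storage_server.StorageServer.config::"
    "(_load_single_conf_file) Configuration file not available",
    "MainThread::WARNING::2018-01-11 21:11:26,620::hosted_engine::472::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
    "(start_monitoring) Unexpected error",
    "Traceback (most recent call last):",
    "MainThread::INFO::2018-01-11 21:12:31,841::agent::144::"
    "ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down",
]

if __name__ == "__main__":
    print(dict(noisy_levels(parse_agent_log(SAMPLE))))
```

A graceful shutdown would be expected to leave this count empty for the window between the undeploy start and "Agent shutting down".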

Version-Release number of selected component (if applicable):
rhevm-4.1.8.2-0.1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Undeploy Hosted-Engine
2. Check ha-agent status and logs

Actual results:
ha-agent is in a failed state and the logs show errors.

Expected results:
ha-agent is stopped, and the logs are clean and show a graceful shutdown.
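
Step 2 of the reproducer can be scripted. The sketch below reads the unit state from `systemctl show -p ActiveState <unit>`, which prints a line such as `ActiveState=failed`. The helper names active_state and stopped_cleanly and the injectable run parameter are assumptions for this example; treating "inactive" as the only clean result is this sketch's reading of the expected results above.

```python
import subprocess


def active_state(unit, run=subprocess.check_output):
    """Return the ActiveState systemd reports for `unit`, or 'unknown'."""
    out = run(["systemctl", "show", "-p", "ActiveState", unit])
    if isinstance(out, bytes):
        out = out.decode("utf-8", "replace")
    for line in out.splitlines():
        key, sep, value = line.partition("=")
        if sep and key == "ActiveState":
            return value.strip()
    return "unknown"


def stopped_cleanly(unit, run=subprocess.check_output):
    """True when the unit is stopped without being marked as failed."""
    return active_state(unit, run) == "inactive"


if __name__ == "__main__":
    for name in ("ovirt-ha-agent.service", "ovirt-ha-broker.service"):
        print(name, active_state(name))
```

On a host showing this bug the agent would be expected to report "failed" rather than "inactive".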

Comment 2 Daniel Gur 2019-08-28 13:13:49 UTC
sync2jira

Comment 3 Daniel Gur 2019-08-28 13:18:03 UTC
sync2jira

Comment 4 Sandro Bonazzola 2019-11-13 08:45:24 UTC
Is this still reproducible?

Comment 5 Sandro Bonazzola 2020-03-02 13:53:13 UTC
Moving to QE and marking as test only. If this is reproducible we'll dig into fresh data.

Comment 6 Sandro Bonazzola 2020-03-17 13:57:39 UTC
Moving to the engine since ovirt-host-deploy is not going to be shipped in 4.4 and this should work with the ansible deployment as well.

Comment 8 Nikolai Sednev 2020-04-06 18:21:19 UTC
ovirt-ha-broker.service was still running after undeployment of the secondary HA host.


● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-04-06 20:56:43 IDT; 19min ago
 Main PID: 38446 (ovirt-ha-broker)
    Tasks: 13 (limit: 178310)
   Memory: 43.2M
   CGroup: /system.slice/ovirt-ha-broker.service
           └─38446 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Mon 2020-04-06 21:14:39 IDT; 2min 0s ago
  Process: 38618 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent (code=exited, status=0/SUCCESS)
 Main PID: 38618 (code=exited, status=0/SUCCESS)

MainThread::INFO::2020-04-06 21:14:39,282::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Stopped VDSM domain monitor
MainThread::INFO::2020-04-06 21:14:39,283::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down

Agent was shut down gracefully without any errors.

Comment 9 Nikolai Sednev 2020-04-06 18:22:42 UTC
Tested on:
ovirt-hosted-engine-ha-2.4.2-1.el8ev.noarch
ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch
rhvm-appliance.x86_64 2:4.4-20200403.0.el8ev
Linux 4.18.0-193.el8.x86_64 #1 SMP Fri Mar 27 14:35:58 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.2 (Ootpa)

Comment 17 errata-xmlrpc 2020-08-04 13:16:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: RHV Manager (ovirt-engine) 4.4 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3247