Bug 1857793

Summary: Global Maintenance by cli causes penalizing score of the host with HE VM on it.
Product: [oVirt] ovirt-hosted-engine-ha Reporter: Polina <pagranat>
Component: Agent Assignee: Asaf Rachmani <arachman>
Status: CLOSED CURRENTRELEASE QA Contact: Polina <pagranat>
Severity: medium Docs Contact:
Priority: medium    
Version: --- CC: bugs, dagur, michal.skrivanek
Target Milestone: ovirt-4.4.3 Keywords: Automation
Target Release: --- Flags: pm-rhel: ovirt-4.4+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ovirt-hosted-engine-ha-2.4.5 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-11-11 06:39:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Polina 2020-07-16 15:10:16 UTC
Created attachment 1701412
logs

Description of problem:
Setting global maintenance from the CLI with 'hosted-engine --set-maintenance --mode=global' penalizes the score of the host that runs the HE VM. The host must keep its score of 3400 during global maintenance.

Version-Release number of selected component (if applicable):
rhv-4.4.1-11

How reproducible: Happened several times during Automation tier2 runs. I didn't reproduce it manually. It affected the test results a lot.

Steps to Reproduce:
1. We have three hosts in a healthy state, max score 3400 each. The HE VM is running on host1.
2. Setting global maintenance from the CLI with 'hosted-engine --set-maintenance --mode=global' penalizes the score of host1.

MainThread::INFO::2020-07-13 20:52:28,662::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineUp (score: 3400)
MainThread::INFO::2020-07-13 20:52:38,691::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Global maintenance detected
MainThread::INFO::2020-07-13 20:52:38,747::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineUp-GlobalMaintenance) sent? ignored
MainThread::INFO::2020-07-13 20:52:38,908::states::72::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_penalize_memory) Penalizing score by 400 due to free memory 16069 being lower than required 16384
MainThread::INFO::2020-07-13 20:52:38,908::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state GlobalMaintenance (score: 3000)

As soon as 'hosted-engine --set-maintenance --mode=none' is sent, the penalty is canceled.
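
The score change in the agent log excerpt above can be extracted mechanically. The following is a minimal sketch, not part of ovirt-hosted-engine-ha: the function name parse_agent_log and the two regular expressions are illustrative assumptions based only on the log lines quoted in this report.

```python
import re

# Log lines copied from the excerpt in this report.
LOG = """\
MainThread::INFO::2020-07-13 20:52:28,662::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineUp (score: 3400)
MainThread::INFO::2020-07-13 20:52:38,908::states::72::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_penalize_memory) Penalizing score by 400 due to free memory 16069 being lower than required 16384
MainThread::INFO::2020-07-13 20:52:38,908::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state GlobalMaintenance (score: 3000)
"""

# Assumed patterns, derived from the message text above.
STATE_RE = re.compile(r"Current state (\w+) \(score: (\d+)\)")
PENALTY_RE = re.compile(
    r"Penalizing score by (\d+) due to free memory (\d+) "
    r"being lower than required (\d+)"
)


def parse_agent_log(text):
    """Return ([(state, score)], [(penalty, free_mb, required_mb)])."""
    states, penalties = [], []
    for line in text.splitlines():
        match = STATE_RE.search(line)
        if match:
            states.append((match.group(1), int(match.group(2))))
            continue
        match = PENALTY_RE.search(line)
        if match:
            penalties.append(tuple(int(g) for g in match.groups()))
    return states, penalties


states, penalties = parse_agent_log(LOG)
for state, score in states:
    print(state, score)
for penalty, free_mb, required_mb in penalties:
    # Shortfall that triggered the penalty, in MB.
    print("penalty", penalty, "shortfall", required_mb - free_mb)
```

For this excerpt the memory shortfall is 16384 - 16069 = 315 MB, and the 400-point penalty accounts for the whole drop from 3400 to 3000.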


Actual results:

The score of host1 drops from 3400 to 3000. The HE VM is already running on the host, so it must not be a problem that the host has 16069 MB of free memory left. This is not a valid reason for penalizing the score.


Expected results:
After global maintenance is set, the expected score is 3400.

Additional info:

Comment 1 Michal Skrivanek 2020-07-17 04:31:44 UTC
> HE VM is already running on the host and it must not be a problem that the host has 16069 MB free memory after this. This is not the reason for penalizing score.

Are you saying that reason is bogus and there is more memory available, or that it considers it in addition to the currently used memory by HE VM?
Anyway, it doesn’t have any ill effect, does it?

Comment 2 Polina 2020-07-19 08:01:03 UTC
(In reply to Michal Skrivanek from comment #1)
> > HE VM is already running on the host and it must not be a problem that the host has 16069 MB free memory after this. This is not the reason for penalizing score.
> 
> Are you saying that reason is bogus and there is more memory available, or
> that it considers it in addition to the currently used memory by HE VM?

The second: the HE VM is already running on this host. It is completely OK that the host has 16069 MB of memory left, and that must not be a reason for penalizing the score.

> Anyway, it doesn’t have any ill effect, does it?

It looks like a changed behavior (we have a check that, after setting Global Maintenance, the score for all hosts is 3400). As often happens, the ill effect is on automation runs (I don't know if users could be affected by this change): in many cases we check the hosts' health in the setup, before the test starts, and in such cases the test will fail at the setup stage.
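
A health check of the kind described above could look roughly like the sketch below. It is not the actual automation code. The SAMPLE data is made up for illustration, and the layout (one entry per host ID carrying "hostname" and "score", plus a top-level "global_maintenance" flag) is an assumption modeled on the output of 'hosted-engine --vm-status --json'. The helper name penalized_hosts is hypothetical.

```python
import json

EXPECTED_SCORE = 3400  # max score of a healthy host, per this report

# Hypothetical status data; the real output has more fields per host.
SAMPLE = json.loads("""
{
  "1": {"hostname": "host1", "score": 3000},
  "2": {"hostname": "host2", "score": 3400},
  "3": {"hostname": "host3", "score": 3400},
  "global_maintenance": true
}
""")


def penalized_hosts(status, expected=EXPECTED_SCORE):
    """Map hostname -> score for every host whose score differs from expected."""
    bad = {}
    for entry in status.values():
        # Skip non-host entries such as the global_maintenance flag.
        if isinstance(entry, dict) and entry.get("score") != expected:
            bad[entry.get("hostname")] = entry.get("score")
    return bad


print(penalized_hosts(SAMPLE))
```

With the sample data, only host1 is reported, which is the situation that makes the setup stage fail.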

Comment 4 Polina 2020-10-21 20:32:29 UTC
verified on ovirt-engine-4.4.3.7-0.22.el8ev.noarch by running automation tests

Comment 5 Sandro Bonazzola 2020-11-11 06:39:35 UTC
This bugzilla is included in oVirt 4.4.3 release, published on November 10th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.