Bug 1415561 - [HE] Upgrade from 3.5 to 3.6 leaves agent.conf behind with old version
Summary: [HE] Upgrade from 3.5 to 3.6 leaves agent.conf behind with old version
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-ha
Version: 3.6.9
Hardware: x86_64
OS: Linux
medium
low
Target Milestone: ---
: ---
Assignee: Douglas Schilling Landgraf
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks: 1430513
Reported: 2017-01-23 04:44 UTC by Germano Veit Michel
Modified: 2022-04-16 08:50 UTC
13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-19 13:32:28 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHV-45699 | 0 | None | None | None | 2022-04-16 08:50:30 UTC
Red Hat Knowledge Base (Solution) | 2937271 | 0 | None | None | None | 2017-02-21 07:19:20 UTC
oVirt gerrit | 74161 | 0 | master | ABANDONED | migrate: Make sure base-score from agent.conf is set to 3400 | 2020-03-12 07:58:13 UTC

Description Germano Veit Michel 2017-01-23 04:44:58 UTC
Description of problem:

After upgrading from 3.5 to 3.6 by following the documentation, agent.conf on some hosts still contains the previous version's scores (2400 maximum).

This happened on different occasions, on three different hosts. I'm still trying to understand how.

They all look like this:

# cat etc/redhat-release 
Red Hat Enterprise Virtualization Hypervisor release 7.2 (20160920.1.el7ev)

# cat etc/ovirt-hosted-engine-ha/agent.conf 
[score]
# NOTE: These values must be the same for all hosts in the HA cluster!
base-score=2400
gateway-score-penalty=1600
mgmt-bridge-score-penalty=600
free-memory-score-penalty=400
cpu-load-score-penalty=1000
engine-retry-score-penalty=50
cpu-load-penalty-min=0.4
cpu-load-penalty-max=0.9

# cat mount | grep ovirt-hosted-engine-ha
none on /var/lib/ovirt-hosted-engine-ha type tmpfs (rw,relatime,seclabel)
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-ha type ext4 (rw,noatime,seclabel,data=ordered) [CONFIG]

These files are also persisted on the 3.6 hosts, but it looks like the agent.conf from the previous hypervisor image (3.5 image, 7.2-20160219) was persisted instead of the newer one.

* All upgrade logs have already been rotated.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Hypervisor release 7.2 (20160920.1.el7ev)
ovirt-hosted-engine-ha-1.3.5.8-1.el7ev.noarch

How reproducible:
0%

Actual results:
agent.conf is outdated: the host has a score of 2400 even though everything else is correct, including other files (e.g. hosted-engine.conf).

Expected results:
agent.conf is updated and the host has a score of 3400.
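A quick way to spot hosts in this state is to read base-score from agent.conf and compare it with the expected value. The sketch below assumes 3400 is the expected 3.6 value (as stated under "Expected results") and that the file parses as a standard INI file; the function names are hypothetical, not part of ovirt-hosted-engine-ha.

```python
import configparser

# Assumption: 3400 is the expected 3.6 base-score, per "Expected results" above.
EXPECTED_BASE_SCORE = 3400


def read_base_score(text: str) -> int:
    """Return base-score from the [score] section of an agent.conf body."""
    parser = configparser.ConfigParser()
    parser.read_string(text)  # "#" lines are treated as comments by default
    return parser.getint("score", "base-score")


def is_outdated(text: str, expected: int = EXPECTED_BASE_SCORE) -> bool:
    """True when the file carries a base-score other than the expected one."""
    return read_base_score(text) != expected


# Excerpt of the agent.conf shown in the description.
sample = """\
[score]
# NOTE: These values must be the same for all hosts in the HA cluster!
base-score=2400
gateway-score-penalty=1600
"""

print(read_base_score(sample), is_outdated(sample))  # 2400 True
```

On a host, the same check would be run against the contents of /etc/ovirt-hosted-engine-ha/agent.conf.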

Comment 6 Germano Veit Michel 2017-01-30 05:12:43 UTC
Hi Douglas,

Hosts persisted like this seem to hit the problem:

/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-ha
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-setup.env.d

Whereas these ones do not:

/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-ha/broker.conf
/dev/mapper/HostVG-Config on /var/log/ovirt-hosted-engine-setup
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-setup.env.d
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/answers.conf
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/hosted-engine.conf
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/iptables.example
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/vm.conf

Note that the latter does not have persistence under /etc/ovirt-hosted-engine-ha.

What is the correct persistence setting for 3.5 (20160219.0.el7ev)? These were all deployed the same way; it seems to be random.
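The observation above suggests a simple heuristic: a host looks affected when a whole hosted-engine configuration directory, rather than individual files, is mounted from the config device. The sketch below encodes only that heuristic from this comment, not a confirmed diagnostic; the device name is taken from the mount output in this bug, and the function names are hypothetical. /etc/ovirt-hosted-engine-setup.env.d is left out because it appears on both kinds of host.

```python
from typing import Iterable, List

# Persistence device as it appears in the mount output in this bug.
CONFIG_DEVICE = "/dev/mapper/HostVG-Config"

# Heuristic from comment 6: affected hosts had these directories mounted as a whole.
DIRECTORY_TARGETS = {"/etc/ovirt-hosted-engine", "/etc/ovirt-hosted-engine-ha"}


def config_mount_targets(lines: Iterable[str], device: str = CONFIG_DEVICE) -> List[str]:
    """Return the mount points backed by the config device in `mount`-style lines."""
    prefix = device + " on "
    targets = []
    for line in lines:
        line = line.strip()
        if not line.startswith(prefix):
            continue
        rest = line[len(prefix):]
        # Full `mount` output continues with " type <fs> (...)"; the excerpts stop at the path.
        targets.append(rest.split(" type ", 1)[0].strip())
    return targets


def has_directory_persistence(lines: Iterable[str]) -> bool:
    """True when a whole hosted-engine config directory is mounted from the config device."""
    return any(t in DIRECTORY_TARGETS for t in config_mount_targets(lines))


affected = [
    "/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine",
    "/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-ha",
    "/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-setup.env.d",
]
unaffected = [
    "/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-ha/broker.conf",
    "/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/hosted-engine.conf",
]
print(has_directory_persistence(affected), has_directory_persistence(unaffected))  # True False
```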

Comment 7 Douglas Schilling Landgraf 2017-02-01 00:58:09 UTC
Hi Germano,

(In reply to Germano Veit Michel from comment #6)
> Hi Douglas,
> 
> Hosts persisted like this seem to hit the problem:
> 
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-ha
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-setup.env.d
> 
> Whereas these ones do not:
> 
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-ha/broker.conf
> /dev/mapper/HostVG-Config on /var/log/ovirt-hosted-engine-setup
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-setup.env.d
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/answers.conf
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/hosted-engine.conf
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/iptables.example
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/vm.conf

Is this 3.5 vs 3.6 persistence, or only 3.5?

> 
> Note that the latter does not have persistence under
> /etc/ovirt-hosted-engine-ha.
> 
> What is the correct persistence setting for 3.5 (20160219.0.el7ev)? These
> were all deployed the same way; it seems to be random.

Could you please share how the hosts were deployed, so we can try to reproduce it?

Comment 8 Germano Veit Michel 2017-02-08 06:52:51 UTC
Hi Douglas,

Sorry, I only just saw this (the needinfo was missing).

(In reply to Douglas Schilling Landgraf from comment #7)
> Is this 3.5 vs 3.6 persistence, or only 3.5?

Only 3.5, all on the same version: 20160219.0.el7ev. I was wondering whether it could be something specific to the first host (id=1?).

> Could you please share how the hosts were deployed, so we can try to reproduce it?

Nothing special; they were deployed following the documentation [1].

[1] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.5/html/Installation_Guide/Installing_Additional_Hosts_to_a_Self-Hosted_Environment.html

Comment 10 Sandro Bonazzola 2017-02-14 08:23:46 UTC
Douglas, can you please follow up?

Comment 11 Sandro Bonazzola 2017-03-07 09:47:18 UTC
Simone, can you help Douglas with this?

Comment 12 Douglas Schilling Landgraf 2017-03-09 04:39:34 UTC
Just to share some tests:

I have tried to reproduce this report by running several node upgrades, and so far it has not been possible. The only way I could reproduce it was by forcing the persist and then upgrading. Still under investigation.

Comment 13 Douglas Schilling Landgraf 2017-03-13 20:36:37 UTC
I was able to reproduce the issue: the file was persisted during the deploy, and after the node upgrade base-score still shows 2400.
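The gerrit change listed under Links (74161, later abandoned) is titled "migrate: Make sure base-score from agent.conf is set to 3400". A minimal sketch of that kind of targeted fix follows; it is not the actual patch, and the function name is hypothetical. It rewrites only the base-score line so that comments and the other keys survive, which a configparser round-trip would not guarantee. Because agent.conf notes that these values must match on all hosts in the HA cluster, such a fix would have to be applied everywhere, and it addresses only this one value.

```python
import re

# Matches a "base-score = <digits>" line; group 1 keeps the key and its spacing.
BASE_SCORE_RE = re.compile(r"^([ \t]*base-score[ \t]*=[ \t]*)\d+[ \t]*$", re.MULTILINE)


def force_base_score(text: str, value: int = 3400) -> str:
    """Return agent.conf text with base-score set to value; other lines are untouched."""
    new_text, count = BASE_SCORE_RE.subn(lambda m: f"{m.group(1)}{value}", text)
    if count == 0:
        raise ValueError("no base-score line found")
    return new_text


old = (
    "[score]\n"
    "# NOTE: These values must be the same for all hosts in the HA cluster!\n"
    "base-score=2400\n"
    "gateway-score-penalty=1600\n"
)
print(force_base_score(old))
```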

Comment 14 Germano Veit Michel 2017-03-15 23:48:08 UTC
Douglas, thanks for working on it.

But looking at the patch worried me a bit: as you can see in https://bugzilla.redhat.com/show_bug.cgi?id=1415561#c5 this is not an isolated case of the agent.conf base score. It is mostly that, but sometimes other files are left behind too (e.g. hosted-engine.conf). Maybe the solution should address the root cause rather than ensure that a specific set of values is updated?

Comment 15 Douglas Schilling Landgraf 2017-03-16 00:41:09 UTC
Hi Germano,

(In reply to Germano Veit Michel from comment #14)
> Douglas, thanks for working on it.
> 
> But looking at the patch worried me a bit: as you can see in
> https://bugzilla.redhat.com/show_bug.cgi?id=1415561#c5 this is not an
> isolated case of the agent.conf base score. It is mostly that, but sometimes
> other files are left behind too (e.g. hosted-engine.conf). Maybe the
> solution should address the root cause rather than ensure that a specific
> set of values is updated?

In this case, I had understood that agent.conf was the only affected file, but comment #5 shows that other files might be involved as well. Let me check other possibilities.

Comment 16 Germano Veit Michel 2017-03-16 00:45:20 UTC
(In reply to Douglas Schilling Landgraf from comment #15)
> In this case, I had understood that agent.conf was the only affected file,
> but comment #5 shows that other files might be involved as well. Let me
> check other possibilities.

Excellent. Please let me know if you need logs or more info; I can try to dig it out of the data we have.

Comment 18 Douglas Schilling Landgraf 2017-03-16 15:48:06 UTC
Germano, 

Before the upgrade, could you please confirm the output of:

# rpm -Va ovirt-hosted-engine-ha
# rpm -Va ovirt-hosted-engine-setup


Thanks!
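rpm -V prints one line per file that fails verification: a 9-character result string (S size, M mode, 5 digest, D device, L link, U user, G group, T mtime, P capabilities; "." means the test passed), an optional attribute marker such as "c" for %config files, and the path. A sketch for pulling out the config files whose content differs from the package might look like this; it assumes paths contain no spaces, the function names are hypothetical, and the sample lines are made up for illustration rather than captured from the affected hosts.

```python
from typing import List, NamedTuple


class VerifyEntry(NamedTuple):
    flags: str       # 9-character result string (e.g. "S.5....T."), or "missing"
    is_config: bool  # True when rpm prints the "c" (%config) attribute marker
    path: str


def parse_rpm_verify(output: str) -> List[VerifyEntry]:
    """Parse `rpm -V` style output; assumes file paths contain no spaces."""
    entries = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) not in (2, 3):
            continue  # skip blank or unexpected lines
        flags, path = parts[0], parts[-1]
        attr = parts[1] if len(parts) == 3 else ""
        entries.append(VerifyEntry(flags, attr == "c", path))
    return entries


def modified_config_files(output: str) -> List[str]:
    """Config files whose digest check (third flag, "5") failed."""
    return [
        e.path
        for e in parse_rpm_verify(output)
        if e.is_config and len(e.flags) == 9 and e.flags[2] == "5"
    ]


# Made-up sample lines, not output captured from the affected hosts.
sample = """\
S.5....T.  c /etc/ovirt-hosted-engine-ha/agent.conf
.......T.  c /etc/ovirt-hosted-engine-ha/broker.conf
"""
print(modified_config_files(sample))  # ['/etc/ovirt-hosted-engine-ha/agent.conf']
```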

Comment 19 Germano Veit Michel 2017-03-16 23:28:49 UTC
(In reply to Douglas Schilling Landgraf from comment #18)
> Germano, 
> 
> Before the upgrade, could you please confirm the output of:
> 
> # rpm -Va ovirt-hosted-engine-ha
> # rpm -Va ovirt-hosted-engine-setup
> 
> 
> Thanks!

Hi Douglas,

All these upgrades were from 20160219.0.el7ev (3.5) to 20160920.1.el7ev (3.6), so the package versions are fixed. See below:

Pre-Upgrade (20160219.0.el7ev)
  RHV 3.5.8
  ovirt-hosted-engine-setup-1.2.6.1-1.el7ev.noarch
  ovirt-hosted-engine-ha-1.2.10-1.el7ev.noarch

Post-Upgrade (20160920.1.el7ev)
  RHV 3.6.9
  ovirt-hosted-engine-ha-1.3.5.8-1.el7ev.noarch
  ovirt-hosted-engine-setup-1.3.7.3-1.el7ev.noarch

Comment 20 Germano Veit Michel 2017-03-16 23:34:22 UTC
Oops, I now realize you wanted to verify the packages, not just query their versions.

Unfortunately, the customer has already upgraded all the environments, and the rpm -Va output is not captured in the sosreports. If I see this again, I can paste the output here.

Sorry

Comment 21 Douglas Schilling Landgraf 2017-03-17 16:53:17 UTC
I believe we have figured out the root cause of this report. It dates back to when we created the hosted-engine plugin in ovirt-node-plugin-hosted-engine, in the early days of 3.5: after the deploy, the plugin persists everything in
"/etc/ovirt-hosted-engine", "/etc/ovirt-hosted-engine-ha" and "/etc/ovirt-hosted-engine-setup.env.d":

<snip>
            def run_additional(*args):
                with self.application.ui.suspended():
                    try:
                        utils.process.call(
                            "reset; screen hosted-engine --deploy",
                            shell=True)
                        sys.stdout.write("Press <Return> to return to the TUI")
                        console.wait_for_keypress()
                        self.__persist_configs()
</snip>


<snip>
    def __persist_configs(self):
        dirs = ["/etc/ovirt-hosted-engine", "/etc/ovirt-hosted-engine-ha",
                "/etc/ovirt-hosted-engine-setup.env.d"]
        [Config().persist(d) for d in dirs]
</snip>
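Why persisting whole directories leaves old files behind can be illustrated with a small model. It assumes, consistently with the mount output in the description, that a persisted directory is mounted from the config partition as a whole, hiding whatever the new image ships there, while a persisted file replaces only that file. This is a simplified model, not the ovirt-node persistence implementation, and the names are hypothetical.

```python
from typing import Dict, Iterable


def under(path: str, directory: str) -> bool:
    """True when path lies inside directory."""
    return path.startswith(directory.rstrip("/") + "/")


def effective_view(image_files: Dict[str, str],
                   persisted: Dict[str, str],
                   persisted_entries: Iterable[str]) -> Dict[str, str]:
    """Model of what /etc looks like after booting a new image.

    image_files:       path -> content shipped by the new image
    persisted:         path -> content saved on the config partition
    persisted_entries: paths registered for persistence (files or directories)
    """
    view = dict(image_files)
    for entry in persisted_entries:
        if entry in persisted:
            # File-level persistence: only this file is replaced by the saved copy.
            view[entry] = persisted[entry]
        else:
            # Directory-level persistence: the mounted directory hides everything
            # the new image ships there and exposes the saved copies instead.
            for path in [p for p in view if under(p, entry)]:
                del view[path]
            for path, content in persisted.items():
                if under(path, entry):
                    view[path] = content
    return view


AGENT = "/etc/ovirt-hosted-engine-ha/agent.conf"
BROKER = "/etc/ovirt-hosted-engine-ha/broker.conf"
image = {AGENT: "base-score=3400", BROKER: "broker from new image"}
saved = {AGENT: "base-score=2400", BROKER: "broker saved at deploy"}

# One of the directories that __persist_configs registers as a whole.
print(effective_view(image, saved, ["/etc/ovirt-hosted-engine-ha"])[AGENT])  # base-score=2400
# File-level persistence of broker.conf only, as on the unaffected hosts.
print(effective_view(image, saved, [BROKER])[AGENT])  # base-score=3400
```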


We could write a pre-upgrade hook that unpersists everything in these directories, or use rpm -Va to find which files were changed on the filesystem and compare them with the files in the new image, but there is no safe way to know which files should stay persisted. In this scenario, it seems preferable to recommend the KCS article created by the customer and SEG.
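The comparison idea above can be sketched as a classification of the persisted files, which also shows where the lack of a safe guarantee comes from. The sketch assumes two inputs a hypothetical hook would have to gather first: the set of paths owned by some package (for example via rpm -qf) and the set of package-owned paths that rpm -V reports as modified. The function name and the ownership data in the example are made up for illustration. Only the first bucket could plausibly be unpersisted automatically; the other two still need a human decision.

```python
from typing import Dict, Iterable, List, Set


def classify_persisted(persisted_files: Iterable[str],
                       package_owned: Set[str],
                       modified: Set[str]) -> Dict[str, List[str]]:
    """Split persisted files into buckets for a hypothetical pre-upgrade hook."""
    buckets: Dict[str, List[str]] = {
        "unpersist_candidate": [],  # unchanged package file: the new image's copy should do
        "modified_review": [],      # locally edited package file: needs a decision
        "unowned_review": [],       # not owned by any package (e.g. generated): needs a decision
    }
    for path in sorted(persisted_files):
        if path not in package_owned:
            buckets["unowned_review"].append(path)
        elif path in modified:
            buckets["modified_review"].append(path)
        else:
            buckets["unpersist_candidate"].append(path)
    return buckets


AGENT = "/etc/ovirt-hosted-engine-ha/agent.conf"
BROKER = "/etc/ovirt-hosted-engine-ha/broker.conf"
HE_CONF = "/etc/ovirt-hosted-engine/hosted-engine.conf"

# Made-up ownership and modification data, for illustration only.
result = classify_persisted([AGENT, BROKER, HE_CONF],
                            package_owned={AGENT, BROKER},
                            modified={BROKER})
for bucket, paths in result.items():
    print(bucket, paths)
```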

Comment 22 Simone Tiraboschi 2017-03-17 17:04:38 UTC
hosted-engine-setup should persist all the files it needs by itself, but we now understand that everything under /etc/ovirt-hosted-engine, /etc/ovirt-hosted-engine-ha and /etc/ovirt-hosted-engine-setup.env.d was also persisted by ovirt-node-plugin-hosted-engine. So, if we now prevent ovirt-node-plugin-hosted-engine from doing that, we should carefully retest that hosted-engine-setup on 3.6 really persists everything that needs to be persisted.

Given that this is the last 3.6.z release, that the issue is implicitly fixed by NGN in later versions, and that it only affects customers who upgrade a hosted-engine environment from 3.5 to 3.6 without now moving on to 4.0, I would also suggest avoiding a potentially risky code fix and simply addressing it with a KCS article, since the workaround is also pretty simple.

