Description of problem:

After upgrading from 3.5 to 3.6 by following the documentation, agent.conf still contains the previous version's scores (2400 max) on some hosts. This happened on different occasions, with 3 different hosts. I'm still trying to understand how.

They all look like this:

# cat etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 7.2 (20160920.1.el7ev)

# cat etc/ovirt-hosted-engine-ha/agent.conf
[score]
# NOTE: These values must be the same for all hosts in the HA cluster!
base-score=2400
gateway-score-penalty=1600
mgmt-bridge-score-penalty=600
free-memory-score-penalty=400
cpu-load-score-penalty=1000
engine-retry-score-penalty=50
cpu-load-penalty-min=0.4
cpu-load-penalty-max=0.9

# cat mount | grep ovirt-hosted-engine-ha
none on /var/lib/ovirt-hosted-engine-ha type tmpfs (rw,relatime,seclabel)
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-ha type ext4 (rw,noatime,seclabel,data=ordered) [CONFIG]

These are also persisted on 3.6 hosts. But it looks like the agent.conf from the previous hypervisor image (3.5 image - 7.2-20160219) is persisted, instead of the newer one.

* All upgrade logs rotated

Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Hypervisor release 7.2 (20160920.1.el7ev)
ovirt-hosted-engine-ha-1.3.5.8-1.el7ev.noarch

How reproducible:
0%

Actual results:
agent.conf outdated, host with 2400 Score even when all is correct and other files are correct (i.e.: hosted-engine.conf)

Expected results:
agent.conf updated, host with 3400 Score.
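To make the symptom easier to check across hosts, here is a minimal Python sketch that reads the [score] section of an agent.conf and flags a base-score that does not match the expected value. It assumes the 3.6 agent.conf ships base-score=3400 (inferred from the expected results above, not verified against the package); the function names and the shortened sample text are only for illustration.

```python
import configparser

# Assumption: 3.6 ships base-score=3400, inferred from the expected results.
EXPECTED_BASE_SCORE = 3400


def read_base_score(text):
    """Return base-score from the [score] section of agent.conf content."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.getint("score", "base-score")


def is_stale(text, expected=EXPECTED_BASE_SCORE):
    """True when the file still carries a base-score other than `expected`."""
    return read_base_score(text) != expected


# Shortened copy of the agent.conf quoted in the description.
SAMPLE = """\
[score]
# NOTE: These values must be the same for all hosts in the HA cluster!
base-score=2400
gateway-score-penalty=1600
"""

if __name__ == "__main__":
    print("stale agent.conf:", is_stale(SAMPLE))
```

On a real host the content would come from /etc/ovirt-hosted-engine-ha/agent.conf (or the same path inside a sosreport) rather than from SAMPLE.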
Hi Douglas,

Hosts persisted like this seem to hit the problem:

/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-ha
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-setup.env.d

Whereas these ones do not:

/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-ha/broker.conf
/dev/mapper/HostVG-Config on /var/log/ovirt-hosted-engine-setup
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-setup.env.d
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/answers.conf
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/hosted-engine.conf
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/iptables.example
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/vm.conf

Note that the latter do not have the whole /etc/ovirt-hosted-engine-ha directory persisted, only individual files under it.

What is the correct persistence setting for 3.5 (20160219.0.el7ev)? These were all deployed the same way, so it seems to be random.
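The difference between the two groups of hosts can be checked mechanically. The following Python sketch parses mount-style output and reports whether /etc/ovirt-hosted-engine-ha is persisted as a whole directory, which is the pattern seen on the affected hosts. The function names are hypothetical, and the parser only assumes the "<device> on <mountpoint>" line shape shown above.

```python
HA_DIR = "/etc/ovirt-hosted-engine-ha"
CONFIG_DEVICE = "/dev/mapper/HostVG-Config"


def persisted_targets(mount_output, device=CONFIG_DEVICE):
    """Return the mount points backed by the persistent config device."""
    targets = []
    for line in mount_output.splitlines():
        fields = line.split()
        # Expected line shape: <device> on <mountpoint> [type ...]
        if len(fields) >= 3 and fields[0] == device and fields[1] == "on":
            targets.append(fields[2])
    return targets


def whole_dir_persisted(mount_output, directory=HA_DIR):
    """True when the directory itself, not just files in it, is persisted."""
    return directory in persisted_targets(mount_output)


# Lines copied from the two host groups in the comment above.
AFFECTED = """\
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-ha
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-setup.env.d
"""

UNAFFECTED = """\
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-ha/broker.conf
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/hosted-engine.conf
/dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/vm.conf
"""

if __name__ == "__main__":
    print("affected host:", whole_dir_persisted(AFFECTED))
    print("unaffected host:", whole_dir_persisted(UNAFFECTED))
```

Feeding it the "mount" file from a sosreport would give a quick way to sort hosts into the two groups before an upgrade.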
Hi Germano,

(In reply to Germano Veit Michel from comment #6)
> Hi Douglas,
>
> Hosts persisted like this seem to hit the problem:
>
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-ha
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-setup.env.d
>
> Whereas these ones, do not:
>
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-ha/broker.conf
> /dev/mapper/HostVG-Config on /var/log/ovirt-hosted-engine-setup
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine-setup.env.d
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/answers.conf
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/hosted-engine.conf
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/iptables.example
> /dev/mapper/HostVG-Config on /etc/ovirt-hosted-engine/vm.conf

Is this 3.5 vs 3.6 persistence, or only 3.5?

>
> Note that the latter does not have persistence under
> /etc/ovirt-hosted-engine-ha.
>
> What is the correct persistence setting for 3.5 (20160219.0.el7ev)? These
> were all deployed the same way, it seems to be random.

Could you please share how the hosts were deployed, so we can try a reproducer?
Hi Douglas,

Sorry, I just saw this now (missing needinfo).

(In reply to Douglas Schilling Landgraf from comment #7)
> This is 3.5 vs 3.6 persistence or only 3.5?

Only 3.5, all the same version: 20160219.0.el7ev. I was thinking it could be something special to the first host (id=1?).

> Could you please share which way is the deploy for us trying a reproducer?

Nothing special. Following the documentation [1].

[1] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.5/html/Installation_Guide/Installing_Additional_Hosts_to_a_Self-Hosted_Environment.html
Douglas, can you please follow up?
Simone, can you help Douglas with this?
Just to share some tests: I have tried to reproduce this report by running several node upgrades, and so far I have not been able to. The only way I could reproduce it was by forcing the persist and then upgrading. Under investigation.
I was able to reproduce the report. During the deploy, the file was persisted. After the node upgrade, base-score still shows 2400.
Douglas, thanks for working on it.

But looking at the patch worried me a bit: as you can see in https://bugzilla.redhat.com/show_bug.cgi?id=1415561#c5, this is not an isolated case of the agent.conf base score. It's mostly that, but sometimes other files are left behind as well (hosted-engine.conf). Maybe the solution needs to address the root cause rather than ensure that a specific set of values is updated?
Hi Germano,

(In reply to Germano Veit Michel from comment #14)
> Douglas, thanks for working on it.
>
> But looking at the patch worried me a bit: as you can see on
> https://bugzilla.redhat.com/show_bug.cgi?id=1415561#c5 this is not a
> isolated case of agent.conf base score. It's mostly it, but sometimes some
> other files are left behind (hosted-engine.conf). Maybe the solution needs
> to be more towards the root cause and not ensuring a specific set of values
> are updated?

In this case, I had understood agent.conf was the only file affected, but comment #5 shows other files might be involved as well. Let me check other possibilities.
(In reply to Douglas Schilling Landgraf from comment #15)
> In this case, I had understood agent.conf was the one affected but comment#5
> shows others files might be involved as well. Let me check others
> possibilities.

Excellent. Please let me know if you need any sort of logs or more info. I can try to dig it out of the data we have.
Germano,

Before the upgrade, could you please confirm the output of:

# rpm -Va ovirt-hosted-engine-ha
# rpm -Va ovirt-hosted-engine-setup

Thanks!
(In reply to Douglas Schilling Landgraf from comment #18)
> Germano,
>
> Before the upgrade, could you please confirm the output of:
>
> # rpm -Va ovirt-hosted-engine-ha
> # rpm -Va ovirt-hosted-engine-setup
>
>
> Thanks!

Hi Douglas,

All these upgrades were from 20160219.0.el7ev (3.5) to 20160920.1.el7ev (3.6), so the package versions are fixed. See below:

Pre-Upgrade (20160219.0.el7ev) RHV 3.5.8
ovirt-hosted-engine-setup-1.2.6.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.2.10-1.el7ev.noarch

Post-Upgrade (20160920.1.el7ev) RHV 3.6.9
ovirt-hosted-engine-ha-1.3.5.8-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.3-1.el7ev.noarch
Oops, I just realized you wanted to verify the packages, not just query the versions. Unfortunately the customer has already upgraded all the environments, and the -Va output is not captured in the sosreports. If I see this again I can paste the output here. Sorry.
I believe we have figured out the root cause of this report. It goes back to when we created the hosted-engine plugin, ovirt-node-plugin-hosted-engine, in the early days of 3.5. Basically, after the deploy we persist everything in "/etc/ovirt-hosted-engine", "/etc/ovirt-hosted-engine-ha" and "/etc/ovirt-hosted-engine-setup.env.d":

<snip>
    def run_additional(*args):
        with self.application.ui.suspended():
            try:
                utils.process.call(
                    "reset; screen hosted-engine --deploy",
                    shell=True)
                sys.stdout.write("Press <Return> to return to the TUI")
                console.wait_for_keypress()
                self.__persist_configs()
</snip>

<snip>
    def __persist_configs(self):
        dirs = ["/etc/ovirt-hosted-engine",
                "/etc/ovirt-hosted-engine-ha",
                "/etc/ovirt-hosted-engine-setup.env.d"]
        [Config().persist(d) for d in dirs]
</snip>

We could write a hook that runs before the upgrade and unpersists everything in these directories, or use rpm -Va to find what has changed on the filesystem and compare that with the files in the new image, but there is no safe guarantee about which files should stay persisted and which should not. In this scenario, it seems preferable to point to the KCS created by the customer and SEG.
hosted-engine-setup should persist by itself all the files it needs, but we now understand that everything under /etc/ovirt-hosted-engine, /etc/ovirt-hosted-engine-ha and /etc/ovirt-hosted-engine-setup.env.d was also persisted by ovirt-node-plugin-hosted-engine. So, if we now prevent ovirt-node-plugin-hosted-engine from doing that, we would have to carefully retest that hosted-engine-setup on 3.6 really persists everything that needs to be persisted.

Given that this is the last 3.6.z release, that the issue is implicitly fixed by NGN in later versions, and that it affects only customers who upgrade a hosted-engine environment from 3.5 to 3.6 without then moving to 4.0, I'd also suggest avoiding a potentially risky code fix and just addressing it with a KCS, since the workaround is pretty simple.