Bug 1367732

Summary: If ovirt-ha-agent fails to read local answers.conf during upgrade, it writes None to shared fhanswers.conf
Product: [oVirt] ovirt-hosted-engine-ha
Reporter: Jiri Belka <jbelka>
Component: General
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED CURRENTRELEASE
QA Contact: Nikolai Sednev <nsednev>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: ---
CC: bugs, dfediuck, didi, jbelka, nsednev, pstehlik, sbonazzo, ylavi
Target Milestone: ovirt-4.0.3
Keywords: Triaged, ZStream
Target Release: 2.0.3
Flags: rule-engine: ovirt-4.0.z+
       rule-engine: blocker+
       ylavi: planning_ack+
       dfediuck: devel_ack+
       pstehlik: testing_ack+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1368127
Environment:
Last Closed: 2016-08-31 09:34:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1366879, 1368127, 1368399, 1369784

Description Jiri Belka 2016-08-17 10:38:37 UTC
Description of problem:

It was discovered in BZ that the conf_volume file on SHE storage contains bogus content:

# file=$( awk -F= '/^conf_volume/ { print $2 }' /etc/ovirt-hosted-engine/hosted-engine.conf )

# domain=$( awk -F= '/^sdUUID/ { print $2 }' /etc/ovirt-hosted-engine/hosted-engine.conf )

# find /rhev/data-center/ -path "*/$domain/*" -type f -name "$file" | xargs -I {} tar Oxf {} version
1.3.5.7[root@dell-r210ii-03 ~]# 

# find /rhev/data-center/ -path "*/$domain/*" -type f -name "$file" | xargs -I {} tar Oxf {} fhanswers.conf
None# 

See 'None' in fhanswers.conf. This has impacted SHE env migration.

Version-Release number of selected component (if applicable):
discovered in 1.3.3.6 (brq-setup env)

How reproducible:
not clear

Steps to Reproduce:
1. see https://bugzilla.redhat.com/show_bug.cgi?id=1366879#c9
2.
3.

Actual results:
If there was an issue reading /etc/ovirt-hosted-engine/answers.conf during the SHE env upgrade, fhanswers.conf inside the conf_volume file on storage can end up containing 'None', which is obviously bogus.

Expected results:
There should be a check that bogus content is not present in the files inside the conf_volume tarball on storage.
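
A check of this kind could look like the following minimal sketch. It assumes the conf_volume file is a plain tar archive (as the `tar Oxf` commands in the description suggest) and treats an empty member or the literal string `None` as bogus. The function name `find_bogus_members` is hypothetical and is not part of ovirt-hosted-engine-ha.

```shell
# Hypothetical helper, not part of ovirt-hosted-engine-ha.
# Prints the name of every regular member of a tar archive whose content is
# empty or the literal string "None". Returns 1 if any such member was found,
# 2 if the archive could not be read, 0 otherwise.
find_bogus_members() {
    local archive="$1" members member content status=0
    # tar -tf lists member names, one per line (directories end with "/").
    members=$(tar -tf "$archive") || return 2
    while IFS= read -r member; do
        case "$member" in
            ""|*/) continue ;;   # skip blank lines and directory entries
        esac
        # tar -xOf writes the member to stdout; $(...) strips trailing newlines.
        content=$(tar -xOf "$archive" "$member") || return 2
        case "$content" in
            ""|None)
                printf '%s\n' "$member"
                status=1
                ;;
        esac
    done <<EOF
$members
EOF
    return "$status"
}
```

Run against the archive located with the `find` command from the description, this would be expected to list `fhanswers.conf` for the environment reported here.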

Additional info:
No logs; we discovered that it happened about 1.5 years ago, sometime during the 3.5 -> 3.6 upgrade.

Comment 1 Yedidyah Bar David 2016-08-17 12:13:50 UTC
(In reply to Jiri Belka from comment #0)
> Description of problem:
> 
> It was discovered in BZ that conf_volume file on SHE storage does contain
> bogus content:
> 
> # file=$( awk -F= '/^conf_volume/ { print $2 }'
> /etc/ovirt-hosted-engine/hosted-engine.conf )
> 
> # domain=$( awk -F= '/^sdUUID/ { print $2 }'
> /etc/ovirt-hosted-engine/hosted-engine.conf )
> 
> # find /rhev/data-center/ -path "*/$domain/*" -type f -name "$file" | xargs
> -I {} tar Oxf {} version
> 1.3.5.7[root@dell-r210ii-03 ~]# 
> 
> # find /rhev/data-center/ -path "*/$domain/*" -type f -name "$file" | xargs
> -I {} tar Oxf {} fhanswers.conf
> None# 
> 
> See 'None' in fhanswers.conf. This has impacted SHE env migration.
> 
> Version-Release number of selected component (if applicable):
> discovered in 1.3.3.6 (brq-setup env)
> 
> How reproducible:
> not clear

Well, c9 there is a full reproducer, so why "not clear"? It's not clear
what happened in your case instead of someone doing 'rm', but that
flow is very clear.

> 
> Steps to Reproduce:
> 1. see https://bugzilla.redhat.com/show_bug.cgi?id=1366879#c9
> 2.
> 3.
> 
> Actual results:
> if during SHE env upgrade there probably was an issue with getting
> /etc/ovirt-hosted-engine/answers.conf, it could end that fhanswers.conf
> inside conf_volume file in storage has 'None', obviously bogus
> 
> Expected results:
> there should be check that bogus is not present in various files in
> conf_volume file tarball on storage

And then what? Not sure what's the purpose of this bug. If you just want
a nicer error message, I thought that's what we have bug 1366879 for.

I asked for a new bug not for this, but for not _writing_ None to the
shared storage. And under normal circumstances, it's a 3.6-only bug.

> 
> Additional info:
> no logs, discovered it happened cca 1.5 year ago, sometime during 3.5 -> 3.6

Comment 2 Jiri Belka 2016-08-17 17:11:28 UTC
> And then what? Not sure what's the purpose of this bug. If you just want
> a nicer error message, I thought that's what we have bug 1366879 for.
> 
> I asked for a new bug not for this, but for not _writing_ None to the
> shared storage. And under normal circumstances, it's a 3.6-only bug.

IIUC there's no check of what is written to these files in the tarball; it just assumes it has written good stuff.

Comment 3 Yedidyah Bar David 2016-08-18 06:57:17 UTC
(In reply to Jiri Belka from comment #2)
> > And then what? Not sure what's the purpose of this bug. If you just want
> > a nicer error message, I thought that's what we have bug 1366879 for.
> > 
> > I asked for a new bug not for this, but for not _writing_ None to the
> > shared storage. And under normal circumstances, it's a 3.6-only bug.
> 
> IIUC there's no check what is written to these files in tarball, it just
> believes it has written good stuff.

OK, changing current bug:

Expected Results:

If during upgrade HA fails reading the local answer file, it should fail instead of writing None in shared fhanswers.conf. It should continue trying in a loop (as I think it already does), so that if/when local answers.conf is fixed/restored, it will try again and should succeed.
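
The expected behaviour above can be illustrated with a minimal shell sketch. This is not the agent's actual code: the function names, the plain-file destination, and the bounded retry count are assumptions made for illustration (the real agent writes into the shared conf_volume and would presumably keep looping).

```shell
# Hypothetical sketch, not the agent's actual code.
# Copy the local answers file to the shared location only if it is readable
# and non-empty; otherwise leave the shared copy untouched and fail.
publish_answers() {
    local src="$1" dst="$2"
    [ -r "$src" ] && [ -s "$src" ] || return 1
    # Write to a temporary name first so a failed copy never leaves a
    # partial file at the destination.
    cp -- "$src" "$dst.tmp" && mv -- "$dst.tmp" "$dst"
}

# Keep retrying so that a fixed or restored answers.conf is picked up later.
publish_with_retry() {
    local src="$1" dst="$2" attempts="$3" delay="$4" i=0
    while [ "$i" -lt "$attempts" ]; do
        publish_answers "$src" "$dst" && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

The point of the design is that a failed read produces an error and no write at all, so the shared fhanswers.conf is either left as it was or replaced with real content, never with a placeholder such as 'None'.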

Changing also summary line. If you want something else, please update accordingly.

Comment 4 Yedidyah Bar David 2016-08-18 13:08:57 UTC
IIUC the product/component should point at the patched package, not at where the bug is perceived to be. Also, it's actually more important for 3.6 than for 4.0, as it affects (only?) the 3.5->3.6 upgrade.

Comment 9 Nikolai Sednev 2016-08-30 13:32:01 UTC
I removed /etc/ovirt-hosted-engine/answers.conf prior to the 3.6.9->4.0.3 upgrade, and the upgrade succeeded even though answers.conf was unavailable.
Works for me with these components on the host:
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-imageio-common-0.3.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.3-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64
sanlock-3.2.4-3.el7_2.x86_64
rhevm-appliance-20160731.0-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.1.5-1.el7ev.noarch
mom-0.5.5-1.el7ev.noarch
ovirt-host-deploy-1.5.1-1.el7ev.noarch
vdsm-4.18.11-1.el7ev.x86_64
rhev-release-3.6.9-1-001.noarch
ovirt-imageio-daemon-0.3.0-0.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
rhev-release-4.0.3-1-001.noarch
ovirt-engine-sdk-python-3.6.8.0-1.el7ev.noarch
Linux version 3.10.0-327.36.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Wed Aug 17 03:02:37 EDT 2016
Linux 3.10.0-327.36.1.el7.x86_64 #1 SMP Wed Aug 17 03:02:37 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Engine:
ovirt-engine-dwh-setup-4.0.2-1.el7ev.noarch
ovirt-image-uploader-4.0.0-1.el7ev.noarch
ovirt-imageio-proxy-setup-0.3.0-0.el7ev.noarch
ovirt-engine-webadmin-portal-4.0.3-0.1.el7ev.noarch
ovirt-engine-restapi-4.0.3-0.1.el7ev.noarch
ovirt-host-deploy-1.5.1-1.el7ev.noarch
ovirt-engine-extension-aaa-jdbc-1.1.0-1.el7ev.noarch
ovirt-engine-cli-3.6.8.1-1.el7ev.noarch
ovirt-engine-websocket-proxy-4.0.3-0.1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.8.0-1.el7ev.noarch
ovirt-log-collector-4.0.0-1.el7ev.noarch
ovirt-imageio-proxy-0.3.0-0.el7ev.noarch
ovirt-engine-tools-4.0.3-0.1.el7ev.noarch
ovirt-engine-setup-base-4.0.3-0.1.el7ev.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-4.0.3-0.1.el7ev.noarch
ovirt-engine-setup-plugin-ovirt-engine-4.0.3-0.1.el7ev.noarch
python-ovirt-engine-sdk4-4.0.0-0.5.a5.el7ev.x86_64
ovirt-iso-uploader-4.0.0-1.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
ovirt-engine-dashboard-1.0.3-1.el7ev.x86_64
ovirt-engine-userportal-4.0.3-0.1.el7ev.noarch
ovirt-engine-4.0.3-0.1.el7ev.noarch
ovirt-host-deploy-java-1.5.1-1.el7ev.noarch
ovirt-engine-lib-4.0.3-0.1.el7ev.noarch
ovirt-engine-setup-plugin-websocket-proxy-4.0.3-0.1.el7ev.noarch
ovirt-engine-setup-4.0.3-0.1.el7ev.noarch
ovirt-engine-vmconsole-proxy-helper-4.0.3-0.1.el7ev.noarch
ovirt-engine-tools-backup-4.0.3-0.1.el7ev.noarch
ovirt-vmconsole-proxy-1.0.4-1.el7ev.noarch
ovirt-engine-dbscripts-4.0.3-0.1.el7ev.noarch
ovirt-engine-dwh-4.0.2-1.el7ev.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-4.0.3-0.1.el7ev.noarch
ovirt-engine-extensions-api-impl-4.0.3-0.1.el7ev.noarch
ovirt-engine-backend-4.0.3-0.1.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-3.el7ev.noarch
rhevm-doc-4.0.0-3.el7ev.noarch
rhevm-spice-client-x64-msi-4.0-3.el7ev.noarch
rhev-guest-tools-iso-4.0-5.el7ev.noarch
rhevm-4.0.3-0.1.el7ev.noarch
rhevm-branding-rhev-4.0.0-5.el7ev.noarch
rhevm-guest-agent-common-1.0.12-3.el7ev.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhevm-setup-plugins-4.0.0.2-1.el7ev.noarch
rhev-release-4.0.3-1-001.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

During hosted-engine --upgrade-appliance I used rhevm-appliance-20160731.0-1.el7ev.noarch, then updated the engine's repos and installed the latest 4.0.3 bits.