Bug 1367732 - If ovirt-ha-agent fails to read local answers.conf during upgrade, it writes None to shared fhanswers.conf
Summary: If ovirt-ha-agent fails to read local answers.conf during upgrade, it writes None to shared fhanswers.conf
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: General
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.0.3
Target Release: 2.0.3
Assignee: Simone Tiraboschi
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks: 1366879 1368127 1368399 1369784
 
Reported: 2016-08-17 10:38 UTC by Jiri Belka
Modified: 2017-05-11 09:26 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1368127 (view as bug list)
Environment:
Last Closed: 2016-08-31 09:34:35 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.0.z+
rule-engine: blocker+
ylavi: planning_ack+
dfediuck: devel_ack+
pstehlik: testing_ack+




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 62507 0 master MERGED upgrade: stop the upgrade with unavailable files 2016-08-23 08:41:54 UTC
oVirt gerrit 62583 0 v2.0.z MERGED upgrade: stop the upgrade with unavailable files 2016-08-23 08:42:21 UTC

Description Jiri Belka 2016-08-17 10:38:37 UTC
Description of problem:

It was discovered in another BZ that the conf_volume file on SHE storage contains bogus content:

# file=$( awk -F= '/^conf_volume/ { print $2 }' /etc/ovirt-hosted-engine/hosted-engine.conf )

# domain=$( awk -F= '/^sdUUID/ { print $2 }' /etc/ovirt-hosted-engine/hosted-engine.conf )

# find /rhev/data-center/ -path "*/$domain/*" -type f -name "$file" | xargs -I {} tar Oxf {} version
1.3.5.7[root@dell-r210ii-03 ~]# 

# find /rhev/data-center/ -path "*/$domain/*" -type f -name "$file" | xargs -I {} tar Oxf {} fhanswers.conf
None# 

Note the 'None' content of fhanswers.conf. This impacted the SHE env migration.

Version-Release number of selected component (if applicable):
discovered in 1.3.3.6 (brq-setup env)

How reproducible:
not clear

Steps to Reproduce:
1. see https://bugzilla.redhat.com/show_bug.cgi?id=1366879#c9
2.
3.

Actual results:
if there was an issue reading /etc/ovirt-hosted-engine/answers.conf during the SHE env upgrade, fhanswers.conf inside the conf_volume file on storage could end up containing 'None', which is obviously bogus

Expected results:
there should be a check that such bogus content is not written to the various files in the conf_volume tarball on storage

Additional info:
no logs; it apparently happened ca. 1.5 years ago, sometime during the 3.5 -> 3.6 upgrade

Comment 1 Yedidyah Bar David 2016-08-17 12:13:50 UTC
(In reply to Jiri Belka from comment #0)
> Description of problem:
> 
> It was discovered in BZ that conf_volume file on SHE storage does contain
> bogus content:
> 
> # file=$( awk -F= '/^conf_volume/ { print $2 }'
> /etc/ovirt-hosted-engine/hosted-engine.conf )
> 
> # domain=$( awk -F= '/^sdUUID/ { print $2 }'
> /etc/ovirt-hosted-engine/hosted-engine.conf )
> 
> # find /rhev/data-center/ -path "*/$domain/*" -type f -name "$file" | xargs
> -I {} tar Oxf {} version
> 1.3.5.7[root@dell-r210ii-03 ~]# 
> 
> # find /rhev/data-center/ -path "*/$domain/*" -type f -name "$file" | xargs
> -I {} tar Oxf {} fhanswers.conf
> None# 
> 
> See 'None' in fhanswers.conf. This has impacted SHE env migration.
> 
> Version-Release number of selected component (if applicable):
> discovered in 1.3.3.6 (brq-setup env)
> 
> How reproducible:
> not clear

Well, c9 there is a full reproducer, so why "not clear"? It's not clear
what happened in your case instead of someone doing 'rm', but that
flow is very clear.

> 
> Steps to Reproduce:
> 1. see https://bugzilla.redhat.com/show_bug.cgi?id=1366879#c9
> 2.
> 3.
> 
> Actual results:
> if during SHE env upgrade there probably was an issue with getting
> /etc/ovirt-hosted-engine/answers.conf, it could end that fhanswers.conf
> inside conf_volume file in storage has 'None', obviously bogus
> 
> Expected results:
> there should be check that bogus is not present in various files in
> conf_volume file tarball on storage

And then what? Not sure what's the purpose of this bug. If you just want
a nicer error message, I thought that's what we have bug 1366879 for.

I asked for a new bug not for this, but for not _writing_ None to the
shared storage. And under normal circumstances, it's a 3.6-only bug.

> 
> Additional info:
> no logs, discovered it happened cca 1.5 year ago, sometime during 3.5 -> 3.6

Comment 2 Jiri Belka 2016-08-17 17:11:28 UTC
> And then what? Not sure what's the purpose of this bug. If you just want
> a nicer error message, I thought that's what we have bug 1366879 for.
> 
> I asked for a new bug not for this, but for not _writing_ None to the
> shared storage. And under normal circumstances, it's a 3.6-only bug.

IIUC there's no check on what is written to these files in the tarball; it just believes it has written good stuff.

Comment 3 Yedidyah Bar David 2016-08-18 06:57:17 UTC
(In reply to Jiri Belka from comment #2)
> > And then what? Not sure what's the purpose of this bug. If you just want
> > a nicer error message, I thought that's what we have bug 1366879 for.
> > 
> > I asked for a new bug not for this, but for not _writing_ None to the
> > shared storage. And under normal circumstances, it's a 3.6-only bug.
> 
> IIUC there's no check what is written to these files in tarball, it just
> believes it has written good stuff.

OK, changing current bug:

Expected Results:

If during upgrade HA fails to read the local answers file, it should fail instead of writing None to the shared fhanswers.conf. It should keep trying in a loop (as I think it already does), so that if/when the local answers.conf is fixed/restored, it will try again and should succeed.

Also changing the summary line. If you want something else, please update accordingly.

Comment 4 Yedidyah Bar David 2016-08-18 13:08:57 UTC
IIUC the product/component should point at the patched package, not at where the bug is perceived to be. Also, it's actually more important for 3.6 than for 4.0, as it affects (only?) the 3.5->3.6 upgrade.

Comment 9 Nikolai Sednev 2016-08-30 13:32:01 UTC
I removed /etc/ovirt-hosted-engine/answers.conf prior to the 3.6.9->4.0.3 upgrade, and the upgrade succeeded despite the unavailable answers.conf.
Works for me on these components on host:
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-imageio-common-0.3.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.3-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64
sanlock-3.2.4-3.el7_2.x86_64
rhevm-appliance-20160731.0-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.1.5-1.el7ev.noarch
mom-0.5.5-1.el7ev.noarch
ovirt-host-deploy-1.5.1-1.el7ev.noarch
vdsm-4.18.11-1.el7ev.x86_64
rhev-release-3.6.9-1-001.noarch
ovirt-imageio-daemon-0.3.0-0.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
rhev-release-4.0.3-1-001.noarch
ovirt-engine-sdk-python-3.6.8.0-1.el7ev.noarch
Linux version 3.10.0-327.36.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Wed Aug 17 03:02:37 EDT 2016
Linux 3.10.0-327.36.1.el7.x86_64 #1 SMP Wed Aug 17 03:02:37 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Engine:
ovirt-engine-dwh-setup-4.0.2-1.el7ev.noarch
ovirt-image-uploader-4.0.0-1.el7ev.noarch
ovirt-imageio-proxy-setup-0.3.0-0.el7ev.noarch
ovirt-engine-webadmin-portal-4.0.3-0.1.el7ev.noarch
ovirt-engine-restapi-4.0.3-0.1.el7ev.noarch
ovirt-host-deploy-1.5.1-1.el7ev.noarch
ovirt-engine-extension-aaa-jdbc-1.1.0-1.el7ev.noarch
ovirt-engine-cli-3.6.8.1-1.el7ev.noarch
ovirt-engine-websocket-proxy-4.0.3-0.1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.8.0-1.el7ev.noarch
ovirt-log-collector-4.0.0-1.el7ev.noarch
ovirt-imageio-proxy-0.3.0-0.el7ev.noarch
ovirt-engine-tools-4.0.3-0.1.el7ev.noarch
ovirt-engine-setup-base-4.0.3-0.1.el7ev.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-4.0.3-0.1.el7ev.noarch
ovirt-engine-setup-plugin-ovirt-engine-4.0.3-0.1.el7ev.noarch
python-ovirt-engine-sdk4-4.0.0-0.5.a5.el7ev.x86_64
ovirt-iso-uploader-4.0.0-1.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
ovirt-engine-dashboard-1.0.3-1.el7ev.x86_64
ovirt-engine-userportal-4.0.3-0.1.el7ev.noarch
ovirt-engine-4.0.3-0.1.el7ev.noarch
ovirt-host-deploy-java-1.5.1-1.el7ev.noarch
ovirt-engine-lib-4.0.3-0.1.el7ev.noarch
ovirt-engine-setup-plugin-websocket-proxy-4.0.3-0.1.el7ev.noarch
ovirt-engine-setup-4.0.3-0.1.el7ev.noarch
ovirt-engine-vmconsole-proxy-helper-4.0.3-0.1.el7ev.noarch
ovirt-engine-tools-backup-4.0.3-0.1.el7ev.noarch
ovirt-vmconsole-proxy-1.0.4-1.el7ev.noarch
ovirt-engine-dbscripts-4.0.3-0.1.el7ev.noarch
ovirt-engine-dwh-4.0.2-1.el7ev.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-4.0.3-0.1.el7ev.noarch
ovirt-engine-extensions-api-impl-4.0.3-0.1.el7ev.noarch
ovirt-engine-backend-4.0.3-0.1.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-3.el7ev.noarch
rhevm-doc-4.0.0-3.el7ev.noarch
rhevm-spice-client-x64-msi-4.0-3.el7ev.noarch
rhev-guest-tools-iso-4.0-5.el7ev.noarch
rhevm-4.0.3-0.1.el7ev.noarch
rhevm-branding-rhev-4.0.0-5.el7ev.noarch
rhevm-guest-agent-common-1.0.12-3.el7ev.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhevm-setup-plugins-4.0.0.2-1.el7ev.noarch
rhev-release-4.0.3-1-001.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

During hosted-engine --upgrade-appliance I used rhevm-appliance-20160731.0-1.el7ev.noarch, then updated the engine's repos and installed the latest 4.0.3 bits.

