Description of problem:

It was discovered in BZ that the conf_volume file on SHE storage contains bogus content:

# file=$( awk -F= '/^conf_volume/ { print $2 }' /etc/ovirt-hosted-engine/hosted-engine.conf )

# domain=$( awk -F= '/^sdUUID/ { print $2 }' /etc/ovirt-hosted-engine/hosted-engine.conf )

# find /rhev/data-center/ -path "*/$domain/*" -type f -name "$file" | xargs -I {} tar Oxf {} version
1.3.5.7[root@dell-r210ii-03 ~]#

# find /rhev/data-center/ -path "*/$domain/*" -type f -name "$file" | xargs -I {} tar Oxf {} fhanswers.conf
None#

See 'None' in fhanswers.conf. This has impacted SHE env migration.

Version-Release number of selected component (if applicable):
discovered in 1.3.3.6 (brq-setup env)

How reproducible:
not clear

Steps to Reproduce:
1. see https://bugzilla.redhat.com/show_bug.cgi?id=1366879#c9
2.
3.

Actual results:
If there was an issue reading /etc/ovirt-hosted-engine/answers.conf during the SHE env upgrade, fhanswers.conf inside the conf_volume file on storage could end up containing 'None', which is obviously bogus.

Expected results:
There should be a check that such bogus content is not present in the various files in the conf_volume tarball on storage.

Additional info:
No logs; it was discovered that this happened ca. 1.5 years ago, sometime during the 3.5 -> 3.6 upgrade.
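The check requested under "Expected results" could be sketched roughly like this (a hypothetical helper, not part of any shipped tooling): extract each member of the conf_volume tarball and flag any whose entire content is the literal string 'None'.

```shell
#!/bin/sh
# Hypothetical sanity check for the conf_volume tarball: flag any member
# whose whole content is the bogus literal "None".
# Note: the simple word-split loop assumes member names without spaces,
# which holds for the files hosted-engine stores there.
check_conf_tarball() {
    tarball=$1
    status=0
    for member in $(tar tf "$tarball"); do
        content=$(tar Oxf "$tarball" "$member")
        if [ "$content" = "None" ]; then
            echo "BOGUS: $member contains only 'None'"
            status=1
        fi
    done
    return $status
}
```

It would be run against the conf volume located with the find command above.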
(In reply to Jiri Belka from comment #0)
> Description of problem:
> 
> It was discovered in BZ that conf_volume file on SHE storage does contain
> bogus content:
> 
> # file=$( awk -F= '/^conf_volume/ { print $2 }'
> /etc/ovirt-hosted-engine/hosted-engine.conf )
> 
> # domain=$( awk -F= '/^sdUUID/ { print $2 }'
> /etc/ovirt-hosted-engine/hosted-engine.conf )
> 
> # find /rhev/data-center/ -path "*/$domain/*" -type f -name "$file" | xargs
> -I {} tar Oxf {} version
> 1.3.5.7[root@dell-r210ii-03 ~]#
> 
> # find /rhev/data-center/ -path "*/$domain/*" -type f -name "$file" | xargs
> -I {} tar Oxf {} fhanswers.conf
> None#
> 
> See 'None' in fhanswers.conf. This has impacted SHE env migration.
> 
> Version-Release number of selected component (if applicable):
> discovered in 1.3.3.6 (brq-setup env)
> 
> How reproducible:
> not clear

Well, there is a full reproducer in c9, so why "not clear"? It's not clear what happened in your case, as opposed to someone doing 'rm', but that flow is very clear.

> Steps to Reproduce:
> 1. see https://bugzilla.redhat.com/show_bug.cgi?id=1366879#c9
> 2.
> 3.
> 
> Actual results:
> if during SHE env upgrade there probably was an issue with getting
> /etc/ovirt-hosted-engine/answers.conf, it could end that fhanswers.conf
> inside conf_volume file in storage has 'None', obviously bogus
> 
> Expected results:
> there should be check that bogus is not present in various files in
> conf_volume file tarball on storage

And then what? I'm not sure what the purpose of this bug is. If you just want a nicer error message, I thought that's what we have bug 1366879 for.

I asked for a new bug not for this, but for not _writing_ None to the shared storage. And under normal circumstances, it's a 3.6-only bug.

> Additional info:
> no logs, discovered it happened cca 1.5 year ago, sometime during 3.5 -> 3.6
> And then what? Not sure what's the purpose of this bug. If you just want
> a nicer error message, I thought that's what we have bug 1366879 for.
> 
> I asked for a new bug not for this, but for not _writing_ None to the
> shared storage. And under normal circumstances, it's a 3.6-only bug.

IIUC there's no check on what is written to these files in the tarball; it just believes it has written good stuff.
(In reply to Jiri Belka from comment #2)
> > And then what? Not sure what's the purpose of this bug. If you just want
> > a nicer error message, I thought that's what we have bug 1366879 for.
> > 
> > I asked for a new bug not for this, but for not _writing_ None to the
> > shared storage. And under normal circumstances, it's a 3.6-only bug.
> 
> IIUC there's no check what is written to these files in tarball, it just
> believes it has written good stuff.

OK, changing the current bug:

Expected Results:
If during upgrade HA fails to read the local answer file, it should fail instead of writing None into the shared fhanswers.conf. It should keep trying in a loop (as I think it already does), so that if/when the local answers.conf is fixed/restored, it will try again and should succeed.

Also changing the summary line. If you want something else, please update accordingly.
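The write-side guard described here could be sketched in shell terms as follows. This is illustrative only: the actual ovirt-hosted-engine-ha implementation is Python, and add_answers_to_tarball is a hypothetical name. The idea is to validate the local file's content and refuse to append it to the shared tarball when it is empty or the literal 'None'.

```shell
#!/bin/sh
# Hypothetical write-side guard (illustrative; the real ovirt-hosted-engine-ha
# code is Python): bail out instead of publishing empty or "None" content
# to the shared conf_volume tarball.
add_answers_to_tarball() {
    tarball=$1
    src=$2
    content=$(cat "$src" 2>/dev/null)
    if [ -z "$content" ] || [ "$content" = "None" ]; then
        echo "ERROR: refusing to write bogus content from $src" >&2
        return 1
    fi
    tar rf "$tarball" -C "$(dirname -- "$src")" "$(basename -- "$src")"
}
```

On failure the caller would keep retrying in a loop, matching the expected behavior above, rather than falling through and writing 'None' to the shared storage.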
IIUC the product/component should point at the patched package, not at where the bug is perceived to be. Also, it's actually more important for 3.6 than for 4.0, as it affects (only?) the 3.5->3.6 upgrade.
I've removed /etc/ovirt-hosted-engine/answers.conf prior to the 3.6.9 -> 4.0.3 upgrade, and the upgrade succeeded despite the unavailable answers.conf.

Works for me with these components on the host:
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-imageio-common-0.3.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.3-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64
sanlock-3.2.4-3.el7_2.x86_64
rhevm-appliance-20160731.0-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.1.5-1.el7ev.noarch
mom-0.5.5-1.el7ev.noarch
ovirt-host-deploy-1.5.1-1.el7ev.noarch
vdsm-4.18.11-1.el7ev.x86_64
rhev-release-3.6.9-1-001.noarch
ovirt-imageio-daemon-0.3.0-0.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
rhev-release-4.0.3-1-001.noarch
ovirt-engine-sdk-python-3.6.8.0-1.el7ev.noarch
Linux version 3.10.0-327.36.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Wed Aug 17 03:02:37 EDT 2016
Linux 3.10.0-327.36.1.el7.x86_64 #1 SMP Wed Aug 17 03:02:37 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Engine:
ovirt-engine-dwh-setup-4.0.2-1.el7ev.noarch
ovirt-image-uploader-4.0.0-1.el7ev.noarch
ovirt-imageio-proxy-setup-0.3.0-0.el7ev.noarch
ovirt-engine-webadmin-portal-4.0.3-0.1.el7ev.noarch
ovirt-engine-restapi-4.0.3-0.1.el7ev.noarch
ovirt-host-deploy-1.5.1-1.el7ev.noarch
ovirt-engine-extension-aaa-jdbc-1.1.0-1.el7ev.noarch
ovirt-engine-cli-3.6.8.1-1.el7ev.noarch
ovirt-engine-websocket-proxy-4.0.3-0.1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.8.0-1.el7ev.noarch
ovirt-log-collector-4.0.0-1.el7ev.noarch
ovirt-imageio-proxy-0.3.0-0.el7ev.noarch
ovirt-engine-tools-4.0.3-0.1.el7ev.noarch
ovirt-engine-setup-base-4.0.3-0.1.el7ev.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-4.0.3-0.1.el7ev.noarch
ovirt-engine-setup-plugin-ovirt-engine-4.0.3-0.1.el7ev.noarch
python-ovirt-engine-sdk4-4.0.0-0.5.a5.el7ev.x86_64
ovirt-iso-uploader-4.0.0-1.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
ovirt-engine-dashboard-1.0.3-1.el7ev.x86_64
ovirt-engine-userportal-4.0.3-0.1.el7ev.noarch
ovirt-engine-4.0.3-0.1.el7ev.noarch
ovirt-host-deploy-java-1.5.1-1.el7ev.noarch
ovirt-engine-lib-4.0.3-0.1.el7ev.noarch
ovirt-engine-setup-plugin-websocket-proxy-4.0.3-0.1.el7ev.noarch
ovirt-engine-setup-4.0.3-0.1.el7ev.noarch
ovirt-engine-vmconsole-proxy-helper-4.0.3-0.1.el7ev.noarch
ovirt-engine-tools-backup-4.0.3-0.1.el7ev.noarch
ovirt-vmconsole-proxy-1.0.4-1.el7ev.noarch
ovirt-engine-dbscripts-4.0.3-0.1.el7ev.noarch
ovirt-engine-dwh-4.0.2-1.el7ev.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-4.0.3-0.1.el7ev.noarch
ovirt-engine-extensions-api-impl-4.0.3-0.1.el7ev.noarch
ovirt-engine-backend-4.0.3-0.1.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-3.el7ev.noarch
rhevm-doc-4.0.0-3.el7ev.noarch
rhevm-spice-client-x64-msi-4.0-3.el7ev.noarch
rhev-guest-tools-iso-4.0-5.el7ev.noarch
rhevm-4.0.3-0.1.el7ev.noarch
rhevm-branding-rhev-4.0.0-5.el7ev.noarch
rhevm-guest-agent-common-1.0.12-3.el7ev.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhevm-setup-plugins-4.0.0.2-1.el7ev.noarch
rhev-release-4.0.3-1-001.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

During hosted-engine --upgrade-appliance I used rhevm-appliance-20160731.0-1.el7ev.noarch, then updated the engine's repos and installed the latest 4.0.3 bits.