Red Hat Bugzilla – Bug 995504
rhevm-upgrade fails to populate /etc/sysconfig/ovirt-engine correctly when migrating from RHEV 3.0 to 3.1
Last modified: 2015-09-22 09:09 EDT
Description of problem:
When using the rhevm-upgrade script to migrate from RHEV 3.0 to RHEV 3.1, the migration fails at the end because ovirt-engine fails to start. Troubleshooting showed that /etc/sysconfig/ovirt-engine was populated with invalid values.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Build RHEV 3.0 environment
2. Switch to the proper channels and run 'yum update rhevm-setup'
3. Run rhevm-upgrade to migrate to 3.1 (it will be necessary to also use the workaround in bugzilla 995502)
ovirt-engine fails to start
When ovirt-engine attempts to start, the following lines are logged in /var/log/messages:
Aug 8 14:41:32 rhev-manager engine-service: The value "$HTTPS_PORT$" of parameter "ENGINE_HTTPS_PORT" is not a valid integer.
Aug 8 14:42:01 rhevm-manager engine-service: The value "$HTTPS_PORT$" of parameter "ENGINE_HTTPS_PORT" is not a valid integer.
To troubleshoot this, we edited the rhevm-upgrade script manually and commented out the line that attempts to start ovirt-engine when the upgrade completes so that the script would succeed and not attempt to roll back.
These are the contents of /etc/sysconfig/ovirt-engine when the script completed (except, of course, for the database password hash):
# For descriptions of the parameters and their default values look
# at the /usr/share/ovirt-engine/service/ovirt-engine.defaults
# Please note that the engine installation tool engine-setup will
# append the modified parameters to the end of this file.
The ENGINE_FQDN, ENGINE_HTTP_PORT, and ENGINE_HTTPS_PORT values are all populated with variable names instead of values. Fixing these values manually allowed ovirt-engine to start correctly.
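The symptom described above can be detected mechanically. The following is a minimal sketch, not part of rhevm-upgrade, that assumes the file consists of simple KEY=VALUE lines and that an unresolved value has the form $NAME$, as in the logged "$HTTPS_PORT$". The sample contents are hypothetical: only the ENGINE_HTTPS_PORT value is taken from the log, and the other placeholder names and SOME_OTHER_SETTING are invented for illustration.

```python
import re

# An unresolved template token such as $HTTPS_PORT$ (assumed form).
PLACEHOLDER = re.compile(r"^\$[A-Za-z_][A-Za-z0-9_]*\$$")


def find_unresolved(text):
    """Return {key: value} for KEY=VALUE lines whose value is still a $NAME$ token."""
    unresolved = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip().strip('"')
        if PLACEHOLDER.match(value):
            unresolved[key.strip()] = value
    return unresolved


# Hypothetical sample; $FQDN$ and $HTTP_PORT$ are illustrative names only.
sample = """
# For descriptions of the parameters and their default values look
# at the /usr/share/ovirt-engine/service/ovirt-engine.defaults
ENGINE_FQDN=$FQDN$
ENGINE_HTTP_PORT=$HTTP_PORT$
ENGINE_HTTPS_PORT=$HTTPS_PORT$
SOME_OTHER_SETTING=false
"""

for key, value in sorted(find_unresolved(sample).items()):
    print(f"{key} is unresolved: {value}")
```

Any key reported by find_unresolved would need to be corrected by hand before ovirt-engine can start.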
It is not clear how the customer's system got into this state. The problem lies in the /etc/rhevm/web-conf.js file prior to upgrade, that is, in 3.0.
The bad values you're talking about are fetched from that file, but having them like that would mean that the engine can't work.
Would you mind checking with the customer how the 3.0 installation got into this state?
Every time the upgrade failed, the rollback would also fail, leaving his environment unusable. So each time, he built a fresh RHEV-M, imported his database and certs, and continued. Each time, the same issue occurred.
I'll get the details on his rebuild process as soon as possible, in case this is how this happens.
If the customer tries this again, please ask him to:
1. Verify the engine/rhev works correctly with all the entities prior to upgrade attempt (after recovering certificates and the DB).
2. Attach the /etc/rhevm/web-conf.js to the report.
I can confirm that RHEV worked correctly prior to upgrade. The rebuilds were specifically so that he could manage VMs while we worked on figuring out the cause of the failure.
I'll provide the /etc/rhevm/web-conf.js if the customer still has a copy of it.
Another possibly important note:
The 3.0 environment had previously been upgraded from 2.2, then restored from backup later.
I would still want to see their /etc/rhevm/web-conf.js.
He has no file by that name at that location at all.
It is extremely strange. This file is responsible for jboss working correctly, especially in 3.0.
Could you please make sure they checked for the file's existence in 3.0? 3.1 and later installations do not need it, so it may not exist there.
(In reply to Alex Lourie from comment #12)
> Hey Allan
> Any news?
The sysadmin working this issue is out of the office for the next few days.
This issue has been going on for a while without progress. Do you want to keep it open a bit longer and get the info from the customer, or would you prefer closing it and reopening it if it becomes relevant again?
This is extremely strange. The file web-conf.js is the "bad" one while the backup one is the "good" one. This looks like something was done in the wrong order on the customer's system.
With the file contents as in backup the upgrade will work fine. With the content of the web-conf.js it will fail.
I can't explain how web-conf.js became as it is. I suggest closing the bug and reopening it if a similar issue arises.
Every time the installation failed, the customer had to rebuild his RHEV-M because the rollback would also fail. I assume that the incorrect values in web-conf.js are related to the reinstallation.
Shouldn't rhevm-upgrade be able to handle the situation?
Automatic rollback during the 3.0 -> 3.1 upgrade is not working due to substantial architectural changes in jboss, not allowing us to restore the system.
Additionally, we do not touch web-conf.js at all on upgrade; we only read values from it. It seems that during one of the rebuild attempts it was copied over incorrectly, and from that point on the upgrade cannot succeed in populating /etc/sysconfig/ovirt-engine correctly.
If we know about a clear flow where the customer had web-conf.js correctly defined prior to upgrade and the upgrade still failed to populate the final ovirt-engine file, I'd be happy to analyse it. Other than that, I don't see what else we can do in this specific case.
I was thinking along the lines of checking the contents of web-conf.js to ensure that values were in place instead of placeholders before the upgrade occurs, and reporting the error and aborting the upgrade before the upgrade goes far enough to require a rollback.
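A pre-flight check of that kind might look like the sketch below. It is only an illustration of the suggestion, not code from rhevm-upgrade: the function names are hypothetical, the exact syntax of web-conf.js is not shown in this report, and the check simply assumes that an unresolved placeholder appears anywhere in the file as an upper-case $NAME$ token like the "$HTTPS_PORT$" seen in the log.

```python
import re
import sys

# Upper-case $NAME$ tokens, e.g. $HTTPS_PORT$ (assumed placeholder form).
TOKEN = re.compile(r"\$[A-Z][A-Z0-9_]*\$")


def find_placeholders(text):
    """Return the sorted, de-duplicated placeholder tokens found in text."""
    return sorted(set(TOKEN.findall(text)))


def preflight(path):
    """Return 0 if the file looks safe to read values from, 1 otherwise.

    Intended to run before the upgrade changes anything, so that a bad
    source file causes an early abort instead of a failed rollback.
    """
    try:
        with open(path) as handle:
            tokens = find_placeholders(handle.read())
    except OSError as exc:
        print(f"Aborting before upgrade: cannot read {path}: {exc}", file=sys.stderr)
        return 1
    if tokens:
        print(
            f"Aborting before upgrade: {path} still contains placeholders: "
            + ", ".join(tokens),
            file=sys.stderr,
        )
        return 1
    return 0
```

Calling preflight("/etc/rhevm/web-conf.js") as the first step and stopping on a non-zero result would report the problem while the 3.0 installation is still untouched.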