Bug 995504 - rhevm-upgrade fails to populate /etc/sysconfig/ovirt-engine correctly when migrating from RHEV 3.0 to 3.1
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-setup
Version: 3.1.5
Hardware: All Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.1.5
Assigned To: Alex Lourie
QA Contact: Pavel Stehlik
Whiteboard: integration
Depends On:
Blocks:
Reported: 2013-08-09 10:46 EDT by Allan Voss
Modified: 2015-09-22 09 EDT
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-17 15:26:55 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Allan Voss 2013-08-09 10:46:33 EDT
Description of problem:
When using the rhevm-upgrade script to migrate from RHEV 3.0 to RHEV 3.1, the migration fails at the end because ovirt-engine fails to start. Troubleshooting showed that /etc/sysconfig/ovirt-engine was populated with invalid values.

Version-Release number of selected component (if applicable):
rhevm-setup-3.1.0-55


How reproducible:
Very

Steps to Reproduce:
1. Build RHEV 3.0 environment
2. Switch to the proper channels and run 'yum update rhevm-setup'
3. Run rhevm-upgrade to migrate to 3.1 (it will also be necessary to use the workaround in bugzilla 995502)

Actual results:
ovirt-engine fails to start

Expected results:
ovirt-engine starts

Additional info:
When ovirt-engine attempted to start, the following lines were logged in /var/log/messages:

Aug  8 14:41:32 rhev-manager engine-service[4886]: The value "$HTTPS_PORT$" of parameter "ENGINE_HTTPS_PORT" is not a valid integer.
Aug  8 14:42:01 rhevm-manager engine-service[4901]: The value "$HTTPS_PORT$" of parameter "ENGINE_HTTPS_PORT" is not a valid integer.

To troubleshoot this, we edited the rhevm-upgrade script manually and commented out the line that attempts to start ovirt-engine when the upgrade completes so that the script would succeed and not attempt to roll back.

These are the contents of /etc/sysconfig/ovirt-engine when the script completed (except, of course, for the database password hash):

# For descriptions of the parameters and their default values look
# at the /usr/share/ovirt-engine/service/ovirt-engine.defaults
# file.
#
# Please note that the engine installation tool engine-setup will
# append the modified parameters to the end of this file.
#
JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64
ENGINE_DB_DRIVER=org.postgresql.Driver
ENGINE_DB_URL=jdbc:postgresql://localhost:5432/engine
ENGINE_DB_USER=engine
ENGINE_FQDN=$HOST_FQDN$
ENGINE_PROXY_ENABLED=false
ENGINE_HTTP_ENABLED=true
ENGINE_HTTP_PORT=$HTTP_PORT$
ENGINE_HTTPS_ENABLED=true
ENGINE_HTTPS_PORT=$HTTPS_PORT$
ENGINE_AJP_ENABLED=false
ENGINE_DB_PASSWORD=<hash removed>

The ENGINE_FQDN, ENGINE_HTTP_PORT, and ENGINE_HTTPS_PORT values are all populated with variable names instead of values. Fixing these values manually allowed ovirt-engine to start correctly.
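
One way to spot this condition is to scan the file for values that are still `$NAME$` template tokens, or for port settings that are not integers. The following is a minimal sketch, not part of rhevm-upgrade; the function name `find_bad_values` and the choice of port keys are assumptions made for illustration.

```python
import re

# A value that is nothing but an unexpanded template token, e.g. $HTTPS_PORT$
PLACEHOLDER = re.compile(r"^\$[A-Za-z_][A-Za-z0-9_]*\$$")

# Keys expected to hold integer port numbers (illustrative choice).
PORT_KEYS = {"ENGINE_HTTP_PORT", "ENGINE_HTTPS_PORT"}


def find_bad_values(text):
    """Return a list of (key, value, reason) for suspicious KEY=VALUE lines."""
    problems = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip('"')
        if PLACEHOLDER.match(value):
            problems.append((key, value, "unexpanded placeholder"))
        elif key in PORT_KEYS and not value.isdigit():
            problems.append((key, value, "not a valid integer"))
    return problems


if __name__ == "__main__":
    sample = """\
ENGINE_DB_USER=engine
ENGINE_FQDN=$HOST_FQDN$
ENGINE_HTTP_PORT=$HTTP_PORT$
ENGINE_HTTPS_PORT=8443
"""
    for key, value, reason in find_bad_values(sample):
        print(f"{key}={value}: {reason}")
```

Run against the file contents listed above, a check like this would flag ENGINE_FQDN, ENGINE_HTTP_PORT, and ENGINE_HTTPS_PORT.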
Comment 2 Alex Lourie 2013-08-22 09:39:17 EDT
Allan

It is not clear how the customer's system got into this state. The problem lies in the /etc/rhevm/web-conf.js file prior to the upgrade, that is, in 3.0.

The bad values you're talking about are fetched from that file, but having them like that would mean that the engine can't work.

Would you mind checking with the customer how the 3.0 installation got into this state?

Thanks.
Comment 3 Allan Voss 2013-08-22 11:13:58 EDT
Alex,

Every time the upgrade failed, the rollback would also fail, leaving his environment unusable. So each time, he built a fresh RHEV-M, imported his database and certs, and continued. Each time, the same issue occurred.

I'll get the details on his rebuild process as soon as possible, in case this is how this happens.
Comment 4 Alex Lourie 2013-08-22 17:13:37 EDT
Allan

Thank you.

If the customer tries this again, please ask him to:

1. Verify that the engine/RHEV works correctly with all the entities prior to the upgrade attempt (after recovering the certificates and the DB).
2. Attach /etc/rhevm/web-conf.js to the report.

Thanks.
Comment 5 Allan Voss 2013-08-23 09:15:29 EDT
I can confirm that RHEV worked correctly prior to upgrade. The rebuilds were specifically so that he could manage VMs while we worked on figuring out the cause of the failure.

I'll provide the /etc/rhevm/web-conf.js if the customer still has a copy of it.
Comment 6 Allan Voss 2013-08-28 15:21:42 EDT
Another possibly important note:

The 3.0 environment had previously been upgraded from 2.2, then restored from backup later.
Comment 7 Alex Lourie 2013-09-01 11:56:09 EDT
Allan

I would still like to see their /etc/rhevm/web-conf.js.

Thanks.
Comment 9 Alex Lourie 2013-09-24 10:31:50 EDT
Hi Allan

Any updates?

Thanks.
Comment 10 Allan Voss 2013-09-26 12:07:23 EDT
Hi,

He has no file by that name at that location at all.
Comment 11 Alex Lourie 2013-09-29 07:20:43 EDT
Allan

This is extremely strange. This file is responsible for JBoss working correctly, especially in 3.0.

Could you please make sure they checked for the file's existence in 3.0? Installations of 3.1 and later do not need it, so it may not exist there.

Thanks.
Comment 12 Alex Lourie 2013-10-07 11:22:32 EDT
Hey Allan

Any news?

Thanks.
Comment 13 Allan Voss 2013-10-07 11:50:24 EDT
(In reply to Alex Lourie from comment #12)
> Hey Allan
> 
> Any news?
> 
> Thanks.

The sysadmin working this issue is out of the office for the next few days.
Comment 14 Alex Lourie 2013-10-24 07:24:13 EDT
Hey Allan.

This issue has been going on for a while without progress. Do you want to keep it open a bit longer and get the info from the customer, or would you prefer to close it and reopen it if it becomes relevant again?

Thanks.
Comment 17 Alex Lourie 2013-10-27 10:48:57 EDT
@Allan

This is extremely strange. The file web-conf.js is the "bad" one while the backup one is the "good" one. This looks like something was done in the wrong order on the customer's system.

With the file contents as in the backup, the upgrade will work fine. With the contents of web-conf.js, it will fail.

I can't explain how web-conf.js became as it is. I suggest closing the bug and reopening it if a similar issue arises.
Comment 18 Allan Voss 2013-10-27 12:40:46 EDT
Every time the installation failed, the customer had to rebuild his RHEV-M because the rollback would also fail. I assume that the incorrect values in web-conf.js are related to the reinstallation.

Shouldn't rhevm-upgrade be able to handle the situation?
Comment 19 Alex Lourie 2013-10-29 05:26:35 EDT
Allan

Automatic rollback during the 3.0 -> 3.1 upgrade does not work because substantial architectural changes in JBoss prevent us from restoring the system.

Additionally, we do not touch web-conf.js at all on upgrade; we only read values from it. It seems that during one of the rebuild attempts it was copied over incorrectly, and from that point on the upgrade cannot succeed in populating /etc/sysconfig/ovirt-engine correctly.

If we know of a clear flow where the customer had web-conf.js correctly defined prior to the upgrade, and the upgrade still failed to populate the final ovirt-engine file, I'd be happy to analyse it. Other than that, I don't see what else we can do in this specific case.
Comment 20 Allan Voss 2013-10-29 12:52:25 EDT
I was thinking along the lines of checking the contents of web-conf.js to ensure that values were in place instead of placeholders before the upgrade occurs, and reporting the error and aborting the upgrade before the upgrade goes far enough to require a rollback.
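
The pre-flight check proposed here could look roughly like the sketch below. It is not code from rhevm-upgrade: the function names, the `PreflightError` exception type, and the assumption that unexpanded placeholders always take the upper-case `$NAME$` form seen in this report are all illustrative. The scan is deliberately format-agnostic, since the layout of web-conf.js is not shown in this bug.

```python
import re

# Any $NAME$ token left anywhere in the file, in the form seen in this report.
PLACEHOLDER_TOKEN = re.compile(r"\$[A-Z][A-Z0-9_]*\$")


class PreflightError(Exception):
    """Raised when the source config is unfit for upgrade (hypothetical type)."""


def unexpanded_placeholders(text):
    """Return the sorted, de-duplicated placeholder tokens found in text."""
    return sorted(set(PLACEHOLDER_TOKEN.findall(text)))


def preflight_check(path):
    """Fail early, before anything is modified, if placeholders remain."""
    try:
        with open(path, encoding="utf-8") as handle:
            text = handle.read()
    except OSError as exc:
        # A missing or unreadable file is also a reason not to start.
        raise PreflightError(f"cannot read {path}: {exc}") from exc
    tokens = unexpanded_placeholders(text)
    if tokens:
        raise PreflightError(
            f"{path} still contains placeholders: {', '.join(tokens)}"
        )
```

An upgrade script would call `preflight_check("/etc/rhevm/web-conf.js")` as its first step and stop with the error message if `PreflightError` is raised, so that no rollback is ever needed for this case.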
