Bug 1408493 - A 3.6 engine fails after updating setup to 4.0 with: Error: cannot find 'getstring'
Summary: A 3.6 engine fails after updating setup to 4.0 with: Error: cannot find 'gets...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Python Library
Version: 4.0.5.5
Hardware: Unspecified
OS: Linux
medium
medium
Target Milestone: ---
: ---
Assignee: Yedidyah Bar David
QA Contact: Pavel Stehlik
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-12-23 18:53 UTC by Robert Story
Modified: 2017-05-11 09:23 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-25 07:51:23 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.0.z+
rule-engine: blocker+


Attachments (Terms of Use)
setup log (467.31 KB, text/plain)
2017-01-04 14:33 UTC, rstory
no flags Details
yum.log (9.68 KB, text/plain)
2017-01-04 14:34 UTC, rstory
no flags Details
engine log (16.91 KB, text/plain)
2017-01-04 17:31 UTC, rstory
no flags Details
server log (108.04 KB, text/plain)
2017-01-04 17:31 UTC, rstory
no flags Details

Description Robert Story 2016-12-23 18:53:41 UTC
Description of problem:
engine-setup installs new release rpms before checking for cluster compatibility. If a cluster wasn't updated, the setup fails, but since engine rpms have been updated, the engine can't be started so that the cluster level can be fixed.

Version-Release number of selected component (if applicable):
ovirt-engine-setup-4.0.5.5-1.el7.centos.noarch

How reproducible:
Only tried it once, assume 100%

Steps to Reproduce:
1. start with ovirt 3.6 with at least 1 cluster set to 3.5
2. follow documented updgrade procedure
3.

Actual results:
rpms installed before cluster check

Expected results:
cluster check before installing rpm and nuking previous working engine.

Additional info:

Comment 1 Sandro Bonazzola 2017-01-02 07:51:07 UTC
(In reply to Robert Story from comment #0)
> Description of problem:
> engine-setup installs new release rpms before checking for cluster
> compatibility. If a cluster wasn't updated, the setup fails, but since
> engine rpms have been updated, the engine can't be started so that the
> cluster level can be fixed.

if setup fails, package transaction should be rolled back by the setup process.
If this is not happening, we have a regression:
Rafael please investigate: if rollback is broken please open a separate bug.

About the cluster checking, it should be done in validation stage before package update.

Robert, can you please attach logs from engine-setup?

> 
> Version-Release number of selected component (if applicable):
> ovirt-engine-setup-4.0.5.5-1.el7.centos.noarch
> 
> How reproducible:
> Only tried it once, assume 100%
> 
> Steps to Reproduce:
> 1. start with ovirt 3.6 with at least 1 cluster set to 3.5
> 2. follow documented updgrade procedure
> 3.
> 
> Actual results:
> rpms installed before cluster check
> 
> Expected results:
> cluster check before installing rpm and nuking previous working engine.
> 
> Additional info:

Comment 2 Red Hat Bugzilla Rules Engine 2017-01-02 07:51:14 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 3 rstory 2017-01-04 14:33:34 UTC
Created attachment 1237191 [details]
setup log

initial failed setup log

Comment 4 rstory 2017-01-04 14:34:41 UTC
Created attachment 1237192 [details]
yum.log

yum.log showing that no rollback occurred by setup (12/23). Did it manually on 12/26.

Comment 5 rstory 2017-01-04 14:37:56 UTC
Logs attached. Rollback did not happen, as you can see in my yum log. I was able to use yum history to roll it back a few days later and start the engine.

Turns out is wasn't the cluster compatibility that I missed, but the datacenter compatibility. I fixed that, and the the next setup run succeeded.

Comment 6 Yedidyah Bar David 2017-01-04 15:23:32 UTC
From the attached logs:

yum:

Dec 23 13:23:00 Installed: ovirt-release40-4.0.5-2.noarch
- Likely the first relevant line for this bug

Dec 23 13:24:21 Updated: otopi-1.5.2-1.el7.centos.noarch
...
Dec 23 13:25:07 Updated: otopi-java-1.5.2-1.el7.centos.noarch
- Likely a result of updating the setup packages. Engine packages (e.g. ovirt-engine-backend) were not updated.

engine-setup:

2016-12-23 13:26:14 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT DUMP - BEGIN
- First line, so it started at 13:26:14
...
- Many more lines, with lots of output, including questions and answers

2016-12-23 13:27:40 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 Configure Data Warehouse on this host (Yes, No) [Yes]: 
- So this question was asked at 13:27:40

yum.log:

Dec 26 13:28:33 Installed: otopi-1.4.2-1.el7.centos.noarch
Dec 26 13:28:33 Installed: ovirt-engine-lib-3.6.7.5-1.el7.centos.noarch
Dec 26 13:28:35 Installed: ovirt-engine-setup-base-3.6.7.5-1.el7.centos.noarch
...
Dec 26 13:28:41 Erased: ovirt-engine-dwh-4.0.5-1.el7.centos.noarch
Dec 26 13:28:44 Erased: ovirt-engine-dwh-setup-4.0.5-1.el7.centos.noarch
...
Dec 26 13:28:54 Erased: 1:java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64
Dec 26 13:28:55 Erased: 1:java-1.8.0-openjdk-headless-1.8.0.111-2.b15.el7_3.x86_64
- So in the middle of running engine-setup, you undid the yum transaction updating the setup packages. Or at least it seems so. Why?

engine-setup:

2016-12-23 13:29:10 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT DUMP - BEGIN
2016-12-23 13:29:10 DEBUG otopi.context context.dumpEnvironment:770 ENV OVESETUP_DWH_CONFIG/dwhServiceStopNeeded=bool:'True'
2016-12-23 13:29:10 DEBUG otopi.context context.dumpEnvironment:770 ENV OVESETUP_DWH_CORE/enable=bool:'True'
2016-12-23 13:29:10 DEBUG otopi.context context.dumpEnvironment:774 ENVIRONMENT DUMP - END
- and then continued engine-setup, accepting its default answer 'Yes' to the above question.

(In reply to Robert Story from comment #0)
> Description of problem:
> Steps to Reproduce:
> 1. start with ovirt 3.6 with at least 1 cluster set to 3.5
> 2. follow documented updgrade procedure
> 3.
> 
> Actual results:
> rpms installed before cluster check

Indeed "rpms installed before cluster check", by following the upgrade procedure, which says to install/update some packages. But this is before engine-setup runs. So if you claim that this is a bug, it's a doc bug.

> 
> Expected results:
> cluster check before installing rpm

How? The old code does not know that you want to upgrade, and even if it did, it does not know what compatibility levels the new code supports. You must update at least the setup packages, these do know and can check.

> and nuking previous working engine.

Did it? It should not.

Please explain what you want to do and what you think does not work well.

If the above was intentional (running 'yum history undo' in the middle of engine-setup), then it's not supported. If it was a mistake, please try again from scratch. Thanks!

Comment 7 rstory 2017-01-04 16:27:17 UTC
(In reply to Yedidyah Bar David from comment #6)
> > Expected results:
> > cluster check before installing rpm
> 
> How? The old code does not know that you want to upgrade, and even if it
> did, it does not know what compatibility levels the new code supports. You
> must update at least the setup packages, these do know and can check.

Ok, so here's where the confusion originated. Obviously the setup rpm needs to be updated. But when I ran engine-setup, it did an 'update' check and presented a long list of things to be updated (see setup log, Setup has found updates for some packages: at 2016-12-23 13:29:51). When setup failed and I couldn't restart the engine afterwards, I assume all those rpms had been installed and left me without a working engine. Per the yum log, apparently those packages were not actually installed, so something else must have broken the engine.

> > and nuking previous working engine.
> 
> Did it? It should not.

It did. When setup failed, engine was stopped, and after the compatibility error from setup, I could not start the engine again to fix it. From /var/log/messages:

Dec 23 13:33:33 ovirt-engine systemd: Starting oVirt Engine...
Dec 23 13:33:34 ovirt-engine journal: 2016-12-23 13:33:34,466 ovirt-engine: ERROR run:532 Error: cannot find 'getstring'
Dec 23 13:33:34 ovirt-engine systemd: ovirt-engine.service: main process exited, code=exited, status=1/FAILURE
Dec 23 13:33:34 ovirt-engine systemd: Failed to start oVirt Engine.
Dec 23 13:33:34 ovirt-engine systemd: Unit ovirt-engine.service entered failed state.
Dec 23 13:33:34 ovirt-engine systemd: ovirt-engine.service failed.


> If the above was intentional (running 'yum history undo' in the middle of
> engine-setup), then it's not supported. If it was a mistake, please try
> again from scratch. Thanks!

yum history undo wasn't in the middle of engine setup, it was 3 days after the setup failed. It was the only way I could find to get the engine to start again so that I could change the data center compatibility so the next run of engine-setup could proceed.

Comment 8 Derek Atkins 2017-01-04 16:43:13 UTC
(In reply to rstory from comment #7)
> (In reply to Yedidyah Bar David from comment #6)

> > If the above was intentional (running 'yum history undo' in the middle of
> > engine-setup), then it's not supported. If it was a mistake, please try
> > again from scratch. Thanks!
> 
> yum history undo wasn't in the middle of engine setup, it was 3 days after
> the setup failed. It was the only way I could find to get the engine to
> start again so that I could change the data center compatibility so the next
> run of engine-setup could proceed.

Just to point out the obvious, the yum.log dates are Dec 26.  The engine log issues are run Dec 23.  C.f.:

> > yum.log
> > 
> > Dec 26 13:28:33 Installed: otopi-1.4.2-1.el7.centos.noarch
> > Dec 26 13:28:33 Installed: ovirt-engine-lib-3.6.7.5-1.el7.centos.noarch
> > Dec 26 13:28:35 Installed: ovirt-engine-setup-base-3.6.7.5-1.el7.centos.noarch
> > ...
> > Dec 26 13:28:41 Erased: ovirt-engine-dwh-4.0.5-1.el7.centos.noarch
> > Dec 26 13:28:44 Erased: ovirt-engine-dwh-setup-4.0.5-1.el7.centos.noarch
> > ...
> > Dec 26 13:28:54 Erased: 1:java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64
> > Dec 26 13:28:55 Erased: 1:java-1.8.0-openjdk-headless-1.8.0.111-2.b15.el7_3.x86_64
> > - So in the middle of running engine-setup, you undid the yum transaction updating the setup packages. Or at least it seems so. Why?

Comment 9 Simone Tiraboschi 2017-01-04 17:04:18 UTC
I suspect that the issue is here:
Dec 23 13:24:49 Updated: ovirt-engine-lib-4.0.5.5-1.el7.centos.noarch
Dec 23 13:24:51 Updated: ovirt-engine-setup-base-4.0.5.5-1.el7.centos.noarch
Dec 23 13:24:53 Updated: ovirt-engine-setup-plugin-ovirt-engine-common-4.0.5.5-1.el7.centos.noarch
Dec 23 13:24:54 Installed: ovirt-engine-dwh-setup-4.0.5-1.el7.centos.noarch
Dec 23 13:24:56 Installed: ovirt-engine-dwh-4.0.5-1.el7.centos.noarch

ovirt-engine-dwh (and not just ovirt-engine-dwh-setup) got installed when you was still with the engine 3.6.

At that point you had:
ovirt-engine-setup-base from 4.0.5
ovirt-engine-dwh-setup from 4.0.5
but also ovirt-engine-dwh from 4.0.5
while the engine was still at 3.6.7

then you run engine-setup and it failed due to the cluster compatibility level issue; it didn't rolled back anything since it didn't updated any package.

But at this point your 3.6 engine wasn't able to start anymore

Than you manually rolled back you latest yum transaction which deleted
Dec 26 13:28:41 Erased: ovirt-engine-dwh-4.0.5-1.el7.centos.noarch
Dec 26 13:28:44 Erased: ovirt-engine-dwh-setup-4.0.5-1.el7.centos.noarch
Dec 26 13:28:45 Erased: jdom-1.1.3-6.el7.noarch
Dec 26 13:28:46 Erased: jaxen-1.1.3-11.el7.noarch
Dec 26 13:28:47 Erased: dom4j-1.6.1-20.el7.noarch
Dec 26 13:28:47 Erased: 1:msv-msv-2013.5.1-7.el7.noarch
Dec 26 13:28:47 Erased: 1:msv-xsdlib-2013.5.1-7.el7.noarch
Dec 26 13:28:48 Erased: ws-jaxme-0.5.2-10.el7.noarch
Dec 26 13:28:48 Erased: antlr-tool-2.7.7-30.el7.noarch
Dec 26 13:28:49 Erased: 1:hsqldb-1.8.1.3-14.el7.noarch
Dec 26 13:28:50 Erased: 1:isorelax-0-0.15.release20050331.el7.noarch
Dec 26 13:28:50 Erased: xpp3-1.1.3.8-11.el7.noarch
Dec 26 13:28:51 Erased: apache-commons-collections-3.2.1-22.el7_2.noarch
Dec 26 13:28:51 Erased: postgresql-jdbc-9.2.1002-5.el7.noarch
Dec 26 13:28:51 Erased: ovirt-imageio-proxy-setup-0.4.0-0.201608310602.gita9b573b.el7.centos.noarch
Dec 26 13:28:54 Erased: relaxngDatatype-1.0-11.el7.noarch
Dec 26 13:28:54 Erased: bea-stax-1.2.0-9.el7.noarch
Dec 26 13:28:54 Erased: 1:java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64
Dec 26 13:28:55 Erased: 1:java-1.8.0-openjdk-headless-1.8.0.111-2.b15.el7_3.x86_64

and so you was able again to start your 3.6 engine.

We need to investigate which of that packages will prevent engine 3.6.7 to start; ovirt-engine-dwh-4.0.5 is a good candidate.

Comment 10 Simone Tiraboschi 2017-01-04 17:08:39 UTC
Could you please share engine.log and server.log from the time-frame where your 3.6 engine wasn't able to start (about 2016-12-23 13:35) ?

Comment 11 rstory 2017-01-04 17:24:09 UTC
(In reply to Simone Tiraboschi from comment #9)
> I suspect that the issue is here:
> Dec 23 13:24:49 Updated: ovirt-engine-lib-4.0.5.5-1.el7.centos.noarch
> Dec 23 13:24:51 Updated: ovirt-engine-setup-base-4.0.5.5-1.el7.centos.noarch
> Dec 23 13:24:53 Updated:
> ovirt-engine-setup-plugin-ovirt-engine-common-4.0.5.5-1.el7.centos.noarch
> Dec 23 13:24:54 Installed: ovirt-engine-dwh-setup-4.0.5-1.el7.centos.noarch
> Dec 23 13:24:56 Installed: ovirt-engine-dwh-4.0.5-1.el7.centos.noarch
> 
> ovirt-engine-dwh (and not just ovirt-engine-dwh-setup) got installed when
> you was still with the engine 3.6.

I checked my command history, and I ran 'yum update ovirt-engine-setup\*' per the docs...

Comment 12 rstory 2017-01-04 17:31:22 UTC
Created attachment 1237251 [details]
engine log

Comment 13 rstory 2017-01-04 17:31:59 UTC
Created attachment 1237252 [details]
server log

Comment 14 rstory 2017-01-04 17:35:14 UTC
(In reply to Derek Atkins from comment #8)
> Just to point out the obvious, the yum.log dates are Dec 26.  The engine log
> issues are run Dec 23.  C.f.:
> 
> > > yum.log
> > > 
> > > Dec 26 13:28:33 Installed: otopi-1.4.2-1.el7.centos.noarch

The yum log starts on 12/23. The 12/26  install was after 12/23 setup failed and I rolled back to start over.

Dec 23 13:24:21 Updated: otopi-1.5.2-1.el7.centos.noarch
...
Dec 23 13:24:49 Updated: ovirt-engine-lib-4.0.5.5-1.el7.centos.noarch
Dec 23 13:24:51 Updated: ovirt-engine-setup-base-4.0.5.5-1.el7.centos.noarch
Dec 23 13:24:53 Updated: ovirt-engine-setup-plugin-ovirt-engine-common-4.0.5.5-1.el7.centos.noarch
Dec 23 13:24:54 Installed: ovirt-engine-dwh-setup-4.0.5-1.el7.centos.noarch
Dec 23 13:24:56 Installed: ovirt-engine-dwh-4.0.5-1.el7.centos.noarch

Comment 15 Simone Tiraboschi 2017-01-04 19:50:20 UTC
The issue was in the connection with the DB, also
Dec 26 13:28:51 Erased: postgresql-jdbc-9.2.1002-5.el7.noarch
is a good candidate

Comment 16 rstory 2017-01-04 20:15:31 UTC
(In reply to Simone Tiraboschi from comment #15)
> The issue was in the connection with the DB, also
> Dec 26 13:28:51 Erased: postgresql-jdbc-9.2.1002-5.el7.noarch
> is a good candidate

Note the date - 12/26, 3 days later. This is when I did the rollback to 3.6 so I could get the engine started to fix data center compat.

Comment 17 Yedidyah Bar David 2017-01-05 07:21:05 UTC
(In reply to rstory from comment #7)
> It did. When setup failed, engine was stopped, and after the compatibility
> error from setup, I could not start the engine again to fix it. From
> /var/log/messages:
> 
> Dec 23 13:33:33 ovirt-engine systemd: Starting oVirt Engine...
> Dec 23 13:33:34 ovirt-engine journal: 2016-12-23 13:33:34,466 ovirt-engine:
> ERROR run:532 Error: cannot find 'getstring'

This rings a bell for me. So I did a a quick search, and found it in bug 1323580 comment 1.

Seems like a regression in ovirt-engine-lib, caused by removing the function getstring as part of bug 1267508. This function is indeed not needed after the move to jinja2, in 4.0, but breaks 3.6.

Changing the summary and other fields. Thanks for the report!

However, not sure it's very important/urgent to solve, as I assume most users already migrated to 4.0, and it affects a rather specific flow.

Comment 18 Sandro Bonazzola 2017-01-25 07:51:23 UTC
Closing won't fix, since no 4.0.7 release will happen and not a critical issue according to comment #17


Note You need to log in before you can comment on or make changes to this bug.