Bug 1492138 - Rollback after failed upgrade from 4.1 to 4.2 does not reconfigure original postgresql service
Summary: Rollback after failed upgrade from 4.1 to 4.2 does not reconfigure original p...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Setup.Engine
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ovirt-4.2.2
Target Release: 4.2.2.1
Assignee: Yedidyah Bar David
QA Contact: Jiri Belka
URL:
Whiteboard:
Depends On:
Blocks: 1546487
Reported: 2017-09-15 15:01 UTC by Jiri Belka
Modified: 2018-03-29 10:56 UTC

Fixed In Version: ovirt-engine-4.2.2.1
Doc Type: Bug Fix
Doc Text:
If engine-setup fails and rolls back the installation, it now correctly handles the old and new postgresql services.
Clone Of:
Clones: 1546487
Environment:
Last Closed: 2018-03-29 10:56:12 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.2+
ylavi: blocker+


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1366900 | 0 | urgent | CLOSED | [RFE] - Support direct to version upgrade of the manager | 2021-02-22 00:41:40 UTC
oVirt gerrit | 82860 | 0 | master | MERGED | packaging: spec: setup: Require at least 4.1.7 engine | 2017-11-14 10:33:55 UTC
oVirt gerrit | 85543 | 0 | master | MERGED | packaging: setup: postgres95: Fixes | 2018-02-20 09:07:23 UTC
oVirt gerrit | 87754 | 0 | master | ABANDONED | packaging: setup: postgres95: Do not backup/restore on upgrade | 2018-02-19 15:24:14 UTC
oVirt gerrit | 87807 | 0 | master | ABANDONED | packaging: setup: postgres95: Do not allow in-place upgrade | 2018-02-19 15:24:18 UTC
oVirt gerrit | 87899 | 0 | ovirt-engine-4.2 | MERGED | packaging: setup: postgres95: Fixes | 2018-02-20 09:21:57 UTC

Internal Links: 1366900

Description Jiri Belka 2017-09-15 15:01:13 UTC
Description of problem:

For BZ1366900 I was testing a failed upgrade between non-consecutive minor versions of the engine, i.e. 4.0 -> 4.2.

Rollback was almost OK, except that the original DB service is not reconfigured. The engine seems to work with PG 9.5, but I'm not sure this is what we want to keep after rollback.

# engine-setup
...
[WARNING] This release requires PostgreSQL server 9.5.7 but the engine database is currently hosted on PostgreSQL server 9.2.23

                                         ^^^^^^

          This tool can automatically upgrade PostgreSQL. Automatically upgrade? (Yes, No) [Yes]: 

           ^^^ - automatically do everything
...
[ INFO  ] Upgrading PostgreSQL
[ INFO  ] PostgreSQL has been successfully upgraded, starting the new instance (rh-postgresql95-postgresql).
[ INFO  ] Cleaning the previous PostgreSQL data directory
[ INFO  ] Updating PostgreSQL configuration
...
[ INFO  ] Backing up database localhost:engine to '/var/lib/ovirt-engine/backups/engine-20170915163010.JtU3v3.dump'.
[ INFO  ] Creating/refreshing Engine database schema
[ ERROR ] Failed to execute stage 'Misc configuration': [Errno 13] Permission denied

  ^^^^^ a simulation of failure

[ INFO  ] Yum Performing yum transaction rollback
...
[ INFO  ] Rolling back database schema
[ INFO  ] Clearing Engine database engine
[ INFO  ] Restoring Engine database engine

          ^^^^^^^^^ old DB was restored into "new" PG 9.5

[ INFO  ] Restoring file '/var/lib/ovirt-engine/backups/engine-20170915163010.JtU3v3.dump' to database localhost:engine.
[ INFO  ] Stage: Clean up
          Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20170915161331-a2uggc.log
[ INFO  ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20170915163303-setup.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Execution of setup failed


# systemctl list-unit-files | grep postgres
postgresql.service                            disabled

                                              ^^^^^^^^ !!

rh-postgresql95-postgresql.service            enabled 
rh-postgresql95-postgresql@.service           disabled


Version-Release number of selected component (if applicable):
ovirt-engine-setup-4.2.0-0.0.master.20170913112412.git2eb3c0a.el7.centos.noarch

How reproducible:
100%

Steps to Reproduce:
1. have 4.0, add 4.2 repo and yum update ovirt\*setup\*
2. engine-setup
3. when you see in the output that ovirt-engine-dbscripts gets updated,
   run chmod 000 /usr/share/ovirt-engine/dbscripts/schema.sh to simulate
   failure

Actual results:
Rollback does not restore or reconfigure the previous PG version used by the
original version of the engine.

Expected results:
It should roll back fully to the previous DB version.

Additional info:
Or at least document this behavior, thx.

Comment 2 Yaniv Lavi 2017-10-18 08:25:27 UTC
We will only support a stepped upgrade. The direct-to-version upgrade tool will also perform the steps automatically.

Comment 3 Jiri Belka 2017-12-14 16:10:48 UTC
I don't understand how a spec file diff can solve the problem of incorrect rollback.

rhevm-4.1.8.2-0.1.el7.noarch -> rhvm-4.2.0.2-0.1.el7.noarch

Anyway, the problem still persists - if anything goes wrong during DB upgrade

  [ INFO  ] Creating/refreshing Engine database schema
  [ ERROR ] Failed to execute stage 'Misc configuration': [Errno 13] Permission 
            denied

the rollback did not do its work completely:

1. missing old dbs

# ls -l /var/lib/pgsql/data
ls: cannot access /var/lib/pgsql/data: No such file or directory

2. bad pg service

# systemctl list-unit-files | grep postgres
postgresql.service                            disabled
rh-postgresql95-postgresql.service            enabled 
rh-postgresql95-postgresql@.service           disabled

The rollback should result in the following environment:

1. old dbs should be in original place
2. rh-pg95 should be disabled
3. postgresql should be enabled
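
The three expectations above can be turned into an automated check. The following is a minimal sketch, not part of engine-setup: the function names `parse_unit_files` and `rollback_problems` are hypothetical, and the sample input is the `systemctl list-unit-files | grep postgres` output quoted in this comment. It covers only the service state (items 2 and 3), not the location of the old data files.

```python
def parse_unit_files(output):
    """Map unit name -> state from `systemctl list-unit-files` style text."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Keep only "<unit>.service <state>" rows; skip headers and blanks.
        if len(fields) >= 2 and fields[0].endswith(".service"):
            states[fields[0]] = fields[1]
    return states


def rollback_problems(states,
                      old="postgresql.service",
                      new="rh-postgresql95-postgresql.service"):
    """Return deviations from the expected post-rollback service state."""
    problems = []
    if states.get(old) != "enabled":
        problems.append("%s should be enabled" % old)
    if states.get(new) == "enabled":
        problems.append("%s should be disabled" % new)
    return problems


# Output observed after the failed upgrade, as quoted in this comment.
observed = """\
postgresql.service                            disabled
rh-postgresql95-postgresql.service            enabled
rh-postgresql95-postgresql@.service           disabled
"""

for problem in rollback_problems(parse_unit_files(observed)):
    print(problem)
```

On a real host, the `observed` string could be replaced by the captured output of `systemctl list-unit-files`; an empty result list would mean the service state matches the expectation.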

Comment 4 Red Hat Bugzilla Rules Engine 2017-12-14 16:10:52 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 5 Sandro Bonazzola 2017-12-18 08:29:45 UTC
(In reply to Jiri Belka from comment #3)

> rhevm-4.1.8.2-0.1.el7.noarch -> rhvm-4.2.0.2-0.1.el7.noarch

Changed summary accordingly.

Comment 7 Yaniv Kaul 2018-01-16 11:31:36 UTC
Has anyone looked at why it was reopened?

Comment 8 Yedidyah Bar David 2018-02-08 15:35:02 UTC
Adding some random notes about this bug. Most important, notes about the flows we want to handle (and verify):

1. List of relevant events during setup:

1.1. Optional backup of existing databases

1.2. Upgrade databases to postgresql 9.5

1.3. Updating database schema and other changes

2. We should handle/verify each flow, with:

2.1. Success

2.2. Failure before 1.1.

2.3. Failure between 1.1 and 1.2.

2.4. Failure between 1.2 and 1.3

2.5. Failure after 1.3

3. We should handle/verify each of above with the various possible combinations of answers to the relevant questions we ask:

3.1. Upgrade in-place or by copying data files?

3.2. Backup databases?
- We currently (also in previous versions) ask this about DWH
- About engine we don't, but might decide to ask
- It makes sense to not back up if doing the upgrade by copying

3.3. automatically clean up the old data directory on success?
- Should not be relevant, but make sure. Definitely relevant with the current code, which commits the transaction immediately on upgrade success, before step 1.3.

Not sure about "Has anyone looked at why it was reopened?". Current code only rolls back pg if pg upgrade failed, as it runs in its own transaction. The code in the current pending patch moves this to the main transaction, but suffers some other problems, so not ready yet.
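
To get a feel for the size of the verification matrix described in items 2 and 3, the flows and answers can be enumerated mechanically. This is only an illustrative sketch: the names `FLOWS`, `ANSWERS` and `test_matrix` are hypothetical, and it assumes each of the three questions is a plain yes/no choice.

```python
import itertools

# Flows 2.1 - 2.5 from the list above.
FLOWS = [
    "success",
    "failure before 1.1",
    "failure between 1.1 and 1.2",
    "failure between 1.2 and 1.3",
    "failure after 1.3",
]

# Questions 3.1 - 3.3, each assumed to be a yes/no answer.
ANSWERS = {
    "upgrade_in_place": [True, False],
    "backup_databases": [True, False],
    "cleanup_old_data_dir": [True, False],
}


def test_matrix():
    """Yield (flow, answers) for every combination that should be verified."""
    keys = list(ANSWERS)
    for flow in FLOWS:
        for values in itertools.product(*(ANSWERS[key] for key in keys)):
            yield flow, dict(zip(keys, values))


cases = list(test_matrix())
print(len(cases))  # 5 flows x 2 x 2 x 2 answers = 40
```

Forty cases is a lot to verify by hand, which is one argument for reducing the number of questions asked.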

Comment 9 Yedidyah Bar David 2018-02-12 08:06:39 UTC
Some more notes:

1. We should make sure we dump DBs with the version they use (9.2) and not the version we upgrade them to (9.5)

2. On rollback, if we did not upgrade in-place, we should stop/disable 9.5 service and start/enable 9.2 service

3. On rollback, if we upgraded in-place, it might be risky/complex to try to restore the backup to the old database (see next comment). So:

3.1. Should probably leave the engine using the 9.5 db

3.2. Should point the user somewhere that explains the situation

3.3. Should start/enable 9.5 service

3.4. Open another bug about how to handle this on the next attempt (upgrading an oVirt 4.1 setup that already uses 9.5 pg).
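
Items 2 and 3 above amount to a small decision rule. The sketch below expresses it as a pure function that returns the planned actions instead of executing them. The function name and the action tuples are hypothetical; the service names are the ones shown in the `systemctl` output earlier in this bug.

```python
def rollback_actions(upgraded_in_place,
                     old="postgresql",
                     new="rh-postgresql95-postgresql"):
    """Return the planned rollback actions as (verb, target) tuples."""
    if not upgraded_in_place:
        # Item 2: the 9.2 data files are untouched, so switch services back.
        return [
            ("stop", new),
            ("disable", new),
            ("start", old),
            ("enable", old),
        ]
    # Item 3: after an in-place upgrade, keep the engine on the 9.5 database
    # and tell the user where the situation is explained.
    return [
        ("start", new),
        ("enable", new),
        ("warn", "engine is left on the 9.5 database, see documentation"),
    ]


for verb, target in rollback_actions(upgraded_in_place=False):
    print(verb, target)
```

Returning a plan rather than calling `systemctl` directly keeps the rule easy to test for both branches.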

Comment 10 Yedidyah Bar David 2018-02-12 08:10:07 UTC
It's probably risky to try to restore the backup to a 9.2 pg that went through an in-place upgrade, because the data files are not 9.2-compatible anymore. If we do want to do this:

1. Test if there were any other databases in the pg cluster (that we didn't back up). If there aren't any, we might continue.

2. Stop/disable all pg services (both old and new)

3. Remove all data directories

4. initdb

5. restore
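
The guard in step 1 is the part most worth getting right, because steps 3 and 4 destroy anything that was not backed up. A minimal sketch of that guard follows. The function names are hypothetical, and treating `postgres`, `template0` and `template1` as ignorable system databases is an assumption of this sketch, not something decided in this bug.

```python
# Databases present in every default cluster (assumed safe to recreate).
SYSTEM_DATABASES = {"postgres", "template0", "template1"}


def unbacked_databases(cluster_dbs, backed_up):
    """Return databases in the cluster that have no backup."""
    return sorted(set(cluster_dbs) - set(backed_up) - SYSTEM_DATABASES)


def restore_plan(cluster_dbs, backed_up):
    """Return the ordered steps 2-5, or refuse if step 1 finds extra data."""
    extra = unbacked_databases(cluster_dbs, backed_up)
    if extra:
        raise RuntimeError(
            "refusing to re-initialize, no backup for: %s" % ", ".join(extra)
        )
    return [
        "stop and disable all pg services (old and new)",
        "remove all data directories",
        "initdb",
        "restore backed-up databases",
    ]


plan = restore_plan(
    cluster_dbs=["postgres", "template0", "template1", "engine"],
    backed_up=["engine"],
)
print("\n".join(plan))
```

In practice the list of cluster databases would come from a query such as `SELECT datname FROM pg_database`; here it is passed in so the guard can be tested in isolation.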

Comment 11 Yedidyah Bar David 2018-02-12 08:27:21 UTC
See also bug 1498351 about the state of an upgrade that moved the db to 9.5 and failed later, thus leaving 4.1 engine.

Comment 12 Simone Tiraboschi 2018-02-12 09:15:56 UTC
(In reply to Yedidyah Bar David from comment #8)
> Adding some random notes about this bug. Most important, notes about the
> flows we want to handle (and verify):
> 
> 1. List of relevant events during setup:
> 
> 1.1. Optional backup of existing databases
> 
> 1.2. Upgrade databases to postgresql 9.5

1.1 (backup of existing databases) currently happens after point 1.2, since the 4.2 engine-setup uses all the pg tools from the SCL, so they require a DB already at 9.5.

If the user performs a 9.2 -> 9.5 upgrade that is not in place, this is not that relevant either, since the 9.2 DB is still there untouched, simply in another folder.
We just have to re-enable the 9.2 pg service and everything will be fine.

If the user decided to perform an in-place upgrade instead, we cannot easily roll back.
A file-system-level backup could be significantly faster (it does not have to rebuild all the indexes, as an SQL-level restore does), although it requires more space (the same as a non-in-place upgrade, although possibly on an external mount point), and the DBMS must be down to ensure that the copy is consistent.

Comment 13 Yedidyah Bar David 2018-02-15 15:36:22 UTC
How about deciding that in-place upgrade does not allow rollback at all?
So that if you use in-place upgrade, it will be faster and use less space, but engine-setup will not take backups, nor try to roll back if it fails.

Users that only want things to run as quickly as possible (e.g. for testing), can use in-place.

Users that want backups (and rollback), should either use upgrade-by-copying (not in-place) or take care of backups themselves.

This will simplify things a lot. Pushed another patch that does this, didn't verify yet.

Yaniv - makes sense?

Comment 14 Yedidyah Bar David 2018-02-18 10:18:30 UTC
Talked with Yaniv about this. Summary:

1. We should remove the option of upgrading postgresql in-place.
2. OK to not back up (pg_dump) the databases on upgrade, and to roll back to the previous pg version.

Comment 15 Jiri Belka 2018-02-23 14:00:18 UTC
ok, ovirt-engine-setup-base-4.2.2.1-0.1.el7.noarch

After engine-setup failed because of a strange issue with rpm deps, everything else was rolled back successfully and works OK.


...
2018-02-23 14:27:38,622+0100 ERROR otopi.plugins.otopi.packagers.yumpackager yumpackager.error:85 Yum [u'rhevm-4.1.10.1-0.1.el7.noarch requires rhevm-doc >= 4.0', u'ovirt-engine-4.2.2.1-0.1.el7.noarch requires rhvm = 4.2.2.1-0.1.el7', u'rhevm-4.1.10.1-0.1.el7.noarch requires redhat-support-plugin-rhev >= 4.0', u'rhevm-4.1.10.1-0.1.el7.noarch requires ovirt-engine = 4.1.10.1-0.1.el7']
2018-02-23 14:27:38,623+0100 DEBUG otopi.context context._executeMethod:143 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 133, in _executeMethod
    method['method']()
  File "/usr/share/otopi/plugins/otopi/packagers/yumpackager.py", line 248, in _packages
    self.processTransaction()
  File "/usr/share/otopi/plugins/otopi/packagers/yumpackager.py", line 262, in processTransaction
    if self._miniyum.buildTransaction():
  File "/usr/lib/python2.7/site-packages/otopi/miniyum.py", line 920, in buildTransaction
    raise yum.Errors.YumBaseError(msg)
YumBaseError: [u'rhevm-4.1.10.1-0.1.el7.noarch requires rhevm-doc >= 4.0', u'ovirt-engine-4.2.2.1-0.1.el7.noarch requires rhvm = 4.2.2.1-0.1.el7', u'rhevm-4.1.10.1-0.1.el7.noarch requires redhat-support-plugin-rhev >= 4.0', u'rhevm-4.1.10.1-0.1.el7.noarch requires ovirt-engine = 4.1.10.1-0.1.el7']
2018-02-23 14:27:38,625+0100 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Package installation': [u'rhevm-4.1.10.1-0.1.el7.noarch requires rhevm-doc >= 4.0', u'ovirt-engine-4.2.2.1-0.1.el7.noarch requires rhvm = 4.2.2.1-0.1.el7', u'rhevm-4.1.10.1-0.1.el7.noarch requires redhat-support-plugin-rhev >= 4.0', u'rhevm-4.1.10.1-0.1.el7.noarch requires ovirt-engine = 4.1.10.1-0.1.el7']
...

2018-02-23 14:27:38,626+0100 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:80 Yum Performing yum transaction rollback
Loaded plugins: product-id, versionlock
...
2018-02-23 14:27:38,665+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'DWH Engine database Transaction'
2018-02-23 14:27:38,665+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'Database Transaction'
2018-02-23 14:27:38,666+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'Version Lock Transaction'
2018-02-23 14:27:38,667+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'DWH database Transaction'
2018-02-23 14:27:38,667+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'Firewalld Transaction'
2018-02-23 14:27:38,668+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'DBMS Upgrade Transaction'
2018-02-23 14:27:38,668+0100 INFO otopi.plugins.ovirt_engine_setup.ovirt_engine.db.dbmsupgrade postgres.abort:808 Rolling back to the previous PostgreSQL instance (postgresql).
2018-02-23 14:27:38,669+0100 DEBUG otopi.plugins.otopi.services.systemd systemd.state:130 stopping service rh-postgresql95-postgresql
2018-02-23 14:27:38,669+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/usr/bin/systemctl', 'stop', 'rh-postgresql95-postgresql.service'), executable='None', cwd='None', env=None
2018-02-23 14:27:39,712+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'stop', 'rh-postgresql95-postgresql.service'), rc=0
2018-02-23 14:27:39,713+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/usr/bin/systemctl', 'stop', 'rh-postgresql95-postgresql.service') stdout:


2018-02-23 14:27:39,714+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'stop', 'rh-postgresql95-postgresql.service') stderr:


2018-02-23 14:27:39,714+0100 DEBUG otopi.plugins.otopi.services.systemd systemd.state:130 starting service postgresql
2018-02-23 14:27:39,715+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/usr/bin/systemctl', 'start', 'postgresql.service'), executable='None', cwd='None', env=None
...

engine=# show data_directory;
   data_directory    
---------------------
 /var/lib/pgsql/data
(1 row)

# systemctl is-enabled postgresql
enabled
# systemctl is-active postgresql
active
# systemctl is-active ovirt-engine
inactive
# systemctl start ovirt-engine

4.1 admin portal is up
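
The log above shows the fixed behaviour: when a stage fails, the 'DBMS Upgrade Transaction' is aborted, which stops `rh-postgresql95-postgresql` and starts `postgresql`. The sketch below illustrates that pattern in isolation. It is not the engine-setup or otopi code: the class, the `run_in_transaction` helper and the reverse abort order are assumptions made for illustration, and the command runner is injected so the example records commands instead of executing them.

```python
class DBMSUpgradeTransaction(object):
    """Hypothetical transaction element that swaps services back on abort."""

    def __init__(self, run, old="postgresql",
                 new="rh-postgresql95-postgresql"):
        self._run = run  # callable that receives an argv list
        self._old = old
        self._new = new

    def __str__(self):
        return "DBMS Upgrade Transaction"

    def prepare(self):
        pass  # nothing to set up in this sketch

    def abort(self):
        # Same order as in the log: stop the new service, start the old one.
        self._run(["systemctl", "stop", self._new + ".service"])
        self._run(["systemctl", "start", self._old + ".service"])

    def commit(self):
        pass  # on success the new service simply stays in place


def run_in_transaction(elements, work):
    """Run work(); abort every element (in reverse order) if it raises."""
    for element in elements:
        element.prepare()
    try:
        work()
    except Exception:
        for element in reversed(elements):
            element.abort()
        raise
    for element in elements:
        element.commit()


def failing_stage():
    raise RuntimeError("simulated failure in a later setup stage")


calls = []  # dry run: record commands instead of executing them
try:
    run_in_transaction([DBMSUpgradeTransaction(calls.append)], failing_stage)
except RuntimeError:
    pass
print(calls)
```

Because the element is part of the main transaction, a failure in any later stage triggers the service swap. That is the difference from the earlier behaviour described in comment 8, where the pg upgrade ran in its own transaction and was committed before the schema update.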

Comment 16 Yedidyah Bar David 2018-02-25 07:04:54 UTC
(In reply to Jiri Belka from comment #15)
> ok, ovirt-engine-setup-base-4.2.2.1-0.1.el7.noarch
> 
> after engine-setup fail because of some strange issue with rpm deps
> everything else was rolled back successfully and works ok.

OK.

Did you also verify rollback where failure happened in other points? See also comment 8.

Also, it's not clear if the "strange issue with rpm deps" was deliberate, or another bug. If another bug, please open one. Thanks.

Comment 17 Jiri Belka 2018-02-26 11:28:02 UTC
> 1.1. Optional backup of existing databases

^^ there is no backup before db upgrade (iiuc there is no more in-place upgrade)

[ ERROR ] Failed to execute stage 'Misc configuration': [Errno 13] Permission denied
[ INFO  ] Yum Performing yum transaction rollback


[ ERROR ] Failed to execute stage 'Misc configuration': Command '/opt/rh/rh-postgresql95/root/usr/bin/postgresql-setup' failed to execute
[ INFO  ] Yum Performing yum transaction rollback

ok, all works

> 1.2. Upgrade databases to postgresql 9.5

[ INFO  ] Upgrading PostgreSQL
[ ERROR ] Failed to execute stage 'Misc configuration': Command '/opt/rh/rh-postgresql95/root/usr/bin/postgresql-setup' failed to execute

ok, all works

> 1.3. Updating database schema and other changes

rpm failure

ok, all works

> 2. We should handle/verify each flow, with:
> 
> 2.1. Success
> 
> 2.2. Failure before 1.1.
> 
> 2.3. Failure between 1.1 and 1.2.
> 
> 2.4. Failure between 1.2 and 1.3
> 
> 2.5. Failure after 1.3

that rpm issue

> 3. We should handle/verify each of above with the various possible
> combinations of answers to the relevant questions we ask:
> 
> 3.1. Upgrade in-place or by copying data files?
> 
> 3.2. Backup databases?
> - We currently (also in previous versions) ask this about DWH

I don't see any question about backups when upgrading from 4.1 to 4.2.

Comment 18 Sandro Bonazzola 2018-03-29 10:56:12 UTC
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

