Description of problem:
For BZ1366900 I was testing a failed upgrade between non-consecutive minor versions of the engine, i.e. 4.0 -> 4.2. The rollback was almost OK, except that the original DB service is not reconfigured. The engine seems to work with PG 9.5, but I'm not sure this is what we want to keep after a rollback.

# engine-setup
...
[WARNING] This release requires PostgreSQL server 9.5.7 but the engine database is currently hosted on PostgreSQL server 9.2.23
^^^^^^
This tool can automatically upgrade PostgreSQL.
Automatically upgrade? (Yes, No) [Yes]:
^^^ - automatically do everything
...
[ INFO ] Upgrading PostgreSQL
[ INFO ] PostgreSQL has been successfully upgraded, starting the new instance (rh-postgresql95-postgresql).
[ INFO ] Cleaning the previous PostgreSQL data directory
[ INFO ] Updating PostgreSQL configuration
...
[ INFO ] Backing up database localhost:engine to '/var/lib/ovirt-engine/backups/engine-20170915163010.JtU3v3.dump'.
[ INFO ] Creating/refreshing Engine database schema
[ ERROR ] Failed to execute stage 'Misc configuration': [Errno 13] Permission denied
^^^^^ a simulation of failure
[ INFO ] Yum Performing yum transaction rollback
...
[ INFO ] Rolling back database schema
[ INFO ] Clearing Engine database engine
[ INFO ] Restoring Engine database engine
^^^^^^^^^ old DB was restored into "new" PG 9.5
[ INFO ] Restoring file '/var/lib/ovirt-engine/backups/engine-20170915163010.JtU3v3.dump' to database localhost:engine.
[ INFO ] Stage: Clean up
Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20170915161331-a2uggc.log
[ INFO ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20170915163303-setup.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Execution of setup failed

# systemctl list-unit-files | grep postgres
postgresql.service disabled
^^^^^^^^ !!
rh-postgresql95-postgresql.service enabled
rh-postgresql95-postgresql@.service disabled

Version-Release number of selected component (if applicable):
ovirt-engine-setup-4.2.0-0.0.master.20170913112412.git2eb3c0a.el7.centos.noarch

How reproducible:
100%

Steps to Reproduce:
1. have 4.0, add 4.2 repo and yum update ovirt\*setup\*
2. engine-setup
3. when you see in the output that ovirt-engine-dbscripts gets updated, run chmod 000 /usr/share/ovirt-engine/dbscripts/schema.sh to simulate failure

Actual results:
the rollback does not roll back/reconfigure the previous PG version used by the original version of the engine

Expected results:
it should roll back fully to the previous DB version

Additional info:
or at least create documentation for this, thx
We will only support a stepped upgrade. The direct-to-version upgrade tool will also go through these steps automatically.
I don't understand how a spec file diff can solve the problem of an incorrect rollback.

rhevm-4.1.8.2-0.1.el7.noarch -> rhvm-4.2.0.2-0.1.el7.noarch

Anyway, the problem still persists. If anything goes wrong during the DB upgrade:

[ INFO ] Creating/refreshing Engine database schema
[ ERROR ] Failed to execute stage 'Misc configuration': [Errno 13] Permission denied

the rollback does not do its work completely:

1. missing old dbs

# ls -l /var/lib/pgsql/data
ls: cannot access /var/lib/pgsql/data: No such file or directory

2. bad pg service

# systemctl list-unit-files | grep postgres
postgresql.service disabled
rh-postgresql95-postgresql.service enabled
rh-postgresql95-postgresql@.service disabled

The rollback should result in the following environment:

1. old dbs should be in their original place
2. rh-pg95 should be disabled
3. postgresql should be enabled
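For anyone re-verifying this, the service part of the expected environment can be checked mechanically. The following is only a rough sketch: it parses text in the shape of the `systemctl list-unit-files | grep postgres` output quoted above and compares it with the expected states. The function names and the EXPECTED_AFTER_ROLLBACK table are made up for illustration and are not part of engine-setup.

```python
# Sketch only: helper names are hypothetical, not engine-setup code.
# Expected unit states after a complete rollback, per the comment above.
EXPECTED_AFTER_ROLLBACK = {
    "postgresql.service": "enabled",
    "rh-postgresql95-postgresql.service": "disabled",
}


def parse_unit_states(listing):
    """Map unit name -> state from `systemctl list-unit-files`-style text."""
    states = {}
    for line in listing.splitlines():
        parts = line.split()
        # Keep only "<unit>.service <state>" rows; skip headers and blanks.
        if len(parts) >= 2 and parts[0].endswith(".service"):
            states[parts[0]] = parts[1]
    return states


def rollback_problems(listing, expected=EXPECTED_AFTER_ROLLBACK):
    """Return a list of human-readable mismatches (empty if all is fine)."""
    states = parse_unit_states(listing)
    problems = []
    for unit, wanted in sorted(expected.items()):
        actual = states.get(unit, "missing")
        if actual != wanted:
            problems.append("%s is %s, expected %s" % (unit, actual, wanted))
    return problems


if __name__ == "__main__":
    observed = (
        "postgresql.service disabled\n"
        "rh-postgresql95-postgresql.service enabled\n"
        "rh-postgresql95-postgresql@.service disabled\n"
    )
    for problem in rollback_problems(observed):
        print(problem)
```

Fed with the listing from this comment, it reports both units as being in the wrong state. It does not cover the data directory check.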
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
(In reply to Jiri Belka from comment #3)
> rhevm-4.1.8.2-0.1.el7.noarch -> rhvm-4.2.0.2-0.1.el7.noarch

Changed summary accordingly.
Has anyone looked at why it was reopened?
Adding some random notes about this bug. Most important, notes about the flows we want to handle (and verify):

1. List of relevant events during setup:
1.1. Optional backup of existing databases
1.2. Upgrade databases to postgresql 9.5
1.3. Updating database schema and other changes

2. We should handle/verify each flow, with:
2.1. Success
2.2. Failure before 1.1.
2.3. Failure between 1.1 and 1.2.
2.4. Failure between 1.2 and 1.3
2.5. Failure after 1.3

3. We should handle/verify each of above with the various possible combinations of answers to the relevant questions we ask:
3.1. Upgrade in-place or by copying data files?
3.2. Backup databases?
- We currently (also in previous versions) ask this about DWH
- About engine we don't, but might decide to ask
- It makes sense to not back up if doing the upgrade by copying
3.3. automatically clean up the old data directory on success?
- Should not be relevant, but make sure. Definitely relevant with the current code, which commits the transaction immediately on upgrade success, before step 1.3.

Not sure about "Has anyone looked at why it was reopened?". Current code only rolls back pg if pg upgrade failed, as it runs in its own transaction. The code in the current pending patch moves this to the main transaction, but suffers some other problems, so not ready yet.
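To get a feel for how many cases points 2 and 3 above imply, here is a small sketch that enumerates the combinations. The labels and option names are purely illustrative (they are not answer-file keys or engine-setup identifiers), and it assumes all three questions are independent yes/no or two-way choices.

```python
from itertools import product

# Outcomes 2.1-2.5 from the notes above; the labels are illustrative.
FAILURE_POINTS = [
    "success",
    "fail before backup (1.1)",
    "fail between backup (1.1) and pg upgrade (1.2)",
    "fail between pg upgrade (1.2) and schema update (1.3)",
    "fail after schema update (1.3)",
]

# Questions 3.1-3.3; the names are hypothetical, not real answer-file keys.
ANSWERS = {
    "upgrade_method": ["in-place", "copy"],
    "backup_databases": [True, False],
    "cleanup_old_data_dir": [True, False],
}


def build_matrix():
    """Return one dict per (failure point, answer combination) pair."""
    keys = sorted(ANSWERS)
    cases = []
    for point in FAILURE_POINTS:
        for values in product(*(ANSWERS[key] for key in keys)):
            case = dict(zip(keys, values))
            case["failure_point"] = point
            cases.append(case)
    return cases


if __name__ == "__main__":
    for case in build_matrix():
        print(case)
```

Under those assumptions this gives 5 x 2 x 2 x 2 = 40 cases, some of which may turn out not to be meaningful (e.g. cleanup of the old data directory when the upgrade itself failed).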
Some more notes:

1. We should make sure we dump DBs with the version they use (9.2) and not the version we upgrade them to (9.5)

2. On rollback, if we did not upgrade in-place, we should stop/disable 9.5 service and start/enable 9.2 service

3. On rollback, if we upgraded in-place, it might be risky/complex to try to restore the backup to the old database (see next comment). So:
3.1. Should probably leave the engine using the 9.5 db
3.2. Should point the user somewhere that explains the situation
3.3. Should start/enable 9.5 service
3.4. Open another bug about how to handle this on the next attempt (upgrading an oVirt 4.1 setup that already uses 9.5 pg).
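Points 2 and 3 above amount to a simple decision on the upgrade mode. A minimal sketch of that decision follows, assuming the service names seen earlier in this bug (postgresql for 9.2, rh-postgresql95-postgresql for 9.5). The rollback_plan function and the "notify" step are hypothetical; the sketch only builds a list of steps and does not run anything.

```python
# Service names as they appear in this bug report.
OLD_SERVICE = "postgresql"                  # PostgreSQL 9.2
NEW_SERVICE = "rh-postgresql95-postgresql"  # PostgreSQL 9.5 from the SCL


def rollback_plan(in_place):
    """Return the ordered rollback steps proposed in the notes above.

    Hypothetical helper: each step is a tuple, either a systemctl
    invocation or a ("notify", message) placeholder for telling the user.
    """
    if in_place:
        # The 9.2 data files were converted, so stay on 9.5 (points 3.1-3.3).
        return [
            ("notify", "engine keeps using the 9.5 db, see documentation"),
            ("systemctl", "enable", NEW_SERVICE),
            ("systemctl", "start", NEW_SERVICE),
        ]
    # The 9.2 data directory is untouched, so switch services back (point 2).
    return [
        ("systemctl", "stop", NEW_SERVICE),
        ("systemctl", "disable", NEW_SERVICE),
        ("systemctl", "enable", OLD_SERVICE),
        ("systemctl", "start", OLD_SERVICE),
    ]


if __name__ == "__main__":
    for step in rollback_plan(in_place=False):
        print(" ".join(step))
```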
It's probably risky to try to restore the backup to a 9.2 pg that went through an in-place upgrade, because the data files are not 9.2-compatible anymore. If we do want to do this:

1. Test if there were any other databases in the pg cluster (that we didn't back up). If there aren't any we might continue.
2. Stop/disable all pg services (both old and new)
3. Remove all data directories
4. initdb
5. restore
See also bug 1498351 about the state of an upgrade that moved the db to 9.5 and failed later, thus leaving 4.1 engine.
(In reply to Yedidyah Bar David from comment #8)
> Adding some random notes about this bug. Most important, notes about the
> flows we want to handle (and verify):
>
> 1. List of relevant events during setup:
>
> 1.1. Optional backup of existing databases
>
> 1.2. Upgrade databases to postgresql 9.5

1.1 (backup of existing databases) currently happens after point 1.2, since the 4.2 engine-setup uses all the pg tools from the SCL, and these require a DB that is already at 9.5.

If the user performs a 9.2 -> 9.5 upgrade that is not in-place, this is not that relevant either, since the 9.2 DB is still there, untouched, just in another folder. We just have to re-enable the 9.2 pg service and everything will be fine.

If the user decided to perform an in-place upgrade instead, we cannot easily roll back. A file-system-level backup could be significantly faster (it doesn't have to reconstruct all the indexes, as a restore at the SQL level has to do), although it will require more space (the same as a non-in-place upgrade, although possibly on an external mount point), and the DBMS should be down to be sure that the copy is consistent.
How about deciding that in-place upgrade does not allow rollback at all? So that if you use in-place upgrade, it will be faster and use less space, but engine-setup will not take backups, nor try to roll back if it fails.

Users that only want things to run as quickly as possible (e.g. for testing) can use in-place. Users that want backups (and rollback) should either use upgrade-by-copying (not in-place) or take care of backups themselves.

This will simplify things a lot. Pushed another patch that does this, didn't verify yet.

Yaniv - makes sense?
Talked with Yaniv about this. Summary:

1. We should remove the option of upgrading postgresql in-place.
2. OK to not back up (pg_dump) the databases on upgrade, and to roll back to the previous pg version.
ok, ovirt-engine-setup-base-4.2.2.1-0.1.el7.noarch

after engine-setup fail because of some strange issue with rpm deps everything else was rolled back successfully and works ok.

...
2018-02-23 14:27:38,622+0100 ERROR otopi.plugins.otopi.packagers.yumpackager yumpackager.error:85 Yum [u'rhevm-4.1.10.1-0.1.el7.noarch requires rhevm-doc >= 4.0', u'ovirt-engine-4.2.2.1-0.1.el7.noarch requires rhvm = 4.2.2.1-0.1.el7', u'rhevm-4.1.10.1-0.1.el7.noarch requires redhat-support-plugin-rhev >= 4.0', u'rhevm-4.1.10.1-0.1.el7.noarch requires ovirt-engine = 4.1.10.1-0.1.el7']
2018-02-23 14:27:38,623+0100 DEBUG otopi.context context._executeMethod:143 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 133, in _executeMethod
    method['method']()
  File "/usr/share/otopi/plugins/otopi/packagers/yumpackager.py", line 248, in _packages
    self.processTransaction()
  File "/usr/share/otopi/plugins/otopi/packagers/yumpackager.py", line 262, in processTransaction
    if self._miniyum.buildTransaction():
  File "/usr/lib/python2.7/site-packages/otopi/miniyum.py", line 920, in buildTransaction
    raise yum.Errors.YumBaseError(msg)
YumBaseError: [u'rhevm-4.1.10.1-0.1.el7.noarch requires rhevm-doc >= 4.0', u'ovirt-engine-4.2.2.1-0.1.el7.noarch requires rhvm = 4.2.2.1-0.1.el7', u'rhevm-4.1.10.1-0.1.el7.noarch requires redhat-support-plugin-rhev >= 4.0', u'rhevm-4.1.10.1-0.1.el7.noarch requires ovirt-engine = 4.1.10.1-0.1.el7']
2018-02-23 14:27:38,625+0100 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Package installation': [u'rhevm-4.1.10.1-0.1.el7.noarch requires rhevm-doc >= 4.0', u'ovirt-engine-4.2.2.1-0.1.el7.noarch requires rhvm = 4.2.2.1-0.1.el7', u'rhevm-4.1.10.1-0.1.el7.noarch requires redhat-support-plugin-rhev >= 4.0', u'rhevm-4.1.10.1-0.1.el7.noarch requires ovirt-engine = 4.1.10.1-0.1.el7']
...
2018-02-23 14:27:38,626+0100 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:80 Yum Performing yum transaction rollback
Loaded plugins: product-id, versionlock
...
2018-02-23 14:27:38,665+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'DWH Engine database Transaction'
2018-02-23 14:27:38,665+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'Database Transaction'
2018-02-23 14:27:38,666+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'Version Lock Transaction'
2018-02-23 14:27:38,667+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'DWH database Transaction'
2018-02-23 14:27:38,667+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'Firewalld Transaction'
2018-02-23 14:27:38,668+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'DBMS Upgrade Transaction'
2018-02-23 14:27:38,668+0100 INFO otopi.plugins.ovirt_engine_setup.ovirt_engine.db.dbmsupgrade postgres.abort:808 Rolling back to the previous PostgreSQL instance (postgresql).
2018-02-23 14:27:38,669+0100 DEBUG otopi.plugins.otopi.services.systemd systemd.state:130 stopping service rh-postgresql95-postgresql
2018-02-23 14:27:38,669+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/usr/bin/systemctl', 'stop', 'rh-postgresql95-postgresql.service'), executable='None', cwd='None', env=None
2018-02-23 14:27:39,712+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'stop', 'rh-postgresql95-postgresql.service'), rc=0
2018-02-23 14:27:39,713+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/usr/bin/systemctl', 'stop', 'rh-postgresql95-postgresql.service') stdout:
2018-02-23 14:27:39,714+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'stop', 'rh-postgresql95-postgresql.service') stderr:
2018-02-23 14:27:39,714+0100 DEBUG otopi.plugins.otopi.services.systemd systemd.state:130 starting service postgresql
2018-02-23 14:27:39,715+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/usr/bin/systemctl', 'start', 'postgresql.service'), executable='None', cwd='None', env=None
...

engine=# show data_directory;
   data_directory
---------------------
 /var/lib/pgsql/data
(1 row)

# systemctl is-enabled postgresql
enabled
# systemctl is-active postgresql
active
# systemctl is-active ovirt-engine
inactive
# systemctl start ovirt-engine

4.1 admin portal is up
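For readers wondering why the 'DBMS Upgrade Transaction' abort in the log above matters: earlier in this bug the pg upgrade ran in its own transaction and was committed before the later steps, so a late failure could no longer undo it. The toy model below illustrates that difference. It is not otopi's API; Element, Transaction and run_setup are hypothetical names, and the real abort order and element set may differ.

```python
class Element(object):
    """A unit of work that can be committed or aborted (toy model)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def commit(self):
        self.log.append("commit " + self.name)

    def abort(self):
        self.log.append("abort " + self.name)


class Transaction(object):
    """Commit all elements on success, abort all of them on any exception."""

    def __init__(self):
        self.elements = []

    def append(self, element):
        self.elements.append(element)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        for element in self.elements:
            if exc_type is None:
                element.commit()
            else:
                element.abort()
        return False  # never swallow the original failure


def run_setup(dbms_in_main_transaction):
    """Simulate a setup run that fails after the pg upgrade step."""
    log = []
    try:
        with Transaction() as main:
            if dbms_in_main_transaction:
                main.append(Element("DBMS upgrade", log))
            else:
                # Own transaction: committed as soon as the upgrade is done.
                with Transaction() as own:
                    own.append(Element("DBMS upgrade", log))
            main.append(Element("database schema", log))
            raise RuntimeError("simulated late failure")
    except RuntimeError:
        pass
    return log


if __name__ == "__main__":
    print(run_setup(dbms_in_main_transaction=False))
    print(run_setup(dbms_in_main_transaction=True))
```

With the upgrade in its own transaction, the model commits it and only the schema step is aborted; with the upgrade in the main transaction, both are aborted, which matches the behaviour verified in this comment.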
(In reply to Jiri Belka from comment #15)
> ok, ovirt-engine-setup-base-4.2.2.1-0.1.el7.noarch
>
> after engine-setup fail because of some strange issue with rpm deps
> everything else was rolled back successfully and works ok.

OK. Did you also verify rollback where the failure happened at other points? See also comment 8.

Also, it's not clear if the "strange issue with rpm deps" was deliberate, or another bug. If another bug, please open one. Thanks.
> 1.1. Optional backup of existing databases

^^ there is no backup before the db upgrade (iiuc there is no in-place upgrade anymore)

[ ERROR ] Failed to execute stage 'Misc configuration': [Errno 13] Permission denied
[ INFO ] Yum Performing yum transaction rollback

[ ERROR ] Failed to execute stage 'Misc configuration': Command '/opt/rh/rh-postgresql95/root/usr/bin/postgresql-setup' failed to execute
[ INFO ] Yum Performing yum transaction rollback

ok, all works

> 1.2. Upgrade databases to postgresql 9.5

[ INFO ] Upgrading PostgreSQL
[ ERROR ] Failed to execute stage 'Misc configuration': Command '/opt/rh/rh-postgresql95/root/usr/bin/postgresql-setup' failed to execute

ok, all works

> 1.3. Updating database schema and other changes

rpm failure

ok, all works

> 2. We should handle/verify each flow, with:
>
> 2.1. Success
>
> 2.2. Failure before 1.1.
>
> 2.3. Failure between 1.1 and 1.2.
>
> 2.4. Failure between 1.2 and 1.3
>
> 2.5. Failure after 1.3

that rpm issue

> 3. We should handle/verify each of above with the various possible
> combinations of answers to the relevant questions we ask:
>
> 3.1. Upgrade in-place or by copying data files?
>
> 3.2. Backup databases?
> - We currently (also in previous versions) ask this about DWH

I don't see any backup question when upgrading from 4.1 to 4.2
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.