Cloning to dwh. See original bug below.

Going to change engine-setup also on dwh to not back up at all if we need to upgrade the database. If we upgrade in-place, it's complex and risky to back up and restore correctly. If we upgrade by copying, which is the default, we can roll back by using the old db, so no backup is needed either.

+++ This bug was initially created as a clone of Bug #1492138 +++

Description of problem:

For BZ1366900 I was testing a failed upgrade between non-consecutive minor versions of engine, i.e. 4.0 -> 4.2. Rollback was almost ok except original DB service is not configured. engine seems to work with PG 9.5 but I'm not sure this is what we want to keep after rollback.

# engine-setup
...
[WARNING] This release requires PostgreSQL server 9.5.7 but the engine database is currently hosted on PostgreSQL server 9.2.23
^^^^^^
This tool can automatically upgrade PostgreSQL.
Automatically upgrade? (Yes, No) [Yes]:
^^^ - automatically do everything
...
[ INFO ] Upgrading PostgreSQL
[ INFO ] PostgreSQL has been successfully upgraded, starting the new instance (rh-postgresql95-postgresql).
[ INFO ] Cleaning the previous PostgreSQL data directory
[ INFO ] Updating PostgreSQL configuration
...
[ INFO ] Backing up database localhost:engine to '/var/lib/ovirt-engine/backups/engine-20170915163010.JtU3v3.dump'.
[ INFO ] Creating/refreshing Engine database schema
[ ERROR ] Failed to execute stage 'Misc configuration': [Errno 13] Permission denied
^^^^^ a simulation of failure
[ INFO ] Yum Performing yum transaction rollback
...
[ INFO ] Rolling back database schema
[ INFO ] Clearing Engine database engine
[ INFO ] Restoring Engine database engine
^^^^^^^^^ old DB was restored into "new" PG 9.5
[ INFO ] Restoring file '/var/lib/ovirt-engine/backups/engine-20170915163010.JtU3v3.dump' to database localhost:engine.
[ INFO ] Stage: Clean up
Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20170915161331-a2uggc.log
[ INFO ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20170915163303-setup.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Execution of setup failed

# systemctl list-unit-files | grep postgres
postgresql.service                     disabled
                                       ^^^^^^^^ !!
rh-postgresql95-postgresql.service     enabled
rh-postgresql95-postgresql@.service    disabled

Version-Release number of selected component (if applicable):
ovirt-engine-setup-4.2.0-0.0.master.20170913112412.git2eb3c0a.el7.centos.noarch

How reproducible:
100%

Steps to Reproduce:
1. have 4.0, add 4.2 repo and yum update ovirt\*setup\*
2. engine-setup
3. when you see in the output that ovirt-engine-dbscripts gets updated, run chmod 000 /usr/share/ovirt-engine/dbscripts/schema.sh to simulate failure

Actual results:
rollback does not roll back/reconfigure the previous PG version used by the original version of the engine

Expected results:
it should roll back fully to the previous DB version

Additional info:
or at least create documentation for this, thx

--- Additional comment from Yaniv Lavi on 2017-10-18 11:25:27 IDT ---

We will only support a stepped upgrade. Direct to version upgrade tool will also do automatic steps.

--- Additional comment from Jiri Belka on 2017-12-14 18:10:48 IST ---

I don't understand how a spec file diff can solve the problem of incorrect rollback.

rhevm-4.1.8.2-0.1.el7.noarch -> rhvm-4.2.0.2-0.1.el7.noarch

Anyway, the problem still persists - if anything goes wrong during DB upgrade

[ INFO ] Creating/refreshing Engine database schema
[ ERROR ] Failed to execute stage 'Misc configuration': [Errno 13] Permission denied

the rollback did not do its work completely:

1. missing old dbs

# ls -l /var/lib/pgsql/data
ls: cannot access /var/lib/pgsql/data: No such file or directory

2. bad pg service

# systemctl list-unit-files | grep postgres
postgresql.service                     disabled
rh-postgresql95-postgresql.service     enabled
rh-postgresql95-postgresql@.service    disabled

The rollback should result in the following environment:

1. old dbs should be in original place
2. rh-pg95 should be disabled
3. postgresql should be enabled

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-12-14 18:10:52 IST ---

Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

--- Additional comment from Sandro Bonazzola on 2017-12-18 10:29:45 IST ---

(In reply to Jiri Belka from comment #3)
> rhevm-4.1.8.2-0.1.el7.noarch -> rhvm-4.2.0.2-0.1.el7.noarch

Changed summary accordingly.

--- Additional comment from Yaniv Kaul on 2018-01-16 13:31:36 IST ---

Has anyone looked at why it was reopened?

--- Additional comment from Yedidyah Bar David on 2018-02-08 17:35:02 IST ---

Adding some random notes about this bug. Most important, notes about the flows we want to handle (and verify):

1. List of relevant events during setup:

1.1. Optional backup of existing databases
1.2. Upgrade databases to postgresql 9.5
1.3. Updating database schema and other changes

2. We should handle/verify each flow, with:

2.1. Success
2.2. Failure before 1.1.
2.3. Failure between 1.1 and 1.2.
2.4. Failure between 1.2 and 1.3
2.5. Failure after 1.3

3. We should handle/verify each of above with the various possible combinations of answers to the relevant questions we ask:

3.1. Upgrade in-place or by copying data files?
3.2. Backup databases?
- We currently (also in previous versions) ask this about DWH
- About engine we don't, but might decide to ask
- It makes sense to not back up if doing the upgrade by copying
3.3. automatically clean up the old data directory on success?
- Should not be relevant, but make sure.
  Definitely relevant with the current code, which commits the transaction immediately on upgrade success, before step 1.3.

Not sure about "Has anyone looked at why it was reopened?". Current code only rolls back pg if pg upgrade failed, as it runs in its own transaction. The code in the current pending patch moves this to the main transaction, but suffers some other problems, so not ready yet.

--- Additional comment from Yedidyah Bar David on 2018-02-12 10:06:39 IST ---

Some more notes:

1. We should make sure we dump DBs with the version they use (9.2) and not the version we upgrade them to (9.5)

2. On rollback, if we did not upgrade in-place, we should stop/disable 9.5 service and start/enable 9.2 service

3. On rollback, if we upgraded in-place, it might be risky/complex to try to restore the backup to the old database (see next comment). So:

3.1. Should probably leave the engine using the 9.5 db
3.2. Should point the user somewhere that explains the situation
3.3. Should start/enable 9.5 service
3.4. Open another bug about how to handle this on the next attempt (upgrading a oVirt 4.1 setup that already uses 9.5 pg).

--- Additional comment from Yedidyah Bar David on 2018-02-12 10:10:07 IST ---

It's probably risky to try to restore the backup to a 9.2 pg that went through an in-place upgrade, because the data files are not 9.2-compatible anymore. If we do want to do this:

1. Test if there were any other databases in the pg cluster (that we didn't backup). If there aren't any we might continue.
2. Stop/disable all pg services (both old and new)
3. Remove all data directories
4. initdb
5. restore

--- Additional comment from Yedidyah Bar David on 2018-02-12 10:27:21 IST ---

See also bug 1498351 about the state of an upgrade that moved the db to 9.5 and failed later, thus leaving 4.1 engine.

--- Additional comment from Simone Tiraboschi on 2018-02-12 11:15:56 IST ---

(In reply to Yedidyah Bar David from comment #8)
> Adding some random notes about this bug.
> Most important, notes about the flows we want to handle (and verify):
>
> 1. List of relevant events during setup:
>
> 1.1. Optional backup of existing databases
>
> 1.2. Upgrade databases to postgresql 9.5

1.1 (backup of existing databases) currently happens after point 1.2, since the 4.2 engine-setup uses all the pg tools from scl and so they require a DB already at 9.5.

If the user performs a 9.2 -> 9.5 upgrade not in place, the backup is not that relevant either, since the 9.2 DB is still there untouched, simply in another folder. We just have to re-enable the 9.2 pg service and everything will be fine.

If the user decided to perform an in-place upgrade instead, we cannot easily roll back. A file system level backup could be significantly faster (it doesn't have to reconstruct all the indexes as a restore at sql level has to do), although it will require more space (the same as doing it not in-place, although possibly on an external mount point), and the DBMS should be down to be sure that the copy is consistent.

--- Additional comment from Yedidyah Bar David on 2018-02-15 17:36:22 IST ---

How about deciding that in-place upgrade does not allow rollback at all? So that if you use in-place upgrade, it will be faster and use less space, but engine-setup will not take backups, nor try to roll back if it fails.

Users that only want things to run as quickly as possible (e.g. for testing) can use in-place. Users that want backups (and rollback) should either use upgrade-by-copying (not in-place) or take care of backups themselves.

This will simplify things a lot.

Pushed another patch that does this, didn't verify yet.

Yaniv - makes sense?
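For readers following the thread, the proposal above can be written out as a small decision table. This is only a sketch and not the engine-setup code: `UpgradeMode`, `UpgradePolicy` and `policy_for` are hypothetical names, and reading the comment as "backups are still taken when upgrading by copying" is an assumption (the DWH clone description at the top of this bug skips the backup in that mode as well). The rollback steps for the copy case are the ones suggested earlier in the thread.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class UpgradeMode(Enum):
    """How the PostgreSQL data is upgraded (hypothetical names)."""
    IN_PLACE = "in-place"
    COPY = "copy"


@dataclass(frozen=True)
class UpgradePolicy:
    take_backup: bool
    rollback_supported: bool
    rollback_steps: Tuple[str, ...]


def policy_for(mode: UpgradeMode) -> UpgradePolicy:
    """Return the proposed backup/rollback policy for an upgrade mode."""
    if mode is UpgradeMode.IN_PLACE:
        # Faster and uses less space, but no backup and no rollback attempt.
        return UpgradePolicy(
            take_backup=False,
            rollback_supported=False,
            rollback_steps=(),
        )
    # Upgrade by copying: the old 9.2 data directory is left untouched,
    # so rollback means switching services back.
    # take_backup=True is an assumption about how to read the comment.
    return UpgradePolicy(
        take_backup=True,
        rollback_supported=True,
        rollback_steps=(
            "stop and disable the 9.5 service",
            "start and enable the 9.2 service",
        ),
    )


if __name__ == "__main__":
    for mode in UpgradeMode:
        print(mode.value, policy_for(mode))
```

Keeping the policy in one function makes it easy to see that the in-place branch never promises a rollback, which is the simplification the comment is after.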
ok, ovirt-engine-setup-base-4.2.2.1-0.1.el7.noarch

After engine-setup failed because of some strange issue with rpm deps, everything else was rolled back successfully and works ok.

...
2018-02-23 14:27:38,622+0100 ERROR otopi.plugins.otopi.packagers.yumpackager yumpackager.error:85 Yum [u'rhevm-4.1.10.1-0.1.el7.noarch requires rhevm-doc >= 4.0', u'ovirt-engine-4.2.2.1-0.1.el7.noarch requires rhvm = 4.2.2.1-0.1.el7', u'rhevm-4.1.10.1-0.1.el7.noarch requires redhat-support-plugin-rhev >= 4.0', u'rhevm-4.1.10.1-0.1.el7.noarch requires ovirt-engine = 4.1.10.1-0.1.el7']
2018-02-23 14:27:38,623+0100 DEBUG otopi.context context._executeMethod:143 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 133, in _executeMethod
    method['method']()
  File "/usr/share/otopi/plugins/otopi/packagers/yumpackager.py", line 248, in _packages
    self.processTransaction()
  File "/usr/share/otopi/plugins/otopi/packagers/yumpackager.py", line 262, in processTransaction
    if self._miniyum.buildTransaction():
  File "/usr/lib/python2.7/site-packages/otopi/miniyum.py", line 920, in buildTransaction
    raise yum.Errors.YumBaseError(msg)
YumBaseError: [u'rhevm-4.1.10.1-0.1.el7.noarch requires rhevm-doc >= 4.0', u'ovirt-engine-4.2.2.1-0.1.el7.noarch requires rhvm = 4.2.2.1-0.1.el7', u'rhevm-4.1.10.1-0.1.el7.noarch requires redhat-support-plugin-rhev >= 4.0', u'rhevm-4.1.10.1-0.1.el7.noarch requires ovirt-engine = 4.1.10.1-0.1.el7']
2018-02-23 14:27:38,625+0100 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Package installation': [u'rhevm-4.1.10.1-0.1.el7.noarch requires rhevm-doc >= 4.0', u'ovirt-engine-4.2.2.1-0.1.el7.noarch requires rhvm = 4.2.2.1-0.1.el7', u'rhevm-4.1.10.1-0.1.el7.noarch requires redhat-support-plugin-rhev >= 4.0', u'rhevm-4.1.10.1-0.1.el7.noarch requires ovirt-engine = 4.1.10.1-0.1.el7']
...
2018-02-23 14:27:38,626+0100 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:80 Yum Performing yum transaction rollback
Loaded plugins: product-id, versionlock
...
2018-02-23 14:27:38,665+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'DWH Engine database Transaction'
2018-02-23 14:27:38,665+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'Database Transaction'
2018-02-23 14:27:38,666+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'Version Lock Transaction'
2018-02-23 14:27:38,667+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'DWH database Transaction'
2018-02-23 14:27:38,667+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'Firewalld Transaction'
2018-02-23 14:27:38,668+0100 DEBUG otopi.transaction transaction.abort:119 aborting 'DBMS Upgrade Transaction'
2018-02-23 14:27:38,668+0100 INFO otopi.plugins.ovirt_engine_setup.ovirt_engine.db.dbmsupgrade postgres.abort:808 Rolling back to the previous PostgreSQL instance (postgresql).
2018-02-23 14:27:38,669+0100 DEBUG otopi.plugins.otopi.services.systemd systemd.state:130 stopping service rh-postgresql95-postgresql
2018-02-23 14:27:38,669+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/usr/bin/systemctl', 'stop', 'rh-postgresql95-postgresql.service'), executable='None', cwd='None', env=None
2018-02-23 14:27:39,712+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'stop', 'rh-postgresql95-postgresql.service'), rc=0
2018-02-23 14:27:39,713+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/usr/bin/systemctl', 'stop', 'rh-postgresql95-postgresql.service') stdout:
2018-02-23 14:27:39,714+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'stop', 'rh-postgresql95-postgresql.service') stderr:
2018-02-23 14:27:39,714+0100 DEBUG otopi.plugins.otopi.services.systemd systemd.state:130 starting service postgresql
2018-02-23 14:27:39,715+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/usr/bin/systemctl', 'start', 'postgresql.service'), executable='None', cwd='None', env=None
...

engine=# show data_directory;
   data_directory
---------------------
 /var/lib/pgsql/data
(1 row)

# systemctl is-enabled postgresql
enabled
# systemctl is-active postgresql
active
# systemctl is-active ovirt-engine-dwhd
inactive
# systemctl start ovirt-engine-dwhd
# systemctl is-active ovirt-engine-dwhd
active
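
The service-state part of this check could be scripted when re-verifying. A minimal sketch, assuming the two-column `systemctl list-unit-files | grep postgres` output quoted earlier in this bug; `parse_unit_files` and `rollback_looks_complete` are hypothetical helper names, and the expected end state (old service enabled, rh-postgresql95 service not enabled) is taken from the rollback expectations listed earlier in the thread. It does not check the data directory or the dwhd service.

```python
from typing import Dict


def parse_unit_files(output: str) -> Dict[str, str]:
    """Map unit name -> state from `systemctl list-unit-files` style lines."""
    states = {}
    for line in output.splitlines():
        parts = line.split()
        # Keep only "<name>.service <state>" rows; skip headers and blanks.
        if len(parts) >= 2 and parts[0].endswith(".service"):
            states[parts[0]] = parts[1]
    return states


def rollback_looks_complete(
    output: str,
    old_unit: str = "postgresql.service",
    new_unit: str = "rh-postgresql95-postgresql.service",
) -> bool:
    """True if the old pg service is enabled and the new one is not."""
    states = parse_unit_files(output)
    return states.get(old_unit) == "enabled" and states.get(new_unit) != "enabled"


if __name__ == "__main__":
    # State quoted in the original report, after the incomplete rollback.
    reported = (
        "postgresql.service disabled\n"
        "rh-postgresql95-postgresql.service enabled\n"
        "rh-postgresql95-postgresql@.service disabled\n"
    )
    print(rollback_looks_complete(reported))  # prints False
```

On a live machine the input would come from running the `systemctl` command and passing its stdout to `rollback_looks_complete`.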
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.