Description of problem:

Our brq-setup is a long-running env; IIRC the engine deployment was done before 3.3, with a then read-only user in reports. It was then migrated to SHE, and now we are migrating to 4.0. Unfortunately hosted-engine --upgrade-appliance fails when restoring the dwh db inside the appliance OS...

~~~
...
|- Provisioning PostgreSQL users/databases:
|- - user 'engine', database 'engine'
|- - user 'engine_history', database 'ovirt_engine_history'
|- Restoring:
|- - Engine database 'engine'
|- - Cleaning up temporary tables in engine database 'engine'
|- - Resetting DwhCurrentlyRunning in dwh_history_timekeeping in engine database
|- ------------------------------------------------------------------------------
|- Please note:
|- The engine database was backed up at 2016-08-19 14:30:20.000000000 -0400 .
|- Objects that were added, removed or changed after this date, such as virtual
|- machines, disks, etc., are missing in the engine, and will probably require
|- recovery or recreation.
|- ------------------------------------------------------------------------------
|- - DWH database 'ovirt_engine_history'
|- FATAL: Errors while restoring database ovirt_engine_history
|- HE_APPLIANCE_ENGINE_RESTORE_FAIL
[ ERROR ] Engine backup restore failed on the appliance
[ ERROR ] Failed to execute stage 'Closing up': engine-backup failed restoring the engine backup on the appliance Please check its log on the appliance.
[ INFO ] Stage: Clean up
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine upgrade failed: this system is not reliable, you can use --rollback-upgrade option to recover the engine VM disk from a backup
Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160819203422-skz2jm.log
~~~

In engine-restore.log I see:

~~~
...
2016-08-19 15:20:50 2620: pg_cmd running: pg_restore -w -U engine -h localhost -p 5432 -d engine -j 2 /tmp/engine-backup.cVuqEpikwA/db/engine_backup.db
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 3181; 2612 2618669 PROCEDURAL LANGUAGE plpgsql engine
pg_restore: [archiver (db)] could not execute query: ERROR:  language "plpgsql" already exists
    Command was: CREATE PROCEDURAL LANGUAGE plpgsql;
...
pg_restore: [archiver (db)] could not execute query: ERROR:  role "nasrouser" does not exist
    Command was: REVOKE ALL ON TABLE calendar FROM PUBLIC;
REVOKE ALL ON TABLE calendar FROM engine_history;
GRANT ALL ON TABLE calendar TO e...
pg_restore: [archiver (db)] Error from TOC entry 4135; 0 0 ACL cluster_configuration engine_history
pg_restore: [archiver (db)] could not execute query: ERROR:  role "nasrouser" does not exist
    Command was: REVOKE ALL ON TABLE cluster_configuration FROM PUBLIC;
REVOKE ALL ON TABLE cluster_configuration FROM engine_history;
GRANT ...
...
~~~

IIRC there used to be a bug about 'nasrouser' causing restore to fail:
https://bugzilla.redhat.com/show_bug.cgi?id=1217402

It seems to be related to --restore-permissions in cloud_init.py:

~~~
# sed -n '841,867p' /usr/share/ovirt-hosted-engine-setup/plugins/gr-he-common/vm/cloud_init.py
            engine_restore = (
                ' - engine-backup --mode=restore --file={backup_file}'
                ' --log=engine_restore.log --restore-permissions'
                ' --provision-db {p_dwh_db} {p_reports_db}'
                ' 1>{port}'
                ' 2>&1\n'
                ' - if [ $? -eq 0 ];'
                ' then echo "{success_string}" >{port};'
                ' else echo "{fail_string}" >{port};'
                ' fi\n'
            ).format(
                backup_file=self.environment[
                    ohostedcons.Upgrade.BACKUP_FILE
                ],
                p_dwh_db='--provision-dwh-db' if self.environment[
                    ohostedcons.Upgrade.RESTORE_DWH
                ] else '',
                p_reports_db='--provision-reports-db' if self.environment[
                    ohostedcons.Upgrade.RESTORE_REPORTS
                ] else '',
                port=(
                    ohostedcons.Const.VIRTIO_PORTS_PATH +
                    ohostedcons.Const.OVIRT_HE_CHANNEL_NAME
                ),
                success_string=ohostedcons.Const.E_RESTORE_SUCCESS_STRING,
                fail_string=ohostedcons.Const.E_RESTORE_FAIL_STRING,
            )
~~~

I tried to restore inside the problematic HE VM with --no-restore-permissions and it passed.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.0.1.4-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Make the dwh db have extra permissions on db objects:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.5/html/Administration_Guide/sect-History_Database.html#Allowing_Read_Only_Access_to_the_History_Database
2.
3.

Actual results:
Restore inside the HE VM via cloud-init fails.

Expected results:
It should pass, or at least inform the user somehow.

Additional info:
I used the following workaround, which made it pass successfully:

~~~
# diff -uNp /usr/share/ovirt-hosted-engine-setup/plugins/gr-he-common/vm/cloud_init.py{.orig,}
--- /usr/share/ovirt-hosted-engine-setup/plugins/gr-he-common/vm/cloud_init.py.orig	2016-08-19 23:01:10.404521450 +0200
+++ /usr/share/ovirt-hosted-engine-setup/plugins/gr-he-common/vm/cloud_init.py	2016-08-19 23:01:40.761731067 +0200
@@ -840,7 +840,7 @@ class Plugin(plugin.PluginBase):
         ]:
             engine_restore = (
                 ' - engine-backup --mode=restore --file={backup_file}'
-                ' --log=engine_restore.log --restore-permissions'
+                ' --log=engine_restore.log --no-restore-permissions'
                 ' --provision-db {p_dwh_db} {p_reports_db}'
                 ' 1>{port}'
                 ' 2>&1\n'
~~~
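For reference, the manual restore described above can be sketched roughly as follows. This is a hedged sketch only: the backup file path is a placeholder, and the command is printed rather than executed, since it only makes sense inside the HE appliance VM.

```shell
# Sketch of the manual restore run inside the appliance VM; BACKUP_FILE is
# a placeholder path, not taken from the bug report.
BACKUP_FILE=${BACKUP_FILE:-/root/engine-backup.tar.gz}
CMD="engine-backup --mode=restore --file=$BACKUP_FILE \
 --log=engine_restore.log --no-restore-permissions \
 --provision-db --provision-dwh-db"
# Dry-run: print the command instead of executing it.
echo "$CMD"
```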
+1 for changing to --no-restore-permissions. It was mentioned in [1] but somehow slipped later on and was not included in [2].

We should also add a section about this to the docs, saying that the user has to manually add extra users/grants as needed.

[1] https://gerrit.ovirt.org/#/c/56933/37/src/plugins/ovirt-hosted-engine-common/vm/cloud_init.py
[2] https://gerrit.ovirt.org/57521
This will probably break the CFME integration. Why is it like this? Can't we restore the role if the DB is self managed?
(In reply to Yaniv Dary from comment #4)
> This will probably break the CFME integration.

I admit I do not know much about how it works. If it required manual actions during a !=3.3 setup, it will require the same manual actions during the upgrade.

> Why is it like this? Can't we restore the role if the DB is self managed?

Please see the very long discussion in bug 1220791.
Admins can create users on the history DB, and we need to support this. I understand that in backup/restore we avoided this issue, but for the migration and self-managed DBs we need to recreate the roles, to allow users to continue to work as before on the new DB. What are the design options to resolve this issue?
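For context, the kind of extra role involved can be sketched as below. The role name 'nasrouser' is taken from the restore errors in comment 0; the exact grant list is an assumption based on a typical read-only setup, and the SQL is printed rather than run here.

```shell
# Illustrative only: an extra read-only role on the history DB of the kind
# an admin might create per the docs. Grants are assumed, not from the bug.
SQL=$(cat <<'EOF'
CREATE ROLE nasrouser WITH LOGIN ENCRYPTED PASSWORD 'secret';
GRANT CONNECT ON DATABASE ovirt_engine_history TO nasrouser;
GRANT USAGE ON SCHEMA public TO nasrouser;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO nasrouser;
EOF
)
# On a live setup this would be fed to psql, e.g.:
#   echo "$SQL" | psql -U postgres -d ovirt_engine_history
echo "$SQL"
```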
(In reply to Yaniv Dary from comment #6)
> Admins can creates users on the history DB and we need to support this. I
> understand that in backup\restore, we avoided this issue, but for the
> migration and self managed DBs, we need to recreate the roles to allow users
> to continue to work as before on the new DB.
> What are the options for design to resolve this issue?

What if engine or dwh (or both) used remote databases?

What if the user used the local postgres installation also for other applications, or for different versions/setups of oVirt? E.g. for development/testing? The project should not treat the machine it's running on as if it's its owner. If we want to own postgresql, we should say so, see bug 1191995.

What if the admin had some other tools/agents/applications/whatever on this machine? Should we backup/restore that as well? I do not think so. The admin should understand well what we provide and what we don't, and prepare accordingly. We should help the admin by writing good documentation, not by trying to guess around stuff and risk more damage.

It should be quite easy to backup all users of a postgresql setup with 'pg_dumpall -g' (and restore with psql). No problem adding this somewhere in the documentation. Should we add an option to engine-backup to do that? Not sure about that. Should it be enabled by default? I do not think so.
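The 'pg_dumpall -g' approach mentioned above can be sketched as follows. The file path is an assumption, and the commands are printed as a dry run rather than executed, since they would need a live PostgreSQL on both source and target.

```shell
# Sketch: dump global objects (roles, tablespaces) on the source host,
# then replay them on the target before restoring with permissions.
# GLOBALS is an example path, not from the bug report.
GLOBALS=/var/tmp/pg_globals.sql
echo "pg_dumpall -g -U postgres -f $GLOBALS"   # globals only, no table data
echo "psql -U postgres -f $GLOBALS"            # replay the roles on the target
```

Replaying the globals before running engine-backup with --restore-permissions would make the REVOKE/GRANT statements in the dump find the extra roles.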
(In reply to Yedidyah Bar David from comment #7)
> (In reply to Yaniv Dary from comment #6)
> > Admins can creates users on the history DB and we need to support this. I
> > understand that in backup\restore, we avoided this issue, but for the
> > migration and self managed DBs, we need to recreate the roles to allow users
> > to continue to work as before on the new DB.
> > What are the options for design to resolve this issue?
>
> What if engine or dwh (or both) used remote databases?

Then the user has a chance to create them in advance.

> What if the user used the local postgres installation also for other
> applications, or for different versions/setups of oVirt?

This is out of scope, especially for HE.

> Should we add an option to engine-backup to do that? Not sure about that.
> Should it be enabled by default? I do not think so.

For appliance migration and a local DB we need to not break anything, and allow users to get back to the same place they left off.
Created bug 1369797 for now, to not fail the upgrade. Bug 1369757 is to make engine-backup create the extra users; if/when we do that, we can revert the patch for bug 1369797 to try to restore them.
How will fixing this for 4.1 resolve the problem? It will be post-migration, and all the users will already be gone.
No changes are needed in ovirt-hosted-engine-setup; once bug 1369757 is fixed, this can move to QE. Keeping this bug open instead of closing it as a duplicate because the verification of bug 1369757 is simpler, so it is better to verify the current bug only after bug 1369757 is verified.
*** Bug 1369797 has been marked as a duplicate of this bug. ***
Setting Test Only for this bug.
Fails. I suppose it fails because BZ 1369757 fails as well.

|- Please note:
|- The engine database was backed up at 2016-09-16 10:38:05.000000000 -0400 .
|- Objects that were added, removed or changed after this date, such as virtual
|- machines, disks, etc., are missing in the engine, and will probably require
|- recovery or recreation.
|- ------------------------------------------------------------------------------
|- - DWH database 'ovirt_engine_history'
|- FATAL: Errors while restoring database ovirt_engine_history
|- HE_APPLIANCE_ENGINE_RESTORE_FAIL
[ ERROR ] Engine backup restore failed on the appliance
[ ERROR ] Failed to execute stage 'Closing up': engine-backup failed restoring the engine backup on the appliance Please check its log on the appliance.
[ INFO ] Stage: Clean up
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine upgrade failed: this system is not reliable, you can use --rollback-upgrade option to recover the engine VM disk from a backup
Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160916163515-tupliq.log
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Bug 1369757 is on MODIFIED; moving the current bug too.
OK with rhevm-appliance-20160922.0-1.el7ev.noarch as part of hosted-engine --upgrade-appliance:

...
|- Provisioning PostgreSQL users/databases:
|- - user 'engine', database 'engine'
|- - user 'ovirt_engine_history', database 'ovirt_engine_history'
|- - extra user 'readonly' having grants on database ovirt_engine_history, created with a random password
...
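A possible follow-up for admins, since the log above says the extra user is recreated with a random password: resetting it with standard PostgreSQL ALTER ROLE. The role name 'readonly' is taken from the log; the password is a placeholder, and the command is printed rather than executed.

```shell
# Hypothetical post-upgrade step: reset the recreated user's random
# password. Dry-run: the command is echoed, not run against a live DB.
RESET_CMD="psql -U postgres -c \"ALTER ROLE readonly WITH PASSWORD 'CHANGEME';\""
echo "$RESET_CMD"
```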