Bug 1368589 - HE_APPLIANCE_ENGINE_RESTORE_FAIL - extra permissions on dwh db objects issue
Summary: HE_APPLIANCE_ENGINE_RESTORE_FAIL - extra permissions on dwh db objects issue
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 2.0.1.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.0.4
Target Release: 2.0.2.2
Assignee: Yedidyah Bar David
QA Contact: Jiri Belka
URL:
Whiteboard:
Duplicates: 1369797
Depends On: 1369757
Blocks:
Reported: 2016-08-19 20:55 UTC by Jiri Belka
Modified: 2017-05-11 09:31 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-09-26 12:41:58 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.0.z+
ylavi: exception+
mgoldboi: planning_ack+
sbonazzo: devel_ack+
mavital: testing_ack+



Description Jiri Belka 2016-08-19 20:55:49 UTC
Description of problem:

Our brq-setup is a long-running environment. IIRC, the engine was originally deployed before 3.3, with a read-only user set up for reports at that time. It was later migrated to SHE, and now we are migrating to 4.0.

Unfortunately, hosted-engine --upgrade-appliance fails when restoring the dwh db inside the appliance OS...

~~~
     ...
          |- Provisioning PostgreSQL users/databases:
          |- - user 'engine', database 'engine'
          |- - user 'engine_history', database 'ovirt_engine_history'
          |- Restoring:
          |- - Engine database 'engine'
          |-   - Cleaning up temporary tables in engine database 'engine'
          |-   - Resetting DwhCurrentlyRunning in dwh_history_timekeeping in engine database
          |- ------------------------------------------------------------------------------
          |- Please note:
          |- The engine database was backed up at 2016-08-19 14:30:20.000000000 -0400 .
          |- Objects that were added, removed or changed after this date, such as virtual
          |- machines, disks, etc., are missing in the engine, and will probably require
          |- recovery or recreation.
          |- ------------------------------------------------------------------------------
          |- - DWH database 'ovirt_engine_history'
          |- FATAL: Errors while restoring database ovirt_engine_history
          |- HE_APPLIANCE_ENGINE_RESTORE_FAIL
[ ERROR ] Engine backup restore failed on the appliance
[ ERROR ] Failed to execute stage 'Closing up': engine-backup failed restoring the engine backup on the appliance Please check its log on the appliance.
[ INFO  ] Stage: Clean up
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine upgrade failed: this system is not reliable, you can use --rollback-upgrade option to recover the engine VM disk from a backup
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160819203422-skz2jm.log
~~~

In engine-restore.log I see:

~~~
...
2016-08-19 15:20:50 2620: pg_cmd running: pg_restore -w -U engine -h localhost -p 5432 -d engine -j 2 /tmp/engine-backup.cVuqEpikwA/db/engine_backup.db
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 3181; 2612 2618669 PROCEDURAL LANGUAGE plpgsql engine
pg_restore: [archiver (db)] could not execute query: ERROR:  language "plpgsql" already exists
    Command was: CREATE PROCEDURAL LANGUAGE plpgsql;
...
pg_restore: [archiver (db)] could not execute query: ERROR:  role "nasrouser" does not exist
    Command was: REVOKE ALL ON TABLE calendar FROM PUBLIC;
REVOKE ALL ON TABLE calendar FROM engine_history;
GRANT ALL ON TABLE calendar TO e...
pg_restore: [archiver (db)] Error from TOC entry 4135; 0 0 ACL cluster_configuration engine_history
pg_restore: [archiver (db)] could not execute query: ERROR:  role "nasrouser" does not exist
    Command was: REVOKE ALL ON TABLE cluster_configuration FROM PUBLIC;
REVOKE ALL ON TABLE cluster_configuration FROM engine_history;
GRANT ...
...
~~~
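
To see at a glance which roles the restore is complaining about, the log can be scanned for the `role "..." does not exist` error shown above. This is only a minimal sketch: the function name and the sample text are made up for illustration, and the regular expression assumes the error wording matches the excerpt above.

```python
import re

# Matches the PostgreSQL error text seen in the engine-restore.log excerpt.
MISSING_ROLE_RE = re.compile(r'ERROR:\s+role "([^"]+)" does not exist')


def find_missing_roles(log_text):
    """Return the sorted, de-duplicated role names reported as missing."""
    return sorted(set(MISSING_ROLE_RE.findall(log_text)))


if __name__ == "__main__":
    # Sample lines copied from the log excerpt; a real run would read the log file.
    sample = (
        'pg_restore: [archiver (db)] could not execute query: '
        'ERROR:  role "nasrouser" does not exist\n'
        'pg_restore: [archiver (db)] could not execute query: '
        'ERROR:  language "plpgsql" already exists\n'
    )
    print(find_missing_roles(sample))
```

Unrelated errors, such as the `plpgsql` one, are ignored, so the result lists only the roles that would have to exist before a restore with --restore-permissions could succeed.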

IIRC there used to be a bug about 'nasrouser' causing the restore to fail:

https://bugzilla.redhat.com/show_bug.cgi?id=1217402

It seems to be related to --restore-permissions in cloud_init.py:

~~~
# sed -n '841,867p' /usr/share/ovirt-hosted-engine-setup/plugins/gr-he-common/vm/cloud_init.py 
                engine_restore = (
                    ' - engine-backup --mode=restore --file={backup_file}'
                    ' --log=engine_restore.log --restore-permissions'
                    ' --provision-db {p_dwh_db} {p_reports_db}'
                    ' 1>{port}'
                    ' 2>&1\n'
                    ' - if [ $? -eq 0 ];'
                    ' then echo "{success_string}" >{port};'
                    ' else echo "{fail_string}" >{port};'
                    ' fi\n'
                ).format(
                    backup_file=self.environment[
                        ohostedcons.Upgrade.BACKUP_FILE
                    ],
                    p_dwh_db='--provision-dwh-db' if self.environment[
                        ohostedcons.Upgrade.RESTORE_DWH
                    ] else '',
                    p_reports_db='--provision-reports-db' if self.environment[
                        ohostedcons.Upgrade.RESTORE_REPORTS
                    ] else '',
                    port=(
                        ohostedcons.Const.VIRTIO_PORTS_PATH +
                        ohostedcons.Const.OVIRT_HE_CHANNEL_NAME
                    ),
                    success_string=ohostedcons.Const.E_RESTORE_SUCCESS_STRING,
                    fail_string=ohostedcons.Const.E_RESTORE_FAIL_STRING,
                )
~~~

I tried to restore inside the problematic HE VM with --no-restore-permissions and it passed.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.0.1.4-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Give the dwh db extra permissions on db objects (for example, by adding a read-only user as described here):
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.5/html/Administration_Guide/sect-History_Database.html#Allowing_Read_Only_Access_to_the_History_Database

Actual results:
restore inside HE VM via cloud-init fails

Expected results:
should pass or at least inform user somehow

Additional info:

Comment 2 Jiri Belka 2016-08-19 22:44:25 UTC
I used the following workaround, which made it pass:

~~~
# diff -uNp /usr/share/ovirt-hosted-engine-setup/plugins/gr-he-common/vm/cloud_init.py{.orig,}
--- /usr/share/ovirt-hosted-engine-setup/plugins/gr-he-common/vm/cloud_init.py.orig     2016-08-19 23:01:10.404521450 +0200
+++ /usr/share/ovirt-hosted-engine-setup/plugins/gr-he-common/vm/cloud_init.py  2016-08-19 23:01:40.761731067 +0200
@@ -840,7 +840,7 @@ class Plugin(plugin.PluginBase):
             ]:
                 engine_restore = (
                     ' - engine-backup --mode=restore --file={backup_file}'
-                    ' --log=engine_restore.log --restore-permissions'
+                    ' --log=engine_restore.log --no-restore-permissions'
                     ' --provision-db {p_dwh_db} {p_reports_db}'
                     ' 1>{port}'
                     ' 2>&1\n'
~~~
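
The same idea can be sketched as a small helper that makes the permissions flag an explicit choice instead of a hard-coded string. This is not the actual plugin code: the function name, its parameters and the example path are hypothetical, and only the engine-backup options already shown in this bug are used.

```python
def build_restore_command(backup_file, restore_permissions=False,
                          restore_dwh=False, restore_reports=False,
                          log_file="engine_restore.log"):
    """Build the engine-backup restore command as an argument list."""
    args = [
        "engine-backup",
        "--mode=restore",
        "--file={}".format(backup_file),
        "--log={}".format(log_file),
        # Skip restoring extra grants by default, so that roles missing
        # on the target (such as 'nasrouser') do not abort the restore.
        "--restore-permissions" if restore_permissions
        else "--no-restore-permissions",
        "--provision-db",
    ]
    if restore_dwh:
        args.append("--provision-dwh-db")
    if restore_reports:
        args.append("--provision-reports-db")
    return args


if __name__ == "__main__":
    # The backup path is a placeholder, not taken from the bug.
    print(" ".join(build_restore_command("/root/engine-backup.tar.gz",
                                         restore_dwh=True)))
```

Building a list rather than formatting one long string also avoids the stray double spaces that empty `{p_dwh_db}` / `{p_reports_db}` placeholders leave behind.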

Comment 3 Yedidyah Bar David 2016-08-21 07:41:13 UTC
+1 for changing to --no-restore-permissions.

It was mentioned in [1] but somehow slipped later on and was not included in [2].

We should also add a section about this to the docs, saying that the user
has to manually add extra users/grants as needed.

[1] https://gerrit.ovirt.org/#/c/56933/37/src/plugins/ovirt-hosted-engine-common/vm/cloud_init.py
[2] https://gerrit.ovirt.org/57521

Comment 4 Yaniv Lavi 2016-08-21 10:33:21 UTC
This will probably break the CFME integration.
Why is it like this? Can't we restore the role if the DB is self managed?

Comment 5 Yedidyah Bar David 2016-08-21 12:05:58 UTC
(In reply to Yaniv Dary from comment #4)
> This will probably break the CFME integration.

I admit I do not know much about how it works.

If it required manual actions during !=3.3 setup, it will require same manual actions during upgrade.

> Why is it like this? Can't we restore the role if the DB is self managed?

Please see the very long discussion in bug 1220791.

Comment 6 Yaniv Lavi 2016-08-21 12:13:53 UTC
Admins can create users on the history DB, and we need to support this. I understand that we avoided this issue in backup/restore, but for the migration and self-managed DBs we need to recreate the roles so that users can continue to work as before on the new DB.
What are the options for design to resolve this issue?

Comment 7 Yedidyah Bar David 2016-08-21 13:03:35 UTC
(In reply to Yaniv Dary from comment #6)
> Admins can create users on the history DB and we need to support this. I
> understand that in backup/restore, we avoided this issue, but for the
> migration and self managed DBs, we need to recreate the roles to allow users
> to continue to work as before on the new DB. 
> What are the options for design to resolve this issue?

What if engine or dwh (or both) used remote databases?

What if the user used the local postgres installation also for other applications, or for different versions/setups of oVirt? E.g. for development/testing?

The project should not treat the machine it's running on as if it's its owner. If we want to own postgresql, we should say so, see bug 1191995.

What if the admin had some other tools/agents/applications/whatever on this machine? Should we backup/restore that as well? I do not think so. The admin should understand well what we provide and what we don't, and prepare accordingly. We should help the admin by writing good documentation, not by trying to guess around stuff and risk more damage.

It should be quite easy to backup all users of a postgresql setup with 'pg_dumpall -g' (and restore with psql). No problem adding this somewhere in the documentation.
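
A rough sketch of that manual step, wrapped in Python so it can be scripted. The helper names and file path are hypothetical. `pg_dumpall -g` dumps only global objects (roles and tablespaces), and the resulting SQL file can be fed to `psql -f`. It assumes the commands run as a PostgreSQL superuser, and that errors for roles which already exist on the target (such as `engine`) are acceptable.

```python
import subprocess


def dump_globals_command(output_file):
    """Command that dumps roles and tablespaces only, no databases."""
    return ["pg_dumpall", "-g", "-f", output_file]


def restore_globals_command(input_file, dbname="postgres"):
    """Command that replays the dumped globals with psql."""
    return ["psql", "-d", dbname, "-f", input_file]


def run(cmd):
    """Run a command and raise CalledProcessError on a non-zero exit."""
    subprocess.check_call(cmd)


if __name__ == "__main__":
    # Only print the commands here; call run() on the old and new
    # machines respectively to actually execute them.
    print(" ".join(dump_globals_command("/tmp/globals.sql")))
    print(" ".join(restore_globals_command("/tmp/globals.sql")))
```

The dump would be taken on the old engine machine before the upgrade and replayed inside the new appliance before restoring with permissions.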

Should we add an option to engine-backup to do that? Not sure about that. Should it be enabled by default? I do not think so.

Comment 8 Yaniv Lavi 2016-08-21 15:01:10 UTC
(In reply to Yedidyah Bar David from comment #7)
> (In reply to Yaniv Dary from comment #6)
> > Admins can create users on the history DB and we need to support this. I
> > understand that in backup/restore, we avoided this issue, but for the
> > migration and self managed DBs, we need to recreate the roles to allow users
> > to continue to work as before on the new DB. 
> > What are the options for design to resolve this issue?
> 
> What if engine or dwh (or both) used remote databases?

Then the user has a chance to create them in advance.

> 
> What if the user used the local postgres installation also for other
> applications, or for different versions/setups of oVirt? E.g. for
> development/testing?

This is out of scope, especially for HE.

> 
> Should we add an option to engine-backup to do that? Not sure about that.
> Should it be enabled by default? I do not think so.

For appliance migration with a local DB, we must not break anything, and users must be able to pick up where they left off.

Comment 9 Yedidyah Bar David 2016-08-24 12:34:29 UTC
Created bug 1369797 for now, to not fail upgrade.

bug 1369757 is to make engine-backup create the extra users, and if/when we do that, we can revert the patch for bug 1369797 to try and restore them.

Comment 10 Yaniv Lavi 2016-08-24 23:37:48 UTC
How will fixing this for 4.1 resolve the problem? It will be post migration and all the users will be gone.

Comment 11 Yedidyah Bar David 2016-08-25 14:07:36 UTC
No changes are needed in ovirt-hosted-engine-setup once bug 1369757 is fixed; this bug can then move to QE. Keeping it open instead of closing it as a duplicate because bug 1369757 is simpler to verify, so it is better to verify this one only after bug 1369757 is verified.

Comment 12 Yedidyah Bar David 2016-08-25 14:08:08 UTC
*** Bug 1369797 has been marked as a duplicate of this bug. ***

Comment 13 Sandro Bonazzola 2016-08-31 12:10:12 UTC
Setting Test Only for this bug.

Comment 14 Jiri Belka 2016-09-16 14:56:26 UTC
Fails; I suppose it fails because BZ1369757 fails as well.


          |- Please note:
          |- The engine database was backed up at 2016-09-16 10:38:05.000000000 -0400 .
          |- Objects that were added, removed or changed after this date, such as virtual
          |- machines, disks, etc., are missing in the engine, and will probably require
          |- recovery or recreation.
          |- ------------------------------------------------------------------------------
          |- - DWH database 'ovirt_engine_history'
          |- FATAL: Errors while restoring database ovirt_engine_history
          |- HE_APPLIANCE_ENGINE_RESTORE_FAIL
[ ERROR ] Engine backup restore failed on the appliance
[ ERROR ] Failed to execute stage 'Closing up': engine-backup failed restoring the engine backup on the appliance Please check its log on the appliance. 
[ INFO  ] Stage: Clean up
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine upgrade failed: this system is not reliable, you can use --rollback-upgrade option to recover the engine VM disk from a backup
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160916163515-tupliq.log

Comment 15 Red Hat Bugzilla Rules Engine 2016-09-16 14:56:32 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 16 Yedidyah Bar David 2016-09-20 06:12:22 UTC
Bug 1369757 is in MODIFIED, so moving this one too.

Comment 17 Jiri Belka 2016-09-23 12:24:13 UTC
ok, rhevm-appliance-20160922.0-1.el7ev.noarch

as part of hosted-engine --upgrade-appliance...

...
          |- Provisioning PostgreSQL users/databases:
          |- - user 'engine', database 'engine'
          |- - user 'ovirt_engine_history', database 'ovirt_engine_history'
          |- - extra user 'readonly' having grants on database ovirt_engine_history, created with a random password
...
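
Since the extra users come back with random passwords, the admin still has to set usable passwords afterwards. A small sketch for collecting those users from the provisioning output: the function name is hypothetical, and the pattern assumes the message wording stays exactly as in the output above.

```python
import re

# Matches the "extra user" line printed during provisioning (see above).
EXTRA_USER_RE = re.compile(
    r"extra user '([^']+)' having grants on database (\S+), "
    r"created with a random password"
)


def find_extra_users(output):
    """Return (user, database) pairs for extra users that were recreated."""
    return EXTRA_USER_RE.findall(output)


if __name__ == "__main__":
    sample = (
        "          |- - user 'engine', database 'engine'\n"
        "          |- - extra user 'readonly' having grants on database "
        "ovirt_engine_history, created with a random password\n"
    )
    for user, database in find_extra_users(sample):
        print("{} on {} needs a new password".format(user, database))
```

Each user found this way can then be given a known password with a regular PostgreSQL `ALTER ROLE ... PASSWORD` statement.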

