Bug 1369797

Summary: HE_APPLIANCE_ENGINE_RESTORE_FAIL - Passes '--restore-permissions' to engine-backup
Product: [oVirt] ovirt-hosted-engine-setup Reporter: Yedidyah Bar David <didi>
Component: GeneralAssignee: Yedidyah Bar David <didi>
Status: CLOSED DUPLICATE QA Contact: meital avital <mavital>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 2.0.1.4CC: bugs, didi, jbelka, mgoldboi, sbonazzo, ylavi
Target Milestone: ovirt-4.0.4Flags: sbonazzo: ovirt-4.0.z?
sbonazzo: blocker?
rule-engine: planning_ack?
sbonazzo: devel_ack+
rule-engine: testing_ack?
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1368589 Environment:
Last Closed: 2016-08-25 14:08:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Yedidyah Bar David 2016-08-24 12:29:12 UTC
+++ This bug was initially created as a clone of Bug #1368589 +++

Description of problem:

Our brq-setup is long-running env, iirc engine deployment was done before 3.3 with then read-only user in reports. Then it was migrated to SHE and now we are migrating to 4.0.

Unfortunatelly hosted-engine --upgrade-appliance fails when restoring dwh db inside appliance OS...

~~~
     ...
          |- Provisioning PostgreSQL users/databases:
          |- - user 'engine', database 'engine'
          |- - user 'engine_history', database 'ovirt_engine_history'
          |- Restoring:
          |- - Engine database 'engine'
          |-   - Cleaning up temporary tables in engine database 'engine'
          |-   - Resetting DwhCurrentlyRunning in dwh_history_timekeeping in engine database
          |- ------------------------------------------------------------------------------
          |- Please note:
          |- The engine database was backed up at 2016-08-19 14:30:20.000000000 -0400 .
          |- Objects that were added, removed or changed after this date, such as virtual
          |- machines, disks, etc., are missing in the engine, and will probably require
          |- recovery or recreation.
          |- ------------------------------------------------------------------------------
          |- - DWH database 'ovirt_engine_history'
          |- FATAL: Errors while restoring database ovirt_engine_history
          |- HE_APPLIANCE_ENGINE_RESTORE_FAIL
[ ERROR ] Engine backup restore failed on the appliance
[ ERROR ] Failed to execute stage 'Closing up': engine-backup failed restoring the engine backup on the appliance Please check its log on the appliance.
[ INFO  ] Stage: Clean up
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine upgrade failed: this system is not reliable, you can use --rollback-upgrade option to recover the engine VM disk from a backup
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160819203422-skz2jm.log
~~~

In engine-restore.log I see:

~~~
...
2016-08-19 15:20:50 2620: pg_cmd running: pg_restore -w -U engine -h localhost -p 5432 -d engine -j 2 /tmp/engine-backup.cVuqEpikwA/db/engine_backup.db
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 3181; 2612 2618669 PROCEDURAL LANGUAGE plpgsql engine
pg_restore: [archiver (db)] could not execute query: ERROR:  language "plpgsql" already exists
    Command was: CREATE PROCEDURAL LANGUAGE plpgsql;
...
pg_restore: [archiver (db)] could not execute query: ERROR:  role "nasrouser" does not exist
    Command was: REVOKE ALL ON TABLE calendar FROM PUBLIC;
REVOKE ALL ON TABLE calendar FROM engine_history;
GRANT ALL ON TABLE calendar TO e...
pg_restore: [archiver (db)] Error from TOC entry 4135; 0 0 ACL cluster_configuration engine_history
pg_restore: [archiver (db)] could not execute query: ERROR:  role "nasrouser" does not exist
    Command was: REVOKE ALL ON TABLE cluster_configuration FROM PUBLIC;
REVOKE ALL ON TABLE cluster_configuration FROM engine_history;
GRANT ...
...
~~~

IIRC there used to be a bug about 'nasrouser' causing restore fail.

https://bugzilla.redhat.com/show_bug.cgi?id=1217402

It seems it is related to --restore-permissions in cloud-init.py:

~~~
# sed -n '841,867p' /usr/share/ovirt-hosted-engine-setup/plugins/gr-he-common/vm/cloud_init.py 
                engine_restore = (
                    ' - engine-backup --mode=restore --file={backup_file}'
                    ' --log=engine_restore.log --restore-permissions'
                    ' --provision-db {p_dwh_db} {p_reports_db}'
                    ' 1>{port}'
                    ' 2>&1\n'
                    ' - if [ $? -eq 0 ];'
                    ' then echo "{success_string}" >{port};'
                    ' else echo "{fail_string}" >{port};'
                    ' fi\n'
                ).format(
                    backup_file=self.environment[
                        ohostedcons.Upgrade.BACKUP_FILE
                    ],
                    p_dwh_db='--provision-dwh-db' if self.environment[
                        ohostedcons.Upgrade.RESTORE_DWH
                    ] else '',
                    p_reports_db='--provision-reports-db' if self.environment[
                        ohostedcons.Upgrade.RESTORE_REPORTS
                    ] else '',
                    port=(
                        ohostedcons.Const.VIRTIO_PORTS_PATH +
                        ohostedcons.Const.OVIRT_HE_CHANNEL_NAME
                    ),
                    success_string=ohostedcons.Const.E_RESTORE_SUCCESS_STRING,
                    fail_string=ohostedcons.Const.E_RESTORE_FAIL_STRING,
                )
~~~

I tried to restore inside the problematic HE VM with --no-restore-permissions and it passed.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.0.1.4-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. make dwh db to have extra permissions on db objects
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.5/html/Administration_Guide/sect-History_Database.html#Allowing_Read_Only_Access_to_the_History_Database
2.
3.

Actual results:
restore inside HE VM via cloud-init fails

Expected results:
should pass or at least inform user somehow

Additional info:

--- Additional comment from Jiri Belka on 2016-08-20 01:44:25 IDT ---

I used following workaround which made it pass successfully:

~~~
# diff -uNp /usr/share/ovirt-hosted-engine-setup/plugins/gr-he-common/vm/cloud_init.py{.orig,}
--- /usr/share/ovirt-hosted-engine-setup/plugins/gr-he-common/vm/cloud_init.py.orig     2016-08-19 23:01:10.404521450 +0200
+++ /usr/share/ovirt-hosted-engine-setup/plugins/gr-he-common/vm/cloud_init.py  2016-08-19 23:01:40.761731067 +0200
@@ -840,7 +840,7 @@ class Plugin(plugin.PluginBase):
             ]:
                 engine_restore = (
                     ' - engine-backup --mode=restore --file={backup_file}'
-                    ' --log=engine_restore.log --restore-permissions'
+                    ' --log=engine_restore.log --no-restore-permissions'
                     ' --provision-db {p_dwh_db} {p_reports_db}'
                     ' 1>{port}'
                     ' 2>&1\n'
~~~

--- Additional comment from Yedidyah Bar David on 2016-08-21 10:41:13 IDT ---

+1 for changing to --no-restore-permissions.

It was mentioned in [1] but somehow slipped later on and not included in [2].

We should also add a section about this to the docs, saying that the user
has to manually add extra users/grants as needed.

[1] https://gerrit.ovirt.org/#/c/56933/37/src/plugins/ovirt-hosted-engine-common/vm/cloud_init.py
[2] https://gerrit.ovirt.org/57521

--- Additional comment from Yaniv Dary on 2016-08-21 13:33:21 IDT ---

This will probably break the CFME integration.
Why is it like this? Can't we restore the role if the DB is self managed?

--- Additional comment from Yedidyah Bar David on 2016-08-21 15:05:58 IDT ---

(In reply to Yaniv Dary from comment #4)
> This will probably break the CFME integration.

I admit I do not know much about how it works.

If it required manual actions during !=3.3 setup, it will require same manual actions during upgrade.

> Why is it like this? Can't we restore the role if the DB is self managed?

Please see the very long discussion in bug 1220791.

--- Additional comment from Yaniv Dary on 2016-08-21 15:13:53 IDT ---

Admins can creates users on the history DB and we need to support this. I understand that in backup\restore, we avoided this issue, but for the migration and self managed DBs, we need to recreate the roles to allow users to continue to work as before on the new DB. 
What are the options for design to resolve this issue?

--- Additional comment from Yedidyah Bar David on 2016-08-21 16:03:35 IDT ---

(In reply to Yaniv Dary from comment #6)
> Admins can creates users on the history DB and we need to support this. I
> understand that in backup\restore, we avoided this issue, but for the
> migration and self managed DBs, we need to recreate the roles to allow users
> to continue to work as before on the new DB. 
> What are the options for design to resolve this issue?

What if engine or dwh (or both) used remote databases?

What if the user used the local postgres installation also for other applications, or for different versions/setups of oVirt? E.g. for development/testing?

The project should not treat the machine it's running on as if it's its owner. If we want to own postgresql, we should say so, see bug 1191995.

What if the admin had some other tools/agents/applications/whatever on this machine? Should we backup/restore that as well? I do not think so. The admin should understand well what we provide and what we don't, and prepare accordingly. We should help the admin by writing good documentation, not by trying to guess around stuff and risk more damage.

It should be quite easy to backup all users of a postgresql setup with 'pg_dumpall -g' (and restore with psql). No problem adding this somewhere in the documentation.

Should we add an option to engine-backup to do that? Not sure about that. Should it be enabled by default? I do not think so.

--- Additional comment from Yaniv Dary on 2016-08-21 18:01:10 IDT ---

(In reply to Yedidyah Bar David from comment #7)
> (In reply to Yaniv Dary from comment #6)
> > Admins can creates users on the history DB and we need to support this. I
> > understand that in backup\restore, we avoided this issue, but for the
> > migration and self managed DBs, we need to recreate the roles to allow users
> > to continue to work as before on the new DB. 
> > What are the options for design to resolve this issue?
> 
> What if engine or dwh (or both) used remote databases?

Then the user has a chance to create them in advance.

> 
> What if the user used the local postgres installation also for other
> applications, or for different versions/setups of oVirt? E.g. for
> development/testing?

This is out scope especially for HE.

> 
> Should we add an option to engine-backup to do that? Not sure about that.
> Should it be enabled by default? I do not think so.

For appliance migration and local DB we need to not break and allow users to get to the same place he left off.

Comment 1 Red Hat Bugzilla Rules Engine 2016-08-24 12:29:19 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 2 Yedidyah Bar David 2016-08-24 12:35:11 UTC
For now, change to '--no-restore-permissions'.

Comment 3 Red Hat Bugzilla Rules Engine 2016-08-24 12:36:47 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 4 Yedidyah Bar David 2016-08-25 14:08:08 UTC
Not needed anymore, bug 1369757 is already on POST.

*** This bug has been marked as a duplicate of bug 1368589 ***