Bug 1066654

Summary: 'engine-backup --mode=restore' fails after engine-cleanup on postgres 8
Product: [Retired] oVirt Reporter: Trey Dockendorf <treydock>
Component: ovirt-engine-coreAssignee: Yedidyah Bar David <didi>
Status: CLOSED CURRENTRELEASE QA Contact: Petr Beňas <pbenas>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.3CC: acathrow, didi, emesika, gklein, iheim, lvernia, pbenas, pstehlik, sbonazzo, s.kieske, treydock, yeylon
Target Milestone: ---   
Target Release: 3.4.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: integration
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1069131 (view as bug list) Environment:
Last Closed: 2014-05-08 13:36:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1069131    
Attachments:
Description Flags
pg_dump engine
none
ovirt-engine-remove-20140218135418.log none

Description Trey Dockendorf 2014-02-18 19:59:40 UTC
Description of problem:

After executing engine-cleanup, engine-backup --mode=restore does not work unless steps are performed that were supposed to be completed by engine-cleanup.

The ovirt-engine service is not stopped and the database is not properly removed or emptied.

Version-Release number of selected component (if applicable):

ovirt 3.3.3 on CentOS 6.5

How reproducible:

100%

Steps to Reproduce:
1. Backup using engine-backup

$ engine-backup --mode=backup --scope=all --file=/tmp/engine.tar.bz2 --log=/tmp/engine-backup.log

2. Run engine-cleanup answering Yes and OK to all prompts

3. Attempt to restore engine which fails with "FATAL: Engine service is active - can not restore backup"

$ engine-backup --mode=restore --scope=all --file=/tmp/engine.tar.bz2 --log=/tmp/engine-backup.log --change-db-credentials --db-host=localhost --db-port=5432 --db-user=engine --db-password=<PASS> --db-name=engine

4. Stop the ovirt-engine service (even though engine-cleanup output stated this was done)

5. Attempt restore again which fails with "FATAL: Database is not empty"

$ engine-backup --mode=restore --scope=all --file=/tmp/engine.tar.bz2 --log=/tmp/engine-backup.log --change-db-credentials --db-host=localhost --db-port=5432 --db-user=engine --db-password=<PASS> --db-name=engine

6. Drop engine database and re-create

$ su - postgres -c "dropdb engine"
$ su - postgres -c "psql -c \"create database engine owner engine template template0 encoding 'UTF8' lc_collate 'en_US.UTF-8' lc_ctype 'en_US.UTF-8'\""

7. Attempt restore, and this time it will succeed

$ engine-backup --mode=restore --scope=all --file=/tmp/engine.tar.bz2 --log=/tmp/engine-backup.log --change-db-credentials --db-host=localhost --db-port=5432 --db-user=engine --db-password=<PASS>--db-name=engine

Actual results:

After engine-cleanup the ovirt-engine service is not stopped and the engine database is not empty.

Based on mailing list suggestion, here is output from pg_dump

$ su - postgres -c "pg_dump engine | grep -i ^create"
CREATE PROCEDURAL LANGUAGE plpgsql;

Expected results:

The ovirt-engine service would be completely stopped and the database completely cleared or removed so a engine-backup --mode=restore would work without manual intervention.

Additional info:

Comment 1 Trey Dockendorf 2014-02-18 20:01:28 UTC
Created attachment 864774 [details]
pg_dump engine

engine database after engine-cleanup is run.

Comment 2 Trey Dockendorf 2014-02-18 20:03:56 UTC
Created attachment 864775 [details]
ovirt-engine-remove-20140218135418.log

log from engine-cleanup.  Domain name replaced with <OMIT>.

Comment 3 Trey Dockendorf 2014-02-19 00:36:46 UTC
A restore fails with ovirt-engine-3.4.0-0.7.beta2.el6.noarch as well.

Performed a backup as before then ran a full 'engine-cleanup'.  A restore fails with "FATAL: Database is not empty".  Dropping and re-creating the engine database allows restore to work.

Comment 4 Eli Mesika 2014-02-23 21:10:38 UTC
This is taken from engine-backup code 

---snip---
verifyConnection() {
        PGPASSFILE="${MYPGPASS}" psql \
                -w \
                -U "${ENGINE_DB_USER}" \
                -h "${ENGINE_DB_HOST}" \
                -p "${ENGINE_DB_PORT}" \
                -d "${ENGINE_DB_DATABASE}" \
                -c "select 1" \
                >> "${LOG}" 2>&1 \
                || logdie "Can't connect to the database. Please see '${0} --help'."

        PGPASSFILE="${MYPGPASS}" pg_dump \
                -U "${ENGINE_DB_USER}" \
                -h "${ENGINE_DB_HOST}" \
                -p "${ENGINE_DB_PORT}" \
                "${ENGINE_DB_DATABASE}" | \
                grep -vi '^create extension' | \
                grep -iq '^create' && \
                logdie "Database is not empty"
}
---snip---

Why the method used to check if DB is empty is by checking the CREATE EXTENSION in the output of pg_dump ??? This is not correct since the backup generates 
CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;

The db cleanup is not cleaning installed extensions and is not supposed to do so.

This is what is causing the restore to fail, please look instead for a the creation of the schema_version table 

The fix should be in the engine-backup code

Comment 5 Yedidyah Bar David 2014-02-24 06:46:23 UTC
(In reply to Eli Mesika from comment #4)
> This is taken from engine-backup code 
> 
> ---snip---
> verifyConnection() {
>         PGPASSFILE="${MYPGPASS}" psql \
>                 -w \
>                 -U "${ENGINE_DB_USER}" \
>                 -h "${ENGINE_DB_HOST}" \
>                 -p "${ENGINE_DB_PORT}" \
>                 -d "${ENGINE_DB_DATABASE}" \
>                 -c "select 1" \
>                 >> "${LOG}" 2>&1 \
>                 || logdie "Can't connect to the database. Please see '${0}
> --help'."
> 
>         PGPASSFILE="${MYPGPASS}" pg_dump \
>                 -U "${ENGINE_DB_USER}" \
>                 -h "${ENGINE_DB_HOST}" \
>                 -p "${ENGINE_DB_PORT}" \
>                 "${ENGINE_DB_DATABASE}" | \
>                 grep -vi '^create extension' | \
>                 grep -iq '^create' && \
>                 logdie "Database is not empty"
> }
> ---snip---
> 
> Why the method used to check if DB is empty is by checking the CREATE
> EXTENSION in the output of pg_dump ??? This is not correct since the backup
> generates 
> CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;
> 
> The db cleanup is not cleaning installed extensions and is not supposed to
> do so.
> 
> This is what is causing the restore to fail, please look instead for a the
> creation of the schema_version table 
> 
> The fix should be in the engine-backup code

Please read carefully the code. It's doing pg_dump, then _removes_ (grep -v) from the output all 'create extension' statements, then checks if any other 'create' statements are left - and if any are found, the db is considered non-empty.

This was checked simply by creating a database and then immediately running pg_dump, to see what's always in the dump.

The current bug is not about 'CREATE EXTENSION' but about 'CREATE PROCEDURAL LANGUAGE plpgsql'. Should engine-backup code ignore this one as well? Or should engine-cleanup drop 'PROCEDURAL LANGUAGE plpgsql'? They might as well be the same thing, with different syntax due to reasons I do not know.

Comment 6 Eli Mesika 2014-02-24 09:50:13 UTC
(In reply to Yedidyah Bar David from comment #5)

> The current bug is not about 'CREATE EXTENSION' but about 'CREATE PROCEDURAL
> LANGUAGE plpgsql'. Should engine-backup code ignore this one as well? Or
> should engine-cleanup drop 'PROCEDURAL LANGUAGE plpgsql'? They might as well
> be the same thing, with different syntax due to reasons I do not know.

This is the difference between PG 8.4 and 9.x when the pgplsql language is created by the CREATE LANGUAGE in 8.4 and considered as EXTENSION in 9.x

I will handle that by making sure that the cleanup drops this language if exists as well

Comment 7 Eli Mesika 2014-02-24 11:06:43 UTC
Moving this to didi since 

The databases are created by user postgres that gives ownership to user engine by

su - postgres -c "psql -d template1 -c \"create database engine  owner engine;\""

template1 has already plpgsql installed with postgres ownership therefor user engine can not drop this language since the owner remains postgres.

Since the cleanup SP is running with user engine, it can not handle this and it must be handled in the context of the engine-backup utility

Only way I see to resolve that in engine-backup when mode = restore :

1)manipulate the backup sql to remove the "create language" statement 
2)run createlang and ignore errors to create the language if not exists 
3)restore the database

Please keep in mind that this should be tested in PG 8.4 and 9.2.x

Comment 8 Yedidyah Bar David 2014-03-09 10:04:10 UTC
*** Bug 1073372 has been marked as a duplicate of this bug. ***

Comment 9 Yedidyah Bar David 2014-03-12 08:31:35 UTC
(In reply to Trey Dockendorf from comment #0)
> Description of problem:
> 
> After executing engine-cleanup, engine-backup --mode=restore does not work
> unless steps are performed that were supposed to be completed by
> engine-cleanup.
> 
> The ovirt-engine service is not stopped and the database is not properly
> removed or emptied.

These are two different issues.

I am currently looking at the second one, "database is not emptied" (Since 3.3 it will not be removed).

About the first one: generally speaking, if 'service ovirt-engine stop', which is what engine-cleanup does, returns successfully and outputs 'OK', but does not actually stop the engine, that's a different bug. Is it possible that it just took a few seconds more? Can you please try again? I don't remember it ever happened to me. If this is reproducible, please: 1. Open another bug "'service ovirt-engine stop' does not stop the engine". 2. Attach to this bug also logs of the engine. Thanks!

Comment 10 Yedidyah Bar David 2014-03-12 09:35:17 UTC
(In reply to Eli Mesika from comment #7)
> Moving this to didi since 
> 
> The databases are created by user postgres that gives ownership to user
> engine by
> 
> su - postgres -c "psql -d template1 -c \"create database engine  owner
> engine;\""
> 
> template1 has already plpgsql installed with postgres ownership therefor
> user engine can not drop this language since the owner remains postgres.
> 
> Since the cleanup SP is running with user engine, it can not handle this and
> it must be handled in the context of the engine-backup utility
> 
> Only way I see to resolve that in engine-backup when mode = restore :
> 
> 1)manipulate the backup sql to remove the "create language" statement 
> 2)run createlang and ignore errors to create the language if not exists 
> 3)restore the database
> 
> Please keep in mind that this should be tested in PG 8.4 and 9.2.x

I now tried merely adding 'grep -vi "^create procedural language"' to the check if it's empty, and restore worked. The restore log also had:

psql:/tmp/engine-backup.MLHXxe1KRk/db/engine_backup.db:16: ERROR:  language "plpgsql" already exists
psql:/tmp/engine-backup.MLHXxe1KRk/db/engine_backup.db:19: ERROR:  must be owner of language plpgsql

Since we do not pass '--set ON_ERROR_STOP=1', it did not fail.

I checked and saw that also the restore done during rollback in engine-setup does not pass ON_ERROR_STOP=1.

Do you think we should add ON_ERROR_STOP=1, or can I just continue by grepping out 'create procedural language'?

I didn't test on fedora (PG 9), but I guess it's the same - we get an error on 'create extension' and ignore it.

I think I'll just push now this simple change, and if you think we should not ignore errors, we should probably open a new bug for that and check everywhere, not just engine-backup --mode=restore on PG 8...

Comment 11 Eli Mesika 2014-03-12 09:54:15 UTC
> I now tried merely adding 'grep -vi "^create procedural language"' to the
> check if it's empty, and restore worked. The restore log also had:
> 
> psql:/tmp/engine-backup.MLHXxe1KRk/db/engine_backup.db:16: ERROR:  language
> "plpgsql" already exists
> psql:/tmp/engine-backup.MLHXxe1KRk/db/engine_backup.db:19: ERROR:  must be
> owner of language plpgsql
> 
> Since we do not pass '--set ON_ERROR_STOP=1', it did not fail.
> 
> I checked and saw that also the restore done during rollback in engine-setup
> does not pass ON_ERROR_STOP=1.
> 
> Do you think we should add ON_ERROR_STOP=1, or can I just continue by
> grepping out 'create procedural language'?

You can continue grep , PG does a strange thing , its supports the 'IF EXISTS' in DROP LANGUAGE but do not provide any IF NOT EXISTS for CREATE LANGUAGE , so we can try to create and ignore the errors 

> 
> I didn't test on fedora (PG 9), but I guess it's the same - we get an error
> on 'create extension' and ignore it.

Yes, BTW , CREATE EXTENSION is only for 9.x and not supported in 8.4

> 
> I think I'll just push now this simple change, and if you think we should
> not ignore errors, we should probably open a new bug for that and check
> everywhere, not just engine-backup --mode=restore on PG 8...

That's OK for me as is if this is the error that we ignore...

Comment 12 Trey Dockendorf 2014-03-19 20:12:55 UTC
(In reply to Yedidyah Bar David from comment #9)

> About the first one: generally speaking, if 'service ovirt-engine stop',
> which is what engine-cleanup does, returns successfully and outputs 'OK',
> but does not actually stop the engine, that's a different bug. Is it
> possible that it just took a few seconds more? Can you please try again? I
> don't remember it ever happened to me. If this is reproducible, please: 1.
> Open another bug "'service ovirt-engine stop' does not stop the engine". 2.
> Attach to this bug also logs of the engine. Thanks!

I am unable to reproduce the service not being stopped.  It may have been taking a few seconds and so my initial reproduction just needed a pause after running engine-cleanup and the engine-backup restore.

Comment 13 Petr Beňas 2014-05-06 11:09:12 UTC
Verified with rhevm-setup-3.4.0-0.16.rc.el6ev.noarch the database is cleaned and the service stopped.

Comment 14 Sandro Bonazzola 2014-05-08 13:36:20 UTC
This is an automated message

oVirt 3.4.1 has been released:
 * should fix your issue
 * should be available at your local mirror within two days.

If problems still persist, please make note of it in this bug report.