Bug 1330715 - engine-cleanup failed after corrupt update
Summary: engine-cleanup failed after corrupt update
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 3.6.4.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Greg Padgett
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-26 18:53 UTC by gregor
Modified: 2017-12-22 07:45 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-09 20:12:27 UTC
oVirt Team: Storage
Embargoed:
amureini: ovirt-3.6.z?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?



Description gregor 2016-04-26 18:53:15 UTC
Description of problem:
I tried to upgrade from 3.6.4.1 to 3.6.5, which gave me an error. After this, the engine no longer started because of many database errors. To get back to the previous state, I ran engine-cleanup, which gave me this:

[ ERROR ] Failed to execute stage 'Misc configuration': must be owner of function public.ovirt_repair_failed_merge

Version-Release number of selected component (if applicable):
ovirt-engine-3.6.4.1-1.el7.centos.noarch

How reproducible:
Upgrade from 3.6.4.1 to 3.6.5 with engine-setup; when it fails, clean up with "engine-cleanup".

Steps to Reproduce:
1. Start engine-setup and select yes to upgrade from 3.6.4.1 to 3.6.5 
2. engine-setup then fails with:
2016-04-26 20:10:59 ERROR otopi.context context._executeMethod:165 Failed to execute stage 'Misc configuration': Command '/usr/share/ovirt-engine/dbscripts/schema.sh' failed to execute
3. Run systemctl start ovirt-engine, which fails; I can't find the relevant error message among the flood of log messages
4. Run engine-cleanup so that the settings previously saved with engine-backup can be restored
5. engine-cleanup fails with the error message above

Actual results:
engine-cleanup did not finish its job

Expected results:
engine-cleanup should clean up the engine so the settings can be restored from the previous backup made with engine-backup

Comment 1 gregor 2016-04-26 19:28:06 UTC
UPDATE:
I fixed the ownership of the function 'ovirt_repair_failed_merge', and then engine-cleanup completed successfully.

These were my steps:

1. List the users and note their IDs

sudo -u postgres psql engine -c "
SELECT u.usename,
  u.usesysid,
  CASE WHEN u.usesuper AND u.usecreatedb THEN CAST('superuser, create database' AS pg_catalog.text)
       WHEN u.usesuper THEN CAST('superuser' AS pg_catalog.text)
       WHEN u.usecreatedb THEN CAST('create database' AS pg_catalog.text)
       ELSE CAST('' AS pg_catalog.text)
  END AS \"Attributes\"
FROM pg_catalog.pg_user u
ORDER BY 1;
"

2. Check the current owner of the function

sudo -u postgres psql engine -c "
SELECT  p.proname, p.proowner
FROM    pg_catalog.pg_namespace n
JOIN    pg_catalog.pg_proc p
ON      p.pronamespace = n.oid
WHERE   n.nspname = 'public' and p.proname = 'ovirt_repair_failed_merge'
"

3. Set the owner of the function to the ID of the user 'engine' noted in step 1 (16384 here).

sudo -u postgres psql engine -c "
UPDATE pg_catalog.pg_proc
SET proowner = 16384
WHERE proname = 'ovirt_repair_failed_merge'
"

4. Check the owner as in step 2.
5. Run engine-cleanup

TODO: Restore and re-run the upgrade to check side-effects
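A possible alternative to step 3, offered as a sketch rather than what was run above: PostgreSQL's ALTER FUNCTION ... OWNER TO reassigns ownership without writing to pg_catalog.pg_proc by hand, and pg_catalog.pg_get_userbyid() shows the owner by role name instead of by numeric ID. The sketch assumes the database and the owning role are both named "engine" (the ENGINE_DB, ENGINE_ROLE and APPLY variables are hypothetical names introduced here) and takes the argument list (varchar, varchar, varchar, varchar) from the DROP FUNCTION command in comment 7. By default it only prints the SQL for review.

```shell
#!/bin/sh
# Sketch only: reassign the owner of the workaround SP with ALTER FUNCTION
# instead of updating pg_catalog.pg_proc directly.
# Assumptions: database and owning role are both "engine"; override with
# ENGINE_DB / ENGINE_ROLE (hypothetical variable names for this sketch).
ENGINE_DB="${ENGINE_DB:-engine}"
ENGINE_ROLE="${ENGINE_ROLE:-engine}"

fix_owner_sql() {
  # Argument types taken from the DROP FUNCTION command in comment 7.
  printf '%s\n' "ALTER FUNCTION public.ovirt_repair_failed_merge(varchar, varchar, varchar, varchar) OWNER TO ${ENGINE_ROLE};"
  # Verify the result, showing the owner by role name rather than numeric ID.
  printf '%s\n' "SELECT p.proname, pg_catalog.pg_get_userbyid(p.proowner) AS owner FROM pg_catalog.pg_proc p WHERE p.proname = 'ovirt_repair_failed_merge';"
}

if [ "${APPLY:-0}" = 1 ]; then
  # Run as the postgres superuser and stop at the first failing statement.
  fix_owner_sql | sudo -u postgres psql -v ON_ERROR_STOP=1 "$ENGINE_DB"
else
  # Default: print the SQL without touching the database.
  fix_owner_sql
fi
```

Saved under any file name, it would be run once without APPLY to inspect the statements and then again with APPLY=1 to execute them.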

Comment 2 gregor 2016-04-26 20:24:59 UTC
UPDATE: This fixed the problem, and after the restore the update worked ;-)

Should I close this? I think this may be a bug, but it's not clear why the ownership of the function 'ovirt_repair_failed_merge' was messed up.

Comment 3 Yaniv Kaul 2016-04-27 17:26:12 UTC
(In reply to gregor from comment #2)
> UPDATE: This fixed the problem and after restore the update worked ;-)

Thanks - this is important information.
> 
> Should I close this? I think this can be a bug but its not clear why the
> permissions of the function 'ovirt_repair_failed_merge' was messed up.

We'll check it. Is there anything out of the ordinary in your setup?

Comment 4 gregor 2016-04-27 20:39:28 UTC
Today I updated another host without problems. That host was originally installed with some 3.6.* version. The problem host was originally installed with some 3.5.* version, as an all-in-one (AIO) installation.

Comment 5 Eli Mesika 2016-05-03 09:02:11 UTC
ovirt_repair_failed_merge is not part of the engine sources nor of the engine-setup sources; I wonder which component owns it.

Comment 6 Eli Mesika 2016-05-03 09:19:48 UTC
Seems related to https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13 (thanks to Didi!)

Greg, can you take a look at that? It seems that this new SP can be installed into the database by any valid user, which can cause problems in engine-cleanup.

Comment 7 Greg Padgett 2016-05-03 13:38:57 UTC
(In reply to Eli Mesika from comment #6)
> Seems related to https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13
> (thanks to Didi!)
> 
> Greg, can you take a look at that, it seems that this new SP can be
> installed into the database with any legal user which can make problems in
> engine-cleanup

Thanks for finding this one. ovirt_repair_failed_merge is a stored procedure that is part of a temporary workaround for an engine bug in Live Merge; I should have included a step in the original instructions to remove the SP after use.

Consequently, the easiest way to fix this problem would be to simply drop the SP:

psql <dbname> -U <dbuser> -c 'DROP FUNCTION ovirt_repair_failed_merge(varchar, varchar, varchar, varchar)'

I'll update the other related bugs with this info as well.
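A more defensive variant of the command above, sketched under stated assumptions: PostgreSQL only lets the function's owner or a superuser drop it, and in this report the owner was the wrong role, so running the command as <dbuser> could fail with the same "must be owner of function" error. The sketch therefore runs psql as the postgres superuser and uses DROP FUNCTION IF EXISTS so a repeated run is harmless. The database name "engine" and the ENGINE_DB and APPLY variables are hypothetical choices made for illustration.

```shell
#!/bin/sh
# Sketch only: drop the temporary Live Merge workaround SP.
# Assumption: the engine database is named "engine"; override with ENGINE_DB
# (a hypothetical variable name for this sketch).
ENGINE_DB="${ENGINE_DB:-engine}"

drop_sp_sql() {
  # IF EXISTS turns a repeated run into a notice instead of an error.
  printf '%s\n' "DROP FUNCTION IF EXISTS public.ovirt_repair_failed_merge(varchar, varchar, varchar, varchar);"
}

if [ "${APPLY:-0}" = 1 ]; then
  # The postgres superuser may drop the SP regardless of which role owns it.
  drop_sp_sql | sudo -u postgres psql -v ON_ERROR_STOP=1 "$ENGINE_DB"
else
  # Default: only print the statement for review.
  drop_sp_sql
fi
```

Running it once without APPLY shows exactly what will be executed; APPLY=1 then performs the drop.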

Comment 8 Oved Ourfali 2016-05-04 08:39:41 UTC
Based on the last comment, moving to Storage.

Comment 9 Allon Mureinik 2016-05-05 07:36:11 UTC
Greg, as this stored procedure is created as a manual workaround, is there anything to actually do here besides update the KBase?

Comment 10 Greg Padgett 2016-05-09 20:12:27 UTC
(In reply to Allon Mureinik from comment #9)
> Greg, as this stored procedure is created as a manual workaround, is there
> anything to actually do here besides update the KBase?

I don't think so. I confirmed in bug 1308501 that the instructions in the KBase now indicate that the script should be removed after use, and I also added that note to other bugs mentioning the workaround. I believe we're good here.

(Gregor, I'm closing this as NOTABUG, not because it isn't an issue, but because I don't see a resolution that means "addressed outside the product code". Thanks for finding and reporting it.)

Comment 11 Allon Mureinik 2016-05-18 13:28:44 UTC
(In reply to Greg Padgett from comment #10)
> (Gregor, I'm closing this as notabug--not because it isn't an issue, but
> because I don't see anything that means "addressed outside the product
> code".  Thanks for finding/reporting it.)
DEFERRED kinda-sorta means that.

