Bug 2077387

Summary: Update to 4.5 failed due to failed database schema refresh
Product: [oVirt] ovirt-engine Reporter: Giorgio Biacchi <giorgio>
Component: Setup.EngineAssignee: Eli Mesika <emesika>
Status: CLOSED CURRENTRELEASE QA Contact: Lucie Leistnerova <lleistne>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 4.5.0.4CC: aperotti, apinnick, baumanmo, bugs, clint, dfodor, emesika, jbreitwe, jortialc, lleistne, lsvaty, michal.skrivanek, mkalinin, mperina, mschwabe, pkubica, rmcswain, serg
Target Milestone: ovirt-4.5.1Flags: mperina: ovirt-4.5+
michal.skrivanek: blocker-
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-engine-4.5.1 Doc Type: Release Note
Doc Text:
If vdc_options table contains records with NULL default value (most probably as a remains from ancient versions), the upgrade to ovirt-engine-4.5.0 fails. This bug fixes the issue, so upgrade to ovirt-engine-4.5.1 is successful.
Story Points: ---
Clone Of:
: 2099650 (view as bug list) Environment:
Last Closed: 2022-06-23 05:54:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2099650    

Description Giorgio Biacchi 2022-04-21 09:10:08 UTC
Created attachment 1874014 [details]
Complete ovirt-engine-setup log

Description of problem:
engine-setup is unable to complete successfully. During database schema update/refresh it fails to execute a query and engine-setup rollback to 4.4.

Here's the relevant part of the logfile:

NOTICE:  drop cascades to trigger remove_nvram_data_on_update on table vm_static
psql:/usr/share/ovirt-engine/dbscripts/upgrade/pre_upgrade/0000_config.sql:22: NOTICE:  column "default_value" of relation "vdc_options" already exi
sts, skipping
psql:/usr/share/ovirt-engine/dbscripts/upgrade/pre_upgrade/0000_config.sql:1449: ERROR:  column "default_value" contains null values
CONTEXT:  SQL statement "ALTER TABLE vdc_options ALTER COLUMN default_value SET NOT NULL"
PL/pgSQL function fn_db_change_column_null(character varying,character varying,boolean) line 10 at EXECUTE
FATAL: Cannot execute sql command: --file=/usr/share/ovirt-engine/dbscripts/upgrade/pre_upgrade/0000_config.sql


Version-Release number of selected component (if applicable):
I'm updating from the latest 4.4 to 4.5


How reproducible:
Just run engine-setup and already have the column "default_value" with NULL values in the table "vdc_options" 


Steps to Reproduce:
1.
2.
3.

Actual results:
engine-setup fails and update is not done

Expected results:
a successful update to the latest Ovirt

Additional info:
I can easily update the NULL values with empty strings in the database to try to fix the problem if this is the correct thing to do

Comment 1 Martin Perina 2022-04-21 12:35:04 UTC
So there is something special inside your database which prevents the upgrade, I've just tested upgrade from 4.4.10.7 to 4.5.0.4 and it was successful. Could you please run sos-log-collector to gather all required information from your ovirt-engine (including database dump) and atach it to the bug so we can fully investigate the issue? (feel free to use "Make attachment and comment private (visible only to members of the redhat group)" option on the attachment so it's visible only to Red Hat).

Thanks

Comment 4 Martin Perina 2022-04-22 08:04:37 UTC
So following options in vdc_options table contains NULL as a default value:

AdminDomain
AuthenticationMethod
AllowDuplicateMacAddresses
DefaultWorkgroup
AdUserName
AdUserPassword
DomainName
LdapQueryPageSize
LDAPQueryTimeout
LDAPConnectTimeout
LDAPOperationTimeout
LDAPServerPort
LicenseCertificateFingerPrint
MaxLDAPQueryPartsNumber
MacPoolRanges
MaxMacsCountInPool
LdapServers
LDAPProviderTypes
LDAPSecurityAuthentication
VdsFenceOptions
AdUserId
VMMinMemorySizeInMB
EnableVdsHaReservation
KeystoneAuthUrl
GlusterTunedProfile

That is caused by an error in our internal fn_db_change_column_type function, which didn't handle setting NULL/NOT NULL column property. The function has been fixed along in https://gerrit.ovirt.org/117448 but we have missed options which no longer exists. Those options should have been deleted from vdc_options in previous releases, but unfortunately due to unknown reason they still exists. We have no easy way how to remove them now, so we need to fix their default values before properlly marking default value as NON NULL value.

As a workaround following SQL commands should be applied before performing an upgrade to 4.5:

1. Connect to the machine where ovirt-engine is installed
2. Run following commands:

   /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "UPDATE vdc_options SET default_value=option_value WHERE default_value IS NULL AND option_value IS NOT NULL;"
   /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "UPDATE vdc_options SET default_value='' WHERE default_value IS NULL AND option_value IS NULL;"

When successfully applied please continue with upgrade to 4.5

Could you please chec

Comment 5 Martin Perina 2022-04-22 08:06:12 UTC
Sorry, clicked on Save changes too early.

Could you please check above workaround?

Comment 6 Giorgio Biacchi 2022-04-22 08:32:27 UTC
The first query changed 25 rows, the second 0 rows.

The engine-setup worked like a charm and now I'm on 4.5.0.4-1.el8

Thanks for the quick solution.

Comment 7 Martin Perina 2022-04-22 08:44:38 UTC
(In reply to Giorgio Biacchi from comment #6)
> The first query changed 25 rows, the second 0 rows.
> 
> The engine-setup worked like a charm and now I'm on 4.5.0.4-1.el8
> 
> Thanks for the quick solution.

Great to hear it works, thanks a lot for confirmation!

Comment 8 Giorgio Biacchi 2022-04-22 08:58:25 UTC
Now I face another issue...

I run a yum update after the engine-setup to update the rest of the system and I've done a reboot since one of the updates was the kernel. Now I got "500 Internal server error" if I try to reach the UI via web.

I'm still looking for clues in the logs

Comment 10 Giorgio Biacchi 2022-04-22 09:08:15 UTC
Quickest solution ever :) I can confirm that downgrading postgresql-jdbc brings the UI back to life.

Thanks again

Comment 12 Clint Goudie 2022-05-08 14:10:25 UTC
I've been using ovirt for a very long time, using the same database, and I can confirm the problem:

[root@ovirt ovirt-engine]# /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select *  from vdc_options where default_value is null ;"
 option_id |        option_name         |                  option_value                  | version | default_value
-----------+----------------------------+------------------------------------------------+---------+---------------
       626 | ConfigDir                  | /etc/ovirt-engine                              | general |
       627 | AllowDuplicateMacAddresses | false                                          | general |
       645 | MacPoolRanges              | 00:<snip>-00:<snip>            | general |
       646 | MaxMacsCountInPool         | 100000                                         | general |
       625 | GlusterTunedProfile        | virtual-host,rhgs-sequential-io,rhgs-random-io | 4.2     |
(5 rows)

I dont go mucking about in my DB, so at some point these got added by the system without default values.

Comment 13 Clint Goudie 2022-05-08 14:14:48 UTC
/usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "UPDATE vdc_options SET default_value=option_value WHERE default_value IS NULL AND option_value IS NOT NULL;" obviously will fix the schema problem. 

dnf downgrade postgresql-jdbc is also required on centos8 stream as you noted.

Comment 14 Clint Goudie 2022-05-08 14:16:53 UTC
Adding the line: exclude=postgresql-jdbc-42.2.14-1.el8 to /etc/yum.repos.d/CentOS-Stream-AppStream.rep is a good idea as well

Comment 15 Clint Goudie 2022-05-08 15:07:56 UTC
The JDBC section of this bug relates to: https://bugzilla.redhat.com/show_bug.cgi?id=2077794

Comment 16 Lucie Leistnerova 2022-05-23 13:25:27 UTC
Tested in ovirt-engine-4.5.0.7-0.9.el8ev.noarch and engine-setup failed on the same error

Steps:
1. installed 4.4.4 engine
2. added AdminDomain with default_value = NULL
3. added 4.5 repos
4. dnf update ovirt\*setup\*
5. engine-setup

2022-05-23 15:20:43,619+0300 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine_common.distro-rpm.packages packages.verbose:194 DNF     install   : ovirt-engine-dbscripts-4.5.0.7-0.9.el8ev.noarch
...
2022-05-23 15:25:18,210+0300 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.db.schema plugin.execute:926 execute-output: ['/usr/share/ovirt-engine/dbscripts/schema.sh', '-s', 'localhost', '-p', '5432', '-u', 'engine', '-d', 'engine', '-l', '/var/log/ovirt-engine/setup/ovirt-engine-setup-20220523152034-jyxc56.log', '-c', 'apply'] stderr:
psql:/usr/share/ovirt-engine/dbscripts/upgrade/pre_upgrade/0000_config.sql:1454: ERROR:  column "default_value" contains null values
CONTEXT:  SQL statement "ALTER TABLE vdc_options ALTER COLUMN default_value SET NOT NULL"
PL/pgSQL function fn_db_change_column_null(character varying,character varying,boolean) line 10 at EXECUTE
FATAL: Cannot execute sql command: --file=/usr/share/ovirt-engine/dbscripts/upgrade/pre_upgrade/0000_config.sql


No 04_05_0100_fix_04_04_0050_set_to_not_null.sql did run

Comment 17 RHEL Program Management 2022-05-23 13:25:57 UTC
Target release should be placed once a package build is known to fix a issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for a oVirt release.

Comment 18 Eli Mesika 2022-05-23 16:12:11 UTC
(In reply to Lucie Leistnerova from comment #16)
> 
> No 04_05_0100_fix_04_04_0050_set_to_not_null.sql did run

Please provide the output of the following query on the database :

select * from schema_version where script ILIKE '%04_05_%';

Comment 19 Eli Mesika 2022-05-23 16:22:41 UTC
Reproduced on upgrade from 4.4.4 to master , no need for the info above

Comment 22 Michal Skrivanek 2022-05-24 11:27:50 UTC
this will need another fix. However the workaround is simple (and 4.5.0-1 is out anyway)

Comment 23 Marina Kalinin 2022-05-24 12:23:49 UTC
Eli, any recommendations on what to do if a downstream user hits this and can't get the fix yet?

Comment 24 Marina Kalinin 2022-05-24 14:14:47 UTC
(In reply to Marina Kalinin from comment #23)
> Eli, any recommendations on what to do if a downstream user hits this and
> can't get the fix yet?

Sorry, I missed comment#4. All set now.

Comment 27 Lucie Leistnerova 2022-06-14 07:59:03 UTC
engine-setup successful (update from 4.4.2), AdminDomain default_value changed to ''
Verified in ovirt-engine-4.5.1.1-0.14.el8ev.noarch

Comment 28 Martin Perina 2022-06-16 08:11:22 UTC
*** Bug 2097232 has been marked as a duplicate of this bug. ***

Comment 30 Sandro Bonazzola 2022-06-23 05:54:58 UTC
This bugzilla is included in oVirt 4.5.1 release, published on June 22nd 2022.
Since the problem described in this bug report should be resolved in oVirt 4.5.1 release, it has been closed with a resolution of CURRENT RELEASE.
If the solution does not work for you, please open a new bug report.