Bug 1412556

Summary: [UPGRADE] 4.0 -> 4.1 engine upgrade fails - ERROR: column "mac_pool_id" contains null values
Product: [oVirt] ovirt-engine
Reporter: Gil Klein <gklein>
Component: Database.Core
Assignee: Martin Mucha <mmucha>
Status: CLOSED DUPLICATE
QA Contact: Pavel Stehlik <pstehlik>
Severity: urgent
Docs Contact:
Priority: high
Version: 4.1.0
CC: bugs, danken, mperina
Target Milestone: ovirt-4.1.0-beta
Flags: rule-engine: ovirt-4.1+, rule-engine: blocker+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: http://resources.ovirt.org/repos/ovirt/experimental/4.1/latest.tested/rpm/el7/noarch/ovirt-engine-4.1.0-0.4.master.20170115090623.git8e588d9.el7.centos.noarch.rpm
Doc Type: If docs needed, set a value
Doc Text:
The upgrade script 04_01_0010_add_mac_pool_id_to_vds_group.sql assumed that every cluster belongs to some data center. A cluster without a data center cannot run any VM and has other serious problems, so it was assumed that no one has such a setup. This assumption was wrong, and as a result the script failed when creating the NOT NULL constraint. After this fix, the upgrade also works for environments containing such clusters.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-23 13:11:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Gil Klein 2017-01-12 09:46:49 UTC
Description of problem:

When upgrading the engine from 4.0 to 4.1, engine-setup fails with: ERROR:  column "mac_pool_id" contains null values


Version-Release number of selected component (if applicable):
From: ovirt-engine-4.0.6.3-0.1.el7ev.noarch
To: ovirt-engine-setup-plugin-ovirt-engine-4.1.0-0.3.beta2.el7.noarch
How reproducible:
100%

Steps to Reproduce (reproduced on a copy of a QE production system):
1. Setup a RHEL 7.3 machine
2. Install it with 4.0 rpms
3. Restore the production system db
4. Run engine-setup
5. Set up the 4.1 repos
6. yum update ovirt-engine-setup  ovirt-engine-dwh
7. engine-setup

Actual results:
ERROR:  column "mac_pool_id" contains null values


Expected results:
Upgrade should pass correctly


Additional info:

sql:/usr/share/ovirt-engine/dbscripts/common_sp.sql:1146: NOTICE:  drop cascades to function fn_db_get_async_tasks()
NOTICE:  drop cascades to trigger delete_disk_image_dynamic_for_image on table images
psql:/usr/share/ovirt-engine/dbscripts/upgrade/04_01_0010_add_mac_pool_id_to_vds_group.sql:17: ERROR:  column "mac_pool_id" contains null values
FATAL: Cannot execute sql command: --file=/usr/share/ovirt-engine/dbscripts/upgrade/04_01_0010_add_mac_pool_id_to_vds_group.sql

2017-01-12 10:57:19 ERROR otopi.plugins.ovirt_engine_setup.ovirt_engine.db.schema schema._misc:320 schema.sh: FATAL: Cannot execute sql command: --file=/usr/share/ovirt-engine/dbscripts/upgrade/04_01_0010_add_mac_pool_id_to_vds_group.sql
2017-01-12 10:57:19 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/db/schema.py", line 322, in _misc
    raise RuntimeError(_('Engine schema refresh failed'))
RuntimeError: Engine schema refresh failed
2017-01-12 10:57:19 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Misc configuration': Engine schema refresh failed
2017-01-12 10:57:19 DEBUG otopi.transaction transaction.abort:119 aborting 'Yum Transaction'

Comment 2 Yaniv Kaul 2017-01-12 11:14:10 UTC
Dan, can you have someone look at this?

Comment 3 Dan Kenigsberg 2017-01-12 12:27:03 UTC
null mac addresses are the result of bugs, which we cordially ignored in the past, and caused us lots of pain in debugging. In 4.1 we've added db-level protection, to make sure they do not infect us.

Gil, can you provide your Engine credentials to mmucha?

Martin may be able to write a tool to list buggy db rows in 4.0, so that a user can fix them prior to upgrade.

Comment 4 Martin Mucha 2017-01-12 12:55:48 UTC
The script fails on line 17, where it adds the NOT NULL constraint. This means that the preceding UPDATE statement did not remove all null values in cluster.mac_pool_id.

I believe this problem is caused by the unexpected existence of a cluster that is not part of any data center (I'm not sure why this should be valid, or how it worked in the past, when all MACs had data-center scope).

Not sure if this is possible, but until the script is fixed, you can assign the cluster that does not belong to any data center to a newly created, arbitrary data center and then run the upgrade.
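
For illustration only, here is a minimal sketch of that failure mode, using Python's sqlite3 module and a toy schema rather than the real engine database. The table layout (cluster.storage_pool_id, storage_pool.mac_pool_id), the sample rows, and the UPDATE statement are assumptions made for the example; they are not copied from 04_01_0010_add_mac_pool_id_to_vds_group.sql. It shows that copying the MAC pool from the data center leaves NULL behind for a cluster with no data center, and how such clusters could be listed before an upgrade.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory toy schema (assumed layout, not the engine's)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE storage_pool (
            id TEXT PRIMARY KEY,
            mac_pool_id TEXT NOT NULL
        );
        CREATE TABLE cluster (
            cluster_id TEXT PRIMARY KEY,
            name TEXT NOT NULL,
            storage_pool_id TEXT,   -- NULL: cluster has no data center
            mac_pool_id TEXT        -- new column, to be made NOT NULL
        );
        INSERT INTO storage_pool VALUES ('dc-1', 'pool-a');
        INSERT INTO cluster VALUES ('c-1', 'attached', 'dc-1', NULL);
        INSERT INTO cluster VALUES ('c-2', 'orphan', NULL, NULL);
        """
    )
    return conn


def copy_pool_from_data_center(conn):
    """Give each cluster the MAC pool of its data center (hypothetical UPDATE)."""
    conn.execute(
        """
        UPDATE cluster
           SET mac_pool_id = (SELECT sp.mac_pool_id
                                FROM storage_pool sp
                               WHERE sp.id = cluster.storage_pool_id)
        """
    )


def clusters_with_null_pool(conn):
    """Names of clusters that would violate a NOT NULL constraint."""
    rows = conn.execute(
        "SELECT name FROM cluster WHERE mac_pool_id IS NULL ORDER BY name"
    )
    return [row[0] for row in rows]


def find_orphan_clusters(conn):
    """Pre-upgrade check: clusters that are not attached to any data center."""
    return conn.execute(
        "SELECT cluster_id, name FROM cluster "
        "WHERE storage_pool_id IS NULL ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print("orphan clusters:", find_orphan_clusters(db))
    copy_pool_from_data_center(db)
    print("still NULL after UPDATE:", clusters_with_null_pool(db))
```

Against a real engine database the same check would be a single SELECT on cluster rows with a null storage_pool_id, run before engine-setup.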

Comment 5 Gil Klein 2017-01-12 13:07:11 UTC
(In reply to Dan Kenigsberg from comment #3)
> null mac addresses are the result of bugs, which we cordially ignored in the
> past, and caused us lots of pain in debugging. In 4.1 we've added db-level
> protection, to make sure they do not infect us.
> 
> Gil, can you provide your Engine credentials to mmucha?
Sure, I emailed you both privately. 
> 
> Martin may be able to write a tool to list buggy db rows in 4.0, so that a
> user can fix them prior to upgrade.

Comment 6 Martin Mucha 2017-01-12 13:52:28 UTC
Updated the original script in gerrit patch 70108.

Comment 7 Martin Mucha 2017-01-12 13:59:25 UTC
(In reply to Gil Klein from comment #5)
> (In reply to Dan Kenigsberg from comment #3)
> > null mac addresses are the result of bugs, which we cordially ignored in the
> > past, and caused us lots of pain in debugging. In 4.1 we've added db-level
> > protection, to make sure they do not infect us.
> > 
> > Gil, can you provide your Engine credentials to mmucha?
> Sure, I emailed you both privately. 
> > 
> > Martin may be able to write a tool to list buggy db rows in 4.0, so that a
> > user can fix them prior to upgrade.

It seems that the shared environment contains a cluster with a null storage_pool_id, so this appears to be the same issue as
https://bugzilla.redhat.com/show_bug.cgi?id=1410189

I created a cluster without a link to a DC in my environment and tried to upgrade. Before patch 70108 the upgrade failed; with the fix applied it succeeded.
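
As a rough sketch of how a migration can tolerate such clusters: after inheriting the MAC pool from the data center, fall back to a default pool for any cluster that is still NULL. This is one plausible remedy and is not taken from patch 70108. The toy schema, the mac_pools table with a default_pool flag, and the function name assign_mac_pools are all assumptions for the example, which again uses Python's sqlite3 module.

```python
import sqlite3


def build_demo_db():
    """In-memory toy schema; every table and column name here is an assumption."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE mac_pools (
            id TEXT PRIMARY KEY,
            default_pool INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE storage_pool (
            id TEXT PRIMARY KEY,
            mac_pool_id TEXT NOT NULL
        );
        CREATE TABLE cluster (
            cluster_id TEXT PRIMARY KEY,
            storage_pool_id TEXT,   -- NULL: cluster has no data center
            mac_pool_id TEXT
        );
        INSERT INTO mac_pools VALUES ('pool-default', 1);
        INSERT INTO mac_pools VALUES ('pool-a', 0);
        INSERT INTO storage_pool VALUES ('dc-1', 'pool-a');
        INSERT INTO cluster VALUES ('c-1', 'dc-1', NULL);
        INSERT INTO cluster VALUES ('c-2', NULL, NULL);
        """
    )
    return conn


def assign_mac_pools(conn):
    """Fill cluster.mac_pool_id; return how many rows are still NULL."""
    # Step 1: inherit the pool from the cluster's data center, if it has one.
    conn.execute(
        """
        UPDATE cluster
           SET mac_pool_id = (SELECT sp.mac_pool_id
                                FROM storage_pool sp
                               WHERE sp.id = cluster.storage_pool_id)
        """
    )
    # Step 2: clusters without a data center fall back to the default pool.
    conn.execute(
        """
        UPDATE cluster
           SET mac_pool_id = (SELECT id FROM mac_pools
                               WHERE default_pool = 1 LIMIT 1)
         WHERE mac_pool_id IS NULL
        """
    )
    remaining = conn.execute(
        "SELECT COUNT(*) FROM cluster WHERE mac_pool_id IS NULL"
    ).fetchone()[0]
    return remaining


if __name__ == "__main__":
    db = build_demo_db()
    # Only when this count is zero is it safe to add a NOT NULL constraint.
    print("rows still NULL:", assign_mac_pools(db))
```

Checking the remaining NULL count before adding the constraint keeps the failure visible in the migration itself instead of surfacing as a constraint error.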

Comment 8 Dan Kenigsberg 2017-01-12 14:25:12 UTC
(In reply to Dan Kenigsberg from comment #3)
> null mac addresses are the result of bugs, which we cordially ignored in the
> past, and caused us lots of pain in debugging. In 4.1 we've added db-level
> protection, to make sure they do not infect us.

Please disregard my comment 3. It refers to an unrelated issue.

This bug seems like a dup of bug 1410189.

Comment 9 Dan Kenigsberg 2017-01-16 09:05:20 UTC
Thanks for the doc suggestion, Martin, but no user has seen this script (as 4.1 is not released yet) so no need to document it.

Comment 10 Dan Kenigsberg 2017-01-23 13:11:27 UTC

*** This bug has been marked as a duplicate of bug 1410189 ***