Bug 1350220 - engine-setup schema.sh failed to refresh engine schema
Summary: engine-setup schema.sh failed to refresh engine schema
Keywords:
Status: CLOSED DUPLICATE of bug 1353219
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.0.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.0.2
Target Release: ---
Assignee: Ala Hino
QA Contact: Aharon Canan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-06-26 14:38 UTC by Nathan Hill
Modified: 2017-05-11 09:24 UTC
CC List: 6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-07-25 12:18:20 UTC
oVirt Team: Storage
Embargoed:
tnisan: ovirt-4.0.z?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments (Terms of Use)
Full setup log (1.20 MB, text/plain), 2016-06-26 14:38 UTC, Nathan Hill
snapshots found in DB (16.93 KB, text/plain), 2016-06-27 12:12 UTC, Nathan Hill

Description Nathan Hill 2016-06-26 14:38:23 UTC
Created attachment 1172559 [details]
Full setup log

Description of problem: Upgrading from 3.6.6.2-1 to 4.0 fails with a severe error while updating the database schema, resulting in a rollback and no ability to upgrade.


Version-Release number of selected component (if applicable): 4.0.0.6-1


How reproducible: Unknown in general; 100% on this engine host.


Steps to Reproduce:
1. Follow the upgrade procedure listed at http://www.ovirt.org/release/4.0.0/
2. Run engine-setup, selecting a local DWH store, automatic provisioning, and basic sampling scale.


Actual results: Engine Schema Refresh fails with:
[ ERROR ] schema.sh: FATAL: Cannot execute sql command: --file=/usr/share/ovirt-engine/dbscripts/upgrade/04_00_0140_convert_memory_snapshots_to_disks.sql
[ ERROR ] Failed to execute stage 'Misc configuration': Engine schema refresh failed

Expected results: Upgrade should complete. 

Additional info: I have another ovirt-engine installation with the exact same hardware and patch level that completed the upgrade without issue.

Additional steps tried to resolve:
-Browsing through the script, I gathered that this might have to do with abnormally named snapshots, so I deleted all snapshots and reattempted. No success.
-Deleted all templates and their snapshots; same error.
-I attempted a "hard" upgrade using the migration procedure listed here:
http://www.ovirt.org/documentation/migration-engine-3.6-to-4.0/ - Same error.

I suspect it may be related to the fact that my database doesn't have a standard name.

Comment 1 Nathan Hill 2016-06-26 14:40:03 UTC
Log area of interest:

Running upgrade sql script '/usr/share/ovirt-engine/dbscripts/upgrade/04_00_0140_convert_memory_snapshots_to_disks.sql'...

2016-06-26 10:19:41 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.db.schema plugin.execute:926 execute-output: ['/usr/share/ovirt-engine/dbscripts/schema.sh', '-s', 'localhost', '-p', '5432', '-u', 'engine_20140704223058', '-d', 'engine_20140704223058', '-l', '/var/log/ovirt-engine/setup/ovirt-engine-setup-20160626101653-bv1zn6.log', '-c', 'apply'] stderr:
psql:/usr/share/ovirt-engine/dbscripts/upgrade/04_00_0140_convert_memory_snapshots_to_disks.sql:93: ERROR:  insert or update on table "image_storage_domain_map" violates foreign key constraint "fk_image_storage_domain_map_storage_domain_static"
DETAIL:  Key (storage_domain_id)=(a96b9a1a-4dce-4de5-b70b-57111027ee84) is not present in table "storage_domain_static".
FATAL: Cannot execute sql command: --file=/usr/share/ovirt-engine/dbscripts/upgrade/04_00_0140_convert_memory_snapshots_to_disks.sql

2016-06-26 10:19:41 ERROR otopi.plugins.ovirt_engine_setup.ovirt_engine.db.schema schema._misc:313 schema.sh: FATAL: Cannot execute sql command: --file=/usr/share/ovirt-engine/dbscripts/upgrade/04_00_0140_convert_memory_snapshots_to_disks.sql
2016-06-26 10:19:41 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/db/schema.py", line 315, in _misc
    raise RuntimeError(_('Engine schema refresh failed'))
RuntimeError: Engine schema refresh failed
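
The constraint violation suggests the conversion script derives storage_domain_id from stale snapshot metadata. A diagnostic along these lines would surface such rows; this is a sketch that assumes the 3.6/4.0-era snapshots.memory_volume column, believed to store a comma-separated value whose first field is the storage domain UUID (the column name and format are assumptions, not confirmed by this log):

-- Sketch: list snapshots whose memory volume points at a storage domain
-- that is missing from storage_domain_static.
-- ASSUMPTION: snapshots.memory_volume holds a comma-separated string whose
-- first field is the storage domain UUID (3.6/4.0-era schema).
SELECT s.snapshot_id, s.vm_id, s.description,
       split_part(s.memory_volume, ',', 1) AS memory_domain_id
  FROM snapshots s
 WHERE s.memory_volume IS NOT NULL
   AND s.memory_volume <> ''
   AND split_part(s.memory_volume, ',', 1)
       NOT IN (SELECT id::text FROM storage_domain_static);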

Comment 2 Nathan Hill 2016-06-26 14:42:36 UTC
Also, note that a reversion back to 3.6.6.2-1 using engine-backup with --provision-db then running engine-setup is 100% successful, the error does not occur on this version.

Comment 3 Nathan Hill 2016-06-26 15:23:25 UTC
More troubleshooting: the erroneous storage domain UUID a96b9a1a-4dce-4de5-b70b-57111027ee84 is not listed anywhere in the two tables below...

engine_20140704223058=# select id,storage_name,storage_domain_type from storage_domain_static;
                  id                  |      storage_name      | storage_domain_type
--------------------------------------+------------------------+---------------------
 072fbaa1-08f3-4a40-9f34-a5ca22dd1d74 | ovirt-image-repository |                   4
 b8c6296b-51b8-4f53-8a6c-0796cf85a933 | NAS2_Exports           |                   3
 cc4eb137-8abb-4924-ba52-dd00b9c9ae7b | NAS2_ISO               |                   2
 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 | TriDistRepl            |                   0
 b8fa24f6-d38c-4da6-ad67-d8d33a4667b1 | NAS2_Data              |                   1

engine_20140704223058=# select * from image_storage_domain_map
engine_20140704223058-# ;
               image_id               |          storage_domain_id           | quota_id |           disk_profile_id
--------------------------------------+--------------------------------------+----------+--------------------------------------
 6b67745e-9e39-4a75-b62b-5b6e16509bb6 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 349663c7-4ef9-446d-b3af-1f5db6499fd8 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 0f4ae615-6194-4500-8b36-bf70f037beee | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 73a7e0fb-6366-49b9-b27d-1f948f19ac95 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 18b676be-64dc-4474-8c01-8841caf7bcaa | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 6c5790dc-0485-48d9-872d-7ba6cd1e65f8 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 7a518bca-a2e3-4b50-a007-b2a8913a5ce4 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 ab7f18d8-2361-4fa7-a508-3b4d7a6ca746 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          |
 9e6336ca-9f11-408b-a4ed-78bfbb9713a9 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 0b2cc85a-87eb-42c8-9333-01471e9761bc | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 9d85bfea-0bcc-4725-bafe-28a9005168d6 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 7745ef9d-1369-446d-88d4-a1715c5cfc4e | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 e4d65f52-8750-42e6-a507-29a5dd031b63 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 0f5cacb2-c51d-4f5e-9fdf-194106731fca | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 7cd86e6a-5074-4aad-a99c-f797e08bff4e | b8fa24f6-d38c-4da6-ad67-d8d33a4667b1 |          | ad6f4188-c0cc-459a-b4b2-634e7df7a90a
 8a061e09-0988-42c9-9eec-b71d1c7387a5 | b8fa24f6-d38c-4da6-ad67-d8d33a4667b1 |          | ad6f4188-c0cc-459a-b4b2-634e7df7a90a
 cc3a3839-abdd-4ea6-b95d-cb69a7a8a773 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 071ef687-2697-4fab-8a08-f53f3c447916 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 2b6d2b39-6387-42d0-b385-874b6f045758 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 05e3f212-c8da-4341-82bc-72e919c341a4 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 484ffed9-f14e-4f52-afa2-e23fe4e836c1 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 220cd429-9654-4bd6-a18c-b6bb587643c7 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 a1d21a87-f98b-48d5-9848-962d31fa4381 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 2f6e7b37-909a-47aa-bb22-0b46e2e6e53a | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 26e4f4fa-4f2f-476c-8465-39efd53d3d25 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 de6d5f53-53b9-4f6b-9834-62afb7b36b28 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 82aba806-f776-4a2c-a081-c02c295bf2d1 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 851769ff-ca7e-4f15-87b9-918c82683124 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 094eab81-91a0-4d61-b6bb-e6cf97c67825 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 80680044-6cd5-4859-99df-a38c59a4ea7b | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 5b81a3ee-c2d1-42d3-b04d-78f615274246 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 76ae0b32-9f03-406c-9130-37ff389d52b3 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
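
Since the UUID appears in neither table, the next step is to search the snapshot metadata itself. A sketch, again assuming the memory_volume column from the 3.6-era schema:

-- Sketch: find snapshots whose memory volume starts with the orphaned UUID.
select snapshot_id, vm_id, description from snapshots
 where memory_volume like 'a96b9a1a-4dce-4de5-b70b-57111027ee84%';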

Comment 4 Nathan Hill 2016-06-26 16:30:22 UTC
Figured it out. Call off the hounds.

It was caused by old snapshots of disks, not visible in the GUI, from a previous storage domain that was deleted when migrating from GlusterFS 2x4 replication to 3x6. The disks were attached to active VMs and were migrated using oVirt 3.5.

Using pg_dump and grep revealed the offending entries: 0349c76b-2385-4e5d-ba04-21f35f4e68e7 and efd689a7-9e3e-45a8-a174-c51c706736ba.

engine_20140704223058=# select snapshot_id,vm_id,description from snapshots;
             snapshot_id              |                vm_id                 |                description
--------------------------------------+--------------------------------------+-------------------------------------------
 1dda29f4-172f-4806-93e2-7ae4a97037af | 452b30ef-e677-4904-b944-b841510afc0c | Active VM
 7956ae7e-52f3-425d-b4ac-6aac005cda37 | 4eee549b-d189-40dc-8248-9ba70abe3cd5 | Active VM
 057906c6-f37e-4438-8ee2-0397d81f39f1 | 24bb86bf-a71a-4f19-8360-f278374830a6 | Active VM
 f472e9fa-5a5a-4a48-b4ab-027a244d0ea3 | 3edf4d7f-8264-4d2f-ac72-2fe7bbb169db | Active VM
 5057df6e-6039-4da4-b756-df20219e9600 | d237f4d8-3acd-46b6-889a-6184ec012e48 | Active VM
 4c1883ee-6a3b-471d-9dbd-b4969ede9ff5 | 6539942b-132d-4d26-bbeb-d4bbbe7dbfe6 | Active VM
 2546787c-1ff7-4acf-b87d-e27a5fe3155d | 7fccd835-1555-4c3c-9ca9-140f2eeb9d28 | Active VM
 945cb4de-fc4a-4dcd-abfd-35d3127a2a78 | a2948b53-9b3a-470e-badc-b15365da96cd | Active VM
 30b8e7ce-ebde-438c-9eb6-5628ac1af99c | a8ae58a9-fa25-4965-8393-2e5efed1e120 | Active VM
 38845e17-36e7-4c4d-9149-8ef9c269625d | b8ce9267-b2de-4610-8984-fc3ba0506b9d | Active VM
 014ff326-5a13-4eeb-8412-42fe22380615 | b8ce9267-b2de-4610-8984-fc3ba0506b9d | PreExpansion
 8a7001d0-75ba-4172-aa6d-fa1fc64c88cd | b8ce9267-b2de-4610-8984-fc3ba0506b9d | Auto-generated for Live Storage Migration
 cabaf979-bdbc-4d5f-af07-84f5ba737ff8 | d4349bdc-ebda-4eb3-9b5c-d99f002ab72b | Active VM
 b7d7882d-c4e3-4c09-8bf3-45a1ff52c051 | 63a50601-40f8-48bb-983a-e57a16018cba | Active VM
 0f3082df-cc78-499e-aa1b-8da7681dcb89 | c6bc5734-a57b-401d-b902-a9364c51242c | Active VM
 ebe3f4a4-543e-4780-bc16-d156c0667f2c | d314b0d7-5831-4194-b5de-b838bfcae644 | Active VM
 6204a972-9960-43d5-9657-bde9c836df42 | dcb016c9-b3df-4b0c-bb30-7bd4187c7c98 | Active VM
 f72ecfbb-92ab-47f4-967e-ec5a49a94d99 | b7a23821-f8e5-46fc-944d-877164fec75b | Active VM
 678a8d90-f638-41ab-8ab4-0871730b915f | b0d19e70-83d4-4468-8bdf-0b622fa462ce | Active VM
 0349c76b-2385-4e5d-ba04-21f35f4e68e7 | e4597cce-99b2-4731-94be-3e0227515c77 | Pre-Upgrade
 b23b09bd-bf66-44ab-8e9f-c6dd4333d585 | e4597cce-99b2-4731-94be-3e0227515c77 | Pre-upgrade no memory
 efd689a7-9e3e-45a8-a174-c51c706736ba | e4597cce-99b2-4731-94be-3e0227515c77 | Preupgrade post tool
 f2085569-2797-4767-8194-c5d94fd1bacf | e4597cce-99b2-4731-94be-3e0227515c77 | Active VM
 3b738e5a-3efe-4200-afa9-4e7d494965ea | e4597cce-99b2-4731-94be-3e0227515c77 | Complete-Hardened
 33f37529-3ea9-4c66-8da4-6fd28c205315 | 18c5493d-c615-4134-a362-89f51aaa8c0e | Active VM
 fff9dc13-e8ba-4d52-bd6b-10ddb046db20 | d798cd2e-b645-4016-8b2a-8273ff3115e8 | Active VM
 c9874262-e93d-4b4f-9d30-d5012c4bd933 | d798cd2e-b645-4016-8b2a-8273ff3115e8 | Auto-generated for Live Storage Migration
 4454c315-7d28-4d0e-8166-0bc4d3f5c8a3 | 707bb80d-5b7a-4e5d-8e9a-9ecca658414c | Active VM
 6412e0b3-c7cc-49c1-9a14-fda4612cc32f | 81519461-d3ca-4803-b980-7d36f8ccf86c | Active VM
 93fecad9-a4da-4b5c-8403-a10b3845c105 | f3d4daa3-92b1-46aa-9aab-dc1f948795c5 | Active VM
 e0d2ac4e-3481-46e7-9b8a-0e0ab37299c8 | c6bc5734-a57b-401d-b902-a9364c51242c | Pre-Midterm

I then used DELETE FROM to manually remove them, as they were not present in the GUI.
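
The removal amounted to something like the following; a sketch, since the exact statement was not preserved, and anyone repeating it should back up the database first:

-- Remove the two orphaned snapshot rows identified above.
delete from snapshots
 where snapshot_id in ('0349c76b-2385-4e5d-ba04-21f35f4e68e7',
                       'efd689a7-9e3e-45a8-a174-c51c706736ba');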

I then reattempted the upgrade, which succeeded.

In theory, the old snapshots likely should have had their storage domain UUID updated upon migration to prevent this, but it wasn't an issue until 4.0.

Hope this helps someone.

Comment 5 Yedidyah Bar David 2016-06-27 06:33:03 UTC
(In reply to Nathan Hill from comment #4)
> Figured it out. Call off the hounds.
> 
> It was caused by old snapshot for disks that were not present in the GUI
> from a previous storage domain that was deleted from migrating from
> GlusterFS 2x4 replication to 3x6. The disks were attached to active VM's and
> were migrated using Ovirt 3.5.

Not sure I completely follow:
1. You had some storage domain
2. You had some disks on it, and some of them had snapshots
3. You migrated the VMs and their disks to another storage domain
4. You removed the old storage domain
5. And yet, some snapshots were not removed?

I realize this was quite some time ago, but is there any chance you can find logs and/or recall how this happened? Did it require some manual changes in the DB? If you did everything from the GUI and it all succeeded, it definitely sounds like a bug to me.

Thanks for the report and the accurate and detailed analysis!

Comment 6 Nathan Hill 2016-06-27 12:11:55 UTC
I'll give some more detail. 

In 3.6.0, oVirt started requiring replication-3 for GlusterFS storage. I had an existing replication-2 volume with the UUID a96b9a1a-4dce-4de5-b70b-57111027ee84.

I created a new replication-3 volume and migrated the VMs using the "Move Disk" utility in the oVirt web interface; some of the machines had snapshots, many of which have since been deleted, merged, etc.

After it was confirmed stable, I deleted the old storage domain and removed the gluster volume.

Over the course of time, upgrades from 3.6.0 to 3.6.6-2 had no engine schema refresh problems. The 4.0 update revealed the error mentioned in comment 1.

My inspection of the database revealed two snapshots, with UUIDs 0349c76b-2385-4e5d-ba04-21f35f4e68e7 and efd689a7-9e3e-45a8-a174-c51c706736ba, that still referenced the old storage domain a96b9a1a-4dce-4de5-b70b-57111027ee84. I have attached those entries from a dump. They are very old (2014-11-17) and were not present in the GUI: under Storage -> Storage Domains -> Snapshots they were not listed. Using psql, however, I found them still in the database, as shown in comment 4.

They required manual intervention to remove, i.e. the DELETE FROM psql commands. The UUID was also still present in GlusterFS storage, but I decided to leave that alone as it wasn't taking much space.

I agree that it sounds like a bug; I'm just not sure whether it's in the convert_memory_snapshots_to_disks script (which perhaps should ignore the source storage domain), or in the "Move Disk" utility not restamping UUIDs upon transfer. A verification query is sketched below.
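
For completeness, a query like the following shows whether the two rows still reference the deleted domain; again a sketch that assumes the memory_volume column from the 3.6-era schema:

-- Sketch: inspect the memory volume metadata of the two orphaned snapshots.
select snapshot_id, memory_volume from snapshots
 where snapshot_id in ('0349c76b-2385-4e5d-ba04-21f35f4e68e7',
                       'efd689a7-9e3e-45a8-a174-c51c706736ba');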

Comment 7 Nathan Hill 2016-06-27 12:12:21 UTC
Created attachment 1172866 [details]
snapshots found in DB

Comment 8 Yedidyah Bar David 2016-06-27 12:43:45 UTC
(In reply to Nathan Hill from comment #6)
> I agree on the part that it sounds like a bug, I'm just not sure it's the
> convert_memory_snapshots_to_disks script

IMO the bug is not there, if any.

> or if that script should ignore the
> source storage domain, or if the issue is in the "Move Disk" utility not
> restamping UUID's upon transfer.

There. Or more generally, that you have snapshots in deleted storage domains.

I see in the attachment both creation_date 2014-11-17 and _create_date 2015-11-18; the latter might be when you moved them, which would be quite recent.
Any chance you still have engine logs from that time frame?

Comment 9 Michal Skrivanek 2016-07-01 06:37:10 UTC
The problem seems to be related to the "Move Disk" functionality.

Comment 10 Red Hat Bugzilla Rules Engine 2016-07-04 09:31:30 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 11 Nathan Hill 2016-07-04 13:50:03 UTC
Hey everyone,

I see this is changing hands. I checked my backups thoroughly for an engine.log covering that timeframe; unfortunately, they've been rolled up since then, so no luck.

One could set up a test environment on the 3.5 release with a gluster replica-2 volume and then upgrade to 3.6 to try to recreate the problem in the first place; however, I do not have the resources to attempt this myself.

-Nathan

Comment 12 Allon Mureinik 2016-07-25 12:18:20 UTC

*** This bug has been marked as a duplicate of bug 1353219 ***

