Created attachment 1172559 [details]
Full setup log

Description of problem:
Upgrading from 3.6.6.2-1 to 4.0 hits a severe error while updating the database schema, resulting in a rollback and no ability to upgrade.

Version-Release number of selected component (if applicable):
4.0.0.6-1

How reproducible:
Unknown. On this engine host, 100%.

Steps to Reproduce:
1. Follow the upgrade procedure listed at http://www.ovirt.org/release/4.0.0/ (a rough command sketch is at the end of this description).
2. Run engine-setup, selecting a local DWH store, automatic provisioning, and basic sampling scale.

Actual results:
Engine schema refresh fails with:

[ ERROR ] schema.sh: FATAL: Cannot execute sql command: --file=/usr/share/ovirt-engine/dbscripts/upgrade/04_00_0140_convert_memory_snapshots_to_disks.sql
[ ERROR ] Failed to execute stage 'Misc configuration': Engine schema refresh failed

Expected results:
Upgrade should complete.

Additional info:
I have another ovirt-engine installation with exactly the same hardware and patch level that completed the upgrade without issue.

Additional steps tried to resolve:
- Browsing through the script, I gathered that this might have to do with abnormally named snapshots, so I deleted all snapshots and reattempted. No success.
- Deleted all templates and their snapshots; same error.
- Attempted a "hard" upgrade using the migration procedure listed at http://www.ovirt.org/documentation/migration-engine-3.6-to-4.0/ ; same error.

I believe it may have something to do with the fact that my database doesn't have a standard name.
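
For reference, the upgrade sequence from the 4.0 release page is roughly the following (sketch only; the release RPM URL is the one published in the 4.0 release notes):

  # yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release40.rpm
  # yum update "ovirt-*-setup*"
  # engine-setup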
Log area of interest:

Running upgrade sql script '/usr/share/ovirt-engine/dbscripts/upgrade/04_00_0140_convert_memory_snapshots_to_disks.sql'...
2016-06-26 10:19:41 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.db.schema plugin.execute:926 execute-output: ['/usr/share/ovirt-engine/dbscripts/schema.sh', '-s', 'localhost', '-p', '5432', '-u', 'engine_20140704223058', '-d', 'engine_20140704223058', '-l', '/var/log/ovirt-engine/setup/ovirt-engine-setup-20160626101653-bv1zn6.log', '-c', 'apply'] stderr:
psql:/usr/share/ovirt-engine/dbscripts/upgrade/04_00_0140_convert_memory_snapshots_to_disks.sql:93: ERROR: insert or update on table "image_storage_domain_map" violates foreign key constraint "fk_image_storage_domain_map_storage_domain_static"
DETAIL: Key (storage_domain_id)=(a96b9a1a-4dce-4de5-b70b-57111027ee84) is not present in table "storage_domain_static".
FATAL: Cannot execute sql command: --file=/usr/share/ovirt-engine/dbscripts/upgrade/04_00_0140_convert_memory_snapshots_to_disks.sql
2016-06-26 10:19:41 ERROR otopi.plugins.ovirt_engine_setup.ovirt_engine.db.schema schema._misc:313 schema.sh: FATAL: Cannot execute sql command: --file=/usr/share/ovirt-engine/dbscripts/upgrade/04_00_0140_convert_memory_snapshots_to_disks.sql
2016-06-26 10:19:41 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/db/schema.py", line 315, in _misc
    raise RuntimeError(_('Engine schema refresh failed'))
RuntimeError: Engine schema refresh failed
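
The key line is the foreign key violation: the conversion script tries to insert a row into image_storage_domain_map that points at storage domain a96b9a1a-4dce-4de5-b70b-57111027ee84, which no longer exists. A quick sanity check (assuming psql access as the engine DB user):

  select id, storage_name from storage_domain_static
   where id = 'a96b9a1a-4dce-4de5-b70b-57111027ee84';
  -- returns zero rows on this engine, matching the constraint failure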
Also, note that reverting to 3.6.6.2-1, by restoring with engine-backup (--provision-db) and then running engine-setup, is 100% successful; the error does not occur on that version.
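
For reference, the rollback is along these lines (sketch; the backup file name is a placeholder):

  # engine-backup --mode=restore --file=engine-3.6-backup.tar.gz \
        --log=restore.log --provision-db --restore-permissions
  # engine-setup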
More troubleshooting: the erroneous storage domain UUID a96b9a1a-4dce-4de5-b70b-57111027ee84 is not listed anywhere in the two detail tables...

engine_20140704223058=# select id,storage_name,storage_domain_type from storage_domain_static;
                  id                  |      storage_name      | storage_domain_type
--------------------------------------+------------------------+---------------------
 072fbaa1-08f3-4a40-9f34-a5ca22dd1d74 | ovirt-image-repository |                   4
 b8c6296b-51b8-4f53-8a6c-0796cf85a933 | NAS2_Exports           |                   3
 cc4eb137-8abb-4924-ba52-dd00b9c9ae7b | NAS2_ISO               |                   2
 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 | TriDistRepl            |                   0
 b8fa24f6-d38c-4da6-ad67-d8d33a4667b1 | NAS2_Data              |                   1

engine_20140704223058=# select * from image_storage_domain_map;
               image_id               |          storage_domain_id           | quota_id |           disk_profile_id
--------------------------------------+--------------------------------------+----------+--------------------------------------
 6b67745e-9e39-4a75-b62b-5b6e16509bb6 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 349663c7-4ef9-446d-b3af-1f5db6499fd8 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 0f4ae615-6194-4500-8b36-bf70f037beee | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 73a7e0fb-6366-49b9-b27d-1f948f19ac95 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 18b676be-64dc-4474-8c01-8841caf7bcaa | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 6c5790dc-0485-48d9-872d-7ba6cd1e65f8 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 7a518bca-a2e3-4b50-a007-b2a8913a5ce4 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 ab7f18d8-2361-4fa7-a508-3b4d7a6ca746 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          |
 9e6336ca-9f11-408b-a4ed-78bfbb9713a9 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 0b2cc85a-87eb-42c8-9333-01471e9761bc | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 9d85bfea-0bcc-4725-bafe-28a9005168d6 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 7745ef9d-1369-446d-88d4-a1715c5cfc4e | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 e4d65f52-8750-42e6-a507-29a5dd031b63 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 0f5cacb2-c51d-4f5e-9fdf-194106731fca | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 7cd86e6a-5074-4aad-a99c-f797e08bff4e | b8fa24f6-d38c-4da6-ad67-d8d33a4667b1 |          | ad6f4188-c0cc-459a-b4b2-634e7df7a90a
 8a061e09-0988-42c9-9eec-b71d1c7387a5 | b8fa24f6-d38c-4da6-ad67-d8d33a4667b1 |          | ad6f4188-c0cc-459a-b4b2-634e7df7a90a
 cc3a3839-abdd-4ea6-b95d-cb69a7a8a773 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 071ef687-2697-4fab-8a08-f53f3c447916 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 2b6d2b39-6387-42d0-b385-874b6f045758 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 05e3f212-c8da-4341-82bc-72e919c341a4 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 484ffed9-f14e-4f52-afa2-e23fe4e836c1 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 220cd429-9654-4bd6-a18c-b6bb587643c7 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 a1d21a87-f98b-48d5-9848-962d31fa4381 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 2f6e7b37-909a-47aa-bb22-0b46e2e6e53a | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 26e4f4fa-4f2f-476c-8465-39efd53d3d25 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 de6d5f53-53b9-4f6b-9834-62afb7b36b28 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 82aba806-f776-4a2c-a081-c02c295bf2d1 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 851769ff-ca7e-4f15-87b9-918c82683124 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 094eab81-91a0-4d61-b6bb-e6cf97c67825 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 80680044-6cd5-4859-99df-a38c59a4ea7b | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 5b81a3ee-c2d1-42d3-b04d-78f615274246 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
 76ae0b32-9f03-406c-9130-37ff389d52b3 | 5d8b9ed2-7dae-472f-90f7-904085d4dbf9 |          | 774286ba-f33c-449c-ab59-590706bcfba7
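
Since that UUID shows up in neither table, the remaining references must live elsewhere in the schema. Dumping the database and grepping for the UUID is a blunt but effective way to find them (sketch; connection details as used by schema.sh above):

  $ pg_dump -h localhost -U engine_20140704223058 engine_20140704223058 \
        | grep -n 'a96b9a1a-4dce-4de5-b70b-57111027ee84'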
Figured it out. Call off the hounds.

It was caused by old snapshots of disks that were no longer present in the GUI, left over from a previous storage domain that was deleted when migrating from GlusterFS 2x4 replication to 3x6. The disks were attached to active VMs and were migrated using oVirt 3.5. Using pg_dump and grep revealed the entries: 0349c76b-2385-4e5d-ba04-21f35f4e68e7 & efd689a7-9e3e-45a8-a174-c51c706736ba.

engine_20140704223058=# select snapshot_id,vm_id,description from snapshots;
             snapshot_id              |                vm_id                 |                description
--------------------------------------+--------------------------------------+--------------------------------------------
 1dda29f4-172f-4806-93e2-7ae4a97037af | 452b30ef-e677-4904-b944-b841510afc0c | Active VM
 7956ae7e-52f3-425d-b4ac-6aac005cda37 | 4eee549b-d189-40dc-8248-9ba70abe3cd5 | Active VM
 057906c6-f37e-4438-8ee2-0397d81f39f1 | 24bb86bf-a71a-4f19-8360-f278374830a6 | Active VM
 f472e9fa-5a5a-4a48-b4ab-027a244d0ea3 | 3edf4d7f-8264-4d2f-ac72-2fe7bbb169db | Active VM
 5057df6e-6039-4da4-b756-df20219e9600 | d237f4d8-3acd-46b6-889a-6184ec012e48 | Active VM
 4c1883ee-6a3b-471d-9dbd-b4969ede9ff5 | 6539942b-132d-4d26-bbeb-d4bbbe7dbfe6 | Active VM
 2546787c-1ff7-4acf-b87d-e27a5fe3155d | 7fccd835-1555-4c3c-9ca9-140f2eeb9d28 | Active VM
 945cb4de-fc4a-4dcd-abfd-35d3127a2a78 | a2948b53-9b3a-470e-badc-b15365da96cd | Active VM
 30b8e7ce-ebde-438c-9eb6-5628ac1af99c | a8ae58a9-fa25-4965-8393-2e5efed1e120 | Active VM
 38845e17-36e7-4c4d-9149-8ef9c269625d | b8ce9267-b2de-4610-8984-fc3ba0506b9d | Active VM
 014ff326-5a13-4eeb-8412-42fe22380615 | b8ce9267-b2de-4610-8984-fc3ba0506b9d | PreExpansion
 8a7001d0-75ba-4172-aa6d-fa1fc64c88cd | b8ce9267-b2de-4610-8984-fc3ba0506b9d | Auto-generated for Live Storage Migration
 cabaf979-bdbc-4d5f-af07-84f5ba737ff8 | d4349bdc-ebda-4eb3-9b5c-d99f002ab72b | Active VM
 b7d7882d-c4e3-4c09-8bf3-45a1ff52c051 | 63a50601-40f8-48bb-983a-e57a16018cba | Active VM
 0f3082df-cc78-499e-aa1b-8da7681dcb89 | c6bc5734-a57b-401d-b902-a9364c51242c | Active VM
 ebe3f4a4-543e-4780-bc16-d156c0667f2c | d314b0d7-5831-4194-b5de-b838bfcae644 | Active VM
 6204a972-9960-43d5-9657-bde9c836df42 | dcb016c9-b3df-4b0c-bb30-7bd4187c7c98 | Active VM
 f72ecfbb-92ab-47f4-967e-ec5a49a94d99 | b7a23821-f8e5-46fc-944d-877164fec75b | Active VM
 678a8d90-f638-41ab-8ab4-0871730b915f | b0d19e70-83d4-4468-8bdf-0b622fa462ce | Active VM
 0349c76b-2385-4e5d-ba04-21f35f4e68e7 | e4597cce-99b2-4731-94be-3e0227515c77 | Pre-Upgrade
 b23b09bd-bf66-44ab-8e9f-c6dd4333d585 | e4597cce-99b2-4731-94be-3e0227515c77 | Pre-upgrade no memory
 efd689a7-9e3e-45a8-a174-c51c706736ba | e4597cce-99b2-4731-94be-3e0227515c77 | Preupgrade post tool
 f2085569-2797-4767-8194-c5d94fd1bacf | e4597cce-99b2-4731-94be-3e0227515c77 | Active VM
 3b738e5a-3efe-4200-afa9-4e7d494965ea | e4597cce-99b2-4731-94be-3e0227515c77 | Complete-Hardened
 33f37529-3ea9-4c66-8da4-6fd28c205315 | 18c5493d-c615-4134-a362-89f51aaa8c0e | Active VM
 fff9dc13-e8ba-4d52-bd6b-10ddb046db20 | d798cd2e-b645-4016-8b2a-8273ff3115e8 | Active VM
 c9874262-e93d-4b4f-9d30-d5012c4bd933 | d798cd2e-b645-4016-8b2a-8273ff3115e8 | Auto-generated for Live Storage Migration
 4454c315-7d28-4d0e-8166-0bc4d3f5c8a3 | 707bb80d-5b7a-4e5d-8e9a-9ecca658414c | Active VM
 6412e0b3-c7cc-49c1-9a14-fda4612cc32f | 81519461-d3ca-4803-b980-7d36f8ccf86c | Active VM
 93fecad9-a4da-4b5c-8403-a10b3845c105 | f3d4daa3-92b1-46aa-9aab-dc1f948795c5 | Active VM
 e0d2ac4e-3481-46e7-9b8a-0e0ab37299c8 | c6bc5734-a57b-401d-b902-a9364c51242c | Pre-Midterm

Then I used DELETE FROM commands to manually remove the two stale snapshots, as they were not present in the GUI, and reattempted the install with success. In theory, the old snapshots should probably have had their storage domain UUID updated upon migration to prevent this, but it hasn't been an issue until 4.0. Hope this helps someone.
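
For anyone hitting the same thing, the cleanup was essentially the following (the IDs are the two stale snapshots above; take a database backup first and double-check the IDs against your own environment):

  begin;
  delete from snapshots
   where snapshot_id in ('0349c76b-2385-4e5d-ba04-21f35f4e68e7',
                         'efd689a7-9e3e-45a8-a174-c51c706736ba');
  -- if foreign keys block the delete, clean up the referencing rows first
  commit;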
(In reply to Nathan Hill from comment #4)
> Figured it out. Call off the hounds.
>
> It was caused by old snapshots of disks that were no longer present in the
> GUI, left over from a previous storage domain that was deleted when
> migrating from GlusterFS 2x4 replication to 3x6. The disks were attached to
> active VMs and were migrated using oVirt 3.5.

Not sure I completely follow:

1. You had some storage domain
2. You had some disks on it, and some of them had snapshots
3. You migrated the VMs and their disks to another storage domain
4. You removed the old storage domain
5. And yet, some snapshots were not removed?

I realize this was quite some time ago, but any chance to find logs and/or recall how this happened? Did it require some manual changes in the db? If you only did everything from the gui, and all succeeded, it definitely sounds like a bug to me.

Thanks for the report and the accurate and detailed analysis!
I'll give some more detail.

In 3.6.0, oVirt started requiring replica 3 for GlusterFS storage. I had an existing replica-2 volume with the UUID a96b9a1a-4dce-4de5-b70b-57111027ee84. I created a new replica-3 volume and migrated the VMs using the "Move Disk" utility in the oVirt web interface; some of the machines had snapshots, many of which have since been deleted, merged, etc. After the new volume was confirmed stable, I deleted the old storage domain and removed the gluster volume.

Over the course of time, upgrades from 3.6.0 to 3.6.6-2 had no engine schema refresh problems. The 4.0 update revealed the error mentioned in comment 1. My inspection of the database revealed two snapshots, with UUIDs 0349c76b-2385-4e5d-ba04-21f35f4e68e7 & efd689a7-9e3e-45a8-a174-c51c706736ba, that still referenced the old storage domain a96b9a1a-4dce-4de5-b70b-57111027ee84. I have attached those entries from a dump. They are very old (2014-11-17) and were not present in the GUI: if you went to Storage -> Storage Domains -> Snapshots, they were not listed; using psql, however, I was able to find them still in the database, as shown in comment 4. They required manual intervention to remove, i.e. the DELETE FROM commands in psql. The old UUID was also still present on the GlusterFS storage, but I decided to leave that alone as it wasn't taking up much space.

I agree that it sounds like a bug; I'm just not sure whether it's the convert_memory_snapshots_to_disks script, whether that script should ignore the source storage domain, or whether the issue is in the "Move Disk" utility not restamping UUIDs upon transfer.
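
If it is to be handled in the upgrade script, a pre-check along these lines might at least surface the offending rows before engine-setup trips over them. This is a sketch only; it assumes the 3.6 schema still stores the memory spec as a comma-separated string in snapshots.memory_volume, with the storage domain ID as the first field:

  select s.snapshot_id, s.description, s.memory_volume
    from snapshots s
   where coalesce(s.memory_volume, '') <> ''
     and not exists (
           select 1
             from storage_domain_static sds
            where sds.id = cast(split_part(s.memory_volume, ',', 1) as uuid)
         );
  -- any rows returned reference a storage domain that no longer exists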
Created attachment 1172866 [details]
snapshots found in DB
(In reply to Nathan Hill from comment #6)
> I agree that it sounds like a bug; I'm just not sure whether it's the
> convert_memory_snapshots_to_disks script

IMO the bug is not there, if any.

> whether that script should ignore the source storage domain, or whether the
> issue is in the "Move Disk" utility not restamping UUIDs upon transfer.

There. Or, more generally, that you have snapshots in deleted storage domains.

I see in the attachment both creation_date 2014-11-17 and also _create_date 2015-11-18, which might have been when you moved them, thus quite recently. Any chance you still have engine logs from that time frame?
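
(For reference, if the rows are still available in a restored copy of the pre-upgrade backup, the dates can be compared directly; this assumes the usual creation_date/_create_date/_update_date columns on the snapshots table:)

  select snapshot_id, description, creation_date, _create_date, _update_date
    from snapshots
   where snapshot_id in ('0349c76b-2385-4e5d-ba04-21f35f4e68e7',
                         'efd689a7-9e3e-45a8-a174-c51c706736ba');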
The problem seems to be related to the "Move Disk" functionality.
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.
Hey everyone,

I see this is changing hands. I checked my backups thoroughly to give the best shot at an engine.log covering that timeframe; unfortunately they have been rolled up since then, so no luck.

To recreate the problem from scratch, one would have to set up a test environment on the 3.5 release with a gluster replica-2 volume and then upgrade to 3.6; I don't have the resources to attempt that myself, however.

-Nathan
*** This bug has been marked as a duplicate of bug 1353219 ***