Bug 1399191 - VM creation from template on ceph/cinder backed storage fails.
Summary: VM creation from template on ceph/cinder backed storage fails.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 4.0.5.5
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ovirt-4.2.0
: 4.2.0.2
Assignee: Daniel Erez
QA Contact: Avihai
URL:
Whiteboard:
: 1421293 (view as bug list)
Depends On: 1399540
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-11-28 14:16 UTC by Jordan Conway
Modified: 2018-02-22 09:58 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
undefined
Clone Of:
: 1399540 (view as bug list)
Environment:
Last Closed: 2018-02-22 09:58:42 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.2+


Attachments (Terms of Use)
enginelog (113.21 KB, text/plain)
2016-11-28 17:14 UTC, Jordan Conway
no flags Details
original error engine log (107.00 KB, text/plain)
2016-11-28 17:32 UTC, Jordan Conway
no flags Details
lines previous to provided logs in ovirt-engine.log.origin (29.90 KB, text/plain)
2016-11-28 17:48 UTC, Jordan Conway
no flags Details
engine (136.90 KB, text/plain)
2016-11-28 18:23 UTC, Jordan Conway
no flags Details

Description Jordan Conway 2016-11-28 14:16:22 UTC
I've got a problem where since upgrading to 4.0.5 I am still unable to create a vm from a template.
The issue now is that it seems to be stuck in a loop trying to and failing to create the vm, which is making my logs explode with the following (which is still going, I've tried taskcleaner.sh and deleting from the db and it keeps filling up)


2016-11-24 14:35:55,466 ERROR [org.ovirt.engine.core.bll.storage.disk.cinder.RemoveCinderDiskCommand] (DefaultQuartzScheduler1) [216b9d17] Ending command 'org.ovirt.engine.core.bll.storage.disk.cinder.RemoveCinderDiskCommand' with failure.
2016-11-24 14:35:55,466 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler1) [216b9d17] transaction rolled back
2016-11-24 14:35:55,468 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler1) [216b9d17] Failed invoking callback end method 'onFailed' for command '9b6f4dc2-3331-4838-bad6-d607b4123b0d' with exception 'Index: 0, Size: 0', the callback is marked for end method retries
2016-11-24 14:35:55,473 ERROR [org.ovirt.engine.core.bll.storage.disk.cinder.RemoveCinderDiskCommand] (DefaultQuartzScheduler1) [1c709434] Ending command 'org.ovirt.engine.core.bll.storage.disk.cinder.RemoveCinderDiskCommand' with failure.
2016-11-24 14:35:55,473 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler1) [1c709434] transaction rolled back
2016-11-24 14:35:55,475 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler1) [1c709434] Failed invoking callback end method 'onFailed' for command '145b5c75-a9e1-47a6-94c4-1f6fa81f4bb3' with exception 'Index: 0, Size: 0', the callback is marked for end method retries

And in the engine UI "Failed to complete VM fromtemplate-1 creation." thousands of times.

The issue started I believe with this sql error

ERROR:  duplicate key value violates unique constraint "uq_command_id_entity_id"
DETAIL:  Key (command_id, entity_id, entity_type, snapshot_type)=(43466eb8-cb63-4b3d-94be-a674a9946771, 6e8bc30f-7742-46c3-9495-7def0c47ad05, org.ovirt.engine.core.common.businessentities.VmDynamic, 2) already exists.
CONTEXT:  SQL statement "INSERT INTO business_entity_snapshot (
                id,
                command_id,
                command_type,
                entity_id,
                entity_type,
                entity_snapshot,
                snapshot_class,
                snapshot_type,
                insertion_order
                )
            VALUES (
                v_id,
                v_command_id,
                v_command_type,
                v_entity_id,
                v_entity_type,
                v_entity_snapshot,
                v_snapshot_class,
                v_snapshot_type,
                v_insertion_order
                )"
    PL/pgSQL function insert_entity_snapshot(uuid,uuid,character varying,character varying,character varying,text,character varying,integer,integer) line 4 at SQL statement
STATEMENT:  select * from insert_entity_snapshot($1, $2, $3, $4, $5, $6, $7, $8, $9) as result


Any insight on how I can stop the persistent error would be appreciated, as it is also preventing me from running engine-setup due to running jobs.

Comment 1 Liron Aravot 2016-11-28 16:59:59 UTC
Hi Jordan,
can you please attach the engine.log file? we need a bit more of logs to determine the issue.

thanks.

Comment 2 Jordan Conway 2016-11-28 17:14:24 UTC
Created attachment 1225395 [details]
enginelog

Here's the last 500 lines of my engine log, I've got several gigs just like this.
I am unable to get meaningful information out of it due to this.

Comment 3 Liron Aravot 2016-11-28 17:24:43 UTC
Thanks,
I'd appreciate if you could add more logs from the time the issue has started (the sql exception you quoted).

In order to stop the persistent error you can clear the command_entities table.
I'd suggest to do it only after you've made a db backup when there are no running operations and the engine is down.

thanks,
Liron

Comment 4 Jordan Conway 2016-11-28 17:32:21 UTC
Created attachment 1225398 [details]
original error engine log

Found the original bit where the error was along with the next 5 minutes or so of logs. I hope this is more helpful.

Comment 5 Liron Aravot 2016-11-28 17:39:07 UTC
Thanks Jordan.
I'd appreciate if you could upload the logs starting at 2016-11-23 18:30 - the beginning of the execution is missing.

thanks,
Liron

Comment 6 Jordan Conway 2016-11-28 17:48:16 UTC
Created attachment 1225414 [details]
lines previous to provided logs in ovirt-engine.log.origin

Hi Liron,
This is what precedes those lines.

Thank you for the tip, I had previously tried clearing command_entities but with ovirt-engine running it kept repopulating, this did the trick!

Thanks,
Jordan

Comment 7 Jordan Conway 2016-11-28 17:50:52 UTC
Now that I've got quiet logs again, I can restart the create template, create vm from template process to get a fresh look at it to see how reproducible it is if you would like.

Comment 8 Liron Aravot 2016-11-28 18:06:51 UTC
Thanks Jordan,
I really appreciate it, but right now it's not needed - will update you if it will be.
The log in comment #6 is between 18:30 and 18:32, While the execution started in around 18:37 - so we have a gap.
I will be helpful if you could paste the entire log between 2016-11-23 18:30 and 2016-11-23 18:45.

thanks,
Liron.

Comment 9 Jordan Conway 2016-11-28 18:23:22 UTC
Created attachment 1225451 [details]
engine

There is nothing before the 18:31 entry until unrelated things at 18:12, attached is everything between 18:30 and 18:45

Comment 10 Liron Aravot 2016-11-29 09:22:37 UTC
Thanks Jordan,
as it seems we have few issues here:

1. flow related issue - the context passed to the CloneSingleCinderDisk commands - the compensation context has the parent commmand id leading to the FK violation. The command should be executed with the compensation context of its own.

2. infrastructural issue - the AddVmFromTemplate command isn't marked as one requires compensation context, yet one is still created (that's handled by the infrastructure).

3. AddVmFromTemplate should be using a compensation context (so it should be marked as command using it).

I'm cloning this bug to a new bug to tackle to infrastructural issues, this flow should tackle the flow related issues.

Comment 11 Tal Nisan 2016-11-29 10:13:34 UTC
Thanks Liron, Daniel please take it from here

Comment 13 Sandro Bonazzola 2017-01-25 07:57:22 UTC
4.0.6 has been the last oVirt 4.0 release, please re-target this bug.

Comment 14 Liron Aravot 2017-02-16 15:36:33 UTC
*** Bug 1421293 has been marked as a duplicate of this bug. ***

Comment 15 Daniel Erez 2017-06-04 14:45:34 UTC
VM creation from template with Cinder disks works as expected on latest build (due to recent infrastructural CoCo changes). Moving to MODIFIED for verification.

Comment 16 Avihai 2018-02-12 13:50:40 UTC
verified on 4.2.1.6-0.1.el7

Comment 17 Sandro Bonazzola 2018-02-22 09:58:42 UTC
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be
resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.