Bug 1399540 - Default compensation context is created even when the command isn't marked as using compensation
Summary: Default compensation context is created even when the command isn't marked as...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 4.0.5.5
Hardware: x86_64
OS: Linux
high
high
Target Milestone: ---
: ---
Assignee: Martin Perina
QA Contact: Pavel Stehlik
URL:
Whiteboard:
Depends On:
Blocks: 1399191
TreeView+ depends on / blocked
 
Reported: 2016-11-29 09:38 UTC by Liron Aravot
Modified: 2022-06-30 08:39 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1399191
Environment:
Last Closed: 2017-02-15 11:59:47 UTC
oVirt Team: Infra
Embargoed:
sbonazzo: ovirt-4.1-


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-46813 0 None None None 2022-06-30 08:39:38 UTC

Description Liron Aravot 2016-11-29 09:38:12 UTC
CommandBase.createCompensationContext() will always return the default compensation context when the transaction scope isn't suppress, possibly leading to errors as in the original bug.

+++ This bug was initially created as a clone of Bug #1399191 +++

I've got a problem where since upgrading to 4.0.5 I am still unable to create a vm from a template.
The issue now is that it seems to be stuck in a loop trying to and failing to create the vm, which is making my logs explode with the following (which is still going, I've tried taskcleaner.sh and deleting from the db and it keeps filling up)


2016-11-24 14:35:55,466 ERROR [org.ovirt.engine.core.bll.storage.disk.cinder.RemoveCinderDiskCommand] (DefaultQuartzScheduler1) [216b9d17] Ending command 'org.ovirt.engine.core.bll.storage.disk.cinder.RemoveCinderDiskCommand' with failure.
2016-11-24 14:35:55,466 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler1) [216b9d17] transaction rolled back
2016-11-24 14:35:55,468 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler1) [216b9d17] Failed invoking callback end method 'onFailed' for command '9b6f4dc2-3331-4838-bad6-d607b4123b0d' with exception 'Index: 0, Size: 0', the callback is marked for end method retries
2016-11-24 14:35:55,473 ERROR [org.ovirt.engine.core.bll.storage.disk.cinder.RemoveCinderDiskCommand] (DefaultQuartzScheduler1) [1c709434] Ending command 'org.ovirt.engine.core.bll.storage.disk.cinder.RemoveCinderDiskCommand' with failure.
2016-11-24 14:35:55,473 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler1) [1c709434] transaction rolled back
2016-11-24 14:35:55,475 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler1) [1c709434] Failed invoking callback end method 'onFailed' for command '145b5c75-a9e1-47a6-94c4-1f6fa81f4bb3' with exception 'Index: 0, Size: 0', the callback is marked for end method retries

And in the engine UI "Failed to complete VM fromtemplate-1 creation." thousands of times.

The issue started I believe with this sql error

ERROR:  duplicate key value violates unique constraint "uq_command_id_entity_id"
DETAIL:  Key (command_id, entity_id, entity_type, snapshot_type)=(43466eb8-cb63-4b3d-94be-a674a9946771, 6e8bc30f-7742-46c3-9495-7def0c47ad05, org.ovirt.engine.core.common.businessentities.VmDynamic, 2) already exists.
CONTEXT:  SQL statement "INSERT INTO business_entity_snapshot (
                id,
                command_id,
                command_type,
                entity_id,
                entity_type,
                entity_snapshot,
                snapshot_class,
                snapshot_type,
                insertion_order
                )
            VALUES (
                v_id,
                v_command_id,
                v_command_type,
                v_entity_id,
                v_entity_type,
                v_entity_snapshot,
                v_snapshot_class,
                v_snapshot_type,
                v_insertion_order
                )"
    PL/pgSQL function insert_entity_snapshot(uuid,uuid,character varying,character varying,character varying,text,character varying,integer,integer) line 4 at SQL statement
STATEMENT:  select * from insert_entity_snapshot($1, $2, $3, $4, $5, $6, $7, $8, $9) as result


Any insight on how I can stop the persistent error would be appreciated, as it is also preventing me from running engine-setup due to running jobs.

--- Additional comment from Liron Aravot on 2016-11-28 11:59:59 EST ---

Hi Jordan,
can you please attach the engine.log file? we need a bit more of logs to determine the issue.

thanks.

--- Additional comment from Jordan Conway on 2016-11-28 12:14 EST ---

Here's the last 500 lines of my engine log, I've got several gigs just like this.
I am unable to get meaningful information out of it due to this.

--- Additional comment from Liron Aravot on 2016-11-28 12:24:43 EST ---

Thanks,
I'd appreciate if you could add more logs from the time the issue has started (the sql exception you quoted).

In order to stop the persistent error you can clear the command_entities table.
I'd suggest to do it only after you've made a db backup when there are no running operations and the engine is down.

thanks,
Liron

--- Additional comment from Jordan Conway on 2016-11-28 12:32 EST ---

Found the original bit where the error was along with the next 5 minutes or so of logs. I hope this is more helpful.

--- Additional comment from Liron Aravot on 2016-11-28 12:39:07 EST ---

Thanks Jordan.
I'd appreciate if you could upload the logs starting at 2016-11-23 18:30 - the beginning of the execution is missing.

thanks,
Liron

--- Additional comment from Jordan Conway on 2016-11-28 12:48 EST ---

Hi Liron,
This is what precedes those lines.

Thank you for the tip, I had previously tried clearing command_entities but with ovirt-engine running it kept repopulating, this did the trick!

Thanks,
Jordan

--- Additional comment from Jordan Conway on 2016-11-28 12:50:52 EST ---

Now that I've got quiet logs again, I can restart the create template, create vm from template process to get a fresh look at it to see how reproducible it is if you would like.

--- Additional comment from Liron Aravot on 2016-11-28 13:06:51 EST ---

Thanks Jordan,
I really appreciate it, but right now it's not needed - will update you if it will be.
The log in comment #6 is between 18:30 and 18:32, While the execution started in around 18:37 - so we have a gap.
I will be helpful if you could paste the entire log between 2016-11-23 18:30 and 2016-11-23 18:45.

thanks,
Liron.

--- Additional comment from Jordan Conway on 2016-11-28 13:23 EST ---

There is nothing before the 18:31 entry until unrelated things at 18:12, attached is everything between 18:30 and 18:45

--- Additional comment from Liron Aravot on 2016-11-29 04:22:37 EST ---

Thanks Jordan,
as it seems we have few issues here:

1. flow related issue - the context passed to the CloneSingleCinderDisk commands - the compensation context has the parent commmand id leading to the FK violation. The command should be executed with the compensation context of its own.

2. infrastructural issue - the AddVmFromTemplate command isn't marked as one requires compensation context, yet one is still created (that's handled by the infrastructure).

3. AddVmFromTemplate should be using a compensation context (so it should be marked as command using it).

I'm cloning this bug to a new bug to tackle to infrastructural issues, this flow should tackle the flow related issues.

Comment 1 Martin Perina 2016-12-22 20:54:30 UTC
> 2. infrastructural issue - the AddVmFromTemplate command isn't marked as one
> requires compensation context, yet one is still created (that's handled by
> the infrastructure).
> 
> 3. AddVmFromTemplate should be using a compensation context (so it should be
> marked as command using it).
> 
> I'm cloning this bug to a new bug to tackle to infrastructural issues, this
> flow should tackle the flow related issues.

AddVmFromTemplate is not using compensation although it's calling parent AddVmCommand, which is marked as @NonTransactiveCommandAttribute and which is using compensations properly.
Arik, could you please take a look?

Comment 2 Arik 2017-01-24 10:20:58 UTC
(In reply to Martin Perina from comment #1)
> AddVmFromTemplate is not using compensation although it's calling parent
> AddVmCommand, which is marked as @NonTransactiveCommandAttribute and which
> is using compensations properly.
> Arik, could you please take a look?

Indeed, seems like a bug.

Comment 3 Liron Aravot 2017-01-24 10:37:02 UTC
Hi Martin/Arik - I've left the original bug to tackle the flow issue.
This one is about the infrastructural issue - the current code of CommandBase.createCompensationContext() is attached ([1]). You can see that if the transaction scope isn't Suppress the default compensation context is always created even if the command wasn't marked to use compensation (in that case the NoOpCompensationContext should be returned).

[1]
    private CompensationContext createCompensationContext(TransactionScopeOption transactionScopeOption,
            boolean forceCompensation) {
        if (transactionScopeOption == TransactionScopeOption.Suppress && !forceCompensation) {
            return NoOpCompensationContext.getInstance();
        }

        return createDefaultCompensationContext();
    }

Comment 4 Martin Perina 2017-01-24 11:15:02 UTC
Liron, createCompensationContext() is called from handleTransactivity() and if command is marked with @NonTransactiveCommandAttribute then we always set TranscationScope to Suppress:

    private void handleTransactivity() {
        scope =
                (getParameters() != null) ? getParameters().getTransactionScopeOption()
                        : TransactionScopeOption.Required;
        endActionScope = scope;
        boolean forceCompensation = getForceCompensation();
        // @NonTransactiveAttribute annotation overrides the scope passed by the
        // command parameters
        if (!getTransactive()) {
            scope = TransactionScopeOption.Suppress;

            // Set the end action scope to suppress only for non-compensating commands, or the end action for commands
            // will run without transaction but compensation is not supported for end action.
            endActionScope = forceCompensation ? endActionScope : scope;
        }

        if (getCompensationContext() == null) {
            context.withCompensationContext(createCompensationContext(scope, forceCompensation));
        }
    }

So the only issue here is, that AddVmFromTemplate have to be marked with @NonTransactiveCommandAttribute, right? Or am I missing something?

Comment 5 Liron Aravot 2017-01-24 11:19:28 UTC
Martin, I'll elaborate-
Let's assume we have a command X which isn't marked with any annotation, in that case DefaultCompensationContext will be created for the command instead of NoOpCompensationContext (as this command isn't marked to be using compensation - there's no annotation).

I've left the original bug (that this one was cloned from) to fix AddVmFromTemplate.

Comment 6 Martin Perina 2017-01-24 15:23:11 UTC
(In reply to Liron Aravot from comment #5)
> Martin, I'll elaborate-
> Let's assume we have a command X which isn't marked with any annotation, in
> that case DefaultCompensationContext will be created for the command instead
> of NoOpCompensationContext (as this command isn't marked to be using
> compensation - there's no annotation).

I've discussed with Moti:

1. If command is annotated @NonTransactiveCommandAttribute(forceCompensation = true) then we create default compensation context and execute compensation flow for it

2. If the command is annotated only with @NonTransactiveCommandAttribute, then we create with NoOpCompensationContext

3. TransactionScopeOption can be also defined in command parameters, so if the command is using TransactionScopeOption.Suppress, we will create it with NoCompensationScope (by default TransactionScopeOption is set to Required)


Liron, what you are saying is true, but hypothetical. Or do you have any valid use case for that?

I'm asking because at the moment we don't have any reliable verification steps or test cases that changing createCompensationContext() will not break anything in current code. So it's IMO it's better to fix problematic commands then doing change to fix hypothetical issue. So suggesting to close this one as WONTFIX.

> 
> I've left the original bug (that this one was cloned from) to fix
> AddVmFromTemplate.

Arik, please fix AddVmFromTemplace using BZ1399191

Comment 7 Moti Asayag 2017-01-24 15:33:35 UTC
(In reply to Liron Aravot from comment #5)
> Martin, I'll elaborate-
> Let's assume we have a command X which isn't marked with any annotation, in
> that case DefaultCompensationContext will be created for the command instead
> of NoOpCompensationContext (as this command isn't marked to be using
> compensation - there's no annotation).
> 
> I've left the original bug (that this one was cloned from) to fix
> AddVmFromTemplate.

That means the only case for no-op compensation context occurs when running command in suppress mode without forcing compensation.

Any other combination, i.e. required or required-new transaction will expect any sort of compensation context, which will be used *only* if the command actually took a snapshot of the compensating entities.

So for commands, it makes less sense not to declare the no-transaction annotation and to expect compensation to exist.

I wouldn't consider this a bug, since if we're dealing with a child-command, and we wish not to invoke any compensation for it, we should set it to no-op when we're invoke that child-command internally (assuming not compensation data was recorded by the command, and the only reason for it is inheriting it from the parent command).

Comment 8 Sandro Bonazzola 2017-01-25 07:56:48 UTC
4.0.6 has been the last oVirt 4.0 release, please re-target this bug.

Comment 9 Liron Aravot 2017-02-12 21:58:37 UTC
> 
> Liron, what you are saying is true, but hypothetical. Or do you have any
> valid use case for that?
> 
> I'm asking because at the moment we don't have any reliable verification
> steps or test cases that changing createCompensationContext() will not break
> anything in current code. So it's IMO it's better to fix problematic
> commands then doing change to fix hypothetical issue. So suggesting to close
> this one as WONTFIX.
> 
> > 
> > I've left the original bug (that this one was cloned from) to fix
> > AddVmFromTemplate.
> 

Thanks Martin,
Yes - there are use cases, for example bz 1418648 is caused by that issue (without getting into whether the command code should be changed).
We are in a complex situation, perhaps we can do the following -
a. add a RequireCompensationContext annotation which will enforce creation of the default compensation context.
b. use that annotation in the current flows affected by this bug.
c. change the implementation of handleTransactivity() to not create the compensation context on those cases.

that way we'll be able to know that new flows aren't relying on that behavior while we'll be able to fix/inspect the old flows one by one and to remove the annotation usage.

I do agree that it may be an overkill - but we do have such cases (intentional or not..) - we need to scope the amount of work required for that.

(In reply to Moti Asayag from comment #7
> 
> That means the only case for no-op compensation context occurs when running
> command in suppress mode without forcing compensation.
> 
> Any other combination, i.e. required or required-new transaction will expect
> any sort of compensation context, which will be used *only* if the command
> actually took a snapshot of the compensating entities.

Indeed, the problem is that many times the code is shared between compensation/non compensating commands- causing to the CompensationContext.snapshot*() calls.

> I wouldn't consider this a bug, since if we're dealing with a child-command,
> and we wish not to invoke any compensation for it, we should set it to no-op
> when we're invoke that child-command internally (assuming not compensation
> data was recorded by the command, and the only reason for it is inheriting
> it from the parent command).

I'm not really in favor of it, that means that every parent will have to remember to set the compensation context when executing a specific child command to not encounter that bug.

Comment 10 Martin Perina 2017-02-15 11:59:47 UTC
After offline discussion we agreed that changing the code as suggested above is too dangerous, because we have no reliable way how to verify that we won't break existing flows. Because the issue caused by current status is corner case and it could be avoided by more careful reviews.

Comment 11 Moti Asayag 2017-02-15 12:05:26 UTC
Due to the delicate nature of the code around the compensation and transaction handling, we (me and Liron) decided to fix bug 1418648 by changing the specific command within it and to leave the compensation framework intact.


Note You need to log in before you can comment on or make changes to this bug.