Bug 1567509 - [Infra] Error writing to data store- node was deleted by another transaction
Summary: [Infra] Error writing to data store- node was deleted by another transaction
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: opendaylight
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: 14.0 (Rocky)
Assignee: Michael Vorburger
QA Contact: Noam Manos
URL:
Whiteboard: Infra
: 1567508 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-04-14 14:29 UTC by Sai Sindhur Malleni
Modified: 2019-03-06 16:17 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-06 16:15:14 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
OpenDaylight Bug NETVIRT-1215 None None None 2018-04-16 08:52:50 UTC
OpenDaylight gerrit 70968 None None None 2018-04-25 14:04:20 UTC
OpenDaylight gerrit 71216 None None None 2018-04-25 14:04:47 UTC
OpenDaylight gerrit 71399 None None None 2018-04-26 10:03:48 UTC
OpenDaylight gerrit 71400 None None None 2018-04-26 10:03:18 UTC

Description Sai Sindhur Malleni 2018-04-14 14:29:39 UTC
Description of problem:
When running a Browbeat+Rally use case in my downstream scale CI which does:
Create a network
Create a sbunet
Create a router
Attach router to subnet and oublic network
Boot VM with floating IP
Ping VM

with concurrency 8 and times set to 50 we see that some VMs remain unpingable even after 300 seconds and we see a lot of the following exceptions in karaf logs


2018-04-13T23:40:08,068 | ERROR | ForkJoinPool-1-worker-27 | ElanUtils                        | 351 - org.opendaylight.netvirt.elanmanager-impl - 0.6.0.redhat-5 | Error writing to datastore {}
java.util.concurrent.ExecutionException: OptimisticLockFailedException{message=Optimistic lock failed., errorList=[RpcError [message=Optimistic lock failed., severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was deleted by other transaction.]]}
        at org.opendaylight.mdsal.common.api.MappingCheckedFuture.wrapInExecutionException(MappingCheckedFuture.java:66) [296:org.opendaylight.mdsal.common-api:2.4.0.redhat-5]
        at org.opendaylight.mdsal.common.api.MappingCheckedFuture.get(MappingCheckedFuture.java:79) [296:org.opendaylight.mdsal.common-api:2.4.0.redhat-5]
        at org.opendaylight.netvirt.elan.utils.ElanUtils.waitForTransactionToComplete(ElanUtils.java:1381) [351:org.opendaylight.netvirt.elanmanager-impl:0.6.0.redhat-5]
        at org.opendaylight.netvirt.elan.internal.ElanInterfaceManager.removeElanInterface(ElanInterfaceManager.java:275) [351:org.opendaylight.netvirt.elanmanager-impl:0.6.0.redhat-5]
        at org.opendaylight.netvirt.elan.internal.InterfaceRemoveWorkerOnElan.call(InterfaceRemoveWorkerOnElan.java:52) [351:org.opendaylight.netvirt.elanmanager-impl:0.6.0.redhat-5]
        at org.opendaylight.netvirt.elan.internal.InterfaceRemoveWorkerOnElan.call(InterfaceRemoveWorkerOnElan.java:21) [351:org.opendaylight.netvirt.elanmanager-impl:0.6.0.redhat-5]
        at org.opendaylight.infrautils.jobcoordinator.internal.JobCoordinatorImpl$MainTask.runWithUncheckedExceptionLogging(JobCoordinatorImpl.java:416) [281:org.opendaylight.infrautils.jobcoordinator-impl:1.3.0.redhat-5]
        at org.opendaylight.infrautils.utils.concurrent.LoggingUncaughtThreadDeathContextRunnable.run(LoggingUncaughtThreadDeathContextRunnable.java:60) [287:org.opendaylight.infrautils.util:1.3.0.redhat-5]
        at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) [?:?]
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:?]
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:?]
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:?]
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [?:?]
Caused by: org.opendaylight.controller.md.sal.common.api.data.OptimisticLockFailedException: Optimistic lock failed.
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.lambda$processNextPendingTransaction$0(ShardDataTree.java:740) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPending(ShardDataTree.java:778) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPendingTransaction(ShardDataTree.java:725) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.startCanCommit(ShardDataTree.java:808) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.SimpleShardDataTreeCohort.canCommit(SimpleShardDataTreeCohort.java:84) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.CohortEntry.canCommit(CohortEntry.java:97) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleCanCommit(ShardCommitCoordinator.java:236) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleReadyLocalTransaction(ShardCommitCoordinator.java:200) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.Shard.handleReadyLocalTransaction(Shard.java:731) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:333) ~[?:?]
        at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270) ~[?:?]
        at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:44) ~[?:?]
        at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:274) ~[?:?]
        at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104) ~[?:?]
        at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:545) ~[?:?]
        at akka.actor.Actor.aroundReceive(Actor.scala:514) ~[?:?]
        at akka.actor.Actor.aroundReceive$(Actor.scala:512) ~[?:?]
 at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:272) ~[?:?]
        at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:663) ~[?:?]
        at akka.persistence.Eventsourced.aroundReceive(Eventsourced.scala:183) ~[?:?]
        at akka.persistence.Eventsourced.aroundReceive$(Eventsourced.scala:182) ~[?:?]
        at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:272) ~[?:?]
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:527) ~[?:?]
        at akka.actor.ActorCell.invoke(ActorCell.scala:496) ~[?:?]
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) ~[?:?]
        at akka.dispatch.Mailbox.run(Mailbox.scala:224) ~[?:?]
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234) ~[?:?]
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) ~[?:?]
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) ~[?:?]
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) ~[?:?]
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) ~[?:?]
Caused by: org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was deleted by other transaction.
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:287) ~[?:?]
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:132) ~[?:?]
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:307) ~[?:?]
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:290) ~[?:?]
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:132) ~[?:?]
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:307) ~[?:?]
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:290) ~[?:?]
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:132) ~[?:?]
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.StructuralContainerModificationStrategy.checkApplicable(StructuralContainerModificationStrategy.java:101) ~[?:?]
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:307) ~[?:?]
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:290) ~[?:?]
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:132) ~[?:?]
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.RootModificationApplyOperation.checkApplicable(RootModificationApplyOperation.java:72) ~[?:?]
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractDataTreeTip.validate(AbstractDataTreeTip.java:35) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.lambda$processNextPendingTransaction$0(ShardDataTree.java:732) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPending(ShardDataTree.java:778) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPendingTransaction(ShardDataTree.java:725) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.startCanCommit(ShardDataTree.java:808) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.SimpleShardDataTreeCohort.canCommit(SimpleShardDataTreeCohort.java:84) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.CohortEntry.canCommit(CohortEntry.java:97) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleCanCommit(ShardCommitCoordinator.java:236) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleReadyLocalTransaction(ShardCommitCoordinator.java:200) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.Shard.handleReadyLocalTransaction(Shard.java:731) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:333) ~[?:?]
        at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270) ~[?:?]
        at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:44) ~[?:?]
        at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:274) ~[?:?]
        at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104) ~[?:?]
        at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:545) ~[?:?]
        at akka.actor.Actor.aroundReceive(Actor.scala:514) ~[?:?]
        at akka.actor.Actor.aroundReceive$(Actor.scala:512) ~[?:?]
        at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:272) ~[?:?]
        at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:663) ~[?:?]
        at akka.persistence.Eventsourced.aroundReceive(Eventsourced.scala:183) ~[?:?]
        at akka.persistence.Eventsourced.aroundReceive$(Eventsourced.scala:182) ~[?:?]
        at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:272) ~[?:?]
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:527) ~[?:?]
        at akka.actor.ActorCell.invoke(ActorCell.scala:496) ~[?:?]
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) ~[?:?]
        at akka.dispatch.Mailbox.run(Mailbox.scala:224) ~[?:?]
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234) ~[?:?]
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) ~[?:?]
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) ~[?:?]
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) ~[?:?]
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) ~[?:?]


Version-Release number of selected component (if applicable):
OSP 13
opendaylight-8.0.0-5.el7ost.noarch


How reproducible: 100%

Steps to Reproduce:
1. Install OSp13 + ODL
2. Run scale usecase mentioned above
3.

Actual results:
Exceptions in karaf logs and some vms remain unpingable

Expected results:
No exceptions and VMs should be pingable

Additional info:

Comment 1 Mike Kolesnik 2018-04-16 06:22:12 UTC
*** Bug 1567508 has been marked as a duplicate of this bug. ***

Comment 3 Michael Vorburger 2018-04-16 08:52:16 UTC
> make sure to clean the logs from the stack trace since this exception is expected

where was this determined, and how will that actually make the VM pingable?

Let's have a closer look in https://jira.opendaylight.org/browse/NETVIRT-1215 ...

Comment 4 Michael Vorburger 2018-04-16 09:24:49 UTC
Sai, as far as I can tell from looking at the netvrit code upstream (which I'm not an expert on), there should already be retries in the code, so this is indeed a "false" error log.. I've just raised two changes which should suppress this, but it won't actually "fix" anything - so if "we see that some VMs remain unpingable", that's actually another real issue probably (do you see any other errors in the log when this happens?) - would you like to open a new BZ about that for someone else, so that we can close this one when we'll have the logging cleaned up?

Comment 6 Mike Kolesnik 2018-04-23 10:06:40 UTC
AFAIK when an optimistic lock fails it is an expected behavior since that's how optimistic locking works.

Let me know if that's not the case..

Comment 7 Michael Vorburger 2018-04-23 11:19:50 UTC
> when an optimistic lock fails it is an expected behavior
> since that's how optimistic locking works.
> Let me know if that's not the case.

Well, no, it's not the case.  It's really a little bit more complicated:

IFF we have code for handling optimistic lock failures in ODL e.g. via retries, then we shold not show ERROR logs.   This is often done wrong in code I see and review.

But if we do not, then such ERROR logs do indicate the writes failed.  So to just blindly demand "to clean the logs from the stack trace since this exception is expected" is not the correct solution.  Instead, we must make sure we "handle" this in the respective code (e.g. with retries triggered by correct exception handling) - see NETVIRT-1215.

Comment 10 Franck Baudin 2019-03-06 16:15:14 UTC
As per depreciation notice [1], closing this bug. Please reopen if relevant for RHOSP13, as this is the only version shipping ODL.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/14/html-single/release_notes/index#deprecated_functionality

Comment 11 Franck Baudin 2019-03-06 16:17:21 UTC
As per depreciation notice [1], closing this bug. Please reopen if relevant for RHOSP13, as this is the only version shipping ODL.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/14/html-single/release_notes/index#deprecated_functionality


Note You need to log in before you can comment on or make changes to this bug.