Description of problem: When running a Browbeat+Rally use case in my downstream scale CI which does: Create a network Create a sbunet Create a router Attach router to subnet and oublic network Boot VM with floating IP Ping VM with concurrency 8 and times set to 50 we see that some VMs remain unpingable even after 300 seconds and we see a lot of the following exceptions in karaf logs 2018-04-13T23:40:08,068 | ERROR | ForkJoinPool-1-worker-27 | ElanUtils | 351 - org.opendaylight.netvirt.elanmanager-impl - 0.6.0.redhat-5 | Error writing to datastore {} java.util.concurrent.ExecutionException: OptimisticLockFailedException{message=Optimistic lock failed., errorList=[RpcError [message=Optimistic lock failed., severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was deleted by other transaction.]]} at org.opendaylight.mdsal.common.api.MappingCheckedFuture.wrapInExecutionException(MappingCheckedFuture.java:66) [296:org.opendaylight.mdsal.common-api:2.4.0.redhat-5] at org.opendaylight.mdsal.common.api.MappingCheckedFuture.get(MappingCheckedFuture.java:79) [296:org.opendaylight.mdsal.common-api:2.4.0.redhat-5] at org.opendaylight.netvirt.elan.utils.ElanUtils.waitForTransactionToComplete(ElanUtils.java:1381) [351:org.opendaylight.netvirt.elanmanager-impl:0.6.0.redhat-5] at org.opendaylight.netvirt.elan.internal.ElanInterfaceManager.removeElanInterface(ElanInterfaceManager.java:275) [351:org.opendaylight.netvirt.elanmanager-impl:0.6.0.redhat-5] at org.opendaylight.netvirt.elan.internal.InterfaceRemoveWorkerOnElan.call(InterfaceRemoveWorkerOnElan.java:52) [351:org.opendaylight.netvirt.elanmanager-impl:0.6.0.redhat-5] at org.opendaylight.netvirt.elan.internal.InterfaceRemoveWorkerOnElan.call(InterfaceRemoveWorkerOnElan.java:21) [351:org.opendaylight.netvirt.elanmanager-impl:0.6.0.redhat-5] at org.opendaylight.infrautils.jobcoordinator.internal.JobCoordinatorImpl$MainTask.runWithUncheckedExceptionLogging(JobCoordinatorImpl.java:416) [281:org.opendaylight.infrautils.jobcoordinator-impl:1.3.0.redhat-5] at org.opendaylight.infrautils.utils.concurrent.LoggingUncaughtThreadDeathContextRunnable.run(LoggingUncaughtThreadDeathContextRunnable.java:60) [287:org.opendaylight.infrautils.util:1.3.0.redhat-5] at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) [?:?] at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:?] at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:?] at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:?] at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [?:?] Caused by: org.opendaylight.controller.md.sal.common.api.data.OptimisticLockFailedException: Optimistic lock failed. at org.opendaylight.controller.cluster.datastore.ShardDataTree.lambda$processNextPendingTransaction$0(ShardDataTree.java:740) ~[?:?] at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPending(ShardDataTree.java:778) ~[?:?] at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPendingTransaction(ShardDataTree.java:725) ~[?:?] at org.opendaylight.controller.cluster.datastore.ShardDataTree.startCanCommit(ShardDataTree.java:808) ~[?:?] at org.opendaylight.controller.cluster.datastore.SimpleShardDataTreeCohort.canCommit(SimpleShardDataTreeCohort.java:84) ~[?:?] at org.opendaylight.controller.cluster.datastore.CohortEntry.canCommit(CohortEntry.java:97) ~[?:?] at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleCanCommit(ShardCommitCoordinator.java:236) ~[?:?] at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleReadyLocalTransaction(ShardCommitCoordinator.java:200) ~[?:?] at org.opendaylight.controller.cluster.datastore.Shard.handleReadyLocalTransaction(Shard.java:731) ~[?:?] at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:333) ~[?:?] at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270) ~[?:?] at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:44) ~[?:?] at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:274) ~[?:?] at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104) ~[?:?] at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:545) ~[?:?] at akka.actor.Actor.aroundReceive(Actor.scala:514) ~[?:?] at akka.actor.Actor.aroundReceive$(Actor.scala:512) ~[?:?] at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:272) ~[?:?] at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:663) ~[?:?] at akka.persistence.Eventsourced.aroundReceive(Eventsourced.scala:183) ~[?:?] at akka.persistence.Eventsourced.aroundReceive$(Eventsourced.scala:182) ~[?:?] at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:272) ~[?:?] at akka.actor.ActorCell.receiveMessage(ActorCell.scala:527) ~[?:?] at akka.actor.ActorCell.invoke(ActorCell.scala:496) ~[?:?] at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) ~[?:?] at akka.dispatch.Mailbox.run(Mailbox.scala:224) ~[?:?] at akka.dispatch.Mailbox.exec(Mailbox.scala:234) ~[?:?] at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) ~[?:?] at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) ~[?:?] at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) ~[?:?] at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) ~[?:?] Caused by: org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was deleted by other transaction. at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:287) ~[?:?] at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:132) ~[?:?] at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:307) ~[?:?] at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:290) ~[?:?] at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:132) ~[?:?] at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:307) ~[?:?] at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:290) ~[?:?] at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:132) ~[?:?] at org.opendaylight.yangtools.yang.data.impl.schema.tree.StructuralContainerModificationStrategy.checkApplicable(StructuralContainerModificationStrategy.java:101) ~[?:?] at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:307) ~[?:?] at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:290) ~[?:?] at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:132) ~[?:?] at org.opendaylight.yangtools.yang.data.impl.schema.tree.RootModificationApplyOperation.checkApplicable(RootModificationApplyOperation.java:72) ~[?:?] at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractDataTreeTip.validate(AbstractDataTreeTip.java:35) ~[?:?] at org.opendaylight.controller.cluster.datastore.ShardDataTree.lambda$processNextPendingTransaction$0(ShardDataTree.java:732) ~[?:?] at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPending(ShardDataTree.java:778) ~[?:?] at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPendingTransaction(ShardDataTree.java:725) ~[?:?] at org.opendaylight.controller.cluster.datastore.ShardDataTree.startCanCommit(ShardDataTree.java:808) ~[?:?] at org.opendaylight.controller.cluster.datastore.SimpleShardDataTreeCohort.canCommit(SimpleShardDataTreeCohort.java:84) ~[?:?] at org.opendaylight.controller.cluster.datastore.CohortEntry.canCommit(CohortEntry.java:97) ~[?:?] at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleCanCommit(ShardCommitCoordinator.java:236) ~[?:?] at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleReadyLocalTransaction(ShardCommitCoordinator.java:200) ~[?:?] at org.opendaylight.controller.cluster.datastore.Shard.handleReadyLocalTransaction(Shard.java:731) ~[?:?] at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:333) ~[?:?] at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270) ~[?:?] at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:44) ~[?:?] at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:274) ~[?:?] at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104) ~[?:?] at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:545) ~[?:?] at akka.actor.Actor.aroundReceive(Actor.scala:514) ~[?:?] at akka.actor.Actor.aroundReceive$(Actor.scala:512) ~[?:?] at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:272) ~[?:?] at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:663) ~[?:?] at akka.persistence.Eventsourced.aroundReceive(Eventsourced.scala:183) ~[?:?] at akka.persistence.Eventsourced.aroundReceive$(Eventsourced.scala:182) ~[?:?] at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:272) ~[?:?] at akka.actor.ActorCell.receiveMessage(ActorCell.scala:527) ~[?:?] at akka.actor.ActorCell.invoke(ActorCell.scala:496) ~[?:?] at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) ~[?:?] at akka.dispatch.Mailbox.run(Mailbox.scala:224) ~[?:?] at akka.dispatch.Mailbox.exec(Mailbox.scala:234) ~[?:?] at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) ~[?:?] at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) ~[?:?] at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) ~[?:?] at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) ~[?:?] Version-Release number of selected component (if applicable): OSP 13 opendaylight-8.0.0-5.el7ost.noarch How reproducible: 100% Steps to Reproduce: 1. Install OSp13 + ODL 2. Run scale usecase mentioned above 3. Actual results: Exceptions in karaf logs and some vms remain unpingable Expected results: No exceptions and VMs should be pingable Additional info:
*** Bug 1567508 has been marked as a duplicate of this bug. ***
> make sure to clean the logs from the stack trace since this exception is expected where was this determined, and how will that actually make the VM pingable? Let's have a closer look in https://jira.opendaylight.org/browse/NETVIRT-1215 ...
Sai, as far as I can tell from looking at the netvrit code upstream (which I'm not an expert on), there should already be retries in the code, so this is indeed a "false" error log.. I've just raised two changes which should suppress this, but it won't actually "fix" anything - so if "we see that some VMs remain unpingable", that's actually another real issue probably (do you see any other errors in the log when this happens?) - would you like to open a new BZ about that for someone else, so that we can close this one when we'll have the logging cleaned up?
AFAIK when an optimistic lock fails it is an expected behavior since that's how optimistic locking works. Let me know if that's not the case..
> when an optimistic lock fails it is an expected behavior > since that's how optimistic locking works. > Let me know if that's not the case. Well, no, it's not the case. It's really a little bit more complicated: IFF we have code for handling optimistic lock failures in ODL e.g. via retries, then we shold not show ERROR logs. This is often done wrong in code I see and review. But if we do not, then such ERROR logs do indicate the writes failed. So to just blindly demand "to clean the logs from the stack trace since this exception is expected" is not the correct solution. Instead, we must make sure we "handle" this in the respective code (e.g. with retries triggered by correct exception handling) - see NETVIRT-1215.
As per depreciation notice [1], closing this bug. Please reopen if relevant for RHOSP13, as this is the only version shipping ODL. [1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/14/html-single/release_notes/index#deprecated_functionality