Bug 1033087 - Forwarded Prepare/Commit executed after transaction finished
Summary: Forwarded Prepare/Commit executed after transaction finished
Keywords:
Status: VERIFIED
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: Infinispan
Version: 6.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: CR1
Target Release: 6.2.0
Assignee: Tristan Tarrant
QA Contact: Martin Gencur
URL:
Whiteboard:
Depends On:
Blocks: 1017190
Reported: 2013-11-21 14:23 UTC by Radim Vansa
Modified: 2014-01-07 13:21 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | ISPN-3745 | 0 | Critical | Resolved | Forwarded Prepare/Commit executed after transaction finished | 2014-04-02 11:07:27 UTC

Description Radim Vansa 2013-11-21 14:23:51 UTC
Replicated TX cache, nodes A, B, C

0. A and B have topology 2, C already got topology 3
1. A sends prepare with topology 2 to B and C, both apply the prepare and respond
2. C forwards prepare to B with topology 3
3. A sends commit with topology 2 to B and C, both commit and respond
4. again, C forwards the command (this time the commit) to B with topology 3
5. A and B get updated topology id
6. A executes another transaction on the same entry
7. the forwarded prepare and commit from the first transaction (topology 3) arrive at B, and B overwrites (or removes) the entry again

Result: on B we have inconsistent state
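
The sequence above can be reproduced in a minimal, single-threaded sketch of node B. Everything here is hypothetical (`LateForwardDemo`, `Node`, `completedTx`, the key and values) and none of it is Infinispan API. It only shows that replaying a finished transaction's prepare/commit after a newer transaction clobbers the entry, and that remembering completed transaction ids is one possible guard, not necessarily the fix that was applied.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of node B; these names are illustrative, not Infinispan APIs.
final class LateForwardDemo {

    static final class Node {
        final Map<String, String> data = new HashMap<>();
        final Map<String, Map<String, String>> prepared = new HashMap<>();
        final Set<String> completedTx = new HashSet<>();
        final boolean trackCompleted;

        Node(boolean trackCompleted) {
            this.trackCompleted = trackCompleted;
        }

        void prepare(String tx, String key, String value) {
            // With the guard on, a prepare for an already finished tx is dropped.
            if (trackCompleted && completedTx.contains(tx)) {
                return;
            }
            prepared.computeIfAbsent(tx, t -> new HashMap<>()).put(key, value);
        }

        void commit(String tx) {
            // With the guard on, a duplicate commit is dropped as well.
            if (trackCompleted && completedTx.contains(tx)) {
                return;
            }
            Map<String, String> mods = prepared.remove(tx);
            if (mods != null) {
                data.putAll(mods);
            }
            completedTx.add(tx);
        }
    }

    // Replays the steps from the description as seen by B and returns B's final value.
    static String run(boolean trackCompleted) {
        Node b = new Node(trackCompleted);
        b.prepare("tx1", "k", "v1"); // step 1: prepare from A
        b.commit("tx1");             // step 3: commit from A
        b.prepare("tx2", "k", "v2"); // step 6: second transaction on the same entry
        b.commit("tx2");
        b.prepare("tx1", "k", "v1"); // step 7: late forwarded prepare from C
        b.commit("tx1");             // step 7: late forwarded commit from C
        return b.data.get("k");
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + run(false));
        System.out.println("with guard:    " + run(true));
    }
}
```

Without the guard, B ends with the first transaction's value while A holds the second one; with the guard, the late commands are ignored. A real implementation would also need to expire the remembered ids, which this sketch leaves out.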

Comment 2 JBoss JIRA Server 2013-12-02 15:27:48 UTC
Dan Berindei <dberinde> made a comment on jira ISPN-3745

[~rvansa] What's the cache configuration? The forwarding is always done synchronously, so node A couldn't receive the prepare response and send the commit until C finished its forwarding.

Comment 3 JBoss JIRA Server 2013-12-03 08:07:37 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-3745

You're right: since I have a synchronous tx cache, the forwarding should be synchronous. Unfortunately, I'm missing the logs from the forwarding node (they got truncated), but here is what I have, to show what happened:

{code}
04:19:29,410 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-95,default,apex862-11617) Attempting to execute command: CommitCommand {gtx=GlobalTransaction:<apex861-22006>:164595:local, cacheName='testCache', topologyId=18} [sender=apex861-22006]
04:19:29,411 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (remote-thread-14) Calling perform() on CommitCommand {gtx=GlobalTransaction:<apex861-22006>:164595:remote, cacheName='testCache', topologyId=18}
04:19:29,412 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (remote-thread-14) About to send back response SuccessfulResponse{responseValue=null}  for command CommitCommand {gtx=GlobalTransaction:<apex861-22006>:164595:remote, cacheName='testCache', topologyId=18}
04:19:31,301 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-78,default,apex862-11617) Attempting to execute command: PrepareCommand {modifications=[ ... ], onePhaseCommit=false, gtx=GlobalTransaction:<apex861-22006>:164595:local, cacheName='testCache', topologyId=19} [sender=apex863-20495]
{code}

Comment 4 JBoss JIRA Server 2013-12-03 08:10:31 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-3745

Thinking about it once more, the broadcast optimization may be the culprit here as well, because apex863 (the sender) had just joined. It received the prepare/commit because they were broadcast, but nobody waited for its response. It could then forward the commands to the old nodes, which executed them again.

Comment 5 JBoss JIRA Server 2013-12-04 11:28:25 UTC
Dan Berindei <dberinde> made a comment on jira ISPN-3745

Is topologyId = 18 the id of the topology that contains the joiner, or the topology before? If it's the new topology, and the command was initially invoked remotely with topology 17, then the command was forwarded, otherwise it was likely retransmitted by JGroups. 

I'm also inclined to think it's caused by JGroups retransmitting the message to the joiner while the originator does not wait for the response.

Comment 6 JBoss JIRA Server 2013-12-04 12:25:23 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-3745

Topology 18 does not contain the joiner, 19 contains it.
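
Applying the heuristic from comment 5 to these ids can be sketched as below. This is a simplification under a stated assumption: a forwarded command carries a newer topology id than the one it was first invoked with, while a retransmitted one keeps the original id. `CommandOrigin`, `Kind` and `classify` are hypothetical names, not Infinispan or JGroups API.

```java
// Hypothetical helper illustrating the forwarded-vs-retransmitted heuristic.
final class CommandOrigin {

    enum Kind { FORWARDED, LIKELY_RETRANSMITTED }

    // originalTopologyId: topology id the command was first invoked with.
    // receivedTopologyId: topology id on the copy that arrived later.
    static Kind classify(int originalTopologyId, int receivedTopologyId) {
        return receivedTopologyId > originalTopologyId
                ? Kind.FORWARDED
                : Kind.LIKELY_RETRANSMITTED;
    }

    public static void main(String[] args) {
        // Ids from the log above: the commit carried 18, the late prepare 19.
        System.out.println(classify(18, 19));
        System.out.println(classify(18, 18));
    }
}
```

With the ids from the log (18 on the commit, 19 on the late prepare sent by apex863), the heuristic points at forwarding by the joiner rather than a plain retransmission.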

Comment 7 JBoss JIRA Server 2013-12-19 11:40:43 UTC
Dan Berindei <dberinde> updated the status of jira ISPN-3745 to Resolved

