Bug 1033087 - Forwarded Prepare/Commit executed after transaction finished
Status: VERIFIED
Product: JBoss Data Grid 6
Classification: JBoss
Component: Infinispan
Version: 6.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: CR1
Target Release: 6.2.0
Assigned To: Tristan Tarrant
QA Contact: Martin Gencur
Depends On:
Blocks: 1017190
Reported: 2013-11-21 09:23 EST by Radim Vansa
Modified: 2014-01-07 08:21 EST

Doc Type: Bug Fix
Type: Bug
External Trackers:
JBoss Issue Tracker ISPN-3745 (Priority: Critical, Status: Resolved, Last Updated: 2014-04-02 07:07:27 EDT)

Description Radim Vansa 2013-11-21 09:23:51 EST
Replicated TX cache, nodes A, B, C

0. A and B have topology 2, C already got topology 3
1. A sends prepare with topology 2 to B and C, both apply the prepare and respond
2. C forwards prepare to B with topology 3
3. A sends commit with topology 2 to B and C, both commit and respond
4. C likewise forwards the commit to B with topology 3
5. A and B get updated topology id
6. A executes another transaction on the same entry
7. prepare and commit from first transaction with topology 3 arrive at B - B overwrites (or removes) the entry again

Result: B ends up with inconsistent state
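
The sequence above can be replayed against a toy model of node B. This is only a sketch: the Node class, the transaction ids tx1/tx2, the key "k", and the rejectCompleted flag are all hypothetical and are not Infinispan code. It assumes one possible guard, remembering which transactions have already committed, purely to show how a late duplicate differs from a fresh command; the report does not say how the issue was actually fixed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the scenario on node B. All names are hypothetical; this is not Infinispan code. */
final class ReplayDemo {

    /** Minimal stand-in for a node that applies prepare and commit commands. */
    static final class Node {
        final Map<String, String> data = new HashMap<>();
        // Writes staged by prepare, keyed by transaction id.
        final Map<String, Map<String, String>> prepared = new HashMap<>();
        // Ids of transactions that have already committed on this node.
        final Set<String> completed = new HashSet<>();
        // Assumed guard: when true, commands for completed transactions are dropped.
        final boolean rejectCompleted;

        Node(boolean rejectCompleted) {
            this.rejectCompleted = rejectCompleted;
        }

        void prepare(String tx, String key, String value) {
            if (rejectCompleted && completed.contains(tx)) {
                return; // late duplicate of a finished transaction
            }
            prepared.computeIfAbsent(tx, t -> new HashMap<>()).put(key, value);
        }

        void commit(String tx) {
            if (rejectCompleted && completed.contains(tx)) {
                return; // late duplicate of a finished transaction
            }
            Map<String, String> writes = prepared.remove(tx);
            if (writes != null) {
                data.putAll(writes);
            }
            completed.add(tx);
        }
    }

    /** Replays the commands that reach B and returns the final value of key "k". */
    static String run(boolean rejectCompleted) {
        Node b = new Node(rejectCompleted);
        b.prepare("tx1", "k", "v1"); // step 1: prepare from A, topology 2
        b.commit("tx1");             // step 3: commit from A, topology 2
        b.prepare("tx2", "k", "v2"); // step 6: second transaction on the same entry
        b.commit("tx2");
        b.prepare("tx1", "k", "v1"); // step 7: prepare forwarded by C, topology 3
        b.commit("tx1");             // step 7: commit forwarded by C, topology 3
        return b.data.get("k");
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + run(false)); // prints "without guard: v1"
        System.out.println("with guard:    " + run(true));  // prints "with guard:    v2"
    }
}
```

Without the guard, the forwarded commands of the first transaction overwrite the value written by the second one, which matches the inconsistency described for B.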
Comment 2 JBoss JIRA Server 2013-12-02 10:27:48 EST
Dan Berindei <dberinde@redhat.com> made a comment on jira ISPN-3745

[~rvansa] What's the cache configuration? The forwarding is always done synchronously, so node A couldn't receive the prepare response and send the commit until C finished its forwarding.
Comment 3 JBoss JIRA Server 2013-12-03 03:07:37 EST
Radim Vansa <rvansa@redhat.com> made a comment on jira ISPN-3745

You're right: since I have a synchronous tx cache, the forwarding should be synchronous. Unfortunately, the logs from the forwarding node are missing (they were truncated), but here is an excerpt showing what happened:

{code}
04:19:29,410 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-95,default,apex862-11617) Attempting to execute command: CommitCommand {gtx=GlobalTransaction:<apex861-22006>:164595:local, cacheName='testCache', topologyId=18} [sender=apex861-22006]
04:19:29,411 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (remote-thread-14) Calling perform() on CommitCommand {gtx=GlobalTransaction:<apex861-22006>:164595:remote, cacheName='testCache', topologyId=18}
04:19:29,412 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (remote-thread-14) About to send back response SuccessfulResponse{responseValue=null}  for command CommitCommand {gtx=GlobalTransaction:<apex861-22006>:164595:remote, cacheName='testCache', topologyId=18}
04:19:31,301 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-78,default,apex862-11617) Attempting to execute command: PrepareCommand {modifications=[ ... ], onePhaseCommit=false, gtx=GlobalTransaction:<apex861-22006>:164595:local, cacheName='testCache', topologyId=19} [sender=apex863-20495]
{code}
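
To compare such trace lines quickly, the command name, transaction id, and topology id can be pulled out with a regular expression. The following is a rough sketch: TraceLineDemo and summarize are hypothetical names, and the pattern is only assumed to fit the line format shown in the excerpt above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading trace lines like the ones above; not part of Infinispan. */
final class TraceLineDemo {

    // Captures: command name, originator address, transaction number, topology id.
    static final Pattern COMMAND = Pattern.compile(
            "(\\w+Command) \\{.*gtx=GlobalTransaction:<([^>]+)>:(\\d+):(?:local|remote)"
            + ".*topologyId=(\\d+)\\}");

    /** Returns "Command originator:txNumber@topologyId", or null if the line has no command. */
    static String summarize(String line) {
        Matcher m = COMMAND.matcher(line);
        if (!m.find()) {
            return null;
        }
        return m.group(1) + " " + m.group(2) + ":" + m.group(3) + "@" + m.group(4);
    }

    public static void main(String[] args) {
        // Message portions of two trace lines from the excerpt above.
        String commit = "Calling perform() on CommitCommand {gtx=GlobalTransaction:"
                + "<apex861-22006>:164595:remote, cacheName='testCache', topologyId=18}";
        String prepare = "Attempting to execute command: PrepareCommand {modifications=[ ... ], "
                + "onePhaseCommit=false, gtx=GlobalTransaction:<apex861-22006>:164595:local, "
                + "cacheName='testCache', topologyId=19} [sender=apex863-20495]";
        System.out.println(summarize(commit));  // CommitCommand apex861-22006:164595@18
        System.out.println(summarize(prepare)); // PrepareCommand apex861-22006:164595@19
    }
}
```

Both summaries carry the same transaction id, which makes it easy to see that the prepare with topologyId=19 was executed about two seconds after the commit with topologyId=18 had already been answered.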
Comment 4 JBoss JIRA Server 2013-12-03 03:10:31 EST
Radim Vansa <rvansa@redhat.com> made a comment on jira ISPN-3745

Thinking about it once more, the broadcast optimization may be the culprit here as well, because apex863 (the sender) had just joined. It received the prepare/commit because they were broadcast, but nobody waited for its response. It could then forward the commands to the old nodes, which executed them again.
Comment 5 JBoss JIRA Server 2013-12-04 06:28:25 EST
Dan Berindei <dberinde@redhat.com> made a comment on jira ISPN-3745

Is topologyId = 18 the id of the topology that contains the joiner, or the topology before? If it's the new topology, and the command was initially invoked remotely with topology 17, then the command was forwarded, otherwise it was likely retransmitted by JGroups. 

I'm inclined to think it's caused by JGroups retransmitting the message to the joiner and the originator not waiting for the response, too.
Comment 6 JBoss JIRA Server 2013-12-04 07:25:23 EST
Radim Vansa <rvansa@redhat.com> made a comment on jira ISPN-3745

Topology 18 does not contain the joiner, 19 contains it.
Comment 7 JBoss JIRA Server 2013-12-19 06:40:43 EST
Dan Berindei <dberinde@redhat.com> updated the status of jira ISPN-3745 to Resolved
