Bug 1021461 - In non-tx caches, write operations may not be atomic during rebalance
In non-tx caches, write operations may not be atomic during rebalance
Product: JBoss Data Grid 6
Classification: JBoss
Component: Infinispan (Show other bugs)
Unspecified Unspecified
unspecified Severity high
: ER6
: 6.2.0
Assigned To: Tristan Tarrant
Martin Gencur
Depends On:
Blocks: 1017190
  Show dependency treegraph
Reported: 2013-10-21 06:56 EDT by Radim Vansa
Modified: 2014-03-24 21:38 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
In a non-transactional cache, conditional operations executed during cluster rebalance after a node joins or leaves may not test and set the value atomically - two competing commands may both override the value.
Story Points: ---
Clone Of:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)

External Trackers
Tracker ID Priority Status Summary Last Updated
JBoss Issue Tracker ISPN-3422 Critical Resolved In non-tx caches, write operations may not be atomic during rebalance 2016-08-15 04:47 EDT

  None (edit)
Description Radim Vansa 2013-10-21 06:56:54 EDT
If the cache topology changes while a write command is running and before it has actually committed the entry to the data container, we retry the command (see ISPN-3366 and ISPN-3357). But before we detect the topology change, one or more of the backup owners may have already applied the modification.

Retrying the command re-acquires the key lock on the primary owner (even if the primary owner didn't change). That means another command could have modified the same key in the meantime, but the retried command is going to ignore any changes and is going to return the value before the first attempt.

I have observed a situation where two concurrent putIfAbsent operations are retried because of topology change, and the second retry has overwritten the first retry.
Comment 2 JBoss JIRA Server 2013-10-23 03:09:29 EDT
Radim Vansa <rvansa@redhat.com> made a comment on jira ISPN-3422

Are there any ideas how to solve this issue? I am rather curious about it.
Comment 3 JBoss JIRA Server 2013-10-24 06:23:57 EDT
Dan Berindei <dberinde@redhat.com> made a comment on jira ISPN-3422

Yes, we should be able to fix this by comparing the PutIfAbsentCommand's value with the value that exists in the data container even when ignorePreviousValue is set to true. If the value in the data container is either null or the same, then the command should succeed, otherwise it should fail.

I didn't implement this back when fixing ISPN-3357 because of 2 reasons:
1) Values would have to implement equals() for this to work. AFAICT we don't really support versioning in non-tx mode.
2) We should only do the comparison check on the primary, because we want the final value to be the same on all the owners even if there was an inconsistency before. So we'd need to inject a {{ClusteringDependentLogic}}  in all the conditional commands.
Comment 4 JBoss JIRA Server 2013-10-24 07:57:57 EDT
Radim Vansa <rvansa@redhat.com> made a comment on jira ISPN-3422

I have a feeling (already mentioned in ISPN-3600) that the ignorePreviousValue is rather abused, as even with the flag set we would do the comparison, and with the flag set to false on non-origin nodes the command is ignored at all in perform()... I know this is an implementation detail, but the flag name does not correspond to the use at all.

ad 1) I don't quite understand the problem - conditional commands already require equals implementation on values or defined valueEquivalence within the command
Comment 5 JBoss JIRA Server 2013-11-14 06:55:44 EST
Dan Berindei <dberinde@redhat.com> made a comment on jira ISPN-3422

At point 1), I was thinking of {{putIfAbsent()}}, which only compares the values with {{null}} ATM. You're right about the other conditional commands, though.
Comment 6 JBoss JIRA Server 2013-12-04 06:12:36 EST
Dan Berindei <dberinde@redhat.com> made a comment on jira ISPN-3422

These are the changes I made:
    * Replace the ignorePreviousValue flag with a ValueMatchingPolicy enum.
    * Set the matching policy to MATCH_EXPECTED_OR_NEW when retrying a
      putIfAbsent(k, value).
    * replace(k, expected, newValue) and remove(k, expected) will also be
      retried with MATCH_EXPECTED_OR_NEW.
    * Check the primary owner's response and update the status of the command.
      Only successful commands should be retried if the topology changes.

Regular put(k, value) still won't be atomic, because we don't keep the previous value around. Same as regular put(k, value), replace(k, newValue) and remove(k) won't be atomic.
Comment 7 JBoss JIRA Server 2013-12-09 09:33:34 EST
William Burns <wburns@redhat.com> updated the status of jira ISPN-3422 to Resolved

Note You need to log in before you can comment on or make changes to this bug.