This service will be undergoing maintenance at 00:00 UTC, 2017-10-23 It is expected to last about 30 minutes
Bug 989808 - Insufficient owners with putIfAbsent during rebalance
Insufficient owners with putIfAbsent during rebalance
Status: VERIFIED
Product: JBoss Data Grid 6
Classification: JBoss
Component: Infinispan (Show other bugs)
6.1.0
Unspecified Unspecified
medium Severity medium
: DR4
: 6.2.0
Assigned To: Tristan Tarrant
Martin Gencur
:
Depends On:
Blocks: 989809
  Show dependency treegraph
 
Reported: 2013-07-29 22:02 EDT by Takayoshi Kimura
Modified: 2014-01-13 18:46 EST (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: TBD Consequence: Fix: Result:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
JBoss Issue Tracker ISPN-3357 Critical Resolved Insufficient owners with putIfAbsent during rebalance 2014-07-28 06:03:41 EDT

  None (edit)
Description Takayoshi Kimura 2013-07-29 22:02:31 EDT
Product ticket for ISPN-3357
Comment 1 JBoss JIRA Server 2013-08-01 11:04:21 EDT
Radim Vansa <rvansa@redhat.com> made a comment on jira ISPN-3357

I have written down a test in library mode exhibiting this behaviour. Please, let me know as fix is ready, I'll try to check that.
Comment 2 JBoss JIRA Server 2013-08-12 02:57:32 EDT
Takayoshi Kimura <tkimura@redhat.com> made a comment on jira ISPN-3357

It's rare but this issue also happens with node leave rebalance so I've updated the title accordingly.

I've attached logs from my test - 8 nodes dist cluster, a node leave with kill -9, 6 backups are missing.
Comment 3 JBoss JIRA Server 2013-08-12 03:09:41 EDT
Takayoshi Kimura <tkimura@redhat.com> made a comment on jira ISPN-3357

It's rare but this issue also happens with node leave rebalance so I've updated the title accordingly.

I've attached logs from my test - 8 nodes dist cluster, a node leave with kill -9, 6 backups are missing. The event sequence is basically same as node join rebalance.
Comment 4 JBoss JIRA Server 2013-08-15 14:51:38 EDT
Dan Berindei <dberinde@redhat.com> made a comment on jira ISPN-3357

It makes sense that this would happen during leave as well, as a leave is handled by first updating the CH to remove the leaver, and then adding a new owner just like during joins. The only difference is that the owners in the pending CH are almost always a superset of the owners in the current CH, so the write CH is the same as the pending CH and the final CH update doesn't actually change anything.

The write commands already have an {{ignorePreviousValue}} flag that is used in tx caches to skip the remote get and the value check (ISPN-3235). We can use the same flag in non-tx caches to make the command non-conditional on backup nodes.

There is a complication, though: say there is a state transfer in progress, the owners of k in the current CH are [A, B], and the owners in the pending CH are [C, B] (so the write CH owners are [A, B, C]). If A initiates a putIfAbsent(k, v) operation and C rejects the command because the topology changed (so the owners are now [C, B]), but B writes the value, then A will retry the command, C will do the previous value check, and the operation will fail - leaving B without an update.

So we have to ignore the previous value on C, but only if it's the same as the one we're trying to set. Otherwise we could overwrite a value set by another operation, and we wouldn't know whether to return {{null}} or the value existing on C. That means that the values need to implement {{equals()}} for this to work - but this shouldn't be a problem with HotRod clients.

One more possible source of inconsistencies is that if the originator notices a topology change at the end of the operation (in StateTransferInterceptor), it forwards the command again to all the owners. This happens without a lock, so it could overlap with another operation and leave a different value on each node. I plan to replace this with another topology in EntryWrappingInterceptor, just before applying the update, to make sure that if state transfer missed the update, then the command will be retried.

Finally, there is another case where we break consistency that I wanted to mention: if the primary owner dies, and there are multiple backup owners (either because numOwners >= 3 or because there is a join in progress), only one of the backups may execute the command. The originator would report a failure, but the cache would still be inconsistent, and retrying the putIfAbsent call wouldn't fix it. The inconsistency can also appear if a command is rejected only on some of the backup owners, and the originator dies before retrying. [~tkimura], I know you mentioned that you ignored the keys for which ISPN reported an error in your tests, I just wanted to make sure that this is acceptable.
Comment 5 JBoss JIRA Server 2013-08-15 14:57:11 EDT
Dan Berindei <dberinde@redhat.com> made a comment on jira ISPN-3357

It makes sense that this would happen during leave as well, as a leave is handled by first updating the CH to remove the leaver, and then adding a new owner just like during joins. The only difference is that the owners in the pending CH are almost always a superset of the owners in the current CH, so the write CH is the same as the pending CH and the final CH update doesn't actually change anything.

The write commands already have an {{ignorePreviousValue}} flag that is used in tx caches to skip the remote get and the value check (ISPN-3235). We can use the same flag in non-tx caches to make the command non-conditional on backup nodes.

There is a complication, though: say there is a state transfer in progress, the owners of k in the current CH are [A, B], and the owners in the pending CH are [C, B] (so the write CH owners are [A, B, C]). If A initiates a putIfAbsent(k, v) operation and B rejects the command because the topology changed (so the owners are now [C, B]), but C writes the value, then A will retry the command, C will do the previous value check, and the operation will fail - leaving B without an update.

So we have to ignore the previous value on C, but only if it's the same as the one we're trying to set. Otherwise we could overwrite a value set by another operation, and we wouldn't know whether to return {{null}} or the value existing on C. That means that the values need to implement {{equals()}} for this to work - but this shouldn't be a problem with HotRod clients.

One more possible source of inconsistencies is that if the originator notices a topology change at the end of the operation (in StateTransferInterceptor), it forwards the command again to all the owners. This happens without a lock, so it could overlap with another operation and leave a different value on each node. I plan to replace this with another topology in EntryWrappingInterceptor, just before applying the update, to make sure that if state transfer missed the update, then the command will be retried.

Finally, there is another case where we break consistency that I wanted to mention: if the primary owner dies, and there are multiple backup owners (either because numOwners >= 3 or because there is a join in progress), only one of the backups may execute the command. The originator would report a failure, but the cache would still be inconsistent, and retrying the putIfAbsent call wouldn't fix it. The inconsistency can also appear if a command is rejected only on some of the backup owners, and the originator dies before retrying. [~tkimura], I know you mentioned that you ignored the keys for which ISPN reported an error in your tests, I just wanted to make sure that this is acceptable.
Comment 6 JBoss JIRA Server 2013-08-19 04:06:01 EDT
Takayoshi Kimura <tkimura@redhat.com> made a comment on jira ISPN-3357

Yes the "originator dies" scenario is acceptable.
Comment 7 JBoss JIRA Server 2013-08-19 18:44:53 EDT
Dan Berindei <dberinde@redhat.com> made a comment on jira ISPN-3357

For now we are completely ignoring the previous value on the primary owner when retrying, not just if the current value matches the final value of the command.
Comment 8 Martin Gencur 2013-09-05 03:20:58 EDT
It turns out that this bug will be fixed in ER1 (after integrating https://github.com/infinispan/infinispan/commit/f9982dea28). So moving the target milestone.
Comment 9 gsheldon 2014-01-13 18:46:03 EST
removed doc text flag.

Note You need to log in before you can comment on or make changes to this bug.