Bug 989808 - Insufficient owners with putIfAbsent during rebalance
Summary: Insufficient owners with putIfAbsent during rebalance
Keywords:
Status: CLOSED EOL
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: Infinispan
Version: 6.1.0
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: DR4
: 6.2.0
Assignee: Tristan Tarrant
QA Contact: Martin Gencur
URL:
Whiteboard:
Depends On:
Blocks: 989809
TreeView+ depends on / blocked
 
Reported: 2013-07-30 02:02 UTC by Takayoshi Kimura
Modified: 2024-05-01 00:22 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: TBD Consequence: Fix: Result:
Clone Of:
Environment:
Last Closed: 2024-05-01 00:22:39 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker ISPN-3357 0 Critical Resolved Insufficient owners with putIfAbsent during rebalance 2018-09-12 23:58:23 UTC

Description Takayoshi Kimura 2013-07-30 02:02:31 UTC
Product ticket for ISPN-3357

Comment 1 JBoss JIRA Server 2013-08-01 15:04:21 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-3357

I have written down a test in library mode exhibiting this behaviour. Please, let me know as fix is ready, I'll try to check that.

Comment 2 JBoss JIRA Server 2013-08-12 06:57:32 UTC
Takayoshi Kimura <tkimura> made a comment on jira ISPN-3357

It's rare but this issue also happens with node leave rebalance so I've updated the title accordingly.

I've attached logs from my test - 8 nodes dist cluster, a node leave with kill -9, 6 backups are missing.

Comment 3 JBoss JIRA Server 2013-08-12 07:09:41 UTC
Takayoshi Kimura <tkimura> made a comment on jira ISPN-3357

It's rare but this issue also happens with node leave rebalance so I've updated the title accordingly.

I've attached logs from my test - 8 nodes dist cluster, a node leave with kill -9, 6 backups are missing. The event sequence is basically same as node join rebalance.

Comment 4 JBoss JIRA Server 2013-08-15 18:51:38 UTC
Dan Berindei <dberinde> made a comment on jira ISPN-3357

It makes sense that this would happen during leave as well, as a leave is handled by first updating the CH to remove the leaver, and then adding a new owner just like during joins. The only difference is that the owners in the pending CH are almost always a superset of the owners in the current CH, so the write CH is the same as the pending CH and the final CH update doesn't actually change anything.

The write commands already have an {{ignorePreviousValue}} flag that is used in tx caches to skip the remote get and the value check (ISPN-3235). We can use the same flag in non-tx caches to make the command non-conditional on backup nodes.

There is a complication, though: say there is a state transfer in progress, the owners of k in the current CH are [A, B], and the owners in the pending CH are [C, B] (so the write CH owners are [A, B, C]). If A initiates a putIfAbsent(k, v) operation and C rejects the command because the topology changed (so the owners are now [C, B]), but B writes the value, then A will retry the command, C will do the previous value check, and the operation will fail - leaving B without an update.

So we have to ignore the previous value on C, but only if it's the same as the one we're trying to set. Otherwise we could overwrite a value set by another operation, and we wouldn't know whether to return {{null}} or the value existing on C. That means that the values need to implement {{equals()}} for this to work - but this shouldn't be a problem with HotRod clients.

One more possible source of inconsistencies is that if the originator notices a topology change at the end of the operation (in StateTransferInterceptor), it forwards the command again to all the owners. This happens without a lock, so it could overlap with another operation and leave a different value on each node. I plan to replace this with another topology in EntryWrappingInterceptor, just before applying the update, to make sure that if state transfer missed the update, then the command will be retried.

Finally, there is another case where we break consistency that I wanted to mention: if the primary owner dies, and there are multiple backup owners (either because numOwners >= 3 or because there is a join in progress), only one of the backups may execute the command. The originator would report a failure, but the cache would still be inconsistent, and retrying the putIfAbsent call wouldn't fix it. The inconsistency can also appear if a command is rejected only on some of the backup owners, and the originator dies before retrying. [~tkimura], I know you mentioned that you ignored the keys for which ISPN reported an error in your tests, I just wanted to make sure that this is acceptable.

Comment 5 JBoss JIRA Server 2013-08-15 18:57:11 UTC
Dan Berindei <dberinde> made a comment on jira ISPN-3357

It makes sense that this would happen during leave as well, as a leave is handled by first updating the CH to remove the leaver, and then adding a new owner just like during joins. The only difference is that the owners in the pending CH are almost always a superset of the owners in the current CH, so the write CH is the same as the pending CH and the final CH update doesn't actually change anything.

The write commands already have an {{ignorePreviousValue}} flag that is used in tx caches to skip the remote get and the value check (ISPN-3235). We can use the same flag in non-tx caches to make the command non-conditional on backup nodes.

There is a complication, though: say there is a state transfer in progress, the owners of k in the current CH are [A, B], and the owners in the pending CH are [C, B] (so the write CH owners are [A, B, C]). If A initiates a putIfAbsent(k, v) operation and B rejects the command because the topology changed (so the owners are now [C, B]), but C writes the value, then A will retry the command, C will do the previous value check, and the operation will fail - leaving B without an update.

So we have to ignore the previous value on C, but only if it's the same as the one we're trying to set. Otherwise we could overwrite a value set by another operation, and we wouldn't know whether to return {{null}} or the value existing on C. That means that the values need to implement {{equals()}} for this to work - but this shouldn't be a problem with HotRod clients.

One more possible source of inconsistencies is that if the originator notices a topology change at the end of the operation (in StateTransferInterceptor), it forwards the command again to all the owners. This happens without a lock, so it could overlap with another operation and leave a different value on each node. I plan to replace this with another topology in EntryWrappingInterceptor, just before applying the update, to make sure that if state transfer missed the update, then the command will be retried.

Finally, there is another case where we break consistency that I wanted to mention: if the primary owner dies, and there are multiple backup owners (either because numOwners >= 3 or because there is a join in progress), only one of the backups may execute the command. The originator would report a failure, but the cache would still be inconsistent, and retrying the putIfAbsent call wouldn't fix it. The inconsistency can also appear if a command is rejected only on some of the backup owners, and the originator dies before retrying. [~tkimura], I know you mentioned that you ignored the keys for which ISPN reported an error in your tests, I just wanted to make sure that this is acceptable.

Comment 6 JBoss JIRA Server 2013-08-19 08:06:01 UTC
Takayoshi Kimura <tkimura> made a comment on jira ISPN-3357

Yes the "originator dies" scenario is acceptable.

Comment 7 JBoss JIRA Server 2013-08-19 22:44:53 UTC
Dan Berindei <dberinde> made a comment on jira ISPN-3357

For now we are completely ignoring the previous value on the primary owner when retrying, not just if the current value matches the final value of the command.

Comment 8 Martin Gencur 2013-09-05 07:20:58 UTC
It turns out that this bug will be fixed in ER1 (after integrating https://github.com/infinispan/infinispan/commit/f9982dea28). So moving the target milestone.

Comment 9 gsheldon 2014-01-13 23:46:03 UTC
removed doc text flag.


Note You need to log in before you can comment on or make changes to this bug.