Bug 1001634

Summary: Write command may be ignored during state transfer
Product: [JBoss] JBoss Data Grid 6
Reporter: Radim Vansa <rvansa>
Component: Infinispan
Assignee: Tristan Tarrant <ttarrant>
Status: VERIFIED
QA Contact: Martin Gencur <mgencur>
Severity: high
Priority: unspecified
Version: 6.2.0
CC: jdg-bugs, mgencur
Target Milestone: ER3
Target Release: 6.2.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Bug Blocks: 1017190, 1010419

Description Radim Vansa 2013-08-27 12:34:16 UTC
Setup: distributed, synchronous, non-transactional (non-tx) cache.
Scenario:

1) A node is joining the cluster and requests some segment.
2) A RemoveCommand is sent to the backup owner with ignorePreviousValue=true.
3) The backup owner looks up the entry and finds null.
4) State transfer invokes a PutKeyValueCommand and sets the value for the removed entry (updateKeys does not contain the key yet).
5) The RemoveCommand adds its key to the updateKeys set, but does not remove the value, because in its context the value is already null.

Result: the value is removed on the primary owner but is still present on the backup owner.
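Steps 3 to 5 can be replayed deterministically to show why the backup keeps the value. This is a minimal single-threaded sketch, assuming a plain HashMap stands in for the backup owner's data container; the class RemoveRaceReplay and its fields are hypothetical and are not Infinispan's API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the backup owner; not Infinispan code.
public class RemoveRaceReplay {
    static final Map<String, String> dataContainer = new HashMap<>();
    static final Set<String> updateKeys = new HashSet<>();

    /** Replays steps 3-5 in a fixed order and returns the value left on the backup. */
    static String replay(String key, String stValue) {
        dataContainer.clear();
        updateKeys.clear();

        // Step 3: the RemoveCommand looks up the entry and finds null.
        String seenByRemove = dataContainer.get(key);

        // Step 4: state transfer applies its value; the key is not in updateKeys yet.
        if (!updateKeys.contains(key)) {
            dataContainer.put(key, stValue);
        }

        // Step 5: the RemoveCommand records the key but, having seen null, removes nothing.
        updateKeys.add(key);
        if (seenByRemove != null) {
            dataContainer.remove(key);
        }
        return dataContainer.get(key);
    }

    public static void main(String[] args) {
        // The primary owner no longer has the key, but the backup still does.
        System.out.println("backup value after remove: " + replay("k", "v-from-st"));
    }
}
```

The stale value survives because the RemoveCommand decides what to do based on a lookup made before state transfer wrote the entry.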

Comment 2 JBoss JIRA Server 2013-08-28 10:52:23 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-3443

Confirmed also for transactional cache.

Comment 3 JBoss JIRA Server 2013-08-28 10:55:06 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-3443

The issue may happen for PutKeyValueCommand as well: the command commits the entry, then state transfer (ST) can commit its own value for the entry, and only afterwards does the command record the key in the updateKeys set.

Comment 4 JBoss JIRA Server 2013-10-07 09:14:03 UTC
Radim Vansa <rvansa> updated the status of jira ISPN-3443 to Reopened

Comment 5 JBoss JIRA Server 2013-10-07 09:14:03 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-3443

I still see ignored commands; see the comment I posted on 28/Aug/13 12:55 PM. I experienced it with a ReplaceCommand on the backup owner in a non-tx cache.
I believe the cause is a race condition between checking the updated keys in EntryWrappingInterceptor.commitEntryIfNeeded and actually committing the entry: if the ReplaceCommand executes between the check and the commit of the value, the value is overwritten by ST.
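The check-then-commit window described above can be replayed step by step. This is a minimal sketch, assuming state transfer performs its updated-keys check and its commit as two separate steps; the class CheckThenCommitRace is hypothetical and does not reproduce EntryWrappingInterceptor itself.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model; names do not correspond to Infinispan classes.
public class CheckThenCommitRace {
    static String replay() {
        Map<String, String> data = new HashMap<>();
        Set<String> updateKeys = new HashSet<>();
        data.put("k", "old");

        // State transfer, step A: check whether a regular command already updated the key.
        boolean stMayCommit = !updateKeys.contains("k");

        // The ReplaceCommand runs inside the window: it commits and records the key.
        data.put("k", "replaced");
        updateKeys.add("k");

        // State transfer, step B: acts on its stale check and overwrites the new value.
        if (stMayCommit) {
            data.put("k", "from-state-transfer");
        }
        return data.get("k");
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints "from-state-transfer"
    }
}
```

The ReplaceCommand did commit, which matches the observation that the command is not ignored but overwritten afterwards.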

Comment 6 JBoss JIRA Server 2013-10-09 08:52:54 UTC
Dan Berindei <dberinde> made a comment on jira ISPN-3443

Radim, I missed your comment about the race condition in EntryWrappingInterceptor.commitEntryIfNeeded. I don't think it's a case of the WriteCommand being ignored, though: the command is committed, it's just that state transfer then overwrites the value.

Comment 7 JBoss JIRA Server 2013-10-09 09:01:31 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-3443

You're right. Should I rename this JIRA back to "RemoveCommand may be ignored during state transfer" and create a different JIRA for overwriting by ST?

Comment 8 JBoss JIRA Server 2013-10-10 12:01:07 UTC
Radim Vansa <rvansa> updated the status of jira ISPN-3443 to Reopened

Comment 9 JBoss JIRA Server 2013-10-10 12:01:07 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-3443

[~dan.berindei]: Regrettably, the fix is not correct. You have to atomically check and ADD the key to the updated keys set. Otherwise, both the transaction and state transfer may commit the entry (in this order), resulting in an outdated entry on the node.
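One way to make "check and add" atomic with respect to the commit is to run both inside a single per-key critical section. This is a minimal sketch, assuming ConcurrentHashMap.compute (which runs its remapping function atomically per key) as the critical section; that technique is swapped in for illustration, the class AtomicCheckAndAdd is hypothetical, and Infinispan's actual fix may differ.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a node receiving both regular writes and state transfer.
// compute() is atomic per key, so recording the key and committing the value
// cannot interleave with state transfer's check-and-commit for the same key.
public class AtomicCheckAndAdd<K, V> {
    private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<>();
    private final Set<K> updatedKeys = ConcurrentHashMap.newKeySet();

    /** Regular write (put/replace); newValue == null means remove. */
    public void applyWrite(K key, V newValue) {
        data.compute(key, (k, current) -> {
            updatedKeys.add(k); // recorded in the same atomic step as the commit
            return newValue;    // returning null removes the mapping
        });
    }

    /** State-transfer write: skipped if a regular write already touched the key. */
    public void applyStateTransfer(K key, V transferredValue) {
        data.compute(key, (k, current) ->
                updatedKeys.contains(k) ? current : transferredValue);
    }

    public V get(K key) {
        return data.get(key);
    }

    public static void main(String[] args) {
        AtomicCheckAndAdd<String, String> node = new AtomicCheckAndAdd<>();
        node.applyWrite("k", null);            // RemoveCommand on an absent key
        node.applyStateTransfer("k", "stale"); // late state-transfer value is ignored
        System.out.println(node.get("k"));     // prints "null"
    }
}
```

Because the RemoveCommand records the key even when it finds no value, a later state-transfer write for that key is discarded, which covers the original scenario as well as the commit ordering described in this comment.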

Comment 10 JBoss JIRA Server 2013-10-11 14:28:14 UTC
Radim Vansa <rvansa> updated the status of jira ISPN-3443 to Resolved

Comment 11 JBoss JIRA Server 2013-10-11 14:28:14 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-3443

My apologies, an old version got into the test.

Comment 12 Radim Vansa 2013-12-10 10:10:15 UTC
*** Bug 1024918 has been marked as a duplicate of this bug. ***