Bug 1017796

Summary: Inconsistent L1 in non-tx distributed cache
Product: [JBoss] JBoss Data Grid 6
Component: Infinispan
Version: 6.2.0
Status: VERIFIED
Severity: urgent
Priority: unspecified
Target Milestone: ER3
Target Release: 6.2.0
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Reporter: Radim Vansa <rvansa>
Assignee: Tristan Tarrant <ttarrant>
QA Contact: Martin Gencur <mgencur>
CC: jdg-bugs
Bug Blocks: 1010419

Description Radim Vansa 2013-10-10 14:15:30 UTC
When a change is replicated to the backup owner, it sends the InvalidateL1Command to backup owners before committing the entry in EntryWrappingInterceptor (it performs the WriteCommand in parallel with sending the invalidation command, but then waits until the invalidation request is acknowledged). If a GET is executed between the invalidation and the commit of the entry, the response contains the outdated value, and the L1 entry will not be invalidated until the next write operation.
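The interleaving described above can be replayed deterministically in a few lines. This is a hypothetical, single-threaded sketch: `L1RaceSketch`, its maps and the method name are illustrative assumptions rather than Infinispan code. It only shows why sending the invalidation before the commit leaves a window in which a GET can re-populate L1 with the old value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, single-threaded model of the race described above. The steps run
// in a fixed order to reproduce the bad interleaving deterministically; these
// classes and methods are illustrative and are not Infinispan APIs.
class L1RaceSketch {

    /**
     * Replays "invalidate L1, remote GET, commit" for one key and returns the
     * value left in the non-owner's L1 afterwards.
     */
    static String replayInvalidateBeforeCommit(String key, String oldValue, String newValue) {
        Map<String, String> ownerData = new HashMap<>();   // committed data on the owner
        Map<String, String> nonOwnerL1 = new HashMap<>();  // L1 cache on a non-owner

        ownerData.put(key, oldValue);
        nonOwnerL1.put(key, oldValue);                     // non-owner already caches the old value

        // 1. The write is being applied; the InvalidateL1Command goes out before the commit.
        nonOwnerL1.remove(key);

        // 2. A GET from the non-owner arrives before the commit, reads the old value
        //    and stores it in L1 again.
        nonOwnerL1.put(key, ownerData.get(key));

        // 3. The entry is committed; no further invalidation is sent for this write.
        ownerData.put(key, newValue);

        return nonOwnerL1.get(key);                        // stale until the next write
    }

    public static void main(String[] args) {
        System.out.println(replayInvalidateBeforeCommit("k", "v1", "v2")); // prints v1
    }
}
```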

Comment 2 JBoss JIRA Server 2013-10-11 04:21:02 UTC
William Burns <wburns> updated the status of jira ISPN-3617 to Coding In Progress

Comment 3 JBoss JIRA Server 2013-10-11 20:44:48 UTC
William Burns <wburns> made a comment on jira ISPN-3617

Side note: if we introduced a fix with ISPN-3426 so that L1 entries are only stored when they are retrieved from the owner, this wouldn't be an issue.

Comment 5 Tristan Tarrant 2013-10-18 11:19:22 UTC
Fixed in ER3.1

Comment 6 JBoss JIRA Server 2013-10-21 08:25:06 UTC
Radim Vansa <rvansa> updated the status of jira ISPN-3617 to Reopened

Comment 7 JBoss JIRA Server 2013-10-21 08:25:06 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-3617

I think there's another, highly related problem. When L1ManagerImpl builds the invalidation address list for L1LastChanceInterceptor, it removes the node where the request originated. However, since the origin might have executed another GET just before that (with the result cached on the origin), the origin stays in the requestor map but its entry is not invalidated.
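The concern can be illustrated with a toy requestor map. This is a hypothetical sketch modelled only on the description above: `InvalidationTargetsSketch`, `recordRemoteGet` and `targetsForWrite` are illustrative names and not the real L1ManagerImpl API. Nodes that performed a remote GET are registered per key; for a write, the origin is dropped from the invalidation list, so it remains registered while nothing in this path clears its L1 copy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of an L1 requestor map and of building the invalidation
// address list for a write; this is not the real L1ManagerImpl.
class InvalidationTargetsSketch {
    // key -> nodes that fetched the key remotely and may hold it in L1
    final Map<String, Set<String>> requestors = new HashMap<>();

    void recordRemoteGet(String key, String node) {
        requestors.computeIfAbsent(key, k -> new HashSet<>()).add(node);
    }

    /** Returns the nodes to invalidate for a write that originated on {@code origin}. */
    Set<String> targetsForWrite(String key, String origin) {
        Set<String> registered = requestors.getOrDefault(key, Collections.emptySet());
        Set<String> targets = new TreeSet<>(registered);
        targets.remove(origin);                     // the origin is excluded from the list

        // Invalidated nodes are forgotten; the excluded origin stays registered,
        // even though nothing here removes its (possibly stale) L1 entry.
        Set<String> remaining = new HashSet<>(registered);
        remaining.removeAll(targets);
        requestors.put(key, remaining);
        return targets;
    }

    public static void main(String[] args) {
        InvalidationTargetsSketch l1 = new InvalidationTargetsSketch();
        l1.recordRemoteGet("k", "A");
        l1.recordRemoteGet("k", "B");
        l1.recordRemoteGet("k", "origin");          // the origin's own earlier GET
        System.out.println(l1.targetsForWrite("k", "origin")); // [A, B]
        System.out.println(l1.requestors.get("k"));            // [origin]
    }
}
```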

There's a note about some kind of loop (which is why the origin is removed from the address list). Could you please elaborate on this?

Comment 8 JBoss JIRA Server 2013-10-21 15:22:14 UTC
William Burns <wburns> made a comment on jira ISPN-3617

If a non-owner updates a value for which it has an L1 entry, it removes the value from its own L1 cache. This isn't done through invalidation like other operations; it is done in L1NonTxInterceptor at line 238. Did you have a case where this wasn't done?
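The write path described above can be sketched as follows. This is a hypothetical model, not L1NonTxInterceptor: the remote call to the primary owner is reduced to a map update, and removing the local copy after the primary's response is an assumption of the sketch. The point is that the writer drops its own L1 copy directly instead of receiving an InvalidateL1Command.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a non-owner write that clears its own L1 copy locally.
// Class and method names are illustrative; this is not Infinispan code.
class NonOwnerWriteSketch {
    final Map<String, String> primaryData;              // stand-in for the primary owner
    final Map<String, String> l1 = new HashMap<>();     // this non-owner's L1

    NonOwnerWriteSketch(Map<String, String> primaryData) {
        this.primaryData = primaryData;
    }

    /** Forwards the write to the primary, then drops the local L1 copy directly. */
    String put(String key, String value) {
        String previous = primaryData.put(key, value);  // remote call modelled as a map update
        l1.remove(key);                                 // local removal, no InvalidateL1Command
        return previous;
    }

    public static void main(String[] args) {
        Map<String, String> primary = new HashMap<>();
        primary.put("k", "v1");
        NonOwnerWriteSketch node = new NonOwnerWriteSketch(primary);
        node.l1.put("k", "v1");
        node.put("k", "v2");
        System.out.println(node.l1.containsKey("k") + " " + primary.get("k")); // false v2
    }
}
```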

Comment 9 JBoss JIRA Server 2013-10-21 17:05:22 UTC
William Burns <wburns> made a comment on jira ISPN-3617

Actually I think this is related to ISPN-3273.  I have a guess as to what you may have seen.

My guess is that this operation happened:

# put ran on primary owner
# primary owner replicates to backup owner (not yet committed on primary) - backup owner invalidated the non owner L1
# get from non owner retrieves the old value from owner
# primary owner completes but doesn't invalidate the non owner since it was the one who updated

I will have to write up a test, but I am pretty sure this is what you saw.

Comment 10 JBoss JIRA Server 2013-10-21 17:08:14 UTC
William Burns <wburns> made a comment on jira ISPN-3617

Actually I think this is related to ISPN-3273.  I have a guess as to what you may have seen.

My guess is that this operation happened:

# put ran on primary owner
# primary owner replicates to backup owner (not yet committed on primary) - backup owner invalidated the non owner L1
# get from non owner retrieves the old value from owner
# primary owner completes but doesn't invalidate the non owner since it was the one who updated

I will have to write up a test, but I am pretty sure this is what you saw.

Comment 11 JBoss JIRA Server 2013-10-21 17:08:48 UTC
William Burns <wburns> made a comment on jira ISPN-3617

Actually I think this is related to ISPN-3273.  I have a guess as to what you may have seen.

My guess is that this operation happened:

# put originated from non owner sent to primary owner
# primary owner replicates to backup owner (not yet committed on primary) - backup owner invalidated the non owner L1
# get from non owner retrieves the old value from owner
# primary owner completes but doesn't invalidate the non owner since it was the one who updated

I will have to write up a test, but I am pretty sure this is what you saw.
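The four steps of this hypothesis can be replayed in a small deterministic sketch. Everything here is hypothetical: node names, maps and the `replay` method are illustrative, and a second non-owner is added only for contrast. The replay shows the origin of the put ending up with the old value in L1 while an unrelated reader is invalidated.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, deterministic replay of the four steps listed above, extended with a
// second non-owner for contrast. All names are illustrative; none of this is Infinispan code.
class Ispn3617InterleavingSketch {

    /** Returns each non-owner's L1 content for key "k" after the interleaving. */
    static Map<String, String> replay() {
        String key = "k";
        String origin = "nonOwner";          // node that issued the put
        String other = "otherNonOwner";      // unrelated reader, for contrast
        Map<String, String> committed = new HashMap<>();        // value owners serve to GETs
        Map<String, Map<String, String>> l1 = new HashMap<>();  // per-node L1 caches
        Set<String> requestors = new HashSet<>();               // nodes the primary will invalidate

        committed.put(key, "old");
        l1.put(origin, new HashMap<>());
        l1.put(other, new HashMap<>());
        l1.get(origin).put(key, "old");

        // 1. put(k, "new") originates on the non-owner and is sent to the primary owner.

        // 2. The primary replicates to the backup (not yet committed on the primary);
        //    the backup invalidates the non-owner's L1.
        l1.get(origin).remove(key);

        // 3. GETs from non-owners still see the old value and cache it in L1 again.
        for (String node : new String[] {origin, other}) {
            l1.get(node).put(key, committed.get(key));
            requestors.add(node);
        }

        // 4. The primary commits and invalidates requestors, skipping the put's origin.
        committed.put(key, "new");
        for (String node : requestors) {
            if (!node.equals(origin)) {
                l1.get(node).remove(key);
            }
        }

        Map<String, String> result = new HashMap<>();
        result.put(origin, l1.get(origin).get(key));  // "old": stale while owners hold "new"
        result.put(other, l1.get(other).get(key));    // null: invalidated
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> r = replay();
        System.out.println(r.get("nonOwner"));        // old
        System.out.println(r.get("otherNonOwner"));   // null
    }
}
```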

Comment 12 JBoss JIRA Server 2013-10-21 18:57:30 UTC
William Burns <wburns> updated the status of jira ISPN-3617 to Coding In Progress

Comment 13 JBoss JIRA Server 2013-10-22 06:02:46 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-3617

Right, after you pointed me to the invalidation performed upon the primary owner's response to the put, I've realized that the entry is about to be invalidated (and the log says so), but it is never committed: it is only wrapped for removal in the context, and the entry is not marked as removed.

Anyway, {{EntryFactoryImpl.createWrappedEntry}} has a {{forRemoval}} argument; why is it not honoured?

Comment 14 JBoss JIRA Server 2013-10-22 15:47:33 UTC
William Burns <wburns> made a comment on jira ISPN-3617

Radim, I think it is just an artifact of code that has since been refactored. The wrapping itself doesn't cause the entry to be removed; that is done through a command. I have changed the code to also send down an invalidation command after the entry is wrapped, which fixes the issue in my test.
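The distinction between wrapping and removing can be illustrated with a toy model. This is a hypothetical sketch: `StagedEntry`, `wrapForRemoval`, `invalidate` and `commit` are illustrative stand-ins, not Infinispan's EntryFactoryImpl or InvalidateL1Command. Wrapping only stages a per-invocation copy, and the L1 entry disappears on commit only if a command has marked the staged entry as removed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "wrapping for removal" versus actually removing an entry.
// StagedEntry and the methods below are illustrative, not Infinispan's
// EntryFactoryImpl or InvalidateL1Command.
class WrapThenInvalidateSketch {
    final Map<String, String> l1 = new HashMap<>();   // this node's L1 storage

    /** Per-invocation copy of an entry; changes reach l1 only on commit. */
    static final class StagedEntry {
        final String key;
        boolean removed;                                // set only by a command
        StagedEntry(String key) { this.key = key; }
    }

    StagedEntry wrapForRemoval(String key) {
        return new StagedEntry(key);                    // wrapping alone does not mark it removed
    }

    void invalidate(StagedEntry entry) {
        entry.removed = true;                           // the command marks the entry as removed
    }

    void commit(StagedEntry entry) {
        if (entry.removed) {
            l1.remove(entry.key);
        }
    }

    /** Returns what remains in L1 after the write response is processed. */
    String afterWriteResponse(String key, boolean sendInvalidation) {
        StagedEntry entry = wrapForRemoval(key);
        if (sendInvalidation) {
            invalidate(entry);                          // models the change described in comment 14
        }
        commit(entry);
        return l1.get(key);
    }

    public static void main(String[] args) {
        WrapThenInvalidateSketch node = new WrapThenInvalidateSketch();
        node.l1.put("k", "stale");
        System.out.println(node.afterWriteResponse("k", false)); // stale
        System.out.println(node.afterWriteResponse("k", true));  // null
    }
}
```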