Bug 1017796 - Inconsistent L1 in non-tx distributed cache
Summary: Inconsistent L1 in non-tx distributed cache
Keywords:
Status: VERIFIED
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: Infinispan
Version: 6.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ER3
Target Release: 6.2.0
Assignee: Tristan Tarrant
QA Contact: Martin Gencur
URL:
Whiteboard:
Depends On:
Blocks: 1010419
Reported: 2013-10-10 14:15 UTC by Radim Vansa
Modified: 2023-03-02 08:27 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | ISPN-3617 | 0 | Critical | Resolved | Inconsistent L1 in non-tx distributed cache | 2013-11-20 11:25:42 UTC

Description Radim Vansa 2013-10-10 14:15:30 UTC
When the change is replicated to the backup owner, it sends the InvalidateL1Command to backup owners before committing the entry in EntryWrappingInterceptor (it performs the WriteCommand in parallel with sending the invalidation command, but then it waits until the invalidation request gets acked). If a GET is executed between the invalidation and committing the entry, the response contains an outdated result, and the L1 will not be invalidated until the next write operation.
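
The window described above can be illustrated with a minimal, deterministic sketch. It is a toy model and not Infinispan code: the class `L1RaceSketch`, its two maps, and the hook parameter are hypothetical names introduced only for illustration, and the real interceptor chain is assumed to be far more involved.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not Infinispan code): one owner store plus one requestor's L1 cache. */
final class L1RaceSketch {
    final Map<String, String> ownerStore = new HashMap<>();
    final Map<String, String> l1 = new HashMap<>();

    /** Remote GET issued by the non-owner: reads the owner and caches the result in L1. */
    String remoteGet(String key) {
        String value = ownerStore.get(key);
        if (value != null) {
            l1.put(key, value);
        }
        return value;
    }

    /** Write on the owner, with a hook that runs between invalidation and commit. */
    void write(String key, String value, Runnable betweenInvalidationAndCommit) {
        l1.remove(key);                      // stand-in for the InvalidateL1Command
        betweenInvalidationAndCommit.run();  // the window described in the report
        ownerStore.put(key, value);          // stand-in for committing the entry
    }

    public static void main(String[] args) {
        L1RaceSketch s = new L1RaceSketch();
        s.ownerStore.put("k", "v1");
        s.remoteGet("k");                            // L1 now holds v1
        s.write("k", "v2", () -> s.remoteGet("k"));  // GET lands inside the window
        // The owner holds v2 while the L1 copy is v1 again, with no invalidation pending.
        System.out.println("owner=" + s.ownerStore.get("k") + " l1=" + s.l1.get("k"));
    }
}
```

In this model the stale L1 copy survives until another write to the same key triggers a new invalidation, which matches the symptom in the report.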

Comment 2 JBoss JIRA Server 2013-10-11 04:21:02 UTC
William Burns <wburns> updated the status of jira ISPN-3617 to Coding In Progress

Comment 3 JBoss JIRA Server 2013-10-11 20:44:48 UTC
William Burns <wburns> made a comment on jira ISPN-3617

Side note: if we introduced a fix with ISPN-3426 so that L1 entries are only stored when retrieved from the owner, this wouldn't be an issue.

Comment 5 Tristan Tarrant 2013-10-18 11:19:22 UTC
Fixed in ER3.1

Comment 6 JBoss JIRA Server 2013-10-21 08:25:06 UTC
Radim Vansa <rvansa> updated the status of jira ISPN-3617 to Reopened

Comment 7 JBoss JIRA Server 2013-10-21 08:25:06 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-3617

I think there's another, closely related problem. When the L1ManagerImpl builds the invalidation address list for L1LastChanceInterceptor, it removes the node where the request originated. However, the origin might have executed another GET just before that (with the result cached on the origin); in that case the origin stays in the requestor map but its entry is not invalidated.

There's a note about some kind of loop (avoiding it is the reason the origin is removed from the address list). Could you please elaborate on this?
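
The address-list behaviour raised in this comment can be sketched roughly as follows. This is an assumption-laden illustration, not the L1ManagerImpl source: `RequestorSketch`, `recordRequestor`, and `invalidationTargets` are hypothetical names, and node addresses are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy requestor map (hypothetical): tracks which nodes may hold a key in L1. */
final class RequestorSketch {
    private final Map<String, Set<String>> requestors = new HashMap<>();

    /** Remembers that a node fetched the key and may have cached it in L1. */
    void recordRequestor(String key, String node) {
        requestors.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(node);
    }

    /** Nodes that would be sent an invalidation, with the write's origin excluded. */
    List<String> invalidationTargets(String key, String origin) {
        List<String> targets =
                new ArrayList<>(requestors.getOrDefault(key, Collections.emptySet()));
        targets.remove(origin);  // the origin is skipped, even if it cached the key again
        return targets;
    }

    public static void main(String[] args) {
        RequestorSketch r = new RequestorSketch();
        r.recordRequestor("k", "A");  // A is also the node that issues the write
        r.recordRequestor("k", "B");
        r.recordRequestor("k", "C");
        System.out.println(r.invalidationTargets("k", "A"));  // prints [B, C]
    }
}
```

If node A re-caches the key through a GET just before the list is built, nothing in this model invalidates that copy, which is the concern raised above.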

Comment 8 JBoss JIRA Server 2013-10-21 15:22:14 UTC
William Burns <wburns> made a comment on jira ISPN-3617

If a non owner updates a value for which it has an L1 value, it removes the value from its own L1 cache. It isn't done through invalidation like other operations. This is done in L1NonTxInterceptor at line 238. Did you have a case where this wasn't done?

Comment 9 JBoss JIRA Server 2013-10-21 17:05:22 UTC
William Burns <wburns> made a comment on jira ISPN-3617

Actually I think this is related to ISPN-3273.  I have a guess as to what you may have seen.

My guess is that this operation happened:

# put ran on primary owner
# primary owner replicates to backup owner (not yet committed on primary) - backup owner invalidated on non owner
# get from non owner retrieves the old value from owner
# primary owner completes but doesn't invalidate the non owner since it was the one who updated

I will have to write up a test, but I am pretty sure this is what you saw.

Comment 10 JBoss JIRA Server 2013-10-21 17:08:14 UTC
William Burns <wburns> made a comment on jira ISPN-3617

Actually I think this is related to ISPN-3273.  I have a guess as to what you may have seen.

My guess is that this operation happened:

# put ran on primary owner
# primary owner replicates to backup owner (not yet committed on primary) - backup owner invalidated the non owner L1
# get from non owner retrieves the old value from owner
# primary owner completes but doesn't invalidate the non owner since it was the one who updated

I will have to write up a test, but I am pretty sure this is what you saw.

Comment 11 JBoss JIRA Server 2013-10-21 17:08:48 UTC
William Burns <wburns> made a comment on jira ISPN-3617

Actually I think this is related to ISPN-3273.  I have a guess as to what you may have seen.

My guess is that this operation happened:

# put originated from non owner sent to primary owner
# primary owner replicates to backup owner (not yet committed on primary) - backup owner invalidated the non owner L1
# get from non owner retrieves the old value from owner
# primary owner completes but doesn't invalidate the non owner since it was the one who updated

I will have to write up a test, but I am pretty sure this is what you saw.
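
The four steps guessed at above can be replayed in a small deterministic model. This is only a sketch under stated assumptions: `StaleL1Replay` and its three maps are hypothetical, the backup is assumed to commit at the moment it invalidates the non-owner, and the skipped invalidation for the originator is modelled simply by doing nothing in step 4.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy replay (not Infinispan code) of the put/get interleaving from the comment. */
final class StaleL1Replay {
    final Map<String, String> primary = new HashMap<>();
    final Map<String, String> backup = new HashMap<>();
    final Map<String, String> nonOwnerL1 = new HashMap<>();

    /** Replays the four steps for a put of newValue that originates on the non-owner. */
    void replay(String key, String newValue) {
        // 1. The put originates on the non-owner and reaches the primary (nothing committed yet).
        // 2. The primary replicates to the backup, which invalidates the non-owner's L1.
        backup.put(key, newValue);  // assumption: the backup commits at this point
        nonOwnerL1.remove(key);
        // 3. A GET from the non-owner reads the primary, which still holds the old value.
        String seen = primary.get(key);
        if (seen != null) {
            nonOwnerL1.put(key, seen);
        }
        // 4. The primary commits; the originator is skipped, so no invalidation follows.
        primary.put(key, newValue);
    }

    public static void main(String[] args) {
        StaleL1Replay r = new StaleL1Replay();
        r.primary.put("k", "v1");
        r.backup.put("k", "v1");
        r.nonOwnerL1.put("k", "v1");
        r.replay("k", "v2");
        System.out.println("primary=" + r.primary.get("k")
                + " backup=" + r.backup.get("k")
                + " l1=" + r.nonOwnerL1.get("k"));
    }
}
```

Both owners end up with the new value while the non-owner keeps serving the old one from L1, which is the inconsistency the test mentioned above would need to reproduce.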

Comment 12 JBoss JIRA Server 2013-10-21 18:57:30 UTC
William Burns <wburns> updated the status of jira ISPN-3617 to Coding In Progress

Comment 13 JBoss JIRA Server 2013-10-22 06:02:46 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-3617

Right. After you pointed me to the invalidation that happens on the response to the put from the primary, I realized that the entry is about to be invalidated (and the log says so), but the invalidation is never committed: the entry is only wrapped for removal in the context and is not marked as removed.

Anyway, {{EntryFactoryImpl.createWrappedEntry}} has a {{forRemoval}} argument. Why is this not honoured?

Comment 14 JBoss JIRA Server 2013-10-22 15:47:33 UTC
William Burns <wburns> made a comment on jira ISPN-3617

Radim, I think it is just an artifact of code that has since been refactored. The wrapping itself doesn't cause the entry to be removed; that is done through a command. I have changed the code to also send down an invalidation command after the entry is wrapped, which fixes the issue in my test.
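
The distinction drawn here between wrapping an entry and actually removing it can be sketched as follows. This is a hypothetical model, not the EntryFactoryImpl or command code: `WrapForRemovalSketch`, `WrappedEntry`, and `applyInvalidation` are invented names, and the `removed` flag stands in for whatever state the real invalidation command sets.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical): wrapping alone does not delete, a command must mark the entry. */
final class WrapForRemovalSketch {
    /** Context entry; commit only deletes the key when `removed` has been set. */
    static final class WrappedEntry {
        final String key;
        boolean removed;

        WrappedEntry(String key) {
            this.key = key;
        }
    }

    final Map<String, String> l1 = new HashMap<>();

    /** Wraps the key in the context; note that the entry is not marked as removed. */
    WrappedEntry wrapForRemoval(String key) {
        return new WrappedEntry(key);
    }

    /** Stand-in for the invalidation command sent down after wrapping. */
    void applyInvalidation(WrappedEntry entry) {
        entry.removed = true;
    }

    /** Commits the context entry: the key is dropped only if it was marked as removed. */
    void commit(WrappedEntry entry) {
        if (entry.removed) {
            l1.remove(entry.key);
        }
    }

    public static void main(String[] args) {
        WrapForRemovalSketch s = new WrapForRemovalSketch();
        s.l1.put("k", "v1");
        WrappedEntry e = s.wrapForRemoval("k");
        s.commit(e);
        System.out.println("after wrap only: " + s.l1.containsKey("k"));
        s.applyInvalidation(e);
        s.commit(e);
        System.out.println("after invalidation: " + s.l1.containsKey("k"));
    }
}
```

In this model the first commit leaves the stale L1 entry in place, and only the explicit invalidation step makes the second commit remove it.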

