Bug 1017796 - Inconsistent L1 in non-tx distributed cache
Status: VERIFIED
Product: JBoss Data Grid 6
Classification: JBoss
Component: Infinispan
Version: 6.2.0
Hardware: Unspecified, OS: Unspecified
Priority: unspecified, Severity: urgent
Target Milestone: ER3
Target Release: 6.2.0
Assigned To: Tristan Tarrant
QA Contact: Martin Gencur
Depends On:
Blocks: 1010419
Reported: 2013-10-10 10:15 EDT by Radim Vansa
Modified: 2014-04-28 11:39 EDT (History)
Doc Type: Bug Fix
Type: Bug
External Trackers
Tracker: JBoss Issue Tracker
ID: ISPN-3617
Priority: Critical
Status: Resolved
Summary: Inconsistent L1 in non-tx distributed cache
Last Updated: 2013-11-20 06:25:42 EST

Description Radim Vansa 2013-10-10 10:15:30 EDT
When the change is replicated to the backup owner, it sends the InvalidateL1Command to backup owners before committing the entry in EntryWrappingInterceptor (it performs the WriteCommand in parallel with sending the invalidation command, but then it waits until the invalidation request is acked). If a GET is executed between the invalidation and the commit of the entry, the response contains an outdated result and the L1 will not be invalidated until the next write operation.
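
The window described above can be replayed deterministically with a toy model. This is only a sketch under stated assumptions: `L1StaleReadSketch` and all of its methods are hypothetical names, the two maps stand in for an owner's data container and a non-owner's L1, and none of this is Infinispan code.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, NOT Infinispan code: one owner store and one L1 cache on a non-owner.
public class L1StaleReadSketch {
    final Map<String, String> ownerStore = new HashMap<>();
    final Map<String, String> requestorL1 = new HashMap<>();

    // Owner side: drop the requestor's L1 copy (stands in for InvalidateL1Command).
    void invalidateL1(String key) {
        requestorL1.remove(key);
    }

    // Owner side: make the new value visible (stands in for committing the entry).
    void commit(String key, String value) {
        ownerStore.put(key, value);
    }

    // Non-owner side: remote GET that caches the owner's response in L1.
    String remoteGet(String key) {
        String value = ownerStore.get(key);
        requestorL1.put(key, value);
        return value;
    }

    // Replays the interleaving from the description: invalidate, GET, commit.
    static L1StaleReadSketch replay() {
        L1StaleReadSketch s = new L1StaleReadSketch();
        s.commit("k", "old");
        s.remoteGet("k");      // L1 now holds "old"
        s.invalidateL1("k");   // write in flight: the invalidation goes out first
        s.remoteGet("k");      // GET lands in the window and re-caches "old"
        s.commit("k", "new");  // the commit happens only after the GET
        return s;
    }

    public static void main(String[] args) {
        L1StaleReadSketch s = replay();
        // prints "owner=new L1=old"
        System.out.println("owner=" + s.ownerStore.get("k") + " L1=" + s.requestorL1.get("k"));
    }
}
```

In this model nothing invalidates the L1 copy again until another write arrives, which matches the symptom reported here.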
Comment 2 JBoss JIRA Server 2013-10-11 00:21:02 EDT
William Burns <wburns@redhat.com> updated the status of jira ISPN-3617 to Coding In Progress
Comment 3 JBoss JIRA Server 2013-10-11 16:44:48 EDT
William Burns <wburns@redhat.com> made a comment on jira ISPN-3617

Side note: if we introduced a fix with ISPN-3426 so that we only put L1 entries when they are retrieved from the owner, this wouldn't be an issue.
Comment 5 Tristan Tarrant 2013-10-18 07:19:22 EDT
Fixed in ER3.1
Comment 6 JBoss JIRA Server 2013-10-21 04:25:06 EDT
Radim Vansa <rvansa@redhat.com> updated the status of jira ISPN-3617 to Reopened
Comment 7 JBoss JIRA Server 2013-10-21 04:25:06 EDT
Radim Vansa <rvansa@redhat.com> made a comment on jira ISPN-3617

I think there's another, highly related problem. When the L1ManagerImpl builds the invalidation address list for L1LastChanceInterceptor, it removes the node where the request originated. However, as the origin might have executed another GET just before that (with the result cached on the origin), the origin would stay in the requestor map but its entry would not be invalidated.

There's a note about some kind of loop (and avoiding it is the purpose of removing the origin from the address list). Could you please elaborate a bit more on this?
Comment 8 JBoss JIRA Server 2013-10-21 11:22:14 EDT
William Burns <wburns@redhat.com> made a comment on jira ISPN-3617

If a non owner updates a value for which it has an L1 value, it removes the value from its own L1 cache.  It isn't done through invalidation like other operations.  This is done in L1NonTxInterceptor at line 238.  Did you have a case where this wasn't done?
Comment 9 JBoss JIRA Server 2013-10-21 13:05:22 EDT
William Burns <wburns@redhat.com> made a comment on jira ISPN-3617

Actually I think this is related to ISPN-3273.  I have a guess as to what you may have seen.

My guess is that this operation happened:

# put ran on primary owner
# primary owner replicates to backup owner (not yet committed on primary) - backup owner invalidated on non owner
# get from non owner retrieves the old value from owner
# primary owner completes but doesn't invalidate the non owner since it was the one who updated

I will have to write up a test, but I am pretty sure this is what you saw.
Comment 10 JBoss JIRA Server 2013-10-21 13:08:14 EDT
William Burns <wburns@redhat.com> made a comment on jira ISPN-3617

Actually I think this is related to ISPN-3273.  I have a guess as to what you may have seen.

My guess is that this operation happened:

# put ran on primary owner
# primary owner replicates to backup owner (not yet committed on primary) - backup owner invalidated the non owner L1
# get from non owner retrieves the old value from owner
# primary owner completes but doesn't invalidate the non owner since it was the one who updated

I will have to write up a test, but I am pretty sure this is what you saw.
Comment 11 JBoss JIRA Server 2013-10-21 13:08:48 EDT
William Burns <wburns@redhat.com> made a comment on jira ISPN-3617

Actually I think this is related to ISPN-3273.  I have a guess as to what you may have seen.

My guess is that this operation happened:

# put originated from non owner sent to primary owner
# primary owner replicates to backup owner (not yet committed on primary) - backup owner invalidated the non owner L1
# get from non owner retrieves the old value from owner
# primary owner completes but doesn't invalidate the non owner since it was the one who updated

I will have to write up a test, but I am pretty sure this is what you saw.
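
The four steps above can be walked through in a small deterministic replay. This is a sketch only: `OriginSkippedSketch`, the node names "A" (the non owner that issues the put) and "C" (another non owner), and the single-key model are all assumptions made for illustration and do not mirror the actual L1ManagerImpl code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy single-key model, NOT Infinispan code.
public class OriginSkippedSketch {
    String ownerValue = "old";                              // value visible on the owners
    final Map<String, String> l1 = new HashMap<>();         // node name -> its L1 copy
    final Set<String> requestors = new LinkedHashSet<>();   // nodes the owners must invalidate

    // Remote GET from a non owner: registers the requestor and caches the value in its L1.
    String get(String node) {
        requestors.add(node);
        l1.put(node, ownerValue);
        return ownerValue;
    }

    // Builds the invalidation address list with the originating node removed.
    List<String> targetsExcluding(String origin) {
        List<String> targets = new ArrayList<>(requestors);
        targets.remove(origin);
        return targets;
    }

    // Invalidates the L1 copies on the given nodes and forgets them as requestors.
    void invalidate(List<String> targets) {
        for (String node : targets) {
            l1.remove(node);
        }
        requestors.removeAll(targets);
    }

    static OriginSkippedSketch replay() {
        OriginSkippedSketch s = new OriginSkippedSketch();
        s.get("A");
        s.get("C");
        // 1. put originates on non owner A and is sent to the primary owner (no state change yet)
        // 2. backup owner invalidates the L1 of every requestor, including A
        s.invalidate(new ArrayList<>(s.requestors));
        // 3. GETs from non owners still see the old value, since the primary has not committed
        s.get("A");
        s.get("C");
        // 4. primary completes and invalidates requestors, but skips the origin A
        s.ownerValue = "new";
        s.invalidate(s.targetsExcluding("A"));
        return s;
    }

    public static void main(String[] args) {
        OriginSkippedSketch s = replay();
        // prints "A=old C=null requestors=[A]"
        System.out.println("A=" + s.l1.get("A") + " C=" + s.l1.get("C") + " requestors=" + s.requestors);
    }
}
```

In this model C is cleaned up by the primary's invalidation, while A keeps the stale value and stays in the requestor set.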
Comment 12 JBoss JIRA Server 2013-10-21 14:57:30 EDT
William Burns <wburns@redhat.com> updated the status of jira ISPN-3617 to Coding In Progress
Comment 13 JBoss JIRA Server 2013-10-22 02:02:46 EDT
Radim Vansa <rvansa@redhat.com> made a comment on jira ISPN-3617

Right. After you pointed me to the invalidation upon the response to the put from the primary, I realized that the entry is about to be invalidated (and the log says so), but the removal is not committed: the entry is just wrapped for removal in the context, and it is not marked as removed.

Anyway, the {{EntryFactoryImpl.createWrappedEntry}} has a {{forRemoval}} argument - why is this not honoured?
Comment 14 JBoss JIRA Server 2013-10-22 11:47:33 EDT
William Burns <wburns@redhat.com> made a comment on jira ISPN-3617

Radim, I think it is just an artifact of code that has since been refactored.  The wrapping itself doesn't cause the entry to be removed; that is done through a command.  I have changed the code to also send down an invalidation command after the entry is wrapped, which fixes the issue in my test.
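
The difference between wrapping an entry and actually removing it can be illustrated with a toy context. This is a sketch under assumptions: `WrapVersusRemoveSketch`, `Wrapped`, and every method here are hypothetical and only mimic the idea that a commit applies a removal solely for entries a command has marked as removed; it is not the real EntryFactoryImpl or interceptor code.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, NOT Infinispan code: wrapping alone never removes anything.
public class WrapVersusRemoveSketch {
    // A wrapped entry held in the invocation context.
    static final class Wrapped {
        final String key;
        boolean removed; // set only by a command, never by the wrapping itself
        Wrapped(String key) { this.key = key; }
    }

    final Map<String, String> l1 = new HashMap<>();
    final Map<String, Wrapped> context = new HashMap<>();

    // Puts the entry into the context; the removed flag stays false.
    void wrapForRemoval(String key) {
        context.put(key, new Wrapped(key));
    }

    // Stands in for the invalidation command that marks the wrapped entry as removed.
    void invalidateCommand(String key) {
        Wrapped w = context.get(key);
        if (w != null) {
            w.removed = true;
        }
    }

    // Commit applies removals only for entries that were marked as removed.
    void commit() {
        for (Wrapped w : context.values()) {
            if (w.removed) {
                l1.remove(w.key);
            }
        }
        context.clear();
    }

    // Returns true if the stale L1 entry is still present after the commit.
    static boolean staleEntrySurvives(boolean sendInvalidation) {
        WrapVersusRemoveSketch s = new WrapVersusRemoveSketch();
        s.l1.put("k", "old");
        s.wrapForRemoval("k");
        if (sendInvalidation) {
            s.invalidateCommand("k");
        }
        s.commit();
        return s.l1.containsKey("k");
    }

    public static void main(String[] args) {
        System.out.println("wrap only: stale=" + staleEntrySurvives(false));          // stale=true
        System.out.println("wrap + invalidation: stale=" + staleEntrySurvives(true)); // stale=false
    }
}
```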
