847809 – Cluster with non-shared JDBC cache store has too many entries after node failure

Summary: Cluster with non-shared JDBC cache store has too many entries after node failure

Keywords:
Status:	ASSIGNED
Alias:	None
Product:	JBoss Data Grid 6
Classification:	JBoss
Component:	Infinispan
Sub Component:
Version:	unspecified
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	GA
Target Release:	7.0.0
Assignee:	Tristan Tarrant
QA Contact:	Martin Gencur
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-08-13 15:48 UTC by Radim Vansa
Modified:	2023-04-01 08:00 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Known Issue
Doc Text:	During a node restart, Red Hat JBoss Data Grid may not start correctly because of duplicate entries in a cache. As a workaround, use a shared cache store instead of local cache stores. Using this workaround, JBoss Data Grid works correctly across restarts and the cache store does not contain duplicate entries.
Clone Of:
Environment:
Last Closed:
Type:	Bug
Embargoed:

Attachments
JDG standalone configuration file (14.34 KB, text/xml) 2012-08-13 15:48 UTC, Radim Vansa
Trace output (1.02 MB, text/plain) 2012-08-14 14:21 UTC, Radim Vansa

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	ISPN-2198	0	Major	Resolved	Cluster with non-shared JDBC cache store has too much entries after node failure	2016-03-31 17:59:53 UTC

Description Radim Vansa 2012-08-13 15:48:49 UTC

Created attachment 604033
JDG standalone configuration file

Description of problem:

In a resilience test with a 4-node cluster in which one node is killed, a strange situation appears. Before the node is killed, the nodes hold this number of entries:

210602;215820;209400;203038 = 838860 entries

After the kill the number of entries changes for a while:

210602;null;209400;203038
250602;null;269400;243038
290602;null;269400;273038
300602;null;289400;293038
300602;null;289400;293038
321218;null;296035;293038

But then it stabilizes at:

326899;null;305039;314165 = 946103 entries
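
The totals quoted above can be checked with a short script. This is only a sketch of the arithmetic: the helper names `parse_counts` and `cluster_total` are made up for illustration, and the only assumption is that each sample is a semicolon-separated row with one field per node, where `null` marks the killed node.

```python
def parse_counts(row):
    """Split one sample row into per-node counts; 'null' becomes None."""
    return [None if field == "null" else int(field) for field in row.split(";")]


def cluster_total(row):
    """Sum the entry counts of the nodes that reported a value."""
    return sum(count for count in parse_counts(row) if count is not None)


before = cluster_total("210602;215820;209400;203038")
after = cluster_total("326899;null;305039;314165")

# Total before the kill, total after stabilization, and the difference.
print(before, after, after - before)  # 838860 946103 107243
```

With a static key set, the cluster-wide total would be expected to return to its earlier value once the remaining nodes have rebalanced; here it ends up 107243 entries higher.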

When node02 is restarted, it complains about duplicate entries:

ERROR [org.infinispan.loaders.jdbc.stringbased.JdbcStringBasedCacheStore] (OOB-124,null) ISPN008024: Error while storing string key to database; key: '8Az4Ia2V5NzYzNDI=', buffer size of value: 1050 bytes: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Duplicate entry '?8Az4Ia2V5NzYzNDI=' for key 'PRIMARY'

Is this a bug or a wrong configuration?
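
For readers unfamiliar with the MySQL message: `for key 'PRIMARY'` names the violated index (the table's primary key), and the duplicated value is the one quoted after `Duplicate entry`. The sketch below reproduces the same class of error with Python's built-in `sqlite3` module instead of MySQL over JDBC. The table name `cache_store` and its columns are hypothetical, and the example does not claim to show how `JdbcStringBasedCacheStore` issues its statements. It only shows that a plain insert of a key already present in the table is rejected, while an insert-or-replace is not.

```python
import sqlite3

# Hypothetical table layout; the real store in this report is MySQL via JDBC.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache_store (id TEXT PRIMARY KEY, data BLOB)")


def plain_insert(key, value):
    """Return True if the row was stored, False on a primary-key violation."""
    try:
        conn.execute("INSERT INTO cache_store (id, data) VALUES (?, ?)", (key, value))
        return True
    except sqlite3.IntegrityError:
        return False


def upsert(key, value):
    """Store the row, replacing any existing row with the same key."""
    conn.execute(
        "INSERT OR REPLACE INTO cache_store (id, data) VALUES (?, ?)", (key, value)
    )


key = "8Az4Ia2V5NzYzNDI="
first = plain_insert(key, b"old value")   # key not yet present: stored
second = plain_insert(key, b"new value")  # same key again: rejected
upsert(key, b"new value")                 # replaces the existing row
rows = conn.execute("SELECT COUNT(*) FROM cache_store").fetchone()[0]
print(first, second, rows)  # True False 1
```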


Version-Release number of selected component (if applicable):

6.0.1.ER2

Comment 1 JBoss JIRA Server 2012-08-14 11:34:14 UTC

Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2198

Hmmm, odd. The key should be '8Az4Ia2V5NzYzNDI=', but the exception says that the key is 'PRIMARY'?

@Radim, can you replicate this on a smaller scale and generate some logs with TRACE on for the org.infinispan package?

Tristan/Mircea, can either of you have a look at this?

Comment 2 JBoss JIRA Server 2012-08-14 14:06:05 UTC

Radim Vansa <rvansa> made a comment on jira ISPN-2198

I have used only 26 entries (with 2 owners each) and one client asking for the entries (still there's enough jabber).
The sfout.txt contains the org.infinispan TRACE log (together with the test log); the cache_entries file shows that originally the cluster had 18+10+13+11=52 entries and after the kill it has 22+16+20=58.

Comment 3 JBoss JIRA Server 2012-08-14 14:06:51 UTC

Radim Vansa <rvansa> made a comment on jira ISPN-2198

I have used only 26 entries (with 2 owners each) and one client asking for the entries (still there's enough jabber).
The sfout.txt contains the org.infinispan TRACE log (together with the test log); cache_entries.csv shows that originally the cluster had 18+10+13+11=52 entries and after the kill it has 22+16+20=58.
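
The numbers in this smaller reproduction can be compared against what the key set and owner count predict. A minimal sketch, assuming each key is stored on exactly as many nodes as it has owners, capped by the number of live nodes; `expected_total` is a hypothetical helper, not an Infinispan API.

```python
def expected_total(unique_keys, num_owners, live_nodes):
    """Cluster-wide entry count if every key sits on exactly its owners."""
    return unique_keys * min(num_owners, live_nodes)


observed_before = sum([18, 10, 13, 11])  # four nodes, before the kill
observed_after = sum([22, 16, 20])       # three nodes, after the kill

expected_before = expected_total(26, 2, 4)
expected_after = expected_total(26, 2, 3)

# Surplus entries relative to the expected count, before and after the kill.
print(observed_before - expected_before, observed_after - expected_after)  # 0 6
```

Under that assumption the cluster should hold 52 entries both before and after the kill, so the 58 reported afterwards correspond to 6 surplus entries.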

Comment 4 Radim Vansa 2012-08-14 14:21:56 UTC

Created attachment 604320
Trace output

Comment 5 JBoss JIRA Server 2012-08-17 13:52:29 UTC

Mircea Markus <mmarkus> made a comment on jira ISPN-2198

Couldn't reproduce the issue locally through a unit test. Waiting for Radim to upload the trace log files from his environment.

Comment 6 JBoss JIRA Server 2012-08-17 14:12:27 UTC

Radim Vansa <rvansa> made a comment on jira ISPN-2198

As requested by mmarkus, I enclose more logs.

Comment 7 JBoss JIRA Server 2012-08-20 10:25:57 UTC

Mircea Markus <mmarkus> made a comment on jira ISPN-2198

@Radim - the attached log files were produced with DEBUG level enabled. This is not good enough for me, as it doesn't show the individual keys added to the cache. Can you please reproduce with TRACE level?

Comment 8 JBoss JIRA Server 2012-08-20 23:08:19 UTC

Mircea Markus <mmarkus> made a comment on jira ISPN-2198

Looking at the attached logs I can see that a put takes place on node 4 *at the same time* as the other server (number 2) is shut down (time 12:51:17,827).
My understanding of the problem is that there's no (put) activity *during* and after the shutdown - otherwise the increasing number of entries might simply be explained by the addition of more entries to the system. Can you please confirm this?
I also didn't see any size() being invoked on all the caches in the cluster (e.g. on node 3) - how was the size of each individual cache obtained?

Comment 9 JBoss JIRA Server 2012-08-21 07:16:48 UTC

Radim Vansa <rvansa> made a comment on jira ISPN-2198

The client thread is doing puts and gets all the time; however, the set of keys it uses is static, so no other keys should be added to the cache.
The statistics are obtained through JMX on jboss.infinispan:type=Cache,name="testCache(dist_sync)",manager="default",component=Statistics, querying the attribute numberOfEntries.

Comment 10 mark yarborough 2012-08-22 13:37:24 UTC

Moving to 6.1 since this is not a regression.

Comment 11 mark yarborough 2012-08-22 13:37:24 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
CCFR from mmarkus

Comment 12 Mircea Markus 2012-08-22 14:55:52 UTC

The root cause of this problem still needs to be analysed. As a workaround, using a shared cache store (instead of local cache stores) should work.

Comment 15 Misha H. Ali 2013-05-07 03:44:03 UTC

Set flag to nominate for 6.2 release notes.

Comment 17 Misha H. Ali 2014-07-14 09:17:03 UTC

Not required for release notes.

Comment 18 JBoss JIRA Server 2015-06-03 13:22:58 UTC

Dan Berindei <dberinde> updated the status of jira ISPN-2198 to Resolved
