Bug 1122662

Summary: Dist Cache with each node having a separate JDBC cache store does not work
Product: JBoss Data Grid 6
Component: Infinispan
Version: 6.2.1
Reporter: John Osborne <josborne>
Assignee: Tristan Tarrant <ttarrant>
QA Contact: Martin Gencur <mgencur>
CC: afield, dberinde, dmehra, jdg-bugs, wfink
Status: CLOSED NOTABUG
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-09-03 11:35:20 UTC

Attachments: mvn projects and server.logs

Description John Osborne 2014-07-23 17:40:08 UTC
Created attachment 920324 [details]
mvn projects and server.logs

I am running a 2 node DIST Cache (also tested 4 nodes) in which each node has access to its own JDBC Cache Store.

node 1 has 3.5M entries with the key range (1,3500000)
node 2 has 3.5M entries with the key range (3500001,7000000)

Each node starts and preloads 3.5M keys - correct

Sometimes the nodes do slight rebalancing and sometimes they do not rebalance.

However, after the startup, cache.size() on each node returns 3500000 not 7000000.

I have a servlet that gets the first 100 entries (key range 1-100).

If I run the servlet on the first node it will work. If I run the servlet on the second node it will return null for all 100 entries since both nodes only seem to see the entries they preloaded.
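For reference, the lookup boils down to something like the following sketch (the key format and cache name here are assumptions; the actual servlet code is in the attachment):

import org.infinispan.Cache;
import org.infinispan.manager.EmbeddedCacheManager;

// Hypothetical sketch of the check: look up keys 1..100 and count the misses.
// Key type (String) and cache name ("distCache") are assumptions, not taken from the attachment.
public class FirstHundredCheck {
    public static void check(EmbeddedCacheManager manager) {
        Cache<String, byte[]> cache = manager.getCache("distCache");
        int misses = 0;
        for (int key = 1; key <= 100; key++) {
            if (cache.get(String.valueOf(key)) == null) {
                misses++;
            }
        }
        System.out.println("Missing entries in key range 1-100: " + misses);
    }
}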

It's acting as if each node is its own cache, even though it's in DIST mode and sometimes they do some sort of rebalance.

I am attaching the code/configs and log files.  I have this running on a VM inside the RH network, so please ask if you'd like to see a demo of this issue.

Comment 2 Misha H. Ali 2014-07-23 22:58:24 UTC
Not sure if this is a docs issue. Copying in Divya. Divya, could you comment on whether this should be a dev issue (which is what it looks like to me) rather than just something we need to fix in docs?

Comment 4 Martin Gencur 2014-07-24 09:36:19 UTC
John,
the description suggests that each node has its own JDBC cache store defined. Does it mean that they use a "shared" JDBC cache store and all nodes load the data from a single database? Or are there two databases (each node loading from a different database)? In that case, the shared="true" attribute (which I've seen in the configuration) would not make sense. 
Thanks

Comment 5 Martin Gencur 2014-07-24 09:45:52 UTC
John, after taking another look at the configuration, I think I found the problem.
You have fetchInMemoryState="false", which means that a node that starts up does not fetch in-memory state from the other nodes, so it only has what was loaded from the JDBC cache store. And since cache.size() only returns the number of elements stored on that single node, you see only the entries you loaded from the database. However, when you call a get() operation, it should load the entry from the other nodes, so you should not get null. Can you confirm that setting fetchInMemoryState back to the default "true" fixes your problem?
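For illustration, the programmatic equivalent of that setting looks roughly like this, assuming the Infinispan 6.x ConfigurationBuilder API (the attached project uses declarative XML, so treat this only as a sketch):

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

public class DistConfigSketch {
    // Distributed cache that fetches in-memory state from the other nodes on startup.
    public static Configuration distWithStateTransfer() {
        return new ConfigurationBuilder()
                .clustering()
                    .cacheMode(CacheMode.DIST_SYNC)
                    .stateTransfer()
                        .fetchInMemoryState(true) // the default; "false" is what the attached config had
                .build();
    }
}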

Comment 6 Martin Gencur 2014-07-24 09:48:23 UTC
Note that keySet() operation in library mode returns only the locally stored entries. This operation does not fetch (AFAIK) entries from other nodes.
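A short illustration of that difference (key and cache types are assumed):

import org.infinispan.Cache;

public class LocalVsRemoteLookup {
    public static void show(Cache<String, byte[]> cache) {
        // In library mode, keySet()/size() only reflect the entries held locally on this node.
        int localOnlyView = cache.keySet().size();

        // get() can go to the owning node (or the cache store) when the key is not local.
        byte[] value = cache.get("3500001");

        System.out.println("local keySet size = " + localOnlyView
                + ", get(3500001) " + (value != null ? "found a value" : "returned null"));
    }
}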

Comment 7 John Osborne 2014-07-30 20:01:31 UTC
I'll check this week and update the case.  Sorry for the delay.

Comment 8 John Osborne 2014-08-01 15:07:00 UTC
Hi Martin,

In the test I just ran, I had two nodes, each loading 3.5M entries.

When I set fetchInMemoryState="true" the node entries start transferring.

On node 1 none of the entries transferred, and then I received timeout ERRORs, despite increasing the timeouts by 10-100x from the defaults.

On node 2 entries were transferring, but very slowly.  I am receiving roughly 12.5K entries (each entry is 1K, so 12.5 MB) about every 45 seconds.  That is a transfer rate of about 277 KB/s.  The VMs are on the same host, and if I run scp I get a transfer rate of about 120 MB/s.

How quickly should I see entries transferred?

What should my timeout values be when running several GBs of data?
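For reference, the settings I have been adjusting look roughly like the sketch below (written against the programmatic ConfigurationBuilder API as an assumption; my actual configuration is XML, and the values shown are placeholders, not recommendations):

import java.util.concurrent.TimeUnit;

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

public class StateTransferTuningSketch {
    public static Configuration sketch() {
        return new ConfigurationBuilder()
                .clustering()
                    .cacheMode(CacheMode.DIST_SYNC)
                    .stateTransfer()
                        .fetchInMemoryState(true)
                        .timeout(10, TimeUnit.MINUTES) // how long a joiner waits for state transfer
                        .chunkSize(512)                // entries sent per state-transfer chunk
                .build();
    }
}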

Comment 9 John Osborne 2014-08-01 15:09:52 UTC
After 30 minutes, only 1M entries (1 GB) had been transferred to node 2.

Node 1 was not working properly and the logs are filled with timeout errors.

Comment 10 Alan Field 2014-09-01 08:27:11 UTC
Hey John,

I have a couple of questions for you about this report.

1) Can you explain what you mean by "the entries start transferring"? Do you mean that you are able to call get() on entries that exist on the other node, or do you mean that the number of entries on each node is changing?

2) When you stop and restart the EAP nodes, do you do them in the same order each time?

Thanks,
Alan

Comment 11 Alan Field 2014-09-02 14:50:38 UTC
Hey John,

OK, you can ignore the questions in Comment 10: https://bugzilla.redhat.com/show_bug.cgi?id=1122662#c10

I think I have determined that the configuration you are using (i.e. a shared JDBC cache store with preloading enabled on each node, and a separate database table for each node) is not going to work well. Let me describe the problem that I ran into. Here are the steps I used:

1) Start two EAP nodes that have no tables in Postgres
2) Add 100 entries on each node
3) Traverse the entries on both nodes (200 entries are found on both nodes.)
4) Shut down both EAP instances
5) Start them back up and make sure they are clustered
6) Repeat step (3) (On the node that is started first, only the keys from that node's own table are found. On the second node, all 200 keys are found.)

When each node starts, it preloads the entries from the database into a local-mode cache. When the second node starts, state transfer sends entries from node 1 to node 2. That's why the second node can traverse all entries in the cache. Infinispan expects that, in a cluster configured with a shared cache store, one node in the cluster will load all entries when preload is enabled. The other issue with this configuration is that if a node fails, then the entries from that node will not be available to the other nodes in the cluster.
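For reference, the traversal in steps 3 and 6 amounts to something like this sketch (the key naming is an assumption for illustration; the real test code differs):

import org.infinispan.Cache;

public class TraversalSketch {
    // Counts how many of the expected 200 keys this node can see via get().
    // The key naming scheme ("node1-1" .. "node2-100") is hypothetical.
    public static int countVisible(Cache<String, String> cache) {
        int found = 0;
        for (String prefix : new String[] { "node1-", "node2-" }) {
            for (int i = 1; i <= 100; i++) {
                if (cache.get(prefix + i) != null) {
                    found++;
                }
            }
        }
        return found; // 200 when both nodes' entries are visible
    }
}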

I'm going to talk to engineering about whether this configuration should be legal, but I wanted to let you know what I had found out so far.

Thanks,
Alan

Comment 12 Alan Field 2014-09-03 11:35:20 UTC
Hey John,

This is the response that I got from Dan on the engineering team:

"Nope, it's not legal. If shared="true" then it means both nodes should have access to the same data in the cache store. However, John's problems are not really related to using shared="true", the behaviour with shared="false" would be the same.

When node 2 starts, it will become an owner for roughly half of the keys/hash segments. 

Node 1 assumes that it had access to all the entries from the beginning. So it never does a remote get on node 2, and returns null if it doesn't find the key in memory or in the cache store.

Node 2 will receive the entries of node 1 through state transfer (assuming fetchInMemoryState = true), because it thinks it doesn't have any data. It assumes the preloaded entries are stale, so it will override any preloaded entry if it receives an entry with the same key from node 1. It does not delete the preloaded entries however, so it will have all the entries locally, which adds more confusion."
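To paraphrase Dan's point as pseudologic (an illustrative simplification, not Infinispan's actual code):

import java.util.Map;

// Illustrative simplification of the lookup behaviour Dan describes; not Infinispan source.
public class OwnerLookupSketch {
    public static Object get(Object key, boolean thisNodeOwnsKey,
                             Map<Object, Object> memory,
                             Map<Object, Object> localStore,
                             Map<Object, Object> remoteOwner) {
        if (thisNodeOwnsKey) {
            // Node 1 believes it has owned the key all along, so it never asks node 2
            // and returns null when the entry is in neither memory nor its own store.
            Object v = memory.get(key);
            return v != null ? v : localStore.get(key);
        }
        // A non-owner would issue a remote get against the owning node.
        return remoteOwner.get(key);
    }
}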

I'm going to close this as NOTABUG, and I'm going to open a JIRA that logs a warning if Infinispan detects this configuration.

Thanks,
Alan