Summary: | [ISPN-3375] Map/Reduce jobs with small value sizes fail because of timeouts putting intermediate keys into the cache | ||
---|---|---|---|
Product: | [JBoss] JBoss Data Grid 6 | Reporter: | Alan Field <afield> |
Component: | Embedded | Assignee: | Tristan Tarrant <ttarrant> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Martin Gencur <mgencur> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.2.0 | CC: | dmehra, gsheldon, jdg-bugs, mhusnain, myarboro, sjacobs |
Target Milestone: | --- | ||
Target Release: | 6.3.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Known Issue | |
Doc Text: |
<para>
When a Map/Reduce task is executed against a cache that contains many small values, the task may experience <literal>TimeoutException</literal> errors when writing the intermediate results to the cache. When these exceptions occur, the task fails.
</para>
<para>
Several workarounds are available for this issue, including:
<orderedlist>
<listitem>
<para>
Use larger values in the cache where the Map/Reduce task is executed.
</para>
</listitem>
<listitem>
<para>
Use a collator and/or a combiner on the Map/Reduce task where possible.
</para>
</listitem>
<listitem>
<para>
Configure the size of the intermediate chunks written to the cache using the state transfer <literal>chunkSize</literal> parameter.
</para>
</listitem>
</orderedlist>
</para>
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2015-01-26 14:05:48 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Bug Depends On: | |||
Bug Blocks: | 1104639 |
Description
Alan Field
2013-09-27 14:50:28 UTC
Dan Berindei <dberinde> made a comment on jira ISPN-3375

[~afield] can you still reproduce this? I'm not sure if TRACE logs are feasible, but they would be nice to have. If this appears or appeared in Jenkins, please add a link to the Jenkins job as well.

Dan Berindei <dberinde> made a comment on jira ISPN-3375

[~afield], let's try with org.infinispan.distexec, org.infinispan.interceptors.locking and org.infinispan.remoting.InboundInvocationHandlerImpl. Actually, since TRACE will slow things down considerably, maybe it would be best to do a test run without any TRACE logs, just to verify that the issue still appears.

Dan Berindei <dberinde> made a comment on jira ISPN-3375

The error with TRACE logs is a little different, but it still happens while inserting intermediary values in the temporary cache:

{noformat}
17:06:28,331 ERROR [org.radargun.stages.MapReduceStage] (pool-1-thread-1) executeMapReduceTask() returned an exception
org.infinispan.commons.CacheException: java.util.concurrent.ExecutionException: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from edg-perf07-26990, see cause for remote stack trace
Caused by: java.util.concurrent.ExecutionException: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from edg-perf07-26990, see cause for remote stack trace
Caused by: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from edg-perf07-26990, see cause for remote stack trace
Caused by: org.infinispan.commons.CacheException: org.infinispan.commons.CacheException: Could not move intermediate keys/values for M/R task 739f64b9-af84-4625-8023-5974fc835d92
Caused by: org.infinispan.commons.CacheException: Could not move intermediate keys/values for M/R task 739f64b9-af84-4625-8023-5974fc835d92
Caused by: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from edg-perf08-36579, see cause for remote stack trace
Caused by: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [tñùñ] for requestor [Thread[remote-thread-53,5,main]]! Lock held by [null]
{noformat}

The map/reduce task seems to generate a lot of traffic. Because it doesn't use a combiner (with {{combinedWith(Reducer)}}), after the map phase some of the intermediary keys end up having 400k values in them. All the intermediary values generated on a node for a key are inserted into the temporary cache with a single PutKeyValueCommand. At 5 bytes per value (4-byte int + 1-byte tag), the serialized command ends up weighing 2MB, and with all the other traffic in the cluster it takes more than 5s to transfer it.

Since the intermediary cache is non-transactional, for each PutKeyValueCommand the primary owner acquires the lock, forwards the command to the backup owner, and releases the lock only when it receives an answer from the backup. When there are more than 3 concurrent commands all trying to write to the same key, it's somewhat expected that some of them will take more than 10 seconds and time out. It's not as easy to explain the timeout in the builds without TRACE, since the replication timeout is 60s, but the excessive replication times could be caused by UNICAST2 and the large number of UFC credits.

It might be useful to add a {{chunkSize}} attribute to {{MapReduceTask}}, allowing the user to split the values inserted into the temporary cache. But I'm pretty sure the error could be eliminated with some changes in the test and its configuration:

* Add a combiner for the M/R task (should probably be the same as the reducer; see the sketch below).
* Replace UNICAST2 with UNICAST3.
* Reduce {{UFC.max_credits}} to 1m and reduce {{FRAG2.frag_size}} to 30k.

The huge traffic also caused nodes to be frequently suspected by FD_ALL (although only one, perf03, was confirmed by VERIFY_SUSPECT). FD_ALL is below NAKACK2 in the protocol stack, so heartbeats are not retransmitted. {{FD_ALL.timeout}} is 15s and {{FD_ALL.interval}} is 8s (the default), so losing a single heartbeat is enough for FD_ALL to suspect the node, and VERIFY_SUSPECT will confirm it after one more missed heartbeat. The interval should be much smaller, e.g. 4s.
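Below is a minimal sketch of the combiner suggestion above, using the MapReduceTask API available in Infinispan 5.2+/JDG 6.x. The word-count style WordMapper/SumReducer classes, the "testCache" name, and the infinispan.xml file are illustrative assumptions, not the classes or configuration used by the RadarGun MapReduceStage.

{code:java}
import java.util.Iterator;
import java.util.Map;

import org.infinispan.Cache;
import org.infinispan.distexec.mapreduce.Collector;
import org.infinispan.distexec.mapreduce.MapReduceTask;
import org.infinispan.distexec.mapreduce.Mapper;
import org.infinispan.distexec.mapreduce.Reducer;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

public class WordCountWithCombiner {

   // Word-count style mapper: emits one (word, 1) pair per token.
   static class WordMapper implements Mapper<String, String, String, Integer> {
      @Override
      public void map(String key, String value, Collector<String, Integer> collector) {
         for (String word : value.split("\\s+")) {
            collector.emit(word, 1);
         }
      }
   }

   // Summing reducer, reused as the combiner so each mapping node
   // pre-aggregates its values before the put into the intermediate cache.
   static class SumReducer implements Reducer<String, Integer> {
      @Override
      public Integer reduce(String reducedKey, Iterator<Integer> values) {
         int sum = 0;
         while (values.hasNext()) {
            sum += values.next();
         }
         return sum;
      }
   }

   public static void main(String[] args) throws Exception {
      EmbeddedCacheManager cm = new DefaultCacheManager("infinispan.xml"); // assumed configuration file
      Cache<String, String> cache = cm.getCache("testCache");              // assumed cache name

      Map<String, Integer> counts =
            new MapReduceTask<String, String, String, Integer>(cache)
                  .mappedWith(new WordMapper())
                  .combinedWith(new SumReducer())   // the suggested combiner
                  .reducedWith(new SumReducer())
                  .execute();

      System.out.println(counts);
      cm.stop();
   }
}
{code}

With a combiner in place, each mapping node writes a single pre-reduced value per intermediate key instead of hundreds of thousands of small values, which addresses the per-key value explosion described in the comment above.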
Alan Field <afield> made a comment on jira ISPN-3375

[~dan.berindei] The trace job was executed with a combiner. (Search for collatorFqn= in the logs to see it being passed to the MapReduceStage.) I have also run the job with fewer credits, and the timeout still occurs. I'll keep trying your suggestions. The error does go away if the values are larger, but I don't think that makes this less of a problem, since changing the value size is not always possible. I have been wondering if there is some way to avoid the contention in the intermediate cache by using key affinity and then combining the results in the reduce phase.

Alan Field <afield> made a comment on jira ISPN-3375

[~dan.berindei] OK, I have rerun the tests with the suggested changes to the JGroups configuration, and the TimeoutExceptions are still occurring. (With collator: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/user/afield@REDHAT.COM/my-views/view/afield's%20jobs/job/jdg-radargun-mapreduce-test/230/ and without collator: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/user/afield@REDHAT.COM/my-views/view/afield's%20jobs/job/jdg-radargun-mapreduce-test/229/) I can get another trace log, if you like.

Alan Field <afield> made a comment on jira ISPN-3375

Updated JGroups configuration file for M/R task.

Dan Berindei <dberinde> made a comment on jira ISPN-3375

[~afield], I think MapReduceStage still doesn't use a combiner. Note that the combiner and the collator are different things; the pipeline goes like this: entries -> mapper (on entry node) -> combiner (on entry node) -> reducer (on intermediate node) -> collator (on originator). OTOH, I'd really like to see a log with TRACE enabled for org.jgroups and org.infinispan.remoting, as 60 seconds should be plenty even with 2MB entries (especially with UNICAST3).

Alan Field <afield> made a comment on jira ISPN-3375

[~dan.berindei] Yes, I missed the addition of the combiner in Infinispan 5.2. I just sent a PR to add support for it in RadarGun, so I'll be able to see how it affects the performance of this scenario. I am also running this Jenkins job with TRACE enabled for org.jgroups and org.infinispan.remoting: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/user/afield@REDHAT.COM/my-views/view/afield's%20jobs/job/jdg-radargun-mapreduce-test/232/

Dan Berindei <dberinde> made a comment on jira ISPN-3375

I guess the logs were too much; it looks like the job was interrupted after 8 hours and it wasn't even finished inserting the initial data. Let's wait until we see the results with the PR integrated.

Alan Field <afield> made a comment on jira ISPN-3375

[~dan.berindei] Yeah, I could try a run with the previous log4j settings, but I have another bug I need to investigate in the short term.
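Dan's pipeline comment above distinguishes the combiner (set with combinedWith, runs on each mapping node) from the collator (passed to execute, runs only on the originator after the reduce phase). A minimal sketch of the collator side follows, reusing the hypothetical WordMapper/SumReducer types from the previous snippet; this Collator is an illustration, not the one referenced by collatorFqn= in the RadarGun job.

{code:java}
import java.util.Map;

import org.infinispan.distexec.mapreduce.Collator;
import org.infinispan.distexec.mapreduce.MapReduceTask;

// The collator only post-processes the already-reduced results on the
// originator node; unlike a combiner, it does not shrink the intermediate
// values written to the temporary cache during the map phase.
Integer totalWords = new MapReduceTask<String, String, String, Integer>(cache)
      .mappedWith(new WordMapper())
      .combinedWith(new SumReducer())
      .reducedWith(new SumReducer())
      .execute(new Collator<String, Integer, Integer>() {
         @Override
         public Integer collate(Map<String, Integer> reducedResults) {
            int total = 0;
            for (Integer count : reducedResults.values()) {
               total += count;
            }
            return total;
         }
      });
{code}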
Alan Field <afield> updated the status of jira ISPN-3375 to Reopened

Alan Field <afield> made a comment on jira ISPN-3375

I am reopening this issue, because changing the chunkSize has no impact on the TimeoutException.

Ditto what I said in the JIRA.

Fixing this requires some complex changes in the M/R framework that are not suitable within the scope of JDG 6.2.0. These were discussed with product management (Divya) and scheduled for JDG 6.3.

Alan Field <afield> updated the status of jira ISPN-3375 to Resolved

Verified that JDG 6.3.0 ER6 does not generate any timeout or suspect exceptions when small values are used in Map/Reduce jobs.

Excellent! Thanks
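For reference, the third workaround in the Doc Text above points at the state transfer chunkSize parameter. A minimal sketch of setting it programmatically follows; the value 128 and the configuration name are arbitrary, and how this configuration gets applied to the Map/Reduce intermediate cache depends on the JDG/Infinispan version, so treat this as an illustration rather than a verified recipe.

{code:java}
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

// Sketch only: smaller chunks mean smaller individual puts into the
// intermediate cache, at the cost of more round trips.
Configuration cfg = new ConfigurationBuilder()
      .clustering()
         .cacheMode(CacheMode.DIST_SYNC)
         .stateTransfer()
            .chunkSize(128)            // arbitrary example value
      .build();

EmbeddedCacheManager cm = new DefaultCacheManager();
cm.defineConfiguration("mr-intermediate-example", cfg); // hypothetical configuration name
{code}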