Bug 1017376
Summary: | 4 hour REST soak test with JDG 6.2.0 ER1 starts getting TimeoutException putting values and shows large thread spike | | |
---|---|---|---|
Product: | [JBoss] JBoss Data Grid 6 | Reporter: | Alan Field <afield> |
Component: | Performance | Assignee: | Nobody <nobody> |
Status: | VERIFIED --- | QA Contact: | Martin Gencur <mgencur> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.2.0 | CC: | afield |
Target Milestone: | --- | ||
Target Release: | 6.2.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: |
The JGroups subsystem may deplete its thread pools, with all threads deadlocked (waiting for credits). Requests to this node are not handled; the node itself may issue requests, but it never gets the responses, which results in TimeoutExceptions from Infinispan.
In this situation the node still stays a part of the cluster.
This problem applies to both embedded and server mode.
The workaround is to increase UFC.max_credits and MFC.max_credits to a high value (the suggested value is 10M), or to remove these protocols from the JGroups configuration entirely.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | Type: | Bug | |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1017190 |
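
The workaround described in the Doc Text above corresponds to a JGroups stack fragment along these lines (a sketch: only the flow-control protocols are shown, the surrounding stack is omitted, and min_threshold is left at the stock 0.4; JGroups property parsing accepts the "10M" size shorthand):

```xml
<!-- Flow-control protocols with the suggested high credit ceiling.
     Alternatively, delete the UFC and MFC elements from the stack entirely. -->
<UFC max_credits="10M" min_threshold="0.4"/>
<MFC max_credits="10M" min_threshold="0.4"/>
```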
Description
Alan Field
2013-10-09 18:10:35 UTC
According to Radim, this is another instance of the JGroups deadlock described in JGRP-1675.

Thanks, that should explain it. BTW, is there any way in which we can get a thread dump in such a situation? That would help a lot with the investigation.

This execution of the REST soak test, run with increased FC credits, completed without the TimeoutExceptions and thread spike: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/JDG/view/PERF-CS/job/jdg-perf-cs-soak-test-rest/29

Mircea, I don't believe we have an automated way to generate a thread dump, but there is a Byteman script that is triggered when the thread pool reaches a certain size. I am going to try to run the job again with the original configuration to see if I can trigger that script.

It appears that setting MFC and UFC max_credits to 10M, or removing these protocols entirely, is a workaround for this issue. Removing this issue from Beta blockers as it has a workaround.

Dan Berindei <dberinde> made a comment on jira JGRP-1675

I have a test that hangs reliably on my machine with 3.4.0.Final: https://github.com/danberindei/infinispan/blob/73ca65f9b629698da68916c720ec4efa0d97e2b4/core/src/test/java/org/infinispan/stress/RemoteGetStressTest.java

The test is using Infinispan, but it should be pretty easy to replicate it with JGroups only if necessary. This is what it does:

* Populate a distributed cache with huge values (100kb), making sure the coordinator doesn't own any entries.
* Launch 500 remote get operations in parallel from a single node, resulting in 1500 response messages (numOwners is 3).
* Slow down the processing of response messages by adding a {{Thread.sleep(Random.nextInt(10))}} to keep the OOB threads busy.

Interestingly enough, with max_credits=1m and min_threshold=0.40, the FC logs always show at least 390k remaining credits. Yet the remote get operations still time out after 60 seconds, and the OOB threads on most nodes are stuck in {{FlowControl.decrementIfEnoughCredits()}}.
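
The credit wait those threads are stuck in can be modeled with a toy class (this mirrors the shape of the blocking credit accounting in JGroups' FlowControl, not its actual code; all names and numbers here are illustrative):

```java
// Toy model of flow-control credits: a sender blocks in
// decrementIfEnoughCredits() until a REPLENISH arrives from the receiver,
// and gives up after a timeout, which is when the application sees delays
// or TimeoutExceptions.
public class CreditDemo {
    static class Credit {
        private long credits;

        Credit(long initial) { credits = initial; }

        /** Try to take `amount` credits, waiting up to `timeoutMs`; false on timeout. */
        synchronized boolean decrementIfEnoughCredits(long amount, long timeoutMs)
                throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (credits < amount) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) return false;   // no REPLENISH arrived: sender stays stuck
                wait(left);
            }
            credits -= amount;
            return true;
        }

        /** Called when a REPLENISH message is delivered from the receiver. */
        synchronized void replenish(long amount) {
            credits += amount;
            notifyAll();
        }
    }

    public static void main(String[] args) throws Exception {
        Credit c = new Credit(1000);
        // A 100kb response needs more credits than remain: the sender blocks, then times out.
        System.out.println(c.decrementIfEnoughCredits(100_000, 200)); // false
        c.replenish(200_000);                                          // REPLENISH delivered
        System.out.println(c.decrementIfEnoughCredits(100_000, 200)); // true
    }
}
```

The deadlock in this bug arises when the only threads that could deliver the REPLENISH are themselves blocked in the equivalent of `decrementIfEnoughCredits()`.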
Bela Ban <bela> updated the status of jira JGRP-1675 to Reopened

Bela Ban <bela> made a comment on jira JGRP-1675

Tried to replicate the stress test in JGroups ({{RemoteGetStressTest}} is attached). With 500 threads, the test completes between 60 and 90 seconds. This is with a random delay de-serializing the BigObject between 1 and 10 ms, and with a DISCARD up-rate of 20%. If {{USE_SLEEPS}} is changed to false, the test finishes within a few seconds (between 2 and 5 secs). This is with an OOB pool of min=1/max=5/no-queue. The delaying and discarding certainly slow things down, but I never experienced a deadlock.

Bela Ban <bela> made a comment on jira JGRP-1675

The test is on branch JGRP-1675: https://github.com/belaban/JGroups/blob/JGRP-1675/tests/stress/org/jgroups/tests/RemoteGetStressTest.java

Bela Ban <bela> made a comment on jira JGRP-1675

Works (= doesn't block) with jgroups-udp.radim.xml. The difference is the larger thread pools in Radim's config.
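
Why the small, queue-less OOB pool matters can be sketched with a minimal saturation demo (pool size and message count are illustrative, not JDG defaults; a rejected task stands in for a dropped OOB message):

```java
import java.util.concurrent.*;

// Minimal model of OOB pool saturation: a small pool with no queue starts
// rejecting work once every worker is busy, the analogue of dropped OOB
// messages when handlers are slowed down by sleeps.
public class OobSaturationDemo {

    /** Submit `messages` blocking handlers to a pool of `poolSize`; count rejections. */
    static int countDropped(int poolSize, int messages) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor oobPool = new ThreadPoolExecutor(
                poolSize, poolSize, 0, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(),              // no queue, like a queue-less OOB pool
                new ThreadPoolExecutor.AbortPolicy()); // rejection = "message dropped"
        int dropped = 0;
        for (int i = 0; i < messages; i++) {
            try {
                // Each handler blocks, like an OOB thread held up in Dan's test.
                oobPool.execute(() -> {
                    try { release.await(); } catch (InterruptedException ignored) { }
                });
            } catch (RejectedExecutionException e) {
                dropped++;
            }
        }
        release.countDown(); // unblock the workers so the pool can shut down
        oobPool.shutdown();
        return dropped;
    }

    public static void main(String[] args) {
        System.out.println(countDropped(5, 20)); // 15: everything past the first 5 is dropped
    }
}
```

A larger pool (as in Radim's config) simply pushes the saturation point out; it does not remove the failure mode.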
Bela Ban <bela> made a comment on jira JGRP-1675

The test to see if this blocks or not is Dan's UPerf2 (attached).

Dan Berindei <dberinde> made a comment on jira JGRP-1675

I was able to reproduce the deadlock again with an updated test, here: https://github.com/danberindei/JGroups/blob/JGRP-1675/tests/stress/org/jgroups/tests/RemoteGetStressTest.java

It looks like a regular message can be processed on the INT thread pool:

{noformat}
"INT-2,cluster,D@1395" prio=5 tid=0x32 nid=NA waiting
  java.lang.Thread.State: WAITING
    at java.lang.Object.wait(Object.java:-1)
    at org.jgroups.protocols.FlowControl$Credit.decrementIfEnoughCredits(FlowControl.java:567)
    at org.jgroups.protocols.UFC.handleDownMessage(UFC.java:121)
    at org.jgroups.protocols.FlowControl.down(FlowControl.java:330)
    at org.jgroups.protocols.FlowControl.down(FlowControl.java:340)
    at org.jgroups.protocols.FRAG2.fragment(FRAG2.java:243)
    at org.jgroups.protocols.FRAG2.down(FRAG2.java:122)
    at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1024)
    at org.jgroups.JChannel.down(JChannel.java:760)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:684)
    at org.jgroups.blocks.RequestCorrelator.sendResponse(RequestCorrelator.java:516)
    at org.jgroups.blocks.RequestCorrelator.sendReply(RequestCorrelator.java:506)
    at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:476)
    at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:374)
    at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:247)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:666)
    at org.jgroups.JChannel.up(JChannel.java:730)
    at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1019)
    at org.jgroups.protocols.FRAG2.up(FRAG2.java:182)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:434)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:434)
    at org.jgroups.stack.Protocol.up(Protocol.java:409)
    at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:294)
    at org.jgroups.protocols.UNICAST3.removeAndDeliver(UNICAST3.java:796)
    at org.jgroups.protocols.UNICAST3.handleBatchReceived(UNICAST3.java:752)
    at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:466)
    at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:662)
    at org.jgroups.stack.Protocol.up(Protocol.java:409)
    at org.jgroups.protocols.TP.passBatchUp(TP.java:1422)
    at org.jgroups.protocols.TP$BatchHandler.run(TP.java:1574)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
{noformat}

While debugging, I was able to look at the message batch in UNICAST3.handleBatchReceived and see that it still has a REPLENISH message waiting to be delivered.

Dan Berindei <dberinde> made a comment on jira JGRP-1675

There seems to be another scenario where the deadlock happens right at the beginning of the test and the originators don't receive any complete responses: UFC:REPLENISH messages are usually sent after a message has been delivered to the application. If the application also wants to send a response message, and the response message can't be sent because of insufficient credits, the node won't send REPLENISH messages. The thread that is trying to send the response will periodically wake up and send a CREDIT_REQUEST message. But the CREDIT_REQUEST message doesn't have the OOB flag, so it will be blocked in UNICAST3 if the recipient is also overloaded and has already dropped some OOB messages.

Radim Vansa <rvansa> made a comment on jira JGRP-1675

[~dan.berindei]: This is the original description of this bug :) CREDIT_REQUEST is not marked as OOB on purpose (there's even a comment about that where the message is created): the sender (which sends the credit request) should be blocked and should NOT send any more messages until the previous ones get processed.
The C_REQ shouldn't be needed at all, because the receiver should keep track of the credit amount on the sender side and, as it approaches the threshold, send the credit replenishment (and this is OOB). As you speak about a deadlock: the question is why the messages are not eventually retransmitted so that the request is processed; retransmissions are not blocked by FlowControl.

Bela Ban <bela> made a comment on jira JGRP-1675

UPerf2's GET request uses a GET_FIRST on 2 members. This clogs up the receivers' OOB thread pools, and thus credits are dropped. Sending the CREDIT_REQUESTs as OOB fixes this, but the performance is still slow because of the exhausted OOB pool. I suggest sending the GET as a unicast to the owner, and if the owner doesn't respond within N ms (perhaps computed as 30% over the rolling avg time for GETs), then trying the other owner (essentially staggered GETs as suggested by Manik). Alternatively, a GET could be an anycast, but the target list should include only the primary owner; conceptually this is the same as above.

Radim Vansa <rvansa> made a comment on jira JGRP-1675

The credits (REPLENISH message) are not dropped; they are not sent at all. As the OOB TP is depleted, there is no delivered message which would trigger the REPLENISH. The CREDIT_REQUEST, which could trigger the REPLENISH, is not delivered either (it is processed in the INT TP), because it is moved to the UNICASTx tables and waits for sequential delivery to FlowControl after the rest of the OOB messages.

Bela Ban <bela> made a comment on jira JGRP-1675

Both RemoteGetStressTest and UPerf2 pass now.
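
The staggered GET Bela suggests above can be sketched as follows (the `fetchFrom` function and node names are hypothetical stand-ins for the real RPC layer; the fixed 100 ms stagger replaces the rolling-average heuristic):

```java
import java.util.concurrent.*;
import java.util.function.Function;

// Sketch of a staggered GET: ask the primary owner only, and contact the
// backup owner only if the primary has not answered within a stagger timeout.
public class StaggeredGet {

    @SuppressWarnings("unchecked")
    static <V> V get(Function<String, CompletableFuture<V>> fetchFrom,
                     String primary, String backup,
                     long staggerMillis) throws Exception {
        CompletableFuture<V> fromPrimary = fetchFrom.apply(primary);
        try {
            // Happy path: the primary answers in time and the backup is never
            // asked, halving the response traffic compared to GET_FIRST on 2 owners.
            return fromPrimary.get(staggerMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The primary is slow: also ask the backup, take whichever answers first.
            CompletableFuture<V> fromBackup = fetchFrom.apply(backup);
            return (V) CompletableFuture.anyOf(fromPrimary, fromBackup).get();
        }
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(2);
        // Simulated cluster: the primary is overloaded (answers after 500 ms),
        // the backup answers after 50 ms.
        Function<String, CompletableFuture<String>> fetchFrom = node -> {
            CompletableFuture<String> f = new CompletableFuture<>();
            long delay = node.equals("primary") ? 500 : 50;
            timer.schedule(() -> f.complete("value-from-" + node), delay, TimeUnit.MILLISECONDS);
            return f;
        };
        System.out.println(get(fetchFrom, "primary", "backup", 100)); // value-from-backup
        timer.shutdownNow();
    }
}
```

The point of the stagger is that the second owner is contacted only on the slow path, so the receivers' OOB pools see roughly half the response messages of a blanket GET_FIRST.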
Dan Berindei <dberinde> made a comment on jira JGRP-1675

[~rvansa] Sorry, I thought the original issue was fixed when the internal thread pool was enabled, so I didn't read the description again. [~belaban], there is still the issue of regular messages being processed on the INT thread pool and potentially blocking the processing of REPLENISH messages. It looks like the INTERNAL messages are correctly separated into a batch in TP.receive(), but then in UNICAST3, Table.removeMany() can return a list containing all kinds of messages; there's no way to request just the messages marked INTERNAL, and the INT thread will process all of them.

Bela Ban <bela> made a comment on jira JGRP-1675

The internal pool doesn't exist in 3.2; the fix is only applicable to 3.5.

Bela Ban <bela> made a comment on jira JGRP-1675

[~dan.berindei] Yes, it is possible that an INT or OOB thread will process messages with different flags in UNICAST or NAKACK. This is as designed, and the probability of this happening is proportional to the thread pool sizes and the types of messages received. So if we have a large OOB pool and receive a lot of OOB messages and only a few regular messages, then chances are that the regular messages will be batched together with the OOB messages in UNICAST or NAKACK and will get passed up to the application via an OOB thread.

Radim Vansa <rvansa> made a comment on jira JGRP-1675

I'd prefer JGroups to be always working rather than working with a probability dependent on the INT:OOB thread pool ratio, message frequency, or dark magic. The INT TP has much lower value if regular messages, which are non-blocking from the application's perspective, may get the thread stuck and lead to thread pool depletion. I agree with Dan that listing messages according to flags would make the system more reliable.

[~belaban]: Could you specify what the fix is (as this is fixed in 3.5)? Marking the FlowControl messages as DONT_BUNDLE, telling ISPN developers to fix the GET_FIRST flooding, or something else? Does the comment about the two tests passing mean "passing as long as they send the get with GET_ALL" (the number of targets doesn't matter)? OK, I've found in another mail that the fix is marking the credit request as INT | OOB, but it should be mentioned more clearly somewhere in the JIRA.

Bela Ban <bela> updated the status of jira JGRP-1675 to Resolved
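
The batch splitting Dan asks for above (handing the INT pool only INTERNAL-flagged messages) could look like this sketch; the `Msg` type and flag constants are hypothetical stand-ins, not the real JGroups Message API:

```java
import java.util.*;

// Sketch: partition a mixed message batch by the INTERNAL flag so the
// internal thread pool only ever runs flow-control traffic, and regular
// messages cannot occupy (and stall) an INT thread.
public class BatchSplitter {
    static final int OOB = 1, INTERNAL = 2;

    record Msg(String name, int flags) {
        boolean isInternal() { return (flags & INTERNAL) != 0; }
    }

    /** Partition a mixed batch: internal messages first, everything else second. */
    static List<List<Msg>> splitByInternal(List<Msg> batch) {
        List<Msg> internal = new ArrayList<>(), regular = new ArrayList<>();
        for (Msg m : batch)
            (m.isInternal() ? internal : regular).add(m);
        return List.of(internal, regular);
    }

    public static void main(String[] args) {
        List<Msg> batch = List.of(
                new Msg("data-1", OOB),
                new Msg("REPLENISH", OOB | INTERNAL),
                new Msg("data-2", 0),
                new Msg("CREDIT_REQUEST", INTERNAL));
        List<List<Msg>> parts = splitByInternal(batch);
        System.out.println(parts.get(0).size() + " internal, "
                + parts.get(1).size() + " regular"); // 2 internal, 2 regular
    }
}
```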