Bug 1206590
| Summary: | Nearcache broken after SocketTimeoutException | ||
|---|---|---|---|
| Product: | [JBoss] JBoss Data Grid 6 | Reporter: | Vojtech Juranek <vjuranek> |
| Component: | Server | Assignee: | Galder Zamarreño <galder.zamarreno> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Martin Gencur <mgencur> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | ||
| Version: | 6.5.0 | CC: | jdg-bugs, pzapataf, rmarwaha, slaskawi, ttarrant |
| Target Milestone: | ER4 | ||
| Target Release: | 6.5.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause: When the near cache doesn't receive remote events for some time (e.g. the cluster is inactive or the client doesn't load any new data), the connection to the remote JDG server fails with java.net.SocketTimeoutException.
Consequence: The near cache can no longer receive remote events from the remote JDG server, which makes the near cache unusable.
Fix: NA
Result:
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2015-06-23 12:25:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Vojtech Juranek
2015-03-27 13:26:42 UTC
Still seeing the exception (see below; it's from https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/JDG/view/PERF-CS/job/jdg-perf-cs-near-caching/57/console-edg-perf07/). However, a substantial part of the responses measured by RadarGun are pretty fast (about 2 us, see e.g. https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/JDG/view/PERF-CS/job/jdg-perf-cs-near-caching/57/artifact/results/html/histogram_near-caching-eager_BasicOperations.Get_Infinispan_HotRod_Remote_listeners_1_0_total.png), which suggests that the near cache is actually used, so maybe the error is actually not unrecoverable as it states.

09:54:09,409 ERROR [org.infinispan.client.hotrod.event.ClientListenerNotifier] (Client-Listener-9c1feb28604f4289) ISPN004043: Unrecoverable error reading event from server /172.18.1.5:11222, exiting event reader thread
org.infinispan.client.hotrod.exceptions.TransportException:: java.net.SocketTimeoutException
    at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport.readByte(TcpTransport.java:184)
    at org.infinispan.client.hotrod.impl.protocol.Codec20.readMagic(Codec20.java:282)
    at org.infinispan.client.hotrod.impl.protocol.Codec20.readEvent(Codec20.java:126)
    at org.infinispan.client.hotrod.event.ClientListenerNotifier$EventDispatcher.run(ClientListenerNotifier.java:236)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport.readByte(TcpTransport.java:179)
    ... 8 more

> which suggests that the near cache is actually used, so maybe the error is
> actually not unrecoverable as it states.
Well, I realized that the timeout is not due to the warm-up phase, but due to the fact that all the data is loaded into the near cache, so all reads are served from the near cache and thus the connection to the server times out (i.e. the exception is probably really unrecoverable).
Looking at the code, a SocketTimeoutException is to be expected: the ClientListenerNotifier blocks on the connection waiting for data to appear. If no events happen within the set timeout, the socket timeout will trigger. This is pretty normal behaviour and the client should just retry, ignoring the exception. I don't know why the error is being logged, though.

My fault: I used JDG ER4 for the server, but forgot to rebuild RG with ER4, so RG used the old HR client. Thanks Alan for pointing this out! Moving back to ON_QA. Sorry for that!

Verified, sorry for the rush once again. |
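
For illustration, the retry behaviour described in the comments above (treating a read timeout on an idle listener connection as "no event yet" rather than as a fatal error) could look roughly like the sketch below. This is not the Hot Rod client's actual code: the class `IdleTolerantEventReader`, the methods `readEvents` and `demo`, the single-byte "events" and the timing values are all hypothetical, and only standard `java.net` calls are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch: NOT the Infinispan ClientListenerNotifier implementation.
class IdleTolerantEventReader {

    /**
     * Reads single-byte "events" until the peer closes the connection.
     * A read timeout only means that no event arrived in time, so it is
     * counted and the read is retried instead of ending the loop.
     * Returns {eventsRead, timeoutsIgnored}.
     */
    static int[] readEvents(Socket socket, int soTimeoutMillis) throws IOException {
        socket.setSoTimeout(soTimeoutMillis);
        InputStream in = socket.getInputStream();
        int events = 0;
        int timeouts = 0;
        while (true) {
            try {
                int b = in.read();
                if (b < 0) {
                    break; // peer closed the connection: stop reading
                }
                events++;
            } catch (SocketTimeoutException idle) {
                timeouts++; // idle connection: the socket is still valid, retry
            }
        }
        return new int[] {events, timeouts};
    }

    /**
     * Demo: a local server stays silent for longer than the client's read
     * timeout, then sends three events and closes the connection.
     */
    static int[] demo() {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread sender = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    Thread.sleep(300); // idle period, longer than the read timeout
                    peer.getOutputStream().write(new byte[] {1, 2, 3});
                    peer.getOutputStream().flush();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            sender.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                int[] result = readEvents(client, 50);
                sender.join();
                return result;
            }
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("events=" + r[0] + " timeoutsIgnored=" + r[1]);
    }
}
```

In this sketch the reader survives the idle period and still receives the events sent afterwards, which is the behaviour the comment says a listener connection should have. A real client would presumably also need a way to stop the loop on shutdown and to distinguish genuine connection failures from idle timeouts.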