There are lots of SuspectExceptions in the resilience test run: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-resilience-dist-4-3/54/artifact/report/serverlogs.zip We managed to keep these suppressed on the 5.1.x branch. This is of course a cosmetic problem, but it pollutes the log with unnecessary ERRORs. Parsed logs are here: http://www.qa.jboss.com/~mlinhard/test_results/run54-parsed/
Created ISPN JIRA
Mircea Markus <mmarkus> made a comment on jira ISPN-2577 This has more of a cosmetic impact, but nice to have for the final.
Michal Linhard <mlinhard> made a comment on jira ISPN-2577 I didn't want to create a new JIRA for this kind of cosmetic task, but if we're silencing SuspectExceptions, let's sync it with the behaviour of the Hot Rod client. If we have a RetryOnFailureOperation, should it log an error while it is still in retry mode? My proposal would be to switch this to a warning.
{code}
09:25:19,154 ERROR [org.jboss.smartfrog.jdg.loaddriver.DriverThread] (DriverThread-378) Error doing: PUT key655878 to node node0003, took 1395 ms
org.infinispan.client.hotrod.exceptions.RemoteNodeSuspecException:Request for message id[1981607] returned server error (status=0x85): org.infinispan.remoting.transport.jgroups.SuspectException: One or more nodes have left the cluster while replicating command SingleRpcCommand{cacheName='testCache', command=PutKeyValueCommand{key=ByteArrayKey{data=ByteArray{size=12, hashCode=727c8fa, array=0x033e096b65793635..}}, value=CacheValue{data=ByteArray{size=1029, array=0x034304002106020a..}, version=9007207845383422}, flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1, successful=true}}
    at org.infinispan.client.hotrod.impl.protocol.Codec10.checkForErrorsInResponseStatus(Codec10.java:153)
    at org.infinispan.client.hotrod.impl.protocol.Codec10.readHeader(Codec10.java:110)
    at org.infinispan.client.hotrod.impl.operations.HotRodOperation.readHeaderAndValidate(HotRodOperation.java:78)
    at org.infinispan.client.hotrod.impl.operations.AbstractKeyValueOperation.sendPutOperation(AbstractKeyValueOperation.java:72)
    at org.infinispan.client.hotrod.impl.operations.PutOperation.executeOperation(PutOperation.java:52)
    at org.infinispan.client.hotrod.impl.operations.PutOperation.executeOperation(PutOperation.java:41)
    at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:68)
    at org.infinispan.client.hotrod.impl.RemoteCacheImpl.put(RemoteCacheImpl.java:231)
    at org.infinispan.CacheSupport.put(CacheSupport.java:53)
    at org.jboss.qa.jdg.adapter.HotRodAdapter$HotRodRemoteCacheAdapter.put(HotRodAdapter.java:247)
    at org.jboss.qa.jdg.adapter.HotRodAdapter$HotRodRemoteCacheAdapter.put(HotRodAdapter.java:232)
    at org.jboss.smartfrog.jdg.loaddriver.DriverThreadImpl.makeRequest(DriverThreadImpl.java:236)
    at org.jboss.smartfrog.jdg.loaddriver.DriverThreadImpl.run(DriverThreadImpl.java:331)
{code}
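The proposal above (a warning while an operation is still being retried, an error only once retries are exhausted) could look roughly like the following. This is a minimal sketch, not the Hot Rod client's actual RetryOnFailureOperation: the class RetryLoggingSketch, the exception type TransientNodeFailure, the method executeWithRetry and the loggedLevels list are all hypothetical names introduced for illustration, and java.util.logging is used only to keep the example self-contained (its SEVERE level stands in for ERROR).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch; not the Hot Rod client's RetryOnFailureOperation. */
final class RetryLoggingSketch {

    /** Hypothetical stand-in for a transient failure such as a suspected node. */
    static class TransientNodeFailure extends RuntimeException {
        TransientNodeFailure(String message) {
            super(message);
        }
    }

    private static final Logger LOG = Logger.getLogger(RetryLoggingSketch.class.getName());

    /** Levels logged so far, recorded only so the behaviour can be inspected. */
    static final List<Level> loggedLevels = new ArrayList<>();

    private static void log(Level level, String message, Throwable cause) {
        loggedLevels.add(level);
        LOG.log(level, message, cause);
    }

    /**
     * Runs the operation up to maxAttempts times. A transient failure that will
     * be retried is logged as WARNING; only the final, unrecoverable failure is
     * logged as SEVERE (the java.util.logging counterpart of ERROR).
     */
    static <T> T executeWithRetry(Supplier<T> operation, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TransientNodeFailure last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (TransientNodeFailure e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Still retrying: not an exceptional situation yet.
                    log(Level.WARNING, "Attempt " + attempt + " failed, retrying", e);
                }
            }
        }
        log(Level.SEVERE, "Operation failed after " + maxAttempts + " attempts", last);
        throw last;
    }
}
```

With this shape, a PUT that succeeds on a later attempt leaves only warnings in the log, and an error appears only when every attempt has failed.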
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2577 We should also silence situations like ISPN-2752. In that JIRA, a node is trying to establish a new view sending a cache topology control view, but the node is stopping.
Nominated for 6.2 release notes.
Still present in JDG 6.2.0.DR3
Anuj Shah <anujshahwork> made a comment on jira ISPN-2577 There's a discussion here: https://community.jboss.org/message/846155 This issue may be more than cosmetic.
Michal Linhard <mlinhard> made a comment on jira ISPN-2577 This issue is about silencing the SuspectExceptions, that is, not logging them at ERROR level, since a node being suspected during topology changes is not an exceptional situation. Of course, SuspectExceptions may currently accompany other serious errors, but those should be logged separately.
Still present in 6.2.0.ER4
Dan Berindei <dberinde> updated the status of jira ISPN-2577 to Resolved