project_key: EDG
In resilience tests we're seeing "connection refused" errors shortly after a node restarts. We have 4 nodes, perf17-perf20, and we're failing perf19. Immediately after perf19 finishes its join rehash ([JBoss] 05:00:07,062 INFO [JoinTask] perf19-58461 completed join rehash in 16.22 seconds!), the driver nodes (perf02-perf10) start trying to connect to it, but it is not yet ready to accept connections.
Is there a window between the moment the new node is officially in the cluster (and Hot Rod clients therefore learn of it via topology-change piggybacking) and the moment its Hot Rod server is started? Shouldn't we eliminate this window?
The affected run is: http://hudson.qa.jboss.com/hudson/view/EDG/job/edg-51x-resilience-client-size4-hotrod/58/
I realized that sampling errors occur not only during node failure but also during node recovery (even more than during failure), and they are the connection refused exceptions mentioned above.
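To help pin down how long the suspected window lasts, the test harness could poll the restarted node's Hot Rod port right after the join rehash log line and record when it first accepts a TCP connection. The following is only a sketch under assumptions: `PortProbe` and its parameters are hypothetical helper names, not part of Infinispan or the Hot Rod client, and a successful TCP connect only shows that the port is listening, not that the server is fully initialized.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not an Infinispan API): polls a TCP endpoint until it
// accepts a connection or the timeout expires.
final class PortProbe {
    private PortProbe() {}

    /**
     * Returns true as soon as a TCP connect to host:port succeeds,
     * or false if no attempt succeeds before roughly timeoutMillis elapses.
     */
    static boolean waitUntilAccepting(String host, int port,
                                      long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try (Socket socket = new Socket()) {
                // A connect timeout of 0 would block indefinitely, so use at least 1 ms.
                socket.connect(new InetSocketAddress(host, port),
                               (int) Math.max(1, pollMillis));
                return true; // the port is listening
            } catch (IOException notReadyYet) {
                // Connection refused or timed out: the server socket is not up yet.
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: nothing accepted connections in time
            }
            Thread.sleep(pollMillis);
        }
    }
}
```

Calling `PortProbe.waitUntilAccepting(host, port, 30000, 100)` with the restarted node's address and Hot Rod port, and comparing the time it returns against the join rehash timestamp, would give an approximate size for the gap. It is a diagnostic aid only and does not change the server-side ordering.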
results.ods: attaching compiled data from the Hudson run. The approximate times of the fail and restore events are marked in the table.
Attachment: Added: results.ods
Michal, does this need looking into?
I'll verify this one; it might also apply to EDG6 Alpha.
This will take a bit longer, I'll need to get resilience tests going.
This is now obsolete. When a similar thing occurs for EDG6, we'll create a new JIRA.
Docs QE Status: Removed: NEW