Bug 1029010
Summary: | Session failover not working, reporting TimeoutException in ClusteredSession.access | ||
---|---|---|---|
Product: | [JBoss] JBoss Enterprise Application Platform 6 | Reporter: | Hynek Mlnarik <hmlnarik> |
Component: | Clustering | Assignee: | Paul Ferraro <paul.ferraro> |
Status: | CLOSED DUPLICATE | QA Contact: | Jitka Kozana <jkudrnac> |
Severity: | high | Docs Contact: | Russell Dickenson <rdickens> |
Priority: | unspecified | ||
Version: | 6.2.0 | CC: | akostadi, pslavice, rhusar |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-12-12 16:53:53 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Hynek Mlnarik
2013-11-11 13:25:01 UTC
There are also other relevant server exceptions. In my opinion they are related to the exception in the bug description so I'm not filing another issue but adding to this one. > 09:52:43,121 WARN [org.jboss.as.clustering.web.infinispan] (Incoming-8,shared=tcp) JBAS010325: Possible concurrency problem: Replicated version id 31 is less than or equal to in-memory version for session jt-Hvt0b9x0YDY088h83XR8h ... > 09:54:26,168 ERROR [org.jboss.as.clustering] (Incoming-1,shared=tcp) JBAS010223: ViewAccepted failed: java.lang.IllegalStateException: JBAS010240: Address 8abf6d5e-45dc-c7dc-2f86-72794f5f87ff not registered in transport layer ... > 09:54:26,687 ERROR [org.jboss.as.clustering] (Incoming-10,shared=tcp) JBAS010223: ViewAccepted failed: java.lang.IllegalStateException: JBAS010240: Address a87af105-2853-f02d-cee2-041c74bd1399 not registered in transport layer ... > 09:54:26,825 WARN [org.jgroups.protocols.TCP] (ajp-ec2-54-234-10-125.compute-1.amazonaws.com/10.240.62.148:8009-39) null: logical address cache didn't contain all physical address, sending up a discovery request ... All the above lines are from config/jboss-ec2-54-234-10-125.compute-1.amazonaws.com/server.log Server logs of all servers participating in the test scenario can be found in the already attached archive. FYI during the test there were 4 servers in a cluster and 200 concurrent clients. One by one servers are killed (kill -9) and then started again. I see failures after 3 of the four server kills. Which is IMO very strange. I'm running the same test dozens of times without being able to reproduce. So having this issue happen 3 out of 4 times in a single test must mean something. I'll be running the test till tomorrow but I'm losing hope we can reproduce. P.S. for easier log reading, here are the times at which each server was killed: > domU-12-31-39-04-39-66_sf.log:2013/11/06 09:55:36:358 EST [DEBUG][TerminatorThread] HOST domU-12-31-39-07-C5-96.compute-1.internal:rootProcess:failoverTestExecution - Issuing command: sh -c kill -9 `ps xfu | grep 'o8POK1sF4CPtw7QU' | grep -v 'grep' | sed -re '1,$s/[ \t]+/ /g' | cut -d ' ' -f 2`; > domU-12-31-39-06-90-08_sf.log:2013/11/06 09:51:49:086 EST [DEBUG][TerminatorThread] HOST domU-12-31-39-07-C5-96.compute-1.internal:rootProcess:failoverTestExecution - Issuing command: sh -c kill -9 `ps xfu | grep 'HLDqw0ark9XcMNCM' | grep -v 'grep' | sed -re '1,$s/[ \t]+/ /g' | cut -d ' ' -f 2`; > ip-10-138-52-150_sf.log:2013/11/06 10:03:05:153 EST [DEBUG][TerminatorThread] HOST domU-12-31-39-07-C5-96.compute-1.internal:rootProcess:failoverTestExecution - Issuing command: sh -c kill -9 `ps xfu | grep 'Hdf06hHdi5gDdU7g' | grep -v 'grep' | sed -re '1,$s/[ \t]+/ /g' | cut -d ' ' -f 2`; > ip-10-151-106-31_sf.log:2013/11/06 09:59:19:216 EST [DEBUG][TerminatorThread] HOST domU-12-31-39-07-C5-96.compute-1.internal:rootProcess:failoverTestExecution - Issuing command: sh -c kill -9 `ps xfu | grep '1aMS9DHcEPZFda3K' | grep -v 'grep' | sed -re '1,$s/[ \t]+/ /g' | cut -d ' ' -f 2`; Running for extended periods of time I couldn't reproduce the above issue. The only thing is that in 3 runs relatively close to each other in time, I got total of 4 failed samples:
> SFCORE_LOG - Error sampling data: <org.jboss.smartfrog.loaddriver.RequestProcessingException: Stale session data received. Expected 140, received 139, Runner: 186>
I think this is unrelated though and expected to happen rarely.
*** This bug has been marked as a duplicate of bug 993041 *** |