Bug 1381139 - EAP Server won't start up after failover due to JBAS013412: Timeout after [300] seconds waiting for service container stability (disabled sticky sessions, invalidation cache and shared cache store)
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: Clustering
Version: 6.4.8
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: jboss-set
QA Contact: Michal Vinkler
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-03 08:46 UTC by Michal Vinkler
Modified: 2019-03-01 12:29 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-01 12:29:12 UTC
Type: Bug



Description Michal Vinkler 2016-10-03 08:46:48 UTC
In our failover tests with an invalidation cache, a shared cache store, and sticky sessions disabled, an EAP server sometimes logs "JBAS013412: Timeout after [300] seconds waiting for service container stability. Operation will roll back." and aborts startup.

Scenario description:
HTTP traffic accesses a clustered web application with replicated sessions through a mod_cluster load balancer. Each client waits 4000 ms after receiving a response before sending its next request. Session size is 34 KB.
The setup is a 4-node EAP cluster plus a 4-node JDG cluster. One EAP node at a time is shut down and, after some time, started again, while 2000 standalone clients keep calling the application.
Sticky sessions are disabled.
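
For scale, the figures above imply a rough upper bound on the load. This is only a back-of-the-envelope sketch: it assumes response time is negligible compared to the 4000 ms delay and that every request touches the full 34 KB session once, neither of which is stated in this report.

```python
# Figures taken from the scenario description.
clients = 2000
think_time_s = 4.0   # delay between a response and the next request
session_kb = 34

# Assumption: response time is negligible next to the think time,
# so this is an upper bound on the aggregate request rate.
max_requests_per_s = clients / think_time_s

# Assumption: every request reads or writes the whole session once.
max_session_kb_per_s = max_requests_per_s * session_kb

print(f"at most {max_requests_per_s:.0f} requests/s across the cluster")
print(f"at most {max_session_kb_per_s:.0f} KB/s of session data")
```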

Configuration:
 - 4-node EAP cluster with an invalidation cache + a shared cache store (remote JDG cluster)
 - 4-node JDG cluster with distributed cache
 - 4 nodes generating load (2000 clients in total)
 - cache mode: ASYNC or SYNC (applies to both the invalidation and the distributed cache; the "write-behind" element of the "remote-store" element is set accordingly)


Links to configuration files:
EAP http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-remote-jdg-session-shutdown-invalidation-sync-4nodes-no-sticky-sessions-perf17/4/artifact/report/config/jboss-perf18/standalone-ha.xml
JDG http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-remote-jdg-session-shutdown-invalidation-sync-4nodes-no-sticky-sessions-perf17/4/artifact/report/config/jboss-perf22/clustered.xml


When an EAP server is started again after the preceding shutdown, it sometimes logs the following error and aborts startup:

[JBossINF] 03:18:03,725 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) JBAS013412: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'add' at address '[
[JBossINF]     ("core-service" => "management"),
[JBossINF]     ("management-interface" => "native-interface")

Before that, the server repeatedly logged these two WARN messages:
WARN  [org.jgroups.protocols.MPING] (MPING) perf19/ejb: discarding discovery request for cluster 'web' from perf19/web; our cluster name is 'ejb'. Please separate your clusters cleanly. 
WARN  [org.jgroups.protocols.MPING] (MPING) perf19/web: discarding discovery request for cluster 'ejb' from perf19/ejb; our cluster name is 'web'. Please separate your clusters cleanly.
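
To check other server logs for the same pattern, something like the following sketch could be used. It is not part of the test suite; the names `scan_log`, `TIMEOUT_RE` and `MPING_RE` are made up for illustration, and the regular expressions are derived only from the message texts quoted above, so they may need adjusting for other log formats.

```python
import re
from collections import Counter

# Patterns derived from the log excerpts quoted in this report.
TIMEOUT_RE = re.compile(r"JBAS013412: Timeout after \[(\d+)\] seconds")
MPING_RE = re.compile(
    r"discarding discovery request for cluster '([^']+)' from (\S+); "
    r"our cluster name is '([^']+)'"
)

def scan_log(lines):
    """Collect JBAS013412 timeouts and MPING cross-cluster discards.

    Returns a list of timeout values in seconds and a Counter keyed by
    (requested cluster name, receiving channel's own cluster name).
    """
    timeouts = []
    discards = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts.append(int(match.group(1)))
            continue
        match = MPING_RE.search(line)
        if match:
            discards[(match.group(1), match.group(3))] += 1
    return timeouts, discards
```

With a locally saved copy of the server log (the file name `server.log` is hypothetical), calling `scan_log(open("server.log"))` returns the timeout values found and how often each cluster-name mismatch was logged.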

Server log:
http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-remote-jdg-session-shutdown-invalidation-sync-4nodes-no-sticky-sessions-perf17/4/console-perf19/

