Bug 1464323

Summary: Server started with blocking=true times out only on a slave host
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: Osamu Nagano <onagano>
Component: Domain Management
Assignee: Jiri Ondrusek <jondruse>
Status: CLOSED NOTABUG
QA Contact: Ivo Hradek <ihradek>
Severity: unspecified
Priority: unspecified
Version: 6.4.14
CC: brian.stansberry, dandread, jboss_lr, jondruse, onagano, plohia
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2017-06-30 06:44:32 UTC
Type: Bug
Attachments:
repro.zip (flags: none)

Description Osamu Nagano 2017-06-23 05:52:44 UTC
Created attachment 1290889: repro.zip

Description of problem:
We use the system property "jboss.as.management.blocking.timeout" for a slow-starting server to avoid a timeout on startup. It works for a server on the master host-controller but doesn't work for a server on a slave host.


Version-Release number of selected component (if applicable):
EAP 6.4.14, 6.4.16, 7.0.6


How reproducible:
Always


Steps to Reproduce:
The issue is easily reproducible using a web application that sleeps for more than 300 seconds in a ServletContextListener. The configuration files and such a test application are attached as repro.zip.
1. Start the master host-controller using domain.xml and host-master.xml in repro.zip.
2. Start the slave host-controller using host-slave.xml in repro.zip.
3. Deploy test.war in repro.zip to main-server-group.
4. Start a server on the slave with blocking=true:
[domain@localhost:9990 /] /host=slave/server-config=server-one:start(blocking=true)


Actual results:
In CLI:
~~~
[domain@localhost:9999 /] /host=slave/server-config=server-one:start(blocking=true)
{
    "outcome" => "failed",
    "result" => undefined,
    "failure-description" => "JBAS013496: Execution of operation 'start' on remote process at address '[(\"host\" => \"slave\")]' timed out after 305000 ms while awaiting initial response; remote process has been notified to terminate operation",
    "rolled-back" => true
}
~~~

In the server.log:
~~~
10:18:42,893 INFO  [stdout] (ServerService Thread Pool -- 56) Sleeping for 301 secconds...
10:18:43,458 INFO  [org.jboss.as.server] (main) JBAS015984: ProcessController has signalled to shut down; shutting down
10:18:43,472 INFO  [org.jboss.as.connector.subsystems.datasources] (MSC service thread 1-3) JBAS010409: Unbound data source [java:jboss/datasources/ExampleDS]
10:18:43,479 INFO  [org.apache.coyote.http11.Http11Protocol] (MSC service thread 1-1) JBWEB003075: Coyote HTTP/1.1 pausing on: http-127.0.1.1:8080
10:18:43,479 INFO  [org.apache.coyote.http11.Http11Protocol] (MSC service thread 1-1) JBWEB003077: Coyote HTTP/1.1 stopping on : http-127.0.1.1:8080
10:18:43,483 INFO  [org.apache.catalina.core] (MSC service thread 1-4) JBWEB001079: Container org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/] has not been started
10:18:43,894 INFO  [stdout] (ServerService Thread Pool -- 56) Sleeping for 302 secconds...
~~~


Expected results:
No errors in the CLI or in server.log, and the server starts successfully.


Additional info:
Without blocking=true (that is, with the default non-blocking start), the server on the slave is able to start.
A server on the master is able to start even with blocking=true (/host=master/server-config=server-zero:start(blocking=true)).

Comment 1 Osamu Nagano 2017-06-26 06:50:30 UTC
An upstream JIRA ticket has been created and linked.

Comment 2 Brian Stansberry 2017-06-26 13:56:42 UTC
The system property in domain.xml doesn't affect the function of the Host Controllers, including the master DC. Did you pass the property to the DC as well by including -Djboss.as.management.blocking.timeout=xxx in domain.conf?

Comment 3 Osamu Nagano 2017-06-27 01:16:10 UTC
Hi Brian, as in WFCORE-3008, that solves the issue. Thanks.
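
For reference, the change suggested in comment 2 can be sketched as a fragment of bin/domain.conf on the DC host. This is a minimal sketch, not the exact change that was applied: the value 600 and the BLOCKING_TIMEOUT helper variable are illustrative assumptions, the value is assumed to be in seconds (the reproducer fails once startup exceeds 300 seconds), and the fragment assumes the stock startup script passes JAVA_OPTS on to the Host Controller JVM.

```shell
# Fragment for bin/domain.conf on the host running the DC.
# BLOCKING_TIMEOUT is a hypothetical helper variable used only in this sketch;
# 600 is an illustrative value (assumed to be seconds), chosen to exceed the
# slowest expected server start.
BLOCKING_TIMEOUT=600

# Append the property to the JVM options so the Host Controller itself sees it,
# rather than only the managed servers configured through domain.xml.
JAVA_OPTS="$JAVA_OPTS -Djboss.as.management.blocking.timeout=$BLOCKING_TIMEOUT"
```

If HOST_CONTROLLER_JAVA_OPTS is set explicitly in domain.conf, the property would presumably need to be added there instead. The Host Controller has to be restarted for the new option to take effect.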