Bug 1106393

Summary: Managed server shutdown unexpectedly when timeout during connection request to HC
Product: [JBoss] JBoss Enterprise Application Platform 6 Reporter: Takayoshi Kimura <tkimura>
Component: Domain Management Assignee: John Allen <joallen>
Status: CLOSED CURRENTRELEASE QA Contact: Petr Kremensky <pkremens>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 6.2.3 CC: brian.stansberry, dandread, federico, hnaram, jlee, joallen, kkhan, klape, pyadav, sjadhav, wfink, yqu
Target Milestone: DR1   
Target Release: EAP 6.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
In a previous version of JBoss EAP 6, after a managed server's connection to its Host Controller failed, it would only make a single re-connection attempt. This could cause the managed server to shut down unexpectedly if the re-connection failed. In this release, connections to the Host Controller are re-tried indefinitely. Server instances no longer shut down due to loss of connection to the Host Controller.
Story Points: ---
Clone Of:
: 1140453 1153383 1186949 Environment:
Last Closed: Type: Bug
Bug Depends On:    
Bug Blocks: 1140453, 1153383, 1186949    

Description Takayoshi Kimura 2014-06-09 08:52:49 UTC
Sometimes a managed server shuts down unexpectedly when it tries to connect to the HC and hits a lengthy Full GC.

java.io.IOException: JBAS012175: Channel closed
	at org.jboss.as.server.mgmt.domain.HostControllerConnection.getChannel(HostControllerConnection.java:101)
	at org.jboss.as.protocol.mgmt.ManagementChannelHandler.executeRequest(ManagementChannelHandler.java:117)
	at org.jboss.as.protocol.mgmt.ManagementChannelHandler.executeRequest(ManagementChannelHandler.java:100)
	at org.jboss.as.server.mgmt.domain.HostControllerConnection.reConnect(HostControllerConnection.java:171)
	at org.jboss.as.server.mgmt.domain.HostControllerClient.reconnect(HostControllerClient.java:98)
	at org.jboss.as.server.DomainServerMain.main(DomainServerMain.java:138)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.jboss.modules.Module.run(Module.java:292)
	at org.jboss.modules.Main.main(Main.java:455)

The current DomainServerMain exits on the connection error.

The scenario is:

1. Server tries to connect to the HC
2. Server goes into a Full GC
3. HC is waiting for a request from the server, hits a read timeout, and closes the socket
4. Server resumes from GC, fails to send the request to the HC, and exits
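
Step 3 of the scenario above can be reproduced in isolation with plain sockets. The following is only a minimal sketch, not the EAP management protocol: the class and method names are hypothetical, the "HC" side is an ordinary ServerSocket on the loopback interface, and the Full GC pause is simulated with Thread.sleep.

```java
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration; none of these names come from the EAP code base.
public class ReadTimeoutSketch {

    /**
     * Returns true if the accepting side (standing in for the HC) hit its read
     * timeout because the connecting side (standing in for the managed server)
     * stalled before sending its request.
     */
    public static boolean hcTimesOut(int readTimeoutMillis, long stallMillis) throws Exception {
        try (ServerSocket hc = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket server = new Socket(hc.getInetAddress(), hc.getLocalPort());
             Socket accepted = hc.accept()) {

            // The HC side only waits this long for the first byte of the request.
            accepted.setSoTimeout(readTimeoutMillis);

            // Simulated GC pause: the server sends nothing until the stall is over.
            Thread stalled = new Thread(() -> {
                try {
                    Thread.sleep(stallMillis);
                    server.getOutputStream().write(1);
                } catch (Exception ignored) {
                    // In the real scenario this is where the server sees the failure.
                }
            });
            stalled.start();

            try {
                accepted.getInputStream().read();
                return false; // request arrived in time
            } catch (SocketTimeoutException e) {
                return true;  // the real HC would now close the socket
            } finally {
                stalled.join();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("stall longer than timeout:  " + hcTimesOut(100, 400));
        System.out.println("stall shorter than timeout: " + hcTimesOut(1000, 0));
    }
}
```

Any pause longer than the read timeout on the HC side is enough to trigger step 3, whatever the cause of the pause.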

Comment 2 kylin 2014-07-10 07:17:22 UTC
I have a customer who hit this issue again. I extracted some logs as [1]; this may be helpful for resolving the issue.

The log [1] contains the following info:

1. Server almost exhausted: from 15:04:15 to 15:11:30, more than 7 minutes with no log output. This may be caused by GC, an OS-level issue, or VM hypervisor issues (the customer uses a VMware virtual platform).

2. Server has no shutdown log output after 15:14:12, but the Server exit time should be 15:18:12; we can find evidence of this in the PC log.

3. HC hit Read timed out at 15:10:01, while the Server was still stuck as in step 1.

4. PC monitors the Server exit and receives the server exit at 15:18:12, which hints that the Server exited at 15:18:12.
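
For reference, the gaps between the timestamps quoted above can be worked out as follows. This is only a sketch: the class and constant names are made up for illustration, the times are copied from the four points above, and nothing is read from the actual log in [1].

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper; timestamps are the ones quoted in this comment.
public class TimelineSketch {
    static final LocalTime STALL_START = LocalTime.parse("15:04:15"); // last log line before the gap
    static final LocalTime STALL_END   = LocalTime.parse("15:11:30"); // first log line after the gap
    static final LocalTime HC_TIMEOUT  = LocalTime.parse("15:10:01"); // HC "Read timed out"
    static final LocalTime LAST_LOG    = LocalTime.parse("15:14:12"); // last Server log output
    static final LocalTime SERVER_EXIT = LocalTime.parse("15:18:12"); // exit as seen by the PC

    /** Length of the window with no Server log output. */
    public static Duration stall() {
        return Duration.between(STALL_START, STALL_END);
    }

    /** True if the HC read timeout happened while the Server was silent. */
    public static boolean timeoutInsideStall() {
        return HC_TIMEOUT.isAfter(STALL_START) && HC_TIMEOUT.isBefore(STALL_END);
    }

    /** Time between the last Server log output and the exit seen by the PC. */
    public static Duration lastLogToExit() {
        return Duration.between(LAST_LOG, SERVER_EXIT);
    }

    public static void main(String[] args) {
        System.out.println("stall window:         " + stall());
        System.out.println("HC timeout in window: " + timeoutInsideStall());
        System.out.println("last log to exit:     " + lastLogToExit());
    }
}
```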


[1] https://github.com/kylinsoong/wildfly-samples/blob/master/domain/bug-1106393-log.md

Comment 3 Emanuel Muckenhuber 2014-07-10 09:46:29 UTC
Which JVM version are you using? Apparently using JDK 7 helped with this issue; we are going to fix this in a future release, though.

Comment 13 Patrick 2014-09-19 09:23:12 UTC
Adding a new case 01183081 to the list of impacted customers.

Thanks

Comment 14 Petr Kremensky 2014-10-07 08:29:05 UTC
Server process now waits until the host-controller is available again and reconnects.

Verified on EAP 6.4.0.DR3
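
The difference between the old single-attempt behavior and the new wait-and-reconnect behavior can be sketched roughly as below. This is not the actual EAP fix: the Attempt interface and the method name are hypothetical, and the capped exponential backoff is an assumption made for the example, not something stated in this bug.

```java
import java.io.IOException;

// Hypothetical illustration of "retry until the HC is reachable again".
public class ReconnectSketch {

    /** Stand-in for one connection attempt to the host controller. */
    public interface Attempt {
        void connect() throws IOException;
    }

    /**
     * Keeps calling attempt.connect() until it succeeds and returns the number
     * of attempts made. A failed attempt is never treated as fatal; the caller
     * just waits and tries again. The doubling delay is an assumption.
     */
    public static int reconnectUntilSuccess(Attempt attempt, long initialDelayMillis, long maxDelayMillis)
            throws InterruptedException {
        long delay = initialDelayMillis;
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                attempt.connect();
                return attempts;
            } catch (IOException e) {
                // Old behavior, per the doc text: give up here and exit the process.
                Thread.sleep(delay);
                delay = Math.min(maxDelayMillis, delay * 2);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final int[] calls = {0};
        int n = reconnectUntilSuccess(() -> {
            if (++calls[0] < 3) {
                throw new IOException("JBAS012175: Channel closed");
            }
        }, 10, 1000);
        System.out.println("connected after " + n + " attempts");
    }
}
```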

Comment 18 Federico Bellizia 2017-03-22 13:59:10 UTC
Good morning, 
     we analyzed a similar problem on our EAP 6 installation and found a correlation between high swap usage and HC-DC disconnection.

Full GC on a low-memory machine causes these problems.

Solution: upgrade the machine's RAM.
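
One way to check whether a JVM is spending a lot of time in GC is the standard java.lang.management API. The sketch below is only an illustration with a hypothetical class name. It reports cumulative collection counts and times per collector, not the length of any single pause, so it can only hint at the kind of stall described in this bug.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper built on the standard GarbageCollectorMXBean API.
public class GcTimeSketch {

    /** Sum of the approximate accumulated collection time over all collectors, in milliseconds. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMillis=" + gc.getCollectionTime());
        }
        System.out.println("total GC time (ms): " + totalGcMillis());
    }
}
```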