Bug 1106393

Summary:	Managed server shuts down unexpectedly on timeout during connection request to HC
Product:	[JBoss] JBoss Enterprise Application Platform 6	Reporter:	Takayoshi Kimura <tkimura>
Component:	Domain Management	Assignee:	John Allen <joallen>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Petr Kremensky <pkremens>
Severity:	urgent	Docs Contact:
Priority:	unspecified
Version:	6.2.3	CC:	brian.stansberry, dandread, federico, hnaram, jlee, joallen, kkhan, klape, pyadav, sjadhav, wfink, yqu
Target Milestone:	DR1
Target Release:	EAP 6.4.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	In a previous version of JBoss EAP 6, after a managed server's connection to its Host Controller failed, it would only make a single re-connection attempt. This could cause the server to shut down unexpectedly if the re-connection failed. In this release, connections to the Host Controller are re-tried indefinitely. Server instances no longer shut down due to loss of connection to the Host Controller.	Story Points:	---
Clone Of:
Clones:	1140453 1153383 1186949		Environment:
Last Closed:		Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1140453, 1153383, 1186949

Description Takayoshi Kimura 2014-06-09 08:52:49 UTC

Sometimes a managed server shuts down unexpectedly when it tries to connect to the HC and undergoes a lengthy Full GC.

java.io.IOException: JBAS012175: Channel closed
	at org.jboss.as.server.mgmt.domain.HostControllerConnection.getChannel(HostControllerConnection.java:101)
	at org.jboss.as.protocol.mgmt.ManagementChannelHandler.executeRequest(ManagementChannelHandler.java:117)
	at org.jboss.as.protocol.mgmt.ManagementChannelHandler.executeRequest(ManagementChannelHandler.java:100)
	at org.jboss.as.server.mgmt.domain.HostControllerConnection.reConnect(HostControllerConnection.java:171)
	at org.jboss.as.server.mgmt.domain.HostControllerClient.reconnect(HostControllerClient.java:98)
	at org.jboss.as.server.DomainServerMain.main(DomainServerMain.java:138)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.jboss.modules.Module.run(Module.java:292)
	at org.jboss.modules.Main.main(Main.java:455)

The current DomainServerMain exits on the connection error.

The scenario is:

1. The server tries to connect to the HC.
2. The server undergoes a Full GC.
3. The HC, waiting for a request from the server, hits a read timeout and closes the socket.
4. The server resumes from the GC, fails to send the request to the HC, and exits.
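The four steps above can be reproduced in miniature with plain Java sockets. This is a hypothetical sketch, not the JBoss management protocol: a one-byte echo stands in for the real request and response, `Thread.sleep` stands in for the Full GC pause, and the HC read timeout is scaled down to milliseconds. The class and method names (`GcPauseTimeoutDemo`, `runScenario`, `serveOneRequest`, `sendRequestAfterPause`) are illustrative only.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class GcPauseTimeoutDemo {

    /** Step 3: the simulated HC waits for one request with a read timeout, then closes the socket. */
    static void serveOneRequest(ServerSocket hc, int readTimeoutMs) {
        try (Socket s = hc.accept()) {
            s.setSoTimeout(readTimeoutMs);
            int b = s.getInputStream().read();
            if (b >= 0) {
                s.getOutputStream().write(b); // echo the byte back as the "response"
                s.getOutputStream().flush();
            }
        } catch (SocketTimeoutException e) {
            // Read timed out: leaving the try block closes the socket.
        } catch (IOException e) {
            // Any other I/O failure also ends this connection.
        }
    }

    /** Steps 1, 2 and 4: connect, stall as if in a Full GC, then send the request. */
    static boolean sendRequestAfterPause(int port, int pauseMs) throws IOException, InterruptedException {
        try (Socket client = new Socket(InetAddress.getLoopbackAddress(), port)) {
            client.setSoTimeout(2000); // safety net so the demo never hangs
            Thread.sleep(pauseMs);     // simulated stop-the-world pause
            try {
                OutputStream out = client.getOutputStream();
                out.write(42);
                out.flush();
                return client.getInputStream().read() == 42;
            } catch (IOException e) {
                return false; // e.g. "Connection reset": the HC already closed the socket
            }
        }
    }

    /** Returns true if the request/response round trip succeeded. */
    static boolean runScenario(int hcReadTimeoutMs, int pauseMs) {
        try (ServerSocket hc = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread hcThread = new Thread(() -> serveOneRequest(hc, hcReadTimeoutMs));
            hcThread.start();
            try {
                return sendRequestAfterPause(hc.getLocalPort(), pauseMs);
            } finally {
                hcThread.join();
            }
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("demo setup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("pause shorter than HC timeout: " + runScenario(500, 50));
        System.out.println("pause longer than HC timeout:  " + runScenario(200, 800));
    }
}
```

With a pause shorter than the HC read timeout the round trip succeeds; with a longer one the HC has already closed the socket and the request fails, which is the point at which the pre-fix DomainServerMain exited. Depending on timing, the failure surfaces either as end-of-stream or as a "Connection reset" `IOException`; the sketch treats both as failure.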

Comment 2 kylin 2014-07-10 07:17:22 UTC

A customer of mine hit this issue again. I extracted some logs into [1], which may be helpful for resolving the issue.

The log [1] contains the following information:

1. The server was nearly exhausted: from 15:04:15 to 15:11:30 (more than 7 minutes) there was no log output. This may have been caused by GC, an OS-level issue, or hypervisor issues (the customer uses the VMware virtualization platform).

2. The server produced no shutdown log output after 15:14:12, but the server actually exited at 15:18:12, as the PC (Process Controller) log shows.

3. The HC hit "Read timed out" at 15:10:01, while the server was still stuck as described in item 1.

4. The PC, which monitors the server process, received the server exit at 15:18:12, which indicates that the server exited at that time.
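The timestamps in the list above can be cross-checked with `java.time`. This small sketch assumes all four times are from the same day and the same time zone; `LogTimeline` and `between` are hypothetical names. It confirms that the HC read timeout fired several minutes into the server's silent period, consistent with the scenario in the description.

```java
import java.time.Duration;
import java.time.LocalTime;

public class LogTimeline {

    /** Elapsed time between two same-day HH:mm:ss timestamps. */
    static Duration between(String from, String to) {
        return Duration.between(LocalTime.parse(from), LocalTime.parse(to));
    }

    public static void main(String[] args) {
        // Item 1: length of the period with no server log output.
        System.out.println("Server silent for:        " + between("15:04:15", "15:11:30")); // PT7M15S
        // Item 3: how far into that period the HC read timeout fired.
        System.out.println("HC timeout into silence:  " + between("15:04:15", "15:10:01")); // PT5M46S
        // Items 2 and 4: last server log line to the exit seen by the PC.
        System.out.println("Last server log to exit:  " + between("15:14:12", "15:18:12")); // PT4M
    }
}
```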


[1] https://github.com/kylinsoong/wildfly-samples/blob/master/domain/bug-1106393-log.md

Comment 3 Emanuel Muckenhuber 2014-07-10 09:46:29 UTC

Which JVM version are you using? Apparently using JDK 7 helped with this issue. We are going to fix it in a future release, though.

Comment 8 James Livingston 2014-09-03 22:56:50 UTC

Merged upstream PR: https://github.com/wildfly/wildfly-core/pull/150
Upstream commit: https://github.com/wildfly/wildfly-core/commit/70e2286fa6e737df2c4daa5b7f2330a8bd6d43fb

Comment 13 Patrick 2014-09-19 09:23:12 UTC

Adding a new case 01183081 to the list of impacted customers.

Thanks

Comment 14 Petr Kremensky 2014-10-07 08:29:05 UTC

Server process now waits until the host-controller is available again and reconnects.

Verified on EAP 6.4.0.DR3
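The fixed behaviour described in the Doc Text and in the verification note above (the server waits until the Host Controller is available again and reconnects, retrying indefinitely) can be sketched as a retry loop. This is a hypothetical illustration, not the code from the upstream wildfly-core PR: `ReconnectSketch`, `ConnectAttempt`, and `reconnectUntilAvailable` are invented names, and the capped exponential backoff is an assumption chosen for the example.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class ReconnectSketch {

    /** Hypothetical stand-in for a single attempt to open the management channel to the HC. */
    @FunctionalInterface
    interface ConnectAttempt {
        void connect() throws IOException;
    }

    /**
     * Keeps retrying until an attempt succeeds, instead of exiting after one failed
     * reconnect. The delay between attempts doubles from initialDelayMs up to maxDelayMs.
     * Only thread interruption ends the loop without a connection.
     *
     * @return the number of attempts it took to connect
     */
    static int reconnectUntilAvailable(ConnectAttempt attempt, long initialDelayMs, long maxDelayMs) {
        long delayMs = initialDelayMs;
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                attempt.connect();
                return attempts;
            } catch (IOException e) {
                // Pre-fix behaviour was to give up here; this sketch waits and tries again.
                System.err.println("Reconnect attempt " + attempts + " failed: " + e.getMessage());
            }
            try {
                TimeUnit.MILLISECONDS.sleep(delayMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                throw new IllegalStateException("Interrupted while waiting to reconnect", ie);
            }
            delayMs = Math.min(delayMs * 2, maxDelayMs);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated HC that is unreachable for the first three attempts.
        int attempts = reconnectUntilAvailable(() -> {
            if (++calls[0] <= 3) {
                throw new IOException("simulated: HC not reachable");
            }
        }, 10, 100);
        System.out.println("Connected after " + attempts + " attempts"); // Connected after 4 attempts
    }
}
```

Capping the delay keeps a server that has been cut off for a long time from waiting too long once the HC comes back, while the backoff avoids hammering an HC that is itself under memory pressure.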

Comment 18 Federico Bellizia 2017-03-22 13:59:10 UTC

Good morning,
     we analyzed a similar problem on our EAP 6 installation and found a correlation between high swap usage and HC-DC disconnections.

A Full GC on a low-memory machine causes these problems.

Solution: upgrade the machine's RAM.