Created attachment 1150524 [details]
thread dumps

Description of problem:

I found the EAP 7 domain controller marked as down after running ~8 days without problems. The resource has been marked as down since the following message in agent.log:

2016-04-23 11:15:12,001 WARN [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Failure to collect measurement data for Resource[id=13648, uuid=d8969534-6e2a-4cff-b0bd-bb395f48bfcc, type={EAP7}EAP7 Host Controller, key=hostConfig: /home/hudson/jboss-eap7-domain/domain/configuration/host-secured-changed.xml, name=EAP 7 Domain Controller (master 0.0.0.0:8443), parent=fbr-eap7-sec.bc.jonqe.lab.eng.bos.redhat.com, version=EAP 7.0.0.GA] - cause: java.lang.Exception: Failed to read attribute [name] of address [] - response: Result{outcome='failure', failureDescription=The server closed the connection before sending the response, rolled-back=false, rolledBack=false}

The EAP domain controller process was still running on the host, but the agent was periodically throwing the following errors and the resource stayed marked as down:

2016-04-25 04:21:04,729 DEBUG [ResourceContainer.invoker.availCheck.daemon-69] (rhq.modules.plugins.wildfly10.BaseServerComponent)- ResourceType[id=0, name=EAP7 Host Controller, plugin=EAP7, category=Server] [hostConfig: /home/hudson/jboss-eap7-domain/domain/configuration/host-secured-changed.xml]: exception while checking availability
java.lang.Exception: Failed to read attribute [name] of address [] - response: Result{outcome='failure', failureDescription=The server closed the connection before sending the response, rolled-back=false, rolledBack=false}
	at org.rhq.modules.plugins.wildfly10.BaseComponent.readAttribute(BaseComponent.java:835)
	at org.rhq.modules.plugins.wildfly10.BaseComponent.readAttribute(BaseComponent.java:805)
	at org.rhq.modules.plugins.wildfly10.BaseServerComponent.getAvailability(BaseServerComponent.java:145)
	at org.rhq.core.pc.inventory.AvailabilityProxy.call(AvailabilityProxy.java:143)
	at org.rhq.core.pc.inventory.AvailabilityProxy.call(AvailabilityProxy.java:52)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

A restart of the agent didn't resolve the issue. The issue disappeared when EAP was restarted.

Version-Release number of selected component (if applicable):
JON 3.3.5 with EAP plugin pack 04-ER01

How reproducible:
Not sure

Steps to Reproduce:
1. install/start EAP 7 in domain mode with secured management (2-way SSL)
2. import it to JON and set the correct Connection properties to have it UP
3. wait ~8 days

Actual results:
The resource is marked as down and the agent is throwing the above error on debug level.

Expected results:
It should continue to work.

Additional info:
Discussed the issue with EAP QE and they are not aware of any similar issue on the EAP 7 side. It was suggested that EAP got into this "broken" state because of interaction with the JON agent, which is causing exhaustion of a connection pool. Thread dumps of the agent, domain controller and host controller processes are attached.

Suggested steps to get more info next time the issue is hit:
- try to connect with jboss-cli directly to see if it's refused as well
- use javax.net.debug=all to log handshake details
- capture traffic via tcpdump -w (the -w option is important to be able to import the results into Wireshark, which is able to interpret the SSL handshake)
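The suggested diagnostic steps could be sketched as a dry-run shell script. Note this is only a sketch: MGMT_HOST, MGMT_PORT, EAP_HOME and the remote+https controller URL are my assumptions, not values from this report.

```shell
#!/bin/sh
# Dry-run sketch of the suggested diagnostics. MGMT_HOST, MGMT_PORT and
# EAP_HOME are assumptions -- adjust them to the actual setup.
MGMT_HOST=127.0.0.1
MGMT_PORT=8443
EAP_HOME=${EAP_HOME:-/opt/jboss-eap-7.0}

# Echo each command instead of executing it; change 'echo "+ $*"' to '"$@"'
# to really run them.
run() { echo "+ $*"; }

# 1. Try jboss-cli directly to see whether the connection is refused as well:
run "$EAP_HOME/bin/jboss-cli.sh" --connect "--controller=remote+https://$MGMT_HOST:$MGMT_PORT"

# 2. Capture traffic in a Wireshark-readable file (-w is the important part):
run tcpdump -i any -w eap-mgmt.pcap port "$MGMT_PORT"

# 3. For TLS handshake details, additionally start the agent JVM with
#    -Djavax.net.debug=all and watch its console/log output.
```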
Forgot to mention that the issue is not visible on EAP6
Created attachment 1150526 [details]
tcpdump

Adding the tcpdump output, unfortunately captured without the -w option.
Do we know if there were many (parallel) tcp connections open between agent and the DC?
(In reply to Heiko W. Rupp from comment #3)
> Do we know if there were many (parallel) tcp connections open between agent
> and the DC?

Something like a regular netstat -an | grep 999 to see how many connections end up at the EAP. See if that number grows.
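As a sketch, that netstat check can be wrapped in a small filter that counts established connections to the management port. The port 8443 here is an assumption taken from the management URL in the description; adjust it as needed.

```shell
# Count established TCP connections to the (assumed) EAP management port
# from netstat-style input.
count_mgmt_conns() {
  grep ESTABLISHED | grep -c ':8443'
}

# Demo with two fake netstat lines; on a live box you would run:
#   netstat -an | count_mgmt_conns
printf 'tcp 0 0 10.0.0.1:45678 10.0.0.2:8443 ESTABLISHED\ntcp 0 0 10.0.0.1:45679 10.0.0.2:8443 ESTABLISHED\n' | count_mgmt_conns
```

Running that periodically (e.g. from cron or a watch loop) and comparing the counts would show whether the number grows over time.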
I didn't save the output of netstat, but I didn't notice a suspiciously large number of connections. I don't remember, though, whether that was before or after the restart of the agent. I have thread dumps of both the agent and the DC. Is it possible to guess the number of connections from those?
Could you attach the logs?
All I have is attached. The complete agent log was unfortunately rolled over, because it keeps only two 5000KB backup files by default and debug mode is very noisy. The only relevant messages I found before it rolled over are pasted in the description. Nothing in the EAP logs (host-controller.log, process-controller.log).
I tried moving the system date 10 days forward; the issue was not reproduced that way.
I will run the following curl periodically to simulate the operation which is failing (the exception above):

curl --insecure --cert-type pem --key key.pem --cert ./cert.pem "https://127.0.0.1:8443/management" --header "Content-Type: application/json" -d '{"operation":"read-attribute","name":"name","json.pretty":1}'
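A minimal loop around that probe might look like the sketch below. The URL and cert paths are taken from the curl above; the bounded loop, the PROBE_COUNT knob and the failure log file are my additions so the sketch terminates.

```shell
# Bounded probe loop: run the read-attribute request repeatedly and log
# failures with a timestamp. PROBE_COUNT is a hypothetical knob (default 3)
# so the loop terminates; in a real soak test you would loop forever.
MGMT_URL="https://127.0.0.1:8443/management"

probe() {
  curl --insecure --silent --cert-type pem --key key.pem --cert ./cert.pem \
       "$MGMT_URL" --header "Content-Type: application/json" \
       -d '{"operation":"read-attribute","name":"name","json.pretty":1}'
}

i=0
while [ "$i" -lt "${PROBE_COUNT:-3}" ]; do
  echo "probe attempt $i"
  probe >/dev/null 2>&1 || echo "$(date '+%F %T') probe failed" >> probe.log
  i=$((i+1))
  sleep 1
done
```

Watching probe.log for the first failure timestamp would pin down when the server enters the broken state.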
We are running EAP with a two-way SSL setup for the management interface and sending a request once per second. The request is similar to the one Filip describes in comment #10. It has now been running for more than 3 days and we are not seeing any issue so far.
I now have 3 instances in the "broken" state. It happened again after 8.1 days.

- it's not necessary to use 2-way SSL; the issue is visible when using 1-way SSL as well
- it's irrelevant whether there is another EAP 7 running on the host
- EAP 7 standalone mode is not yet affected by this issue, but it might be later??
- jboss-cli is working correctly

The following error appears in host-controller.log:

2016-05-05 11:42:40,271 ERROR [io.undertow.request] (Host Controller Service Threads - 54116) UT005071: Undertow request failed HttpServerExchange{ POST /management request {Accept=[application/json], Connection=[Keep-Alive], Authorization=[Digest username="rhqadmin", realm="ManagementRealm", nonce="5LMqqFybGWoNMTQ2MjQ2Mjk2MDA0Mna3aWvkOayVgASy9bsRABI=", uri="/management", response="e194f9abf448bf1f7a1fb411cff6abc8", qop=auth, nc=00000001, cnonce="25e2dd1725f8752c", algorithm="MD5", opaque="00000000000000000000000000000000"], Content-Length=[74], Content-Type=[application/json; charset=UTF-8], User-Agent=[Apache-HttpClient/4.2.5 (java 1.5)], Host=[127.0.0.1:8443]} response {Connection=[keep-alive], X-Frame-Options=[SAMEORIGIN], Authentication-Info=[nextnonce="5LMqqFybGWoNMTQ2MjQ2Mjk2MDA0Mna3aWvkOayVgASy9bsRABI=",qop="auth",rspauth="be69e0a8fe741fe0df203fde6e0b2389",cnonce="25e2dd1725f8752c",nc=00000001], Content-Type=[application/json; charset=utf-8], Content-Length=[44], Date=[Thu, 05 May 2016 15:42:40 GMT]}}:
java.lang.OutOfMemoryError: Direct buffer memory
	at java.nio.Bits.reserveMemory(Bits.java:658)
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
	at io.undertow.server.DefaultByteBufferPool.allocate(DefaultByteBufferPool.java:122)
	at io.undertow.protocols.ssl.SslConduit.doWrap(SslConduit.java:849)
	at io.undertow.protocols.ssl.SslConduit.write(SslConduit.java:377)
	at io.undertow.server.protocol.http.HttpResponseConduit.processWrite(HttpResponseConduit.java:247)
	at io.undertow.server.protocol.http.HttpResponseConduit.write(HttpResponseConduit.java:588)
	at io.undertow.conduits.AbstractFixedLengthStreamSinkConduit.write(AbstractFixedLengthStreamSinkConduit.java:106)
	at io.undertow.conduits.AbstractFixedLengthStreamSinkConduit.write(AbstractFixedLengthStreamSinkConduit.java:120)
	at org.xnio.conduits.ConduitStreamSinkChannel.write(ConduitStreamSinkChannel.java:158)
	at io.undertow.channels.DetachableStreamSinkChannel.write(DetachableStreamSinkChannel.java:179)
	at io.undertow.server.HttpServerExchange$WriteDispatchChannel.write(HttpServerExchange.java:1969)
	at org.xnio.channels.Channels.writeBlocking(Channels.java:152)
	at io.undertow.io.UndertowOutputStream.write(UndertowOutputStream.java:228)
	at io.undertow.io.BlockingSenderImpl.writeBuffer(BlockingSenderImpl.java:194)
	at io.undertow.io.BlockingSenderImpl.writeBuffer(BlockingSenderImpl.java:187)
	at io.undertow.io.BlockingSenderImpl.send(BlockingSenderImpl.java:63)
	at io.undertow.io.BlockingSenderImpl.send(BlockingSenderImpl.java:83)
	at org.jboss.as.domain.http.server.DomainUtil.writeResponse(DomainUtil.java:70)
	at org.jboss.as.domain.http.server.DomainApiHandler$1.doSendResponse(DomainApiHandler.java:176)
	at org.jboss.as.domain.http.server.DomainApiHandler$ResponseCallback.sendResponse(DomainApiHandler.java:391)
	at org.jboss.as.domain.http.server.DomainApiHandler.handleRequest(DomainApiHandler.java:232)
	at io.undertow.server.handlers.encoding.EncodingHandler.handleRequest(EncodingHandler.java:66)
	at org.jboss.as.domain.http.server.security.SubjectDoAsHandler$1.run(SubjectDoAsHandler.java:72)
	at org.jboss.as.domain.http.server.security.SubjectDoAsHandler$1.run(SubjectDoAsHandler.java:68)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.jboss.as.controller.AccessAuditContext.doAs(AccessAuditContext.java:92)
	at org.jboss.as.domain.http.server.security.SubjectDoAsHandler.handleRequest(SubjectDoAsHandler.java:68)
	at org.jboss.as.domain.http.server.security.SubjectDoAsHandler.handleRequest(SubjectDoAsHandler.java:63)
	at io.undertow.server.handlers.BlockingHandler.handleRequest(BlockingHandler.java:56)
	at org.jboss.as.domain.http.server.DomainApiCheckHandler.handleRequest(DomainApiCheckHandler.java:95)
	at io.undertow.server.Connectors.executeRootHandler(Connectors.java:202)
	at io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:792)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
	at org.jboss.threads.JBossThread.run(JBossThread.java:320)
This is not a bug in the plugin. The issue is caused by a memory leak in EAP 7. The following JIRA tracks it: https://issues.jboss.org/browse/JBEAP-4410

Closing as not a bug.
Jimmy Wilson <jawilson> updated the status of jira JBEAP-4410 to Coding In Progress
Jimmy Wilson <jawilson> updated the status of jira JBEAP-4410 to Open
Errata Tool <etool> updated the status of jira JBEAP-4410 to Closed