Created attachment 1214708: HS log

Description of problem:
An error occurred in the DR7 container:

13:46:44,989 ERROR [org.hawkular.agent.monitor.storage.AsyncInventoryStorage] (Hawkular WildFly Agent Full Discovery Scan-1) HAWKMONITOR010024: Failed to store inventory data: java.net.SocketTimeoutException: timeout
    at okio.Okio$3.newTimeoutException(Okio.java:207)
    at okio.AsyncTimeout.exit(AsyncTimeout.java:261)
    at okio.AsyncTimeout$2.read(AsyncTimeout.java:215)
    at okio.RealBufferedSource.indexOf(RealBufferedSource.java:306)
    at okio.RealBufferedSource.indexOf(RealBufferedSource.java:300)
    at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:196)

Version-Release number of selected component (if applicable):
DR7

How reproducible:

Steps to Reproduce:
1. Start the Cassandra container
2. Start the Hawkular container
3. Add EAP Standalone and EAP Domain servers

Actual results:

Expected results:
No inventory errors expected.

Additional info:
Can you provide the container logs for Hawkular and C*? Also, the output of docker inspect <id of h-s container>?
Ah sorry, I see hs.log is the server log.
What version of the agent are you using? As far as I am aware, the agent should no longer use the /bulk endpoint for any inventory storage...
Clearing the NEEDINFO flag as the logs are attached...
Matt, could you please confirm the agent version used in your tests? That is, NOT the version of the Hawkular container, but the version of the agent installed into the EAPs.
Mazz, could you please confirm my assumption that the agent no longer uses /bulk for any inventory communication?
(In reply to Lukas Krejci from comment #7)
> Mazz, could you please confirm my assumption that the agent no longer uses
> /bulk for any inventory communication?

Correct. The agent no longer sends to the /bulk endpoint. It sends to the /sync endpoint, for example here:
https://github.com/hawkular/hawkular-agent/blob/0.24.0.Final/hawkular-wildfly-agent/src/main/java/org/hawkular/agent/monitor/storage/AsyncInventoryStorage.java#L600
Matt, could you please make sure that you install the latest version of the Hawkular WildFly Agent into the EAPs? If the error persists, we will need to investigate, but I have a strong suspicion that this is caused by using an outdated agent that does not work with the latest inventory versions anymore.
FYI, inventory no longer supports having complex JSON values as generic property values (which is what the reported error complains about).
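To make that restriction concrete, here is a minimal client-side check one could run over a resource's generic properties before sending them. It is a sketch under an assumption: that inventory accepts only JSON scalars (strings, numbers, booleans, null) and rejects anything else, such as a nested Map (JSON object) or List (JSON array). PropertyValueCheck, isSimple and findComplexValues are hypothetical names, not Hawkular APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, NOT part of Hawkular. Assumes inventory generic properties
// may only hold JSON scalar values (String, Number, Boolean or null).
final class PropertyValueCheck {

    private PropertyValueCheck() {
    }

    /** Returns true if the value would serialize as a JSON scalar. */
    static boolean isSimple(Object value) {
        return value == null
                || value instanceof String
                || value instanceof Number
                || value instanceof Boolean;
    }

    /**
     * Returns, in sorted key order, the keys whose values are not JSON scalars
     * (for example a nested Map or List). Keys must be non-null.
     */
    static List<String> findComplexValues(Map<String, ?> properties) {
        Map<String, Object> sorted = new TreeMap<>(properties);
        List<String> offending = new ArrayList<>();
        for (Map.Entry<String, Object> e : sorted.entrySet()) {
            if (!isSimple(e.getValue())) {
                offending.add(e.getKey());
            }
        }
        return offending;
    }
}
```

An empty result means every value is a scalar under this assumption; a non-empty result names the properties that would need to be flattened (or dropped) before storing.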
Created attachment 1217507: HS Log - Retest
Retested and still see the ERROR, NOT using the WF container (see attached log).

Test steps:
- Started HS 0.0.18.Final-redhat-1 in a container
- Downloaded the agent to the EAP host server
- On the EAP server:
  - Downloaded and unpacked jboss-eap-7.0
  - Installed the agent:
    java -jar hawkular-wildfly-agent-installer.jar --target-location=/root/jboss-eap-7.0 --username=jdoe --password=password --server-url=http://<IP:PORT>/
  - Started EAP:
    nohup /root/jboss-eap-7.0/bin/standalone.sh -Djboss.service.binding.set=ports-01 -b=0.0.0.0 -bmanagement=0.0.0.0
- In CFUI: refreshed the provider and validated that EAP was inventoried
- Checked the HS container log files; error below:

15:58:20,072 ERROR [org.hawkular.agent.monitor.storage.AsyncInventoryStorage] (Hawkular WildFly Agent Full Discovery Scan-1) HAWKMONITOR010024: Failed to store inventory data: java.net.SocketTimeoutException: timeout
    at okio.Okio$3.newTimeoutException(Okio.java:207)
    at okio.AsyncTimeout.exit(AsyncTimeout.java:261)
    at okio.AsyncTimeout$2.read(AsyncTimeout.java:215)
    at okio.RealBufferedSource.indexOf(RealBufferedSource.java:306)
    at okio.RealBufferedSource.indexOf(RealBufferedSource.java:300)
    at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:196)
The latest log contains timeouts for both inventory and metrics, with no other accompanying errors suggesting a malfunction (at least on the inventory side; the metrics timeouts appear to be caused by C* timeouts). I am therefore led to believe this is either an environmental issue or high GC pressure preventing the server from operating normally in a timely manner. Also note that this error is going to happen in production no matter what we do, because of environmental issues such as network outages. Therefore we should not test for its absence, but for the system recovering from it.
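One way to express "recover from it" in code is a bounded retry with exponential backoff around the store call. The sketch below is a hypothetical illustration, not how the Hawkular agent actually handles HAWKMONITOR010024: RetryingStore and callWithRetry are invented names, and only java.net.SocketTimeoutException (the exception in the logs above) is treated as retryable.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical sketch, NOT the agent's real code: retries a store operation
// that times out, doubling the wait between attempts (exponential backoff).
final class RetryingStore {

    private RetryingStore() {
    }

    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (SocketTimeoutException e) {
                if (attempt >= maxAttempts) {
                    throw e; // the outage outlasted the retry budget; surface it
                }
                Thread.sleep(delay); // back off before the next attempt
                delay *= 2;
            }
            // any other exception propagates immediately without a retry
        }
    }
}
```

A test in the spirit of the comment would inject a few timeouts into the call and assert that the data is eventually stored, rather than asserting that no timeout was ever logged.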
Created attachment 1218109: NoEAP-HS.log