Bug 1389495 - Failed to store inventory data
Summary: Failed to store inventory data
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: GA
Target Release: cfme-future
Assignee: Bronagh Sorota
QA Contact: Matt Mahoney
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-27 18:02 UTC by Matt Mahoney
Modified: 2019-08-06 20:05 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-10 15:26:35 UTC
Category: ---
Cloudforms Team: Middleware
Target Upstream Version:
mmahoney: needinfo+


Attachments
HS log (184.61 KB, text/plain)
2016-10-27 18:02 UTC, Matt Mahoney
HS Log - Retest (121.20 KB, text/plain)
2016-11-04 20:33 UTC, Matt Mahoney
NoEAP-HS.log (185.82 KB, text/plain)
2016-11-07 16:15 UTC, Matt Mahoney

Description Matt Mahoney 2016-10-27 18:02:32 UTC
Created attachment 1214708 [details]
HS log

Description of problem:
Error happened in DR7 container:

13:46:44,989 ERROR [org.hawkular.agent.monitor.storage.AsyncInventoryStorage] (Hawkular WildFly Agent Full Discovery Scan-1) HAWKMONITOR010024: Failed to store inventory data: java.net.SocketTimeoutException: timeout
        at okio.Okio$3.newTimeoutException(Okio.java:207)
        at okio.AsyncTimeout.exit(AsyncTimeout.java:261)
        at okio.AsyncTimeout$2.read(AsyncTimeout.java:215)
        at okio.RealBufferedSource.indexOf(RealBufferedSource.java:306)
        at okio.RealBufferedSource.indexOf(RealBufferedSource.java:300)
        at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:196)


Version-Release number of selected component (if applicable):
DR7

How reproducible:


Steps to Reproduce:
1. Start Cassandra container
2. Start Hawkular Container
3. Add EAP Standalone and EAP Domain Servers

Actual results:


Expected results:
No Inventory errors expected

Additional info:

Comment 2 Heiko W. Rupp 2016-10-27 18:07:18 UTC
Can you provide the container logs for Hawkular and C*?
Also, the output of docker inspect <id of h-s container>?

Comment 3 Heiko W. Rupp 2016-10-28 08:23:54 UTC
Ah sorry, I see hs.log is the server log.

Comment 4 Lukas Krejci 2016-10-31 11:04:31 UTC
What version of agent are you using? As far as I am aware, agent should no longer use the /bulk endpoint for any inventory storage...

Comment 5 Lukas Krejci 2016-10-31 11:07:53 UTC
Clearing the NEEDINFO flag as the logs are attached...

Comment 6 Lukas Krejci 2016-11-01 21:21:22 UTC
Matt, could you please confirm the agent version used in your tests? I.e. NOT the version of the Hawkular container, but the version of the agent installed into the EAPs.

Comment 7 Lukas Krejci 2016-11-01 21:28:37 UTC
Mazz, could you please confirm my assumption that the agent no longer uses /bulk for any inventory communication?

Comment 8 John Mazzitelli 2016-11-01 23:07:41 UTC
(In reply to Lukas Krejci from comment #7)
> Mazz, could you please confirm my assumption that the agent no longer uses
> /bulk for any inventory communication?

Correct. Agent no longer sends to the /bulk endpoint. It sends to the /sync endpoint, such as here:

https://github.com/hawkular/hawkular-agent/blob/0.24.0.Final/hawkular-wildfly-agent/src/main/java/org/hawkular/agent/monitor/storage/AsyncInventoryStorage.java#L600

Comment 9 Lukas Krejci 2016-11-02 13:35:09 UTC
Matt, could you please make sure that you install the latest version of the Hawkular WildFly Agent into the EAPs? 

If the error persists, we will need to investigate, but I have a strong suspicion that this is caused by using an outdated agent that does not work with the latest inventory versions anymore.

Comment 10 Lukas Krejci 2016-11-02 13:37:59 UTC
FYI, inventory no longer supports having complex JSON values as generic property values (which is what the reported error complains about).

Comment 13 Matt Mahoney 2016-11-04 20:33:44 UTC
Created attachment 1217507 [details]
HS Log - Retest

Comment 14 Matt Mahoney 2016-11-04 20:34:25 UTC
Retested and still see the ERROR, NOT using WF Container (see attached log).

Test steps:
 - Started HS 0.0.18.Final-redhat-1 in container
 - Downloaded Agent to EAP host server
 - On EAP server:
     - Downloaded and unpackaged jboss-eap-7.0
     - Installed the Agent: java -jar hawkular-wildfly-agent-installer.jar --target-location=/root/jboss-eap-7.0 --username=jdoe --password=password --server-url=http://<IP:PORT>/
     - Started EAP: nohup /root/jboss-eap-7.0/bin/standalone.sh -Djboss.service.binding.set=ports-01 -b=0.0.0.0 -bmanagement=0.0.0.0
 - in CFUI: Refreshed Provider, and validated that EAP was inventoried
 - Checked HS container log files, error below.



15:58:20,072 ERROR [org.hawkular.agent.monitor.storage.AsyncInventoryStorage] (Hawkular WildFly Agent Full Discovery Scan-1) HAWKMONITOR010024: Failed to store inventory data: java.net.SocketTimeoutException: timeout
        at okio.Okio$3.newTimeoutException(Okio.java:207)
        at okio.AsyncTimeout.exit(AsyncTimeout.java:261)
        at okio.AsyncTimeout$2.read(AsyncTimeout.java:215)
        at okio.RealBufferedSource.indexOf(RealBufferedSource.java:306)
        at okio.RealBufferedSource.indexOf(RealBufferedSource.java:300)
        at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:196)
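
For anyone repeating this check against a server log, a minimal sketch of a scanner for the HAWKMONITOR010024 message follows. It is not part of Hawkular or CloudForms. The function name find_store_failures is hypothetical, and the sketch assumes that raw log lines may carry ANSI color sequences, either as a real ESC byte or rendered as the two characters "^[".

```python
import re

# Matches ANSI color sequences such as ESC[0m or ESC[31m. The "^[" alternative
# covers logs where the ESC byte was rendered as caret notation (assumption).
ANSI_ESCAPE = re.compile(r"(?:\x1b|\^\[)\[[0-9;]*m")

# Message code quoted in this report.
ERROR_CODE = "HAWKMONITOR010024"


def find_store_failures(log_text):
    """Return the cleaned log lines that contain the inventory storage error."""
    hits = []
    for line in log_text.splitlines():
        clean = ANSI_ESCAPE.sub("", line)
        if ERROR_CODE in clean:
            hits.append(clean.strip())
    return hits


# Sample taken from the log excerpt in this comment.
sample = (
    "^[[0m^[[31m15:58:20,072 ERROR "
    "[org.hawkular.agent.monitor.storage.AsyncInventoryStorage] "
    "(Hawkular WildFly Agent Full Discovery Scan-1) HAWKMONITOR010024: "
    "Failed to store inventory data: java.net.SocketTimeoutException: timeout\n"
    "        at okio.Okio$3.newTimeoutException(Okio.java:207)\n"
)

failures = find_store_failures(sample)
print(len(failures), "failure(s) found")
```

Only the line carrying the message code is reported; the stack trace lines that follow it are skipped.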

Comment 15 Lukas Krejci 2016-11-07 12:46:19 UTC
The latest log contains timeouts of both inventory and metrics, with no other accompanying errors suggesting malfunction (at least on the inventory side - metrics seem to time out due to C* timeouts). As such I am led to believe this is either an environmental issue or high GC pressure preventing the server from performing normal operation in a timely manner.

Also note that the error we're talking about is going to happen in production no matter what we do due to environmental issues - network outages, etc. Therefore we should not test for its absence but for the system recovering from it.
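
The idea of testing for recovery rather than for the absence of the error could be sketched as below. This is an illustration only, not how the agent or the QE suite is implemented. FlakyStore and store_with_retry are hypothetical names, and Python's socket.timeout stands in for java.net.SocketTimeoutException.

```python
import socket


class FlakyStore:
    """Hypothetical stand-in for an inventory endpoint.

    Times out for the first `failures` calls, then succeeds.
    """

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self, payload):
        self.calls += 1
        if self.calls <= self.failures:
            raise socket.timeout("timeout")
        return "stored"


def store_with_retry(store, payload, max_attempts=5):
    """Call `store` until it succeeds; return (result, timeouts seen)."""
    timeouts = 0
    for _ in range(max_attempts):
        try:
            return store(payload), timeouts
        except socket.timeout:
            timeouts += 1
    raise RuntimeError(f"no recovery after {timeouts} timeouts")


# The check accepts that timeouts happen and asserts on the eventual outcome.
result, timeouts = store_with_retry(FlakyStore(failures=2), {"id": "eap-1"})
assert timeouts == 2        # the error did occur
assert result == "stored"   # and the store eventually succeeded
```

A check written this way passes when the timeouts are transient and fails only when the store never succeeds within the allowed attempts.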

Comment 17 Matt Mahoney 2016-11-07 16:15:50 UTC
Created attachment 1218109 [details]
NoEAP-HS.log

