Bug 1421060
Summary: | Could not get application level metrics with error 'Failed to collect endpoint' | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Peng Li <penli> |
Component: | Hawkular | Assignee: | Matt Wringe <mwringe> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Peng Li <penli> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.5.0 | CC: | aos-bugs, mazz, tdawson |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-03-02 22:22:46 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Peng Li
2017-02-10 09:16:41 UTC
Are you seeing metrics for the agent itself? I see this error, too, with the Jolokia example endpoint (not the Prometheus example), but I don't think it is an agent problem. If you navigate to the Jolokia example pod in the OpenShift UI and click the "Open Java Console" link, OpenShift gets the exact same connection error as the agent does. You will see this in a popup dialog box:

===
The connection to jolokia failed!
The connection to jolokia has failed with the following error, also check the javascript console for more details.
Error: 'dial tcp 172.17.0.6:8778: getsockopt: connection refused'
Trying to reach: 'https://172.17.0.6:8778/jolokia/?maxDepth=7&maxCollectionSize=500&ignoreErrors=true&canonicalNaming=false'
===

So for some reason, OpenShift sometimes cannot expose that Jolokia endpoint, and when that happens, no client can connect to it.

My versions:
oc v1.5.0-alpha.0+3b2bbe5
kubernetes v1.4.0+776c994
openshift v1.5.0-alpha.0+3b2bbe5

$ oc project test1
$ oc new-app amq62-basic
For this pod, broker-amq, I can click the "Open Java Console" link and see some JMX info.

(In reply to Peng Li from comment #6)
> $ oc project test1
> $ oc new-app amq62-basic
> for this pod broker-amq, I can click the "Open Java Console" link, and see
> some JMX info.

Then it sounds like it might be some kind of permission issue? What roles/permissions is the agent given? We might need Matt to take a look at the setup.

Verified.

Version:
metrics-hawkular-metrics 3.5.0 b50862a32dd6 14 hours ago 1.508 GB
metrics-hawkular-openshift-agent 3.5.0 a66118961a69 20 hours ago 234.8 MB
metrics-heapster 3.5.0 03d0a94d4bd2
metrics-cassandra 3.5.0 aa7e5b2b7210

Steps:
1. Deploy Metrics 3.5.0 using Ansible.
2. Deploy the agent:
   oc create -f hawkular-openshift-agent-configmap.yaml -n default
   oc process -f hawkular-openshift-agent.yaml | oc create -n default -f -
   oc adm policy add-cluster-role-to-user hawkular-openshift-agent system:serviceaccount:default:hawkular-openshift-agent
3. Check that the metrics and agent pods are running.
4. Check from the console that the Prometheus and Jolokia example pod endpoints can be gathered.

The above test was run in an ovs-multitenant enabled OCP:
# openshift version
openshift v3.5.0.19+199197c
# oc get netnamespace | grep openshift-infra
openshift-infra 11939989

Since this bug never reached customers, I am closing it.
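The "connection refused" error quoted earlier in this report is raised at the TCP level, before any Jolokia or TLS exchange takes place. As a rough way to tell a network-level failure apart from an agent or permission problem, a plain TCP probe can be run from a host or pod that is able to route to the pod network. The sketch below is only an illustration: `probe_tcp` is a hypothetical helper name, not part of the Hawkular OpenShift Agent or of `oc`, and the address `172.17.0.6:8778` in the usage comment is simply the one from the error message above.

```python
import socket
from typing import Tuple


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> Tuple[bool, str]:
    """Attempt a plain TCP connect and report whether it succeeded.

    Hypothetical helper for illustration only. It checks TCP reachability
    and nothing else: no TLS handshake, no HTTP request, no Jolokia call.
    """
    try:
        # create_connection resolves the host and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except ConnectionRefusedError as exc:
        # Nothing is listening on that address/port, or the connection is rejected.
        return False, f"connection refused: {exc}"
    except OSError as exc:
        # Covers timeouts, unreachable networks, and name resolution failures.
        return False, f"{type(exc).__name__}: {exc}"


# Example usage (address taken from the error message in this report):
#   reachable, detail = probe_tcp("172.17.0.6", 8778)
#   print(reachable, detail)
```

If the probe is refused from every vantage point, the endpoint itself is probably not listening. If it succeeds from some places but not others, network isolation between projects is a more likely cause, which would be worth checking in an ovs-multitenant setup.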