Bug 1508496 - Connection refused error when accessing hawkular-cassandra and hawkular-metrics prometheus metrics interface
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assignee: John Sanda
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-01 14:38 UTC by Junqi Zhao
Modified: 2018-04-25 09:16 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1571641 (view as bug list)
Environment:
Last Closed: 2018-03-28 14:09:47 UTC
Target Upstream Version:
Embargoed:


Attachments
metrics pods log (494.53 KB, text/plain)
2017-11-01 14:38 UTC, Junqi Zhao
metrics pods info (17.78 KB, text/plain)
2017-11-01 14:40 UTC, Junqi Zhao
issue is fixed (49.46 KB, text/plain)
2018-03-09 02:06 UTC, Junqi Zhao


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0489 0 None None None 2018-03-28 14:10:51 UTC

Description Junqi Zhao 2017-11-01 14:38:31 UTC
Created attachment 1346559 [details]
metrics pods log

Description of problem:
The Prometheus metrics interfaces of hawkular-cassandra and hawkular-metrics cannot be accessed; requests fail with a "Connection refused" error.
# oc get po -o wide
NAME                         READY     STATUS    RESTARTS   AGE       IP            NODE
hawkular-cassandra-1-kcgdb   1/1       Running   0          1m        10.128.0.86   host-8-241-56.host.centralci.eng.rdu2.redhat.com
hawkular-metrics-ng86f       1/1       Running   0          1m        10.128.0.87   host-8-241-56.host.centralci.eng.rdu2.redhat.com
heapster-nxrzp               1/1       Running   0          1m        10.128.0.88   host-8-241-56.host.centralci.eng.rdu2.redhat.com
# curl http://10.128.0.86:7575/metrics
curl: (7) Failed connect to 10.128.0.86:7575; Connection refused
# curl http://10.128.0.87:7575/metrics
curl: (7) Failed connect to 10.128.0.87:7575; Connection refused


Version-Release number of selected component (if applicable):
# rpm -qa | grep openshift-ansible
openshift-ansible-playbooks-3.7.0-0.189.0.git.0.d497c5e.el7.noarch
openshift-ansible-lookup-plugins-3.7.0-0.189.0.git.0.d497c5e.el7.noarch
openshift-ansible-filter-plugins-3.7.0-0.189.0.git.0.d497c5e.el7.noarch
openshift-ansible-callback-plugins-3.7.0-0.189.0.git.0.d497c5e.el7.noarch
openshift-ansible-3.7.0-0.189.0.git.0.d497c5e.el7.noarch
openshift-ansible-roles-3.7.0-0.189.0.git.0.d497c5e.el7.noarch
openshift-ansible-docs-3.7.0-0.189.0.git.0.d497c5e.el7.noarch

metrics-hawkular-metrics:v3.7.0-0.185.0.0
metrics-cassandra:v3.7.0-0.185.0.0
metrics-heapster:v3.7.0-0.185.0.0

How reproducible:
Always

Steps to Reproduce:
1. Deploy metrics using the inventory file shown in the [Additional info] part.

Actual results:
The Prometheus metrics interfaces of hawkular-cassandra and hawkular-metrics cannot be accessed.

Expected results:
The endpoints should return data in Prometheus format.

Additional info:
[OSEv3:children]
masters
etcd

[masters]
${MASTER} openshift_public_hostname=${MASTER}

[etcd]
${ETCD} openshift_public_hostname=${ETCD}


[OSEv3:vars]
ansible_ssh_user=root
ansible_ssh_private_key_file="~/libra.pem"
deployment_type=openshift-enterprise


# Metrics
openshift_metrics_install_metrics=true
openshift_metrics_hawkular_hostname=hawkular-metrics.${SUB_DOMAIN}
openshift_metrics_project=openshift-infra
openshift_metrics_image_prefix=${IMAGE_PREFIX}
openshift_metrics_image_version=v3.7

Comment 1 Junqi Zhao 2017-11-01 14:40:04 UTC
Created attachment 1346560 [details]
metrics pods info

Comment 2 John Sanda 2017-11-02 15:18:47 UTC
The ansible scripts set the ENABLE_PROMETHEUS_ENDPOINT variable to a value of "True". The cassandra-docker.sh script, which checks whether the variable is set, looks for a value of "true". The same is true for hawkular-metrics in the standalone.conf script. You can work around this by doing the following:

1) `oc edit rc hawkular-cassandra-1` and set the value of ENABLE_PROMETHEUS_ENDPOINT to true. Save the changes.

2) `oc edit rc hawkular-metrics` and set the value of ENABLE_PROMETHEUS_ENDPOINT to true. Save the changes.

3) `oc scale --replicas=0 rc hawkular-cassandra-1`

4) `oc scale --replicas=0 rc hawkular-metrics`

5) `oc scale --replicas=1 rc hawkular-cassandra-1`

6) `oc scale --replicas=1 rc hawkular-metrics`
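
The six steps above can also be scripted. The following is a minimal sketch rather than an official procedure: it assumes the installed client provides `oc set env`, used here in place of the interactive `oc edit` steps, and the function name `enable_prometheus_endpoint` is made up for illustration. The replication controller names are the ones from this bug.

```shell
# Sketch of the workaround as a script. Assumes `oc` is logged in and the
# current project is the metrics project (openshift-infra in this report).
# `oc set env` stands in for the manual `oc edit` steps; the function name
# is hypothetical.
enable_prometheus_endpoint() {
  for rc in hawkular-cassandra-1 hawkular-metrics; do
    # Replace "True" with the lowercase value the startup scripts look for.
    oc set env "rc/$rc" ENABLE_PROMETHEUS_ENDPOINT=true || return 1
  done
  # Scale both controllers to 0 and then back to 1 so that the new pods
  # start with the changed environment.
  for replicas in 0 1; do
    for rc in hawkular-cassandra-1 hawkular-metrics; do
      oc scale --replicas="$replicas" rc "$rc" || return 1
    done
  done
}
```

Calling `enable_prometheus_endpoint` runs the same commands in the same order as steps 1 to 6 and stops at the first command that fails.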


For the permanent fix, I will need to update the cassandra-docker.sh and standalone.conf scripts.
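
The mismatch described in this comment comes down to a case-sensitive string comparison. The sketch below is illustrative only: it is not the code from cassandra-docker.sh or standalone.conf, and both function names are hypothetical. It contrasts a strict check, which rejects "True", with one that lowercases the value first, which is one possible way for a script to accept either spelling.

```shell
# Illustrative only; not taken from cassandra-docker.sh or standalone.conf.

# Strict check: succeeds only for the exact lowercase string "true".
is_enabled_strict() {
  [ "$1" = "true" ]
}

# Relaxed check: lowercases the value before comparing, so "True" and
# "TRUE" are accepted as well.
is_enabled_relaxed() {
  value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  [ "$value" = "true" ]
}
```

With the strict form, a value of "True" is treated as if the endpoint were disabled, which is consistent with nothing listening on port 7575.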

Comment 4 John Sanda 2017-11-02 20:27:46 UTC
Moving the target release to 3.8, since there is a workaround, which I described in comment 2.

Comment 5 Junqi Zhao 2017-11-03 00:47:57 UTC
(In reply to John Sanda from comment #2)

Another workaround is to set the following parameters in the inventory file:
openshift_metrics_cassandra_enable_prometheus_endpoint=true
openshift_metrics_hawkular_enable_prometheus_endpoint=true

Comment 6 John Sanda 2018-02-20 15:34:45 UTC
I have created https://github.com/openshift/origin-metrics/pull/404 to fix this.

Comment 8 Junqi Zhao 2018-03-05 02:19:50 UTC
The issue is not fixed. The errata tool changed the status to ON_QA; changing it back to MODIFIED.

# oc get po -o wide
NAME                         READY     STATUS    RESTARTS   AGE       IP            NODE
hawkular-cassandra-1-pdnb5   1/1       Running   0          36m       10.129.0.13   172.16.120.17
hawkular-metrics-2cjcx       1/1       Running   2          36m       10.129.0.12   172.16.120.17
heapster-mhw94               1/1       Running   2          36m       10.128.0.13   172.16.120.59
# curl http://10.129.0.13:7575/metrics
curl: (7) Failed connect to 10.129.0.13:7575; Connection refused
# curl http://10.129.0.12:7575/metrics
curl: (7) Failed connect to 10.129.0.12:7575; Connection refused

Images:
metrics-cassandra-v3.9.2-1
metrics-hawkular-metrics-v3.9.2-1
metrics-heapster-v3.9.2-1

Comment 11 Ruben Vargas Palma 2018-03-08 17:36:10 UTC
I looked at the PR for this fix, and it seems those changes are not in the latest build. I'll do a new build that includes them.

Comment 13 Junqi Zhao 2018-03-09 02:05:46 UTC
The hawkular-cassandra and hawkular-metrics Prometheus metrics can now be retrieved with the following command; see the attached file for the output.
# curl http://${POD_IP}:7575/metrics
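
The same check can be wrapped in a small helper so that it can be repeated for each pod. This is a sketch under assumptions: the function name `check_metrics_endpoint` is hypothetical, and port 7575 and the /metrics path are taken from the commands in this bug.

```shell
# Hypothetical helper: succeeds if the Prometheus endpoint on the given pod
# IP answers on port 7575. -s silences progress output, -f makes curl return
# a non-zero status on HTTP errors, --max-time bounds the wait in seconds.
check_metrics_endpoint() {
  if curl -sf --max-time 5 "http://$1:7575/metrics" >/dev/null; then
    echo "$1: metrics endpoint reachable"
  else
    echo "$1: metrics endpoint NOT reachable"
    return 1
  fi
}
```

For example, `check_metrics_endpoint "$(oc get pod <pod-name> -o jsonpath='{.status.podIP}')"` looks up the pod IP and tests it in one step, where `<pod-name>` is a placeholder for the actual pod name.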

Images
metrics-cassandra-v3.9.4-1
metrics-hawkular-metrics-v3.9.4-1
metrics-heapster-v3.9.4-1

# openshift version
openshift v3.9.3
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

Comment 14 Junqi Zhao 2018-03-09 02:06:24 UTC
Created attachment 1406065 [details]
issue is fixed

Comment 17 errata-xmlrpc 2018-03-28 14:09:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

