Created attachment 1346559 [details]
metrics pods log

Description of problem:
Cannot access the hawkular-cassandra and hawkular-metrics Prometheus metrics interface; curl returns a "Connection refused" error.

# oc get po -o wide
NAME                         READY     STATUS    RESTARTS   AGE       IP            NODE
hawkular-cassandra-1-kcgdb   1/1       Running   0          1m        10.128.0.86   host-8-241-56.host.centralci.eng.rdu2.redhat.com
hawkular-metrics-ng86f       1/1       Running   0          1m        10.128.0.87   host-8-241-56.host.centralci.eng.rdu2.redhat.com
heapster-nxrzp               1/1       Running   0          1m        10.128.0.88   host-8-241-56.host.centralci.eng.rdu2.redhat.com

# curl http://10.128.0.86:7575/metrics
curl: (7) Failed connect to 10.128.0.86:7575; Connection refused

# curl http://10.128.0.87:7575/metrics
curl: (7) Failed connect to 10.128.0.87:7575; Connection refused

Version-Release number of selected component (if applicable):
# rpm -qa | grep openshift-ansible
openshift-ansible-playbooks-3.7.0-0.189.0.git.0.d497c5e.el7.noarch
openshift-ansible-lookup-plugins-3.7.0-0.189.0.git.0.d497c5e.el7.noarch
openshift-ansible-filter-plugins-3.7.0-0.189.0.git.0.d497c5e.el7.noarch
openshift-ansible-callback-plugins-3.7.0-0.189.0.git.0.d497c5e.el7.noarch
openshift-ansible-3.7.0-0.189.0.git.0.d497c5e.el7.noarch
openshift-ansible-roles-3.7.0-0.189.0.git.0.d497c5e.el7.noarch
openshift-ansible-docs-3.7.0-0.189.0.git.0.d497c5e.el7.noarch

metrics-hawkular-metrics:v3.7.0-0.185.0.0
metrics-cassandra:v3.7.0-0.185.0.0
metrics-heapster:v3.7.0-0.185.0.0

How reproducible:
Always

Steps to Reproduce:
1. Deploy metrics; for the inventory file, see the [Additional info] part
2.
3.

Actual results:
Cannot access the hawkular-cassandra and hawkular-metrics Prometheus metrics interface

Expected results:
Should return Prometheus-format data

Additional info:
[OSEv3:children]
masters
etcd

[masters]
${MASTER} openshift_public_hostname=${MASTER}

[etcd]
${ETCD} openshift_public_hostname=${ETCD}

[OSEv3:vars]
ansible_ssh_user=root
ansible_ssh_private_key_file="~/libra.pem"
deployment_type=openshift-enterprise

# Metrics
openshift_metrics_install_metrics=true
openshift_metrics_hawkular_hostname=hawkular-metrics.${SUB_DOMAIN}
openshift_metrics_project=openshift-infra
openshift_metrics_image_prefix=${IMAGE_PREFIX}
openshift_metrics_image_version=v3.7
Created attachment 1346560 [details] metrics pods info
The ansible scripts set the ENABLE_PROMETHEUS_ENDPOINT variable to a value of "True". The cassandra-docker.sh script, which checks whether the variable is set, looks for a value of "true". The same is true for hawkular-metrics in the standalone.conf script. You can work around this by doing the following:

1) `oc edit rc hawkular-cassandra-1` and set the value of ENABLE_PROMETHEUS_ENDPOINT to true. Save the changes.

2) `oc edit rc hawkular-metrics` and set the value of ENABLE_PROMETHEUS_ENDPOINT to true. Save the changes.

3) `oc scale --replicas=0 rc hawkular-cassandra-1`

4) `oc scale --replicas=0 rc hawkular-metrics`

5) `oc scale --replicas=1 rc hawkular-cassandra-1`

6) `oc scale --replicas=1 rc hawkular-metrics`

For the permanent fix, I will need to update the cassandra-docker.sh and standalone.conf scripts.
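A minimal sketch of the mismatch, assuming the startup scripts do a plain string comparison against "true". The actual contents of cassandra-docker.sh and standalone.conf are not shown in this bug, so the function names below are hypothetical. Lowercasing the value before comparing would accept both "True" and "true":

```shell
#!/bin/sh
# Hypothetical illustration only; this is not the actual cassandra-docker.sh code.

# Case-sensitive check: only the exact string "true" enables the endpoint.
strict_enabled() {
  [ "$1" = "true" ]
}

# Case-insensitive check: lowercase the value first, so "True" also matches.
lenient_enabled() {
  [ "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" = "true" ]
}

ENABLE_PROMETHEUS_ENDPOINT="True"   # value set by the ansible scripts

if strict_enabled "$ENABLE_PROMETHEUS_ENDPOINT"; then
  echo "strict: endpoint enabled"
else
  echo "strict: endpoint disabled"
fi

if lenient_enabled "$ENABLE_PROMETHEUS_ENDPOINT"; then
  echo "lenient: endpoint enabled"
else
  echo "lenient: endpoint disabled"
fi
```

With the value "True", the strict check takes the "disabled" branch, which matches the connection refused symptom in the description.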
Moving target release to 3.8 since there is a workaround, which I described in comment 2.
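The workaround from comment 2 could be scripted roughly as follows. This is a sketch under assumptions: it swaps the interactive `oc edit` steps for `oc set env`, takes the project name from the inventory in the description (openshift-infra), and the `run` and `apply_workaround` helpers are hypothetical names. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the comment 2 workaround. Assumption: `oc set env` stands in for
# the manual `oc edit` steps; PROJECT defaults to the project in the inventory.
PROJECT="${PROJECT:-openshift-infra}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

apply_workaround() {
  # Steps 1-2: set the variable to lowercase "true" on both RCs.
  for rc in hawkular-cassandra-1 hawkular-metrics; do
    run oc set env -n "$PROJECT" "rc/$rc" ENABLE_PROMETHEUS_ENDPOINT=true
  done
  # Steps 3-4: scale both RCs down so the pods are recreated.
  for rc in hawkular-cassandra-1 hawkular-metrics; do
    run oc scale -n "$PROJECT" --replicas=0 rc "$rc"
  done
  # Steps 5-6: scale both RCs back up.
  for rc in hawkular-cassandra-1 hawkular-metrics; do
    run oc scale -n "$PROJECT" --replicas=1 rc "$rc"
  done
}

apply_workaround
```

Run it with DRY_RUN=0 to execute the commands against the cluster instead of printing them.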
(In reply to John Sanda from comment #2)
> The ansible scripts set the ENABLE_PROMETHEUS_ENDPOINT variable to a value
> of "True". The cassandra-docker.sh script which checks to see if the
> variable is set looks for a value of "true". The same is true for
> hawkular-metrics in the standalone.conf script. You can work around this by
> doing the following:
>
> 1) `oc edit rc hawkular-cassandra-1` and set the value of
> ENABLE_PROMETHEUS_ENDPOINT to true. Save the changes.
>
> 2) `oc edit rc hawkular-metrics` and set the value of
> ENABLE_PROMETHEUS_ENDPOINT to true. Save the changes.
>
> 3) `oc scale --replicas=0 rc hawkular-cassandra-1`
>
> 4) `oc scale --replicas=0 rc hawkular-metrics`
>
> 5) `oc scale --replicas=1 rc hawkular-cassandra-1`
>
> 6) `oc scale --replicas=1 rc hawkular-metrics`
>
>
> For the permanent fix, I will need to update the cassandra-docker.sh and
> standalone.conf scripts.

Another workaround is to set the following parameters in the inventory file:
openshift_metrics_cassandra_enable_prometheus_endpoint=true
openshift_metrics_hawkular_enable_prometheus_endpoint=true
I have created https://github.com/openshift/origin-metrics/pull/404 to fix this.
The issue is not fixed. The bug was moved to ON_QA by errata; changing it back to MODIFIED.

# oc get po -o wide
NAME                         READY     STATUS    RESTARTS   AGE       IP            NODE
hawkular-cassandra-1-pdnb5   1/1       Running   0          36m       10.129.0.13   172.16.120.17
hawkular-metrics-2cjcx       1/1       Running   2          36m       10.129.0.12   172.16.120.17
heapster-mhw94               1/1       Running   2          36m       10.128.0.13   172.16.120.59

# curl http://10.129.0.13:7575/metrics
curl: (7) Failed connect to 10.129.0.13:7575; Connection refused

# curl http://10.129.0.12:7575/metrics
curl: (7) Failed connect to 10.129.0.12:7575; Connection refused

Images:
metrics-cassandra-v3.9.2-1
metrics-hawkular-metrics-v3.9.2-1
metrics-heapster-v3.9.2-1
I saw the PR for this fix, and it seems those changes are not in the latest build. I'll do a new build with those changes.
The hawkular-cassandra and hawkular-metrics Prometheus metrics can now be retrieved with the command below; see the attached file for the output.

# curl http://${POD_IP}:7575/metrics

Images:
metrics-cassandra-v3.9.4-1
metrics-hawkular-metrics-v3.9.4-1
metrics-heapster-v3.9.4-1

# openshift version
openshift v3.9.3
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16
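The verification above can be automated with a small check. This is a sketch: the helper names are hypothetical, and it assumes the endpoint emits "# HELP" or "# TYPE" metadata lines, which are common in the Prometheus text format but optional.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if stdin looks like Prometheus text format,
# i.e. it contains at least one "# HELP" or "# TYPE" metadata line.
looks_like_prometheus() {
  grep -Eq '^# (HELP|TYPE) '
}

# Query the metrics port used in this bug (7575) on a given pod IP.
check_endpoint() {
  if curl -s --max-time 5 "http://$1:7575/metrics" | looks_like_prometheus; then
    echo "$1: OK"
  else
    echo "$1: no Prometheus data"
    return 1
  fi
}

# Only contact a pod when POD_IP is provided by the caller.
if [ -n "${POD_IP:-}" ]; then
  check_endpoint "$POD_IP"
fi
```

A refused connection produces no output on stdout, so the check fails in the same way as a response that is not in Prometheus format.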
Created attachment 1406065 [details] issue is fixed
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489