Bug 1571641 - [3.10]Connection refused error when accessing hawkular-cassandra and hawkular-metrics prometheus metrics interface
Summary: [3.10]Connection refused error when accessing hawkular-cassandra and hawkular...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.0
Assignee: Ruben Vargas Palma
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-04-25 09:15 UTC by Junqi Zhao
Modified: 2018-10-11 07:19 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1508496
Environment:
Last Closed: 2018-10-11 07:19:10 UTC
Target Upstream Version:


Attachments
still shows "Connection refused error" for hawkular-cassandra pod (18.97 KB, text/plain)
2018-06-05 08:29 UTC, Junqi Zhao
Issue is fixed (13.05 KB, text/plain)
2018-06-21 04:26 UTC, Junqi Zhao


Links
System                   ID              Last Updated
Red Hat Product Errata   RHBA-2018:2652  2018-10-11 07:19:39 UTC

Comment 7 Junqi Zhao 2018-06-05 08:28:17 UTC
The issue is not fully fixed. It is fixed for the hawkular-metrics pod, but the hawkular-cassandra pod still returns a "Connection refused" error:

# oc get po -o wide
NAME                            READY     STATUS      RESTARTS   AGE       IP            NODE
hawkular-cassandra-1-kt858      1/1       Running     0          8m        10.129.0.26   qe-juzhao-310-qeos-1-nrr-1
hawkular-metrics-gxwqq          1/1       Running     0          8m        10.128.0.25   qe-juzhao-310-qeos-1-master-etcd-1
hawkular-metrics-schema-cltg2   0/1       Completed   0          9m        10.129.0.25   qe-juzhao-310-qeos-1-nrr-1
heapster-b7rzw                  1/1       Running     0          8m        10.129.0.27   qe-juzhao-310-qeos-1-nrr-1
*********************************************************************************
# curl 10.129.0.26:7575/metrics
curl: (7) Failed connect to 10.129.0.26:7575; Connection refused

This happens even though ENABLE_PROMETHEUS_ENDPOINT is set to True for the hawkular-cassandra pod:
# oc exec hawkular-cassandra-1-kt858 -- env | grep ENABLE_PROMETHEUS_ENDPOINT
ENABLE_PROMETHEUS_ENDPOINT=True

For more information, see the attached file.
metrics version: v3.10.0-0.58.0.0
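The check performed in this comment (list the pods, then request `/metrics` on port 7575 of each hawkular-cassandra pod IP) can be scripted. This is a minimal sketch, not something used in the thread: `cassandra_ips` and `probe_metrics` are hypothetical helper names, and the parsing assumes the `oc get po -o wide` column layout shown above, where the pod IP is the sixth column. It also assumes it is run from a host that can reach the pod network, such as a master or node.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any OpenShift tooling.

# Read `oc get po -o wide` output on stdin and print the IP (6th column)
# of every pod whose name starts with "hawkular-cassandra-".
# The header line ("NAME READY ...") does not match and is skipped.
cassandra_ips() {
  awk '$1 ~ /^hawkular-cassandra-/ { print $6 }'
}

# Request the Prometheus endpoint on port 7575 of each IP given as an argument.
# curl -sf exits non-zero on a refused connection (as in this bug) or on an
# HTTP error status; --max-time keeps an unreachable IP from hanging the loop.
probe_metrics() {
  for ip in "$@"; do
    if curl -sf --max-time 5 -o /dev/null "http://$ip:7575/metrics"; then
      echo "$ip: OK"
    else
      echo "$ip: FAILED"
    fi
  done
}

# Typical use (namespace as in the later comments of this bug):
#   probe_metrics $(oc get po -o wide -n openshift-infra | cassandra_ips)
```

Splitting parsing from probing keeps the column assumption in one place; if a different `oc` version changes the `-o wide` layout, only `cassandra_ips` needs adjusting.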

Comment 8 Junqi Zhao 2018-06-05 08:29:22 UTC
Created attachment 1447755
still shows "Connection refused error" for hawkular-cassandra pod

Comment 9 John Sanda 2018-06-08 18:34:41 UTC
The hawkular-metrics script does a case-insensitive comparison of ENABLE_PROMETHEUS_ENDPOINT, whereas the cassandra script does not: it tests for an all-lowercase "true", so the value "True" is not recognized. Unfortunately, setting ENABLE_PROMETHEUS_ENDPOINT=true in the inventory file does not work either. Setting the environment variable in the hawkular-cassandra-1 RC does work, though. Since there is a relatively easy workaround, I am bumping this to 3.11.
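To make the difference concrete, here is a minimal POSIX shell sketch of the two comparison styles. The function names are hypothetical and the code is not taken from the actual hawkular-cassandra or hawkular-metrics scripts; it only illustrates why the value `True` shown in Comment 7 fails an exact match against `true` but passes once it is lower-cased.

```shell
#!/bin/sh
# Hypothetical illustration; not the real startup-script code.

# Case-sensitive test: only the exact string "true" passes,
# so "True" does not enable the endpoint.
enabled_case_sensitive() {
  [ "$1" = "true" ]
}

# Case-insensitive test: lower-case the value with tr before comparing,
# so "true", "True" and "TRUE" are all accepted.
enabled_case_insensitive() {
  lowered=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  [ "$lowered" = "true" ]
}

# Compare both styles on a few sample values.
for v in true True TRUE false; do
  if enabled_case_sensitive "$v"; then cs=yes; else cs=no; fi
  if enabled_case_insensitive "$v"; then ci=yes; else ci=no; fi
  echo "$v: case-sensitive=$cs case-insensitive=$ci"
done
```

For the RC workaround mentioned above, one way to apply it, assuming the default `openshift-infra` namespace, would be `oc set env rc/hawkular-cassandra-1 ENABLE_PROMETHEUS_ENDPOINT=true -n openshift-infra`. A replication controller does not restart running pods when its template changes, so the existing hawkular-cassandra-1 pod would then need to be deleted for it to be recreated with the new value.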

Comment 10 Junqi Zhao 2018-06-10 23:57:18 UTC
*** Bug 1589023 has been marked as a duplicate of this bug. ***

Comment 11 Junqi Zhao 2018-06-21 04:26:21 UTC
Not sure what happened, but the issue is fixed.


# rpm -qa | grep openshift-ansible
openshift-ansible-roles-3.10.2-1.git.190.5abfddb.el7.noarch
openshift-ansible-playbooks-3.10.2-1.git.190.5abfddb.el7.noarch
openshift-ansible-docs-3.10.2-1.git.190.5abfddb.el7.noarch
openshift-ansible-3.10.2-1.git.190.5abfddb.el7.noarch

# openshift version
openshift v3.10.2

metrics version: v3.10.2-1

Comment 12 Junqi Zhao 2018-06-21 04:26:43 UTC
Created attachment 1453333
Issue is fixed

Comment 13 John Sanda 2018-07-30 17:42:51 UTC
(In reply to Junqi Zhao from comment #12)
> Created attachment 1453333
> Issue is fixed

Can the status be moved to VERIFIED?

Comment 14 Junqi Zhao 2018-08-01 05:31:23 UTC
(In reply to John Sanda from comment #13)
> Can the status be moved to VERIFIED?
No, the issue has been reproduced again:

# oc get pod -o wide -n openshift-infra
NAME                            READY     STATUS      RESTARTS   AGE       IP          NODE
hawkular-cassandra-1-7w2sf      1/1       Running     0          18h       10.2.2.9    ip-172-18-22-155.ec2.internal
hawkular-cassandra-2-9mtrd      1/1       Running     1          18h       10.2.10.4   ip-172-18-0-153.ec2.internal
hawkular-metrics-hbll5          1/1       Running     0          18h       10.2.6.89   ip-172-18-28-25.ec2.internal
hawkular-metrics-schema-5qzj8   0/1       Completed   0          18h       10.2.6.87   ip-172-18-28-25.ec2.internal
heapster-dtjpf                  1/1       Running     0          18h       10.2.2.10   ip-172-18-22-155.ec2.internal

The hawkular-cassandra pods still refuse connections:
# curl http://10.2.2.9:7575/metrics
curl: (7) Failed connect to 10.2.2.9:7575; Connection refused
# curl http://10.2.10.4:7575/metrics
curl: (7) Failed connect to 10.2.10.4:7575; Connection refused

$ oc get rc hawkular-cassandra-1 -o yaml | grep  ENABLE_PROMETHEUS_ENDPOINT -A 2
        - name: ENABLE_PROMETHEUS_ENDPOINT
          value: "True"

cassandra version: metrics-cassandra-v3.11.0-0.10.0.0

The hawkular-metrics endpoint works correctly:
# curl http://10.2.6.89:7575/metrics
# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 405.0
# HELP jvm_threads_daemon Daemon thread count of a JVM
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon 151.0
# HELP jvm_threads_peak Peak thread count of a JVM
# TYPE jvm_threads_peak gauge
jvm_threads_peak 422.0
# HELP jvm_threads_started_total Started thread count of a JVM
# TYPE jvm_threads_started_total counter
jvm_threads_started_total 987.0
# HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers
# TYPE jvm_threads_deadlocked gauge
jvm_threads_deadlocked 0.0
# HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors
# TYPE jvm_threads_deadlocked_monitor gauge
jvm_threads_deadlocked_monitor 0.0
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="PS Scavenge",} 777.0
jvm_gc_collection_seconds_sum{gc="PS Scavenge",} 3.994
jvm_gc_collection_seconds_count{gc="PS MarkSweep",} 1.0
jvm_gc_collection_seconds_sum{gc="PS MarkSweep",} 0.125
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 2732.21
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.533026610444E9
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 1549.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.2267114496E10
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.413013504E9
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 6.17996568E8
jvm_memory_bytes_used{area="nonheap",} 1.86586304E8
# HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_committed gauge
jvm_memory_bytes_committed{area="heap",} 1.358954496E9
jvm_memory_bytes_committed{area="nonheap",} 2.0021248E8
# HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{area="heap",} 1.358954496E9
jvm_memory_bytes_max{area="nonheap",} 7.80140544E8
# HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{pool="Code Cache",} 6.1809216E7
jvm_memory_pool_bytes_used{pool="Metaspace",} 1.11660072E8
jvm_memory_pool_bytes_used{pool="Compressed Class Space",} 1.3117016E7
jvm_memory_pool_bytes_used{pool="PS Eden Space",} 3.69711576E8
jvm_memory_pool_bytes_used{pool="PS Survivor Space",} 1819184.0
jvm_memory_pool_bytes_used{pool="PS Old Gen",} 2.46465808E8
# HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_committed gauge
jvm_memory_pool_bytes_committed{pool="Code Cache",} 6.2324736E7
jvm_memory_pool_bytes_committed{pool="Metaspace",} 1.21765888E8
jvm_memory_pool_bytes_committed{pool="Compressed Class Space",} 1.6121856E7
jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 4.39353344E8
jvm_memory_pool_bytes_committed{pool="PS Survivor Space",} 7864320.0
jvm_memory_pool_bytes_committed{pool="PS Old Gen",} 9.11736832E8
# HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8
jvm_memory_pool_bytes_max{pool="Metaspace",} 2.68435456E8
jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 2.60046848E8
jvm_memory_pool_bytes_max{pool="PS Eden Space",} 4.39877632E8
jvm_memory_pool_bytes_max{pool="PS Survivor Space",} 7864320.0
jvm_memory_pool_bytes_max{pool="PS Old Gen",} 9.11736832E8
# HELP jmx_config_reload_success_total Number of times configuration have successfully been reloaded.
# TYPE jmx_config_reload_success_total counter
jmx_config_reload_success_total 0.0
# HELP jmx_scrape_duration_seconds Time this JMX scrape took, in seconds.
# TYPE jmx_scrape_duration_seconds gauge
jmx_scrape_duration_seconds 5.14454E-4
# HELP jmx_scrape_error Non-zero if this scrape failed.
# TYPE jmx_scrape_error gauge
jmx_scrape_error 0.0
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 19414.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 19448.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 34.0
# HELP jmx_config_reload_failure_total Number of times configuration have failed to be reloaded.
# TYPE jmx_config_reload_failure_total counter
jmx_config_reload_failure_total 0.0
# HELP jvm_info JVM version info
# TYPE jvm_info gauge
jvm_info{version="1.8.0_181-b13",vendor="Oracle Corporation",} 1.0

Comment 17 Junqi Zhao 2018-08-08 09:08:57 UTC
Tested with metrics-cassandra:v3.11.0-0.11.0.0; the issue is fixed.
Please change the status to ON_QA.
# oc get pod -n openshift-infra -o wide | grep hawkular-cassandra
hawkular-cassandra-1-k498r      1/1       Running     0          2h        10.2.12.66   preserve-sharefr2-node-infra-1

# curl http://10.2.12.66:7575/metrics
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 4.4331744E8
jvm_memory_bytes_used{area="nonheap",} 8.8301192E7
# HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_committed gauge
jvm_memory_bytes_committed{area="heap",} 9.67114752E8
jvm_memory_bytes_committed{area="nonheap",} 9.1254784E7
# HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{area="heap",} 9.67114752E8
jvm_memory_bytes_max{area="nonheap",} -1.0
# HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{pool="Code Cache",} 3.8577408E7
jvm_memory_pool_bytes_used{pool="Metaspace",} 4.4996008E7
jvm_memory_pool_bytes_used{pool="Compressed Class Space",} 4727776.0
jvm_memory_pool_bytes_used{pool="Par Eden Space",} 1.82857384E8
jvm_memory_pool_bytes_used{pool="Par Survivor Space",} 9474640.0
jvm_memory_pool_bytes_used{pool="CMS Old Gen",} 2.50985416E8
# HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_committed gauge
jvm_memory_pool_bytes_committed{pool="Code Cache",} 3.9452672E7
jvm_memory_pool_bytes_committed{pool="Metaspace",} 4.6727168E7
jvm_memory_pool_bytes_committed{pool="Compressed Class Space",} 5074944.0
jvm_memory_pool_bytes_committed{pool="Par Eden Space",} 2.65945088E8
jvm_memory_pool_bytes_committed{pool="Par Survivor Space",} 3.3226752E7
jvm_memory_pool_bytes_committed{pool="CMS Old Gen",} 6.67942912E8
# HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8
jvm_memory_pool_bytes_max{pool="Metaspace",} -1.0
jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 1.073741824E9
jvm_memory_pool_bytes_max{pool="Par Eden Space",} 2.65945088E8
jvm_memory_pool_bytes_max{pool="Par Survivor Space",} 3.3226752E7
jvm_memory_pool_bytes_max{pool="CMS Old Gen",} 6.67942912E8
# HELP org_apache_cassandra_metrics_DroppedMessage_Count Attribute exposed for management (org.apache.cassandra.metrics<type=DroppedMessage, scope=MUTATION, name=Dropped><>Count)
# TYPE org_apache_cassandra_metrics_DroppedMessage_Count untyped
org_apache_cassandra_metrics_DroppedMessage_Count{scope="MUTATION",name="Dropped",} 0.0
org_apache_cassandra_metrics_DroppedMessage_Count{scope="READ",name="Dropped",} 0.0
org_apache_cassandra_metrics_DroppedMessage_Count{scope="RANGE_SLICE",name="Dropped",} 0.0
org_apache_cassandra_metrics_DroppedMessage_Count{scope="PAGED_RANGE",name="Dropped",} 0.0
# HELP org_apache_cassandra_metrics_ClientRequest_Count Attribute exposed for management (org.apache.cassandra.metrics<type=ClientRequest, scope=RangeSlice, name=Failures><>Count)
# TYPE org_apache_cassandra_metrics_ClientRequest_Count untyped
org_apache_cassandra_metrics_ClientRequest_Count{scope="RangeSlice",name="Failures",} 0.0
org_apache_cassandra_metrics_ClientRequest_Count{scope="Read",name="Unavailables",} 0.0
org_apache_cassandra_metrics_ClientRequest_Count{scope="Write",name="Unavailables",} 0.0
org_apache_cassandra_metrics_ClientRequest_Count{scope="Write",name="Timeouts",} 0.0
org_apache_cassandra_metrics_ClientRequest_Count{scope="RangeSlice",name="Unavailables",} 0.0
org_apache_cassandra_metrics_ClientRequest_Count{scope="Write",name="Failures",} 0.0
org_apache_cassandra_metrics_ClientRequest_Count{scope="Read",name="Failures",} 0.0
org_apache_cassandra_metrics_ClientRequest_Count{scope="RangeSlice",name="Timeouts",} 0.0
org_apache_cassandra_metrics_ClientRequest_Count{scope="Read",name="Timeouts",} 0.0
# HELP jmx_scrape_duration_seconds Time this JMX scrape took, in seconds.
# TYPE jmx_scrape_duration_seconds gauge
jmx_scrape_duration_seconds 0.048682478
# HELP jmx_scrape_error Non-zero if this scrape failed.
# TYPE jmx_scrape_error gauge
jmx_scrape_error 0.0
# HELP jmx_config_reload_failure_total Number of times configuration have failed to be reloaded.
# TYPE jmx_config_reload_failure_total counter
jmx_config_reload_failure_total 0.0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 405.12
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.533706804225E9
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 220.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 3.119505408E9
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.385967616E9
# HELP jvm_info JVM version info
# TYPE jvm_info gauge
jvm_info{version="1.8.0_181-b13",vendor="Oracle Corporation",} 1.0
# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 222.0
# HELP jvm_threads_daemon Daemon thread count of a JVM
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon 198.0
# HELP jvm_threads_peak Peak thread count of a JVM
# TYPE jvm_threads_peak gauge
jvm_threads_peak 226.0
# HELP jvm_threads_started_total Started thread count of a JVM
# TYPE jvm_threads_started_total counter
jvm_threads_started_total 1407.0
# HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers
# TYPE jvm_threads_deadlocked gauge
jvm_threads_deadlocked 0.0
# HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors
# TYPE jvm_threads_deadlocked_monitor gauge
jvm_threads_deadlocked_monitor 0.0
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="ParNew",} 144.0
jvm_gc_collection_seconds_sum{gc="ParNew",} 4.702
jvm_gc_collection_seconds_count{gc="ConcurrentMarkSweep",} 3.0
jvm_gc_collection_seconds_sum{gc="ConcurrentMarkSweep",} 0.216
# HELP jmx_config_reload_success_total Number of times configuration have successfully been reloaded.
# TYPE jmx_config_reload_success_total counter
jmx_config_reload_success_total 0.0
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 7141.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 7182.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 41.0


$ oc get rc hawkular-cassandra-1 -o yaml | grep  ENABLE_PROMETHEUS_ENDPOINT -A 2
        - name: ENABLE_PROMETHEUS_ENDPOINT
          value: "True"
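A scrape like the one above can be checked mechanically instead of by eye. This is a sketch under an assumed definition of "working", not a check used in this bug: it treats a scrape as healthy when `jmx_scrape_error` is 0 and at least one `org_apache_cassandra_metrics_*` sample is present. Because only the Cassandra output in this thread contains those samples, the same check would reject the hawkular-metrics scrape from Comment 14. `cassandra_scrape_ok` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read a Prometheus text-format scrape on stdin and
# exit 0 only if jmx_scrape_error was reported as 0 and at least one
# org_apache_cassandra_metrics_* sample line is present.
# "# HELP" / "# TYPE" comment lines start with "#" and match neither rule.
cassandra_scrape_ok() {
  awk '
    $1 == "jmx_scrape_error"         { seen = 1; err = $2 + 0 }
    /^org_apache_cassandra_metrics_/ { cass++ }
    END { exit !(seen && err == 0 && cass > 0) }
  '
}

# Typical use against the pod IP from this comment:
#   curl -s http://10.2.12.66:7575/metrics | cassandra_scrape_ok && echo "Cassandra endpoint OK"
```

Requiring a Cassandra-specific metric, rather than just an HTTP 200, guards against accidentally scraping the wrong pod: both endpoints listen on port 7575 and both export the generic `jvm_*` and `process_*` series.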

Comment 20 Junqi Zhao 2018-08-23 10:59:27 UTC
Issue is fixed

Images:
metrics-cassandra/images/v3.11.0-0.20.0.0
metrics-heapster/images/v3.11.0-0.20.0.0
metrics-schema-installer/images/v3.11.0-0.20.0.0
metrics-hawkular-metrics/images/v3.11.0-0.20.0.0

Comment 22 errata-xmlrpc 2018-10-11 07:19:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652

