Description of problem:
ansible-service-broker-operator doesn't notify users and admins via alerts in Prometheus.

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-11-24-183610
tsb operator commit.id:

How reproducible:
Always

Steps to Reproduce:
1. Install the asb operator
2. Check the Prometheus rule for the ansible service broker operator
3. Check the Prometheus targets for the ansible service broker operator

Actual results:
2. No rule for the ansible service broker operator
3. No targets for the ansible service broker operator

Expected results:
2. A Prometheus rule with alert name AnsibleServiceBrokerEnabled can be found
3. Prometheus targets for ansible-service-broker-operator with an Endpoint can be found

Additional info:
$ oc get csv -n openshift-ansible-service-broker
NAME                                               DISPLAY                                     VERSION              REPLACES   PHASE
openshiftansibleservicebroker.4.3.0-201911220712   OpenShift Ansible Service Broker Operator   4.3.0-201911220712              Succeeded

$ oc image info registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-ansible-service-broker-operator:v4.3.0-201911220712 --filter-by-os linux/amd64 | grep commit.id
io.openshift.build.commit.id=d8cb6fe7bdeef19b888aa9a01e02701e356f630f

$ oc image info registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-ansible-service-broker:v4.3.0-201911220712 --filter-by-os linux/amd64 | grep commit.id
io.openshift.build.commit.id=d8cb6fe7bdeef19b888aa9a01e02701e356f630f
I noticed that the script used to deploy the brokers creates the namespace without the monitoring label. The documented install procedure sets it:

https://docs.openshift.com/container-platform/4.2/applications/service_brokers/installing-ansible-service-broker.html#sb-install-asb-operator_sb-installing-asb

"Enter openshift-ansible-service-broker in the Name field and openshift.io/cluster-monitoring=true in the Labels field and click Create."

Please verify that the monitoring label is on the namespace.
Confirmed that this part of asb is working. After adding the label openshift.io/cluster-monitoring=true to the openshift-ansible-service-broker namespace, the Prometheus rule with alert name AnsibleServiceBrokerEnabled can be found, and the targets for the ansible service broker operator also show the Endpoint.

alert: AnsibleServiceBrokerEnabled
expr: automationbroker_info{automationbroker="ansible-service-broker",namespace="openshift-ansible-service-broker"} > 0
labels:
  severity: warning
annotations:
  message: Indicates whether Ansible Service Broker is enabled

$ oc get ns openshift-ansible-service-broker --show-labels
NAME                               STATUS   AGE   LABELS
openshift-ansible-service-broker   Active   31h   openshift.io/cluster-monitoring=true
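For readers unfamiliar with how the rule's expr is evaluated, here is a minimal Python sketch of its semantics: the selector keeps only automationbroker_info series whose labels equal the listed matchers, and the "> 0" comparison then drops any series whose value is not positive. The function name firing_series and the Sample tuple are hypothetical, introduced only for illustration; this mimics the behaviour and is not Prometheus's own evaluator.

```python
from typing import Dict, List, Tuple

# (metric name, labels, value); a hypothetical in-memory form of one scraped sample.
Sample = Tuple[str, Dict[str, str], float]

# Metric name and equality matchers copied from the rule's expr.
METRIC = "automationbroker_info"
MATCHERS = {
    "automationbroker": "ansible-service-broker",
    "namespace": "openshift-ansible-service-broker",
}


def firing_series(samples: List[Sample]) -> List[Sample]:
    """Return the samples for which `METRIC{MATCHERS} > 0` holds."""
    result = []
    for name, labels, value in samples:
        if name != METRIC:
            continue
        # Equality matchers: every listed label must be present and equal.
        # Extra labels on the series (for example job or instance) do not matter.
        if any(labels.get(key) != want for key, want in MATCHERS.items()):
            continue
        # The `> 0` comparison filters out samples at or below zero.
        if value > 0:
            result.append((name, labels, value))
    return result
```

An empty result corresponds to the alert not firing, which is what happens when the metric is never scraped at all.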
Verification failed. Alert AnsibleServiceBrokerEnabled is not firing.

cluster version: 4.3.0-0.nightly-2019-12-12-004325
asb commit.id: 346a81a77323baeb9f8bcb13437f7e7e32a0824f

$ oc get clusterservicebroker
NAME                     URL                                                          STATUS   AGE
ansible-service-broker   https://asb.openshift-ansible-service-broker.svc:1338/osb/   Ready    32m

$ oc -n openshift-ansible-service-broker get ep
NAME                                                ENDPOINTS                         AGE
asb                                                 10.130.2.8:1338,10.130.2.8:1337   50m
openshift-ansible-service-broker-operator-metrics   <none>                            50m

$ token=`oc -n openshift-monitoring sa get-token prometheus-k8s`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-1 -- curl -k -H "Authorization: Bearer $token" 'https://10.130.2.8:1338/metrics'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="21600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="43200"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="86400"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="172800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="345600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="604800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="2.592e+06"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="7.776e+06"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1.5552e+07"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="3.1104e+07"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="+Inf"} 0
apiserver_client_certificate_expiration_seconds_sum 0
apiserver_client_certificate_expiration_seconds_count 0
# HELP asb_deprovision_jobs How many deprovision jobs are actively in the buffer.
# TYPE asb_deprovision_jobs gauge
asb_deprovision_jobs 0
# HELP asb_provision_jobs How many provision jobs are actively in the buffer.
# TYPE asb_provision_jobs gauge
asb_provision_jobs 0
# HELP asb_sandbox Gauge of all sandbox namespaces that are active.
# TYPE asb_sandbox gauge
asb_sandbox 0
# HELP asb_specs_deleted Specs deleted from data-store.
# TYPE asb_specs_deleted gauge
asb_specs_deleted 0
# HELP asb_specs_total Spec count of different registries and marked for deletion.
# TYPE asb_specs_total gauge
asb_specs_total{source="marked_for_deletion"} 0
asb_specs_total{source="test"} 4
# HELP asb_update_jobs How many update jobs are actively in the buffer.
# TYPE asb_update_jobs gauge
asb_update_jobs 0
# HELP authenticated_user_requests Counter of authenticated requests broken out by username.
# TYPE authenticated_user_requests counter
authenticated_user_requests{username="other"} 655
# HELP bundlelib_sandbox Guage of all sandbox namespaces that are active.
# TYPE bundlelib_sandbox gauge
bundlelib_sandbox 0
# HELP etcd_helper_cache_entry_count Counter of etcd helper cache entries. This can be different from etcd_helper_cache_miss_count because two concurrent threads can miss the cache and generate the same entry twice.
# TYPE etcd_helper_cache_entry_count counter
etcd_helper_cache_entry_count 0
# HELP etcd_helper_cache_hit_count Counter of etcd helper cache hits.
# TYPE etcd_helper_cache_hit_count counter
etcd_helper_cache_hit_count 0
# HELP etcd_helper_cache_miss_count Counter of etcd helper cache miss.
# TYPE etcd_helper_cache_miss_count counter
etcd_helper_cache_miss_count 0
# HELP etcd_request_cache_add_latencies_summary Latency in microseconds of adding an object to etcd cache
# TYPE etcd_request_cache_add_latencies_summary summary
etcd_request_cache_add_latencies_summary{quantile="0.5"} NaN
etcd_request_cache_add_latencies_summary{quantile="0.9"} NaN
etcd_request_cache_add_latencies_summary{quantile="0.99"} NaN
etcd_request_cache_add_latencies_summary_sum 0
etcd_request_cache_add_latencies_summary_count 0
# HELP etcd_request_cache_get_latencies_summary Latency in microseconds of getting an object from etcd cache
# TYPE etcd_request_cache_get_latencies_summary summary
etcd_request_cache_get_latencies_summary{quantile="0.5"} NaN
etcd_request_cache_get_latencies_summary{quantile="0.9"} NaN
etcd_request_cache_get_latencies_summary{quantile="0.99"} NaN
etcd_request_cache_get_latencies_summary_sum 0
etcd_request_cache_get_latencies_summary_count 0
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.3107e-05
go_gc_duration_seconds{quantile="0.25"} 1.8163e-05
go_gc_duration_seconds{quantile="0.5"} 2.1872e-05
go_gc_duration_seconds{quantile="0.75"} 4.7245e-05
go_gc_duration_seconds{quantile="1"} 0.000413544
go_gc_duration_seconds_sum 0.00234589
go_gc_duration_seconds_count 60
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 25
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 9.079216e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 2.7715248e+08
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.49898e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 1.105552e+06
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 2.422784e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 9.079216e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 5.459968e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.142784e+07
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 39558
# HELP go_memstats_heap_released_bytes_total Total number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes_total counter
go_memstats_heap_released_bytes_total 0
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.602752e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.5761396081224551e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 1.14511e+06
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 6944
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 128304
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 147456
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 1.0550512e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.124756e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 1.048576e+06
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 1.048576e+06
# HELP go_memstats_sys_bytes Number of bytes obtained by system. Sum of all system allocations.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.2286456e+07
100 11866  100 11866    0     0  96503      0 --:--:-- --:--:-- --:--:-- 97262
# HELP http_request_duration_microseconds The HTTP request latencies in microseconds.
# TYPE http_request_duration_microseconds summary
http_request_duration_microseconds{handler="ansible-service-broker",quantile="0.5"} 6066.34
http_request_duration_microseconds{handler="ansible-service-broker",quantile="0.9"} 6066.34
http_request_duration_microseconds{handler="ansible-service-broker",quantile="0.99"} 6066.34
http_request_duration_microseconds_sum{handler="ansible-service-broker"} 42629.947
http_request_duration_microseconds_count{handler="ansible-service-broker"} 6
http_request_duration_microseconds{handler="prometheus",quantile="0.5"} 1620.509
http_request_duration_microseconds{handler="prometheus",quantile="0.9"} 2559.998
http_request_duration_microseconds{handler="prometheus",quantile="0.99"} 4577.65
http_request_duration_microseconds_sum{handler="prometheus"} 309983.51600000006
http_request_duration_microseconds_count{handler="prometheus"} 172
# HELP http_request_size_bytes The HTTP request sizes in bytes.
# TYPE http_request_size_bytes summary
http_request_size_bytes{handler="ansible-service-broker",quantile="0.5"} 142
http_request_size_bytes{handler="ansible-service-broker",quantile="0.9"} 142
http_request_size_bytes{handler="ansible-service-broker",quantile="0.99"} 142
http_request_size_bytes_sum{handler="ansible-service-broker"} 852
http_request_size_bytes_count{handler="ansible-service-broker"} 6
http_request_size_bytes{handler="prometheus",quantile="0.5"} 214
http_request_size_bytes{handler="prometheus",quantile="0.9"} 214
http_request_size_bytes{handler="prometheus",quantile="0.99"} 214
http_request_size_bytes_sum{handler="prometheus"} 35908
http_request_size_bytes_count{handler="prometheus"} 172
# HELP http_requests_total Total number of HTTP requests made.
# TYPE http_requests_total counter
http_requests_total{code="200",handler="ansible-service-broker",method="get"} 6
http_requests_total{code="200",handler="prometheus",method="get"} 172
# HELP http_response_size_bytes The HTTP response sizes in bytes.
# TYPE http_response_size_bytes summary
http_response_size_bytes{handler="ansible-service-broker",quantile="0.5"} 33010
http_response_size_bytes{handler="ansible-service-broker",quantile="0.9"} 33010
http_response_size_bytes{handler="ansible-service-broker",quantile="0.99"} 33010
http_response_size_bytes_sum{handler="ansible-service-broker"} 198060
http_response_size_bytes_count{handler="ansible-service-broker"} 6
http_response_size_bytes{handler="prometheus",quantile="0.5"} 2292
http_response_size_bytes{handler="prometheus",quantile="0.9"} 2297
http_response_size_bytes{handler="prometheus",quantile="0.99"} 11865
http_response_size_bytes_sum{handler="prometheus"} 450346
http_response_size_bytes_count{handler="prometheus"} 172
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 5.7
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 11
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.089856e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.57613711977e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 5.2985856e+08
Verification failed on 2 points:
1. Alert AnsibleServiceBrokerEnabled is not firing.
2. No metric named ansible_service_broker_enabled or automationbroker_info is found.

cluster version: 4.3.0-0.nightly-2019-12-12-004325
asb commit.id: 489b0fa7201136510e6145f74fb133ba50c8a809

$ oc get clusterservicebroker
NAME                     URL                                                          STATUS   AGE
ansible-service-broker   https://asb.openshift-ansible-service-broker.svc:1338/osb/   Ready    32m

$ oc -n openshift-ansible-service-broker get ep
NAME                                                ENDPOINTS                           AGE
asb                                                 10.131.0.22:1338,10.131.0.22:1337   4m21s
openshift-ansible-service-broker-operator-metrics   10.131.0.12:8383,10.131.0.12:8686   20m

$ token=`oc -n openshift-monitoring sa get-token prometheus-k8s`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-1 -- curl -k -H "Authorization: Bearer $token" 'https://10.131.0.22:1338/metrics' | grep ansible-service-broker
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11864  100 11864    0     0  92863      0 --:--:-- --:--:-- --:--:-- 93417
http_request_duration_microseconds{handler="ansible-service-broker",quantile="0.5"} 8810.245
http_request_duration_microseconds{handler="ansible-service-broker",quantile="0.9"} 16059.347
http_request_duration_microseconds{handler="ansible-service-broker",quantile="0.99"} 16059.347
http_request_duration_microseconds_sum{handler="ansible-service-broker"} 33130.590000000004
http_request_duration_microseconds_count{handler="ansible-service-broker"} 3
http_request_size_bytes{handler="ansible-service-broker",quantile="0.5"} 142
http_request_size_bytes{handler="ansible-service-broker",quantile="0.9"} 142
http_request_size_bytes{handler="ansible-service-broker",quantile="0.99"} 142
http_request_size_bytes_sum{handler="ansible-service-broker"} 426
http_request_size_bytes_count{handler="ansible-service-broker"} 3
http_requests_total{code="200",handler="ansible-service-broker",method="get"} 3
http_response_size_bytes{handler="ansible-service-broker",quantile="0.5"} 33010
http_response_size_bytes{handler="ansible-service-broker",quantile="0.9"} 33010
http_response_size_bytes{handler="ansible-service-broker",quantile="0.99"} 33010
http_response_size_bytes_sum{handler="ansible-service-broker"} 99030
http_response_size_bytes_count{handler="ansible-service-broker"} 3

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-1 -- curl -k -H "Authorization: Bearer $token" 'https://10.131.0.22:1338/metrics' | grep automationbroker_info
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11853  100 11853    0     0  99169      0 --:--:-- --:--:-- --:--:-- 99605

$ oc get csv -n openshift-ansible-service-broker
NAME                                               DISPLAY                                     VERSION              REPLACES   PHASE
openshiftansibleservicebroker.4.3.0-201912121917   OpenShift Ansible Service Broker Operator   4.3.0-201912121917              Succeeded
You need to grep the 8383/8686 endpoint to see whether it exposes automationbroker_info. Port 1338 serves the metrics that the broker itself outputs. The 8383/8686 endpoint serves the metrics that the operator publishes, and those are what the alert looks at. What really confuses me is why the alert still isn't firing.
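To make the "which endpoint carries which metric" check repeatable, here is a minimal Python sketch that scans a saved /metrics response for a given metric name. The function name find_metric is hypothetical, and the parser is deliberately simplified: it assumes no escaped quotes inside label values and no trailing timestamps, which holds for the output pasted in this bug but not for the text format in general.

```python
import re
from typing import Dict, List, Tuple

# Simplified sample-line pattern: metric name, optional {labels}, then the value.
# Assumption: no escaped quotes in label values and no trailing timestamp.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def find_metric(text: str, name: str) -> List[Tuple[Dict[str, str], float]]:
    """Return (labels, value) for every sample of `name` in a /metrics body."""
    found = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match and match.group(1) == name:
            labels = dict(_LABEL.findall(match.group(2) or ""))
            found.append((labels, float(match.group(3))))
    return found
```

Feeding it the body returned by the broker on port 1338 should yield an empty list for automationbroker_info, while the operator's metrics endpoint should yield one sample once the operator exposes the metric.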
Verification is blocked by bug 1783829.
Verified. asb packagemanifest tag: 4.3.0-201912171717
1. Alert AnsibleServiceBrokerEnabled is firing.
2. The automationbroker_info metric is shown as designed.

$ oc -n openshift-ansible-service-broker get ep
NAME                                                ENDPOINTS                           AGE
asb                                                 10.128.2.49:1338,10.128.2.49:1337   46m
openshift-ansible-service-broker-operator-metrics   10.128.2.47:8383,10.128.2.47:8686   44h

$ oc -n openshift-monitoring exec prometheus-k8s-1 -c prometheus -- curl 'http://10.128.2.47:8686/metrics'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
# HELP automationbroker_info Information about the AutomationBroker custom resource.
# TYPE automationbroker_info gauge
automationbroker_info{namespace="openshift-ansible-service-broker",automationbroker="ansible-service-broker"} 1
100   232  100   232    0     0  35365      0 --:--:-- --:--:-- --:--:-- 38666
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062