Bug 1662273

Summary:

Can't find elasticsearch metrics in prometheus server

Product:

OpenShift Container Platform

Reporter:

Qiaoling Tang <qitang>

Component:

Logging

Assignee:

Josef Karasek <jkarasek>

Status:

CLOSED ERRATA

QA Contact:

Qiaoling Tang <qitang>

Severity:

high

Docs Contact:

Priority:

high

Version:

4.1.0

CC:

aos-bugs, jcantril, rmeggins, surbania

Target Milestone:

---

Target Release:

4.1.0

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

No Doc Update

Doc Text:

Story Points:

---

Clone Of:

Clones:

1683359 (view as bug list)

Environment:

Last Closed:

2019-06-04 10:41:28 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1683359

Attachments:

Description	Flags
Prometheus web console	none
Full screenshot of prometheus "Service Discovery" page	none

Description Qiaoling Tang 2018-12-27 09:25:25 UTC

Description of problem:
Deploy OCP, the openshift-monitoring is deployed along with OCP, then deploy logging, make sure the servicemonitor object for fluentd and elasticsearch are created and the "openshfit-logging" namespace has label "openshift.io/cluster-monitoring=true", then try to find elasticsearch and fluentd metrics in prometheus server, no metrics could be found. Check prometheus-k8s pod log, it shows "system:serviceaccount:openshift-monitoring:prometheus-k8s cannot list endpoints/pods/services in the namespace openshift-logging".

$ oc get ns openshift-logging --show-labels
NAME                STATUS    AGE       LABELS
openshift-logging   Active    2h        openshift.io/cluster-logging=true,openshift.io/cluster-monitoring=true

$ oc get servicemonitor -n openshift-logging
NAME                            AGE
fluentd                         2h
monitor-elasticsearch-cluster   2h

$ oc logs -c prometheus prometheus-k8s-0 -n openshift-monitoring
level=info ts=2018-12-27T01:01:07.529856007Z caller=main.go:244 msg="Starting Prometheus" version="(version=2.5.0, branch=master, revision=5fef62b83aec95dc2f82b3d77927f8534eb3e46e)"
level=info ts=2018-12-27T01:01:07.52999923Z caller=main.go:245 build_context="(go=go1.10.4, user=pgier@pgier-laptop, date=20181107-20:33:26)"
level=info ts=2018-12-27T01:01:07.53003841Z caller=main.go:246 host_details="(Linux 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 15 17:36:42 UTC 2018 x86_64 prometheus-k8s-0 (none))"
level=info ts=2018-12-27T01:01:07.530077876Z caller=main.go:247 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-12-27T01:01:07.530105425Z caller=main.go:248 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2018-12-27T01:01:07.531573718Z caller=main.go:562 msg="Starting TSDB ..."
level=info ts=2018-12-27T01:01:07.535020338Z caller=web.go:399 component=web msg="Start listening for connections" address=127.0.0.1:9090
level=info ts=2018-12-27T01:01:07.537543763Z caller=main.go:572 msg="TSDB started"

level=info ts=2018-12-27T01:16:11.071041237Z caller=main.go:658 msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=error ts=2018-12-27T01:16:11.464466238Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:302: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list pods in the namespace \"openshift-logging\": no RBAC policy matched"
level=error ts=2018-12-27T01:16:11.490701696Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:300: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list endpoints in the namespace \"openshift-logging\": no RBAC policy matched"
level=error ts=2018-12-27T01:16:11.525842364Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:301: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list services in the namespace \"openshift-logging\": no RBAC policy matched"
level=error ts=2018-12-27T01:16:12.467103389Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:302: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list pods in the namespace \"openshift-logging\": no RBAC policy matched"
level=error ts=2018-12-27T01:16:12.494088798Z caller=main.go:240 component=k8s_client_runtime 

These metrics has been exposed:

$ curl -k -H "Authorization: Bearer `oc sa get-token prometheus-k8s -n openshift-monitoring`"   -H "Content-type: application/json" https://`oc get pods --selector component=elasticsearch -o jsonpath={.items[?\(@.status.phase==\"Running\"\)].metadata.name} | cut -d" " -f1 | xargs oc get pod -o jsonpath={.status.podIP}`:9200/_prometheus/metrics
# HELP es_indices_indexing_index_current_number Current rate of documents indexed
# TYPE es_indices_indexing_index_current_number gauge
es_indices_indexing_index_current_number{cluster="elasticsearch",node="elasticsearch-clientdatamaster-0-1",nodeid="qYL4QC-YQaqEXleI934n7g",} 0.0
# HELP es_indices_search_scroll_current_number Current rate of search scrolls
# TYPE es_indices_search_scroll_current_number gauge
es_indices_search_scroll_current_number{cluster="elasticsearch",node="elasticsearch-clientdatamaster-0-1",nodeid="qYL4QC-YQaqEXleI934n7g",} 0.0
# HELP es_indices_querycache_evictions_count Count of evictions in query cache
# TYPE es_indices_querycache_evictions_count gauge
es_indices_querycache_evictions_count{cluster="elasticsearch",node="elasticsearch-clientdatamaster-0-1",nodeid="qYL4QC-YQaqEXleI934n7g",} 0.0
# HELP es_indices_get_count Count of get commands
# TYPE es_indices_get_count gauge
es_indices_get_count{cluster="elasticsearch",node="elasticsearch-clientdatamaster-0-1",nodeid="qYL4QC-YQaqEXleI934n7g",} 2409.0


$ curl -k -H "Authorization: Bearer `oc sa get-token prometheus-k8s -n openshift-monitoring`"   -H "Content-type: application/json" https://`oc get pods --selector component=fluentd -o jsonpath={.items[?\(@.status.phase==\"Running\"\)].metadata.name} | cut -d" " -f1 | xargs oc get pod -o jsonpath={.status.podIP}`:24231/metrics
# TYPE fluentd_status_buffer_queue_length gauge
# HELP fluentd_status_buffer_queue_length Current buffer queue length.
fluentd_status_buffer_queue_length{hostname="fluentd-89xvz",plugin_id="object:3ff8deb0fd80",plugin_category="output",type="elasticsearch"} 0.0
fluentd_status_buffer_queue_length{hostname="fluentd-89xvz",plugin_id="elasticsearch-apps",plugin_category="output",type="elasticsearch"} 0.0
# TYPE fluentd_status_buffer_total_bytes gauge
# HELP fluentd_status_buffer_total_bytes Current total size of queued buffers.
fluentd_status_buffer_total_bytes{hostname="fluentd-89xvz",plugin_id="object:3ff8deb0fd80",plugin_category="output",type="elasticsearch"} 0.0
fluentd_status_buffer_total_bytes{hostname="fluentd-89xvz",plugin_id="elasticsearch-apps",plugin_category="output",type="elasticsearch"} 32895.0
# TYPE fluentd_status_retry_count gauge
# HELP fluentd_status_retry_count Current retry counts.
fluentd_status_retry_count{hostname="fluentd-89xvz",plugin_id="object:3ff8e00c02b0",plugin_category="output",type="relabel"} 0.0
fluentd_status_retry_count{hostname="fluentd-89xvz",plugin_id="object:3ff8e008aebc",plugin_category="output",type="rewrite_tag_filter"} 0.0
fluentd_status_retry_count{hostname="fluentd-89xvz",plugin_id="object:3ff8dff67378",plugin_category="output",type="relabel"} 0.0
fluentd_status_retry_count{hostname="fluentd-89xvz",plugin_id="object:3ff8dff6be78",plugin_category="output",type="rewrite_tag_filter"} 0.0
fluentd_status_retry_count{hostname="fluentd-89xvz",plugin_id="object:3ff8de9f96a8",plugin_category="output",type="copy"} 0.0
fluentd_status_retry_count{hostname="fluentd-89xvz",plugin_id="object:3ff8deb0fd80",plugin_category="output",type="elasticsearch"} 0.0
fluentd_status_retry_count{hostname="fluentd-89xvz",plugin_id="object:3ff8e0c90f68",plugin_category="output",type="copy"} 0.0
fluentd_status_retry_count{hostname="fluentd-89xvz",plugin_id="elasticsearch-apps",plugin_category="output",type="elasticsearch"} 1.0



Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.alpha-2018-12-26-190509   True        False         7h        Cluster version is 4.0.0-0.alpha-2018-12-26-190509

$ ./bin/openshift-install version
./bin/openshift-install v0.8.0-master-2-g5e7b36d6351c9cc773f1dadc64abf9d7041151b1


How reproducible:
Always

Steps to Reproduce:
1. Deploy OCP use Next-Gen installer
2. Deploy logging
3. log into prometheus metrics server to find fluentd or elasticsearch metrics

Actual results:


Expected results:


Additional info:

Comment 2 Qiaoling Tang 2019-01-02 06:36:30 UTC

Just FYI:

I tried to add role "prometheus-k8s" and rolebinding "prometheus-k8s" in openshift-logging project, after adding these objects, there is no error message "system:serviceaccount:openshift-monitoring:prometheus-k8s cannot list endpoints/pods/services in the namespace openshift-logging" in prometheus-k8s pod, but still can't find fluentd and elasticsearch metrics in prometheus server.

$ oc get role prometheus-k8s -n openshift-logging -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: 2019-01-02T00:59:16Z
  name: prometheus-k8s
  namespace: openshift-logging
  resourceVersion: "19282"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/openshift-logging/roles/prometheus-k8s
  uid: a101fdfe-0e29-11e9-bac3-0e4166c7666c
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch

$ oc get rolebindings prometheus-k8s -n openshift-logging -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  creationTimestamp: 2019-01-02T00:59:24Z
  name: prometheus-k8s
  namespace: openshift-logging
  resourceVersion: "19390"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/openshift-logging/rolebindings/prometheus-k8s
  uid: a5b0df68-0e29-11e9-80bc-0a9697cbec22
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring

Comment 3 Sergiusz Urbaniak 2019-01-02 14:55:10 UTC

Can you please post the output of "oc get servicemonitor -n openshift-logging -o yaml"?

Comment 4 Jeff Cantrill 2019-01-02 18:12:13 UTC

This may be resolved with the merge of https://github.com/openshift/cluster-logging-operator/pull/65 for fluentd.  Elasticsearch also has the correct capath: https://github.com/openshift/elasticsearch-operator/blob/master/pkg/k8shandler/service_monitor.go#L38

@Sergiusz is there more required if these paths are defined correctly?

Comment 5 Qiaoling Tang 2019-01-03 00:19:53 UTC

$ oc get servicemonitors -o yaml -n openshift-logging
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    creationTimestamp: 2019-01-02T00:55:00Z
    generation: 1
    name: fluentd
    namespace: openshift-logging
    resourceVersion: "18545"
    selfLink: /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/servicemonitors/fluentd
    uid: 07e4e6b6-0e29-11e9-bb49-129920ef04ca
  spec:
    endpoints:
    - port: "24231"
      scheme: https
      tlsConfig:
        caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
    selector:
      matchLabels:
        logging-infra: fluentd

- apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    creationTimestamp: 2019-01-02T00:55:53Z
    generation: 1
    labels:
      cluster-name: elasticsearch
    name: monitor-elasticsearch-cluster
    namespace: openshift-logging
    ownerReferences:
    - apiVersion: logging.openshift.io/v1alpha1
      controller: true
      kind: Elasticsearch
      name: elasticsearch
      uid: 275c7a22-0e29-11e9-bac3-0e4166c7666c
    resourceVersion: "18546"
    selfLink: /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/servicemonitors/monitor-elasticsearch-cluster
    uid: 2780a011-0e29-11e9-bb49-129920ef04ca
  spec:
    endpoints:
    - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      path: /_prometheus/metrics
      port: restapi
      scheme: https
      tlsConfig:
        caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
    jobLabel: monitor-elasticsearch
    namespaceSelector: {}
    selector:
      matchLabels:
        cluster-name: elasticsearch
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Comment 6 Qiaoling Tang 2019-01-03 00:49:39 UTC

@Jeff and @Sergiusz, I'v updated the caFile to /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt when I deploying logging.

Comment 7 Jeff Cantrill 2019-01-03 15:38:11 UTC

(In reply to Qiaoling Tang from comment #6)
> @Jeff and @Sergiusz, I'v updated the caFile to
> /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt when I
> deploying logging.

Does modifying the path make any difference or do you still see the same error?

Comment 8 Qiaoling Tang 2019-01-04 00:25:55 UTC

(In reply to Jeff Cantrill from comment #7)
> (In reply to Qiaoling Tang from comment #6)
> > @Jeff and @Sergiusz, I'v updated the caFile to
> > /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt when I
> > deploying logging.
> 
> Does modifying the path make any difference or do you still see the same
> error?

After changing the path, I can't see any error message in prometheus-k8s pod logs.

Comment 9 Qiaoling Tang 2019-01-04 00:45:35 UTC

Here is the log in prometheus-k8s before and after modifying the path of caFile in servicemonitor:

level=error ts=2019-01-03T08:52:19.21603214Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:301: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list services in the namespace \"openshift-logging\": no RBAC policy matched"
level=error ts=2019-01-03T08:52:22.5496207Z caller=scrape.go:148 component="scrape manager" scrape_pool=openshift-logging/monitor-elasticsearch-cluster/0 msg="Error creating HTTP client" err="unable to use specified CA cert /etc/prometheus/configmaps/prometheus-serving-certs-ca-bundle/service-ca.crt: open /etc/prometheus/configmaps/prometheus-serving-certs-ca-bundle/service-ca.crt: no such file or directory"
level=info ts=2019-01-03T08:54:08.815723419Z caller=main.go:632 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2019-01-03T08:54:09.138503615Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-01-03T08:54:09.140177884Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-01-03T08:54:09.141700801Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"

And after changing the path, I can find these configurations in /etc/prometheus/config_out/prometheus.env.yaml in prometheus pod:

- job_name: openshift-logging/fluentd/0
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - openshift-logging
  tls_config:
    ca_file: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
    insecure_skip_verify: false
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_logging_infra]
    separator: ;
    regex: fluentd
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: "24231"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: "24231"
    action: replace
- job_name: openshift-logging/monitor-elasticsearch-cluster/0
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /_prometheus/metrics
  scheme: https
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - openshift-logging
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
    insecure_skip_verify: false
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_cluster_name]
    separator: ;
    regex: elasticsearch
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: restapi
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_monitor_elasticsearch]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: restapi
    action: replace

Comment 10 Jeff Cantrill 2019-01-11 15:29:37 UTC

Updated roles and bindings in: https://github.com/openshift/cluster-logging-operator/pull/81

Comment 11 Qiaoling Tang 2019-01-14 02:31:45 UTC

Still can't find fluentd and elasticsearch metrics, no error message in pod logs.

Comment 12 Qiaoling Tang 2019-01-14 07:33:15 UTC

Created attachment 1520465 [details]
Prometheus web console

There are some "Discovered Labels" in prometheus web console, but none of them is in the "Target Labels".

Comment 13 Qiaoling Tang 2019-01-14 08:38:34 UTC

Created attachment 1520474 [details]
Full screenshot of prometheus "Service Discovery" page

Comment 15 Josef Karasek 2019-02-28 10:56:32 UTC

@Sergiusz: services for elasticsearch and servicemonitors are created by elasticsearch operator, I have a PR, where the svc labels get matched by
the label selector and prometheus finally scrapes metrics.

@Qiaoling: pls wait till https://github.com/openshift/elasticsearch-operator/pull/76 merges. ES metrics should be in prom then.

Comment 17 Qiaoling Tang 2019-03-04 09:30:38 UTC

Verified in quay.io/openshift/origin-elasticsearch-operator@sha256:591e253eef19b18e404d165df8db99908a8fe52d09aa5dcba7d0c835d3487f54

Comment 19 Qiaoling Tang 2019-03-27 00:54:40 UTC

Reopen this bug because this issue can be reproduced in 4.0.0-0.nightly-2019-03-25-180911, CLO and EO images are: quay.io/openshift/origin-cluster-logging-operator@sha256:2193bbb23eba530cd76574e0c25f9b3a7f966a7f10f7eb0739f465644614df48 , quay.io/openshift/origin-elasticsearch-operator@sha256:8e7a748802fc284162f5dadf7cdfd9ae2adfb72b6f9682d3b91ee87024aa0a76.

Logs in prometheus-k8s pod:
level=error ts=2019-03-26T05:49:04.590Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:301: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-logging\""
level=error ts=2019-03-26T05:49:04.591Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:300: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-logging\""
level=error ts=2019-03-26T05:49:04.597Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:302: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-logging\""

$ oc get sa -n openshift-logging
NAME                       SECRETS   AGE
builder                    2         6h53m
cluster-logging-operator   2         6h52m
curator                    2         119m
default                    2         6h53m
deployer                   2         6h53m
elasticsearch              2         119m
eventrouter                2         40m
kibana                     2         119m
logcollector               2         119m

$ oc get servicemonitor
NAME                            AGE
monitor-elasticsearch-cluster   120m


The configurations are in prometheus server, but check targets in prometheus console, all the target labels are dropped, see the attachment.
- job_name: openshift-logging/monitor-elasticsearch-cluster/0
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /_prometheus/metrics
  scheme: https
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - openshift-logging
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
    server_name: elasticsearch-metrics.openshift-logging.svc
    insecure_skip_verify: false
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_cluster_name]
    separator: ;
    regex: elasticsearch
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: elasticsearch-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_monitor_elasticsearch]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: elasticsearch-metrics
    action: replace

[qitang@wlc-trust-182 aws]$ oc get clusterrole -o yaml -n openshift-logging elasticsearch-metrics
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-metrics
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219559"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/elasticsearch-metrics
  uid: 7be9f5bd-4f8e-11e9-a66f-065965d80050
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - services
  - endpoints
  verbs:
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get
[qitang@wlc-trust-182 aws]$ oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-metrics
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-metrics
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219560"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/elasticsearch-metrics
  uid: 7beae764-4f8e-11e9-a66f-065965d80050
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elasticsearch-metrics
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring


$ oc get clusterrole -o yaml -n openshift-logging elasticsearch-proxy
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-proxy
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219562"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/elasticsearch-proxy
  uid: 7bebd503-4f8e-11e9-a66f-065965d80050
rules:
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
$ oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-proxy
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-proxy
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219565"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/elasticsearch-proxy
  uid: 7bed4d2e-4f8e-11e9-a66f-065965d80050
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elasticsearch-proxy
subjects:
- kind: ServiceAccount
  name: elasticsearch
  namespace: openshift-logging


    Private Description Qiaoling Tang 2019-02-18 02:57:59 UTC
Created attachment 1535793 [details]
Elasticsearch-prometheus-rule

Description of problem:
Deploy logging, check elasticsearch prometheusrules in logging namespace and prometheus server, 

elasticsearch prometheusrules can be found in logging namespace,
$ oc get prometheusrule -n openshift-logging
NAME                             AGE
elasticsearch-prometheus-rules   4m15s

log into openshift web console, go to "Monitoring", click "Metrics", then try to find the elasticsearch prometheusrules in prometheus-k8s web console, go to "Status"-->"Rules", no elasticsearch prometheusrules in the page.

Version-Release number of selected component (if applicable):
4.0.0-0.nightly-2019-02-17-024922

How reproducible:
Always

Steps to Reproduce:
1.Deploy logging
2.check prometheusrules in logging namespace
3.check prometheusrules in prometheus server

Actual results:
The configuration of elasticsearch prometheusrule isn't in prometheus server

Expected results:
Elasticsearch prometheusrule can be found in prometheus server.

Additional info:

Qiaoling Tang 2019-02-18 02:58:31 UTC
Target Release: --- → 4.0.0
Jeff Cantrill 2019-02-26 15:40:22 UTC
Status: NEW → ASSIGNED
Assignee: jcantril → jkarasek
Doc Type: If docs needed, set a value → No Doc Update
Red Hat Bugzilla 2019-02-26 15:40:22 UTC
Flags: requires_doc_text-
Doc Type: No Doc Update → No Doc Update
RHEL Product and Program Management 2019-02-26 15:40:28 UTC
Flags: pm_ack+
Rule Engine Rule: OSE-pm-ack
Flags: devel_ack+
Rule Engine Rule: OSE-devel-ack
Flags: qa_ack+
Rule Engine Rule: OSE-qa-ack
Josef Karasek 2019-02-28 11:11:45 UTC
Status: ASSIGNED → POST
External Bug ID: Github openshift/cluster-monitoring-operator/pull/262 Github openshift/elasticsearch-operator/pull/7...
Josef Karasek 2019-02-28 16:22:04 UTC
Status: POST → ON_QA
Anping Li 2019-03-01 02:58:03 UTC
QA Contact: anli → qitang
    Private Comment 1 Qiaoling Tang 2019-03-01 03:11:27 UTC
Tested in 4.0.0-0.nightly-2019-02-28-054829, elasticsearch-prometheus-rules can be found in prometheus server.

Move bug to VERIFIED.

Qiaoling Tang 2019-03-01 03:11:43 UTC
Status: ON_QA → VERIFIED
    Private Comment 2 chris alfonso 2019-03-12 14:02:41 UTC
RED HAT CONFIDENTIAL
Moved Target Release from 4.0.0 to 4.1.0.

Target Release: 4.0.0 → 4.1.0
chris alfonso 2019-03-12 14:27:03 UTC
Version: 4.0.0 → 4.1
    Private Comment 3 Qiaoling Tang 2019-03-26 08:16:05 UTC
Reopen this bug because this issue can be reproduced in 4.0.0-0.nightly-2019-03-25-180911, CLO and EO images are: quay.io/openshift/origin-cluster-logging-operator@sha256:2193bbb23eba530cd76574e0c25f9b3a7f966a7f10f7eb0739f465644614df48 , quay.io/openshift/origin-elasticsearch-operator@sha256:8e7a748802fc284162f5dadf7cdfd9ae2adfb72b6f9682d3b91ee87024aa0a76.

Logs in prometheus-k8s pod:
level=error ts=2019-03-26T05:49:04.590Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:301: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-logging\""
level=error ts=2019-03-26T05:49:04.591Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:300: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-logging\""
level=error ts=2019-03-26T05:49:04.597Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:302: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-logging\""

$ oc get sa -n openshift-logging
NAME                       SECRETS   AGE
builder                    2         6h53m
cluster-logging-operator   2         6h52m
curator                    2         119m
default                    2         6h53m
deployer                   2         6h53m
elasticsearch              2         119m
eventrouter                2         40m
kibana                     2         119m
logcollector               2         119m

$ oc get servicemonitor
NAME                            AGE
monitor-elasticsearch-cluster   120m

Status: VERIFIED → ASSIGNED
Verified: FailedQA
    Private Comment 4 Josef Karasek 2019-03-26 08:21:59 UTC
Can you provide output of:

oc get clusterrole -o yaml -n openshift-logging elasticsearch-metrics
oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-metrics

oc get clusterrole -o yaml -n openshift-logging elasticsearch-proxy
oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-proxy

    Private Comment 5 Qiaoling Tang 2019-03-26 08:22 UTC
Created attachment 1547943 [details]
Service Discovery page

The configurations are in prometheus server, but check targets in prometheus console, all the target labels are dropped, see the attachment.
- job_name: openshift-logging/monitor-elasticsearch-cluster/0
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /_prometheus/metrics
  scheme: https
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - openshift-logging
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
    server_name: elasticsearch-metrics.openshift-logging.svc
    insecure_skip_verify: false
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_cluster_name]
    separator: ;
    regex: elasticsearch
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: elasticsearch-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_monitor_elasticsearch]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: elasticsearch-metrics
    action: replace

    Private Comment 6 Qiaoling Tang 2019-03-26 08:24:00 UTC
$ oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-metrics
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-metrics
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219560"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/elasticsearch-metrics
  uid: 7beae764-4f8e-11e9-a66f-065965d80050
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elasticsearch-metrics
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring

$ oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-proxy
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-proxy
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219565"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/elasticsearch-proxy
  uid: 7bed4d2e-4f8e-11e9-a66f-065965d80050
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elasticsearch-proxy
subjects:
- kind: ServiceAccount
  name: elasticsearch
  namespace: openshift-logging

    Private Comment 7 Qiaoling Tang 2019-03-26 08:26:07 UTC
Sorry, missed some info in my last comment:
[qitang@wlc-trust-182 aws]$ oc get clusterrole -o yaml -n openshift-logging elasticsearch-metrics
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-metrics
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219559"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/elasticsearch-metrics
  uid: 7be9f5bd-4f8e-11e9-a66f-065965d80050
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - services
  - endpoints
  verbs:
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get
[qitang@wlc-trust-182 aws]$ oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-metrics
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-metrics
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219560"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/elasticsearch-metrics
  uid: 7beae764-4f8e-11e9-a66f-065965d80050
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elasticsearch-metrics
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring


$ oc get clusterrole -o yaml -n openshift-logging elasticsearch-proxy
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-proxy
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219562"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/elasticsearch-proxy
  uid: 7bebd503-4f8e-11e9-a66f-065965d80050
rules:
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
$ oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-proxy
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-proxy
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219565"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/elasticsearch-proxy
  uid: 7bed4d2e-4f8e-11e9-a66f-065965d80050
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elasticsearch-proxy
subjects:
- kind: ServiceAccount
  name: elasticsearch
  namespace: openshift-logging


    Private Comment 8 Josef Karasek 2019-03-26 09:14:42 UTC
Eric, the test ran with the latest EO code.
I found the following problems:
1) secret `elasticsearch` is in ns openshift-logging, but EO runs in openshift-operators
2) EO misconfigures svc/elasticsearch-metrics. Please revert to my original implementation of servicemonitor and metrics svc
3) ES cluster is in red state and EO log is full of:
time="2019-03-26T09:09:59Z" level=info msg="Waiting for cluster to be fully recovered before restarting elasticsearch-clientdatamaster-0-2: red / green"


$ oc --kubeconfig=./kubeconfig -n openshift-logging get pod 
NAME                                                  READY   STATUS    RESTARTS   AGE
cluster-logging-operator-799f97f47-66jfh              1/1     Running   0          7h23m
elasticsearch-clientdatamaster-0-1-6dbcbb64b4-6fdgh   2/2     Running   0          77m
elasticsearch-clientdatamaster-0-2-746d8788d5-2fbds   2/2     Running   0          77m
eventrouter-649fc6b98b-wckpz                          1/1     Running   0          71m
fluentd-cbvjd                                         1/1     Running   0          68m
fluentd-r6w4d                                         1/1     Running   0          68m
fluentd-thh9q                                         1/1     Running   0          68m
fluentd-v2pbm                                         1/1     Running   0          68m
fluentd-xgbmg                                         1/1     Running   0          68m
fluentd-zrgmk                                         1/1     Running   0          68m
kibana-568746d6d9-4d48d

$ oc --kubeconfig=./kubeconfig -n openshift-operators get pod
elasticsearch-operator-5b4977987d-wg97j               1/1     Running   0          7h24m

$ oc --kubeconfig=./kubeconfig get secret --all-namespaces | grep elasti
openshift-logging                                       elasticsearch                                                        Opaque                                7      153m
openshift-logging                                       elasticsearch-dockercfg-hw4qt                                        kubernetes.io/dockercfg               1      153m
openshift-logging                                       elasticsearch-metrics                                                kubernetes.io/tls                     2      153m
openshift-logging                                       elasticsearch-token-pfr8g                                            kubernetes.io/service-account-token   4      153m
openshift-logging                                       elasticsearch-token-vc8pq                                            kubernetes.io/service-account-token   4      153m
openshift-operators                                     elasticsearch-operator-dockercfg-qjnbj                               kubernetes.io/dockercfg               1      7h26m
openshift-operators                                     elasticsearch-operator-token-l2ztc                                   kubernetes.io/service-account-token   4      7h26m
openshift-operators                                     elasticsearch-operator-token-tgr6s                                   kubernetes.io/service-account-token   4      7h26m

$ oc --kubeconfig=./kubeconfig -n openshift-operators logs elasticsearch-operator-5b4977987d-wg97j
time="2019-03-26T01:21:53Z" level=info msg="Go Version: go1.10.3"
time="2019-03-26T01:21:53Z" level=info msg="Go OS/Arch: linux/amd64"
time="2019-03-26T01:21:53Z" level=info msg="operator-sdk Version: 0.0.7"
time="2019-03-26T01:21:53Z" level=info msg="Watching logging.openshift.io/v1alpha1, Elasticsearch, , 5000000000"
E0326 02:49:34.516615       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=52895, ErrCode=NO_ERROR, debug=""
E0326 02:50:43.399058       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=815, ErrCode=NO_ERROR, debug=""
E0326 04:36:08.932707       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=75571, ErrCode=NO_ERROR, debug=""
time="2019-03-26T04:37:27Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Roles and RoleBindings for Elasticsearch cluster: failed to create ClusterRoleBindig elasticsearch-proxy: Post https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings: unexpected EOF"
time="2019-03-26T04:49:35Z" level=error msg="Error reading secret elasticsearch: Get https://172.30.0.1:443/api/v1/namespaces/openshift-logging/secrets/elasticsearch: unexpected EOF"
time="2019-03-26T04:49:35Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T04:49:35Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T04:49:35Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T04:52:17Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Services for Elasticsearch cluster: Failure creating service Failed to get elasticsearch-cluster service: Get https://172.30.0.1:443/api/v1/namespaces/openshift-logging/services/elasticsearch-cluster: unexpected EOF"
time="2019-03-26T05:40:03Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:03Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:03Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:40:03Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:40:03Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:40:04Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:04Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:40:04Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:04Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:40:05Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Error: could not update status for Elasticsearch elasticsearch after 0 retries: elasticsearches.logging.openshift.io \"elasticsearch\" not found"
time="2019-03-26T05:40:05Z" level=info msg="Flushing nodes for cluster elasticsearch in openshift-logging"
time="2019-03-26T05:40:05Z" level=error msg="no last known state found for deleted object (openshift-logging/elasticsearch)"
time="2019-03-26T05:46:32Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:46:32Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:46:32Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:46:33Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Error: could not update status for Elasticsearch elasticsearch after 0 retries: elasticsearches.logging.openshift.io \"elasticsearch\" not found"
time="2019-03-26T05:46:33Z" level=info msg="Flushing nodes for cluster elasticsearch in openshift-logging"
time="2019-03-26T05:46:33Z" level=error msg="no last known state found for deleted object (openshift-logging/elasticsearch)"
E0326 06:50:48.820723       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=905, ErrCode=NO_ERROR, debug=""
time="2019-03-26T08:33:21Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Elasticsearch deployment spec: request declared a Content-Length of 1708 but only wrote 0 bytes"
E0326 08:33:21.415632       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=73429, ErrCode=NO_ERROR, debug=""
time="2019-03-26T08:35:57Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Error: could not update status for Elasticsearch elasticsearch after 0 retries: Get https://172.30.0.1:443/apis/logging.openshift.io/v1alpha1/namespaces/openshift-logging/elasticsearches/elasticsearch: unexpected EOF"


- apiVersion: logging.openshift.io/v1alpha1
  kind: Elasticsearch
  metadata:
    creationTimestamp: 2019-03-26T06:14:59Z
    generation: 1
    name: elasticsearch
    namespace: openshift-logging
    ownerReferences:
    - apiVersion: logging.openshift.io/v1alpha1
      controller: true
      kind: ClusterLogging
      name: instance
      uid: 7ba7a5ba-4f8e-11e9-a058-0a0c0e8a4a2e
    resourceVersion: "328550"
    selfLink: /apis/logging.openshift.io/v1alpha1/namespaces/openshift-logging/elasticsearches/elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  spec:
    managementState: Managed
    nodeSpec:
      image: quay.io/openshift/origin-logging-elasticsearch5:latest
      resources:
        limits:
          cpu: "1"
          memory: 4Gi
        requests:
          cpu: 200m
          memory: 1Gi
    nodes:
    - nodeCount: 2
      resources:
        limits:
          cpu: "1"
          memory: 4Gi
        requests:
          cpu: 200m
          memory: 1Gi
      roles:
      - client
      - data
      - master
      storage:
        size: 10Gi
        storageClassName: gp2
    redundancyPolicy: SingleRedundancy
  status:
    clusterHealth: red
    conditions: []
    nodes:
    - deploymentName: elasticsearch-clientdatamaster-0-1
      upgradeStatus:
        scheduledRedeploy: "True"
    - deploymentName: elasticsearch-clientdatamaster-0-2
      upgradeStatus:
        scheduledRedeploy: "True"
    pods:
      client:
        failed: []
        notReady: []
        ready:
        - elasticsearch-clientdatamaster-0-1-6dbcbb64b4-6fdgh
        - elasticsearch-clientdatamaster-0-2-746d8788d5-2fbds
      data:
        failed: []
        notReady: []
        ready:
        - elasticsearch-clientdatamaster-0-1-6dbcbb64b4-6fdgh
        - elasticsearch-clientdatamaster-0-2-746d8788d5-2fbds
      master:
        failed: []
        notReady: []
        ready:
        - elasticsearch-clientdatamaster-0-1-6dbcbb64b4-6fdgh
        - elasticsearch-clientdatamaster-0-2-746d8788d5-2fbds
    shardAllocationEnabled: none
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Comment 21 Qiaoling Tang 2019-04-04 01:34:27 UTC

Verified in latest image: quay.io/openshift/origin-elasticsearch-operator@sha256:f3e56412389727015e80b01420304f4736c58be20410fe67b9e4e676ba7cfd4a

Comment 23 errata-xmlrpc 2019-06-04 10:41:28 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758