Bug 1662273 - Can't find elasticsearch metrics in prometheus server

Product:           OpenShift Container Platform
Component:         Logging
Version:           4.1.0
Target Release:    4.1.0
Hardware:          Unspecified
OS:                Unspecified
Status:            CLOSED ERRATA
Severity:          high
Priority:          high
Reporter:          Qiaoling Tang <qitang>
Assignee:          Josef Karasek <jkarasek>
QA Contact:        Qiaoling Tang <qitang>
CC:                aos-bugs, jcantril, rmeggins, surbania
Doc Type:          No Doc Update
Type:              Bug
Cloned To:         1683359 (view as bug list)
Bug Blocks:        1683359
Last Closed:       2019-06-04 10:41:28 UTC
Description  Qiaoling Tang  2018-12-27 09:25:25 UTC
Just FYI: I tried to add the role "prometheus-k8s" and the rolebinding "prometheus-k8s" in the openshift-logging project. After adding these objects, the error message "system:serviceaccount:openshift-monitoring:prometheus-k8s cannot list endpoints/pods/services in the namespace openshift-logging" is gone from the prometheus-k8s pod, but I still can't find fluentd and elasticsearch metrics in the prometheus server.

$ oc get role prometheus-k8s -n openshift-logging -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: 2019-01-02T00:59:16Z
  name: prometheus-k8s
  namespace: openshift-logging
  resourceVersion: "19282"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/openshift-logging/roles/prometheus-k8s
  uid: a101fdfe-0e29-11e9-bac3-0e4166c7666c
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch

$ oc get rolebindings prometheus-k8s -n openshift-logging -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  creationTimestamp: 2019-01-02T00:59:24Z
  name: prometheus-k8s
  namespace: openshift-logging
  resourceVersion: "19390"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/openshift-logging/rolebindings/prometheus-k8s
  uid: a5b0df68-0e29-11e9-80bc-0a9697cbec22
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring

Can you please post the output of "oc get servicemonitor -n openshift-logging -o yaml"? This may be resolved with the merge of https://github.com/openshift/cluster-logging-operator/pull/65 for fluentd. Elasticsearch also has the correct capath: https://github.com/openshift/elasticsearch-operator/blob/master/pkg/k8shandler/service_monitor.go#L38
@Sergiusz, is there more required if these paths are defined correctly?

$ oc get servicemonitors -o yaml -n openshift-logging
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    creationTimestamp: 2019-01-02T00:55:00Z
    generation: 1
    name: fluentd
    namespace: openshift-logging
    resourceVersion: "18545"
    selfLink: /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/servicemonitors/fluentd
    uid: 07e4e6b6-0e29-11e9-bb49-129920ef04ca
  spec:
    endpoints:
    - port: "24231"
      scheme: https
      tlsConfig:
        caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
    selector:
      matchLabels:
        logging-infra: fluentd
- apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    creationTimestamp: 2019-01-02T00:55:53Z
    generation: 1
    labels:
      cluster-name: elasticsearch
    name: monitor-elasticsearch-cluster
    namespace: openshift-logging
    ownerReferences:
    - apiVersion: logging.openshift.io/v1alpha1
      controller: true
      kind: Elasticsearch
      name: elasticsearch
      uid: 275c7a22-0e29-11e9-bac3-0e4166c7666c
    resourceVersion: "18546"
    selfLink: /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/servicemonitors/monitor-elasticsearch-cluster
    uid: 2780a011-0e29-11e9-bb49-129920ef04ca
  spec:
    endpoints:
    - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      path: /_prometheus/metrics
      port: restapi
      scheme: https
      tlsConfig:
        caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
    jobLabel: monitor-elasticsearch
    namespaceSelector: {}
    selector:
      matchLabels:
        cluster-name: elasticsearch
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

@Jeff and @Sergiusz, I've updated the caFile to /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt when deploying logging.
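With the role and binding in place, discovery can list the objects, but a ServiceMonitor still only yields targets when its selector matches labels on a Service and the named port exists on that Service's endpoints. A minimal sketch of a Service the fluentd ServiceMonitor above would select (the name and pod selector are illustrative assumptions, not objects captured from this cluster):

    apiVersion: v1
    kind: Service
    metadata:
      name: fluentd                # hypothetical name
      namespace: openshift-logging
      labels:
        logging-infra: fluentd     # must match spec.selector.matchLabels above
    spec:
      selector:
        component: fluentd         # assumed pod label
      ports:
      - name: "24231"              # must match the ServiceMonitor endpoint port
        port: 24231
        targetPort: 24231

If either the label or the port name disagrees, the scrape job is generated but ends up with zero targets, which matches the symptom reported here.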
(In reply to Qiaoling Tang from comment #6)
> @Jeff and @Sergiusz, I've updated the caFile to
> /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt when
> deploying logging.

Does modifying the path make any difference or do you still see the same error?

(In reply to Jeff Cantrill from comment #7)
> Does modifying the path make any difference or do you still see the same
> error?

After changing the path, I can't see any error message in the prometheus-k8s pod logs. Here are the logs in prometheus-k8s before and after modifying the path of caFile in the servicemonitor.

Before:

level=error ts=2019-01-03T08:52:19.21603214Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:301: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list services in the namespace \"openshift-logging\": no RBAC policy matched"
level=error ts=2019-01-03T08:52:22.5496207Z caller=scrape.go:148 component="scrape manager" scrape_pool=openshift-logging/monitor-elasticsearch-cluster/0 msg="Error creating HTTP client" err="unable to use specified CA cert /etc/prometheus/configmaps/prometheus-serving-certs-ca-bundle/service-ca.crt: open /etc/prometheus/configmaps/prometheus-serving-certs-ca-bundle/service-ca.crt: no such file or directory"

After:

level=info ts=2019-01-03T08:54:08.815723419Z caller=main.go:632 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2019-01-03T08:54:09.138503615Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-01-03T08:54:09.140177884Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-01-03T08:54:09.141700801Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
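The "before" log shows two independent failures: the RBAC denial, and a scrape-client error caused by a caFile under /etc/prometheus/configmaps/prometheus-serving-certs-ca-bundle/, a path that is not mounted in the prometheus-k8s pod. On the ServiceMonitor side the fix is confined to the tlsConfig stanza; a sketch of the corrected endpoint, with the other fields elided:

    spec:
      endpoints:
      - port: restapi
        scheme: https
        tlsConfig:
          caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt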
And after changing the path, I can find these configurations in /etc/prometheus/config_out/prometheus.env.yaml in the prometheus pod:

- job_name: openshift-logging/fluentd/0
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - openshift-logging
  tls_config:
    ca_file: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
    insecure_skip_verify: false
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_logging_infra]
    separator: ;
    regex: fluentd
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: "24231"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: "24231"
    action: replace
- job_name: openshift-logging/monitor-elasticsearch-cluster/0
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /_prometheus/metrics
  scheme: https
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - openshift-logging
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
    insecure_skip_verify: false
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_cluster_name]
    separator: ;
    regex: elasticsearch
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: restapi
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_monitor_elasticsearch]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: restapi
    action: replace

Updated roles and bindings in: https://github.com/openshift/cluster-logging-operator/pull/81

Still can't find fluentd and elasticsearch metrics; there are no error messages in the pod logs.

Created attachment 1520465 [details]
Prometheus web console
There are some "Discovered Labels" in prometheus web console, but none of them is in the "Target Labels".
Created attachment 1520474 [details]
Full screenshot of prometheus "Service Discovery" page
@Sergiusz: services for elasticsearch and servicemonitors are created by the elasticsearch operator. I have a PR where the svc labels get matched by the label selector and prometheus finally scrapes metrics.

@Qiaoling: please wait till https://github.com/openshift/elasticsearch-operator/pull/76 merges. ES metrics should be in prom then.

Verified in quay.io/openshift/origin-elasticsearch-operator@sha256:591e253eef19b18e404d165df8db99908a8fe52d09aa5dcba7d0c835d3487f54
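The contract that PR restores is the same ServiceMonitor-to-Service matching discussed above: the metrics Service created by the operator has to carry the label the ServiceMonitor selects on, with a matching port name. A sketch of the intended shape, inferred from the configs quoted in this bug rather than from the PR itself:

    apiVersion: v1
    kind: Service
    metadata:
      name: elasticsearch-metrics
      namespace: openshift-logging
      labels:
        cluster-name: elasticsearch   # selected by monitor-elasticsearch-cluster
    spec:
      selector:
        cluster-name: elasticsearch   # assumed pod label
      ports:
      - name: restapi                 # must match the ServiceMonitor endpoint port
        port: 9200                    # assumed ES REST port
        targetPort: 9200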
Private Description  Qiaoling Tang  2019-02-18 02:57:59 UTC

Created attachment 1535793 [details]
Elasticsearch-prometheus-rule

Description of problem:
Deploy logging, then check the elasticsearch prometheusrules in the logging namespace and in the prometheus server. The elasticsearch prometheusrules can be found in the logging namespace:

$ oc get prometheusrule -n openshift-logging
NAME                             AGE
elasticsearch-prometheus-rules   4m15s

Log into the openshift web console, go to "Monitoring" and click "Metrics", then try to find the elasticsearch prometheusrules in the prometheus-k8s web console under "Status" --> "Rules": there are no elasticsearch prometheusrules on the page.

Version-Release number of selected component (if applicable):
4.0.0-0.nightly-2019-02-17-024922

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging
2. Check prometheusrules in the logging namespace
3. Check prometheusrules in the prometheus server

Actual results:
The configuration of the elasticsearch prometheusrule isn't in the prometheus server.

Expected results:
The elasticsearch prometheusrule can be found in the prometheus server.

Additional info:

Qiaoling Tang  2019-02-18 02:58:31 UTC
Target Release: --- → 4.0.0

Jeff Cantrill  2019-02-26 15:40:22 UTC
Status: NEW → ASSIGNED
Assignee: jcantril → jkarasek
Doc Type: If docs needed, set a value → No Doc Update

Red Hat Bugzilla  2019-02-26 15:40:22 UTC
Flags: requires_doc_text-
Doc Type: No Doc Update → No Doc Update

RHEL Product and Program Management  2019-02-26 15:40:28 UTC
Flags: pm_ack+ (Rule Engine Rule: OSE-pm-ack)
Flags: devel_ack+ (Rule Engine Rule: OSE-devel-ack)
Flags: qa_ack+ (Rule Engine Rule: OSE-qa-ack)

Josef Karasek  2019-02-28 11:11:45 UTC
Status: ASSIGNED → POST
External Bug ID: Github openshift/cluster-monitoring-operator/pull/262, Github openshift/elasticsearch-operator/pull/7...

Josef Karasek  2019-02-28 16:22:04 UTC
Status: POST → ON_QA

Anping Li  2019-03-01 02:58:03 UTC
QA Contact: anli → qitang

Private Comment 1  Qiaoling Tang  2019-03-01 03:11:27 UTC
Tested in 4.0.0-0.nightly-2019-02-28-054829; elasticsearch-prometheus-rules can be found in the prometheus server. Moving bug to VERIFIED.

Qiaoling Tang  2019-03-01 03:11:43 UTC
Status: ON_QA → VERIFIED
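For context, the objects being verified are PrometheusRule custom resources, which the prometheus-operator loads into the Prometheus instances it manages. A minimal sketch of one (the group, alert name, and expression are hypothetical, not the contents of elasticsearch-prometheus-rules):

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: elasticsearch-prometheus-rules
      namespace: openshift-logging
    spec:
      groups:
      - name: elasticsearch.rules                # hypothetical group
        rules:
        - alert: ElasticsearchClusterNotHealthy  # hypothetical alert
          expr: es_cluster_status > 0            # hypothetical expression
          for: 5m

Whether the cluster Prometheus picks such an object up depends on the rule selectors the monitoring stack is configured with, which is why a rule can exist in openshift-logging and still not show up under "Status" --> "Rules".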
Private Comment 2  chris alfonso  2019-03-12 14:02:41 UTC
RED HAT CONFIDENTIAL
Moved Target Release from 4.0.0 to 4.1.0.
Target Release: 4.0.0 → 4.1.0

chris alfonso  2019-03-12 14:27:03 UTC
Version: 4.0.0 → 4.1

Private Comment 3  Qiaoling Tang  2019-03-26 08:16:05 UTC
Reopening this bug because the issue can be reproduced in 4.0.0-0.nightly-2019-03-25-180911. The CLO and EO images are quay.io/openshift/origin-cluster-logging-operator@sha256:2193bbb23eba530cd76574e0c25f9b3a7f966a7f10f7eb0739f465644614df48 and quay.io/openshift/origin-elasticsearch-operator@sha256:8e7a748802fc284162f5dadf7cdfd9ae2adfb72b6f9682d3b91ee87024aa0a76.

Logs in the prometheus-k8s pod:

level=error ts=2019-03-26T05:49:04.590Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:301: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-logging\""
level=error ts=2019-03-26T05:49:04.591Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:300: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-logging\""
level=error ts=2019-03-26T05:49:04.597Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:302: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-logging\""

$ oc get sa -n openshift-logging
NAME                       SECRETS   AGE
builder                    2         6h53m
cluster-logging-operator   2         6h52m
curator                    2         119m
default                    2         6h53m
deployer                   2         6h53m
elasticsearch              2         119m
eventrouter                2         40m
kibana                     2         119m
logcollector               2         119m

$ oc get servicemonitor
NAME                            AGE
monitor-elasticsearch-cluster   120m

Status: VERIFIED → ASSIGNED
Verified: FailedQA

Private Comment 4  Josef Karasek  2019-03-26 08:21:59 UTC
Can you provide the output of:

oc get clusterrole -o yaml -n openshift-logging elasticsearch-metrics
oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-metrics
oc get clusterrole -o yaml -n openshift-logging elasticsearch-proxy
oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-proxy

Private Comment 5  Qiaoling Tang  2019-03-26 08:22 UTC
Created attachment 1547943 [details]
Service Discovery page

The configurations are in the prometheus server, but checking the targets in the prometheus console, all the target labels are dropped; see the attachment.
- job_name: openshift-logging/monitor-elasticsearch-cluster/0
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /_prometheus/metrics
  scheme: https
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - openshift-logging
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
    server_name: elasticsearch-metrics.openshift-logging.svc
    insecure_skip_verify: false
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_cluster_name]
    separator: ;
    regex: elasticsearch
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: elasticsearch-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_monitor_elasticsearch]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: elasticsearch-metrics
    action: replace

Private Comment 6  Qiaoling Tang  2019-03-26 08:24:00 UTC

$ oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-metrics
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-metrics
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219560"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/elasticsearch-metrics
  uid: 7beae764-4f8e-11e9-a66f-065965d80050
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elasticsearch-metrics
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring

$ oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-proxy
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-proxy
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219565"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/elasticsearch-proxy
  uid: 7bed4d2e-4f8e-11e9-a66f-065965d80050
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elasticsearch-proxy
subjects:
- kind: ServiceAccount
  name: elasticsearch
  namespace: openshift-logging
Private Comment 7  Qiaoling Tang  2019-03-26 08:26:07 UTC

Sorry, I missed some info in my last comment:

[qitang@wlc-trust-182 aws]$ oc get clusterrole -o yaml -n openshift-logging elasticsearch-metrics
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-metrics
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219559"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/elasticsearch-metrics
  uid: 7be9f5bd-4f8e-11e9-a66f-065965d80050
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - services
  - endpoints
  verbs:
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get

$ oc get clusterrole -o yaml -n openshift-logging elasticsearch-proxy
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-proxy
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219562"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/elasticsearch-proxy
  uid: 7bebd503-4f8e-11e9-a66f-065965d80050
rules:
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create

The clusterrolebindings for elasticsearch-metrics and elasticsearch-proxy are as in comment 6.
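The bindings above should be enough for discovery; a quick way to confirm the grants actually took effect is to impersonate the Prometheus service account (standard oc usage, not commands taken from this thread):

    $ oc auth can-i list services -n openshift-logging \
        --as=system:serviceaccount:openshift-monitoring:prometheus-k8s
    $ oc auth can-i list endpoints -n openshift-logging \
        --as=system:serviceaccount:openshift-monitoring:prometheus-k8s
    $ oc auth can-i list pods -n openshift-logging \
        --as=system:serviceaccount:openshift-monitoring:prometheus-k8s

Each command prints yes or no and maps directly onto the "forbidden" errors quoted in comment 3.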
Private Comment 8  Josef Karasek  2019-03-26 09:14:42 UTC

Eric, the test ran with the latest EO code. I found the following problems:

1) The secret `elasticsearch` is in the openshift-logging namespace, but the EO runs in openshift-operators.
2) The EO misconfigures svc/elasticsearch-metrics. Please revert to my original implementation of the servicemonitor and metrics svc.
3) The ES cluster is in a red state and the EO log is full of:

time="2019-03-26T09:09:59Z" level=info msg="Waiting for cluster to be fully recovered before restarting elasticsearch-clientdatamaster-0-2: red / green"

$ oc --kubeconfig=./kubeconfig -n openshift-logging get pod
NAME                                                  READY   STATUS    RESTARTS   AGE
cluster-logging-operator-799f97f47-66jfh              1/1     Running   0          7h23m
elasticsearch-clientdatamaster-0-1-6dbcbb64b4-6fdgh   2/2     Running   0          77m
elasticsearch-clientdatamaster-0-2-746d8788d5-2fbds   2/2     Running   0          77m
eventrouter-649fc6b98b-wckpz                          1/1     Running   0          71m
fluentd-cbvjd                                         1/1     Running   0          68m
fluentd-r6w4d                                         1/1     Running   0          68m
fluentd-thh9q                                         1/1     Running   0          68m
fluentd-v2pbm                                         1/1     Running   0          68m
fluentd-xgbmg                                         1/1     Running   0          68m
fluentd-zrgmk                                         1/1     Running   0          68m
kibana-568746d6d9-4d48d

$ oc --kubeconfig=./kubeconfig -n openshift-operators get pod
elasticsearch-operator-5b4977987d-wg97j   1/1   Running   0   7h24m

$ oc --kubeconfig=./kubeconfig get secret --all-namespaces | grep elasti
openshift-logging     elasticsearch                            Opaque                                7   153m
openshift-logging     elasticsearch-dockercfg-hw4qt            kubernetes.io/dockercfg               1   153m
openshift-logging     elasticsearch-metrics                    kubernetes.io/tls                     2   153m
openshift-logging     elasticsearch-token-pfr8g                kubernetes.io/service-account-token   4   153m
openshift-logging     elasticsearch-token-vc8pq                kubernetes.io/service-account-token   4   153m
openshift-operators   elasticsearch-operator-dockercfg-qjnbj   kubernetes.io/dockercfg               1   7h26m
openshift-operators   elasticsearch-operator-token-l2ztc       kubernetes.io/service-account-token   4   7h26m
openshift-operators   elasticsearch-operator-token-tgr6s       kubernetes.io/service-account-token   4   7h26m

$ oc --kubeconfig=./kubeconfig -n openshift-operators logs elasticsearch-operator-5b4977987d-wg97j
time="2019-03-26T01:21:53Z" level=info msg="Go Version: go1.10.3"
time="2019-03-26T01:21:53Z" level=info msg="Go OS/Arch: linux/amd64"
time="2019-03-26T01:21:53Z" level=info msg="operator-sdk Version: 0.0.7"
time="2019-03-26T01:21:53Z" level=info msg="Watching logging.openshift.io/v1alpha1, Elasticsearch, , 5000000000"
E0326 02:49:34.516615 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=52895, ErrCode=NO_ERROR, debug=""
E0326 02:50:43.399058 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=815, ErrCode=NO_ERROR, debug=""
E0326 04:36:08.932707 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=75571, ErrCode=NO_ERROR, debug=""
time="2019-03-26T04:37:27Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Roles and RoleBindings for Elasticsearch cluster: failed to create ClusterRoleBindig elasticsearch-proxy: Post https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings: unexpected EOF"
time="2019-03-26T04:49:35Z" level=error msg="Error reading secret elasticsearch: Get https://172.30.0.1:443/api/v1/namespaces/openshift-logging/secrets/elasticsearch: unexpected EOF"
time="2019-03-26T04:49:35Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T04:49:35Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T04:49:35Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T04:52:17Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Services for Elasticsearch cluster: Failure creating service Failed to get elasticsearch-cluster service: Get https://172.30.0.1:443/api/v1/namespaces/openshift-logging/services/elasticsearch-cluster: unexpected EOF"
time="2019-03-26T05:40:03Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:03Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:03Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:40:03Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:40:03Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:40:04Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:04Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:40:04Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:04Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:40:05Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Error: could not update status for Elasticsearch elasticsearch after 0 retries: elasticsearches.logging.openshift.io \"elasticsearch\" not found"
time="2019-03-26T05:40:05Z" level=info msg="Flushing nodes for cluster elasticsearch in openshift-logging"
time="2019-03-26T05:40:05Z" level=error msg="no last known state found for deleted object (openshift-logging/elasticsearch)"
time="2019-03-26T05:46:32Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:46:32Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:46:32Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:46:33Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Error: could not update status for Elasticsearch elasticsearch after 0 retries: elasticsearches.logging.openshift.io \"elasticsearch\" not found"
time="2019-03-26T05:46:33Z" level=info msg="Flushing nodes for cluster elasticsearch in openshift-logging"
time="2019-03-26T05:46:33Z" level=error msg="no last known state found for deleted object (openshift-logging/elasticsearch)"
E0326 06:50:48.820723 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=905, ErrCode=NO_ERROR, debug=""
time="2019-03-26T08:33:21Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Elasticsearch deployment spec: request declared a Content-Length of 1708 but only wrote 0 bytes"
E0326 08:33:21.415632 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=73429, ErrCode=NO_ERROR, debug=""
time="2019-03-26T08:35:57Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Error: could not update status for Elasticsearch elasticsearch after 0 retries: Get https://172.30.0.1:443/apis/logging.openshift.io/v1alpha1/namespaces/openshift-logging/elasticsearches/elasticsearch: unexpected EOF"

- apiVersion: logging.openshift.io/v1alpha1
  kind: Elasticsearch
  metadata:
    creationTimestamp: 2019-03-26T06:14:59Z
    generation: 1
    name: elasticsearch
    namespace: openshift-logging
    ownerReferences:
    - apiVersion: logging.openshift.io/v1alpha1
      controller: true
      kind: ClusterLogging
      name: instance
      uid: 7ba7a5ba-4f8e-11e9-a058-0a0c0e8a4a2e
    resourceVersion: "328550"
    selfLink: /apis/logging.openshift.io/v1alpha1/namespaces/openshift-logging/elasticsearches/elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  spec:
    managementState: Managed
    nodeSpec:
      image: quay.io/openshift/origin-logging-elasticsearch5:latest
      resources:
        limits:
          cpu: "1"
          memory: 4Gi
        requests:
          cpu: 200m
          memory: 1Gi
    nodes:
    - nodeCount: 2
      resources:
        limits:
          cpu: "1"
          memory: 4Gi
        requests:
          cpu: 200m
          memory: 1Gi
      roles:
      - client
      - data
      - master
      storage:
        size: 10Gi
        storageClassName: gp2
    redundancyPolicy: SingleRedundancy
  status:
    clusterHealth: red
    conditions: []
    nodes:
    - deploymentName: elasticsearch-clientdatamaster-0-1
      upgradeStatus:
        scheduledRedeploy: "True"
    - deploymentName: elasticsearch-clientdatamaster-0-2
      upgradeStatus:
        scheduledRedeploy: "True"
    pods:
      client:
        failed: []
        notReady: []
        ready:
        - elasticsearch-clientdatamaster-0-1-6dbcbb64b4-6fdgh
        - elasticsearch-clientdatamaster-0-2-746d8788d5-2fbds
      data:
        failed: []
        notReady: []
        ready:
        - elasticsearch-clientdatamaster-0-1-6dbcbb64b4-6fdgh
        - elasticsearch-clientdatamaster-0-2-746d8788d5-2fbds
      master:
        failed: []
        notReady: []
        ready:
        - elasticsearch-clientdatamaster-0-1-6dbcbb64b4-6fdgh
        - elasticsearch-clientdatamaster-0-2-746d8788d5-2fbds
    shardAllocationEnabled: none
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Verified in latest image:
quay.io/openshift/origin-elasticsearch-operator@sha256:f3e56412389727015e80b01420304f4736c58be20410fe67b9e4e676ba7cfd4a

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758