Bug 1662273
| Field | Value |
|---|---|
| Summary | Can't find elasticsearch metrics in prometheus server |
| Product | OpenShift Container Platform |
| Component | Logging |
| Version | 4.1.0 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Reporter | Qiaoling Tang <qitang> |
| Assignee | Josef Karasek <jkarasek> |
| QA Contact | Qiaoling Tang <qitang> |
| CC | aos-bugs, jcantril, rmeggins, surbania |
| Target Milestone | --- |
| Target Release | 4.1.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | No Doc Update |
| Type | Bug |
| Bug Blocks | 1683359 (view as bug list) |
| Last Closed | 2019-06-04 10:41:28 UTC |
Description Qiaoling Tang 2018-12-27 09:25:25 UTC
Just FYI: I tried to add role "prometheus-k8s" and rolebinding "prometheus-k8s" in the openshift-logging project. After adding these objects, there is no error message "system:serviceaccount:openshift-monitoring:prometheus-k8s cannot list endpoints/pods/services in the namespace openshift-logging" in the prometheus-k8s pod, but I still can't find fluentd and elasticsearch metrics in the prometheus server.

$ oc get role prometheus-k8s -n openshift-logging -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: 2019-01-02T00:59:16Z
  name: prometheus-k8s
  namespace: openshift-logging
  resourceVersion: "19282"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/openshift-logging/roles/prometheus-k8s
  uid: a101fdfe-0e29-11e9-bac3-0e4166c7666c
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch

$ oc get rolebindings prometheus-k8s -n openshift-logging -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  creationTimestamp: 2019-01-02T00:59:24Z
  name: prometheus-k8s
  namespace: openshift-logging
  resourceVersion: "19390"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/openshift-logging/rolebindings/prometheus-k8s
  uid: a5b0df68-0e29-11e9-80bc-0a9697cbec22
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring

Can you please post the output of "oc get servicemonitor -n openshift-logging -o yaml"?

This may be resolved with the merge of https://github.com/openshift/cluster-logging-operator/pull/65 for fluentd. Elasticsearch also has the correct capath: https://github.com/openshift/elasticsearch-operator/blob/master/pkg/k8shandler/service_monitor.go#L38 @Sergiusz is there more required if these paths are defined correctly?

$ oc get servicemonitors -o yaml -n openshift-logging
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
creationTimestamp: 2019-01-02T00:55:00Z
generation: 1
name: fluentd
namespace: openshift-logging
resourceVersion: "18545"
selfLink: /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/servicemonitors/fluentd
uid: 07e4e6b6-0e29-11e9-bb49-129920ef04ca
spec:
endpoints:
- port: "24231"
scheme: https
tlsConfig:
caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
selector:
matchLabels:
logging-infra: fluentd
- apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
creationTimestamp: 2019-01-02T00:55:53Z
generation: 1
labels:
cluster-name: elasticsearch
name: monitor-elasticsearch-cluster
namespace: openshift-logging
ownerReferences:
- apiVersion: logging.openshift.io/v1alpha1
controller: true
kind: Elasticsearch
name: elasticsearch
uid: 275c7a22-0e29-11e9-bac3-0e4166c7666c
resourceVersion: "18546"
selfLink: /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/servicemonitors/monitor-elasticsearch-cluster
uid: 2780a011-0e29-11e9-bb49-129920ef04ca
spec:
endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
path: /_prometheus/metrics
port: restapi
scheme: https
tlsConfig:
caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
jobLabel: monitor-elasticsearch
namespaceSelector: {}
selector:
matchLabels:
cluster-name: elasticsearch
kind: List
metadata:
resourceVersion: ""
selfLink: ""
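Both ServiceMonitors above point tlsConfig.caFile at the serving-certs CA bundle. A quick sanity check (a sketch, assuming the usual prometheus-k8s-0 pod and prometheus container names used by cluster monitoring) is to confirm that bundle is actually mounted in the Prometheus pods:

# The caFile referenced by both ServiceMonitors should be listed here
$ oc exec -n openshift-monitoring prometheus-k8s-0 -c prometheus -- \
    ls /etc/prometheus/configmaps/serving-certs-ca-bundle/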
@Jeff and @Sergiusz, I've updated the caFile to /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt when deploying logging.

(In reply to Qiaoling Tang from comment #6)
> @Jeff and @Sergiusz, I've updated the caFile to
> /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt when
> deploying logging.

Does modifying the path make any difference or do you still see the same error?

(In reply to Jeff Cantrill from comment #7)
> Does modifying the path make any difference or do you still see the same
> error?

After changing the path, I can't see any error message in the prometheus-k8s pod logs. Here is the log in prometheus-k8s before and after modifying the path of caFile in the servicemonitor:
level=error ts=2019-01-03T08:52:19.21603214Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:301: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list services in the namespace \"openshift-logging\": no RBAC policy matched"
level=error ts=2019-01-03T08:52:22.5496207Z caller=scrape.go:148 component="scrape manager" scrape_pool=openshift-logging/monitor-elasticsearch-cluster/0 msg="Error creating HTTP client" err="unable to use specified CA cert /etc/prometheus/configmaps/prometheus-serving-certs-ca-bundle/service-ca.crt: open /etc/prometheus/configmaps/prometheus-serving-certs-ca-bundle/service-ca.crt: no such file or directory"
level=info ts=2019-01-03T08:54:08.815723419Z caller=main.go:632 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2019-01-03T08:54:09.138503615Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-01-03T08:54:09.140177884Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-01-03T08:54:09.141700801Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
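For reference, a minimal sketch of how the caFile path could be corrected on already-created ServiceMonitors (the names are taken from the output earlier in this bug; the operators may apply the fix differently and can revert manual edits on reconcile):

# Point both ServiceMonitors at the CA bundle that is actually mounted in the Prometheus pods
$ oc patch servicemonitor fluentd -n openshift-logging --type=json \
    -p '[{"op":"replace","path":"/spec/endpoints/0/tlsConfig/caFile","value":"/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt"}]'
$ oc patch servicemonitor monitor-elasticsearch-cluster -n openshift-logging --type=json \
    -p '[{"op":"replace","path":"/spec/endpoints/0/tlsConfig/caFile","value":"/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt"}]'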
And after changing the path, I can find these configurations in /etc/prometheus/config_out/prometheus.env.yaml in the prometheus pod:
- job_name: openshift-logging/fluentd/0
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
scheme: https
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- openshift-logging
tls_config:
ca_file: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
insecure_skip_verify: false
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_logging_infra]
separator: ;
regex: fluentd
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: "24231"
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: "24231"
action: replace
- job_name: openshift-logging/monitor-elasticsearch-cluster/0
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /_prometheus/metrics
scheme: https
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- openshift-logging
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
insecure_skip_verify: false
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_cluster_name]
separator: ;
regex: elasticsearch
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: restapi
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_monitor_elasticsearch]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: restapi
action: replace
Updated roles and bindings in: https://github.com/openshift/cluster-logging-operator/pull/81

Still can't find fluentd and elasticsearch metrics, no error message in pod logs.

Created attachment 1520465 [details]
Prometheus web console
There are some "Discovered Labels" in prometheus web console, but none of them is in the "Target Labels".
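The keep rules in the scrape config above only retain endpoints whose Service carries the expected label (logging-infra=fluentd or cluster-name=elasticsearch) and whose endpoint port has the expected name ("24231" or restapi). A diagnostic sketch for comparing the actual Services against those rules, which is the usual reason discovered labels never become target labels:

# Labels on all services in the namespace, to compare with the ServiceMonitor selectors
$ oc get svc -n openshift-logging --show-labels
# Port names on the services the keep rules expect to match
$ oc get svc -n openshift-logging -l logging-infra=fluentd \
    -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.ports[*].name}{"\n"}{end}'
$ oc get svc -n openshift-logging -l cluster-name=elasticsearch \
    -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.ports[*].name}{"\n"}{end}'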
Created attachment 1520474 [details]
Full screenshot of prometheus "Service Discovery" page
@Sergiusz: services for elasticsearch and servicemonitors are created by the elasticsearch operator. I have a PR where the svc labels get matched by the label selector and prometheus finally scrapes metrics.
@Qiaoling: pls wait till https://github.com/openshift/elasticsearch-operator/pull/76 merges. ES metrics should be in prom then.

Verified in quay.io/openshift/origin-elasticsearch-operator@sha256:591e253eef19b18e404d165df8db99908a8fe52d09aa5dcba7d0c835d3487f54
Private Description Qiaoling Tang 2019-02-18 02:57:59 UTC
Created attachment 1535793 [details]
Elasticsearch-prometheus-rule
Description of problem:
Deploy logging, then check the elasticsearch prometheusrules in the logging namespace and in the prometheus server. The elasticsearch prometheusrules can be found in the logging namespace:
$ oc get prometheusrule -n openshift-logging
NAME AGE
elasticsearch-prometheus-rules 4m15s
Log into the openshift web console, go to "Monitoring" --> "Metrics" to open the prometheus-k8s web console, then go to "Status" --> "Rules": there are no elasticsearch prometheusrules on the page.
Version-Release number of selected component (if applicable):
4.0.0-0.nightly-2019-02-17-024922
How reproducible:
Always
Steps to Reproduce:
1.Deploy logging
2.check prometheusrules in logging namespace
3.check prometheusrules in prometheus server
Actual results:
The configuration of elasticsearch prometheusrule isn't in prometheus server
Expected results:
Elasticsearch prometheusrule can be found in prometheus server.
Additional info:
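A diagnostic sketch for narrowing this down, assuming the default prometheus-operator layout in which each selected PrometheusRule is rendered into the prometheus-k8s pods as a <namespace>-<name>.yaml rule file:

# The rule object exists in the logging namespace
$ oc get prometheusrule elasticsearch-prometheus-rules -n openshift-logging -o yaml | head -n 20
# Check whether it was picked up and rendered into the Prometheus pods
$ oc exec -n openshift-monitoring prometheus-k8s-0 -c prometheus -- \
    ls /etc/prometheus/rules/prometheus-k8s-rulefiles-0/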
Qiaoling Tang 2019-02-18 02:58:31 UTC
Target Release: --- → 4.0.0
Jeff Cantrill 2019-02-26 15:40:22 UTC
Status: NEW → ASSIGNED
Assignee: jcantril → jkarasek
Doc Type: If docs needed, set a value → No Doc Update
Red Hat Bugzilla 2019-02-26 15:40:22 UTC
Flags: requires_doc_text-
Doc Type: No Doc Update → No Doc Update
RHEL Product and Program Management 2019-02-26 15:40:28 UTC
Flags: pm_ack+
Rule Engine Rule: OSE-pm-ack
Flags: devel_ack+
Rule Engine Rule: OSE-devel-ack
Flags: qa_ack+
Rule Engine Rule: OSE-qa-ack
Josef Karasek 2019-02-28 11:11:45 UTC
Status: ASSIGNED → POST
External Bug ID: Github openshift/cluster-monitoring-operator/pull/262 Github openshift/elasticsearch-operator/pull/7...
Josef Karasek 2019-02-28 16:22:04 UTC
Status: POST → ON_QA
Anping Li 2019-03-01 02:58:03 UTC
QA Contact: anli → qitang
Private Comment 1 Qiaoling Tang 2019-03-01 03:11:27 UTC
Tested in 4.0.0-0.nightly-2019-02-28-054829, elasticsearch-prometheus-rules can be found in prometheus server.
Move bug to VERIFIED.
Qiaoling Tang 2019-03-01 03:11:43 UTC
Status: ON_QA → VERIFIED
Private Comment 2 chris alfonso 2019-03-12 14:02:41 UTC
RED HAT CONFIDENTIAL
Moved Target Release from 4.0.0 to 4.1.0.
Target Release: 4.0.0 → 4.1.0
chris alfonso 2019-03-12 14:27:03 UTC
Version: 4.0.0 → 4.1
Private Comment 3 Qiaoling Tang 2019-03-26 08:16:05 UTC
Reopening this bug because this issue can be reproduced in 4.0.0-0.nightly-2019-03-25-180911. CLO and EO images are: quay.io/openshift/origin-cluster-logging-operator@sha256:2193bbb23eba530cd76574e0c25f9b3a7f966a7f10f7eb0739f465644614df48 and quay.io/openshift/origin-elasticsearch-operator@sha256:8e7a748802fc284162f5dadf7cdfd9ae2adfb72b6f9682d3b91ee87024aa0a76.
Logs in prometheus-k8s pod:
level=error ts=2019-03-26T05:49:04.590Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:301: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-logging\""
level=error ts=2019-03-26T05:49:04.591Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:300: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-logging\""
level=error ts=2019-03-26T05:49:04.597Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:302: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-logging\""
$ oc get sa -n openshift-logging
NAME SECRETS AGE
builder 2 6h53m
cluster-logging-operator 2 6h52m
curator 2 119m
default 2 6h53m
deployer 2 6h53m
elasticsearch 2 119m
eventrouter 2 40m
kibana 2 119m
logcollector 2 119m
$ oc get servicemonitor
NAME AGE
monitor-elasticsearch-cluster 120m
Status: VERIFIED → ASSIGNED
Verified: FailedQA
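The "forbidden" errors above can be cross-checked from the CLI by impersonating the prometheus-k8s service account (a diagnostic sketch; the account and namespace names are taken from the log messages above):

$ oc auth can-i list services -n openshift-logging \
    --as=system:serviceaccount:openshift-monitoring:prometheus-k8s
$ oc auth can-i list endpoints -n openshift-logging \
    --as=system:serviceaccount:openshift-monitoring:prometheus-k8s
$ oc auth can-i list pods -n openshift-logging \
    --as=system:serviceaccount:openshift-monitoring:prometheus-k8s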
Private Comment 4 Josef Karasek 2019-03-26 08:21:59 UTC
Can you provide output of:
oc get clusterrole -o yaml -n openshift-logging elasticsearch-metrics
oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-metrics
oc get clusterrole -o yaml -n openshift-logging elasticsearch-proxy
oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-proxy
Private Comment 5 Qiaoling Tang 2019-03-26 08:22 UTC
Created attachment 1547943 [details]
Service Discovery page
The configurations are in the prometheus server, but when checking the targets in the prometheus console, all the target labels are dropped; see the attachment.
- job_name: openshift-logging/monitor-elasticsearch-cluster/0
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /_prometheus/metrics
scheme: https
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- openshift-logging
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
server_name: elasticsearch-metrics.openshift-logging.svc
insecure_skip_verify: false
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_cluster_name]
separator: ;
regex: elasticsearch
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: elasticsearch-metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_monitor_elasticsearch]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: elasticsearch-metrics
action: replace
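The two keep rules in this config only retain endpoints whose Service is labeled cluster-name=elasticsearch and whose port is named elasticsearch-metrics, so comparing the metrics Service against them shows why every target gets dropped (a sketch; the Service name elasticsearch-metrics is inferred from the server_name above and may differ):

$ oc get svc -n openshift-logging --show-labels
$ oc get svc elasticsearch-metrics -n openshift-logging \
    -o jsonpath='labels: {.metadata.labels}{"\n"}ports: {range .spec.ports[*]}{.name} {end}{"\n"}'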
Private Comment 6 Qiaoling Tang 2019-03-26 08:24:00 UTC
$ oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-metrics
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
creationTimestamp: 2019-03-26T06:14:59Z
name: elasticsearch-metrics
ownerReferences:
- apiVersion: logging.openshift.io/v1alpha1
controller: true
kind: Elasticsearch
name: elasticsearch
uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
resourceVersion: "219560"
selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/elasticsearch-metrics
uid: 7beae764-4f8e-11e9-a66f-065965d80050
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: elasticsearch-metrics
subjects:
- kind: ServiceAccount
name: prometheus-k8s
namespace: openshift-monitoring
$ oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-proxy
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
creationTimestamp: 2019-03-26T06:14:59Z
name: elasticsearch-proxy
ownerReferences:
- apiVersion: logging.openshift.io/v1alpha1
controller: true
kind: Elasticsearch
name: elasticsearch
uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
resourceVersion: "219565"
selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/elasticsearch-proxy
uid: 7bed4d2e-4f8e-11e9-a66f-065965d80050
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: elasticsearch-proxy
subjects:
- kind: ServiceAccount
name: elasticsearch
namespace: openshift-logging
Private Comment 7 Qiaoling Tang 2019-03-26 08:26:07 UTC
Sorry, missed some info in my last comment:
[qitang@wlc-trust-182 aws]$ oc get clusterrole -o yaml -n openshift-logging elasticsearch-metrics
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
creationTimestamp: 2019-03-26T06:14:59Z
name: elasticsearch-metrics
ownerReferences:
- apiVersion: logging.openshift.io/v1alpha1
controller: true
kind: Elasticsearch
name: elasticsearch
uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
resourceVersion: "219559"
selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/elasticsearch-metrics
uid: 7be9f5bd-4f8e-11e9-a66f-065965d80050
rules:
- apiGroups:
- ""
resources:
- pods
- services
- endpoints
verbs:
- list
- watch
- nonResourceURLs:
- /metrics
verbs:
- get
[qitang@wlc-trust-182 aws]$ oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-metrics
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
creationTimestamp: 2019-03-26T06:14:59Z
name: elasticsearch-metrics
ownerReferences:
- apiVersion: logging.openshift.io/v1alpha1
controller: true
kind: Elasticsearch
name: elasticsearch
uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
resourceVersion: "219560"
selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/elasticsearch-metrics
uid: 7beae764-4f8e-11e9-a66f-065965d80050
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: elasticsearch-metrics
subjects:
- kind: ServiceAccount
name: prometheus-k8s
namespace: openshift-monitoring
$ oc get clusterrole -o yaml -n openshift-logging elasticsearch-proxy
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
creationTimestamp: 2019-03-26T06:14:59Z
name: elasticsearch-proxy
ownerReferences:
- apiVersion: logging.openshift.io/v1alpha1
controller: true
kind: Elasticsearch
name: elasticsearch
uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
resourceVersion: "219562"
selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/elasticsearch-proxy
uid: 7bebd503-4f8e-11e9-a66f-065965d80050
rules:
- apiGroups:
- authentication.k8s.io
resources:
- tokenreviews
verbs:
- create
- apiGroups:
- authorization.k8s.io
resources:
- subjectaccessreviews
verbs:
- create
$ oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-proxy
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
creationTimestamp: 2019-03-26T06:14:59Z
name: elasticsearch-proxy
ownerReferences:
- apiVersion: logging.openshift.io/v1alpha1
controller: true
kind: Elasticsearch
name: elasticsearch
uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
resourceVersion: "219565"
selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/elasticsearch-proxy
uid: 7bed4d2e-4f8e-11e9-a66f-065965d80050
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: elasticsearch-proxy
subjects:
- kind: ServiceAccount
name: elasticsearch
namespace: openshift-logging
Private Comment 8 Josef Karasek 2019-03-26 09:14:42 UTC
Eric, the test ran with the latest EO code.
I found the following problems:
1) secret `elasticsearch` is in ns openshift-logging, but EO runs in openshift-operators
2) EO misconfigures svc/elasticsearch-metrics. Please revert to my original implementation of servicemonitor and metrics svc
3) ES cluster is in red state and EO log is full of:
time="2019-03-26T09:09:59Z" level=info msg="Waiting for cluster to be fully recovered before restarting elasticsearch-clientdatamaster-0-2: red / green"
$ oc --kubeconfig=./kubeconfig -n openshift-logging get pod
NAME READY STATUS RESTARTS AGE
cluster-logging-operator-799f97f47-66jfh 1/1 Running 0 7h23m
elasticsearch-clientdatamaster-0-1-6dbcbb64b4-6fdgh 2/2 Running 0 77m
elasticsearch-clientdatamaster-0-2-746d8788d5-2fbds 2/2 Running 0 77m
eventrouter-649fc6b98b-wckpz 1/1 Running 0 71m
fluentd-cbvjd 1/1 Running 0 68m
fluentd-r6w4d 1/1 Running 0 68m
fluentd-thh9q 1/1 Running 0 68m
fluentd-v2pbm 1/1 Running 0 68m
fluentd-xgbmg 1/1 Running 0 68m
fluentd-zrgmk 1/1 Running 0 68m
kibana-568746d6d9-4d48d
$ oc --kubeconfig=./kubeconfig -n openshift-operators get pod
elasticsearch-operator-5b4977987d-wg97j 1/1 Running 0 7h24m
$ oc --kubeconfig=./kubeconfig get secret --all-namespaces | grep elasti
openshift-logging elasticsearch Opaque 7 153m
openshift-logging elasticsearch-dockercfg-hw4qt kubernetes.io/dockercfg 1 153m
openshift-logging elasticsearch-metrics kubernetes.io/tls 2 153m
openshift-logging elasticsearch-token-pfr8g kubernetes.io/service-account-token 4 153m
openshift-logging elasticsearch-token-vc8pq kubernetes.io/service-account-token 4 153m
openshift-operators elasticsearch-operator-dockercfg-qjnbj kubernetes.io/dockercfg 1 7h26m
openshift-operators elasticsearch-operator-token-l2ztc kubernetes.io/service-account-token 4 7h26m
openshift-operators elasticsearch-operator-token-tgr6s kubernetes.io/service-account-token 4 7h26m
$ oc --kubeconfig=./kubeconfig -n openshift-operators logs elasticsearch-operator-5b4977987d-wg97j
time="2019-03-26T01:21:53Z" level=info msg="Go Version: go1.10.3"
time="2019-03-26T01:21:53Z" level=info msg="Go OS/Arch: linux/amd64"
time="2019-03-26T01:21:53Z" level=info msg="operator-sdk Version: 0.0.7"
time="2019-03-26T01:21:53Z" level=info msg="Watching logging.openshift.io/v1alpha1, Elasticsearch, , 5000000000"
E0326 02:49:34.516615 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=52895, ErrCode=NO_ERROR, debug=""
E0326 02:50:43.399058 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=815, ErrCode=NO_ERROR, debug=""
E0326 04:36:08.932707 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=75571, ErrCode=NO_ERROR, debug=""
time="2019-03-26T04:37:27Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Roles and RoleBindings for Elasticsearch cluster: failed to create ClusterRoleBindig elasticsearch-proxy: Post https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings: unexpected EOF"
time="2019-03-26T04:49:35Z" level=error msg="Error reading secret elasticsearch: Get https://172.30.0.1:443/api/v1/namespaces/openshift-logging/secrets/elasticsearch: unexpected EOF"
time="2019-03-26T04:49:35Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T04:49:35Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T04:49:35Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T04:52:17Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Services for Elasticsearch cluster: Failure creating service Failed to get elasticsearch-cluster service: Get https://172.30.0.1:443/api/v1/namespaces/openshift-logging/services/elasticsearch-cluster: unexpected EOF"
time="2019-03-26T05:40:03Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:03Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:03Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:40:03Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:40:03Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:40:04Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:04Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:40:04Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:04Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:40:05Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Error: could not update status for Elasticsearch elasticsearch after 0 retries: elasticsearches.logging.openshift.io \"elasticsearch\" not found"
time="2019-03-26T05:40:05Z" level=info msg="Flushing nodes for cluster elasticsearch in openshift-logging"
time="2019-03-26T05:40:05Z" level=error msg="no last known state found for deleted object (openshift-logging/elasticsearch)"
time="2019-03-26T05:46:32Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:46:32Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:46:32Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:46:33Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Error: could not update status for Elasticsearch elasticsearch after 0 retries: elasticsearches.logging.openshift.io \"elasticsearch\" not found"
time="2019-03-26T05:46:33Z" level=info msg="Flushing nodes for cluster elasticsearch in openshift-logging"
time="2019-03-26T05:46:33Z" level=error msg="no last known state found for deleted object (openshift-logging/elasticsearch)"
E0326 06:50:48.820723 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=905, ErrCode=NO_ERROR, debug=""
time="2019-03-26T08:33:21Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Elasticsearch deployment spec: request declared a Content-Length of 1708 but only wrote 0 bytes"
E0326 08:33:21.415632 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=73429, ErrCode=NO_ERROR, debug=""
time="2019-03-26T08:35:57Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Error: could not update status for Elasticsearch elasticsearch after 0 retries: Get https://172.30.0.1:443/apis/logging.openshift.io/v1alpha1/namespaces/openshift-logging/elasticsearches/elasticsearch: unexpected EOF"
- apiVersion: logging.openshift.io/v1alpha1
kind: Elasticsearch
metadata:
creationTimestamp: 2019-03-26T06:14:59Z
generation: 1
name: elasticsearch
namespace: openshift-logging
ownerReferences:
- apiVersion: logging.openshift.io/v1alpha1
controller: true
kind: ClusterLogging
name: instance
uid: 7ba7a5ba-4f8e-11e9-a058-0a0c0e8a4a2e
resourceVersion: "328550"
selfLink: /apis/logging.openshift.io/v1alpha1/namespaces/openshift-logging/elasticsearches/elasticsearch
uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
spec:
managementState: Managed
nodeSpec:
image: quay.io/openshift/origin-logging-elasticsearch5:latest
resources:
limits:
cpu: "1"
memory: 4Gi
requests:
cpu: 200m
memory: 1Gi
nodes:
- nodeCount: 2
resources:
limits:
cpu: "1"
memory: 4Gi
requests:
cpu: 200m
memory: 1Gi
roles:
- client
- data
- master
storage:
size: 10Gi
storageClassName: gp2
redundancyPolicy: SingleRedundancy
status:
clusterHealth: red
conditions: []
nodes:
- deploymentName: elasticsearch-clientdatamaster-0-1
upgradeStatus:
scheduledRedeploy: "True"
- deploymentName: elasticsearch-clientdatamaster-0-2
upgradeStatus:
scheduledRedeploy: "True"
pods:
client:
failed: []
notReady: []
ready:
- elasticsearch-clientdatamaster-0-1-6dbcbb64b4-6fdgh
- elasticsearch-clientdatamaster-0-2-746d8788d5-2fbds
data:
failed: []
notReady: []
ready:
- elasticsearch-clientdatamaster-0-1-6dbcbb64b4-6fdgh
- elasticsearch-clientdatamaster-0-2-746d8788d5-2fbds
master:
failed: []
notReady: []
ready:
- elasticsearch-clientdatamaster-0-1-6dbcbb64b4-6fdgh
- elasticsearch-clientdatamaster-0-2-746d8788d5-2fbds
shardAllocationEnabled: none
kind: List
metadata:
resourceVersion: ""
selfLink: ""
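The status above shows clusterHealth: red with shardAllocationEnabled: none and both deployments scheduled for redeploy; recovery can be followed from the same CR status fields (a sketch using only fields present in the output above):

# Re-run (or add --watch) to follow the health and shard-allocation state
$ oc get elasticsearch elasticsearch -n openshift-logging \
    -o jsonpath='{.status.clusterHealth} / {.status.shardAllocationEnabled}{"\n"}'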
Verified in latest image: quay.io/openshift/origin-elasticsearch-operator@sha256:f3e56412389727015e80b01420304f4736c58be20410fe67b9e4e676ba7cfd4a

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758