Created attachment 1535793 [details]
Elasticsearch-prometheus-rule

Description of problem:
Deploy logging, then check the elasticsearch PrometheusRules in the logging namespace and in the Prometheus server. The elasticsearch PrometheusRules can be found in the logging namespace:

$ oc get prometheusrule -n openshift-logging
NAME                             AGE
elasticsearch-prometheus-rules   4m15s

Log into the OpenShift web console, go to "Monitoring", click "Metrics", then try to find the elasticsearch PrometheusRules in the prometheus-k8s web console under "Status" --> "Rules": no elasticsearch PrometheusRules are shown on the page.

Version-Release number of selected component (if applicable):
4.0.0-0.nightly-2019-02-17-024922

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging.
2. Check the PrometheusRules in the logging namespace.
3. Check the PrometheusRules in the Prometheus server.

Actual results:
The configuration of the elasticsearch PrometheusRule is not in the Prometheus server.

Expected results:
The elasticsearch PrometheusRule can be found in the Prometheus server.

Additional info:
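For reference, a minimal sketch of a PrometheusRule of the kind the Prometheus Operator is expected to pick up from the namespace. Only the resource name and namespace come from the output above; the group name, alert name, and expression below are illustrative placeholders, not the actual content of elasticsearch-prometheus-rules:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: elasticsearch-prometheus-rules   # name as listed above
  namespace: openshift-logging
spec:
  groups:
  - name: logging-elasticsearch.rules    # illustrative group name
    rules:
    - alert: ElasticsearchClusterNotHealthy   # illustrative alert
      expr: es_cluster_status > 0             # illustrative expression
      for: 5m
      labels:
        severity: warning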
Tested in 4.0.0-0.nightly-2019-02-28-054829: elasticsearch-prometheus-rules can be found in the Prometheus server. Moving the bug to VERIFIED.
Reopening this bug because this issue can be reproduced in 4.0.0-0.nightly-2019-03-25-180911. CLO and EO images are:
quay.io/openshift/origin-cluster-logging-operator@sha256:2193bbb23eba530cd76574e0c25f9b3a7f966a7f10f7eb0739f465644614df48
quay.io/openshift/origin-elasticsearch-operator@sha256:8e7a748802fc284162f5dadf7cdfd9ae2adfb72b6f9682d3b91ee87024aa0a76

Logs in the prometheus-k8s pod:

level=error ts=2019-03-26T05:49:04.590Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:301: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-logging\""
level=error ts=2019-03-26T05:49:04.591Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:300: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-logging\""
level=error ts=2019-03-26T05:49:04.597Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:302: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-logging\""

$ oc get sa -n openshift-logging
NAME                       SECRETS   AGE
builder                    2         6h53m
cluster-logging-operator   2         6h52m
curator                    2         119m
default                    2         6h53m
deployer                   2         6h53m
elasticsearch              2         119m
eventrouter                2         40m
kibana                     2         119m
logcollector               2         119m

$ oc get servicemonitor
NAME                            AGE
monitor-elasticsearch-cluster   120m
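The forbidden errors indicate that the prometheus-k8s service account lacks list/watch permission on services, endpoints, and pods in the openshift-logging namespace. A minimal sketch of RBAC objects that would grant that access follows; this is illustrative only (the names are made up), and later comments show the operator actually creates equivalent ClusterRole/ClusterRoleBinding objects instead:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s-discovery      # illustrative name
  namespace: openshift-logging
rules:
- apiGroups: [""]
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s-discovery      # illustrative name
  namespace: openshift-logging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s-discovery
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring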
Can you provide output of:

oc get clusterrole -o yaml -n openshift-logging elasticsearch-metrics
oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-metrics
oc get clusterrole -o yaml -n openshift-logging elasticsearch-proxy
oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-proxy
Created attachment 1547943 [details]
Service Discovery page

The configuration is in the Prometheus server, but when checking the targets in the Prometheus console, all the target labels are dropped; see the attachment.

- job_name: openshift-logging/monitor-elasticsearch-cluster/0
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /_prometheus/metrics
  scheme: https
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - openshift-logging
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
    server_name: elasticsearch-metrics.openshift-logging.svc
    insecure_skip_verify: false
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_cluster_name]
    separator: ;
    regex: elasticsearch
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: elasticsearch-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_monitor_elasticsearch]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: elasticsearch-metrics
    action: replace
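The two keep rules above only retain endpoints whose backing Service carries a label that sanitizes to cluster_name=elasticsearch and whose endpoint port is named elasticsearch-metrics; any discovered target that does not match both rules is dropped, which matches the empty target list in the attachment. A sketch of a Service that would satisfy those rules follows; it assumes the Kubernetes label key is cluster-name (Prometheus replaces the dash with an underscore in the __meta label), and the selector, port number, and targetPort are assumptions, not the operator's actual values:

apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-metrics
  namespace: openshift-logging
  labels:
    cluster-name: elasticsearch        # must match the first keep rule
spec:
  selector:
    cluster-name: elasticsearch        # assumed pod selector
  ports:
  - name: elasticsearch-metrics        # must match the second keep rule
    port: 60000                        # assumed port number
    targetPort: 60000                  # assumed target port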
$ oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-metrics
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-metrics
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219560"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/elasticsearch-metrics
  uid: 7beae764-4f8e-11e9-a66f-065965d80050
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elasticsearch-metrics
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring

$ oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-proxy
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-proxy
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219565"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/elasticsearch-proxy
  uid: 7bed4d2e-4f8e-11e9-a66f-065965d80050
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elasticsearch-proxy
subjects:
- kind: ServiceAccount
  name: elasticsearch
  namespace: openshift-logging
Sorry, missed some info in my last comment:

[qitang@wlc-trust-182 aws]$ oc get clusterrole -o yaml -n openshift-logging elasticsearch-metrics
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-metrics
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219559"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/elasticsearch-metrics
  uid: 7be9f5bd-4f8e-11e9-a66f-065965d80050
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - services
  - endpoints
  verbs:
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get

[qitang@wlc-trust-182 aws]$ oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-metrics
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-metrics
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219560"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/elasticsearch-metrics
  uid: 7beae764-4f8e-11e9-a66f-065965d80050
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elasticsearch-metrics
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring

$ oc get clusterrole -o yaml -n openshift-logging elasticsearch-proxy
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-proxy
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219562"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/elasticsearch-proxy
  uid: 7bebd503-4f8e-11e9-a66f-065965d80050
rules:
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create

$ oc get clusterrolebinding -o yaml -n openshift-logging elasticsearch-proxy
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2019-03-26T06:14:59Z
  name: elasticsearch-proxy
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  resourceVersion: "219565"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/elasticsearch-proxy
  uid: 7bed4d2e-4f8e-11e9-a66f-065965d80050
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elasticsearch-proxy
subjects:
- kind: ServiceAccount
  name: elasticsearch
  namespace: openshift-logging
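As a quick way to confirm whether these ClusterRole/ClusterRoleBinding objects actually grant the access that the prometheus-k8s errors complain about, the permissions can be checked with impersonation (a suggested check, not part of the original report); each command prints yes or no:

oc auth can-i list services  -n openshift-logging --as=system:serviceaccount:openshift-monitoring:prometheus-k8s
oc auth can-i list endpoints -n openshift-logging --as=system:serviceaccount:openshift-monitoring:prometheus-k8s
oc auth can-i list pods      -n openshift-logging --as=system:serviceaccount:openshift-monitoring:prometheus-k8s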
Eric, the test ran with the latest EO code. I found the following problems:

1) secret `elasticsearch` is in ns openshift-logging, but EO runs in openshift-operators
2) EO misconfigures svc/elasticsearch-metrics. Please revert to my original implementation of the servicemonitor and metrics svc (see the ServiceMonitor sketch after this comment)
3) ES cluster is in red state and the EO log is full of:
time="2019-03-26T09:09:59Z" level=info msg="Waiting for cluster to be fully recovered before restarting elasticsearch-clientdatamaster-0-2: red / green"

$ oc --kubeconfig=./kubeconfig -n openshift-logging get pod
NAME                                                  READY   STATUS    RESTARTS   AGE
cluster-logging-operator-799f97f47-66jfh              1/1     Running   0          7h23m
elasticsearch-clientdatamaster-0-1-6dbcbb64b4-6fdgh   2/2     Running   0          77m
elasticsearch-clientdatamaster-0-2-746d8788d5-2fbds   2/2     Running   0          77m
eventrouter-649fc6b98b-wckpz                          1/1     Running   0          71m
fluentd-cbvjd                                         1/1     Running   0          68m
fluentd-r6w4d                                         1/1     Running   0          68m
fluentd-thh9q                                         1/1     Running   0          68m
fluentd-v2pbm                                         1/1     Running   0          68m
fluentd-xgbmg                                         1/1     Running   0          68m
fluentd-zrgmk                                         1/1     Running   0          68m
kibana-568746d6d9-4d48d

$ oc --kubeconfig=./kubeconfig -n openshift-operators get pod
elasticsearch-operator-5b4977987d-wg97j   1/1   Running   0   7h24m

$ oc --kubeconfig=./kubeconfig get secret --all-namespaces | grep elasti
openshift-logging     elasticsearch                            Opaque                                7   153m
openshift-logging     elasticsearch-dockercfg-hw4qt            kubernetes.io/dockercfg               1   153m
openshift-logging     elasticsearch-metrics                    kubernetes.io/tls                     2   153m
openshift-logging     elasticsearch-token-pfr8g                kubernetes.io/service-account-token   4   153m
openshift-logging     elasticsearch-token-vc8pq                kubernetes.io/service-account-token   4   153m
openshift-operators   elasticsearch-operator-dockercfg-qjnbj   kubernetes.io/dockercfg               1   7h26m
openshift-operators   elasticsearch-operator-token-l2ztc       kubernetes.io/service-account-token   4   7h26m
openshift-operators   elasticsearch-operator-token-tgr6s       kubernetes.io/service-account-token   4   7h26m

$ oc --kubeconfig=./kubeconfig -n openshift-operators logs elasticsearch-operator-5b4977987d-wg97j
time="2019-03-26T01:21:53Z" level=info msg="Go Version: go1.10.3"
time="2019-03-26T01:21:53Z" level=info msg="Go OS/Arch: linux/amd64"
time="2019-03-26T01:21:53Z" level=info msg="operator-sdk Version: 0.0.7"
time="2019-03-26T01:21:53Z" level=info msg="Watching logging.openshift.io/v1alpha1, Elasticsearch, , 5000000000"
E0326 02:49:34.516615       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=52895, ErrCode=NO_ERROR, debug=""
E0326 02:50:43.399058       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=815, ErrCode=NO_ERROR, debug=""
E0326 04:36:08.932707       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=75571, ErrCode=NO_ERROR, debug=""
time="2019-03-26T04:37:27Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Roles and RoleBindings for Elasticsearch cluster: failed to create ClusterRoleBindig elasticsearch-proxy: Post https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings: unexpected EOF"
time="2019-03-26T04:49:35Z" level=error msg="Error reading secret elasticsearch: Get https://172.30.0.1:443/api/v1/namespaces/openshift-logging/secrets/elasticsearch: unexpected EOF"
time="2019-03-26T04:49:35Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T04:49:35Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T04:49:35Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T04:52:17Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Services for Elasticsearch cluster: Failure creating service Failed to get elasticsearch-cluster service: Get https://172.30.0.1:443/api/v1/namespaces/openshift-logging/services/elasticsearch-cluster: unexpected EOF"
time="2019-03-26T05:40:03Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:03Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:03Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:40:03Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:40:03Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:40:04Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:04Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:40:04Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:04Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:40:04Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:40:05Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Error: could not update status for Elasticsearch elasticsearch after 0 retries: elasticsearches.logging.openshift.io \"elasticsearch\" not found"
time="2019-03-26T05:40:05Z" level=info msg="Flushing nodes for cluster elasticsearch in openshift-logging"
time="2019-03-26T05:40:05Z" level=error msg="no last known state found for deleted object (openshift-logging/elasticsearch)"
time="2019-03-26T05:46:32Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:46:32Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:46:32Z" level=error msg="Unable to find secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error reading secret elasticsearch: secrets \"elasticsearch\" not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-ca not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-cert not found"
time="2019-03-26T05:46:32Z" level=error msg="Error secret key admin-key not found"
time="2019-03-26T05:46:33Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Error: could not update status for Elasticsearch elasticsearch after 0 retries: elasticsearches.logging.openshift.io \"elasticsearch\" not found"
time="2019-03-26T05:46:33Z" level=info msg="Flushing nodes for cluster elasticsearch in openshift-logging"
time="2019-03-26T05:46:33Z" level=error msg="no last known state found for deleted object (openshift-logging/elasticsearch)"
E0326 06:50:48.820723       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=905, ErrCode=NO_ERROR, debug=""
time="2019-03-26T08:33:21Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Elasticsearch deployment spec: request declared a Content-Length of 1708 but only wrote 0 bytes"
E0326 08:33:21.415632       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=73429, ErrCode=NO_ERROR, debug=""
time="2019-03-26T08:35:57Z" level=error msg="error syncing key (openshift-logging/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Error: could not update status for Elasticsearch elasticsearch after 0 retries: Get https://172.30.0.1:443/apis/logging.openshift.io/v1alpha1/namespaces/openshift-logging/elasticsearches/elasticsearch: unexpected EOF"

- apiVersion: logging.openshift.io/v1alpha1
  kind: Elasticsearch
  metadata:
    creationTimestamp: 2019-03-26T06:14:59Z
    generation: 1
    name: elasticsearch
    namespace: openshift-logging
    ownerReferences:
    - apiVersion: logging.openshift.io/v1alpha1
      controller: true
      kind: ClusterLogging
      name: instance
      uid: 7ba7a5ba-4f8e-11e9-a058-0a0c0e8a4a2e
    resourceVersion: "328550"
    selfLink: /apis/logging.openshift.io/v1alpha1/namespaces/openshift-logging/elasticsearches/elasticsearch
    uid: 7be6cb58-4f8e-11e9-a66f-065965d80050
  spec:
    managementState: Managed
    nodeSpec:
      image: quay.io/openshift/origin-logging-elasticsearch5:latest
      resources:
        limits:
          cpu: "1"
          memory: 4Gi
        requests:
          cpu: 200m
          memory: 1Gi
    nodes:
    - nodeCount: 2
      resources:
        limits:
          cpu: "1"
          memory: 4Gi
        requests:
          cpu: 200m
          memory: 1Gi
      roles:
      - client
      - data
      - master
      storage:
        size: 10Gi
        storageClassName: gp2
    redundancyPolicy: SingleRedundancy
  status:
    clusterHealth: red
    conditions: []
    nodes:
    - deploymentName: elasticsearch-clientdatamaster-0-1
      upgradeStatus:
        scheduledRedeploy: "True"
    - deploymentName: elasticsearch-clientdatamaster-0-2
      upgradeStatus:
        scheduledRedeploy: "True"
    pods:
      client:
        failed: []
        notReady: []
        ready:
        - elasticsearch-clientdatamaster-0-1-6dbcbb64b4-6fdgh
        - elasticsearch-clientdatamaster-0-2-746d8788d5-2fbds
      data:
        failed: []
        notReady: []
        ready:
        - elasticsearch-clientdatamaster-0-1-6dbcbb64b4-6fdgh
        - elasticsearch-clientdatamaster-0-2-746d8788d5-2fbds
      master:
        failed: []
        notReady: []
        ready:
        - elasticsearch-clientdatamaster-0-1-6dbcbb64b4-6fdgh
        - elasticsearch-clientdatamaster-0-2-746d8788d5-2fbds
    shardAllocationEnabled: none
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
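Regarding problem 2 above, a ServiceMonitor sketch reconstructed from the generated scrape config shown in the earlier comment (job openshift-logging/monitor-elasticsearch-cluster/0). The name, path, scheme, interval, and TLS server name are taken from that config; the jobLabel, selector label, and TLS CA handling are inferred and may not match the original implementation being asked for:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: monitor-elasticsearch-cluster
  namespace: openshift-logging
spec:
  jobLabel: monitor-elasticsearch         # inferred from the __meta_kubernetes_service_label_monitor_elasticsearch relabel rule
  selector:
    matchLabels:
      cluster-name: elasticsearch         # inferred from the cluster_name keep rule
  namespaceSelector:
    matchNames:
    - openshift-logging
  endpoints:
  - port: elasticsearch-metrics
    path: /_prometheus/metrics
    scheme: https
    interval: 30s
    tlsConfig:
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt   # assumed; matches the generated ca_file
      serverName: elasticsearch-metrics.openshift-logging.svc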
I'm sorry, I reopened the wrong bug; it should have been https://bugzilla.redhat.com/show_bug.cgi?id=1662273. Moving this back to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758