Description of problem:

Deploy clusterlogging and set a node selector for the EFK pods:

$ oc get clusterlogging -oyaml
apiVersion: v1
items:
- apiVersion: logging.openshift.io/v1
  kind: ClusterLogging
  metadata:
    creationTimestamp: "2020-05-07T02:20:20Z"
    generation: 1
    name: instance
    namespace: openshift-logging
    resourceVersion: "77871"
    selfLink: /apis/logging.openshift.io/v1/namespaces/openshift-logging/clusterloggings/instance
    uid: 2b90879f-6bbf-46f2-9d0c-d3135405af54
  spec:
    collection:
      logs:
        fluentd:
          nodeSelector:
            logging: test
        type: fluentd
    logStore:
      elasticsearch:
        nodeCount: 3
        nodeSelector:
          logging: test
        redundancyPolicy: SingleRedundancy
        resources:
          requests:
            memory: 2Gi
        storage:
          size: 20Gi
          storageClassName: standard
      retentionPolicy:
        application:
          maxAge: 1d
        audit:
          maxAge: 1w
        infra:
          maxAge: 7d
      type: elasticsearch
    managementState: Managed
    visualization:
      kibana:
        nodeSelector:
          logging: test
        replicas: 1
      type: kibana

No nodes in the cluster have the label `logging=test`, so all the ES pods are Pending because of the node selector mismatch.

$ oc get elasticsearch -oyaml
apiVersion: v1
items:
- apiVersion: logging.openshift.io/v1
  kind: Elasticsearch
  metadata:
    annotations:
      elasticsearch.openshift.io/loglevel: trace
    creationTimestamp: "2020-05-07T02:20:27Z"
    generation: 3
......
    managementState: Managed
    nodeSpec:
      nodeSelector:
        logging: test
      resources:
        requests:
          memory: 2Gi
    nodes:
    - genUUID: wpykay58
      nodeCount: 3
      resources: {}
      roles:
      - client
      - data
      - master
      storage:
        size: 20Gi
        storageClassName: standard
    redundancyPolicy: SingleRedundancy

$ oc get pod
NAME                                            READY   STATUS    RESTARTS   AGE
cluster-logging-operator-75774d56b6-47x6c       1/1     Running   0          3m19s
elasticsearch-cdm-wpykay58-1-dfc8977c5-mhzwh    0/2     Pending   0          3m1s
elasticsearch-cdm-wpykay58-2-5f4c9fdb5d-n8hsk   0/2     Pending   0          2m
elasticsearch-cdm-wpykay58-3-56985bc445-m4dxg   0/2     Pending   0          59s
kibana-797d5b7f99-mwmtg                         2/2     Running   0          3m1s

$ oc get deploy
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
cluster-logging-operator       1/1     1            1           3m24s
elasticsearch-cdm-wpykay58-1   0/1     1            0           3m6s
elasticsearch-cdm-wpykay58-2   0/1     1            0           2m5s
elasticsearch-cdm-wpykay58-3   0/1     1            0           64s
kibana                         1/1     1            1           3m6s

$ oc get deploy -l cluster-name=elasticsearch -oyaml |grep -A 5 nodeSelector
        f:nodeSelector:
          .: {}
          f:kubernetes.io/os: {}
          f:logging: {}
        f:restartPolicy: {}
        f:schedulerName: {}
--
      nodeSelector:
        kubernetes.io/os: linux
        logging: test
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
--
        f:nodeSelector:
          .: {}
          f:kubernetes.io/os: {}
          f:logging: {}
        f:restartPolicy: {}
        f:schedulerName: {}
--
      nodeSelector:
        kubernetes.io/os: linux
        logging: test
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
--
        f:nodeSelector:
          .: {}
          f:kubernetes.io/os: {}
          f:logging: {}
        f:restartPolicy: {}
        f:schedulerName: {}
--
      nodeSelector:
        kubernetes.io/os: linux
        logging: test
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}

However, a few minutes later, the EO removes all the node selectors and the ES pods are redeployed without any node selector.
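One way to see at which level the selector is being dropped (a quick sketch using only standard `oc` jsonpath queries against the resources shown above; the field paths are those visible in the dumps) is to compare the selector in the ClusterLogging CR, the generated Elasticsearch CR, and the ES deployments:

$ oc -n openshift-logging get clusterlogging instance -o jsonpath='{.spec.logStore.elasticsearch.nodeSelector}'
$ oc -n openshift-logging get elasticsearch elasticsearch -o jsonpath='{.spec.nodeSpec.nodeSelector}'
$ oc -n openshift-logging get deploy -l cluster-name=elasticsearch -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.nodeSelector}{"\n"}{end}'

With the behaviour described here, the two CRs still report `logging: test` while the regenerated deployments no longer carry any selector.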
$ oc get pod
NAME                                            READY   STATUS    RESTARTS   AGE
cluster-logging-operator-75774d56b6-47x6c       1/1     Running   0          15m
elasticsearch-cdm-wpykay58-1-779dd794ff-d5qmg   2/2     Running   0          8m35s
elasticsearch-cdm-wpykay58-2-5f4f5884fd-xchxg   2/2     Running   0          8m34s
elasticsearch-cdm-wpykay58-3-5f586cd8d9-tpxvf   2/2     Running   0          8m33s
elasticsearch-delete-app-1588818600-vg7cq       0/1     Pending   0          5m35s
elasticsearch-delete-audit-1588818600-s4bqj     0/1     Pending   0          5m35s
elasticsearch-delete-infra-1588818600-7782l     0/1     Pending   0          5m35s
elasticsearch-rollover-app-1588818600-5dnxh     0/1     Pending   0          5m35s
elasticsearch-rollover-audit-1588818600-dkws4   0/1     Pending   0          5m35s
elasticsearch-rollover-infra-1588818600-5dx99   0/1     Pending   0          5m35s
kibana-797d5b7f99-mwmtg                         2/2     Running   0          15m

The EO removes the node selectors from the ES deployments, including the default nodeSelector `kubernetes.io/os: linux`, but the node selectors in the Elasticsearch instance are not removed.

Logs in the EO:

$ oc logs -n openshift-operators-redhat elasticsearch-operator-f997486f5-z6wkp
{"level":"info","ts":1588818013.5684557,"logger":"cmd","msg":"Go Version: go1.13.8"}
{"level":"info","ts":1588818013.568499,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1588818013.568506,"logger":"cmd","msg":"Version of operator-sdk: v0.8.2"}
{"level":"info","ts":1588818013.5690854,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1588818013.8786685,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1588818013.8938189,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1588818014.020002,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1588818014.0205996,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"kibana-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1588818014.020811,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"elasticsearch-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1588818014.021117,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"proxyconfig-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1588818014.0212927,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"kibanasecret-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1588818014.0215733,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"trustedcabundle-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1588818014.1484535,"logger":"metrics","msg":"Metrics Service object created","Service.Name":"elasticsearch-operator","Service.Namespace":"openshift-operators-redhat"}
{"level":"info","ts":1588818014.1484966,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1588818015.2488885,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"kibana-controller"}
{"level":"info","ts":1588818015.2489219,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"proxyconfig-controller"}
{"level":"info","ts":1588818015.2488775,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1588818015.248872,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"trustedcabundle-controller"}
{"level":"info","ts":1588818015.2489371,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"kibanasecret-controller"}
{"level":"info","ts":1588818015.3491592,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"proxyconfig-controller","worker count":1} {"level":"info","ts":1588818015.349234,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"kibana-controller","worker count":1} {"level":"info","ts":1588818015.3492274,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"kibanasecret-controller","worker count":1} {"level":"info","ts":1588818015.3492675,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"trustedcabundle-controller","worker count":1} {"level":"info","ts":1588818015.3491406,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1} time="2020-05-07T02:20:27Z" level=error msg="Operator unable to read local file to get contents: open /tmp/ocp-eo/ca.crt: no such file or directory" time="2020-05-07T02:20:27Z" level=error msg="Operator unable to read local file to get contents: open /tmp/ocp-eo/ca.crt: no such file or directory" {"level":"error","ts":1588818028.1739502,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"kibana-controller","request":"openshift-logging/instance","error":"Did not receive hashvalue for trusted CA value","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"} time="2020-05-07T02:20:28Z" level=info msg="Updating status of Kibana" time="2020-05-07T02:20:29Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:20:29Z" level=info msg="Updating status of Kibana" time="2020-05-07T02:20:29Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:20:29Z" level=info msg="Updating status of Kibana" time="2020-05-07T02:20:29Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:20:59Z" level=info msg="Updating status of Kibana" time="2020-05-07T02:20:59Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:20:59Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:21:29Z" level=warning msg="unable to get cluster node count. 
E: Get https://elasticsearch.openshift-logging.svc:9200/_cluster/health: dial tcp 172.30.136.171:9200: i/o timeout\r\n" time="2020-05-07T02:21:29Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:21:59Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:22:29Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:22:30Z" level=warning msg="unable to get cluster node count. E: Get https://elasticsearch.openshift-logging.svc:9200/_cluster/health: dial tcp 172.30.136.171:9200: i/o timeout\r\n" time="2020-05-07T02:22:59Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:23:30Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:23:31Z" level=warning msg="unable to get cluster node count. E: Get https://elasticsearch.openshift-logging.svc:9200/_cluster/health: dial tcp 172.30.136.171:9200: i/o timeout\r\n" time="2020-05-07T02:24:00Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:24:30Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:25:00Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:25:30Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:26:00Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:26:30Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:26:31Z" level=warning msg="Unable to list existing templates in order to reconcile stale ones: Get https://elasticsearch.openshift-logging.svc:9200/_template: dial tcp 172.30.136.171:9200: i/o timeout" time="2020-05-07T02:27:00Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:27:01Z" level=error msg="Error creating index template for mapping app: Put https://elasticsearch.openshift-logging.svc:9200/_template/ocp-gen-app: dial tcp 172.30.136.171:9200: i/o timeout" {"level":"error","ts":1588818421.5804315,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"elasticsearch-controller","request":"openshift-logging/elasticsearch","error":"Failed to reconcile IndexMangement for Elasticsearch cluster: Put https://elasticsearch.openshift-logging.svc:9200/_template/ocp-gen-app: dial tcp 172.30.136.171:9200: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"} time="2020-05-07T02:27:03Z" level=info msg="Requested to update node 'elasticsearch-cdm-wpykay58-1', which is unschedulable. 
Skipping rolling restart scenario and performing redeploy now" time="2020-05-07T02:27:04Z" level=info msg="Requested to update node 'elasticsearch-cdm-wpykay58-2', which is unschedulable. Skipping rolling restart scenario and performing redeploy now" time="2020-05-07T02:27:05Z" level=info msg="Requested to update node 'elasticsearch-cdm-wpykay58-3', which is unschedulable. Skipping rolling restart scenario and performing redeploy now" time="2020-05-07T02:27:31Z" level=info msg="Kibana status successfully updated" time="2020-05-07T02:27:36Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-wpykay58-1: / green" time="2020-05-07T02:27:36Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-wpykay58-1: Cluster not in green state before beginning upgrade: " time="2020-05-07T02:27:43Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-wpykay58-2: / green" time="2020-05-07T02:27:43Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-wpykay58-2: Cluster not in green state before beginning upgrade: " time="2020-05-07T02:27:43Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-wpykay58-3: / green" time="2020-05-07T02:27:43Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-wpykay58-3: Cluster not in green state before beginning upgrade: " time="2020-05-07T02:27:44Z" level=warning msg="Unable to list existing templates in order to reconcile stale ones: There was an error retrieving list of templates. Error code: true, map[results:Open Distro not initialized]" time="2020-05-07T02:27:44Z" level=error msg="Error creating index template for mapping app: There was an error creating index template ocp-gen-app. Error code: true, map[results:Open Distro not initialized]" {"level":"error","ts":1588818464.3757575,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"elasticsearch-controller","request":"openshift-logging/elasticsearch","error":"Failed to reconcile IndexMangement for Elasticsearch cluster: There was an error creating index template ocp-gen-app. Error code: true, map[results:Open Distro not initialized]","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"} time="2020-05-07T02:27:47Z" level=warning msg="Unable to evaluate the number of replicas for index \"results\": Open Distro not initialized. 
cluster: elasticsearch, namespace: openshift-logging " time="2020-05-07T02:27:47Z" level=error msg="Unable to evaluate number of replicas for index" time="2020-05-07T02:27:47Z" level=warning msg="Unable to list existing templates in order to reconcile stale ones: There was an error retrieving list of templates. Error code: true, map[results:Open Distro not initialized]" time="2020-05-07T02:27:47Z" level=error msg="Error creating index template for mapping app: There was an error creating index template ocp-gen-app. Error code: true, map[results:Open Distro not initialized]" {"level":"error","ts":1588818467.5358996,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"elasticsearch-controller","request":"openshift-logging/elasticsearch","error":"Failed to reconcile IndexMangement for Elasticsearch cluster: There was an error creating index template ocp-gen-app. Error code: true, map[results:Open Distro not initialized]","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"} time="2020-05-07T02:27:50Z" level=warning msg="Unable to evaluate the number of replicas for index \"results\": Open Distro not initialized. cluster: elasticsearch, namespace: openshift-logging " time="2020-05-07T02:27:50Z" level=error msg="Unable to evaluate number of replicas for index" time="2020-05-07T02:27:50Z" level=warning msg="Unable to list existing templates in order to reconcile stale ones: There was an error retrieving list of templates. Error code: true, map[results:Open Distro not initialized]" time="2020-05-07T02:27:50Z" level=error msg="Error creating index template for mapping app: There was an error creating index template ocp-gen-app. Error code: true, map[results:Open Distro not initialized]" {"level":"error","ts":1588818470.975918,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"elasticsearch-controller","request":"openshift-logging/elasticsearch","error":"Failed to reconcile IndexMangement for Elasticsearch cluster: There was an error creating index template ocp-gen-app. 
Error code: true, map[results:Open Distro not initialized]","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}

Version-Release number of selected component (if applicable):
Logging images are from 4.5.0-0.ci-2020-05-06-225918
Manifests are copied from the master branch
Cluster version: 4.5.0-0.nightly-2020-05-06-003431

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging with:

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy:
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 1w
    elasticsearch:
      nodeCount: 3
      nodeSelector:
        logging: test
      redundancyPolicy: "SingleRedundancy"
      resources:
        requests:
          memory: "2Gi"
      storage:
        storageClassName: "standard"
        size: "20Gi"
  visualization:
    type: "kibana"
    kibana:
      nodeSelector:
        logging: test
      replicas: 1
  collection:
    logs:
      type: "fluentd"
      fluentd:
        nodeSelector:
          logging: test

Note: no nodes in the cluster have the label `logging=test`.
2. Check the ES status.
3. Wait for a few minutes.
4. Check the ES pods.

Actual results:
The EO removes the node selector configuration from the ES deployments.

Expected results:
The EO should not remove the node selectors when a node selector is set in the clusterlogging instance.

Additional info:
It seems the EO always wants to update the ES cluster when it can't connect to it, even when the ES cluster health is green.

{"level":"info","ts":1589875076.724568,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"kibana-controller","worker count":1}
{"level":"info","ts":1589875076.7255478,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"proxyconfig-controller","worker count":1}
{"level":"info","ts":1589875076.7256155,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"kibanasecret-controller","worker count":1}
time="2020-05-19T07:57:57Z" level=info msg="Updating status of Kibana"
time="2020-05-19T07:57:57Z" level=info msg="Kibana status successfully updated"
time="2020-05-19T07:57:57Z" level=info msg="Updating status of Kibana"
time="2020-05-19T07:57:57Z" level=info msg="Updating status of Kibana"
time="2020-05-19T07:57:57Z" level=info msg="Kibana status successfully updated"
time="2020-05-19T07:57:57Z" level=info msg="Kibana status successfully updated"
time="2020-05-19T07:57:57Z" level=info msg="Kibana status successfully updated"
time="2020-05-19T07:58:01Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-vyhvbuyr-1: yellow / green"
time="2020-05-19T07:58:01Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-vyhvbuyr-1: Cluster not in green state before beginning upgrade: yellow"
time="2020-05-19T07:58:01Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-vyhvbuyr-2: yellow / green"
time="2020-05-19T07:58:01Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-vyhvbuyr-2: Cluster not in green state before beginning upgrade: yellow"
time="2020-05-19T07:58:01Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-vyhvbuyr-3: yellow / green"
time="2020-05-19T07:58:01Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-vyhvbuyr-3: Cluster not in green state before beginning upgrade: yellow"
time="2020-05-19T07:58:05Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2020-05-19T07:58:27Z" level=info msg="Kibana status successfully updated"
time="2020-05-19T07:58:35Z" level=info msg="Timed out waiting for node elasticsearch-cdm-vyhvbuyr-1 to rollout"
time="2020-05-19T07:58:35Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-vyhvbuyr-1: timed out waiting for the condition"
time="2020-05-19T07:58:35Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2020-05-19T07:58:57Z" level=info msg="Kibana status successfully updated"
time="2020-05-19T07:59:05Z" level=info msg="Timed out waiting for node elasticsearch-cdm-vyhvbuyr-2 to rollout"
time="2020-05-19T07:59:05Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-vyhvbuyr-2: timed out waiting for the condition"
time="2020-05-19T07:59:06Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2020-05-19T07:59:27Z" level=info msg="Kibana status successfully updated"
time="2020-05-19T07:59:36Z" level=info msg="Timed out waiting for node elasticsearch-cdm-vyhvbuyr-3 to rollout"
time="2020-05-19T07:59:36Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-vyhvbuyr-3: timed out waiting for the condition"
time="2020-05-19T07:59:46Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2020-05-19T07:59:57Z" level=info msg="Kibana status successfully updated"
I have tried to reproduce this bug, but so far not able to. 1. create logging instance with CR [1] 2. wait 20 mins 3. check status of logging pods. vimalkum bug-1832656 $ oc -n openshift-logging get pods NAME READY STATUS RESTARTS AGE cluster-logging-operator-6f7f888684-292sq 1/1 Running 0 32m cluster-logging-operator-registry-6b94c44598-pmsh8 1/1 Running 0 33m elasticsearch-cdm-y4u7ur3g-1-8767dcb78-z5rsw 0/2 Pending 0 20m kibana-6f74f6c49b-6hdsx 0/2 Pending 0 20m The elasticsearch is not deployed if the node selector doesnt match [1] logging CR deployed apiVersion: logging.openshift.io/v1 kind: Elasticsearch metadata: creationTimestamp: "2020-05-26T10:49:03Z" generation: 4 name: elasticsearch namespace: openshift-logging ownerReferences: - apiVersion: logging.openshift.io/v1 controller: true kind: ClusterLogging name: instance uid: 6e13c27f-70ef-4d20-8030-20e5d764171a resourceVersion: "326088" selfLink: /apis/logging.openshift.io/v1/namespaces/openshift-logging/elasticsearches/elasticsearch uid: 764c5c34-96eb-4abf-911b-5c8c8e0eb5b4 spec: indexManagement: mappings: - aliases: - app - logs-app name: app policyRef: app-policy - aliases: - infra - logs-infra name: infra policyRef: infra-policy - aliases: - audit - logs-audit name: audit policyRef: audit-policy policies: - name: app-policy phases: delete: minAge: 1d hot: actions: rollover: maxAge: 1h pollInterval: 15m - name: infra-policy phases: delete: minAge: 7d hot: actions: rollover: maxAge: 8h pollInterval: 15m - name: audit-policy phases: delete: minAge: 1w hot: actions: rollover: maxAge: 1h pollInterval: 15m managementState: Managed nodeSpec: nodeSelector: logging: test resources: requests: memory: 2Gi nodes: - genUUID: y4u7ur3g nodeCount: 1 resources: {} roles: - client - data - master storage: size: 20Gi storageClassName: standard redundancyPolicy: ZeroRedundancy status: cluster: activePrimaryShards: 0 activeShards: 0 initializingShards: 0 numDataNodes: 0 numNodes: 0 pendingTasks: 0 relocatingShards: 0 status: cluster health unknown unassignedShards: 0 clusterHealth: "" conditions: [] nodes: - conditions: - lastTransitionTime: "2020-05-26T10:49:04Z" message: '0/1 nodes are available: 1 node(s) didn''t match node selector.' 
reason: Unschedulable status: "True" type: Unschedulable deploymentName: elasticsearch-cdm-y4u7ur3g-1 upgradeStatus: {} pods: client: failed: [] notReady: - elasticsearch-cdm-y4u7ur3g-1-8767dcb78-z5rsw ready: [] data: failed: [] notReady: - elasticsearch-cdm-y4u7ur3g-1-8767dcb78-z5rsw ready: [] master: failed: [] notReady: - elasticsearch-cdm-y4u7ur3g-1-8767dcb78-z5rsw ready: [] shardAllocationEnabled: shard allocation unknown $ oc -n openshift-logging describe Elasticsearches/elasticsearch Name: elasticsearch Namespace: openshift-logging Labels: <none> Annotations: <none> API Version: logging.openshift.io/v1 Kind: Elasticsearch Metadata: Creation Timestamp: 2020-05-26T10:49:03Z Generation: 4 Owner References: API Version: logging.openshift.io/v1 Controller: true Kind: ClusterLogging Name: instance UID: 6e13c27f-70ef-4d20-8030-20e5d764171a Resource Version: 326088 Self Link: /apis/logging.openshift.io/v1/namespaces/openshift-logging/elasticsearches/elasticsearch UID: 764c5c34-96eb-4abf-911b-5c8c8e0eb5b4 Spec: Index Management: Mappings: Aliases: app logs-app Name: app Policy Ref: app-policy Aliases: infra logs-infra Name: infra Policy Ref: infra-policy Aliases: audit logs-audit Name: audit Policy Ref: audit-policy Policies: Name: app-policy Phases: Delete: Min Age: 1d Hot: Actions: Rollover: Max Age: 1h Poll Interval: 15m Name: infra-policy Phases: Delete: Min Age: 7d Hot: Actions: Rollover: Max Age: 8h Poll Interval: 15m Name: audit-policy Phases: Delete: Min Age: 1w Hot: Actions: Rollover: Max Age: 1h Poll Interval: 15m Management State: Managed Node Spec: Node Selector: Logging: test Resources: Requests: Memory: 2Gi Nodes: Gen UUID: y4u7ur3g Node Count: 1 Resources: Roles: client data master Storage: Size: 20Gi Storage Class Name: standard Redundancy Policy: ZeroRedundancy Status: Cluster: Active Primary Shards: 0 Active Shards: 0 Initializing Shards: 0 Num Data Nodes: 0 Num Nodes: 0 Pending Tasks: 0 Relocating Shards: 0 Status: cluster health unknown Unassigned Shards: 0 Cluster Health: Conditions: Nodes: Conditions: Last Transition Time: 2020-05-26T10:49:04Z Message: 0/1 nodes are available: 1 node(s) didn't match node selector. Reason: Unschedulable Status: True Type: Unschedulable Deployment Name: elasticsearch-cdm-y4u7ur3g-1 Upgrade Status: Pods: Client: Failed: Not Ready: elasticsearch-cdm-y4u7ur3g-1-8767dcb78-z5rsw Ready: Data: Failed: Not Ready: elasticsearch-cdm-y4u7ur3g-1-8767dcb78-z5rsw Ready: Master: Failed: Not Ready: elasticsearch-cdm-y4u7ur3g-1-8767dcb78-z5rsw Ready: Shard Allocation Enabled: shard allocation unknown Events: <none>
[1] logging CR deployed apiVersion: "logging.openshift.io/v1" kind: "ClusterLogging" metadata: name: "instance" namespace: "openshift-logging" spec: managementState: "Managed" logStore: type: "elasticsearch" retentionPolicy: application: maxAge: 1d infra: maxAge: 7d audit: maxAge: 1w elasticsearch: nodeCount: 1 nodeSelector: logging: test redundancyPolicy: "ZeroRedundancy" resources: requests: memory: "2Gi" storage: storageClassName: "standard" size: "20Gi" visualization: type: "kibana" kibana: nodeSelector: logging: test replicas: 1 collection: logs: type: "fluentd" fluentd: nodeSelector: logging: test $ oc -n openshift-logging get Elasticsearches/elasticsearch -o yaml apiVersion: logging.openshift.io/v1 kind: Elasticsearch metadata: creationTimestamp: "2020-05-26T10:49:03Z" generation: 4 name: elasticsearch namespace: openshift-logging ownerReferences: - apiVersion: logging.openshift.io/v1 controller: true kind: ClusterLogging name: instance uid: 6e13c27f-70ef-4d20-8030-20e5d764171a resourceVersion: "326088" selfLink: /apis/logging.openshift.io/v1/namespaces/openshift-logging/elasticsearches/elasticsearch uid: 764c5c34-96eb-4abf-911b-5c8c8e0eb5b4 spec: indexManagement: mappings: - aliases: - app - logs-app name: app policyRef: app-policy - aliases: - infra - logs-infra name: infra policyRef: infra-policy - aliases: - audit - logs-audit name: audit policyRef: audit-policy policies: - name: app-policy phases: delete: minAge: 1d hot: actions: rollover: maxAge: 1h pollInterval: 15m - name: infra-policy phases: delete: minAge: 7d hot: actions: rollover: maxAge: 8h pollInterval: 15m - name: audit-policy phases: delete: minAge: 1w hot: actions: rollover: maxAge: 1h pollInterval: 15m managementState: Managed nodeSpec: nodeSelector: logging: test resources: requests: memory: 2Gi nodes: - genUUID: y4u7ur3g nodeCount: 1 resources: {} roles: - client - data - master storage: size: 20Gi storageClassName: standard redundancyPolicy: ZeroRedundancy status: cluster: activePrimaryShards: 0 activeShards: 0 initializingShards: 0 numDataNodes: 0 numNodes: 0 pendingTasks: 0 relocatingShards: 0 status: cluster health unknown unassignedShards: 0 clusterHealth: "" conditions: [] nodes: - conditions: - lastTransitionTime: "2020-05-26T10:49:04Z" message: '0/1 nodes are available: 1 node(s) didn''t match node selector.' reason: Unschedulable status: "True" type: Unschedulable deploymentName: elasticsearch-cdm-y4u7ur3g-1 upgradeStatus: {} pods: client: failed: [] notReady: - elasticsearch-cdm-y4u7ur3g-1-8767dcb78-z5rsw ready: [] data: failed: [] notReady: - elasticsearch-cdm-y4u7ur3g-1-8767dcb78-z5rsw ready: [] master: failed: [] notReady: - elasticsearch-cdm-y4u7ur3g-1-8767dcb78-z5rsw ready: [] shardAllocationEnabled: shard allocation unknown
As soon as the label `logging=test` is added to a node, the logging components are deployed.
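For reference, a minimal way to verify and apply the label (with `<node-name>` as a placeholder):

$ oc get nodes -l logging=test
$ oc label node <node-name> logging=test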
I'm not able to reproduce this issue either, so I'll close it. Please feel free to reopen it if you can reproduce it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409