This bug was initially created as a copy of Bug #1880926

I am copying this bug because:

Description of problem:
1. deploy logging 4.5 on a 4.5 cluster
2. upgrade logging to 4.6
3. upgrade the cluster to 4.6; the ES deployments disappeared after the cluster upgraded successfully

The EO logs contain many errors like the following:

{"level":"error","ts":1600656185.4719505,"logger":"elasticsearch-operator","caller":"k8shandler/cluster.go:213","msg":"Failed to progress update of unschedulable node","node":"elasticsearch-cdm-2wg9lezz-1","error":"Deployment.apps \"elasticsearch-cdm-2wg9lezz-1\" not found"}
{"level":"error","ts":1600656185.4720213,"logger":"elasticsearch-operator","caller":"k8shandler/reconciler.go:65","msg":"unable to progress unschedulable nodes","cluster":"elasticsearch","namespace":"openshift-logging","error":"Deployment.apps \"elasticsearch-cdm-2wg9lezz-1\" not found"}

I also found that many secrets were recreated:

$ oc get secrets
NAME                                       TYPE                                  DATA   AGE
builder-dockercfg-wws6m                    kubernetes.io/dockercfg               1      4h10m
builder-token-qqg2t                        kubernetes.io/service-account-token   4      4h10m
builder-token-xcd6k                        kubernetes.io/service-account-token   4      4h10m
cluster-logging-operator-dockercfg-f29rl   kubernetes.io/dockercfg               1      4h10m
cluster-logging-operator-token-kx45d       kubernetes.io/service-account-token   4      4h10m
cluster-logging-operator-token-lhvg8       kubernetes.io/service-account-token   4      4h10m
default-dockercfg-w45ql                    kubernetes.io/dockercfg               1      4h10m
default-token-7xxvz                        kubernetes.io/service-account-token   4      4h10m
default-token-mk62p                        kubernetes.io/service-account-token   4      4h10m
deployer-dockercfg-bf96k                   kubernetes.io/dockercfg               1      4h10m
deployer-token-4d9p4                       kubernetes.io/service-account-token   4      4h10m
deployer-token-cshkb                       kubernetes.io/service-account-token   4      4h10m
elasticsearch                              Opaque                                7      169m
elasticsearch-dockercfg-rbxv5              kubernetes.io/dockercfg               1      169m
elasticsearch-metrics                      kubernetes.io/tls                     2      169m
elasticsearch-token-lbxgf                  kubernetes.io/service-account-token   4      169m
elasticsearch-token-p594b                  kubernetes.io/service-account-token   4      169m
fluentd                                    Opaque                                3      169m
fluentd-metrics                            kubernetes.io/tls                     2      169m
kibana                                     Opaque                                3      169m
kibana-dockercfg-bn6dk                     kubernetes.io/dockercfg               1      169m
kibana-proxy                               Opaque                                3      169m
kibana-token-qqksm                         kubernetes.io/service-account-token   4      169m
kibana-token-zs7w2                         kubernetes.io/service-account-token   4      169m
logcollector-dockercfg-gbwqn               kubernetes.io/dockercfg               1      169m
logcollector-token-f5qxl                   kubernetes.io/service-account-token   4      169m
logcollector-token-gbvf8                   kubernetes.io/service-account-token   4      169m
master-certs                               Opaque                                2      169m

$ oc get pod
NAME                                            READY   STATUS                  RESTARTS   AGE
cluster-logging-operator-779f857c67-zkq6k       1/1     Running                 0          167m
elasticsearch-delete-app-1600665300-bq9lq       0/1     Error                   0          14m
elasticsearch-delete-audit-1600665300-5pnrd     0/1     Error                   0          14m
elasticsearch-delete-infra-1600665300-kbm6l     0/1     Error                   0          14m
elasticsearch-rollover-app-1600665300-7tlhh     0/1     Error                   0          14m
elasticsearch-rollover-audit-1600665300-27b5q   0/1     Error                   0          14m
elasticsearch-rollover-infra-1600665300-psf7m   0/1     Error                   0          14m
fluentd-7jhf7                                   0/1     Init:CrashLoopBackOff   31         169m
fluentd-fzg9m                                   0/1     Init:CrashLoopBackOff   30         169m
fluentd-k7zqc                                   0/1     Init:CrashLoopBackOff   31         169m
fluentd-l5ms4                                   0/1     Init:CrashLoopBackOff   31         169m
fluentd-m4829                                   0/1     Init:CrashLoopBackOff   29         169m
fluentd-zphfg                                   0/1     Init:CrashLoopBackOff   30         169m
kibana-549dff7bcd-ml6r7                         2/2     Running                 0          167m

Elasticsearch/elasticsearch:

spec:
  indexManagement:
    mappings:
    - aliases:
      - app
      - logs.app
      name: app
      policyRef: app-policy
    - aliases:
      - infra
      - logs.infra
      name: infra
      policyRef: infra-policy
    - aliases:
      - audit
      - logs.audit
      name: audit
      policyRef: audit-policy
    policies:
    - name: app-policy
      phases:
        delete:
          minAge: 1d
        hot:
          actions:
            rollover:
              maxAge: 1h
      pollInterval: 15m
    - name: infra-policy
      phases:
        delete:
          minAge: 12h
        hot:
          actions:
            rollover:
              maxAge: 36m
      pollInterval: 15m
    - name: audit-policy
      phases:
        delete:
          minAge: 2w
        hot:
          actions:
            rollover:
              maxAge: 2h
      pollInterval: 15m
  managementState: Managed
  nodeSpec:
    proxyResources:
      limits:
        memory: 64Mi
      requests:
        cpu: 100m
        memory: 64Mi
    resources:
      requests:
        memory: 2Gi
  nodes:
  - genUUID: 2wg9lezz
    nodeCount: 3
    proxyResources: {}
    resources: {}
    roles:
    - client
    - data
    - master
    storage:
      size: 20Gi
      storageClassName: standard
  redundancyPolicy: SingleRedundancy
status:
  cluster:
    activePrimaryShards: 0
    activeShards: 0
    initializingShards: 0
    numDataNodes: 0
    numNodes: 0
    pendingTasks: 0
    relocatingShards: 0
    status: cluster health unknown
    unassignedShards: 0
  nodes:
  - conditions:
    - lastTransitionTime: "2020-09-21T02:38:19Z"
      message: '0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn''t tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.'
      reason: Unschedulable
      status: "True"
      type: Unschedulable
    deploymentName: elasticsearch-cdm-2wg9lezz-1
    upgradeStatus:
      upgradePhase: controllerUpdated
  - conditions:
    - lastTransitionTime: "2020-09-21T02:40:26Z"
      reason: Error
      status: "True"
      type: ElasticsearchContainerTerminated
    - lastTransitionTime: "2020-09-21T02:40:26Z"
      reason: Error
      status: "True"
      type: ProxyContainerTerminated
    deploymentName: elasticsearch-cdm-2wg9lezz-2
    upgradeStatus:
      upgradePhase: controllerUpdated
  - conditions:
    - lastTransitionTime: "2020-09-21T02:40:29Z"
      reason: ContainerCreating
      status: "True"
      type: ElasticsearchContainerWaiting
    - lastTransitionTime: "2020-09-21T02:40:29Z"
      reason: ContainerCreating
      status: "True"
      type: ProxyContainerWaiting
    deploymentName: elasticsearch-cdm-2wg9lezz-3
    upgradeStatus:
      upgradePhase: controllerUpdated
  pods:
    client:
      failed: []
      notReady: []
      ready: []
    data:
      failed: []
      notReady: []
      ready: []
    master:
      failed: []
      notReady: []
      ready: []
  shardAllocationEnabled: shard allocation unknown

$ oc get deploy
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
cluster-logging-operator   1/1     1            1           4h12m
kibana                     1/1     1            1           171m

Version-Release number of selected component (if applicable):

$ oc get csv
NAME                                           DISPLAY                  VERSION                 REPLACES                                       PHASE
clusterlogging.4.6.0-202009192030.p0           Cluster Logging          4.6.0-202009192030.p0   clusterlogging.4.5.0-202009161248.p0           Succeeded
elasticsearch-operator.4.6.0-202009192030.p0   Elasticsearch Operator   4.6.0-202009192030.p0   elasticsearch-operator.4.5.0-202009182238.p0   Succeeded

How reproducible:
I tried 3 times and only hit it once.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
must-gather: http://file.apac.redhat.com/~qitang/must-gather-0921.tar.gz
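A quick way to confirm the symptom reported above is to compare the node deployments the Elasticsearch CR expects (named elasticsearch-cdm-<genUUID>-<n>) with the deployments that actually exist, and then check the EO logs. This is only a rough sketch; the openshift-operators-redhat namespace and the elasticsearch-operator deployment name are assumptions based on a default installation:

# List the node genUUID/nodeCount pairs the CR expects
$ oc -n openshift-logging get elasticsearch elasticsearch -o jsonpath='{range .spec.nodes[*]}{.genUUID}{" "}{.nodeCount}{"\n"}{end}'

# Compare with the deployments that actually exist
$ oc -n openshift-logging get deploy | grep elasticsearch-cdm

# Check the operator logs for the "not found" errors (operator namespace/name assumed)
$ oc -n openshift-operators-redhat logs deploy/elasticsearch-operator | grep 'not found'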
Blocked, as there are no 4.7 operator/bundle images yet.
Verified with clusterlogging.4.7.0-202011021919.p0 and elasticsearch-operator.4.7.0-202011030448.p0 on cluster 4.7.0-0.nightly-2020-10-27-051128.
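For reference, a minimal sketch of the kind of re-check used for verification, assuming the default openshift-logging namespace: confirm the CSVs are at 4.7 and that the elasticsearch-cdm-* deployments survive the cluster upgrade.

$ oc -n openshift-logging get csv
$ oc -n openshift-logging get deploy | grep elasticsearch-cdm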
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Errata Advisory for Openshift Logging 5.0.0), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0652