Description of problem:
The issue occurred during 4.3.16 Stage testing. The current pod status is as below.

[anli@preserve-docker-slave 90176]$ oc get pods
NAME                                            READY   STATUS             RESTARTS   AGE
cluster-logging-operator-d77f9c468-hlcl2        1/1     Running            0          5h15m
curator-1587729600-bg9c4                        0/1     Error              0          139m
curator-1587730200-rk898                        1/1     Running            0          129m
curator-1587730800-c2fwt                        1/1     Running            0          119m
curator-1587731400-vzcp7                        1/1     Running            0          109m
curator-1587732000-kssl4                        1/1     Running            0          99m
curator-1587732600-dskcd                        1/1     Running            0          89m
curator-1587733200-rzr8h                        1/1     Running            0          79m
curator-1587733800-blq85                        1/1     Running            0          69m
curator-1587734400-6rw8l                        1/1     Running            0          59m
curator-1587735000-9gfk6                        1/1     Running            0          49m
curator-1587735600-nh75w                        1/1     Running            0          39m
curator-1587736200-b8hsx                        1/1     Running            0          29m
curator-1587736800-hdjqj                        1/1     Running            0          19m
curator-1587737400-8zn2c                        1/1     Running            0          9m45s
elasticsearch-cdm-rkrf6xcu-1-745775957c-fh49k   1/2     ImagePullBackOff   0          10h
elasticsearch-cdm-rkrf6xcu-2-9cc497dcb-zk4kn    1/2     ImagePullBackOff   0          10h
elasticsearch-cdm-rkrf6xcu-3-5c4469cc9f-dvgng   1/2     ImagePullBackOff   0          10h
fluentd-2j4hm                                   1/1     Running            0          10h
fluentd-2xpl4                                   1/1     Running            0          10h
fluentd-7p6c8                                   1/1     Running            0          10h
fluentd-9622j                                   1/1     Running            0          10h
fluentd-gmg2h                                   1/1     Running            0          10h
fluentd-p4wbd                                   1/1     Running            0          10h
kibana-f946dd446-vh9m5                          2/2     Running            0          100m

1) At the beginning, the ES pods were in ImagePullBackOff because the clusterlogging CSV provided a wrong elasticsearch pullspec.

2) After the CSV was fixed, OLM pulled the new CSV bundle automatically:

NAME                                  DISPLAY           VERSION                REPLACES                              PHASE
clusterlogging.4.3.16-202004240713    Cluster Logging   4.3.16-202004240713    clusterlogging.4.3.14-202004231410    Succeeded

The cluster-logging-operator-d77f9c468-hlcl2 pod was redeployed (5h15m ago).

3) The elasticsearch CR was updated to the new pullspec, but the ES deployments were not updated. The ES pods are still in ImagePullBackOff (10h).

Version-Release number of selected component (if applicable):
clusterlogging.4.3.14-202004231410 -> clusterlogging.4.3.16-202004240713

How reproducible:
Rarely

Steps to Reproduce:
1. Provide an invalid elasticsearch pullspec in the clusterlogging CSV.
2. Deploy ClusterLogging; the ES pods go into ImagePullBackOff.
3. Fix the pullspec in the clusterlogging CSV bundle.
4. Check the elasticsearch CR and the ES deployments.

Actual results:
The elasticsearch CR is updated to the new pullspec, but the Elasticsearch deployments are not updated to the new pullspec.
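For anyone reproducing this, the drift can be confirmed by comparing the pullspec recorded in the elasticsearch CR with the images the ES deployments are actually running. This is only a sketch: it assumes the image is exposed at .spec.nodeSpec.image on the elasticsearch CR and that the ES deployments carry the component=elasticsearch label on this release; adjust the field paths and selector if they differ.

# Pullspec the operator is supposed to roll out (from the elasticsearch CR)
oc get elasticsearch elasticsearch -n openshift-logging -o jsonpath='{.spec.nodeSpec.image}{"\n"}'

# Images the ES deployments are actually running
oc get deployment -n openshift-logging -l component=elasticsearch \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[*].image}{"\n"}{end}'

In the state described above, the first command already shows the fixed pullspec while the second still shows the broken one, matching "the elasticsearch CR was updated but the ES deployments weren't".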
The elasticsearch-operator messages:

{"level":"info","ts":1587699611.2833447,"logger":"cmd","msg":"failed to create or get service for metrics: services \"elasticsearch-operator\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
{"level":"info","ts":1587699611.2833846,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1587699611.3837557,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1587699611.4843507,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1}
{"level":"error","ts":1587702552.9938052,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"elasticsearch-controller","request":"openshift-logging/elasticsearch","error":"Failed to reconcile Elasticsearch deployment spec: rpc error: code = Unavailable desc = etcdserver: leader changed","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
time="2020-04-24T10:24:02Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-rkrf6xcu-1: / green"
time="2020-04-24T10:24:02Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-rkrf6xcu-1: Cluster not in green state before beginning upgrade: "
time="2020-04-24T10:24:32Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-rkrf6xcu-2: / green"
time="2020-04-24T10:24:32Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-rkrf6xcu-2: Cluster not in green state before beginning upgrade: "
time="2020-04-24T10:25:02Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-rkrf6xcu-3: / green"
time="2020-04-24T10:25:02Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-rkrf6xcu-3: Cluster not in green state before beginning upgrade: "
time="2020-04-24T10:27:03Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-rkrf6xcu-1: / green"
time="2020-04-24T10:27:03Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-rkrf6xcu-1: Cluster not in green state before beginning upgrade: "
time="2020-04-24T10:27:33Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-rkrf6xcu-2: / green"
time=
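Reading the "…: / green" lines: the value before the slash is the cluster health the operator just read (empty here, since no elasticsearch container ever starts under the broken pullspec), and "green" is the health required before a node may be upgraded. On a cluster where at least one ES container is actually running, the same health the operator polls can be checked by hand; this assumes the es_util helper shipped in the OpenShift elasticsearch image, and <some-es-pod> is a placeholder for a running ES pod name.

oc exec -n openshift-logging -c elasticsearch <some-es-pod> -- es_util --query=_cluster/health?pretty

In this capture the only nodes that could bring the cluster back to green are the same nodes that cannot pull their image, so the pre-upgrade green check can never pass.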
Created attachment 1681510 [details]
The ES related data
This is another condition to check for when deciding whether an ES node is "stuck" -- being stuck lets us bypass the normal upgrade scenario, since the node will not have any data on it to be concerned with.
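From the reporter's side, a node in this state can be spotted without touching the operator: its pod never gets past image pull, so the elasticsearch container sits in a waiting state. A minimal check, assuming the pods carry the component=elasticsearch label on this release:

oc get pods -n openshift-logging -l component=elasticsearch \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}'

# Hypothetical output matching the capture in this bug:
# elasticsearch-cdm-rkrf6xcu-1-745775957c-fh49k   ImagePullBackOff

A node deployment whose pod reports ImagePullBackOff like this has never joined the cluster and holds no shards, so rolling it to the corrected image does not risk data loss -- which is the rationale for letting the operator skip the green-cluster gate for stuck nodes.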
Setting target release to current development version (4.5) for investigation. Where fixes (if any) are required/requested for prior versions, cloned BZs will be created when appropriate.
Verified in registry.svc.ci.openshift.org/origin/4.5:elasticsearch-operator Digest: sha256:ec19b0ee59db780b37062a1170f0fd3eca0fc6e42dd6cb5d122f0233f8127936 build-date=2020-05-11T17:22:15.219602
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409