Bug 1827690 - The elasticsearch deployments weren't updated
Summary: The elasticsearch deployments weren't updated
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: ewolinet
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks: 1828906
 
Reported: 2020-04-24 14:34 UTC by Anping Li
Modified: 2020-07-13 17:31 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1828906 1828907
Environment:
Last Closed: 2020-07-13 17:31:25 UTC
Target Upstream Version:
Embargoed:


Attachments
The ES related data (100.00 KB, application/x-tar), uploaded 2020-04-24 14:48 UTC by Anping Li


Links
Github openshift elasticsearch-operator pull 327 (closed): Bug 1827690: Adding ImagePullBackOff condition to unschedulableNodes (last updated 2020-09-23 02:41:07 UTC)
Red Hat Product Errata RHBA-2020:2409 (last updated 2020-07-13 17:31:42 UTC)

Description Anping Li 2020-04-24 14:34:03 UTC
Description of problem:
The case occurred during 4.3.16 Stage testing; the current pod status is shown below.
 
[anli@preserve-docker-slave 90176]$ oc get pods
NAME                                            READY   STATUS             RESTARTS   AGE
cluster-logging-operator-d77f9c468-hlcl2        1/1     Running            0          5h15m
curator-1587729600-bg9c4                        0/1     Error              0          139m
curator-1587730200-rk898                        1/1     Running            0          129m
curator-1587730800-c2fwt                        1/1     Running            0          119m
curator-1587731400-vzcp7                        1/1     Running            0          109m
curator-1587732000-kssl4                        1/1     Running            0          99m
curator-1587732600-dskcd                        1/1     Running            0          89m
curator-1587733200-rzr8h                        1/1     Running            0          79m
curator-1587733800-blq85                        1/1     Running            0          69m
curator-1587734400-6rw8l                        1/1     Running            0          59m
curator-1587735000-9gfk6                        1/1     Running            0          49m
curator-1587735600-nh75w                        1/1     Running            0          39m
curator-1587736200-b8hsx                        1/1     Running            0          29m
curator-1587736800-hdjqj                        1/1     Running            0          19m
curator-1587737400-8zn2c                        1/1     Running            0          9m45s
elasticsearch-cdm-rkrf6xcu-1-745775957c-fh49k   1/2     ImagePullBackOff   0          10h
elasticsearch-cdm-rkrf6xcu-2-9cc497dcb-zk4kn    1/2     ImagePullBackOff   0          10h
elasticsearch-cdm-rkrf6xcu-3-5c4469cc9f-dvgng   1/2     ImagePullBackOff   0          10h
fluentd-2j4hm                                   1/1     Running            0          10h
fluentd-2xpl4                                   1/1     Running            0          10h
fluentd-7p6c8                                   1/1     Running            0          10h
fluentd-9622j                                   1/1     Running            0          10h
fluentd-gmg2h                                   1/1     Running            0          10h
fluentd-p4wbd                                   1/1     Running            0          10h
kibana-f946dd446-vh9m5                          2/2     Running            0          100m


1) At the beginning, the ES pods were in ImagePullBackOff because the clusterlogging CSV provided a wrong elasticsearch pullspec.

2) After the CSV was fixed, OLM pulled the new CSV bundle automatically.
NAME                                         DISPLAY                  VERSION               REPLACES                             PHASE
clusterlogging.4.3.16-202004240713           Cluster Logging          4.3.16-202004240713   clusterlogging.4.3.14-202004231410   Succeeded

The cluster-logging-operator-d77f9c468-hlcl2 pod was redeployed (5h15m ago).

3) The Elasticsearch CR was updated to the new pullspec, but the ES deployments weren't updated. The ES pods were still in ImagePullBackOff (10h).
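
For reference, this mismatch can be confirmed from the CLI. A minimal sketch, assuming the default openshift-logging namespace, an Elasticsearch CR named elasticsearch, and that the CR keeps its image under spec.nodeSpec.image (verify against oc get elasticsearch elasticsearch -o yaml if unsure); the deployment and container names are the ones from this cluster.

# image requested in the Elasticsearch custom resource
oc -n openshift-logging get elasticsearch elasticsearch -o jsonpath='{.spec.nodeSpec.image}{"\n"}'

# image actually set on the generated ES deployments (names taken from the pod list above)
oc -n openshift-logging get deployment elasticsearch-cdm-rkrf6xcu-1 elasticsearch-cdm-rkrf6xcu-2 elasticsearch-cdm-rkrf6xcu-3 \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[?(@.name=="elasticsearch")].image}{"\n"}{end}'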


Version-Release number of selected component (if applicable):
clusterlogging.4.3.14-202004231410 -> clusterlogging.4.3.16-202004240713

How reproducible:
Rarely

Steps to Reproduce:
1. Provide an invalid elasticsearch5 pullspec in the clusterlogging CSV.
2. Deploy ClusterLogging; the ES pods go into ImagePullBackOff.
3. Fix the pullspec in the clusterlogging CSV bundle.
4. Check the Elasticsearch CR and the ES deployments (a sketch of this check follows).
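
For step 4, a quick pod-level check (a sketch, assuming the default openshift-logging namespace and the component=elasticsearch pod label used by cluster logging): it prints, for each ES pod, the image it is trying to pull and the waiting reason of the elasticsearch container, so a stale pullspec shows up right next to ImagePullBackOff.

# per ES pod: name, elasticsearch container image, and its waiting reason (if any)
oc -n openshift-logging get pods -l component=elasticsearch \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[?(@.name=="elasticsearch")].image}{"\t"}{.status.containerStatuses[?(@.name=="elasticsearch")].state.waiting.reason}{"\n"}{end}'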

Actual results:
The Elasticsearch CR was updated to the new pullspec, but the Elasticsearch Deployments were not updated to the new pullspec.

The elasticsearch-operator messages:

{"level":"info","ts":1587699611.2833447,"logger":"cmd","msg":"failed to create or get service for metrics: services \"elasticsearch-operator\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
{"level":"info","ts":1587699611.2833846,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1587699611.3837557,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1587699611.4843507,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1}
{"level":"error","ts":1587702552.9938052,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"elasticsearch-controller","request":"openshift-logging/elasticsearch","error":"Failed to reconcile Elasticsearch deployment spec: rpc error: code = Unavailable desc = etcdserver: leader changed","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
time="2020-04-24T10:24:02Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-rkrf6xcu-1:  / green"
time="2020-04-24T10:24:02Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-rkrf6xcu-1: Cluster not in green state before beginning upgrade: "
time="2020-04-24T10:24:32Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-rkrf6xcu-2:  / green"
time="2020-04-24T10:24:32Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-rkrf6xcu-2: Cluster not in green state before beginning upgrade: "
time="2020-04-24T10:25:02Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-rkrf6xcu-3:  / green"
time="2020-04-24T10:25:02Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-rkrf6xcu-3: Cluster not in green state before beginning upgrade: "
time="2020-04-24T10:27:03Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-rkrf6xcu-1:  / green"
time="2020-04-24T10:27:03Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-rkrf6xcu-1: Cluster not in green state before beginning upgrade: "
time="2020-04-24T10:27:33Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-rkrf6xcu-2:  / green"
time=
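
The messages above are the operator's pre-upgrade health gate: it polls cluster health and only proceeds once the cluster reports green, and the empty value in front of "/ green" is the health it got back. The same query can be run by hand, for example with the es_util helper shipped in the OpenShift elasticsearch image (a sketch; the pod name is from this cluster, and a curl against localhost:9200 with the admin certs would do the same). Here it fails because the elasticsearch container never started, which is exactly why the operator sees an empty health status and never upgrades the stuck nodes.

oc -n openshift-logging exec -c elasticsearch elasticsearch-cdm-rkrf6xcu-1-745775957c-fh49k -- es_util --query="_cluster/health?pretty=true"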

Comment 2 Anping Li 2020-04-24 14:48:02 UTC
Created attachment 1681510 [details]
The ES related data

Comment 3 ewolinet 2020-04-24 15:12:47 UTC
This is another condition to check for whether or not an ES node is "stuck" -- being stuck lets us bypass the normal upgrade scenario since the node will not have data on it to be concerned with.
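
(Not the actual change from PR 327, just a rough CLI approximation of the condition it adds: a node counts as stuck when its elasticsearch container sits in a waiting state such as ImagePullBackOff and its deployment has no ready replicas, so there is no data on it that a normal rolling upgrade would need to protect. Deployment names below are from this cluster; the loop and output format are hypothetical.)

# hypothetical "is this node stuck?" check for each ES deployment
for i in 1 2 3; do
  d=elasticsearch-cdm-rkrf6xcu-$i
  ready=$(oc -n openshift-logging get deployment "$d" -o jsonpath='{.status.readyReplicas}')
  pod=$(oc -n openshift-logging get pods -o name | grep "$d-")
  reason=$(oc -n openshift-logging get "$pod" -o jsonpath='{.status.containerStatuses[?(@.name=="elasticsearch")].state.waiting.reason}')
  echo "$d readyReplicas=${ready:-0} elasticsearch-waiting=${reason:-none}"
done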

Comment 5 Stephen Cuppett 2020-04-24 16:21:59 UTC
Setting target release to current development version (4.5) for investigation. Where fixes (if any) are required/requested for prior versions, cloned BZs will be created when appropriate.

Comment 8 Anping Li 2020-05-15 05:00:09 UTC
Verified in registry.svc.ci.openshift.org/origin/4.5:elasticsearch-operator
Digest:      sha256:ec19b0ee59db780b37062a1170f0fd3eca0fc6e42dd6cb5d122f0233f8127936
build-date=2020-05-11T17:22:15.219602

Comment 9 errata-xmlrpc 2020-07-13 17:31:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

