Bug 1827690 - The elasticsearch deployments weren't updated
Summary: The elasticsearch deployments weren't updated
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: ewolinet
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks: 1828906
 
Reported: 2020-04-24 14:34 UTC by Anping Li
Modified: 2020-07-13 17:31 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1828906 1828907
Environment:
Last Closed: 2020-07-13 17:31:25 UTC
Target Upstream Version:
Embargoed:


Attachments
The ES related data (100.00 KB, application/x-tar), uploaded 2020-04-24 14:48 UTC by Anping Li


Links
Github openshift elasticsearch-operator pull 327 (closed): Bug 1827690: Adding ImagePullBackOff condition to unschedulableNodes (last updated 2020-09-23 02:41:07 UTC)
Red Hat Product Errata RHBA-2020:2409 (last updated 2020-07-13 17:31:42 UTC)

Description Anping Li 2020-04-24 14:34:03 UTC
Description of problem:
The case occurred during 4.3.16 Stage testing; the current pod status is shown below.
 
[anli@preserve-docker-slave 90176]$ oc get pods
NAME                                            READY   STATUS             RESTARTS   AGE
cluster-logging-operator-d77f9c468-hlcl2        1/1     Running            0          5h15m
curator-1587729600-bg9c4                        0/1     Error              0          139m
curator-1587730200-rk898                        1/1     Running            0          129m
curator-1587730800-c2fwt                        1/1     Running            0          119m
curator-1587731400-vzcp7                        1/1     Running            0          109m
curator-1587732000-kssl4                        1/1     Running            0          99m
curator-1587732600-dskcd                        1/1     Running            0          89m
curator-1587733200-rzr8h                        1/1     Running            0          79m
curator-1587733800-blq85                        1/1     Running            0          69m
curator-1587734400-6rw8l                        1/1     Running            0          59m
curator-1587735000-9gfk6                        1/1     Running            0          49m
curator-1587735600-nh75w                        1/1     Running            0          39m
curator-1587736200-b8hsx                        1/1     Running            0          29m
curator-1587736800-hdjqj                        1/1     Running            0          19m
curator-1587737400-8zn2c                        1/1     Running            0          9m45s
elasticsearch-cdm-rkrf6xcu-1-745775957c-fh49k   1/2     ImagePullBackOff   0          10h
elasticsearch-cdm-rkrf6xcu-2-9cc497dcb-zk4kn    1/2     ImagePullBackOff   0          10h
elasticsearch-cdm-rkrf6xcu-3-5c4469cc9f-dvgng   1/2     ImagePullBackOff   0          10h
fluentd-2j4hm                                   1/1     Running            0          10h
fluentd-2xpl4                                   1/1     Running            0          10h
fluentd-7p6c8                                   1/1     Running            0          10h
fluentd-9622j                                   1/1     Running            0          10h
fluentd-gmg2h                                   1/1     Running            0          10h
fluentd-p4wbd                                   1/1     Running            0          10h
kibana-f946dd446-vh9m5                          2/2     Running            0          100m


1) At the beginning, the ES pods were in ImagePullBackOff because the clusterlogging CSV provided a wrong elasticsearch pullspec.

2) After the CSV was fixed, OLM pulled the new CSV bundle automatically.
NAME                                         DISPLAY                  VERSION               REPLACES                             PHASE
clusterlogging.4.3.16-202004240713           Cluster Logging          4.3.16-202004240713   clusterlogging.4.3.14-202004231410   Succeeded

The cluster-logging-operator-d77f9c468-hlcl2 pod was redeployed (5h15m ago).

3) The Elasticsearch CR was updated to the new pullspec, but the ES deployments weren't updated. The ES pods were still in ImagePullBackOff (10h).
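
For reference, this mismatch can be confirmed from the CLI. A minimal sketch, assuming the default openshift-logging namespace, an Elasticsearch CR named elasticsearch, and that the CR keeps its image under spec.nodeSpec.image (verify against oc get elasticsearch elasticsearch -o yaml if unsure); the deployment and container names are the ones from this cluster.

# image requested in the Elasticsearch custom resource
oc -n openshift-logging get elasticsearch elasticsearch -o jsonpath='{.spec.nodeSpec.image}{"\n"}'

# image actually set on the generated ES deployments (names taken from the pod list above)
oc -n openshift-logging get deployment elasticsearch-cdm-rkrf6xcu-1 elasticsearch-cdm-rkrf6xcu-2 elasticsearch-cdm-rkrf6xcu-3 \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[?(@.name=="elasticsearch")].image}{"\n"}{end}'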


Version-Release number of selected component (if applicable):
clusterlogging.4.3.14-202004231410 -> clusterlogging.4.3.16-202004240713

How reproducible:
Rarely

Steps to Reproduce:
1. Provide an invalid elasticsearch5 pullspec in the clusterlogging CSV.
2. Deploy ClusterLogging; the ES pods go into ImagePullBackOff.
3. Fix the pullspec in the clusterlogging CSV bundle.
4. Check the Elasticsearch CR and the ES deployments (a sketch of this check follows).
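
For step 4, a quick pod-level check (a sketch, assuming the default openshift-logging namespace and the component=elasticsearch pod label used by cluster logging): it prints, for each ES pod, the image it is trying to pull and the waiting reason of the elasticsearch container, so a stale pullspec shows up right next to ImagePullBackOff.

# per ES pod: name, elasticsearch container image, and its waiting reason (if any)
oc -n openshift-logging get pods -l component=elasticsearch \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[?(@.name=="elasticsearch")].image}{"\t"}{.status.containerStatuses[?(@.name=="elasticsearch")].state.waiting.reason}{"\n"}{end}'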

Actual results:
The Elasticsearch CR was updated to the new pullspec, but the Elasticsearch Deployments were not updated to the new pullspec.

The elasticsearch-operator messages:

{"level":"info","ts":1587699611.2833447,"logger":"cmd","msg":"failed to create or get service for metrics: services \"elasticsearch-operator\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
{"level":"info","ts":1587699611.2833846,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1587699611.3837557,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1587699611.4843507,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1}
{"level":"error","ts":1587702552.9938052,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"elasticsearch-controller","request":"openshift-logging/elasticsearch","error":"Failed to reconcile Elasticsearch deployment spec: rpc error: code = Unavailable desc = etcdserver: leader changed","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
time="2020-04-24T10:24:02Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-rkrf6xcu-1:  / green"
time="2020-04-24T10:24:02Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-rkrf6xcu-1: Cluster not in green state before beginning upgrade: "
time="2020-04-24T10:24:32Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-rkrf6xcu-2:  / green"
time="2020-04-24T10:24:32Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-rkrf6xcu-2: Cluster not in green state before beginning upgrade: "
time="2020-04-24T10:25:02Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-rkrf6xcu-3:  / green"
time="2020-04-24T10:25:02Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-rkrf6xcu-3: Cluster not in green state before beginning upgrade: "
time="2020-04-24T10:27:03Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-rkrf6xcu-1:  / green"
time="2020-04-24T10:27:03Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-rkrf6xcu-1: Cluster not in green state before beginning upgrade: "
time="2020-04-24T10:27:33Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-rkrf6xcu-2:  / green"
time=
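
The messages above are the operator's pre-upgrade health gate: it polls cluster health and only proceeds once the cluster reports green, and the empty value in front of "/ green" is the health it got back. The same query can be run by hand, for example with the es_util helper shipped in the OpenShift elasticsearch image (a sketch; the pod name is from this cluster, and a curl against localhost:9200 with the admin certs would do the same). Here it fails because the elasticsearch container never started, which is exactly why the operator sees an empty health status and never upgrades the stuck nodes.

oc -n openshift-logging exec -c elasticsearch elasticsearch-cdm-rkrf6xcu-1-745775957c-fh49k -- es_util --query="_cluster/health?pretty=true"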

Comment 2 Anping Li 2020-04-24 14:48:02 UTC
Created attachment 1681510 [details]
The ES related data

Comment 3 ewolinet 2020-04-24 15:12:47 UTC
This is another condition to check for whether or not an ES node is "stuck" -- being stuck lets us bypass the normal upgrade scenario since the node will not have data on it to be concerned with.
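
(Not the actual change from PR 327, just a rough CLI approximation of the condition it adds: a node counts as stuck when its elasticsearch container sits in a waiting state such as ImagePullBackOff and its deployment has no ready replicas, so there is no data on it that a normal rolling upgrade would need to protect. Deployment names below are from this cluster; the loop and output format are hypothetical.)

# hypothetical "is this node stuck?" check for each ES deployment
for i in 1 2 3; do
  d=elasticsearch-cdm-rkrf6xcu-$i
  ready=$(oc -n openshift-logging get deployment "$d" -o jsonpath='{.status.readyReplicas}')
  pod=$(oc -n openshift-logging get pods -o name | grep "$d-")
  reason=$(oc -n openshift-logging get "$pod" -o jsonpath='{.status.containerStatuses[?(@.name=="elasticsearch")].state.waiting.reason}')
  echo "$d readyReplicas=${ready:-0} elasticsearch-waiting=${reason:-none}"
done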

Comment 5 Stephen Cuppett 2020-04-24 16:21:59 UTC
Setting target release to current development version (4.5) for investigation. Where fixes (if any) are required/requested for prior versions, cloned BZs will be created when appropriate.

Comment 8 Anping Li 2020-05-15 05:00:09 UTC
Verified in registry.svc.ci.openshift.org/origin/4.5:elasticsearch-operator
Digest:      sha256:ec19b0ee59db780b37062a1170f0fd3eca0fc6e42dd6cb5d122f0233f8127936
build-date=2020-05-11T17:22:15.219602

Comment 9 errata-xmlrpc 2020-07-13 17:31:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

