Bug 1752073

Summary: Deploying elasticsearch fails using cluster-logging-operator 4.2.0-201909121019 : no matches for kind Elasticsearch
Product: OpenShift Container Platform
Component: Logging
Reporter: Mike Fiedler <mifiedle>
Assignee: IgorKarpukhin <ikarpukh>
QA Contact: Anping Li <anli>
Status: CLOSED ERRATA
Severity: low
Priority: low
Version: 4.2.0
Target Release: 4.4.0
Keywords: Regression
CC: anli, aos-bugs, bparees, dkulkarn, jcantril, mfisher, redhat, rmeggins
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2020-05-04 11:13:32 UTC

Attachments: clo logs and clusterlogging crd

Description Mike Fiedler 2019-09-13 15:41:59 UTC
Description of problem:

Tried deploying clusterlogging using 4.2.0-201909121019 (full image list below) from OperatorHub; elasticsearch never deploys when a clusterlogging CR is created. The cluster-logging-operator pod log shows:


{"level":"error","ts":1568388775.6281145,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"clusterlogging-controller","request":"openshift-logging/instance","error":"Unable to create or update logstore for \"instance\": Failure creating Elasticsearch CR: no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1\"","stackt
race":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/contr
oller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/cluster-logging-operator/_outp
ut/src/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/cluster-logging-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait
.go:88"}



Version-Release number of selected component (if applicable): 4.2.0-201909121019


How reproducible: Always


Steps to Reproduce:
1.  Follow the instructions for using the redhat-operators-art registry (instructions in private comment)
2.  Install the CL and ES operators
3.  Create a CL instance (a minimal example CR is sketched below)
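
For reference, a minimal clusterlogging CR of the kind created in step 3 (a sketch only; the apiVersion/kind match the operator discussed here, but the spec values are illustrative defaults, adjust nodeCount/redundancy/storage for a real cluster):

oc create -n openshift-logging -f - <<EOF
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  logStore:
    type: elasticsearch
    elasticsearch:
      nodeCount: 1
      redundancyPolicy: ZeroRedundancy
  visualization:
    type: kibana
    kibana:
      replicas: 1
  curation:
    type: curator
    curator:
      schedule: "30 3 * * *"
  collection:
    logs:
      type: fluentd
      fluentd: {}
EOF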

Actual results:

Elasticsearch never deploys; the error above is seen in the cluster-logging-operator pod log.


Expected results:

The EFK stack is created successfully.



Additional info:

ose-elasticsearch-operator:v4.2.0-201909121419
ose-oauth-proxy:v4.2.0-201909081401
ose-logging-elasticsearch5:v4.2.0-201909112219
ose-cluster-logging-operator:v4.2.0-201909121019
ose-logging-fluentd:v4.2.0-201909112219
ose-logging-kibana5:v4.2.0-201909112219
ose-logging-curator5:v4.2.0-201909112219

Comment 2 Mike Fiedler 2019-09-13 18:34:47 UTC
The ES operator is running and ready in the openshift-operators namespace (I installed it to all namespaces from OperatorHub).  It shows as "Copied" to the openshift-logging namespace, which is normal.    The elasticsearch-operator pod log:

oc logs elasticsearch-operator-7cf684655b-znjg5
time="2019-09-13T15:31:30Z" level=warning msg="Unable to parse loglevel \"\""
{"level":"info","ts":1568388690.6754048,"logger":"cmd","msg":"Go Version: go1.11.13"}
{"level":"info","ts":1568388690.6754224,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1568388690.675427,"logger":"cmd","msg":"Version of operator-sdk: v0.7.0"}
{"level":"info","ts":1568388690.6756885,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1568388690.7781808,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1568388690.7835686,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1568388690.8535068,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1568388690.8541727,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"elasticsearch-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1568388690.9423194,"logger":"cmd","msg":"failed to create or get service for metrics: services \"elasticsearch-operator\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
{"level":"info","ts":1568388690.9423404,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1568388691.0425055,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1568388691.142688,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1}

Comment 3 Mike Fiedler 2019-09-13 18:35:53 UTC
# oc get crd | grep elastic
elasticsearches.logging.openshift.io                        2019-09-13T15:31:24Z
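
One way to cross-check the mismatch (assumed diagnostic commands, not part of the original report): the CRD exists, so inspect which version it serves and whether API discovery actually exposes the kind. Depending on the CRD API version in this release, the served version may be under .spec.version (singular) instead of .spec.versions:

oc get crd elasticsearches.logging.openshift.io -o jsonpath='{.spec.versions[*].name}{"\n"}'
oc api-resources --api-group=logging.openshift.io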

Comment 4 Mike Fiedler 2019-09-13 19:14:27 UTC
Installing EO first fixed the problem. When I hit the problem, the order of operations was:

Installed CLO in openshift-logging
Installed EO in all namespaces (showed as Copied in openshift-logging)
Created the clusterlogging resource from the OperatorHub UI

I do not believe the order of operator installation mattered in 4.1. Lowering severity and removing the TestBlocker keyword.

Comment 5 Mike Fiedler 2019-09-13 22:52:20 UTC
On 4.1, installing CLO before EO works fine. The next experiment on 4.2 is to install CLO and EO and then wait before creating the CL instance.

Comment 6 Mike Fiedler 2019-09-16 12:56:08 UTC
Verified this is not a timing issue between EO installation and clusterlogging instance creation. When CLO is installed before EO, the elasticsearch deployment always fails, no matter how long the wait is after deploying the operators.

When EO is installed before CLO, this problem does not occur; elasticsearch deploys OK.

This is different behavior from 4.1 where it did not matter which operator was installed first.
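
For anyone relying on the workaround, installing EO before CLO can also be scripted with an OLM Subscription instead of the OperatorHub UI; a rough sketch (channel and catalog source names are assumptions, substitute the catalog from step 1 of the reproduction steps):

oc create -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elasticsearch-operator
  namespace: openshift-operators
spec:
  channel: "4.2"                    # assumed channel name
  name: elasticsearch-operator
  source: redhat-operators          # or the redhat-operators-art catalog source
  sourceNamespace: openshift-marketplace
EOF
# Wait for the elasticsearches.logging.openshift.io CRD to appear, then subscribe
# cluster-logging-operator in openshift-logging and create the CL instance.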

Comment 7 Jeff Cantrill 2019-09-16 19:34:02 UTC
Moving to 4.3 as installing EO first resolves the issue.

Comment 12 Jeff Cantrill 2019-09-20 14:23:01 UTC
Too risky a change for code freeze. Resetting to 4.3. We'll have to cherry-pick back to 4.2 if needed.

Comment 13 Jeff Cantrill 2019-11-08 13:49:55 UTC
Moving back to ASSIGNED, as the work is primarily for CLO, not EO.

Comment 14 IgorKarpukhin 2019-11-11 12:29:09 UTC
Fixed by bumping the operator-framework version to 0.8.2 for both EO and CLO. Tested on a 4.3 cluster.
PR CLO: https://github.com/openshift/cluster-logging-operator/pull/287
PR EO: https://github.com/openshift/elasticsearch-operator/pull/198
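
A quick way to confirm which SDK version the running operators were built with (the startup line quoted in comment 2 reads "Version of operator-sdk: v0.7.0" before the fix); the commands below are assumptions based on the deployment names shown in this bug:

oc -n openshift-logging logs deploy/cluster-logging-operator | grep -i operator-sdk
oc -n openshift-operators logs deploy/elasticsearch-operator | grep -i operator-sdk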

Comment 16 Anping Li 2019-11-25 14:45:27 UTC
Created attachment 1639526 [details]
clo logs and clusterlogging crd

1) Deploy CLO prior to EO
$ oc get pods -n openshift-logging
NAME                                       READY   STATUS    RESTARTS   AGE
cluster-logging-operator-f5fdbfbbc-nzfpb   1/1     Running   0          14m
$ oc get pods -n openshift-operators
NAME                                      READY   STATUS    RESTARTS   AGE
elasticsearch-operator-57ff9bff5f-pxd48   1/1     Running   0          12m

2) The elasticsearch resource wasn't created.
[anli@preserve-docker-slave 43]$ oc get clusterlogging instance
NAME       AGE
instance   11m
[anli@preserve-docker-slave 43]$ oc get elasticsearch
No resources found.
3) Version
registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-cluster-logging-operator@sha256:4cbbe746941bd2e97d37f1d4b85d4da57349c26ea8d023c0e588a2a7d5634410:v4.3.0-201911220712
registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-elasticsearch-operator@sha256:0d1e799cbd93baf1d78aad3ea407aa8b3ea905f11304811a0d48ff77001975c5:v4.3.0-201911220712

4) For CLO logs and clusterlogging instance, refer to the attached files
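
(The attached data can be regenerated with commands along these lines; resource names are the ones shown above:)

oc -n openshift-logging logs deploy/cluster-logging-operator > clo.log
oc -n openshift-logging get clusterlogging instance -o yaml > clusterlogging-instance.yaml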

Comment 17 Jeff Cantrill 2019-11-25 20:57:40 UTC
Retarget to 4.4 since this is not a blocker

Comment 19 Jeff Cantrill 2020-02-04 20:50:53 UTC
Lowering the priority as this is not a 4.4 blocker and the workaround is to deploy EO first. We believe this has already been resolved.

Comment 20 Anping Li 2020-02-10 10:17:35 UTC
Hit the error message in 4.3. The elasticsearch resource is created after several minutes, and the clusterlogging stack is eventually deployed.

Comment 21 IgorKarpukhin 2020-02-10 19:20:54 UTC
Can't reproduce that. I deployed CLO before EO, then created the clusterlogging CR. CLO showed an error because EO was not installed. Then I deployed EO, and CLO immediately created the elasticsearch instance. Here is the log from before EO was installed:

E0210 14:23:36.409331       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to list *v1.Elasticsearch: the server could not find the requested resource (get elasticsearches.logging.openshift.io)
E0210 14:23:37.410913       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to list *v1.Elasticsearch: the server could not find the requested resource (get elasticsearches.logging.openshift.io)
E0210 14:23:38.412619       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to list *v1.Elasticsearch: the server could not find the requested resource (get elasticsearches.logging.openshift.io)


Here is the log after I installed EO:

time="2020-02-10T14:24:37Z" level=info msg="Updating status of Elasticsearch"
time="2020-02-10T14:24:37Z" level=info msg="Updating status of Curator"
time="2020-02-10T14:24:37Z" level=info msg="Collector volumes change found, updating \"fluentd\""
time="2020-02-10T14:24:37Z" level=info msg="Collector container volumemounts change found, updating \"fluentd\""
time="2020-02-10T14:24:38Z" level=info msg="Updating status of Fluentd"
time="2020-02-10T14:24:38Z" level=info msg="Updating status of Fluentd"
time="2020-02-10T14:24:38Z" level=info msg="Updating status of Fluentd"


The Elasticsearch CR was also created:

[ikarpukh@ikarpukh cluster-logging-operator]$ oc get elasticsearch
NAME            AGE
elasticsearch   151m

Comment 22 Jeff Cantrill 2020-02-17 16:29:04 UTC
(In reply to Anping Li from comment #20)
> Hit the error message in 4.3. The elasticsearch resource is created after
> several minutes, and the clusterlogging stack is eventually deployed.


The error message is correct because the resource is in fact missing. The operator eventually reconciles as expected, as verified by your comment. This issue is resolved and should be marked either VERIFIED or CLOSED WORKSFORME.

Comment 23 Anping Li 2020-02-18 14:54:40 UTC
Verified as per comments 20 and 22.

Comment 26 errata-xmlrpc 2020-05-04 11:13:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581