Bug 1752073
Summary: Deploying elasticsearch fails using cluster-logging-operator 4.2.0-201909121019: no matches for kind Elasticsearch

Field | Value
---|---
Product | OpenShift Container Platform
Component | Logging
Version | 4.2.0
Hardware | Unspecified
OS | Unspecified
Status | CLOSED ERRATA
Severity | low
Priority | low
Keywords | Regression
Reporter | Mike Fiedler <mifiedle>
Assignee | Igor Karpukhin <ikarpukh>
QA Contact | Anping Li <anli>
CC | anli, aos-bugs, bparees, dkulkarn, jcantril, mfisher, redhat, rmeggins
Target Release | 4.4.0
Type | Bug
Last Closed | 2020-05-04 11:13:32 UTC
Attachments | clo logs and clusterlogging crd (attachment 1639526)
Description (Mike Fiedler, 2019-09-13 15:41:59 UTC)
The ES operator is running and ready in the openshift-operators namespace (I installed it to all namespaces from OperatorHub). It shows as "Copied" to the openshift-logging namespace, which is normal. The elasticsearch-operator pod log:

oc logs elasticsearch-operator-7cf684655b-znjg5
time="2019-09-13T15:31:30Z" level=warning msg="Unable to parse loglevel \"\""
{"level":"info","ts":1568388690.6754048,"logger":"cmd","msg":"Go Version: go1.11.13"}
{"level":"info","ts":1568388690.6754224,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1568388690.675427,"logger":"cmd","msg":"Version of operator-sdk: v0.7.0"}
{"level":"info","ts":1568388690.6756885,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1568388690.7781808,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1568388690.7835686,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1568388690.8535068,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1568388690.8541727,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"elasticsearch-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1568388690.9423194,"logger":"cmd","msg":"failed to create or get service for metrics: services \"elasticsearch-operator\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
{"level":"info","ts":1568388690.9423404,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1568388691.0425055,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1568388691.142688,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1}

# oc get crd | grep elastic
elasticsearches.logging.openshift.io   2019-09-13T15:31:24Z

Installing EO first fixed the problem. When I hit the problem, the order of operations was:
1) Install CLO in openshift-logging.
2) Install EO in all namespaces (showed as Copied in openshift-logging).
3) Create the clusterlogging resource from the OperatorHub UI.

I do not believe the order of operator installation mattered in 4.1. Lowering severity and removing TestBlocker.

On 4.1, installing CLO before EO works fine. The next experiment on 4.2 is to install CLO and EO, then wait before creating the CL instance.

Verified that this is not a timing issue related to the interval between EO installation and clusterlogging instance creation. When CLO is installed before EO, the elasticsearch deployment always fails no matter how long the wait is after deployment of the operators. When EO is installed before CLO, the problem does not occur: elasticsearch deploys OK. This is different behavior from 4.1, where it did not matter which operator was installed first.

Moving to 4.3, as installing EO first resolves the issue. The change is too risky for code freeze.

Resetting to 4.3. We'll have to cherry-pick back to 4.2 if needed.

Moving back to ASSIGNED, as the work is primarily for CLO, not EO.

Fixed by bumping the operator-framework version to 0.8.2 for both EO and CLO. Tested on a 4.3 cluster.
PR CLO: https://github.com/openshift/cluster-logging-operator/pull/287
PR EO: https://github.com/openshift/elasticsearch-operator/pull/198
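Since the fix is a dependency bump, one way to confirm it landed is the operator-sdk version line each operator prints at startup (seen as v0.7.0 in the log above). A minimal sketch, assuming the deployment names and namespaces used in this report, and assuming CLO emits the same startup line as EO:

# Grep the startup logs for the SDK version; after the fix, both should
# report v0.8.2 rather than the v0.7.0 shown in the description.
$ oc logs deployment/elasticsearch-operator -n openshift-operators | grep "operator-sdk"
$ oc logs deployment/cluster-logging-operator -n openshift-logging | grep "operator-sdk"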
Created attachment 1639526 [details]
clo logs and clusterlogging crd
1) Deploy CLO prior to EO.
$ oc get pods -n openshift-logging
NAME READY STATUS RESTARTS AGE
cluster-logging-operator-f5fdbfbbc-nzfpb 1/1 Running 0 14m
$ oc get pods -n openshift-operators
NAME READY STATUS RESTARTS AGE
elasticsearch-operator-57ff9bff5f-pxd48 1/1 Running 0 12m
2) The elasticsearch resource wasn't created. (A quick CRD check for this case is sketched after these steps.)
[anli@preserve-docker-slave 43]$ oc get clusterlogging instance
NAME AGE
instance 11m
[anli@preserve-docker-slave 43]$ oc get elasticsearch
No resources found.
3) Version
registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-cluster-logging-operator@sha256:4cbbe746941bd2e97d37f1d4b85d4da57349c26ea8d023c0e588a2a7d5634410:v4.3.0-201911220712
registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-elasticsearch-operator@sha256:0d1e799cbd93baf1d78aad3ea407aa8b3ea905f11304811a0d48ff77001975c5:v4.3.0-201911220712
4) For the CLO logs and the clusterlogging instance, refer to the attached files.
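When the elasticsearch resource is missing, as in step 2 above, a quick check of whether the CRD itself is registered and established narrows the failure down (standard oc/kubectl commands; the timeout value is an arbitrary choice):

# Confirm the CRD exists, then wait until the API server marks it Established.
$ oc get crd elasticsearches.logging.openshift.io
$ oc wait --for=condition=established crd/elasticsearches.logging.openshift.io --timeout=60s

If the CRD is established but no Elasticsearch CR ever appears, the behavior matches the operator-ordering bug described in this report rather than a missing EO install.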
Retargeting to 4.4, since this is not a blocker.

Lowering the priority, as this is not a 4.4 blocker and the workaround is to deploy EO first.

We believe this has already been resolved.

Hit the error message in 4.3. The elasticsearch resource is created after several minutes, and the clusterlogging is eventually deployed.

Can't reproduce that. I deployed the CLO before the EO, then created the ClusterLogging CR. The CLO instance showed an error because the EO was not installed. Then I deployed the EO, and the CLO immediately created the elasticsearch instance. Here is the log before installation of the EO:

E0210 14:23:36.409331       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to list *v1.Elasticsearch: the server could not find the requested resource (get elasticsearches.logging.openshift.io)
E0210 14:23:37.410913       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to list *v1.Elasticsearch: the server could not find the requested resource (get elasticsearches.logging.openshift.io)
E0210 14:23:38.412619       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to list *v1.Elasticsearch: the server could not find the requested resource (get elasticsearches.logging.openshift.io)

Here is after I installed the EO:

time="2020-02-10T14:24:37Z" level=info msg="Updating status of Elasticsearch"
time="2020-02-10T14:24:37Z" level=info msg="Updating status of Curator"
time="2020-02-10T14:24:37Z" level=info msg="Collector volumes change found, updating \"fluentd\""
time="2020-02-10T14:24:37Z" level=info msg="Collector container volumemounts change found, updating \"fluentd\""
time="2020-02-10T14:24:38Z" level=info msg="Updating status of Fluentd"
time="2020-02-10T14:24:38Z" level=info msg="Updating status of Fluentd"
time="2020-02-10T14:24:38Z" level=info msg="Updating status of Fluentd"

The Elasticsearch CR was also created:

[ikarpukh@ikarpukh cluster-logging-operator]$ oc get elasticsearch
NAME            AGE
elasticsearch   151m

(In reply to Anping Li from comment #20)
> Hit the error message in 4.3. The elasticsearch resource is created after
> several minutes, and the clusterlogging is eventually deployed.

The error message is correct because the resource is in fact missing. The operator eventually reconciles as expected, as verified per your comment. This issue is resolved and should be either marked VERIFIED or CLOSED WORKSFORME.

Verified as per comments 20 and 22.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
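As a closing note, the reconcile behavior described in the comments above (list errors until EO is installed, then immediate creation of the elasticsearch instance) can be watched live; a minimal sketch, assuming the default namespaces used throughout this report:

# Follow the CLO log while installing EO, and watch for the Elasticsearch CR to appear.
$ oc logs -f deployment/cluster-logging-operator -n openshift-logging
$ oc get elasticsearch -n openshift-logging -w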