Bug 1889573 - The EO enters CrashLoopBackOff after updating the Kibana resource configurations in the clusterlogging instance.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Hui Kang
QA Contact: Qiaoling Tang
Docs Contact: Rolfe Dlugy-Hegwer
URL:
Whiteboard: logging-exploration
Depends On:
Blocks:
 
Reported: 2020-10-20 03:11 UTC by Qiaoling Tang
Modified: 2021-02-24 11:22 UTC
CC: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
* Previously, if you updated the Kibana resource configuration in the clusterlogging instance to `resources: {}`, the resulting nil map caused a panic and changed the status of the Elasticsearch Operator to `CrashLoopBackOff`. The current release fixes this issue by initializing the map. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1889573[*BZ#1889573*])
Clone Of:
Environment:
Last Closed: 2021-02-24 11:21:19 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift elasticsearch-operator pull 545 0 None closed Bug 1889573: Fix nil map in updating kibana resource 2021-02-05 23:53:34 UTC
Red Hat Product Errata RHBA-2021:0652 0 None None None 2021-02-24 11:22:11 UTC

Description Qiaoling Tang 2020-10-20 03:11:29 UTC
Description of problem:
The EO enters CrashLoopBackOff after updating the Kibana resource configurations in the clusterlogging instance.

EO logs:
$ oc logs -n openshift-operators-redhat elasticsearch-operator-5cff98d5d5-dgpbb 
{"level":"info","ts":1603161202.1153483,"logger":"cmd","msg":"Go Version: go1.13.4"}
{"level":"info","ts":1603161202.1153812,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1603161202.1153867,"logger":"cmd","msg":"Version of operator-sdk: v0.8.2"}
{"level":"info","ts":1603161202.1159978,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1603161202.4679542,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1603161202.4679945,"logger":"leader","msg":"Continuing as the leader."}
{"level":"info","ts":1603161202.858305,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1603161202.8615181,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"kibana-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1603161202.861915,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"elasticsearch-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1603161202.862184,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"proxyconfig-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1603161202.8623393,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"kibanasecret-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1603161202.8626356,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"trustedcabundle-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1603161203.299226,"logger":"cmd","msg":"This operator no longer honors the image specified by the custom resources so that it is able to properly coordinate the configuration with the image."}
{"level":"info","ts":1603161203.2992716,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1603161204.5996928,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"trustedcabundle-controller"}
{"level":"info","ts":1603161204.5997734,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"kibana-controller"}
{"level":"info","ts":1603161204.5998037,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1603161204.599835,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"proxyconfig-controller"}
{"level":"info","ts":1603161204.5998628,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"kibanasecret-controller"}
{"level":"info","ts":1603161204.6999161,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"proxyconfig-controller","worker count":1}
{"level":"info","ts":1603161204.7001863,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"trustedcabundle-controller","worker count":1}
{"level":"info","ts":1603161204.700282,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"kibana-controller","worker count":1}
{"level":"info","ts":1603161204.7006946,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1}
{"level":"info","ts":1603161204.702338,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"kibanasecret-controller","worker count":1}
E1020 02:33:25.661691       1 runtime.go:69] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/opt/rh/go-toolset-1.13/root/usr/lib/go-toolset-1.13-golang/src/runtime/panic.go:679
/opt/rh/go-toolset-1.13/root/usr/lib/go-toolset-1.13-golang/src/runtime/map_faststr.go:204
/go/src/github.com/openshift/elasticsearch-operator/pkg/utils/resources.go:18
/go/src/github.com/openshift/elasticsearch-operator/pkg/utils/resources.go:78
/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:414
/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:303
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/client-go/util/retry/util.go:64
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:203
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/client-go/util/retry/util.go:63
/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:290
/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:120
/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:67
/go/src/github.com/openshift/elasticsearch-operator/pkg/controller/kibana/controller.go:69
/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215
/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/opt/rh/go-toolset-1.13/root/usr/lib/go-toolset-1.13-golang/src/runtime/asm_amd64.s:1357
panic: assignment to entry in nil map [recovered]
	panic: assignment to entry in nil map

goroutine 628 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x105
panic(0x14be860, 0x18a0df0)
	/opt/rh/go-toolset-1.13/root/usr/lib/go-toolset-1.13-golang/src/runtime/panic.go:679 +0x1b2
github.com/openshift/elasticsearch-operator/pkg/utils.CompareResources(0x0, 0x0, 0xc000412630, 0xc000412660, 0x54, 0x4f, 0x25fb)
	/go/src/github.com/openshift/elasticsearch-operator/pkg/utils/resources.go:18 +0x1581
github.com/openshift/elasticsearch-operator/pkg/utils.AreResourcesDifferent(0x1681640, 0xc001bcfb00, 0x1681640, 0xc00243a900, 0x0)
	/go/src/github.com/openshift/elasticsearch-operator/pkg/utils/resources.go:78 +0x4b4
github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana.isDeploymentDifferent(0xc001bcfb00, 0xc00243a900, 0x6, 0x18c99e0)
	/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:414 +0x156
github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana.(*KibanaRequest).createOrUpdateKibanaDeployment.func1(0x0, 0x0)
	/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:303 +0xcb
k8s.io/client-go/util/retry.RetryOnConflict.func1(0x28, 0xc002408a80, 0xd)
	/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/client-go/util/retry/util.go:64 +0x33
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0xc00257d468, 0x0, 0xc002408a60)
	/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:203 +0xde
k8s.io/client-go/util/retry.RetryOnConflict(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0xc00257d6b0, 0x6, 0xc00055e274)
	/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/client-go/util/retry/util.go:63 +0xa0
github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana.(*KibanaRequest).createOrUpdateKibanaDeployment(0xc00257dad8, 0xc0003941a0, 0x0, 0x23f2a28)
	/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:290 +0x511
github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana.reconcileKibana(0xc0008c7080, 0x1905b60, 0xc0007a31d0, 0x192dea0, 0xc00092c480, 0xc0003941a0, 0x6, 0x18c8820)
	/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:120 +0x276
github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana.Reconcile(0xc0005dff20, 0x11, 0xc00055e274, 0x6, 0x1905b60, 0xc0007a31d0, 0x192dea0, 0xc00092c480, 0x0, 0x0)
	/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:67 +0x171
github.com/openshift/elasticsearch-operator/pkg/controller/kibana.(*ReconcileKibana).Reconcile(0xc00052c400, 0xc0005dff20, 0x11, 0xc00055e274, 0x6, 0x5, 0x400, 0xc0002da000, 0xd)
	/go/src/github.com/openshift/elasticsearch-operator/pkg/controller/kibana/controller.go:69 +0x2a6
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0002ee460, 0x0)
	/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215 +0x20a
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1()
	/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158 +0x36
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc005f2e780)
	/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x5e
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc005f2e780, 0x3b9aca00, 0x0, 0x1, 0xc0002bc120)
	/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc005f2e780, 0x3b9aca00, 0xc0002bc120)
	/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
	/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157 +0x32e

$ oc get pod -n openshift-operators-redhat
NAME                                      READY   STATUS             RESTARTS   AGE
elasticsearch-operator-5cff98d5d5-dgpbb   0/1     CrashLoopBackOff   5          27m
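
For context on the trace above: in Go, writing a key into a map that was never initialized panics with exactly this message, and corev1.ResourceList (the type behind resources.limits/requests) is a map type. The following is a minimal sketch of the failure mode under that assumption; the variable names are illustrative and are not taken from pkg/utils/resources.go.

package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// A CR field of `resources: {}` decodes into a ResourceRequirements
	// struct whose Limits and Requests maps are nil.
	var current corev1.ResourceRequirements

	// Reading from a nil map is safe and yields the zero value...
	_ = current.Limits[corev1.ResourceMemory]

	// ...but assigning into it panics with "assignment to entry in nil map",
	// the same panic shown in the operator log above.
	current.Limits[corev1.ResourceMemory] = resource.MustParse("736Mi")
}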


Version-Release number of selected component (if applicable):
elasticsearch-operator.4.5.0-202010161522.p0

How reproducible:
Always

Steps to Reproduce:
1. deploy logging 4.5
2. create clusterlogging instance with:
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy: 
      application:
        maxAge: 1d
      infra:
        maxAge: 3h
      audit:
        maxAge: 2w
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: "SingleRedundancy"
      resources:
        requests:
          memory: "2Gi"
      storage:
        storageClassName: "standard"
        size: "20Gi"
  visualization:
    type: "kibana"
    kibana:
      resources: {}
      replicas: 1
  collection:
    logs:
      type: "fluentd"
      fluentd: {}
3. wait until all the EFK pods become Running, update the kibana resource configurations to:
    managementState: Managed
    visualization:
      kibana:
        proxy:
          resources:
            limits:
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 1Gi
        replicas: 1
        resources:
          limits:
            cpu: 1000m
            memory: 4Gi
          requests:
            cpu: 800m
            memory: 2Gi
      type: kibana
4. check the EO status

Actual results:
The elasticsearch-operator pod panics with "assignment to entry in nil map" and enters CrashLoopBackOff.

Expected results:
The EO handles the update without crashing and applies the new Kibana resource settings.

Additional info:
elasticsearch-operator.4.5.0-202010081312.p0 (released version) also has this issue.

Comment 1 Qiaoling Tang 2020-10-20 03:12:19 UTC
elasticsearch-operator.4.6.0-202010140833.p0 has the same issue.

Comment 2 Qiaoling Tang 2020-10-20 03:30:37 UTC
> 1. deploy logging 4.5
> 2. create clusterlogging instance with:
> apiVersion: "logging.openshift.io/v1"
> kind: "ClusterLogging"
> metadata:
>   name: "instance"
>   namespace: "openshift-logging"
> spec:
>   managementState: "Managed"
>   logStore:
>     type: "elasticsearch"
>     retentionPolicy: 
>       application:
>         maxAge: 1d
>       infra:
>         maxAge: 3h
>       audit:
>         maxAge: 2w
>     elasticsearch:
>       nodeCount: 3
>       redundancyPolicy: "SingleRedundancy"
>       resources:
>         requests:
>           memory: "2Gi"
>       storage:
>         storageClassName: "standard"
>         size: "20Gi"
>   visualization:
>     type: "kibana"
>     kibana:
>       resources: {}
>       replicas: 1
>   collection:
>     logs:
>       type: "fluentd"
>       fluentd: {}

I found the issue only happens when `spec.visualization.kibana.resources: {}` is present in the clusterlogging instance. If I create the clusterlogging instance with the YAML below and follow the same steps, the issue does not occur.

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy: 
      application:
        maxAge: 1d
      infra:
        maxAge: 3h
      audit:
        maxAge: 2w
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: "SingleRedundancy"
      resources:
        requests:
          memory: "2Gi"
      storage:
        storageClassName: "standard"
        size: "20Gi"
  visualization:
    type: "kibana"
    kibana:
      replicas: 1
  collection:
    logs:
      type: "fluentd"
      fluentd: {}


> 3. wait until all the EFK pods become Running, update the kibana resource
> configurations to:
>     managementState: Managed
>     visualization:
>       kibana:
>         proxy:
>           resources:
>             limits:
>               memory: 1Gi
>             requests:
>               cpu: 100m
>               memory: 1Gi
>         replicas: 1
>         resources:
>           limits:
>             cpu: 1000m
>             memory: 4Gi
>           requests:
>             cpu: 800m
>             memory: 2Gi
>       type: kibana
> 4. check the EO status

Comment 3 Hui Kang 2020-10-21 13:38:00 UTC
Doc Text: Previously, the operator failed when the current Kibana resource configuration was `resources: {}`
Doc type: Bug fix
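
The linked PR (openshift elasticsearch-operator pull 545, "Fix nil map in updating kibana resource") addresses this by making sure the map exists before entries are written. A rough sketch of that pattern follows; it is illustrative only, and the package, function, and variable names are not the PR's actual code.

package utils // hypothetical package name, for illustration only

import corev1 "k8s.io/api/core/v1"

// ensureResourceList returns an initialized map so that callers can
// assign entries even when the CR was created with `resources: {}`.
func ensureResourceList(rl corev1.ResourceList) corev1.ResourceList {
	if rl == nil {
		return corev1.ResourceList{}
	}
	return rl
}

// normalizeResources initializes nil Limits/Requests before any
// comparison or update logic writes into them.
func normalizeResources(rr *corev1.ResourceRequirements) {
	rr.Limits = ensureResourceList(rr.Limits)
	rr.Requests = ensureResourceList(rr.Requests)
}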

Comment 5 Qiaoling Tang 2020-10-27 08:18:34 UTC
Verified with quay.io/openshift/origin-elasticsearch-operator@sha256:1a1446fab00689c1e1eb256ad57be20ef0b2215236841564254862d888efd007

Comment 12 errata-xmlrpc 2021-02-24 11:21:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Errata Advisory for Openshift Logging 5.0.0), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0652

