Bug 1889573 - The EO enters CrashLoopBackOff after updating the kibana resource configurations in the clusterlogging instance.
Summary: The EO enters CrashLoopBackOff after updating the kibana resource configurations in the clusterlogging instance.
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Hui Kang
QA Contact: Qiaoling Tang
URL:
Whiteboard: logging-exploration
Depends On:
Blocks:
 
Reported: 2020-10-20 03:11 UTC by Qiaoling Tang
Modified: 2020-11-09 11:09 UTC (History)
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When the current Kibana resources are set to "resources: {}", the underlying resource map is nil. Consequence: The operator panics when it assigns into the nil map. Fix: Initialize the map before assigning to it. Result: The operator no longer crashes when the Kibana resource configuration is updated.
Clone Of:
Environment:
Last Closed:
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Github openshift elasticsearch-operator pull 545 None closed Bug 1889573: Fix nil map in updating kibana resource 2020-11-09 11:09:22 UTC

Description Qiaoling Tang 2020-10-20 03:11:29 UTC
Description of problem:
The EO enters CrashLoopBackOff after updating the kibana resource configurations in the clusterlogging instance.

EO logs:
$ oc logs -n openshift-operators-redhat elasticsearch-operator-5cff98d5d5-dgpbb 
{"level":"info","ts":1603161202.1153483,"logger":"cmd","msg":"Go Version: go1.13.4"}
{"level":"info","ts":1603161202.1153812,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1603161202.1153867,"logger":"cmd","msg":"Version of operator-sdk: v0.8.2"}
{"level":"info","ts":1603161202.1159978,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1603161202.4679542,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1603161202.4679945,"logger":"leader","msg":"Continuing as the leader."}
{"level":"info","ts":1603161202.858305,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1603161202.8615181,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"kibana-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1603161202.861915,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"elasticsearch-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1603161202.862184,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"proxyconfig-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1603161202.8623393,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"kibanasecret-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1603161202.8626356,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"trustedcabundle-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1603161203.299226,"logger":"cmd","msg":"This operator no longer honors the image specified by the custom resources so that it is able to properly coordinate the configuration with the image."}
{"level":"info","ts":1603161203.2992716,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1603161204.5996928,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"trustedcabundle-controller"}
{"level":"info","ts":1603161204.5997734,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"kibana-controller"}
{"level":"info","ts":1603161204.5998037,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1603161204.599835,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"proxyconfig-controller"}
{"level":"info","ts":1603161204.5998628,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"kibanasecret-controller"}
{"level":"info","ts":1603161204.6999161,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"proxyconfig-controller","worker count":1}
{"level":"info","ts":1603161204.7001863,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"trustedcabundle-controller","worker count":1}
{"level":"info","ts":1603161204.700282,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"kibana-controller","worker count":1}
{"level":"info","ts":1603161204.7006946,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1}
{"level":"info","ts":1603161204.702338,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"kibanasecret-controller","worker count":1}
E1020 02:33:25.661691       1 runtime.go:69] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/opt/rh/go-toolset-1.13/root/usr/lib/go-toolset-1.13-golang/src/runtime/panic.go:679
/opt/rh/go-toolset-1.13/root/usr/lib/go-toolset-1.13-golang/src/runtime/map_faststr.go:204
/go/src/github.com/openshift/elasticsearch-operator/pkg/utils/resources.go:18
/go/src/github.com/openshift/elasticsearch-operator/pkg/utils/resources.go:78
/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:414
/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:303
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/client-go/util/retry/util.go:64
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:203
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/client-go/util/retry/util.go:63
/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:290
/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:120
/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:67
/go/src/github.com/openshift/elasticsearch-operator/pkg/controller/kibana/controller.go:69
/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215
/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/opt/rh/go-toolset-1.13/root/usr/lib/go-toolset-1.13-golang/src/runtime/asm_amd64.s:1357
panic: assignment to entry in nil map [recovered]
	panic: assignment to entry in nil map

goroutine 628 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x105
panic(0x14be860, 0x18a0df0)
	/opt/rh/go-toolset-1.13/root/usr/lib/go-toolset-1.13-golang/src/runtime/panic.go:679 +0x1b2
github.com/openshift/elasticsearch-operator/pkg/utils.CompareResources(0x0, 0x0, 0xc000412630, 0xc000412660, 0x54, 0x4f, 0x25fb)
	/go/src/github.com/openshift/elasticsearch-operator/pkg/utils/resources.go:18 +0x1581
github.com/openshift/elasticsearch-operator/pkg/utils.AreResourcesDifferent(0x1681640, 0xc001bcfb00, 0x1681640, 0xc00243a900, 0x0)
	/go/src/github.com/openshift/elasticsearch-operator/pkg/utils/resources.go:78 +0x4b4
github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana.isDeploymentDifferent(0xc001bcfb00, 0xc00243a900, 0x6, 0x18c99e0)
	/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:414 +0x156
github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana.(*KibanaRequest).createOrUpdateKibanaDeployment.func1(0x0, 0x0)
	/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:303 +0xcb
k8s.io/client-go/util/retry.RetryOnConflict.func1(0x28, 0xc002408a80, 0xd)
	/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/client-go/util/retry/util.go:64 +0x33
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0xc00257d468, 0x0, 0xc002408a60)
	/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:203 +0xde
k8s.io/client-go/util/retry.RetryOnConflict(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0xc00257d6b0, 0x6, 0xc00055e274)
	/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/client-go/util/retry/util.go:63 +0xa0
github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana.(*KibanaRequest).createOrUpdateKibanaDeployment(0xc00257dad8, 0xc0003941a0, 0x0, 0x23f2a28)
	/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:290 +0x511
github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana.reconcileKibana(0xc0008c7080, 0x1905b60, 0xc0007a31d0, 0x192dea0, 0xc00092c480, 0xc0003941a0, 0x6, 0x18c8820)
	/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:120 +0x276
github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana.Reconcile(0xc0005dff20, 0x11, 0xc00055e274, 0x6, 0x1905b60, 0xc0007a31d0, 0x192dea0, 0xc00092c480, 0x0, 0x0)
	/go/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/kibana/reconciler.go:67 +0x171
github.com/openshift/elasticsearch-operator/pkg/controller/kibana.(*ReconcileKibana).Reconcile(0xc00052c400, 0xc0005dff20, 0x11, 0xc00055e274, 0x6, 0x5, 0x400, 0xc0002da000, 0xd)
	/go/src/github.com/openshift/elasticsearch-operator/pkg/controller/kibana/controller.go:69 +0x2a6
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0002ee460, 0x0)
	/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215 +0x20a
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1()
	/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158 +0x36
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc005f2e780)
	/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x5e
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc005f2e780, 0x3b9aca00, 0x0, 0x1, 0xc0002bc120)
	/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc005f2e780, 0x3b9aca00, 0xc0002bc120)
	/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
	/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157 +0x32e

$ oc get pod -n openshift-operators-redhat
NAME                                      READY   STATUS             RESTARTS   AGE
elasticsearch-operator-5cff98d5d5-dgpbb   0/1     CrashLoopBackOff   5          27m
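
The stack trace points at pkg/utils/resources.go:18 inside CompareResources, which is Go's "assignment to entry in nil map" panic: with `resources: {}` in the CR, the deserialized corev1.ResourceRequirements has nil Limits/Requests maps, and writing into them panics. Below is a minimal, standalone sketch of the failure pattern and of the kind of guard that avoids it; it is illustrative only, not the operator's actual code:

package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// `resources: {}` deserializes to an empty struct:
	// both Limits and Requests are nil maps.
	current := corev1.ResourceRequirements{}

	// current.Requests[corev1.ResourceMemory] = resource.MustParse("1Gi")
	// would panic here with "assignment to entry in nil map".

	// Initializing the map first avoids the panic.
	if current.Requests == nil {
		current.Requests = corev1.ResourceList{}
	}
	current.Requests[corev1.ResourceMemory] = resource.MustParse("1Gi") // safe
}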


Version-Release number of selected component (if applicable):
elasticsearch-operator.4.5.0-202010161522.p0

How reproducible:
Always

Steps to Reproduce:
1. deploy logging 4.5
2. create clusterlogging instance with:
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy: 
      application:
        maxAge: 1d
      infra:
        maxAge: 3h
      audit:
        maxAge: 2w
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: "SingleRedundancy"
      resources:
        requests:
          memory: "2Gi"
      storage:
        storageClassName: "standard"
        size: "20Gi"
  visualization:
    type: "kibana"
    kibana:
      resources: {}
      replicas: 1
  collection:
    logs:
      type: "fluentd"
      fluentd: {}
3. wait until all the EFK pods become Running, then update the kibana resource configurations (example commands for applying and checking the update follow step 4) to:
    managementState: Managed
    visualization:
      kibana:
        proxy:
          resources:
            limits:
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 1Gi
        replicas: 1
        resources:
          limits:
            cpu: 1000m
            memory: 4Gi
          requests:
            cpu: 800m
            memory: 2Gi
      type: kibana
4. check the EO status
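
For steps 3 and 4, one way to apply and check the update, using the instance name and namespaces from this report:

$ oc edit clusterlogging instance -n openshift-logging
  # set spec.visualization.kibana to the values in step 3, then save
$ oc get pods -n openshift-operators-redhat
  # the EO pod goes into CrashLoopBackOff
$ oc logs -n openshift-operators-redhat deployment/elasticsearch-operator
  # shows the "assignment to entry in nil map" panic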

Actual results:
The elasticsearch-operator pod panics with "assignment to entry in nil map" and enters CrashLoopBackOff.

Expected results:
The elasticsearch-operator keeps running and rolls out the updated Kibana resource configuration.

Additional info:
elasticsearch-operator.4.5.0-202010081312.p0 (released version) also has this issue.

Comment 1 Qiaoling Tang 2020-10-20 03:12:19 UTC
elasticsearch-operator.4.6.0-202010140833.p0 has the same issue.

Comment 2 Qiaoling Tang 2020-10-20 03:30:37 UTC
> 1. deploy logging 4.5
> 2. create clusterlogging instance with:
> apiVersion: "logging.openshift.io/v1"
> kind: "ClusterLogging"
> metadata:
>   name: "instance"
>   namespace: "openshift-logging"
> spec:
>   managementState: "Managed"
>   logStore:
>     type: "elasticsearch"
>     retentionPolicy: 
>       application:
>         maxAge: 1d
>       infra:
>         maxAge: 3h
>       audit:
>         maxAge: 2w
>     elasticsearch:
>       nodeCount: 3
>       redundancyPolicy: "SingleRedundancy"
>       resources:
>         requests:
>           memory: "2Gi"
>       storage:
>         storageClassName: "standard"
>         size: "20Gi"
>   visualization:
>     type: "kibana"
>     kibana:
>       resources: {}
>       replicas: 1
>   collection:
>     logs:
>       type: "fluentd"
>       fluentd: {}

I found the issue only happens when `spec.visualization.kibana.resources: {}` is present in the clusterlogging instance. If I create the clusterlogging instance with the YAML below and follow the same steps, the issue does not occur (a quick check for the empty stanza is shown at the end of this comment).

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy: 
      application:
        maxAge: 1d
      infra:
        maxAge: 3h
      audit:
        maxAge: 2w
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: "SingleRedundancy"
      resources:
        requests:
          memory: "2Gi"
      storage:
        storageClassName: "standard"
        size: "20Gi"
  visualization:
    type: "kibana"
    kibana:
      replicas: 1
  collection:
    logs:
      type: "fluentd"
      fluentd: {}


> 3. wait until all the EFK pods become Running, update the kibana resource
> configurations to:
>     managementState: Managed
>     visualization:
>       kibana:
>         proxy:
>           resources:
>             limits:
>               memory: 1Gi
>             requests:
>               cpu: 100m
>               memory: 1Gi
>         replicas: 1
>         resources:
>           limits:
>             cpu: 1000m
>             memory: 4Gi
>           requests:
>             cpu: 800m
>             memory: 2Gi
>       type: kibana
> 4. check the EO status
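
One way to check whether an existing instance carries the empty stanza that triggers the crash, for example:

$ oc get clusterlogging instance -n openshift-logging -o jsonpath='{.spec.visualization.kibana.resources}'
  # no output means the field is absent; `resources: {}` typically renders as an empty map (e.g. "map[]")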

Comment 3 Hui Kang 2020-10-21 13:38:00 UTC
Doc Text: Previously, the operator failed when the current Kibana resources were set to resources: {}
Doc type: Bug fix

Comment 5 Qiaoling Tang 2020-10-27 08:18:34 UTC
Verified with quay.io/openshift/origin-elasticsearch-operator@sha256:1a1446fab00689c1e1eb256ad57be20ef0b2215236841564254862d888efd007

