Bug 1662105

Summary: Elasticsearch-operator in "Error" status after executing "oc set env deployment/elasticsearch-clientdatamaster-0-1 REPLICA_SHARDS=1"
Product: OpenShift Container Platform
Component: Logging
Version: 4.1.0
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Type: Bug
Doc Type: No Doc Update
Reporter: Qiaoling Tang <qitang>
Assignee: Josef Karasek <jkarasek>
QA Contact: Anping Li <anli>
CC: aos-bugs, ewolinet, jcantril, rmeggins
Last Closed: 2019-06-04 10:41:27 UTC

Description Qiaoling Tang 2018-12-26 08:27:00 UTC
Description of problem:
Deploy logging with 3 ES nodes in the cluster. After all pods are in "Running" status, execute `oc set env deployment/elasticsearch-clientdatamaster-0-1 REPLICA_SHARDS=1`, then check the pods in the openshift-logging namespace: the elasticsearch-operator pod is in "Error" status and cannot start anymore. The elasticsearch-operator pod logs show:
panic: runtime error: index out of range [recovered]
	panic: runtime error: index out of range



$ oc set env deployment/elasticsearch-clientdatamaster-0-1 REPLICA_SHARDS=1
deployment.extensions/elasticsearch-clientdatamaster-0-1 updated
$ oc get pod
NAME                                                  READY     STATUS    RESTARTS   AGE
cluster-logging-operator-8866ff9c8-nq68f              1/1       Running   0          9m
elasticsearch-clientdatamaster-0-1-84d764899d-mn4zq   1/1       Running   0          7m
elasticsearch-clientdatamaster-0-2-56984bb76c-q2p25   1/1       Running   0          7m
elasticsearch-clientdatamaster-0-3-7cd67f75dd-j5fgc   1/1       Running   0          7m
elasticsearch-operator-86599f8849-ptstf               0/1       Error     1          9m

$ oc logs elasticsearch-operator-86599f8849-ptstf
time="2018-12-26T08:11:40Z" level=info msg="Go Version: go1.10.3"
time="2018-12-26T08:11:40Z" level=info msg="Go OS/Arch: linux/amd64"
time="2018-12-26T08:11:40Z" level=info msg="operator-sdk Version: 0.0.7"
time="2018-12-26T08:11:40Z" level=info msg="Metrics service elasticsearch-operator created"
time="2018-12-26T08:11:40Z" level=info msg="Watching logging.openshift.io/v1alpha1, Elasticsearch, openshift-logging, 5000000000"
E1226 08:11:41.889292       1 runtime.go:66] Observed a panic: "index out of range" (runtime error: index out of range)
/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/usr/local/go/src/runtime/panic.go:28
/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/deployment.go:61
/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/desirednodestate.go:312
/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/cluster.go:169
/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/cluster.go:50
/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/openshift/elasticsearch-operator/pkg/stub/handler.go:68
/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/openshift/elasticsearch-operator/pkg/stub/handler.go:29
/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:88
/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:52
/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:36
/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:98
/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:2361
panic: runtime error: index out of range [recovered]
	panic: runtime error: index out of range

goroutine 179 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x10eee80, 0x1a075e0)
	/usr/local/go/src/runtime/panic.go:502 +0x229
github.com/openshift/elasticsearch-operator/pkg/k8shandler.(*deploymentNode).isDifferent(0xc42033ec00, 0xc420588b40, 0x0, 0x5, 0x0)
	/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/deployment.go:61 +0xd7d
github.com/openshift/elasticsearch-operator/pkg/k8shandler.(*desiredNodeState).IsUpdateNeeded(0xc420588b40, 0x0)
	/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/desirednodestate.go:312 +0x79
github.com/openshift/elasticsearch-operator/pkg/k8shandler.(*ClusterState).getRequiredAction(0xc420744f80, 0xc42068a000, 0x0, 0x1252867, 0xd, 0xc420735fc0)
	/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/cluster.go:169 +0x125
github.com/openshift/elasticsearch-operator/pkg/k8shandler.CreateOrUpdateElasticsearchCluster(0xc42068a000, 0x1252867, 0xd, 0x1252867, 0xd, 0x8, 0xc4203fcc30)
	/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/openshift/elasticsearch-operator/pkg/k8shandler/cluster.go:50 +0x1cd
github.com/openshift/elasticsearch-operator/pkg/stub.Reconcile(0xc42068a000, 0xc4203fc480, 0xc42000e540)
	/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/openshift/elasticsearch-operator/pkg/stub/handler.go:68 +0x447
github.com/openshift/elasticsearch-operator/pkg/stub.(*Handler).Handle(0x1a3b8f8, 0x1361820, 0xc4200ba018, 0x134cc80, 0xc42068a000, 0x42ac00, 0x0, 0x0)
	/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/openshift/elasticsearch-operator/pkg/stub/handler.go:29 +0x6d
github.com/operator-framework/operator-sdk/pkg/sdk.(*informer).sync(0xc4201164d0, 0xc420660120, 0x1f, 0x10916e0, 0xc420374cf0)
	/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:88 +0x12d
github.com/operator-framework/operator-sdk/pkg/sdk.(*informer).processNextItem(0xc4201164d0, 0xc420300900)
	/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:52 +0xd2
github.com/operator-framework/operator-sdk/pkg/sdk.(*informer).runWorker(0xc4201164d0)
	/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:36 +0x2b
github.com/operator-framework/operator-sdk/pkg/sdk.(*informer).(github.com/operator-framework/operator-sdk/pkg/sdk.runWorker)-fm()
	/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:98 +0x2a
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc42026db50)
	/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc42026db50, 0x3b9aca00, 0x0, 0x1, 0x0)
	/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
k8s.io/apimachinery/pkg/util/wait.Until(0xc42026db50, 0x3b9aca00, 0x0)
	/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by github.com/operator-framework/operator-sdk/pkg/sdk.(*informer).Run
	/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:98 +0x209
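
For context on where this crashes: the top user frame is (*deploymentNode).isDifferent at deployment.go:61, which compares the operator's desired node state against the live deployment. The following is a hypothetical, self-contained reconstruction of that failure mode, not the actual operator code: indexing one env-var list by the other's positions panics as soon as `oc set env` appends a variable (REPLICA_SHARDS) that the desired spec lacks, while a length check turns the same situation into an ordinary "update needed" answer.

package main

import "fmt"

// envVar stands in for corev1.EnvVar to keep the sketch self-contained.
type envVar struct{ Name, Value string }

// buggyIsDifferent mirrors the suspected failure mode: it walks the live
// container's env list and indexes the desired list at the same position,
// so a variable injected from outside (`oc set env`) runs off the end.
func buggyIsDifferent(desired, current []envVar) bool {
	for i := range current {
		if desired[i] != current[i] { // panics when len(current) > len(desired)
			return true
		}
	}
	return false
}

// safeIsDifferent reports any length mismatch as a difference and never
// indexes out of bounds, so the operator reconciles instead of crashing.
func safeIsDifferent(desired, current []envVar) bool {
	if len(desired) != len(current) {
		return true
	}
	for i := range desired {
		if desired[i] != current[i] {
			return true
		}
	}
	return false
}

func main() {
	desired := []envVar{{Name: "DC_NAME", Value: "elasticsearch-clientdatamaster-0-1"}}
	current := append(append([]envVar{}, desired...), envVar{Name: "REPLICA_SHARDS", Value: "1"})
	fmt.Println(safeIsDifferent(desired, current)) // prints: true (no panic)
}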



Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.alpha-2018-12-25-225820   True        False         6h        Cluster version is 4.0.0-0.alpha-2018-12-25-225820

$ ./bin/openshift-install version
./bin/openshift-install v0.8.0-master-2-g5e7b36d6351c9cc773f1dadc64abf9d7041151b1

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging with 3 ES nodes in the cluster.
2. Execute `oc set env deployment/elasticsearch-clientdatamaster-0-1 REPLICA_SHARDS=1`.
3. Check the pods in the openshift-logging namespace.

Actual results:
The elasticsearch-operator pod is in "Error" status and never returns to "Running".

Expected results:
The elasticsearch-operator should handle the manual change to the deployment gracefully and stay in "Running" status.

Additional info:

Comment 1 Jeff Cantrill 2019-01-09 15:19:27 UTC
This is indeed a change we should handle gracefully, but the proper way to modify replica shards is via the CR, as documented here: https://github.com/openshift/cluster-logging-operator/pull/64/files#diff-76e731333fb756df3bff5ddb3b731c46R82 . The CR support for this will be available after https://github.com/openshift/cluster-logging-operator/pull/70 merges.
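
For reference, a minimal sketch of the supported route. This assumes the ClusterLogging CR shape that shipped with 4.1 (spec.logStore.elasticsearch.redundancyPolicy, where SingleRedundancy means one replica per primary shard); the exact schema was still landing in the PRs above at the time of this comment:

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  logStore:
    type: elasticsearch
    elasticsearch:
      nodeCount: 3                       # matches the 3-node cluster in this report
      redundancyPolicy: SingleRedundancy # one replica shard per primary, i.e. REPLICA_SHARDS=1

This would be applied with `oc apply -f` against the CR instead of patching the operator-managed deployment directly.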

Comment 3 Qiaoling Tang 2019-01-25 08:47:36 UTC
Verified in docker.io/openshift/origin-elasticsearch-operator@sha256:461d03c54f87ddc28846edbe78c8f3026a4508e3ab6c45c06be1ef9492f0be6e

Comment 6 errata-xmlrpc 2019-06-04 10:41:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758