Description of problem:
The valid value of skippedimagestreams|skippedtemplates must be an array. The samples-operator pod crashes when skippedimagestreams|skippedtemplates is set to an invalid value (such as jenkins,ruby|jenkins-ephemeral,jenkins-persistent).

Version-Release number of selected component (if applicable):
$ oc get clusterversion version
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.alpha-2018-12-06-005835   True        False         5h      Cluster version is 4.0.0-0.alpha-2018-12-06-005835

How reproducible:
always

Steps to Reproduce:
1. Set skippedTemplates: jenkins-ephemeral,jenkins-persistent in samplesresource
2. Check samples operator pod
3.

Actual results:
$ oc get pods
NAME                                        READY   STATUS             RESTARTS   AGE
cluster-samples-operator-859558f8bd-q42n6   0/1     CrashLoopBackOff   6          11m
$ oc logs -f cluster-samples-operator-859558f8bd-q42n6
time="2018-12-06T08:04:35Z" level=info msg="Go Version: go1.10.3"
time="2018-12-06T08:04:35Z" level=info msg="Go OS/Arch: linux/amd64"
time="2018-12-06T08:04:35Z" level=info msg="operator-sdk Version: 0.0.5+git"
time="2018-12-06T08:04:35Z" level=info msg="Metrics service cluster-samples-operator created"
time="2018-12-06T08:04:35Z" level=info msg="Watching samplesoperator.config.openshift.io/v1alpha1, SamplesResource, openshift-cluster-samples-operator, 600"
time="2018-12-06T08:04:35Z" level=info msg="Watching secrets"
time="2018-12-06T08:04:35Z" level=info msg="template client &v1.TemplateV1Client{restClient:(*rest.RESTClient)(0xc42035d980)}"
time="2018-12-06T08:04:35Z" level=info msg="image client &v1.ImageV1Client{restClient:(*rest.RESTClient)(0xc42035da40)}"
time="2018-12-06T08:04:35Z" level=info msg="creating default SamplesResource"
time="2018-12-06T08:04:38Z" level=info msg="got already exists error on create default"
ERROR: logging before flag.Parse: E1206 08:04:39.013860 1 runtime.go:66] Observed a panic: &errors.errorString{s:"failed to decode json data with gvk(samplesoperator.config.openshift.io/v1alpha1, 
Kind=SamplesResource): v1alpha1.SamplesResource.Spec: v1alpha1.SamplesResourceSpec.SkippedTemplates: []string: decode slice: expect [ or n, but found \", error found in #10 byte of ...|mplates\":\"jenkins-ep|..., bigger context ...|egistry\":\"registry.redhat.io\",\"skippedTemplates\":\"jenkins-ephemeral,jenkins-persistent\"},\"status\":{\"|..."} (failed to decode json data with gvk(samplesoperator.config.openshift.io/v1alpha1, Kind=SamplesResource): v1alpha1.SamplesResource.Spec: v1alpha1.SamplesResourceSpec.SkippedTemplates: []string: decode slice: expect [ or n, but found ", error found in #10 byte of ...|mplates":"jenkins-ep|..., bigger context ...|egistry":"registry.redhat.io","skippedTemplates":"jenkins-ephemeral,jenkins-persistent"},"status":{"|...) /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72 /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65 /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51 /usr/local/go/src/runtime/asm_amd64.s:573 /usr/local/go/src/runtime/panic.go:502 /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/util/k8sutil/k8sutil.go:83 /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:77 /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:52 /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:36 /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:92 /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 
/go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 /usr/local/go/src/runtime/asm_amd64.s:2361 panic: failed to decode json data with gvk(samplesoperator.config.openshift.io/v1alpha1, Kind=SamplesResource): v1alpha1.SamplesResource.Spec: v1alpha1.SamplesResourceSpec.SkippedTemplates: []string: decode slice: expect [ or n, but found ", error found in #10 byte of ...|mplates":"jenkins-ep|..., bigger context ...|egistry":"registry.redhat.io","skippedTemplates":"jenkins-ephemeral,jenkins-persistent"},"status":{"|... [recovered] panic: failed to decode json data with gvk(samplesoperator.config.openshift.io/v1alpha1, Kind=SamplesResource): v1alpha1.SamplesResource.Spec: v1alpha1.SamplesResourceSpec.SkippedTemplates: []string: decode slice: expect [ or n, but found ", error found in #10 byte of ...|mplates":"jenkins-ep|..., bigger context ...|egistry":"registry.redhat.io","skippedTemplates":"jenkins-ephemeral,jenkins-persistent"},"status":{"|... 
goroutine 158 [running]: github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0) /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107 panic(0x1092380, 0xc42077f060) /usr/local/go/src/runtime/panic.go:502 +0x229 github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/util/k8sutil.RuntimeObjectFromUnstructured(0xc42000e0f8, 0xc42000e0f8, 0x11) /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/util/k8sutil/k8sutil.go:83 +0x3c9 github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk.(*informer).sync(0xc4200d8240, 0xc4200dbde0, 0x11, 0x1053460, 0xc4202f1e20) /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:77 +0xcc github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk.(*informer).processNextItem(0xc4200d8240, 0xc4200b2700) /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:52 +0xd2 github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk.(*informer).runWorker(0xc4200d8240) /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:36 +0x2b github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk.(*informer).(github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk.runWorker)-fm() /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:92 +0x2a 
github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc4202f0a20)
    /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc4202f0a20, 0x3b9aca00, 0x0, 0x8b00000001, 0x0)
    /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc4202f0a20, 0x3b9aca00, 0x0)
    /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk.(*informer).Run
    /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:92 +0x209

Expected results:
The invalid value should be rejected and not saved to the samplesresource.

Additional info:
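The decode failure in the log above can be reproduced in isolation. The following is a minimal sketch, not the operator's code: `samplesSpec` and `decodeSpec` are hypothetical names, the struct is a trimmed-down stand-in for SamplesResourceSpec, and it uses the standard library `encoding/json` rather than the decoder in the stack trace, so the error text will differ. It shows that a comma-separated string cannot be decoded into a `[]string` field, and that the error can be returned to the caller instead of triggering a panic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// samplesSpec is a hypothetical, trimmed-down stand-in for SamplesResourceSpec.
type samplesSpec struct {
	SkippedTemplates []string `json:"skippedTemplates,omitempty"`
}

// decodeSpec hands the decode error back to the caller instead of panicking.
func decodeSpec(raw string) (samplesSpec, error) {
	var spec samplesSpec
	err := json.Unmarshal([]byte(raw), &spec)
	return spec, err
}

func main() {
	// A plain string where a string array is expected: decoding fails.
	_, err := decodeSpec(`{"skippedTemplates":"jenkins-ephemeral,jenkins-persistent"}`)
	fmt.Println("string form:", err)

	// The array form decodes cleanly.
	spec, err := decodeSpec(`{"skippedTemplates":["jenkins-ephemeral","jenkins-persistent"]}`)
	fmt.Println("array form:", spec.SkippedTemplates, err)
}
```

A caller that logs the returned error and skips the object keeps running, which is the behaviour asked for in the expected results.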
Your yaml most likely is in the wrong format. It is unfortunate that the sdk panics:
/usr/local/go/src/runtime/panic.go:502
/go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/util/k8sutil/k8sutil.go:83
Looking at the latest version of the sdk, they no longer call panic on formatting errors like this. In fact, only their metrics package calls panic, during its initialization, when trying to get its kube client from the in-cluster config. So presumably, @Ben, this bug could serve as the motivation / argument for upgrading the sdk for this operator.
In the interim, @XiuJuan, please provide a copy/paste of the yaml you used and I'll help you correct the formatting to assist your testing. Once we get that sorted out, Ben and I should be able to reach a consensus on upgrading the sdk.
Gabe, I know my setting is invalid. As stated in the expected results in comment #0, this invalid setting should be rejected rather than saved in the samplesresource; otherwise this bug will happen.
$ oc get samplesresources -o yaml
apiVersion: v1
items:
- apiVersion: samplesoperator.config.openshift.io/v1alpha1
  kind: SamplesResource
  metadata:
    creationTimestamp: 2018-12-07T00:56:06Z
    finalizers:
    - samplesoperator.config.openshift.io/finalizer
    generation: 1
    name: openshift-samples
    namespace: ""
    resourceVersion: "20398"
    selfLink: /apis/samplesoperator.config.openshift.io/v1alpha1/samplesresources/openshift-samples
    uid: e09b17b1-f9ba-11e8-a3ef-0ea829f86d10
  spec:
    architectures:
    - x86_64
    installType: centos
    managementState: Managed
    skippedTemplates: jenkins-persistent,jenkins-ephemeral
  status:
    architectures:
    - x86_64
    conditions:
    - lastTransitionTime: 2018-12-07T00:56:26Z
      lastUpdateTime: 2018-12-07T00:56:26Z
      status: "True"
      type: SamplesExist
    - lastTransitionTime: 2018-12-07T00:56:03Z
      lastUpdateTime: 2018-12-07T00:56:03Z
      status: "False"
      type: ImportCredentialsExists
    - lastTransitionTime: 2018-12-07T00:56:03Z
      lastUpdateTime: 2018-12-07T00:56:03Z
      status: "True"
      type: ConfigurationValid
    - lastTransitionTime: 2018-12-07T00:58:07Z
      lastUpdateTime: 2018-12-07T00:58:07Z
      status: "False"
      type: ChangesInProgress
    - lastTransitionTime: 2018-12-07T00:56:03Z
      lastUpdateTime: 2018-12-07T00:56:03Z
      status: "False"
      type: PendingRemove
    installType: centos
    managementState: Managed
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
The samplesresource should also handle the scenario below correctly: repeated values for skippedTemplates should be saved as one.
$ oc get samplesresources -o yaml
apiVersion: v1
items:
- apiVersion: samplesoperator.config.openshift.io/v1alpha1
  kind: SamplesResource
  metadata:
    creationTimestamp: 2018-12-07T00:56:06Z
    finalizers:
    - samplesoperator.config.openshift.io/finalizer
    generation: 1
    name: openshift-samples
    namespace: ""
    resourceVersion: "26876"
    selfLink: /apis/samplesoperator.config.openshift.io/v1alpha1/samplesresources/openshift-samples
    uid: e09b17b1-f9ba-11e8-a3ef-0ea829f86d10
  spec:
    architectures:
    - x86_64
    installType: centos
    managementState: Managed
    skippedTemplates:
    - jenkins-persistent
    - jenkins-persistent
  status:
    architectures:
    - x86_64
    conditions:
    - lastTransitionTime: 2018-12-07T00:56:26Z
      lastUpdateTime: 2018-12-07T00:56:26Z
      status: "True"
      type: SamplesExist
    - lastTransitionTime: 2018-12-07T00:56:03Z
      lastUpdateTime: 2018-12-07T00:56:03Z
      status: "False"
      type: ImportCredentialsExists
    - lastTransitionTime: 2018-12-07T00:56:03Z
      lastUpdateTime: 2018-12-07T00:56:03Z
      status: "True"
      type: ConfigurationValid
    - lastTransitionTime: 2018-12-07T01:26:23Z
      lastUpdateTime: 2018-12-07T01:26:23Z
      status: "False"
      type: ChangesInProgress
    - lastTransitionTime: 2018-12-07T00:56:03Z
      lastUpdateTime: 2018-12-07T00:56:03Z
      status: "False"
      type: PendingRemove
    installType: centos
    managementState: Managed
    skippedTemplates:
    - jenkins-persistent
    - jenkins-persistent
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
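Collapsing repeated skippedTemplates entries, as requested in the comment above, could be done with a small helper along these lines. This is only a sketch: `dedupe` is a hypothetical name and nothing in this report says where, or whether, the operator performs such normalization.

```go
package main

import "fmt"

// dedupe returns names with repeated entries removed, keeping first-seen order.
func dedupe(names []string) []string {
	seen := make(map[string]struct{}, len(names))
	out := make([]string, 0, len(names))
	for _, n := range names {
		if _, ok := seen[n]; ok {
			continue // already recorded, skip the repeat
		}
		seen[n] = struct{}{}
		out = append(out, n)
	}
	return out
}

func main() {
	fmt.Println(dedupe([]string{"jenkins-persistent", "jenkins-persistent"}))
}
```

Keeping the first-seen order means the stored list stays stable across repeated reconciles, so the normalization itself would not look like a spec change.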
ah ... right, sorry, I missed what you said in your description, XiuJuan, about putting in an invalid value on purpose. And absolutely, it should not crash the operator. As I mentioned to Ben, I think this is what pushes us to upgrade the sdk in the samples operator, to get the changes so it does not call panic when invalid yaml is provided.
oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-9   True        False         3h      Cluster version is 4.0.0-9
The invalid value can still be saved, but the operator pod doesn't crash now.
Management State: Managed
Skipped Imagestreams: jenkins,ruby
OK, with the changes in https://github.com/openshift/cluster-samples-operator/pull/71 (still under review with Ben), I've confirmed that the invalid yaml for the samples struct noted in the description does not cause the panic.
That said, the expectation that the yaml won't be saved will not hold for 4.0. As long as the yaml is syntactically valid, it will be saved. The entry `skippedTemplates: jenkins-persistent,jenkins-ephemeral` is valid yaml. It is not a string array as expected for the SamplesResource, but a yaml key with a string value of `jenkins-persistent,jenkins-ephemeral`.
You'll see logs like:
ERROR: logging before flag.Parse: E0108 20:06:58.490247 1 streamwatcher.go:109] Unable to decode an event from the watch stream: unable to decode watch event: v1alpha1.SamplesResource.Spec: v1alpha1.SamplesResourceSpec.SkippedTemplates: []string: decode slice: expect [ or n, but found ", error found in #10 byte of ...|mplates":"jenkins-pe|..., bigger context ...|","managementState":"Managed","skippedTemplates":"jenkins-persistent,jenkins-ephemeral","version":"4|...
ERROR: logging before flag.Parse: W0108 20:06:58.490437 1 reflector.go:341] github.com/openshift/cluster-samples-operator/pkg/generated/informers/externalversions/factory.go:58: watch of *v1alpha1.SamplesResource ended with: very short watch: github.com/openshift/cluster-samples-operator/pkg/generated/informers/externalversions/factory.go:58: Unexpected watch close - watch lasted less than a second and no items received
ERROR: logging before flag.Parse: E0108 20:06:59.498704 1 reflector.go:205] github.com/openshift/cluster-samples-operator/pkg/generated/informers/externalversions/factory.go:58: Failed to list *v1alpha1.SamplesResource: v1alpha1.SamplesResourceList.Items: []v1alpha1.SamplesResource: v1alpha1.SamplesResource.Spec: v1alpha1.SamplesResourceSpec.SkippedTemplates: []string: decode slice: expect [ or n, but found ", error found in #10 byte of ...|mplates":"jenkins-pe|..., bigger context ...|","managementState":"Managed","skippedTemplates":"jenkins-persistent,jenkins-ephemeral","version":"4|...
with the reflector.go log repeating.
There are initial discussions around custom validators being provided for the 4.0 CRDs, but that will most likely be post 4.0.
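To illustrate the kind of check a custom validator might perform, here is a rough sketch that inspects the untyped JSON form of the object before any typed decode. It is an assumption-laden illustration: `validateStringList` is a hypothetical helper, not part of the operator, the sdk, or any CRD validation mechanism discussed above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// validateStringList checks that spec[field], when present, is a JSON array
// whose elements are all strings. It is a hypothetical helper for illustration.
func validateStringList(raw []byte, field string) error {
	var obj struct {
		Spec map[string]interface{} `json:"spec"`
	}
	if err := json.Unmarshal(raw, &obj); err != nil {
		return fmt.Errorf("cannot decode object: %v", err)
	}
	val, ok := obj.Spec[field]
	if !ok || val == nil {
		return nil // an absent or null field is allowed
	}
	items, ok := val.([]interface{})
	if !ok {
		return fmt.Errorf("spec.%s must be an array of strings, got %T", field, val)
	}
	for i, item := range items {
		if _, ok := item.(string); !ok {
			return fmt.Errorf("spec.%s[%d] must be a string, got %T", field, i, item)
		}
	}
	return nil
}

func main() {
	bad := []byte(`{"spec":{"skippedTemplates":"jenkins-persistent,jenkins-ephemeral"}}`)
	good := []byte(`{"spec":{"skippedTemplates":["jenkins-persistent","jenkins-ephemeral"]}}`)
	fmt.Println("string form:", validateStringList(bad, "skippedTemplates"))
	fmt.Println("array form:", validateStringList(good, "skippedTemplates"))
}
```

A check like this only helps if it runs before the object is persisted; run inside the operator after the fact, it could at best report the problem, not reject the save.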
PR has merged ... look for the next installable level with the change
Can't reproduce this bug with registry.svc.ci.openshift.org/openshifr/origin-release:4.0.0-0.alpha-2019-01-11-075335
# oc get clusterversion
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.alpha-2019-01-11-075335   True        False         1h      Cluster version is 4.0.0-0.alpha-2019-01-11-075335
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758