Bug 1656740
| Summary: | Samples-operator pod go to crash when set skippedimagestreams|skippedtemplates to invalid values | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | XiuJuan Wang <xiuwang> |
| Component: | ImageStreams | Assignee: | Gabe Montero <gmontero> |
| Status: | CLOSED ERRATA | QA Contact: | XiuJuan Wang <xiuwang> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.1.0 | CC: | aos-bugs, bparees, jokerman, mmccomas, wzheng, xiuwang |
| Target Milestone: | --- | ||
| Target Release: | 4.1.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: |
undefined
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-06-04 10:41:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Your yaml most likely is in the wrong format. Unfortunate that the sdk panics: /usr/local/go/src/runtime/panic.go:502 /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/util/k8sutil/k8sutil.go:83 Looking at the latest version of the sdk, they no longer call panic on formatting errors like this. In fact, only their metrics package calls panic on its initialization trying to get its kube client from the in cluster config. So presumably @Ben this bug could serve as the motivation / argument for upgrading the sdk for this operator. In the interim, please provide an copy/paste of the yaml you used @XiuJuan and I'll help you correct the formatting to assist your testing. Once we get that sorted out, Ben and I should probably be able to get to a consensus on upgrading the sdk. Gabe, I know my setting invalid. As my expect result in the comment #0, this invalid setting should be rejected to save in samplesresource, otherwise this bug will happen. $ oc get samplesresources -o yaml apiVersion: v1 items: - apiVersion: samplesoperator.config.openshift.io/v1alpha1 kind: SamplesResource metadata: creationTimestamp: 2018-12-07T00:56:06Z finalizers: - samplesoperator.config.openshift.io/finalizer generation: 1 name: openshift-samples namespace: "" resourceVersion: "20398" selfLink: /apis/samplesoperator.config.openshift.io/v1alpha1/samplesresources/openshift-samples uid: e09b17b1-f9ba-11e8-a3ef-0ea829f86d10 spec: architectures: - x86_64 installType: centos managementState: Managed skippedTemplates: jenkins-persistent,jenkins-ephemeral status: architectures: - x86_64 conditions: - lastTransitionTime: 2018-12-07T00:56:26Z lastUpdateTime: 2018-12-07T00:56:26Z status: "True" type: SamplesExist - lastTransitionTime: 2018-12-07T00:56:03Z lastUpdateTime: 2018-12-07T00:56:03Z status: "False" type: ImportCredentialsExists - lastTransitionTime: 2018-12-07T00:56:03Z lastUpdateTime: 2018-12-07T00:56:03Z status: "True" type: ConfigurationValid - lastTransitionTime: 2018-12-07T00:58:07Z lastUpdateTime: 2018-12-07T00:58:07Z status: "False" type: ChangesInProgress - lastTransitionTime: 2018-12-07T00:56:03Z lastUpdateTime: 2018-12-07T00:56:03Z status: "False" type: PendingRemove installType: centos managementState: Managed kind: List metadata: resourceVersion: "" selfLink: "" Samplesresource also should treat below senario correctly:
should save repeat values for skippedTemplates as one.
$ oc get samplesresources -o yaml
apiVersion: v1
items:
- apiVersion: samplesoperator.config.openshift.io/v1alpha1
kind: SamplesResource
metadata:
creationTimestamp: 2018-12-07T00:56:06Z
finalizers:
- samplesoperator.config.openshift.io/finalizer
generation: 1
name: openshift-samples
namespace: ""
resourceVersion: "26876"
selfLink: /apis/samplesoperator.config.openshift.io/v1alpha1/samplesresources/openshift-samples
uid: e09b17b1-f9ba-11e8-a3ef-0ea829f86d10
spec:
architectures:
- x86_64
installType: centos
managementState: Managed
skippedTemplates:
- jenkins-persistent
- jenkins-persistent
status:
architectures:
- x86_64
conditions:
- lastTransitionTime: 2018-12-07T00:56:26Z
lastUpdateTime: 2018-12-07T00:56:26Z
status: "True"
type: SamplesExist
- lastTransitionTime: 2018-12-07T00:56:03Z
lastUpdateTime: 2018-12-07T00:56:03Z
status: "False"
type: ImportCredentialsExists
- lastTransitionTime: 2018-12-07T00:56:03Z
lastUpdateTime: 2018-12-07T00:56:03Z
status: "True"
type: ConfigurationValid
- lastTransitionTime: 2018-12-07T01:26:23Z
lastUpdateTime: 2018-12-07T01:26:23Z
status: "False"
type: ChangesInProgress
- lastTransitionTime: 2018-12-07T00:56:03Z
lastUpdateTime: 2018-12-07T00:56:03Z
status: "False"
type: PendingRemove
installType: centos
managementState: Managed
skippedTemplates:
- jenkins-persistent
- jenkins-persistent
kind: List
metadata:
resourceVersion: ""
selfLink: ""
ah ... right, sorry, I missed what you said in you description XiuJuan about putting an invalid value on purpose. And absolutely, it should not crash the operator. As I mentioned to Ben, I think this is what pushes us to upgrading the sdk in the samples operator, to get the changes so it does not call panic when invalid yaml is provided. oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.0.0-9 True False 3h Cluster version is 4.0.0-9
The invalid value still could be saved, but operator pod doesn't crash now.
Management State: Managed
Skipped Imagestreams:
jenkins,ruby
OK with the changes in https://github.com/openshift/cluster-samples-operator/pull/71 (still under review with Ben), I've confirmed that the invalid yaml for the samples struct noted in the description does not cause the panic. That said, the expectation that the yaml won't be saved will not hold for 4.0. As long as the yaml is valid from at valid yaml perspective, the yaml will be saved. The entry `skippedTemplates: jenkins-persistent,jenkins-ephemeral` is valid yaml. It is not a string array as expected for the SamplesResource, but a yaml key with a string value of `jenkins-persistent,jenkins-ephemeral` You'll see logs like: ERROR: logging before flag.Parse: E0108 20:06:58.490247 1 streamwatcher.go:109] Unable to decode an event from the watch stream: unable to decode watch event: v1alpha1.SamplesResource.Spec: v1alpha1.SamplesResourceSpec.SkippedTemplates: []string: decode slice: expect [ or n, but found ", error found in #10 byte of ...|mplates":"jenkins-pe|..., bigger context ...|","managementState":"Managed","skippedTemplates":"jenkins-persistent,jenkins-ephemeral","version":"4|... ERROR: logging before flag.Parse: W0108 20:06:58.490437 1 reflector.go:341] github.com/openshift/cluster-samples-operator/pkg/generated/informers/externalversions/factory.go:58: watch of *v1alpha1.SamplesResource ended with: very short watch: github.com/openshift/cluster-samples-operator/pkg/generated/informers/externalversions/factory.go:58: Unexpected watch close - watch lasted less than a second and no items received ERROR: logging before flag.Parse: E0108 20:06:59.498704 1 reflector.go:205] github.com/openshift/cluster-samples-operator/pkg/generated/informers/externalversions/factory.go:58: Failed to list *v1alpha1.SamplesResource: v1alpha1.SamplesResourceList.Items: []v1alpha1.SamplesResource: v1alpha1.SamplesResource.Spec: v1alpha1.SamplesResourceSpec.SkippedTemplates: []string: decode slice: expect [ or n, but found ", error found in #10 byte of ...|mplates":"jenkins-pe|..., bigger context ...|","managementState":"Managed","skippedTemplates":"jenkins-persistent,jenkins-ephemeral","version":"4|... with the reflector.go log repeating. There is initial discussions around custom validators getting provided for the 4.0 CRDs, but that most likely will be post 4.0. PR has merged ... look for the next installable level with the change Can't reproduce this bug registry.svc.ci.openshift.org/openshifr/origin-release:4.0.0-0.alpha-2019-01-11-075335 #oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.0.0-0.alpha-2019-01-11-075335 True False 1h Cluster version is 4.0.0-0.alpha-2019-01-11-075335 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758 |
Description of problem: The valid value of skippedimagestreams|skippedtemplates should be in an array. Then samples-operator pod go to crash when set skippedimagestreams|skippedtemplate to invalid value(such as jenkins,ruby|jenkins-ephemeral,jenkins-persistent) Version-Release number of selected component (if applicable): $ oc get clusterversion version NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.0.0-0.alpha-2018-12-06-005835 True False 5h Cluster version is 4.0.0-0.alpha-2018-12-06-005835 How reproducible: always Steps to Reproduce: 1.Set skippedTemplates: jenkins-ephemeral,jenkins-persistent in samplesresource 2.Check samples operator pod 3. Actual results: $ oc get pods NAME READY STATUS RESTARTS AGE cluster-samples-operator-859558f8bd-q42n6 0/1 CrashLoopBackOff 6 11m $ oc logs -f cluster-samples-operator-859558f8bd-q42n6 time="2018-12-06T08:04:35Z" level=info msg="Go Version: go1.10.3" time="2018-12-06T08:04:35Z" level=info msg="Go OS/Arch: linux/amd64" time="2018-12-06T08:04:35Z" level=info msg="operator-sdk Version: 0.0.5+git" time="2018-12-06T08:04:35Z" level=info msg="Metrics service cluster-samples-operator created" time="2018-12-06T08:04:35Z" level=info msg="Watching samplesoperator.config.openshift.io/v1alpha1, SamplesResource, openshift-cluster-samples-operator, 600" time="2018-12-06T08:04:35Z" level=info msg="Watching secrets" time="2018-12-06T08:04:35Z" level=info msg="template client &v1.TemplateV1Client{restClient:(*rest.RESTClient)(0xc42035d980)}" time="2018-12-06T08:04:35Z" level=info msg="image client &v1.ImageV1Client{restClient:(*rest.RESTClient)(0xc42035da40)}" time="2018-12-06T08:04:35Z" level=info msg="creating default SamplesResource" time="2018-12-06T08:04:38Z" level=info msg="got already exists error on create default" ERROR: logging before flag.Parse: E1206 08:04:39.013860 1 runtime.go:66] Observed a panic: &errors.errorString{s:"failed to decode json data with gvk(samplesoperator.config.openshift.io/v1alpha1, Kind=SamplesResource): v1alpha1.SamplesResource.Spec: v1alpha1.SamplesResourceSpec.SkippedTemplates: []string: decode slice: expect [ or n, but found \", error found in #10 byte of ...|mplates\":\"jenkins-ep|..., bigger context ...|egistry\":\"registry.redhat.io\",\"skippedTemplates\":\"jenkins-ephemeral,jenkins-persistent\"},\"status\":{\"|..."} (failed to decode json data with gvk(samplesoperator.config.openshift.io/v1alpha1, Kind=SamplesResource): v1alpha1.SamplesResource.Spec: v1alpha1.SamplesResourceSpec.SkippedTemplates: []string: decode slice: expect [ or n, but found ", error found in #10 byte of ...|mplates":"jenkins-ep|..., bigger context ...|egistry":"registry.redhat.io","skippedTemplates":"jenkins-ephemeral,jenkins-persistent"},"status":{"|...) /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72 /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65 /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51 /usr/local/go/src/runtime/asm_amd64.s:573 /usr/local/go/src/runtime/panic.go:502 /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/util/k8sutil/k8sutil.go:83 /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:77 /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:52 /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:36 /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:92 /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 /usr/local/go/src/runtime/asm_amd64.s:2361 panic: failed to decode json data with gvk(samplesoperator.config.openshift.io/v1alpha1, Kind=SamplesResource): v1alpha1.SamplesResource.Spec: v1alpha1.SamplesResourceSpec.SkippedTemplates: []string: decode slice: expect [ or n, but found ", error found in #10 byte of ...|mplates":"jenkins-ep|..., bigger context ...|egistry":"registry.redhat.io","skippedTemplates":"jenkins-ephemeral,jenkins-persistent"},"status":{"|... [recovered] panic: failed to decode json data with gvk(samplesoperator.config.openshift.io/v1alpha1, Kind=SamplesResource): v1alpha1.SamplesResource.Spec: v1alpha1.SamplesResourceSpec.SkippedTemplates: []string: decode slice: expect [ or n, but found ", error found in #10 byte of ...|mplates":"jenkins-ep|..., bigger context ...|egistry":"registry.redhat.io","skippedTemplates":"jenkins-ephemeral,jenkins-persistent"},"status":{"|... goroutine 158 [running]: github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0) /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107 panic(0x1092380, 0xc42077f060) /usr/local/go/src/runtime/panic.go:502 +0x229 github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/util/k8sutil.RuntimeObjectFromUnstructured(0xc42000e0f8, 0xc42000e0f8, 0x11) /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/util/k8sutil/k8sutil.go:83 +0x3c9 github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk.(*informer).sync(0xc4200d8240, 0xc4200dbde0, 0x11, 0x1053460, 0xc4202f1e20) /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:77 +0xcc github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk.(*informer).processNextItem(0xc4200d8240, 0xc4200b2700) /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:52 +0xd2 github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk.(*informer).runWorker(0xc4200d8240) /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer-sync.go:36 +0x2b github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk.(*informer).(github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk.runWorker)-fm() /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:92 +0x2a github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc4202f0a20) /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54 github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc4202f0a20, 0x3b9aca00, 0x0, 0x8b00000001, 0x0) /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc4202f0a20, 0x3b9aca00, 0x0) /go/src/github.com/openshift/cluster-samples-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d created by github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk.(*informer).Run /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:92 +0x209 Expected results: Should reject to save the invalid value for samplesresource. Additional info: