Summary: | Samples operator pod crashes when skippedImagestreams or skippedTemplates is set to an invalid value | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | XiuJuan Wang <xiuwang> |
Component: | ImageStreams | Assignee: | Gabe Montero <gmontero> |
Status: | CLOSED ERRATA | QA Contact: | XiuJuan Wang <xiuwang> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.1.0 | CC: | aos-bugs, bparees, jokerman, mmccomas, wzheng, xiuwang |
Target Milestone: | --- | ||
Target Release: | 4.1.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: |
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2019-06-04 10:41:14 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: |
Description
XiuJuan Wang
2018-12-06 08:41:22 UTC
Your YAML is most likely in the wrong format. Unfortunately, the SDK panics:

    /usr/local/go/src/runtime/panic.go:502
    /go/src/github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/util/k8sutil/k8sutil.go:83

Looking at the latest version of the SDK, it no longer calls panic on formatting errors like this. In fact, only its metrics package calls panic, during initialization, when it tries to get its kube client from the in-cluster config. So presumably, @Ben, this bug could serve as the motivation for upgrading the SDK for this operator.

In the interim, @XiuJuan, please provide a copy/paste of the YAML you used and I'll help you correct the formatting to assist your testing. Once we get that sorted out, Ben and I should be able to reach a consensus on upgrading the SDK.

Gabe, I know my setting is invalid. As stated in my expected result in comment #0, this invalid setting should be rejected rather than saved in the SamplesResource; otherwise this bug happens.
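The comment above contrasts an SDK that panics on malformed input with newer releases that return an error instead. A minimal Go sketch of the error-returning style follows, assuming a trimmed-down spec type. `SamplesSpec`, `decodeSpec`, and `handleEvents` are hypothetical names, not the operator-sdk or cluster-samples-operator API, and the standard `encoding/json` package stands in for whatever decoder the operator actually uses.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// SamplesSpec is a hypothetical, trimmed-down stand-in for the
// SamplesResource spec; only the two skip lists are modeled.
type SamplesSpec struct {
	SkippedImagestreams []string `json:"skippedImagestreams,omitempty"`
	SkippedTemplates    []string `json:"skippedTemplates,omitempty"`
}

// decodeSpec returns an error for malformed input instead of panicking,
// so the caller decides how to react.
func decodeSpec(raw []byte) (SamplesSpec, error) {
	var spec SamplesSpec
	if err := json.Unmarshal(raw, &spec); err != nil {
		return SamplesSpec{}, fmt.Errorf("decoding samples spec: %w", err)
	}
	return spec, nil
}

// handleEvents decodes each raw object, logging and skipping the ones
// that fail, so one bad object does not take the whole process down.
func handleEvents(events [][]byte) (ok, failed int) {
	for _, raw := range events {
		spec, err := decodeSpec(raw)
		if err != nil {
			log.Printf("skipping invalid object: %v", err)
			failed++
			continue
		}
		log.Printf("skipped templates: %v", spec.SkippedTemplates)
		ok++
	}
	return ok, failed
}

func main() {
	ok, failed := handleEvents([][]byte{
		[]byte(`{"skippedTemplates":["jenkins-persistent","jenkins-ephemeral"]}`),
		// A string where a list is expected, as in this bug report.
		[]byte(`{"skippedTemplates":"jenkins-persistent,jenkins-ephemeral"}`),
	})
	fmt.Println(ok, failed) // 1 1
}
```

The point of the sketch is only that the decode failure becomes a value the caller can log and skip; a panic at the same spot terminates the process, which is the pod crash this bug describes.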

    $ oc get samplesresources -o yaml
    apiVersion: v1
    items:
    - apiVersion: samplesoperator.config.openshift.io/v1alpha1
      kind: SamplesResource
      metadata:
        creationTimestamp: 2018-12-07T00:56:06Z
        finalizers:
        - samplesoperator.config.openshift.io/finalizer
        generation: 1
        name: openshift-samples
        namespace: ""
        resourceVersion: "20398"
        selfLink: /apis/samplesoperator.config.openshift.io/v1alpha1/samplesresources/openshift-samples
        uid: e09b17b1-f9ba-11e8-a3ef-0ea829f86d10
      spec:
        architectures:
        - x86_64
        installType: centos
        managementState: Managed
        skippedTemplates: jenkins-persistent,jenkins-ephemeral
      status:
        architectures:
        - x86_64
        conditions:
        - lastTransitionTime: 2018-12-07T00:56:26Z
          lastUpdateTime: 2018-12-07T00:56:26Z
          status: "True"
          type: SamplesExist
        - lastTransitionTime: 2018-12-07T00:56:03Z
          lastUpdateTime: 2018-12-07T00:56:03Z
          status: "False"
          type: ImportCredentialsExists
        - lastTransitionTime: 2018-12-07T00:56:03Z
          lastUpdateTime: 2018-12-07T00:56:03Z
          status: "True"
          type: ConfigurationValid
        - lastTransitionTime: 2018-12-07T00:58:07Z
          lastUpdateTime: 2018-12-07T00:58:07Z
          status: "False"
          type: ChangesInProgress
        - lastTransitionTime: 2018-12-07T00:56:03Z
          lastUpdateTime: 2018-12-07T00:56:03Z
          status: "False"
          type: PendingRemove
        installType: centos
        managementState: Managed
    kind: List
    metadata:
      resourceVersion: ""
      selfLink: ""

The SamplesResource should also handle the scenario below correctly: repeated values for skippedTemplates should be saved as a single entry.

    $ oc get samplesresources -o yaml
    apiVersion: v1
    items:
    - apiVersion: samplesoperator.config.openshift.io/v1alpha1
      kind: SamplesResource
      metadata:
        creationTimestamp: 2018-12-07T00:56:06Z
        finalizers:
        - samplesoperator.config.openshift.io/finalizer
        generation: 1
        name: openshift-samples
        namespace: ""
        resourceVersion: "26876"
        selfLink: /apis/samplesoperator.config.openshift.io/v1alpha1/samplesresources/openshift-samples
        uid: e09b17b1-f9ba-11e8-a3ef-0ea829f86d10
      spec:
        architectures:
        - x86_64
        installType: centos
        managementState: Managed
        skippedTemplates:
        - jenkins-persistent
        - jenkins-persistent
      status:
        architectures:
        - x86_64
        conditions:
        - lastTransitionTime: 2018-12-07T00:56:26Z
          lastUpdateTime: 2018-12-07T00:56:26Z
          status: "True"
          type: SamplesExist
        - lastTransitionTime: 2018-12-07T00:56:03Z
          lastUpdateTime: 2018-12-07T00:56:03Z
          status: "False"
          type: ImportCredentialsExists
        - lastTransitionTime: 2018-12-07T00:56:03Z
          lastUpdateTime: 2018-12-07T00:56:03Z
          status: "True"
          type: ConfigurationValid
        - lastTransitionTime: 2018-12-07T01:26:23Z
          lastUpdateTime: 2018-12-07T01:26:23Z
          status: "False"
          type: ChangesInProgress
        - lastTransitionTime: 2018-12-07T00:56:03Z
          lastUpdateTime: 2018-12-07T00:56:03Z
          status: "False"
          type: PendingRemove
        installType: centos
        managementState: Managed
        skippedTemplates:
        - jenkins-persistent
        - jenkins-persistent
    kind: List
    metadata:
      resourceVersion: ""
      selfLink: ""

Ah... right, sorry, XiuJuan, I missed what you said in your description about putting in an invalid value on purpose. And absolutely, it should not crash the operator. As I mentioned to Ben, I think this is what pushes us to upgrade the SDK in the samples operator, to pick up the changes that stop it from calling panic when invalid YAML is provided.

    $ oc get clusterversion
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.0.0-9   True        False         3h      Cluster version is 4.0.0-9

The invalid value can still be saved, but the operator pod no longer crashes.

    Management State:      Managed
    Skipped Imagestreams:  jenkins,ruby

OK, with the changes in https://github.com/openshift/cluster-samples-operator/pull/71 (still under review with Ben), I've confirmed that the invalid YAML for the samples struct noted in the description does not cause the panic.

That said, the expectation that the YAML won't be saved will not hold for 4.0. As long as the YAML is syntactically valid, it will be saved. The entry `skippedTemplates: jenkins-persistent,jenkins-ephemeral` is valid YAML. It is not the string array the SamplesResource expects, but a YAML key with the string value `jenkins-persistent,jenkins-ephemeral`. You'll see logs like:

    ERROR: logging before flag.Parse: E0108 20:06:58.490247 1 streamwatcher.go:109] Unable to decode an event from the watch stream: unable to decode watch event: v1alpha1.SamplesResource.Spec: v1alpha1.SamplesResourceSpec.SkippedTemplates: []string: decode slice: expect [ or n, but found ", error found in #10 byte of ...|mplates":"jenkins-pe|..., bigger context ...|","managementState":"Managed","skippedTemplates":"jenkins-persistent,jenkins-ephemeral","version":"4|...
    ERROR: logging before flag.Parse: W0108 20:06:58.490437 1 reflector.go:341] github.com/openshift/cluster-samples-operator/pkg/generated/informers/externalversions/factory.go:58: watch of *v1alpha1.SamplesResource ended with: very short watch: github.com/openshift/cluster-samples-operator/pkg/generated/informers/externalversions/factory.go:58: Unexpected watch close - watch lasted less than a second and no items received
    ERROR: logging before flag.Parse: E0108 20:06:59.498704 1 reflector.go:205] github.com/openshift/cluster-samples-operator/pkg/generated/informers/externalversions/factory.go:58: Failed to list *v1alpha1.SamplesResource: v1alpha1.SamplesResourceList.Items: []v1alpha1.SamplesResource: v1alpha1.SamplesResource.Spec: v1alpha1.SamplesResourceSpec.SkippedTemplates: []string: decode slice: expect [ or n, but found ", error found in #10 byte of ...|mplates":"jenkins-pe|..., bigger context ...|","managementState":"Managed","skippedTemplates":"jenkins-persistent,jenkins-ephemeral","version":"4|...

with the reflector.go log repeating.

There are initial discussions about providing custom validators for the 4.0 CRDs, but that will most likely come after 4.0.

The PR has merged; look for the next installable build that includes the change.

Cannot reproduce this bug with registry.svc.ci.openshift.org/openshifr/origin-release:4.0.0-0.alpha-2019-01-11-075335:

    # oc get clusterversion
    NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.0.0-0.alpha-2019-01-11-075335   True        False         1h      Cluster version is 4.0.0-0.alpha-2019-01-11-075335

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
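The thread notes that custom validators for the CRDs were unlikely before 4.0, so a string such as `jenkins-persistent,jenkins-ephemeral` is saved and only fails later, in the operator's decoder. A minimal Go sketch of what such a pre-save check would need to detect follows: a wrong JSON type for a skip list, and the repeated-entry case from the second YAML dump. It is hypothetical throughout: `validateSpec` and the trimmed `spec` type are illustrative names, it uses the standard `encoding/json` package (the `decode slice: expect [ or n` wording in the logs appears to come from a different decoder), and it is not the validation OpenShift eventually shipped.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// spec is a hypothetical subset of the SamplesResource spec.
type spec struct {
	SkippedImagestreams []string `json:"skippedImagestreams,omitempty"`
	SkippedTemplates    []string `json:"skippedTemplates,omitempty"`
}

// validateSpec is a hypothetical pre-save check. It returns one message
// per problem found, or nil when the spec looks acceptable.
func validateSpec(raw []byte) []string {
	var s spec
	var problems []string
	if err := json.Unmarshal(raw, &s); err != nil {
		// A string where a list is expected surfaces as an UnmarshalTypeError.
		var typeErr *json.UnmarshalTypeError
		if errors.As(err, &typeErr) {
			return append(problems, fmt.Sprintf("%s: expected %s, got JSON %s",
				typeErr.Field, typeErr.Type, typeErr.Value))
		}
		return append(problems, err.Error())
	}
	// A slice (not a map) keeps the reporting order deterministic.
	lists := []struct {
		field string
		names []string
	}{
		{"skippedImagestreams", s.SkippedImagestreams},
		{"skippedTemplates", s.SkippedTemplates},
	}
	for _, l := range lists {
		seen := make(map[string]bool, len(l.names))
		for _, n := range l.names {
			if seen[n] {
				problems = append(problems, fmt.Sprintf("%s: duplicate entry %q", l.field, n))
			}
			seen[n] = true
		}
	}
	return problems
}

func main() {
	inputs := []string{
		`{"skippedTemplates":"jenkins-persistent,jenkins-ephemeral"}`,
		`{"skippedTemplates":["jenkins-persistent","jenkins-persistent"]}`,
		`{"skippedImagestreams":["jenkins","ruby"]}`,
	}
	for _, raw := range inputs {
		fmt.Println(validateSpec([]byte(raw)))
	}
}
```

Running a check like this at admission time, for example through the CRD's OpenAPI validation schema declaring the skip lists as arrays of strings, would reject the object before it is stored; checking only inside the operator, as here, can report the problem but cannot prevent the save.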