Description of problem:

It appears that OLM validation of CRs against CRDs fails if the value of a string field is a blank string, i.e. `""`. See the explanation below.

Version-Release number of selected component (if applicable):
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23b1bcae3cfdc7afb1a57e1c8c27b99a24039d494977e7add5d902f311b53033

Issue:

Validate that the CRD schema is the same on master as on the cluster (in case the CRD is stale).

MachinePool CRD from master openshift/hive:
```
curl -s https://raw.githubusercontent.com/openshift/hive/master/config/crds/hive.openshift.io_machinepools.yaml | docker run --rm -i quay.io/app-sre/yq:3.4.1 yq r - "spec.validation.openAPIV3Schema.properties.spec.properties.labels"
additionalProperties:
  type: string
description: Map of label string keys and values that will be applied to the created MachineSet's MachineSpec. This list will overwrite any modifications made to Node labels on an ongoing basis.
type: object
```

MachinePool CRD in the bundle (the version matches the CSV name - see installplan):
```
curl -s https://xxxxxxxx/saas-hive-operator-bundle/-/raw/staging/hive/0.1.2834-sha2e18329/hive.openshift.io_machinepools.yaml | docker run --rm -i quay.io/app-sre/yq:3.4.1 yq r - "spec.validation.openAPIV3Schema.properties.spec.properties.labels"
additionalProperties:
  type: string
description: Map of label string keys and values that will be applied to the created MachineSet's MachineSpec. This list will overwrite any modifications made to Node labels on an ongoing basis.
type: object
```

MachinePool CRD on hive:
```
$ oc get crd machinepools.hive.openshift.io -o json | jq '.spec.versions[].schema.openAPIV3Schema.properties.spec.properties.labels'
{
  "additionalProperties": {
    "type": "string"
  },
  "description": "Map of label string keys and values that will be applied to the created MachineSet's MachineSpec. This list will overwrite any modifications made to Node labels on an ongoing basis.",
  "type": "object"
}
```

Look at the failed installplan:
```
oc get installplan install-p4hmh -n hive -o json | jq '.status.conditions[] | select(.reason=="InstallComponentFailed")'
{
  "lastTransitionTime": "2021-05-03T16:21:42Z",
  "lastUpdateTime": "2021-05-03T16:21:42Z",
  "message": "error validating existing CRs against new CRD's schema: machinepools.hive.openshift.io: error validating custom resource against new schema &apiextensions.CustomResourceValidation{OpenAPIV3Schema:(*apiextensions.JSONSchemaProps)(0xc034ffbe00)}: [].spec.labels.node-role.kubernetes.io/infra: Invalid value: \"null\": spec.labels.node-role.kubernetes.io/infra in body must be of type string: \"null\"",
  "reason": "InstallComponentFailed",
  "status": "False",
  "type": "Installed"
}
```

Installplan: http://file.rdu.redhat.com/jaharrin/install-p4hmh.json

The MachinePool API's `.spec` defines the labels field as follows:
```
Labels map[string]string `json:"labels,omitempty"`
```
https://github.com/openshift/hive/blob/master/apis/hive/v1/machinepool_types.go#L47-L51

OSD clusters are created with two MachinePools: a "worker" pool, whose `.spec.labels` field is `null` since it is not specified (which should be legal given the above), and an "infra" pool, which has the label `node-role.kubernetes.io/infra` defined as `""` (which should also be legal).
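For illustration, here is a minimal, stdlib-only Go sketch of why the two pools serialize differently. The `machinePoolSpec` struct is a hypothetical stand-in that mirrors only the `Labels` field quoted above (it is not the actual hive type): the unset worker labels map is dropped entirely by `omitempty`, while the infra pool keeps the key with an explicit empty-string value.
```
package main

import (
	"encoding/json"
	"fmt"
)

// machinePoolSpec is a hypothetical stand-in mirroring only the
// Labels map[string]string `json:"labels,omitempty"` field from the
// hive MachinePool spec referenced above.
type machinePoolSpec struct {
	Labels map[string]string `json:"labels,omitempty"`
}

func main() {
	// "worker" pool: labels never specified, so the map is nil.
	worker := machinePoolSpec{}

	// "infra" pool: node-role.kubernetes.io/infra set to the empty string.
	infra := machinePoolSpec{Labels: map[string]string{
		"node-role.kubernetes.io":       "infra",
		"node-role.kubernetes.io/infra": "",
	}}

	w, _ := json.Marshal(worker)
	i, _ := json.Marshal(infra)
	fmt.Println(string(w)) // {} -- labels omitted entirely because of omitempty
	fmt.Println(string(i)) // {"labels":{"node-role.kubernetes.io":"infra","node-role.kubernetes.io/infra":""}}
}
```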
Example:
```
oc get machinepools --all-namespaces -o json | jq '.items[] | select(.metadata.name|match("vkareh-sts")) | {"name": .metadata.name, "labels": .spec.labels}' | more
{
  "name": "vkareh-sts-infra",
  "labels": {
    "node-role.kubernetes.io": "infra",
    "node-role.kubernetes.io/infra": ""
  }
}
{
  "name": "vkareh-sts-worker",
  "labels": null
}
```

I am able to apply the MachinePool with the blank string without hitting a validation issue in the API:
```
{
  "apiVersion": "hive.openshift.io/v1",
  "kind": "MachinePool",
  "metadata": {
    "labels": {
      "api.openshift.com/id": "1kep9mkvjg986je99q2uf3ffohbqvq7g"
    },
    "name": "vkareh-sts-infra",
    "namespace": "uhc-vkareh-1kep9mkvjg986je99q2uf3ffohbqvq7g"
  },
  "spec": {
    "clusterDeploymentRef": {
      "name": "vkareh-sts"
    },
    "labels": {
      "node-role.kubernetes.io": "infra",
      "node-role.kubernetes.io/infra": ""
    },
    "name": "infra",
    "platform": {
      "aws": {
        "rootVolume": {
          "iops": 100,
          "size": 300,
          "type": "gp2"
        },
        "type": "r5.xlarge",
        "zones": [
          "us-east-1a"
        ]
      }
    },
    "replicas": 2,
    "taints": [
      {
        "effect": "NoSchedule",
        "key": "node-role.kubernetes.io/infra"
      }
    ]
  }
}
```

Applying the above:
```
oc apply -f vkareh-machine-pools-infra.json
machinepool.hive.openshift.io/vkareh-sts-infra configured
```

The "error validating existing CRs against new CRD's schema" message in the installplan condition suggests that this validation is failing:
https://github.com/operator-framework/operator-lifecycle-manager/blob/908aa611e789a12ce2194868bc12d76eed2eba3c/pkg/controller/operators/catalog/operator.go#L1601-L1603

Looking at the `ValidateCustomResource` function (https://github.com/operator-framework/operator-lifecycle-manager/blob/1a5375b2aca63e2bcad7a83c303d7d7fe5bb8edf/vendor/k8s.io/apiextensions-apiserver/pkg/apiserver/validation/validation.go#L47), OLM is using the dynamic client to pass an unstructured object (interface) to the validation. Could that be omitting the blank string value during the serialization process? Or maybe it's the kube validation. Given all of the above, this is my best guess at what the issue is. I wasn't able to determine from the OLM/Catalog pod logs or the installplan which MachinePool CR the installplan was failing on (assuming it was the first).
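As a quick sanity check of the serialization suspicion, here is a hedged, stdlib-only Go sketch (not OLM's actual code path) that decodes the infra pool's labels into `map[string]interface{}`, which is the general shape an unstructured object holds, and prints the concrete Go type of each value. In a plain JSON decode the empty string stays a Go `string` rather than becoming `nil`, so if the validator reports `"null"` for that field, the value would presumably have to be transformed somewhere between the dynamic client and the schema validation.
```
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// The infra pool's labels as stored in etcd, per the examples above.
	raw := []byte(`{"node-role.kubernetes.io": "infra", "node-role.kubernetes.io/infra": ""}`)

	// Roughly the shape an unstructured object holds: map[string]interface{}.
	var labels map[string]interface{}
	if err := json.Unmarshal(raw, &labels); err != nil {
		panic(err)
	}

	for k, v := range labels {
		// The empty string decodes as a Go string (""), not as nil, so a
		// plain JSON round trip does not turn "" into null.
		fmt.Printf("%s => type=%T value=%q\n", k, v, v)
	}
}
```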
For reference, every MachinePool CR with a non-null `.spec.labels` has the `node-role.kubernetes.io/infra` key set to a blank string:
```
oc get machinepools --all-namespaces -o json | jq '.items[] | select(.spec.labels!=null) | {"name": .metadata.name, "labels": .spec.labels}'
{
  "name": "osde2e-hwlrn-infra",
  "labels": {
    "node-role.kubernetes.io": "infra",
    "node-role.kubernetes.io/infra": ""
  }
}
{
  "name": "osde2e-m7z3g-infra",
  "labels": {
    "node-role.kubernetes.io": "infra",
    "node-role.kubernetes.io/infra": ""
  }
}
{
  "name": "mnairn-fake-infra",
  "labels": {
    "node-role.kubernetes.io": "infra",
    "node-role.kubernetes.io/infra": ""
  }
}
{
  "name": "priya-ut9-infra",
  "labels": {
    "node-role.kubernetes.io": "infra",
    "node-role.kubernetes.io/infra": ""
  }
}
{
  "name": "test-fake1-infra",
  "labels": {
    "node-role.kubernetes.io": "infra",
    "node-role.kubernetes.io/infra": ""
  }
}
{
  "name": "test-fake2-infra",
  "labels": {
    "node-role.kubernetes.io": "infra",
    "node-role.kubernetes.io/infra": ""
  }
}
{
  "name": "test-fake6-infra",
  "labels": {
    "node-role.kubernetes.io": "infra",
    "node-role.kubernetes.io/infra": ""
  }
}
{
  "name": "test-fake7-infra",
  "labels": {
    "node-role.kubernetes.io": "infra",
    "node-role.kubernetes.io/infra": ""
  }
}
{
  "name": "priya-stage-9-infra",
  "labels": {
    "node-role.kubernetes.io": "infra",
    "node-role.kubernetes.io/infra": ""
  }
}
{
  "name": "pri-test-infra",
  "labels": {
    "node-role.kubernetes.io": "infra",
    "node-role.kubernetes.io/infra": ""
  }
}
{
  "name": "priya-stage-infra",
  "labels": {
    "node-role.kubernetes.io": "infra",
    "node-role.kubernetes.io/infra": ""
  }
}
{
  "name": "tavni-aws94-infra",
  "labels": {
    "node-role.kubernetes.io": "infra",
    "node-role.kubernetes.io/infra": ""
  }
}
{
  "name": "vkareh-sts-infra",
  "labels": {
    "node-role.kubernetes.io": "infra",
    "node-role.kubernetes.io/infra": ""
  }
}
{
  "name": "zgalor-gcp-infra",
  "labels": {
    "node-role.kubernetes.io": "infra",
    "node-role.kubernetes.io/infra": ""
  }
}
```

Actual results:
The OLM installplan fails, citing CRD schema validation errors.

Expected results:
OLM successfully validates existing CRs against the CRD schema.

Additional info:
Catalog pod logs: http://file.rdu.redhat.com/jaharrin/catalog-operator-9d9d85bd9-7sn4j.log
OLM pod logs: http://file.rdu.redhat.com/jaharrin/olm-operator-7d748dd9d9-gsjxm.log
Tests conducted on a fresh 4.8.0 cluster to try to demonstrate that OLM is rejecting a CRD because there's a CR using a map[string]string where one of the values is an empty string (which is valid data), even though Kubernetes itself will accept that CR successfully. This can be tested easily with Hive without needing to provision clusters or do any major configuration.

First subscribe to an old version of Hive:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hive-sub
  namespace: openshift-operators
spec:
  channel: alpha
  name: hive-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
  startingCSV: hive-operator.v1.0.19

kubectl -n openshift-operators patch installplan install-dc4fv -p '{"spec":{"approved":true}}' --type merge

I now have my Hive CRDs installed:

❯ kg crd | grep machinepools
machinepools.hive.openshift.io   2021-05-04T12:19:08Z

We don't need any real data here; we can just create a MachinePool that isn't linked to a real cluster, and nothing will happen in hive. This machine pool has a spec.labels entry with an empty string value.

apiVersion: hive.openshift.io/v1
kind: MachinePool
metadata:
  creationTimestamp: null
  name: f1-worker
  namespace: default
spec:
  labels:
    node-role.kubernetes.io: infra
    node-role.kubernetes.io/infra: ""
  clusterDeploymentRef:
    name: f1
  name: worker
  platform:
    aws:
      rootVolume:
        iops: 100
        size: 22
        type: gp2
      type: m4.xlarge
  replicas: 3
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/infra

❯ k apply -f myconfig/machinepool-empty-label.yaml
machinepool.hive.openshift.io/f1-worker created

Everything worked fine. Now let's try to upgrade to a newer Hive bundle by approving the next installplan that was automatically created:

❯ kg installplan
NAME            CSV                     APPROVAL   APPROVED
install-d26wv   hive-operator.v1.1.0    Manual     false
install-dc4fv   hive-operator.v1.0.19   Manual     true

❯ kg installplan install-d26wv -o yaml

This shows:

message: 'error validating existing CRs against new CRD''s schema: machinepools.hive.openshift.io: error validating custom resource against new schema &apiextensions.CustomResourceValidation{OpenAPIV3Schema:(*apiextensions.JSONSchemaProps)(0xc015cc9800)}: [].spec.labels.node-role.kubernetes.io/infra: Invalid value: "null": spec.labels.node-role.kubernetes.io/infra in body must be of type string: "null"'

The CSV is now stuck:

❯ kg csv
NAME                    DISPLAY                      VERSION   REPLACES                PHASE
hive-operator.v1.0.19   Hive for Red Hat OpenShift   1.0.19    hive-operator.v1.0.18   Replacing
hive-operator.v1.1.0    Hive for Red Hat OpenShift   1.1.0     hive-operator.v1.0.19   Pending

Let's delete the machinepool and see if we can update the Hive OLM bundle:

❯ k delete machinepool -n default f1-worker

The installplan didn't seem to want to try again after 10 minutes or so, so I deleted my subscription and CSVs to try again. Reapply the subscription, approve the old version, let it install, then approve the latest version (this time with no "bad" CR in etcd, yet). We now have the latest hive installed:

❯ kg csv
NAME                   DISPLAY                      VERSION   REPLACES                PHASE
hive-operator.v1.1.0   Hive for Red Hat OpenShift   1.1.0     hive-operator.v1.0.19   Succeeded

Now let's apply our "bad" CR and see if Kube is OK with it:

❯ k apply -f myconfig/machinepool-empty-label.yaml
machinepool.hive.openshift.io/f1-worker created

I believe this indicates there is a problem with the validation OLM is doing, where it rejects valid CRD updates that Kube would not. We could desperately use a workaround here if possible.
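For what it's worth, here is a hedged Go sketch (not OLM's code, just a diagnostic; it assumes a kubeconfig path in `$KUBECONFIG` and the usual client-go/apimachinery modules) of how one could inspect what the dynamic-client/unstructured path actually holds for these labels, i.e. the kind of object OLM hands to the schema validation, to confirm whether the empty string is still a string at that point:
```
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// List MachinePools cluster-wide through the dynamic client, giving the
	// same kind of unstructured objects OLM validates against the new schema.
	gvr := schema.GroupVersionResource{Group: "hive.openshift.io", Version: "v1", Resource: "machinepools"}
	list, err := dyn.Resource(gvr).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	for _, mp := range list.Items {
		labels, found, err := unstructured.NestedMap(mp.Object, "spec", "labels")
		if err != nil || !found {
			fmt.Printf("%s/%s: labels=%v err=%v\n", mp.GetNamespace(), mp.GetName(), labels, err)
			continue
		}
		for k, v := range labels {
			// Print the concrete Go type of each label value as held in the
			// unstructured object.
			fmt.Printf("%s/%s: %s => type=%T value=%#v\n", mp.GetNamespace(), mp.GetName(), k, v, v)
		}
	}
}
```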
The PR is against the upstream repository; we'd need it merged there and then pulled downstream.
LGTM --

[root@preserve-olm-env 1956611]# oc get pod -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-6f7dcb85cb-s2ncl   1/1     Running   0          32m
olm-operator-74cc8c4bdc-xdftq       1/1     Running   0          32m
packageserver-d96c94dd5-4ptws       1/1     Running   0          30m
packageserver-d96c94dd5-vtxf6       1/1     Running   0          30m
[root@preserve-olm-env 1956611]# oc exec catalog-operator-6f7dcb85cb-s2ncl -n openshift-operator-lifecycle-manager -- olm --version
OLM version: 0.17.0
git commit: 9498948b664cdc43ab11581b77bbf1d9e5264692
[root@preserve-olm-env 1956611]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-08-025039   True        False         16m     Cluster version is 4.8.0-0.nightly-2021-05-08-025039
[root@preserve-olm-env 1956611]#
[root@preserve-olm-env 1956611]# cat sub.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hive-sub
  namespace: openshift-operators
spec:
  channel: alpha
  name: hive-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
  startingCSV: hive-operator.v1.0.19
[root@preserve-olm-env 1956611]# oc apply -f sub.yaml
subscription.operators.coreos.com/hive-sub created
[root@preserve-olm-env 1956611]# oc get ip -n openshift-operators
NAME            CSV                     APPROVAL   APPROVED
install-dxmm8   hive-operator.v1.0.19   Manual     false
[root@preserve-olm-env 1956611]# oc -n openshift-operators patch installplan install-dxmm8 -p '{"spec":{"approved":true}}' --type merge
installplan.operators.coreos.com/install-dxmm8 patched
[root@preserve-olm-env 1956611]# oc get ip -n openshift-operators
NAME            CSV                     APPROVAL   APPROVED
install-dxmm8   hive-operator.v1.0.19   Manual     true
install-zj7cb   hive-operator.v1.1.0    Manual     false
[root@preserve-olm-env 1956611]#
[root@preserve-olm-env 1956611]# oc get csv -n openshift-operators
NAME                    DISPLAY                      VERSION   REPLACES                PHASE
hive-operator.v1.0.19   Hive for Red Hat OpenShift   1.0.19    hive-operator.v1.0.18   Succeeded
[root@preserve-olm-env 1956611]# oc get crd | grep machinepools
machinepools.hive.openshift.io   2021-05-08T09:51:16Z
[root@preserve-olm-env 1956611]#
[root@preserve-olm-env 1956611]#
[root@preserve-olm-env 1956611]# cat cr.yaml
apiVersion: hive.openshift.io/v1
kind: MachinePool
metadata:
  creationTimestamp: null
  name: f1-worker
  namespace: default
spec:
  labels:
    node-role.kubernetes.io: infra
    node-role.kubernetes.io/infra: ""
  clusterDeploymentRef:
    name: f1
  name: worker
  platform:
    aws:
      rootVolume:
        iops: 100
        size: 22
        type: gp2
      type: m4.xlarge
  replicas: 3
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/infra
[root@preserve-olm-env 1956611]# oc apply -f cr.yaml
machinepool.hive.openshift.io/f1-worker created
[root@preserve-olm-env 1956611]#
[root@preserve-olm-env 1956611]# oc get MachinePool
NAME        POOLNAME   CLUSTERDEPLOYMENT   REPLICAS
f1-worker   worker     f1                  3
[root@preserve-olm-env 1956611]#
[root@preserve-olm-env 1956611]# oc get MachinePool f1-worker -o yaml
apiVersion: hive.openshift.io/v1
kind: MachinePool
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"hive.openshift.io/v1","kind":"MachinePool","metadata":{"annotations":{},"creationTimestamp":null,"name":"f1-worker","namespace":"default"},"spec":{"clusterDeploymentRef":{"name":"f1"},"labels":{"node-role.kubernetes.io":"infra","node-role.kubernetes.io/infra":""},"name":"worker","platform":{"aws":{"rootVolume":{"iops":100,"size":22,"type":"gp2"},"type":"m4.xlarge"}},"replicas":3,"taints":[{"effect":"NoSchedule","key":"node-role.kubernetes.io/infra"}]}}
  creationTimestamp: "2021-05-08T09:53:20Z"
  generation: 1
  name: f1-worker
  namespace: default
  resourceVersion: "38334"
  uid: f46ec58b-d92d-4cac-8ed2-dfd4cb4ad9b6
spec:
  clusterDeploymentRef:
    name: f1
  labels:
    node-role.kubernetes.io: infra
    node-role.kubernetes.io/infra: ""
  name: worker
  platform:
    aws:
      rootVolume:
        iops: 100
        size: 22
        type: gp2
      type: m4.xlarge
  replicas: 3
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/infra
[root@preserve-olm-env 1956611]#
[root@preserve-olm-env 1956611]# oc -n openshift-operators patch installplan install-zj7cb -p '{"spec":{"approved":true}}' --type merge
installplan.operators.coreos.com/install-zj7cb patched
[root@preserve-olm-env 1956611]# oc get ip -n openshift-operators
NAME            CSV                     APPROVAL   APPROVED
install-dxmm8   hive-operator.v1.0.19   Manual     true
install-zj7cb   hive-operator.v1.1.0    Manual     true
[root@preserve-olm-env 1956611]# oc get ip install-zj7cb -n openshift-operators -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: InstallPlan
metadata:
  creationTimestamp: "2021-05-08T09:51:17Z"
  generateName: install-
  generation: 2
  labels:
    operators.coreos.com/hive-operator.openshift-operators: ""
  name: install-zj7cb
  namespace: openshift-operators
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: Subscription
    name: hive-sub
    uid: f0a03a20-dbe3-4027-8baf-ba808cb3bcc2
  resourceVersion: "39128"
  uid: 3c7ee83e-8be8-4386-afdb-840f9f3d2288
spec:
  approval: Manual
  approved: true
  clusterServiceVersionNames:
  - hive-operator.v1.1.0
  generation: 2
status:
  bundleLookups:
  - catalogSourceRef:
      name: community-operators
      namespace: openshift-marketplace
    identifier: hive-operator.v1.1.0
  ...
  - resolving: hive-operator.v1.1.0
    resource:
      group: rbac.authorization.k8s.io
      kind: ClusterRoleBinding
      manifest: '{"kind":"ConfigMap","name":"26e5fa87dc5e412cf0af5e2820356a92ecf83031f5b9b832abc110b5485cc62","namespace":"openshift-marketplace","catalogSourceName":"community-operators","catalogSourceNamespace":"openshift-marketplace","replaces":"hive-operator.v1.0.19","properties":"{\"properties\":[{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"HiveConfig\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"ClusterClaim\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"ClusterState\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"ClusterImageSet\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"SelectorSyncSet\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"MachinePool\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"ClusterPool\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"ClusterDeprovision\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"SyncIdentityProvider\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"Checkpoint\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"ClusterProvision\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hiveinternal.openshift.io\",\"kind\":\"ClusterSyncLease\",\"version\":\"v1alpha1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"DNSZone\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"ClusterDeployment\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"SyncSet\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"ClusterRelocate\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"MachinePoolNameLease\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hiveinternal.openshift.io\",\"kind\":\"ClusterSync\",\"version\":\"v1alpha1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"hive.openshift.io\",\"kind\":\"SelectorSyncIdentityProvider\",\"version\":\"v1\"}},{\"type\":\"olm.package\",\"value\":{\"packageName\":\"hive-operator\",\"version\":\"1.1.0\"}}]}"}'
      name: hive-operator.v1.1.0-687dbf574d
      sourceName: community-operators
      sourceNamespace: openshift-marketplace
      version: v1
    status: Created
  startTime: "2021-05-08T09:55:03Z"
[root@preserve-olm-env 1956611]#
[root@preserve-olm-env 1956611]# oc get csv -n openshift-operators
NAME                   DISPLAY                      VERSION   REPLACES                PHASE
hive-operator.v1.1.0   Hive for Red Hat OpenShift   1.1.0     hive-operator.v1.0.19   Succeeded
[root@preserve-olm-env 1956611]#
--
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438