Created attachment 1691930: oc describe csv

Description of problem:
When upgrading the SR-IOV operator from 4.4 (4.4.0-202005221118) to 4.5 (4.5.0-202005220507), the new CSV stays in the Pending phase.

oc get csv
NAME                                        DISPLAY                   VERSION              REPLACES                                    PHASE
sriov-network-operator.4.4.0-202005221118   SR-IOV Network Operator   4.4.0-202005221118                                               Replacing
sriov-network-operator.4.5.0-202005220507   SR-IOV Network Operator   4.5.0-202005220507   sriov-network-operator.4.4.0-202005221118   Pending

See the attachment for the `oc describe csv` output.

Version-Release number of selected component (if applicable):
4.4 to 4.5
More details: the SCC setup for the sriov namespace differs between 4.4 and 4.5. See the 4.5 change: https://github.com/openshift/sriov-network-operator/commit/b2b549210cf242a884eb33cb6876bbcb9c4fc106

1. For 4.4, the sriov operator namespace was created with the following YAML:

echo 'apiVersion: v1
kind: Namespace
metadata:
  name: openshift-sriov-network-operator
  labels:
    openshift.io/run-level: "1"
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sriov-network-operators
  namespace: openshift-sriov-network-operator
spec:
  targetNamespaces:
  - openshift-sriov-network-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subsription
  namespace: openshift-sriov-network-operator
spec:
  channel: "4.4"
  name: sriov-network-operator
  source: qe-app-registry
  sourceNamespace: openshift-marketplace' | oc create -f -

2. After creating the resources above, the 4.4 sriov operator works well.

3. Update the Subscription `channel` to 4.5 to trigger the upgrade.

4. Then check the CSV:

# oc get csv
NAME                                        DISPLAY                   VERSION              REPLACES                                    PHASE
sriov-network-operator.4.4.0-202005221118   SR-IOV Network Operator   4.4.0-202005221118                                               Replacing
sriov-network-operator.4.5.0-202005220507   SR-IOV Network Operator   4.5.0-202005220507   sriov-network-operator.4.4.0-202005221118   Pending

5. Please see the attachment for `oc describe csv sriov-network-operator.4.5.0-202005220507`.
It failed at the SCC creating:

...
Kind: PolicyRule
Message: namespaced rule:{"verbs":["use"],"apiGroups":["security.openshift.io"],"resources":["securitycontextconstraints"],"resourceNames":["privileged"]}
Status: NotSatisfied
Version: v1
...

But I can create it manually.

mac:~ jianzhang$ oc create role sriov-plugin --verb=use --resource=securitycontextconstraints --resource-name=privileged -n openshift-sriov-network-operator
role.rbac.authorization.k8s.io/sriov-plugin created

mac:~ jianzhang$ oc get role
NAME                                              CREATED AT
sriov-network-operator.4.4.0-202005221118-6k2mq   2020-05-25T09:53:19Z
sriov-network-operator.4.4.0-202005221118-pkv5v   2020-05-25T09:53:18Z
sriov-plugin                                      2020-05-26T07:46:27Z

mac:~ jianzhang$ oc get role sriov-plugin -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: "2020-05-26T07:46:27Z"
  managedFields:
  - apiVersion: rbac.authorization.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:rules: {}
    manager: oc
    operation: Update
    time: "2020-05-26T07:46:27Z"
  name: sriov-plugin
  namespace: openshift-sriov-network-operator
  resourceVersion: "692545"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/openshift-sriov-network-operator/roles/sriov-plugin
  uid: 5faf3b24-7ba1-4235-9675-34c9af792895
rules:
- apiGroups:
  - security.openshift.io
  resourceNames:
  - privileged
  resources:
  - securitycontextconstraints
  verbs:
  - use
@jian thank you for providing me with a cluster where the upgrade issue is present. I found the following condition in the status of the InstallPlan associated with the 4.5 SRIOV Operator:

```
...
status:
  catalogSources:
  - qe-app-registry
  conditions:
  - lastTransitionTime: "2020-06-10T09:01:42Z"
    lastUpdateTime: "2020-06-10T09:01:42Z"
    message: 'error validating existing CRs agains new CRD''s schema: sriovnetworknodestates.sriovnetwork.openshift.io: error validating custom resource against new schema &apiextensions.CustomResourceValidation{OpenAPIV3Schema:(*apiextensions.JSONSchemaProps)(0xc0010f9e00)}: [].spec.interfaces.vfGroups.policyName: Required value'
    reason: InstallComponentFailed
    status: "False"
    type: Installed
  phase: Failed
...
```

Based on the presence of this condition, OLM is working as intended. The condition signals that the SRIOV operator has added a required field to its CRD and that CRs exist on the cluster that do not have the required field set. It is best practice to:

* Introduce the field as an optional field and update the operator to set the field to some value that implements existing behavior.
* In a future release of the operator (likely a different channel based on your release strategy), mark the field as required and make sure to update the API version.

In this case, the CR you created does not have the .spec.interfaces.vfGroups.policyName field, which is required in the CRD shipped with SRIOV Operator 4.5. This happened because the SRIOV team rolled out a change to their API in a way that OLM does not support. OLM follows the guidelines suggested by the sig-architecture group [1].

For someone who has installed the operator, possible workarounds include either of the following:

* Update the existing CRs to include the required field.
* Delete the existing CRs that do not include the required field.

OLM would then be able to perform the upgrade.
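To apply either workaround, it helps to know which existing CRs are missing the field. The following is only a rough sketch, not something OLM or the SRIOV operator ships: it assumes the CR list has been exported as JSON (for example with `oc get sriovnetworknodestates -n openshift-sriov-network-operator -o json`) and that `spec.interfaces` and `vfGroups` are both lists, which is inferred from the error path above. The function name `find_missing_policy_name` and the sample data are hypothetical.

```python
import json

def find_missing_policy_name(cr_list):
    """Return (cr_name, interface_index, vf_group_index) for every vfGroup
    that has no 'policyName' key.

    Assumes cr_list is a parsed Kubernetes List object ({"items": [...]})
    and that spec.interfaces and vfGroups are lists (an assumption based
    on the path in the InstallPlan error message).
    """
    missing = []
    for cr in cr_list.get("items", []):
        name = cr.get("metadata", {}).get("name", "<unnamed>")
        interfaces = cr.get("spec", {}).get("interfaces") or []
        for i, iface in enumerate(interfaces):
            for j, group in enumerate(iface.get("vfGroups") or []):
                if "policyName" not in group:
                    missing.append((name, i, j))
    return missing

# Hypothetical sample data; node names and field values are made up.
SAMPLE = json.loads("""
{
  "items": [
    {"metadata": {"name": "worker-0"},
     "spec": {"interfaces": [
       {"vfGroups": [{"resourceName": "res-a", "policyName": "policy-a"}]}
     ]}},
    {"metadata": {"name": "worker-1"},
     "spec": {"interfaces": [
       {"vfGroups": [{"resourceName": "res-b"}]}
     ]}}
  ]
}
""")

if __name__ == "__main__":
    for name, i, j in find_missing_policy_name(SAMPLE):
        print(f"{name}: spec.interfaces[{i}].vfGroups[{j}] has no policyName")
```

Any CR reported this way would need to be updated or deleted before the 4.5 CRD can be applied; a CR with no interfaces or no vfGroups is not reported, since the missing field only matters where a vfGroup entry exists.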
I am going to mark this as `Not A Bug`. @Jian, I suggest creating a new bug against the SRIOV operator.

Ref:
[1] https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md#on-compatibility
To close the loop on this, I created a doc PR [1] against OLM-Book which suggests reviewing [2] when changing the CRD schema.

Ref:
[1] https://github.com/operator-framework/olm-book/pull/42
[2] https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md#on-compatibility
Hi Alex,

Many thanks for the information! Moving this bug to the Networking team.
Marking this as 'verified' so that it can be backported to 4.5.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196