Description of problem:
After upgrading OCP 4.0, the cluster-storage-operator does not update its ClusterOperator conditions (status and last transition time) to reflect the upgrade.

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-04-234414   True        False         21m     Cluster version is 4.0.0-0.nightly-2019-03-04-234414

How reproducible:
Always

Steps to Reproduce:
1. Install OCP 4.0 next-gen on AWS
2. Upgrade it to a new version
3. Check the status of the cluster-storage-operator

Actual results:
After the upgrade the cluster-storage-operator conditions are unchanged:

$ oc describe clusteroperator cluster-storage-operator
Name:         cluster-storage-operator
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-03-13T06:43:15Z
  Generation:          1
  Resource Version:    9384
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/cluster-storage-operator
  UID:                 47858015-455b-11e9-b02f-02e5db6b6df6
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-03-13T06:43:15Z
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2019-03-13T06:43:15Z
    Status:                True
    Type:                  Available
    Last Transition Time:  2019-03-13T06:43:15Z
    Status:                False
    Type:                  Failing
  Extension:        <nil>
  Related Objects:  <nil>
  Versions:
    Name:     operator
    Version:  0.0.1
Events:  <none>

But the pod has actually been updated along with the upgrade:

$ oc get pods cluster-storage-operator-54d9899ccc-588bs -oyaml -n openshift-cluster-storage-operator
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "openshift-sdn",
          "ips": [
              "10.128.0.70"
          ],
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: restricted
  creationTimestamp: 2019-03-13T08:31:41Z
  generateName: cluster-storage-operator-54d9899ccc-
  labels:
    name: cluster-storage-operator
    pod-template-hash: 54d9899ccc
  name: cluster-storage-operator-54d9899ccc-588bs
  namespace: openshift-cluster-storage-operator
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: cluster-storage-operator-54d9899ccc
    uid: 6d70c2e8-456a-11e9-9148-06295a0e4212
  resourceVersion: "79560"
  selfLink: /api/v1/namespaces/openshift-cluster-storage-operator/pods/cluster-storage-operator-54d9899ccc-588bs
  uid: 6d7584cb-456a-11e9-9148-06295a0e4212
spec:
  containers:
  - command:
    - cluster-storage-operator
    env:
    - name: WATCH_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: OPERATOR_NAME
      value: cluster-storage-operator
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c925fe57d348c88842b9bf12fa546e3ec101e6b080bcb3bf224750e7f373946c
    imagePullPolicy: IfNotPresent
    name: cluster-storage-operator
    ports:
    - containerPort: 60000
      name: metrics
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      procMount: Default
      runAsUser: 1000340000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: cluster-storage-operator-token-jbvzv
      readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: cluster-storage-operator-dockercfg-p7898
  nodeName: ip-172-31-128-77.us-east-2.compute.internal
  nodeSelector:
    node-role.kubernetes.io/master: ""
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000340000
    seLinuxOptions:
      level: s0:c18,c17
  serviceAccount: cluster-storage-operator
  serviceAccountName: cluster-storage-operator
  terminationGracePeriodSeconds: 30
  tolerations:
  - operator: Exists
  volumes:
  - name: cluster-storage-operator-token-jbvzv
    secret:
      defaultMode: 420
      secretName: cluster-storage-operator-token-jbvzv
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2019-03-13T08:31:41Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2019-03-13T08:31:57Z
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2019-03-13T08:31:57Z
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: 2019-03-13T08:31:41Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://922e12ec2e38d980755c6a3c42a353657ee3c7a838e33f6ce1fecb0e190d01d9
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c925fe57d348c88842b9bf12fa546e3ec101e6b080bcb3bf224750e7f373946c
    imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c925fe57d348c88842b9bf12fa546e3ec101e6b080bcb3bf224750e7f373946c
    lastState: {}
    name: cluster-storage-operator
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2019-03-13T08:31:54Z
  hostIP: 172.31.128.77
  phase: Running
  podIP: 10.128.0.70
  qosClass: BestEffort
  startTime: 2019-03-13T08:31:41Z

Expected results:
The clusteroperator's conditions should also be updated after the upgrade.

Additional info:
Related to bug#1686121
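For what it's worth, a quick way to pull out just the condition types and their transition times, rather than reading the full describe output, is a jsonpath query (a minimal sketch; it only assumes the clusteroperator name used above, which was later renamed to "storage"):

# print type, status, and lastTransitionTime of each clusteroperator condition
$ oc get clusteroperator cluster-storage-operator \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.lastTransitionTime}{"\n"}{end}'

If the operator were reacting to the upgrade, the lastTransitionTime values would be expected to move past the time the upgrade started.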
https://github.com/openshift/cluster-storage-operator/pull/16 should fix this; it's not in the installer yet (currently 0.14).
On second thought, moving this over to the installer component in case there's some work to be done there.
Putting this back on storage. This is a bug in your code. It seems it sits in MODIFIED until QE tests a nightly.
I have tested the upgrade between the two versions below; the conditions still did not change after the upgrade.

Before upgrade:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-18-200009   True        False         4h55m   Cluster version is 4.0.0-0.nightly-2019-03-18-200009

$ oc get pods -oyaml -n openshift-cluster-storage-operator | grep image
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:27e821eabac565c10d0f8833dc812a26d5803c5847b12f2d5c1951d4b257d96f

$ oc image info quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:27e821eabac565c10d0f8833dc812a26d5803c5847b12f2d5c1951d4b257d96f | grep io.openshift.build.commit.id
io.openshift.build.commit.id=4cdc1e782067eacd0eed79cc886b023868498194

The above shows the image already contains the fix.

$ oc get pods -oyaml -n openshift-cluster-storage-operator | grep conditions -A 25
    conditions:
    - lastProbeTime: null
      lastTransitionTime: 2019-03-19T02:52:35Z
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: 2019-03-19T02:52:46Z
      status: "True"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: 2019-03-19T02:52:46Z
      status: "True"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: 2019-03-19T02:52:35Z
      status: "True"
      type: PodScheduled

$ oc describe clusteroperator storage | grep conditions
  Conditions:
    Last Transition Time:  2019-03-19T02:52:47Z
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2019-03-19T02:52:47Z
    Status:                True
    Type:                  Available
    Last Transition Time:  2019-03-19T02:52:47Z
    Status:                False
    Type:                  Failing

After upgrade:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-18-223058   True        False         7m55s   Cluster version is 4.0.0-0.nightly-2019-03-18-223058

$ oc get pods -oyaml -n openshift-cluster-storage-operator | grep conditions -A 25
    conditions:
    - lastProbeTime: null
      lastTransitionTime: 2019-03-19T08:15:23Z
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: 2019-03-19T08:15:34Z
      status: "True"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: 2019-03-19T08:15:34Z
      status: "True"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: 2019-03-19T08:15:23Z
      status: "True"
      type: PodScheduled

$ oc get clusteroperator
NAME                                 VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
service-catalog-apiserver            4.0.0-0.nightly-2019-03-18-223058   True        False         False     12m
service-catalog-controller-manager   4.0.0-0.nightly-2019-03-18-223058   True        False         False     69s
storage                              4.0.0-0.nightly-2019-03-18-223058   True        False         False     5h35m

$ oc describe clusteroperator storage | grep Conditions -A 15
  Conditions:
    Last Transition Time:  2019-03-19T02:52:47Z
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2019-03-19T02:52:47Z
    Status:                True
    Type:                  Available
    Last Transition Time:  2019-03-19T02:52:47Z
    Status:                False
    Type:                  Failing
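As a cross-check that the upgrade itself completed while the storage conditions stayed at their original timestamps, the ClusterVersion history can be listed directly (a sketch using standard jsonpath; state and version are fields of the ClusterVersion history entries):

# list the versions recorded in the cluster's upgrade history, newest first
$ oc get clusterversion version \
    -o jsonpath='{range .status.history[*]}{.state}{"\t"}{.version}{"\n"}{end}'

Both nightlies showing Completed would confirm the cluster really moved between the two versions.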
I don't understand: the clusteroperator has Available True and is reporting version 4.0.0-0.nightly-2019-03-18-223058. Is this not expected? What is the expected behaviour?
Is it because the Last Transition Time hasn't changed? It is a bit tricky because, unlike other operators, the storage operator's operand is just a StorageClass at the moment: the class is either created or not, there is no in-between state. But it might make sense to set the status to Progressing for the x milliseconds during which the version does not match.
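If the operator is changed to do that, the brief flip should be observable during an upgrade with a plain watch (a minimal sketch; it only assumes the clusteroperator is named "storage", as in the outputs above):

# watch the storage clusteroperator while the upgrade is in flight;
# PROGRESSING should briefly go True and the SINCE column should reset
$ oc get clusteroperator storage -w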
Hi Matthew, yes, you are right: the bug was reported because the Last Transition Time has not changed, and `$ oc get clusteroperator` still shows SINCE as 5h35m while the other operators show they were just updated after the upgrade. BTW, my manager asked me to report this bug, and it is also related to bug#1686121, which I mentioned in the Description at the beginning; that bug says each operator owner should be responsible for their own operator. Sorry for the confusion.
No problem, I did not read carefully. I've opened a PR: https://github.com/openshift/cluster-storage-operator/pull/22
Hi mawong, I tried to verify this bug, but the Last Transition Time is still not updated.

Before upgrade:

$ oc get pods -oyaml -n openshift-cluster-storage-operator | grep conditions -A 25
    conditions:
    - lastProbeTime: null
      lastTransitionTime: 2019-03-25T03:16:14Z
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: 2019-03-25T03:16:24Z
      status: "True"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: 2019-03-25T03:16:24Z
      status: "True"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: 2019-03-25T03:16:14Z
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: cri-o://45039e6910f5a7f4a166852d9fdc390a07a8894ff38213e9559730cbaa280a7d
      image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dd43925dad987bc527cb625dc0316e1ddbf92c45c1fbb3198b344ecbb028f541
      imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dd43925dad987bc527cb625dc0316e1ddbf92c45c1fbb3198b344ecbb028f541
      lastState: {}
      name: cluster-storage-operator
      ready: true
      restartCount: 0
      state:

After upgrade:

$ oc describe clusteroperator storage
Name:         storage
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-03-25T03:16:24Z
  Generation:          1
  Resource Version:    107915
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/storage
  UID:                 5ef48151-4eac-11e9-b15a-0227bde3ac64
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-03-25T03:16:24Z
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2019-03-25T03:16:24Z
    Status:                True
    Type:                  Available
    Last Transition Time:  2019-03-25T03:16:24Z
    Status:                False
    Type:                  Failing
  Extension:        <nil>
  Related Objects:  <nil>
  Versions:
    Name:     operator
    Version:  4.0.0-0.nightly-2019-03-23-183709
Events:  <none>
Please try on nightly 4.0.0-0.alpha-2019-03-24-171037 or later; it worked for me. (I am using libvirt; when I bump the release version, the storage clusteroperator goes Available (2019-03-25T16:42:43Z) -> Progressing (2019-03-25T16:47:39Z) -> Available (2019-03-25T16:47:39Z).)

- lastTransitionTime: 2019-03-25T16:47:39Z
  message: Unsupported platform for storageclass creation
  status: "True"
  type: Available
- lastTransitionTime: 2019-03-25T16:42:43Z
  status: "False"
  type: Failing
- lastTransitionTime: 2019-03-25T16:47:39Z
  status: "False"
  type: Progressing
The change is present in 4.0.0-0.alpha-2019-03-22-235916 https://origin-release.svc.ci.openshift.org/releasestream/4.0.0-0.alpha/release/4.0.0-0.alpha-2019-03-25-172405?from=4.0.0-0.alpha-2019-03-22-235916, so it should work if the pre-upgrade version is at least 4.0.0-0.alpha-2019-03-22-235916
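To confirm which cluster-storage-operator commit a given release payload carries before testing, the release image can be inspected directly (a sketch; oc adm release info supports --commits, and <release-pullspec> below is only a placeholder for the payload under test):

# print the source commit for each component of the release image,
# then look for the cluster-storage-operator entry
$ oc adm release info --commits <release-pullspec> | grep cluster-storage-operator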
It passed when upgrading from 4.0.0-0.nightly-2019-03-25-180911 to 4.0.0-0.nightly-2019-03-26-034754:

$ oc describe clusteroperator storage
Name:         storage
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-03-26T06:01:36Z
  Generation:          1
  Resource Version:    545806
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/storage
  UID:                 9d6d30bd-4f8c-11e9-8526-0ab0e92a085a
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-03-27T04:05:40Z
    Status:                True
    Type:                  Available
    Last Transition Time:  2019-03-26T06:01:36Z
    Status:                False
    Type:                  Failing
    Last Transition Time:  2019-03-27T04:05:40Z
    Status:                False
    Type:                  Progressing
  Extension:        <nil>
  Related Objects:  <nil>
  Versions:
    Name:     operator
    Version:  4.0.0-0.nightly-2019-03-26-034754
Events:  <none>
@chaoyang, thanks for helping to drive this!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758