Bug 1688582
| Summary: | [Upgrade] cluster-storage-operator incorrectly reporting status and version after the upgrade | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Wenqi He <wehe> |
| Component: | Storage | Assignee: | Bradley Childs <bchilds> |
| Status: | CLOSED ERRATA | QA Contact: | Wenqi He <wehe> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.1.0 | CC: | aos-bugs, aos-storage-staff, bbennett, bchilds, chaoyang, eparis, jokerman, mmccomas, sponnaga, wsun |
| Target Milestone: | --- | Keywords: | BetaBlocker |
| Target Release: | 4.1.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-06-04 10:45:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
https://github.com/openshift/cluster-storage-operator/pull/16 should fix this, it's not in the installer yet (currently 0.14) On second thought, moving this over to the installer component in case there's some work to be done there. Putting this back on storage. This a bug in your code. Seems like it sits modified until QE tests a nightly. I have tested with the upgrade between below two versions, still not changed after upgrade:
Before upgrade:
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.0.0-0.nightly-2019-03-18-200009 True False 4h55m Cluster version is 4.0.0-0.nightly-2019-03-18-200009
$ oc get pods -oyaml -n openshift-cluster-storage-operator | grep image
image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:27e821eabac565c10d0f8833dc812a26d5803c5847b12f2d5c1951d4b257d96f
$ oc image info quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:27e821eabac565c10d0f8833dc812a26d5803c5847b12f2d5c1951d4b257d96f | grep io.openshift.build.commit.id
io.openshift.build.commit.id=4cdc1e782067eacd0eed79cc886b023868498194
Which above shows the image contains the fix already.
$ oc get pods -oyaml -n openshift-cluster-storage-operator | grep conditions -A 25
conditions:
- lastProbeTime: null
lastTransitionTime: 2019-03-19T02:52:35Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2019-03-19T02:52:46Z
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2019-03-19T02:52:46Z
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: 2019-03-19T02:52:35Z
status: "True"
type: PodScheduled
$ oc describe clusteroperator storage | grep conditions
Conditions:
Last Transition Time: 2019-03-19T02:52:47Z
Status: False
Type: Progressing
Last Transition Time: 2019-03-19T02:52:47Z
Status: True
Type: Available
Last Transition Time: 2019-03-19T02:52:47Z
Status: False
Type: Failing
After upgrade:
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.0.0-0.nightly-2019-03-18-223058 True False 7m55s Cluster version is 4.0.0-0.nightly-2019-03-18-223058
$ oc get pods -oyaml -n openshift-cluster-storage-operator | grep conditions -A 25
conditions:
- lastProbeTime: null
lastTransitionTime: 2019-03-19T08:15:23Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2019-03-19T08:15:34Z
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2019-03-19T08:15:34Z
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: 2019-03-19T08:15:23Z
status: "True"
type: PodScheduled
$ oc get clusteroperator
NAME VERSION AVAILABLE PROGRESSING FAILING SINCE
service-catalog-apiserver 4.0.0-0.nightly-2019-03-18-223058 True False False 12m
service-catalog-controller-manager 4.0.0-0.nightly-2019-03-18-223058 True False False 69s
storage 4.0.0-0.nightly-2019-03-18-223058 True False False 5h35m
$ oc describe clusteroperator storage | grep Conditions -A 15
Conditions:
Last Transition Time: 2019-03-19T02:52:47Z
Status: False
Type: Progressing
Last Transition Time: 2019-03-19T02:52:47Z
Status: True
Type: Available
Last Transition Time: 2019-03-19T02:52:47Z
Status: False
Type: Failing
I don't understand, the clusteroperator has Available True and is reporting version 4.0.0-0.nightly-2019-03-18-223058. Is this not expected? What is the expected behaviour? Is it because Last Transition Time hasn't changed? It is bit tricky because unlike other operators the storage operator's operand is just a StorageClass at the moment, the class is either created or not, there is no in-between state. But it might make sense to make the status Progressing for the x milliseconds during which the version does not match. Hi Matthew, Yes, you are right, the bug reported because of the Last Transition Time has not changed, and when we `$ oc get clusteroperator` it still shows SINCE 5h35m ago while other operators shows just created after the upgrade. BTW, I am requested by manager to report this bug and it also related to the bug#1686121 which I mentioned in Description at the beginning. It said each operator owner should be responsible to each operator. Sorry let you confuse. No problem, I did not read carefully. I've opened a PR: https://github.com/openshift/cluster-storage-operator/pull/22 Hi mawong,
I tried to check this bug, but the last Transition Time still not update
Before upgrade:
oc get pods -oyaml -n openshift-cluster-storage-operator | grep conditions -A 25
conditions:
- lastProbeTime: null
lastTransitionTime: 2019-03-25T03:16:14Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2019-03-25T03:16:24Z
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2019-03-25T03:16:24Z
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: 2019-03-25T03:16:14Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: cri-o://45039e6910f5a7f4a166852d9fdc390a07a8894ff38213e9559730cbaa280a7d
image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dd43925dad987bc527cb625dc0316e1ddbf92c45c1fbb3198b344ecbb028f541
imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dd43925dad987bc527cb625dc0316e1ddbf92c45c1fbb3198b344ecbb028f541
lastState: {}
name: cluster-storage-operator
ready: true
restartCount: 0
state:
After upgrade:
oc describe clusteroperator storage
Name: storage
Namespace:
Labels: <none>
Annotations: <none>
API Version: config.openshift.io/v1
Kind: ClusterOperator
Metadata:
Creation Timestamp: 2019-03-25T03:16:24Z
Generation: 1
Resource Version: 107915
Self Link: /apis/config.openshift.io/v1/clusteroperators/storage
UID: 5ef48151-4eac-11e9-b15a-0227bde3ac64
Spec:
Status:
Conditions:
Last Transition Time: 2019-03-25T03:16:24Z
Status: False
Type: Progressing
Last Transition Time: 2019-03-25T03:16:24Z
Status: True
Type: Available
Last Transition Time: 2019-03-25T03:16:24Z
Status: False
Type: Failing
Extension: <nil>
Related Objects: <nil>
Versions:
Name: operator
Version: 4.0.0-0.nightly-2019-03-23-183709
Events: <none>
Please try on nightly 4.0.0-0.alpha-2019-03-24-171037 or later, it worked for me (I am using libvirt, I bump the release version & the clusterstorage goes Available(2019-03-25T16:42:43Z)->Progressing(2019-03-25T16:47:39Z)->Available(2019-03-25T16:47:39Z))
- lastTransitionTime: 2019-03-25T16:47:39Z
message: Unsupported platform for storageclass creation
status: "True"
type: Available
- lastTransitionTime: 2019-03-25T16:42:43Z
status: "False"
type: Failing
- lastTransitionTime: 2019-03-25T16:47:39Z
status: "False"
type: Progressing
The change is present in 4.0.0-0.alpha-2019-03-22-235916 https://origin-release.svc.ci.openshift.org/releasestream/4.0.0-0.alpha/release/4.0.0-0.alpha-2019-03-25-172405?from=4.0.0-0.alpha-2019-03-22-235916, so it should work if the pre-upgrade version is at least 4.0.0-0.alpha-2019-03-22-235916 It is passed when upgrade from 4.0.0-0.nightly-2019-03-25-180911 to 4.0.0-0.nightly-2019-03-26-034754
oc describe clusteroperator storage
Name: storage
Namespace:
Labels: <none>
Annotations: <none>
API Version: config.openshift.io/v1
Kind: ClusterOperator
Metadata:
Creation Timestamp: 2019-03-26T06:01:36Z
Generation: 1
Resource Version: 545806
Self Link: /apis/config.openshift.io/v1/clusteroperators/storage
UID: 9d6d30bd-4f8c-11e9-8526-0ab0e92a085a
Spec:
Status:
Conditions:
Last Transition Time: 2019-03-27T04:05:40Z
Status: True
Type: Available
Last Transition Time: 2019-03-26T06:01:36Z
Status: False
Type: Failing
Last Transition Time: 2019-03-27T04:05:40Z
Status: False
Type: Progressing
Extension: <nil>
Related Objects: <nil>
Versions:
Name: operator
Version: 4.0.0-0.nightly-2019-03-26-034754
Events: <none>
@chaoyang, Thanks to help me drive here! Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758 |
Description of problem: After upgrade OCP 4.0, the cluster-storage-operator doesn't show the right change of its conditions including time and status Version-Release number of selected component (if applicable): $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.0.0-0.nightly-2019-03-04-234414 True False 21m Cluster version is 4.0.0-0.nightly-2019-03-04-234414 How reproducible: Always Steps to Reproduce: 1. Install OCP 4.0 next-gen on AWS 2. Upgrade it to new version 3. Check the status of cluster-storage-operator Actual results: After upgrade the cluster-storage-operator does not show correctly: $ oc describe clusteroperator cluster-storage-operator Name: cluster-storage-operator Namespace: Labels: <none> Annotations: <none> API Version: config.openshift.io/v1 Kind: ClusterOperator Metadata: Creation Timestamp: 2019-03-13T06:43:15Z Generation: 1 Resource Version: 9384 Self Link: /apis/config.openshift.io/v1/clusteroperators/cluster-storage-operator UID: 47858015-455b-11e9-b02f-02e5db6b6df6 Spec: Status: Conditions: Last Transition Time: 2019-03-13T06:43:15Z Status: False Type: Progressing Last Transition Time: 2019-03-13T06:43:15Z Status: True Type: Available Last Transition Time: 2019-03-13T06:43:15Z Status: False Type: Failing Extension: <nil> Related Objects: <nil> Versions: Name: operator Version: 0.0.1 Events: <none> But Actually the pods has updated along with the upgrade: $ oc get pods cluster-storage-operator-54d9899ccc-588bs -oyaml -n openshift-cluster-storage-operator apiVersion: v1 kind: Pod metadata: annotations: k8s.v1.cni.cncf.io/networks-status: |- [{ "name": "openshift-sdn", "ips": [ "10.128.0.70" ], "default": true, "dns": {} }] openshift.io/scc: restricted creationTimestamp: 2019-03-13T08:31:41Z generateName: cluster-storage-operator-54d9899ccc- labels: name: cluster-storage-operator pod-template-hash: 54d9899ccc name: cluster-storage-operator-54d9899ccc-588bs namespace: openshift-cluster-storage-operator ownerReferences: - apiVersion: apps/v1 blockOwnerDeletion: true controller: true kind: ReplicaSet name: cluster-storage-operator-54d9899ccc uid: 6d70c2e8-456a-11e9-9148-06295a0e4212 resourceVersion: "79560" selfLink: /api/v1/namespaces/openshift-cluster-storage-operator/pods/cluster-storage-operator-54d9899ccc-588bs uid: 6d7584cb-456a-11e9-9148-06295a0e4212 spec: containers: - command: - cluster-storage-operator env: - name: WATCH_NAMESPACE valueFrom: fieldRef: apiVersion: v1 fieldPath: metadata.namespace - name: POD_NAME valueFrom: fieldRef: apiVersion: v1 fieldPath: metadata.name - name: OPERATOR_NAME value: cluster-storage-operator image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c925fe57d348c88842b9bf12fa546e3ec101e6b080bcb3bf224750e7f373946c imagePullPolicy: IfNotPresent name: cluster-storage-operator ports: - containerPort: 60000 name: metrics protocol: TCP resources: {} securityContext: capabilities: drop: - KILL - MKNOD - SETGID - SETUID procMount: Default runAsUser: 1000340000 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: cluster-storage-operator-token-jbvzv readOnly: true dnsPolicy: ClusterFirst imagePullSecrets: - name: cluster-storage-operator-dockercfg-p7898 nodeName: ip-172-31-128-77.us-east-2.compute.internal nodeSelector: node-role.kubernetes.io/master: "" priority: 2000000000 priorityClassName: system-cluster-critical restartPolicy: Always schedulerName: default-scheduler securityContext: fsGroup: 1000340000 seLinuxOptions: level: s0:c18,c17 serviceAccount: cluster-storage-operator serviceAccountName: cluster-storage-operator terminationGracePeriodSeconds: 30 tolerations: - operator: Exists volumes: - name: cluster-storage-operator-token-jbvzv secret: defaultMode: 420 secretName: cluster-storage-operator-token-jbvzv status: conditions: - lastProbeTime: null lastTransitionTime: 2019-03-13T08:31:41Z status: "True" type: Initialized - lastProbeTime: null lastTransitionTime: 2019-03-13T08:31:57Z status: "True" type: Ready - lastProbeTime: null lastTransitionTime: 2019-03-13T08:31:57Z status: "True" type: ContainersReady - lastProbeTime: null lastTransitionTime: 2019-03-13T08:31:41Z status: "True" type: PodScheduled containerStatuses: - containerID: cri-o://922e12ec2e38d980755c6a3c42a353657ee3c7a838e33f6ce1fecb0e190d01d9 image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c925fe57d348c88842b9bf12fa546e3ec101e6b080bcb3bf224750e7f373946c imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c925fe57d348c88842b9bf12fa546e3ec101e6b080bcb3bf224750e7f373946c lastState: {} name: cluster-storage-operator ready: true restartCount: 0 state: running: startedAt: 2019-03-13T08:31:54Z hostIP: 172.31.128.77 phase: Running podIP: 10.128.0.70 qosClass: BestEffort startTime: 2019-03-13T08:31:41Z Expected results: clusteroperator's condition should also be updated Additional info: Related to bug#1686121