Bug 1730488 - oc adm upgrade gives error status when rolling back to original version.
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Evan Cordell
QA Contact: Jian Zhang
Depends On:
Reported: 2019-07-16 20:45 UTC by Matt Woodson
Modified: 2019-08-06 12:55 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Last Closed: 2019-08-06 12:55:16 UTC
Target Upstream Version:

Attachments
oc -n openshift-cluster-version logs cluster-version-operator-857c4f9697-v964h > cvo.log (1.96 MB, text/plain)
2019-07-16 21:22 UTC, Matt Woodson

Description Matt Woodson 2019-07-16 20:45:36 UTC
Description of problem:

The upgrade failed when we tried to upgrade to a version of OpenShift that was not in the stable channel. We then rolled the version back to the existing version, but we are still getting status messages that should not be shown.

More details:

The cluster is on version 4.1.4. I tried to update the cluster to 4.1.5, but 4.1.5 is NOT in the stable channel.

After a certain amount of time, we realized this was the problem, and set the clusterversion back to 4.1.4:

$ oc get clusterversion -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2019-06-18T19:30:20Z"
    generation: 5
    name: version
    resourceVersion: "10940533"
    selfLink: /apis/config.openshift.io/v1/clusterversions/version
    uid: 8264d77e-91ff-11e9-8dd2-02d60ebc27c8
  spec:
    channel: stable-4.1
    clusterID: e0912a56-4d18-4d60-8fd1-d27aa1cc22bc
    desiredUpdate:
      force: false
      image: ""
      version: 4.1.4
    upstream: https://api.openshift.com/api/upgrades_info/v1/graph
  status:
    availableUpdates: null
    conditions:
    - lastTransitionTime: "2019-06-18T19:43:15Z"
      message: Done applying 4.1.4
      status: "True"
      type: Available
    - lastTransitionTime: "2019-07-16T20:00:24Z"
      message: Could not update deployment "openshift-operator-lifecycle-manager/catalog-operator"
        (254 of 350)
      reason: UpdatePayloadFailed
      status: "True"
      type: Failing
    - lastTransitionTime: "2019-07-09T13:37:29Z"
      message: 'Error while reconciling 4.1.4: the update could not be applied'
      reason: UpdatePayloadFailed
      status: "False"
      type: Progressing
    - lastTransitionTime: "2019-06-18T21:03:06Z"
      status: "True"
      type: RetrievedUpdates
    desired:
      force: false
      image: quay.io/openshift-release-dev/ocp-release@sha256:a6c177eb007d20bb00bfd8f829e99bd40137167480112bd5ae1c25e40a4a163a
      version: 4.1.4
    history:
    - completionTime: "2019-07-09T13:37:29Z"
      image: quay.io/openshift-release-dev/ocp-release@sha256:a6c177eb007d20bb00bfd8f829e99bd40137167480112bd5ae1c25e40a4a163a
      startedTime: "2019-07-09T13:08:07Z"
      state: Completed
      verified: true
      version: 4.1.4
    - completionTime: "2019-07-09T13:08:07Z"
      image: quay.io/openshift-release-dev/ocp-release@sha256:f852f9d8c2e81a633e874e57a7d9bdd52588002a9b32fc037dba12b67cf1f8b0
      startedTime: "2019-06-27T13:15:15Z"
      state: Completed
      verified: true
      version: 4.1.3
    - completionTime: "2019-06-27T13:15:15Z"
      image: quay.io/openshift-release-dev/ocp-release@sha256:9c5f0df8b192a0d7b46cd5f6a4da2289c155fd5302dec7954f8f06c878160b8b
      startedTime: "2019-06-18T19:31:06Z"
      state: Completed
      verified: false
      version: 4.1.2
    observedGeneration: 4
    versionHash: LA4R9XrQHfk=
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

We then ran "oc adm upgrade" and saw this output:
$ oc adm upgrade
Error while reconciling 4.1.4: the update could not be applied

No updates available. You may force an upgrade to a specific release image, but doing so may not be supported and result in downtime or data loss.

I believe that because the cluster version is set to 4.1.4, this error should go away. The message appears to come from the Progressing condition in the status, and that condition has status "False" (see the clusterversion output above).
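To make the status easier to read, here is a minimal Python sketch that looks up conditions by type in the ClusterVersion status quoted above. The condition data is copied from this report; the helper names (find_condition, is_failing) are hypothetical, and this is not how oc itself is implemented. It only illustrates that the Failing condition is "True" even though the Progressing condition that carries the error message is "False".

```python
# Conditions copied from the ClusterVersion status in this report
# (lastTransitionTime omitted for brevity).
CONDITIONS = [
    {"type": "Available", "status": "True",
     "message": "Done applying 4.1.4"},
    {"type": "Failing", "status": "True", "reason": "UpdatePayloadFailed",
     "message": 'Could not update deployment '
                '"openshift-operator-lifecycle-manager/catalog-operator" '
                '(254 of 350)'},
    {"type": "Progressing", "status": "False", "reason": "UpdatePayloadFailed",
     "message": "Error while reconciling 4.1.4: the update could not be applied"},
    {"type": "RetrievedUpdates", "status": "True"},
]


def find_condition(conditions, cond_type):
    """Return the first condition with the given type, or None if absent."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond
    return None


def is_failing(conditions):
    """True when a Failing condition exists and its status is "True"."""
    cond = find_condition(conditions, "Failing")
    return cond is not None and cond.get("status") == "True"


if __name__ == "__main__":
    # Print each condition so the Failing/Progressing pair can be compared.
    for cond in CONDITIONS:
        print(cond["type"], cond["status"], cond.get("message", ""))
    print("cluster is failing:", is_failing(CONDITIONS))
```

In a live cluster the same list would come from status.conditions of the clusterversion object rather than a hard-coded constant.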

Version-Release number of the following components:
openshift 4.1.4

$ oc version
Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.0+1b0f680-199-dirty", GitCommit:"1b0f680", GitTreeState:"dirty", BuildDate:"2019-07-16T18:02:53Z", GoVersion:"go1.11.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.4+c62ce01", GitCommit:"c62ce01", GitTreeState:"clean", BuildDate:"2019-06-27T18:14:14Z", GoVersion:"go1.11.6", Compiler:"gc", Platform:"linux/amd64"}

Actual results:
The error described above: "Error while reconciling 4.1.4: the update could not be applied"

Expected results:

After rolling the version back from 4.1.5 (non-existent) to 4.1.4, I would not expect to see this error.

Comment 1 Abhinav Dahiya 2019-07-16 21:18:37 UTC
The OLM deployment update is failing to reconcile to the 4.1.4 specification, so the Failing status is correct.

Can you provide logs from your cluster-version-operator pod, or the output of `oc adm must-gather`?

Comment 2 Matt Woodson 2019-07-16 21:22:19 UTC
Created attachment 1591208
oc -n openshift-cluster-version logs cluster-version-operator-857c4f9697-v964h > cvo.log

Comment 3 Abhinav Dahiya 2019-07-25 23:19:01 UTC
Rollbacks are a sync to the previous version, so the oc adm upgrade response is correct. Moving to OLM to figure out why the deployments are failing.

Comment 4 Evan Cordell 2019-07-26 13:32:03 UTC
Can you provide logs for the pods in the ‘openshift-operator-lifecycle-manager’ namespace? They should be collected with must-gather as well.
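If must-gather is not convenient, the per-pod logs could be collected with a small script along these lines. This is only a sketch: it assumes oc is on the PATH and already logged in to the cluster, that each pod has a single container, and that writing one <pod>.log file per pod into the current directory is acceptable. The function names are hypothetical; the `oc -n <namespace> logs <pod>` form is the same one used for the CVO log attached above.

```python
import subprocess

NAMESPACE = "openshift-operator-lifecycle-manager"


def logs_command(pod, namespace=NAMESPACE):
    """Build the argv for fetching one pod's logs (nothing is executed here)."""
    return ["oc", "-n", namespace, "logs", pod]


def list_pods(namespace=NAMESPACE):
    """Return pod names in the namespace; requires a working oc login."""
    result = subprocess.run(
        ["oc", "-n", namespace, "get", "pods", "-o", "name"],
        check=True, capture_output=True, text=True,
    )
    # "-o name" prints entries such as "pod/<name>"; keep only the name part.
    return [line.split("/", 1)[1]
            for line in result.stdout.splitlines() if "/" in line]


if __name__ == "__main__":
    for pod in list_pods():
        # Assumption: one log file per pod, written to the current directory.
        with open(f"{pod}.log", "w") as handle:
            subprocess.run(logs_command(pod), check=True, stdout=handle)
```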
