Bug 1689139

Summary: OLM upgrade failed via the OTA
Product: OpenShift Container Platform
Component: OLM
Version: 4.1.0
Target Release: 4.1.0
Reporter: Jian Zhang <jiazha>
Assignee: Evan Cordell <ecordell>
QA Contact: Jian Zhang <jiazha>
CC: chezhang, dyan, jfan, sponnaga, zitang
Status: CLOSED ERRATA
Severity: high
Priority: high
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-06-04 10:45:52 UTC
Type: Bug

Description Jian Zhang 2019-03-15 09:24:25 UTC
Description of problem:
OLM failed to upgrade during an OTA cluster update; it kept running the old image.

Version-Release number of selected component (if applicable):
The cluster version:
Before upgrade:
4.0.0-0.nightly-2019-03-13-233958
After upgrade:
4.0.0-0.nightly-2019-03-14-040908 

How reproducible:
always

Steps to Reproduce:
1. Install OCP 4.0 with payload: 4.0.0-0.nightly-2019-03-13-233958
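For reference, a minimal sketch of pinning the install payload (the
OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE variable is the usual CI override
mechanism; the --dir path here is illustrative):

OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-13-233958 \
  openshift-install create cluster --dir ./upgrade15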
2. Record the OLM clusteroperator CR:
[jzhang@dhcp-140-18 upgrade15]$ oc get clusteroperator operator-lifecycle-manager -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: 2019-03-15T02:38:20Z
  generation: 1
  name: operator-lifecycle-manager
  resourceVersion: "1654"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/operator-lifecycle-manager
  uid: 653821e6-46cb-11e9-acf2-02ca74d1ac00
spec: {}
status:
  conditions:
  - lastTransitionTime: 2019-03-15T02:38:20Z
    message: Done deploying 0.8.1.
    status: "False"
    type: Progressing
  - lastTransitionTime: 2019-03-15T02:38:20Z
    message: Done deploying 0.8.1.
    status: "False"
    type: Failing
  - lastTransitionTime: 2019-03-15T02:38:20Z
    message: Done deploying 0.8.1.
    status: "True"
    type: Available
  extension: null
  relatedObjects: null
  versions:
  - name: operator
    version: 0.8.1-31e16a9
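To capture just the reported operator version for a before/after comparison,
a jsonpath query along these lines should work (a sketch, not part of the
original reproduction steps):

oc get clusteroperator operator-lifecycle-manager \
  -o jsonpath='{.status.versions[?(@.name=="operator")].version}'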

3. Check the available new versions:
[jzhang@dhcp-140-18 upgrade15]$ oc adm upgrade
Cluster version is 4.0.0-0.nightly-2019-03-13-233958

Updates:

VERSION                           IMAGE
4.0.0-0.nightly-2019-03-14-040908 registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-14-040908
4.0.0-0.nightly-2019-03-14-135819 registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-14-135819

4. Upgrade to the new version:
[jzhang@dhcp-140-18 upgrade15]$ oc adm upgrade --to-image registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-14-040908 
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-14-040908
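The rollout can then be followed until the ClusterVersion reports the new
version as Completed, e.g. (a sketch):

# Watch the ClusterVersion object as the CVO applies the new payload:
oc get clusterversion -w
# Or poll per-operator progress:
oc get clusteroperators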

5. Check the OLM clusteroperator CR.

Actual results:
OLM was not upgraded; it is still using the old image.
[jzhang@dhcp-140-18 upgrade15]$ oc get clusteroperator operator-lifecycle-manager -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: 2019-03-15T02:38:20Z
  generation: 1
  name: operator-lifecycle-manager
  resourceVersion: "1654"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/operator-lifecycle-manager
  uid: 653821e6-46cb-11e9-acf2-02ca74d1ac00
spec: {}
status:
  conditions:
  - lastTransitionTime: 2019-03-15T02:38:20Z
    message: Done deploying 0.8.1.
    status: "False"
    type: Progressing
  - lastTransitionTime: 2019-03-15T02:38:20Z
    message: Done deploying 0.8.1.
    status: "False"
    type: Failing
  - lastTransitionTime: 2019-03-15T02:38:20Z
    message: Done deploying 0.8.1.
    status: "True"
    type: Available
  extension: null
  relatedObjects: null
  versions:
  - name: operator
    version: 0.8.1-31e16a9

[jzhang@dhcp-140-18 upgrade15]$ oc get clusteroperator
NAME                                  VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
cluster-autoscaler                    4.0.0-0.nightly-2019-03-14-040908   True        False         False     34m
console                               4.0.0-0.nightly-2019-03-14-040908   True        False         False     33m
dns                                   4.0.0-0.nightly-2019-03-14-040908   True        False         False     6h31m
image-registry                        4.0.0-0.nightly-2019-03-14-040908   True        False         False     34m
ingress                               4.0.0-0.nightly-2019-03-14-040908   True        False         False     6h25m
kube-apiserver                        4.0.0-0.nightly-2019-03-14-040908   True        False         False     40m
kube-controller-manager               4.0.0-0.nightly-2019-03-14-040908   True        False         False     39m
kube-scheduler                        4.0.0-0.nightly-2019-03-14-040908   True        False         False     40m
machine-api                           4.0.0-0.nightly-2019-03-14-040908   True        False         False     6h31m
machine-config                        4.0.0-0.nightly-2019-03-14-040908   True        False         False     6h30m
marketplace-operator                  4.0.0-0.nightly-2019-03-14-040908   True        False         False     34m
monitoring                            4.0.0-0.nightly-2019-03-14-040908   True        False         False     27m
network                               4.0.0-0.nightly-2019-03-14-040908   True        False         False     6h26m
node-tuning                           4.0.0-0.nightly-2019-03-14-040908   True        False         False     34m
openshift-apiserver                   4.0.0-0.nightly-2019-03-14-040908   True        False         False     35m
openshift-authentication                                                  True        False         False     101m
openshift-cloud-credential-operator                                       True        False         False     6h30m
openshift-controller-manager          4.0.0-0.nightly-2019-03-14-040908   True        False         False     34m
openshift-samples                     4.0.0-0.nightly-2019-03-14-040908   True        False         False     33m
operator-lifecycle-manager            0.8.1-31e16a9                       True        False         False     6h31m
service-ca                                                                True        False         False     40m
service-catalog-apiserver             4.0.0-0.nightly-2019-03-14-040908   True        False         False     28m
service-catalog-controller-manager    4.0.0-0.nightly-2019-03-14-040908   True        False         False     33m
storage                               4.0.0-0.nightly-2019-03-14-040908   True        False         False     6h26m
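To confirm which image the OLM pods are actually running (assuming the
default olm-operator deployment in the openshift-operator-lifecycle-manager
namespace), something like this can be used:

oc get deployment olm-operator -n openshift-operator-lifecycle-manager \
  -o jsonpath='{.spec.template.spec.containers[0].image}'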


Expected results:
The OLM should be upgraded successfully and use the new image from the newer payload.

Additional info:
[jzhang@dhcp-140-18 upgrade15]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-14-040908   True        False         10m     Cluster version is 4.0.0-0.nightly-2019-03-14-040908
...
  history:
  - completionTime: 2019-03-15T08:36:39Z
    image: registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-14-040908
    startedTime: 2019-03-15T08:26:38Z
    state: Completed
    version: 4.0.0-0.nightly-2019-03-14-040908
  - completionTime: 2019-03-15T08:26:38Z
    image: registry.svc.ci.openshift.org/ocp/release@sha256:128bf3c22c7f7fdc3747e481031022cc995d7282f7c53bc6676cc7e91931c73c
    startedTime: 2019-03-15T02:37:05Z
    state: Completed
    version: 4.0.0-0.nightly-2019-03-13-233958
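For comparison, the OLM image that the target payload carries can be looked
up from the release metadata (a sketch; the component tag name below is an
assumption):

oc adm release info \
  registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-14-040908 \
  --image-for=operator-lifecycle-manager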

Comment 2 Jian Zhang 2019-03-19 09:15:47 UTC
Before upgrade:
Cluster version is 4.0.0-0.nightly-2019-03-18-200009
OLM version commit: io.openshift.build.commit.id=69457423c2da01da0110b17fac1ac48b994b99e8
[jzhang@dhcp-140-18 ocp119]$ oc exec catalog-operator-657f5ddf79-nqmwg -- olm --version
OLM version: 0.8.1
git commit: e528ffb

After upgrade:
Cluster version is 4.0.0-0.nightly-2019-03-18-223058
OLM version commit: io.openshift.build.commit.id=5159b0a1c0dfe2cb76eb706afb4e3cc2ac4447fd
[jzhang@dhcp-140-18 ocp119]$ oc exec catalog-operator-657f5ddf79-nqmwg -- olm --version
OLM version: 0.8.1
git commit: e528ffb
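The io.openshift.build.commit.id label quoted above can be read from the
running image's metadata; a sketch, assuming the pod runs in the
openshift-operator-lifecycle-manager namespace:

IMAGE=$(oc get pod catalog-operator-657f5ddf79-nqmwg \
  -n openshift-operator-lifecycle-manager \
  -o jsonpath='{.spec.containers[0].image}')
oc image info "$IMAGE" | grep io.openshift.build.commit.id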

[jzhang@dhcp-140-18 ocp119]$ oc get clusterversion -o yaml|grep history -A 9
    history:
    - completionTime: 2019-03-19T08:52:01Z
      image: registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-18-223058
      startedTime: 2019-03-19T08:29:56Z
      state: Completed
      version: 4.0.0-0.nightly-2019-03-18-223058
    - completionTime: 2019-03-19T08:29:56Z
      image: registry.svc.ci.openshift.org/ocp/release@sha256:e3f2bff3e7a40f7ca0777ada2ad89197a5ab6d7296d3bd12a28dc5aa6b4311dc
      startedTime: 2019-03-19T07:18:49Z
      state: Completed


LGTM, but there is still a small issue: the "SINCE" time is incorrect, as shown below:

[jzhang@dhcp-140-18 ocp119]$ oc get clusteroperator
NAME                                  VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
authentication                        4.0.0-0.nightly-2019-03-18-223058   True        False         False     22m
cluster-autoscaler                    4.0.0-0.nightly-2019-03-18-223058   True        False         False     22m
console                               4.0.0-0.nightly-2019-03-18-223058   True        False         False     21m
dns                                   4.0.0-0.nightly-2019-03-18-223058   True        False         False     111m
image-registry                        4.0.0-0.nightly-2019-03-18-223058   True        False         False     22m
ingress                               4.0.0-0.nightly-2019-03-18-223058   True        False         False     104m
kube-apiserver                        4.0.0-0.nightly-2019-03-18-223058   True        False         False     25m
kube-controller-manager               4.0.0-0.nightly-2019-03-18-223058   True        False         False     38m
kube-scheduler                        4.0.0-0.nightly-2019-03-18-223058   True        False         False     31m
machine-api                           4.0.0-0.nightly-2019-03-18-223058   True        False         False     112m
machine-config                        4.0.0-0.nightly-2019-03-18-223058   True        False         False     24m
marketplace-operator                  4.0.0-0.nightly-2019-03-18-223058   True        False         False     22m
monitoring                            4.0.0-0.nightly-2019-03-18-223058   True        False         False     23m
network                               4.0.0-0.nightly-2019-03-18-223058   True        False         False     112m
node-tuning                           4.0.0-0.nightly-2019-03-18-223058   True        False         False     22m
openshift-apiserver                   4.0.0-0.nightly-2019-03-18-223058   True        False         False     23m
openshift-cloud-credential-operator   4.0.0-0.nightly-2019-03-18-223058   True        False         False     112m
openshift-controller-manager          4.0.0-0.nightly-2019-03-18-223058   True        False         False     22m
openshift-samples                     4.0.0-0.nightly-2019-03-18-223058   True        False         False     21m
operator-lifecycle-manager            4.0.0-0.nightly-2019-03-18-223058   True        False         False     112m
service-ca                                                                True        False         False     27m
service-catalog-apiserver             4.0.0-0.nightly-2019-03-18-223058   True        False         False     22m
service-catalog-controller-manager    4.0.0-0.nightly-2019-03-18-223058   True        False         False     22m
storage                               4.0.0-0.nightly-2019-03-18-223058   True        False         False     105m
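The SINCE column is derived from lastTransitionTime on the status
conditions, so the underlying timestamp can be checked directly (a sketch):

oc get clusteroperator operator-lifecycle-manager \
  -o jsonpath='{.status.conditions[?(@.type=="Available")].lastTransitionTime}'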

Comment 3 Jian Zhang 2019-03-19 09:24:09 UTC
> LGTM, but there is still a small issue: the "SINCE" time is incorrect, as shown below:

For the above issue, I think it is the same as bug 1688611; we can track it there.
For this issue, LGTM since the OLM images/versions were updated successfully. Verifying it.

Comment 5 errata-xmlrpc 2019-06-04 10:45:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758