Bug 1695244

Summary: Upgrade from 4.0.0-0.9 to 0.10: status is the cluster operator authentication has not yet successfully rolled out
Product: OpenShift Container Platform
Reporter: Mike Fiedler <mifiedle>
Component: apiserver-auth
Assignee: Mo <mkhan>
Status: CLOSED ERRATA
QA Contact: Chuan Yu <chuyu>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.1.0
CC: aos-bugs, bparees, jokerman, mmccomas, nagrawal, schoudha, vlaad, wking
Target Milestone: ---
Keywords: BetaBlocker
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-06-04 10:46:54 UTC
Type: Bug
Attachments:
  clusterversion pod logs, clusterversion CR (flags: none)
  authentication operator pod log (flags: none)

Description Mike Fiedler 2019-04-02 17:35:40 UTC
Created attachment 1551100
clusterversion pod logs, clusterversion CR

Description of problem:

After upgrading from 4.0.0-0.9 to 4.0.0-0.10.0 (test for beta3), oc get clusterversion initially showed "Cluster version is 4.0.0-0.10" and Progressing=False. When I checked more than an hour later, it showed:

NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.10   True        False         68m     Error while reconciling 4.0.0-0.10: the cluster operator authentication has not yet successfully rolled out

Checking again at 82m showed the error gone:

NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.10   True        False         82m     Cluster version is 4.0.0-0.10

At a minimum, the Progressing=False flag is misleading.

How reproducible: 2/2 clusters so far

Steps to Reproduce:
1.  Install 4.0.0-0.9.0
2.  oc adm upgrade --to-image quay.io/openshift-release-dev/ocp-release:4.0.0-0.10
3.  Wait a while and check oc get clusterversion
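
Step 3 can be scripted so that any flapping between the two states is captured with timestamps. The following is a minimal sketch, not something from the original report: it assumes oc is on the PATH and logged in to the cluster, and that the JSON form of the clusterversion object carries the same type/status condition fields as the YAML quoted in this bug. The helper names (fetch_clusterversion, status_line, poll), the 60-second interval and the sample count are illustrative choices.

```python
import json
import subprocess
import time
from datetime import datetime, timezone


def fetch_clusterversion():
    """Return the ClusterVersion object named 'version' as a dict (needs oc)."""
    out = subprocess.run(
        ["oc", "get", "clusterversion", "version", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def status_line(cv):
    """Summarize a ClusterVersion dict as sorted 'Type=Status' pairs."""
    conditions = cv.get("status", {}).get("conditions", [])
    ordered = sorted(conditions, key=lambda c: c["type"])
    return " ".join(f"{c['type']}={c['status']}" for c in ordered)


def poll(fetch=fetch_clusterversion, interval=60, count=5, sleep=time.sleep):
    """Collect (UTC timestamp, summary) pairs, one per poll."""
    samples = []
    for i in range(count):
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        samples.append((stamp, status_line(fetch())))
        if i < count - 1:
            sleep(interval)  # no wait after the final sample
    return samples
```

Calling poll() with no arguments takes five samples a minute apart. Passing a different fetch callable lets the summary logic be exercised without a cluster.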

Attaching a tar with the cluster-version pod logs plus the YAML of the clusterversion CR.

Comment 1 Mike Fiedler 2019-04-02 17:38:09 UTC
The error is back a few minutes later. I guess the CVO is having problems reconciling a component. Marking this as BetaBlocker until triaged.

NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.10   True        False         88m     Error while reconciling 4.0.0-0.10: the cluster operator authentication has not yet successfully rolled out

Comment 2 Mike Fiedler 2019-04-02 17:47:22 UTC
Created attachment 1551102
authentication operator pod log

Comment 3 Mike Fiedler 2019-04-02 18:16:39 UTC
    - lastTransitionTime: 2019-04-02T18:12:56Z
      message: 'Cluster operator authentication is still updating: upgrading integrated-oauth-server
        from 4.0.0-0.9_openshift to 4.0.0-0.10_openshift'
      reason: ClusterOperatorNotAvailable
      status: "True"
      type: Failing
    - lastTransitionTime: 2019-04-02T17:22:56Z
      message: 'Error while reconciling 4.0.0-0.10: the cluster operator authentication
        has not yet successfully rolled out'
      reason: ClusterOperatorNotAvailable
      status: "False"
      type: Progressing
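
The combination quoted above (Failing is "True" while Progressing is "False") is the state the description calls misleading. A small sketch of how a script could flag it is shown here. The function names are hypothetical, and the check encodes the reporter's complaint, not the CVO's own logic.

```python
def condition_status(conditions, cond_type):
    """Return the status string of the named condition, or None if absent."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def reports_done_while_failing(conditions):
    """True when Failing is "True" but Progressing is "False"."""
    return (
        condition_status(conditions, "Failing") == "True"
        and condition_status(conditions, "Progressing") == "False"
    )


# The two conditions quoted in this comment, reduced to the relevant fields.
conditions = [
    {"type": "Failing", "status": "True",
     "reason": "ClusterOperatorNotAvailable"},
    {"type": "Progressing", "status": "False",
     "reason": "ClusterOperatorNotAvailable"},
]
print(reports_done_while_failing(conditions))  # prints True
```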

Comment 4 Mike Fiedler 2019-04-02 18:35:47 UTC
Contents of the version-mapping configmap in the openshift-authentication-operator namespace:

# oc get cm version-mapping -o yaml
apiVersion: v1
data:
  4.0.0-0.9: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:507bae2583cfb676344789fe5015bc1f7cf869bdd116102d316664166b7332ae
  4.0.0-0.9_openshift: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:482b6f2aff8b0c8910ac7bd412cb754502503489cad0f6e59f57ba683ccefa82
  4.0.0-0.10: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:507bae2583cfb676344789fe5015bc1f7cf869bdd116102d316664166b7332ae
  4.0.0-0.10_openshift: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:482b6f2aff8b0c8910ac7bd412cb754502503489cad0f6e59f57ba683ccefa82
kind: ConfigMap
metadata:
  creationTimestamp: 2019-04-02T15:25:46Z
  name: version-mapping
  namespace: openshift-authentication-operator
  resourceVersion: "25781"
  selfLink: /api/v1/namespaces/openshift-authentication-operator/configmaps/version-mapping
  uid: 964de681-555b-11e9-a1c0-06a47fee4532
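
One thing visible in the data above is that the 0.9 and 0.10 keys resolve to identical image digests. The sketch below groups the keys by image to make that explicit. The names IMAGE_A, IMAGE_B and keys_by_image are illustrative, and whether the shared digests are related to the rollout message is not established in this report.

```python
from collections import defaultdict

REPO = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"
IMAGE_A = REPO + "@sha256:507bae2583cfb676344789fe5015bc1f7cf869bdd116102d316664166b7332ae"
IMAGE_B = REPO + "@sha256:482b6f2aff8b0c8910ac7bd412cb754502503489cad0f6e59f57ba683ccefa82"

# The data section of the configmap, copied from the output above.
version_mapping = {
    "4.0.0-0.9": IMAGE_A,
    "4.0.0-0.9_openshift": IMAGE_B,
    "4.0.0-0.10": IMAGE_A,
    "4.0.0-0.10_openshift": IMAGE_B,
}


def keys_by_image(mapping):
    """Group version keys by the image reference they point to."""
    groups = defaultdict(list)
    for key, image in mapping.items():
        groups[image].append(key)
    return dict(groups)


for image, keys in keys_by_image(version_mapping).items():
    # Show only the first 12 hex characters of each digest for readability.
    print(image.rsplit(":", 1)[1][:12], keys)
```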

Comment 5 Mike Fiedler 2019-04-02 19:50:50 UTC
Similar issue for openshift-apiserver:  https://bugzilla.redhat.com/show_bug.cgi?id=1695307

Comment 13 errata-xmlrpc 2019-06-04 10:46:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758