Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1940844

Summary: Authentication operator is degraded during 4.5 to 4.6 upgrade
Product: OpenShift Container Platform
Reporter: pmali
Component: Networking
Assignee: Jacob Tanenbaum <jtanenba>
Networking sub component: openshift-sdn
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: high
Priority: unspecified
CC: aconstan, anbhat, aos-bugs, astoycos, jtanenba, kewang, mfojtik, slaznick, trozet, vpickard
Version: 4.6
Keywords: Reopened, Upgrades
Target Milestone: ---
Target Release: 4.8.0
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-11-11 14:17:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description pmali 2021-03-19 10:39:18 UTC
Description of problem:
The authentication operator is degraded during the upgrade from 4.5 to 4.6.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2021-03-13-204449

How reproducible:




Steps to Reproduce:
1. Upgrade from 4.5.35 to 4.6.0-0.nightly-2021-03-13-204449

Actual results:
The authentication operator is in a degraded state.


Expected results:
The upgrade should succeed.

Additional info:

Spec:
Status:
  Conditions:
    Last Transition Time:  2021-03-16T08:59:08Z
    Message:               APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver (crashlooping container is waiting in apiserver-5948d7cfd9-2kdfx pod)
    Reason:                APIServerDeployment_UnavailablePod
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2021-03-16T08:57:28Z
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2021-03-16T08:57:28Z
    Message:               OAuthServerDeploymentAvailable: availableReplicas==2
    Reason:                AsExpected
    Status:                True
    Type:                  Available
    Last Transition Time:  2021-03-16T06:45:26Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable

Comment 3 Standa Laznicka 2021-03-19 11:31:15 UTC
Does the authentication operator stay in the degraded state?

Comment 4 pmali 2021-03-19 16:24:35 UTC
Yes, the authentication operator stays in the degraded state.

Comment 5 Standa Laznicka 2021-03-22 09:27:10 UTC
Interesting: it looks like only one of the three oauth-apiserver pods is failing to connect to the kube-apiserver:

2021-03-16T11:11:23.06087827Z Error: unable to load configmap based request-header-client-ca-file: Get "https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Then again, each of them should be running on a different node, which would point to a node-specific problem. Moving this to Networking to have a look.
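Since the replicas should be spread across nodes, a failure confined to one pod suggests looking at that pod's node. A minimal Python sketch, assuming you feed it the parsed output of `oc get pods -n openshift-oauth-apiserver -o json` (a v1 List); the function name `unhealthy_pods_by_node` is hypothetical:

```python
from collections import defaultdict


def unhealthy_pods_by_node(pod_list: dict) -> dict[str, list[str]]:
    """Map node name -> names of pods on that node that are not fully ready.

    `pod_list` is assumed to be the JSON from `oc get pods -o json`,
    already decoded into Python objects.
    """
    result: dict[str, list[str]] = defaultdict(list)
    for pod in pod_list.get("items", []):
        node = pod.get("spec", {}).get("nodeName", "<unscheduled>")
        statuses = pod.get("status", {}).get("containerStatuses", [])
        # Treat a pod as unhealthy if no container status has been reported
        # yet, or if any of its containers is not ready (e.g. CrashLoopBackOff).
        if not statuses or not all(s.get("ready", False) for s in statuses):
            result[node].append(pod["metadata"]["name"])
    return dict(result)
```

For example, `unhealthy_pods_by_node(json.load(sys.stdin))` with the `oc` output piped in; if only one node ever shows up, that node's SDN setup is a natural next place to look.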

Comment 6 Tim Rozet 2021-04-05 15:16:47 UTC
Please set the blocker status.

Comment 9 Standa Laznicka 2021-04-30 08:26:15 UTC
I somehow missed the needinfo notification addressed to me.

> I am guessing there is a retry mechanism in place for the oauth-apiserver container to try loading the config map again?

No, there is none in this specific case. However, the pod restarted 12 times, so I assume that whatever it needed should have been created by then; if it wasn't, that would be a bug.
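Without an in-process retry, the kubelet's restart back-off acts as the retry loop, which is consistent with the kubelet log in this report showing `back-off 5m0s`. A simplified sketch, assuming the documented Kubernetes defaults (an initial 10 s delay that doubles after each failed restart, capped at 5 minutes) and ignoring the reset that happens once a container has run cleanly for 10 minutes; `crashloop_delays` is a hypothetical helper:

```python
def crashloop_delays(restarts: int, initial: float = 10.0, cap: float = 300.0) -> list[float]:
    """Return the back-off delay (seconds) before each of `restarts` restarts.

    Simplified CrashLoopBackOff model: start at `initial`, double after
    every failure, never exceed `cap`.
    """
    delays = []
    delay = initial
    for _ in range(restarts):
        delays.append(min(delay, cap))
        delay *= 2
    return delays


# With 12 restarts, every delay from the 6th restart on is pinned at the
# 300 s (5m0s) cap, so the pod keeps retrying roughly every five minutes.
schedule = crashloop_delays(12)
print(schedule)
print(f"total back-off ~ {sum(schedule) / 60:.1f} min")
```

Under this model, twelve restarts add up to about 40 minutes of back-off alone, which leaves ample time for any missing resources to appear.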

Comment 11 Ke Wang 2021-11-09 04:02:46 UTC
Reproduced the bug with the following upgrade profile, so I am reopening it:
OCP IPI install on AWS
Upgrade path: 4.1.41-x86_64 -> 4.2.36-x86_64 -> 4.3.40-x86_64 -> 4.4.33-x86_64 -> 4.5.41-x86_64 -> 4.6.49-x86_64

The upgrade got stuck at the point where the authentication cluster operator is degraded:
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.41    True        True          3h1m    Unable to apply 4.6.49: the cluster operator authentication is degraded

$ oc get node -o wide
NAME                                         STATUS   ROLES    AGE   VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-132-253.us-east-2.compute.internal   Ready    master   9h    v1.18.3+d8ef5ad   10.0.132.253   <none>        Red Hat Enterprise Linux CoreOS 45.82.202106211530-0 (Ootpa)   4.18.0-193.56.1.el8_2.x86_64   cri-o://1.18.4-11.rhaos4.5.gitfa57051.el8
ip-10-0-139-165.us-east-2.compute.internal   Ready    worker   9h    v1.18.3+d8ef5ad   10.0.139.165   <none>        Red Hat Enterprise Linux CoreOS 45.82.202106211530-0 (Ootpa)   4.18.0-193.56.1.el8_2.x86_64   cri-o://1.18.4-11.rhaos4.5.gitfa57051.el8
ip-10-0-148-214.us-east-2.compute.internal   Ready    master   9h    v1.18.3+d8ef5ad   10.0.148.214   <none>        Red Hat Enterprise Linux CoreOS 45.82.202106211530-0 (Ootpa)   4.18.0-193.56.1.el8_2.x86_64   cri-o://1.18.4-11.rhaos4.5.gitfa57051.el8
ip-10-0-154-164.us-east-2.compute.internal   Ready    worker   9h    v1.18.3+d8ef5ad   10.0.154.164   <none>        Red Hat Enterprise Linux CoreOS 45.82.202106211530-0 (Ootpa)   4.18.0-193.56.1.el8_2.x86_64   cri-o://1.18.4-11.rhaos4.5.gitfa57051.el8
ip-10-0-161-60.us-east-2.compute.internal    Ready    worker   9h    v1.18.3+d8ef5ad   10.0.161.60    <none>        Red Hat Enterprise Linux CoreOS 45.82.202106211530-0 (Ootpa)   4.18.0-193.56.1.el8_2.x86_64   cri-o://1.18.4-11.rhaos4.5.gitfa57051.el8
ip-10-0-171-220.us-east-2.compute.internal   Ready    master   9h    v1.18.3+d8ef5ad   10.0.171.220   <none>        Red Hat Enterprise Linux CoreOS 45.82.202106211530-0 (Ootpa)   4.18.0-193.56.1.el8_2.x86_64   cri-o://1.18.4-11.rhaos4.5.gitfa57051.el8

$ oc get co    
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE                                                                                               
authentication                             4.6.49    True        False         True       145m    
cloud-credential                           4.6.49    True        False         False      9h    
cluster-autoscaler                         4.6.49    True        False         False      9h       
config-operator                            4.6.49    True        False         False      3h55m    
console                                    4.6.49    True        False         False      144m    
csi-snapshot-controller                    4.6.49    True        False         False      146m    
dns                                        4.5.41    True        False         False      9h       
etcd                                       4.6.49    True        False         False      5h12m    
image-registry                             4.6.49    True        False         False      3h17m    
ingress                                    4.6.49    True        False         False      157m     
insights                                   4.6.49    True        False         False      6h49m    
kube-apiserver                             4.6.49    True        False         False      9h      
kube-controller-manager                    4.6.49    True        False         False      5h9m    
kube-scheduler                             4.6.49    True        False         False      9h       
kube-storage-version-migrator              4.6.49    True        False         False      3h17m    
machine-api                                4.6.49    True        False         False      9h       
machine-approver                           4.6.49    True        False         False      3h40m    
machine-config                             4.5.41    True        False         False      4h24m    
marketplace                                4.6.49    True        False         False      147m    
monitoring                                 4.6.49    True        False         False      144m    
network                                    4.6.49    True        False         False      9h      
node-tuning                                4.6.49    True        False         False      157m     
openshift-apiserver                        4.6.49    True        False         False      4h38m    
openshift-controller-manager               4.6.49    True        False         False      155m    
openshift-samples                          4.6.49    True        False         False      156m    
operator-lifecycle-manager                 4.6.49    True        False         False      9h    
operator-lifecycle-manager-catalog         4.6.49    True        False         False      9h      
operator-lifecycle-manager-packageserver   4.6.49    True        False         False      146m    
service-ca                                 4.6.49    True        False         False      9h       
service-catalog-apiserver                  4.4.33    True        False         False      3h15m    
service-catalog-controller-manager         4.4.33    True        False         False      5h12m    
storage                                    4.6.49    True        False         False      147m 
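When an upgrade stalls, it helps to separate operators reporting DEGRADED=True from operators that simply have not reached the target version yet. A small Python sketch, assuming the `oc get co` column layout shown above (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE) and that only the command output, header included, is passed in; `summarize_cluster_operators` is a hypothetical helper:

```python
def summarize_cluster_operators(table: str, target: str) -> tuple[list[str], list[str]]:
    """Parse `oc get co` output.

    Returns (operators with DEGRADED=True, operators whose VERSION != target).
    """
    degraded: list[str] = []
    lagging: list[str] = []
    rows = [ln for ln in table.strip().splitlines() if ln.strip()]
    for row in rows[1:]:  # skip the header row
        parts = row.split()
        if len(parts) not in (5, 6):
            continue  # unexpected layout; skip rather than guess
        name = parts[0]
        # VERSION may be blank, leaving only 5 whitespace-separated fields.
        version = parts[1] if len(parts) == 6 else ""
        _available, _progressing, degraded_flag, _since = parts[-4:]
        if degraded_flag == "True":
            degraded.append(name)
        if version != target:
            lagging.append(f"{name} ({version or 'no version'})")
    return degraded, lagging
```

Run over the full table above with `target="4.6.49"`, it should report only `authentication` as degraded, and `dns`, `machine-config`, and the two service-catalog operators as not yet at 4.6.49.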

$ oc describe co/authentication
Name:         authentication
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2021-11-05T18:01:09Z
  Generation:          1
  Resource Version:    396460
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/authentication
  UID:                 5b0a6128-3e62-11ec-b967-025dae34a298
Spec:
Status:
  Conditions:
    Last Transition Time:  2021-11-06T01:23:03Z
    Message:               APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver (crashlooping container is waiting in apiserver-754bcf9c6d-8cvms pod)
    Reason:                APIServerDeployment_UnavailablePod
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2021-11-06T01:21:06Z
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2021-11-06T01:21:32Z
    Message:               OAuthServerDeploymentAvailable: availableReplicas==2
    Reason:                AsExpected
    Status:                True
    Type:                  Available
    Last Transition Time:  2021-11-05T18:01:09Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
...


I checked the apiserver-754bcf9c6d-8cvms pod logs from the must-gather:
2021-11-06T03:44:42.740022522Z Error: unable to load configmap based request-header-client-ca-file: Get "https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Let's see what happened in the kube-apiserver and kubelet service logs at the same time:

$ grep -nr '03:44:42' | grep -E 'kube-apiserver|kubelet_service'  | grep -E 'E[0-9]{4}'
namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-0-148-214.us-east-2.compute.internal/kube-apiserver-check-endpoints/kube-apiserver-check-endpoints/logs/current.log:681:2021-11-06T03:44:42.353051845Z E1106 03:44:42.353002       1 reflector.go:127] k8s.io/client-go.0/tools/cache/reflector.go:156: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:openshift-kube-apiserver:check-endpoints" cannot list resource "configmaps" in API group "" in the namespace "kube-system"

host_service_logs/masters/kubelet_service.log:1075267:Nov 06 03:44:42.932216 ip-10-0-148-214 hyperkube[1390]: E1106 03:44:42.931594    1390 pod_workers.go:191] Error syncing pod 80b56c0b-60dd-45aa-a10c-cd5df2bc0544 ("apiserver-754bcf9c6d-8cvms_openshift-oauth-apiserver(80b56c0b-60dd-45aa-a10c-cd5df2bc0544)"), skipping: failed to "StartContainer" for "oauth-apiserver" with CrashLoopBackOff: "back-off 5m0s restarting failed container=oauth-apiserver pod=apiserver-754bcf9c6d-8cvms_openshift-oauth-apiserver(80b56c0b-60dd-45aa-a10c-cd5df2bc0544)"
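The `E[0-9]{4}` filter in the grep works because both logs use the klog header format, `Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg`, where the leading letter is the severity (I, W, E or F). A minimal Python sketch of the same filter, assuming the lines have already been read from the must-gather files; the regex, `parse_klog`, and `errors_at` are our own illustrative names, not part of klog:

```python
import re
from typing import Iterable, NamedTuple, Optional

# klog header: severity, MMDD, time with microseconds, thread id, file:line]
KLOG_HEADER = re.compile(
    r"\b([IWEF])(\d{2})(\d{2}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) ([^\]\s]+)\]"
)


class KlogEntry(NamedTuple):
    severity: str
    month: int
    day: int
    time: str
    source: str


def parse_klog(line: str) -> Optional[KlogEntry]:
    """Find the first klog header in `line`, which may carry a grep, journald or RFC 3339 prefix."""
    m = KLOG_HEADER.search(line)
    if m is None:
        return None
    sev, month, day, time, _thread_id, source = m.groups()
    return KlogEntry(sev, int(month), int(day), time, source)


def errors_at(lines: Iterable[str], second: str) -> list[KlogEntry]:
    """Return error-level entries whose klog time starts with `second`, e.g. '03:44:42'."""
    entries = (parse_klog(line) for line in lines)
    return [e for e in entries if e is not None and e.severity == "E" and e.time.startswith(second)]
```

Fed the two matching lines above with `second="03:44:42"`, it should return the `reflector.go:127` and `pod_workers.go:191` entries.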