Bug 1958198 - authentication operator takes too long to pick up a configuration change
Summary: authentication operator takes too long to pick up a configuration change
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
: 4.11.0
Assignee: Standa Laznicka
QA Contact: Xingxing Xia
URL:
Whiteboard: LifecycleFrozen tag-ci
: 2024416 (view as bug list)
Depends On:
Blocks:
 
Reported: 2021-05-07 12:23 UTC by Standa Laznicka
Modified: 2022-08-10 10:36 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Multiple authentication operator controllers attempted to synchronize at the same time, at regular intervals. Consequence: Client-side throttling of requests towards the API when reacting to a non-periodic change. Fix: Add jitter to the regular synchronization periods so that the controllers of the authentication operator don't race for resources. Result: The wait time for the authentication operator to react to changes decreased.
Clone Of:
Environment:
job=periodic-ci-openshift-release-master-ci-4.9-e2e-aws-ovn=all
Last Closed: 2022-08-10 10:36:25 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
kube_deployment_status_replicas_available{namespace="openshift-authentication"} (86.68 KB, image/png)
2021-08-25 08:20 UTC, Jan Chaloupka
no flags Details
count(kube_pod_status_ready{namespace="openshift-authentication",pod=~"oauth-openshift-9b79df6cb-.*",condition="true"}) by (pods) (77.55 KB, image/png)
2021-08-25 08:32 UTC, Jan Chaloupka
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-authentication-operator pull 491 0 None open Bug 1958198: Increase client side throttling 2022-04-21 10:51:44 UTC
Github openshift library-go pull 1357 0 None open Bug 1958198: Avoid zero resync interval 2022-04-21 10:54:25 UTC
Red Hat Knowledge Base (Solution) 6527301 0 None None None 2021-11-30 12:59:08 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 10:36:35 UTC

Description Standa Laznicka 2021-05-07 12:23:40 UTC
Description of problem:
The authentication operator appears to be taking too long to pick up a change to its configuration

Version-Release number of selected component (if applicable):
master

How reproducible:
100%

Steps to Reproduce:
1. configure a new identity provider
2. wait for the authentication operator to start deploying a new revision (it reports progressing=true)

Actual results:
It can take up to 5 minutes for the authentication operator to start rolling out a new revision

Expected results:
It should not take more than a minute after a new configuration is set for the authentication operator to start deploying a new revision.

Additional info:

Comment 1 Standa Laznicka 2021-05-12 08:54:06 UTC
From my observations, the change can _sometimes_ take up to 2 minutes to be observed by the deployment controller, but not the 5 minutes reported by the console.

I noticed that the extra 2 minutes usually happen when there is client-side throttling; I wonder whether and how that might affect informer caches. In the other cases the change is quite quick. The reaction to configuration changes and the ConfigMap syncing is usually almost immediate, so while this is not a big issue, it is still worth asking why the deployment controller takes longer to observe the current state.

Comment 2 Michal Fojtik 2021-06-11 09:03:24 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 3 Sergiusz Urbaniak 2021-08-16 12:44:03 UTC
sprint review: this issue is still being worked on.

Comment 4 Sebastian Łaskawiec 2021-08-18 09:59:49 UTC
I tried to reproduce this issue using https://github.com/stlaz/openshift-requestheader-config. 

The OAuth/cluster configuration was applied at '09:13:42Z':

  $ oc get OAuth cluster -o yaml
  ...
    - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:spec:
        f:identityProviders: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2021-08-18T09:13:42Z"

The Pod rotation started less than a minute after this event - at 09:14:11Z:


  $ oc get events -n openshift-authentication -o custom-columns=TIME:.lastTimestamp,MESSAGE:.message
  ...
  2021-08-18T08:25:12Z   Created pod: oauth-openshift-84d7896496-rbhkl
  2021-08-18T08:25:43Z   Created pod: oauth-openshift-84d7896496-zt679
  2021-08-18T08:26:13Z   Created pod: oauth-openshift-84d7896496-nvm25
  2021-08-18T09:14:11Z   Deleted pod: oauth-openshift-84d7896496-nvm25
  2021-08-18T09:14:42Z   Deleted pod: oauth-openshift-84d7896496-rbhkl
  2021-08-18T09:15:11Z   Deleted pod: oauth-openshift-84d7896496-zt679
  2021-08-18T08:21:30Z   Scaled up replica set oauth-openshift-7bc48d955 to 3
  2021-08-18T08:25:12Z   Scaled down replica set oauth-openshift-7bc48d955 to 2
  2021-08-18T08:25:12Z   Scaled up replica set oauth-openshift-84d7896496 to 1
  2021-08-18T08:25:43Z   Scaled down replica set oauth-openshift-7bc48d955 to 1
  2021-08-18T08:25:43Z   Scaled up replica set oauth-openshift-84d7896496 to 2
  2021-08-18T08:26:13Z   Scaled down replica set oauth-openshift-7bc48d955 to 0
  2021-08-18T08:26:13Z   Scaled up replica set oauth-openshift-84d7896496 to 3
  2021-08-18T09:14:11Z   Scaled down replica set oauth-openshift-84d7896496 to 2
  2021-08-18T09:14:11Z   Scaled up replica set oauth-openshift-58fcdf4f49 to 1
  2021-08-18T09:14:42Z   Scaled down replica set oauth-openshift-84d7896496 to 1
  2021-08-18T09:14:42Z   Scaled up replica set oauth-openshift-58fcdf4f49 to 2
  2021-08-18T09:15:11Z   Scaled down replica set oauth-openshift-84d7896496 to 0
  2021-08-18T09:15:11Z   Scaled up replica set oauth-openshift-58fcdf4f49 to 3

It took roughly 1 minute to rotate all the pods (09:15:11Z - 09:14:11Z).

The CAO also flipped the Progressing flag in a timely manner (at 09:15:32Z, roughly 1 minute after the deployment finished).

  $ oc get clusteroperator authentication -o yaml

  - lastTransitionTime: "2021-08-18T09:15:32Z"
    message: 'AuthenticatorCertKeyProgressing: All is well'
    reason: AsExpected
    status: "False"
    type: Progressing

So the whole procedure took about 2 minutes (09:15:32Z - 09:13:42Z) from start to finish. Let me close this as "WORKSFORME". Perhaps the cluster was overloaded or AWS was having a hard time; it's hard to say...

Comment 5 Sebastian Łaskawiec 2021-08-19 07:19:45 UTC
Together with Standa, we narrowed the problem down to the window between 09:13:42Z and 09:14:11Z. In my case that took 29 seconds, but sometimes it takes 2.5 minutes. The biggest suspect is client-side throttling.

Putting this onto the back burner for now.

Comment 6 Jan Chaloupka 2021-08-25 08:20:33 UTC
Created attachment 1817377 [details]
kube_deployment_status_replicas_available{namespace="openshift-authentication"}

From https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-ovn/1428583979283386368:
```
1 unexpected clusteroperator state transitions during e2e test run

Aug 20 05:49:23.886 - 46s   E clusteroperator/authentication condition/Degraded status/True reason/Unknown
```

From https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-ovn/1428583979283386368/artifacts/e2e-aws-ovn/gather-extra/artifacts/pods/openshift-authentication-operator_authentication-operator-f96986bb8-xnn6j_authentication-operator.log:
```
I0820 05:49:07.254217       1 status_controller.go:211] clusteroperator/authentication diff {
  "status":{
    "conditions":
      [
        {"lastTransitionTime":"2021-08-20T05:49:07Z","message":"OAuthServerDeploymentDegraded: 1 of 3 requested instances are unavailable for oauth-openshift.openshift-authentication ()","reason":"OAuthServerDeployment_UnavailablePod","status":"True","type":"Degraded"},
        {"lastTransitionTime":"2021-08-20T05:48:20Z","message":"AuthenticatorCertKeyProgressing: All is well","reason":"AsExpected","status":"False","type":"Progressing"},
        {"lastTransitionTime":"2021-08-20T05:48:20Z","message":"All is well","reason":"AsExpected","status":"True","type":"Available"},
        {"lastTransitionTime":"2021-08-20T05:23:51Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}
      ]
    }
  }
I0820 05:49:07.270768       1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"openshift-authentication-operator", Name:"openshift-authentication-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Degraded changed from False to True ("OAuthServerDeploymentDegraded: 1 of 3 requested instances are unavailable for oauth-openshift.openshift-authentication ()")
```

From kubelet logs:
```
ip-10-0-181-79.us-west-1.compute.internal:9313:Aug 20 05:36:21.304475 ip-10-0-181-79 hyperkube[1681]: I0820 05:36:21.304443    1681 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-854d6fff9c-b9m5k]
ip-10-0-181-79.us-west-1.compute.internal:9482:Aug 20 05:36:29.714223 ip-10-0-181-79 hyperkube[1681]: I0820 05:36:29.714198    1681 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-authentication/oauth-openshift-854d6fff9c-b9m5k"
ip-10-0-181-79.us-west-1.compute.internal:12166:Aug 20 05:45:45.759285 ip-10-0-181-79 hyperkube[1681]: I0820 05:45:45.755286    1681 kubelet.go:2081] "SyncLoop DELETE" source="api" pods=[openshift-authentication/oauth-openshift-854d6fff9c-b9m5k]
ip-10-0-181-79.us-west-1.compute.internal:12280:Aug 20 05:46:12.149280 ip-10-0-181-79 hyperkube[1681]: I0820 05:46:12.149255    1681 kubelet.go:2075] "SyncLoop REMOVE" source="api" pods=[openshift-authentication/oauth-openshift-854d6fff9c-b9m5k]

ip-10-0-221-30.us-west-1.compute.internal:8613:Aug 20 05:36:21.191781 ip-10-0-221-30 hyperkube[1683]: I0820 05:36:21.191744    1683 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-854d6fff9c-b6crd]
ip-10-0-221-30.us-west-1.compute.internal:8733:Aug 20 05:36:30.005871 ip-10-0-221-30 hyperkube[1683]: I0820 05:36:30.005844    1683 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-authentication/oauth-openshift-854d6fff9c-b6crd"
ip-10-0-221-30.us-west-1.compute.internal:12778:Aug 20 05:46:41.518540 ip-10-0-221-30 hyperkube[1683]: I0820 05:46:41.514047    1683 kubelet.go:2075] "SyncLoop REMOVE" source="api" pods=[openshift-authentication/oauth-openshift-854d6fff9c-b6crd]

ip-10-0-221-30.us-west-1.compute.internal:12779:Aug 20 05:46:41.524046 ip-10-0-221-30 hyperkube[1683]: I0820 05:46:41.524015    1683 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-9b79df6cb-dhvrb]
ip-10-0-221-30.us-west-1.compute.internal:12856:Aug 20 05:46:44.699371 ip-10-0-221-30 hyperkube[1683]: I0820 05:46:44.699343    1683 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-authentication/oauth-openshift-9b79df6cb-dhvrb"

ip-10-0-179-152.us-west-1.compute.internal:12916:Aug 20 05:36:21.303538 ip-10-0-179-152 hyperkube[1680]: I0820 05:36:21.303503    1680 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-854d6fff9c-j7bw9]
ip-10-0-179-152.us-west-1.compute.internal:13001:Aug 20 05:36:30.479977 ip-10-0-179-152 hyperkube[1680]: I0820 05:36:30.479933    1680 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-authentication/oauth-openshift-854d6fff9c-j7bw9"
ip-10-0-179-152.us-west-1.compute.internal:15877:Aug 20 05:46:44.764966 ip-10-0-179-152 hyperkube[1680]: I0820 05:46:44.764930    1680 kubelet.go:2081] "SyncLoop DELETE" source="api" pods=[openshift-authentication/oauth-openshift-854d6fff9c-j7bw9]
ip-10-0-179-152.us-west-1.compute.internal:15953:Aug 20 05:47:11.423571 ip-10-0-179-152 hyperkube[1680]: I0820 05:47:11.423477    1680 kubelet.go:2075] "SyncLoop REMOVE" source="api" pods=[openshift-authentication/oauth-openshift-854d6fff9c-j7bw9]

ip-10-0-179-152.us-west-1.compute.internal:15954:Aug 20 05:47:11.437638 ip-10-0-179-152 hyperkube[1680]: I0820 05:47:11.437612    1680 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-9b79df6cb-9wgf6]
ip-10-0-179-152.us-west-1.compute.internal:16031:Aug 20 05:47:14.544054 ip-10-0-179-152 hyperkube[1680]: I0820 05:47:14.544024    1680 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-authentication/oauth-openshift-9b79df6cb-9wgf6"

ip-10-0-181-79.us-west-1.compute.internal:12281:Aug 20 05:46:12.163056 ip-10-0-181-79 hyperkube[1681]: I0820 05:46:12.163030    1681 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-9b79df6cb-6f6hk]
ip-10-0-181-79.us-west-1.compute.internal:12365:Aug 20 05:46:15.096268 ip-10-0-181-79 hyperkube[1681]: I0820 05:46:15.096247    1681 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-authentication/oauth-openshift-9b79df6cb-6f6hk"
```

ip-10-0-221-30.us-west-1.compute.internal:
- oauth-openshift-854d6fff9c-b6crd: <05:36:21.191744; 05:46:41.514047>
- oauth-openshift-9b79df6cb-dhvrb:  <05:46:41.524015; ???>
ip-10-0-179-152.us-west-1.compute.internal:
- oauth-openshift-854d6fff9c-j7bw9: <05:36:21.303503; 05:47:11.423477>
- oauth-openshift-9b79df6cb-9wgf6:  <05:47:11.437612; ???>
ip-10-0-181-79.us-west-1.compute.internal:
- oauth-openshift-854d6fff9c-b9m5k: <05:36:29.714223; 05:46:12.149280>
- oauth-openshift-9b79df6cb-6f6hk:  <05:46:12.163030; ???>

From https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-ovn/1428583979283386368/artifacts/e2e-aws-ovn/gather-extra/artifacts/pods/openshift-kube-controller-manager_kube-controller-manager-ip-10-0-221-30.us-west-1.compute.internal_kube-controller-manager.log:
```
I0820 05:45:45.720433       1 replica_set.go:599] "Too many replicas" replicaSet="openshift-authentication/oauth-openshift-854d6fff9c" need=2 deleting=1
I0820 05:45:45.720471       1 replica_set.go:227] "Found related ReplicaSets" replicaSet="openshift-authentication/oauth-openshift-854d6fff9c" relatedReplicaSets=[oauth-openshift-854d6fff9c oauth-openshift-9b79df6cb]
I0820 05:45:45.720543       1 controller_utils.go:592] "Deleting pod" controller="oauth-openshift-854d6fff9c" pod="openshift-authentication/oauth-openshift-854d6fff9c-b9m5k"
I0820 05:45:45.721679       1 event.go:291] "Event occurred" object="openshift-authentication/oauth-openshift" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled down replica set oauth-openshift-854d6fff9c to 2"
I0820 05:45:45.743913       1 event.go:291] "Event occurred" object="openshift-authentication/oauth-openshift-854d6fff9c" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulDelete" message="Deleted pod: oauth-openshift-854d6fff9c-b9m5k"
I0820 05:45:45.753023       1 replica_set.go:563] "Too few replicas" replicaSet="openshift-authentication/oauth-openshift-9b79df6cb" need=1 creating=1
I0820 05:45:45.753403       1 event.go:291] "Event occurred" object="openshift-authentication/oauth-openshift" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled up replica set oauth-openshift-9b79df6cb to 1"
I0820 05:45:45.799878       1 event.go:291] "Event occurred" object="openshift-authentication/oauth-openshift-9b79df6cb" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: oauth-openshift-9b79df6cb-6f6hk"
I0820 05:46:15.169692       1 replica_set.go:599] "Too many replicas" replicaSet="openshift-authentication/oauth-openshift-854d6fff9c" need=1 deleting=1
I0820 05:46:15.169736       1 replica_set.go:227] "Found related ReplicaSets" replicaSet="openshift-authentication/oauth-openshift-854d6fff9c" relatedReplicaSets=[oauth-openshift-854d6fff9c oauth-openshift-9b79df6cb]
I0820 05:46:15.169805       1 controller_utils.go:592] "Deleting pod" controller="oauth-openshift-854d6fff9c" pod="openshift-authentication/oauth-openshift-854d6fff9c-b6crd"
I0820 05:46:15.170261       1 event.go:291] "Event occurred" object="openshift-authentication/oauth-openshift" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled down replica set oauth-openshift-854d6fff9c to 1"
I0820 05:46:15.190841       1 event.go:291] "Event occurred" object="openshift-authentication/oauth-openshift-854d6fff9c" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulDelete" message="Deleted pod: oauth-openshift-854d6fff9c-b6crd"
I0820 05:46:15.201758       1 replica_set.go:563] "Too few replicas" replicaSet="openshift-authentication/oauth-openshift-9b79df6cb" need=2 creating=1
I0820 05:46:15.202449       1 event.go:291] "Event occurred" object="openshift-authentication/oauth-openshift" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled up replica set oauth-openshift-9b79df6cb to 2"
I0820 05:46:15.262665       1 event.go:291] "Event occurred" object="openshift-authentication/oauth-openshift-9b79df6cb" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: oauth-openshift-9b79df6cb-dhvrb"
I0820 05:46:44.744944       1 replica_set.go:599] "Too many replicas" replicaSet="openshift-authentication/oauth-openshift-854d6fff9c" need=0 deleting=1
I0820 05:46:44.745068       1 replica_set.go:227] "Found related ReplicaSets" replicaSet="openshift-authentication/oauth-openshift-854d6fff9c" relatedReplicaSets=[oauth-openshift-854d6fff9c oauth-openshift-9b79df6cb]
I0820 05:46:44.745094       1 event.go:291] "Event occurred" object="openshift-authentication/oauth-openshift" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled down replica set oauth-openshift-854d6fff9c to 0"
I0820 05:46:44.745202       1 controller_utils.go:592] "Deleting pod" controller="oauth-openshift-854d6fff9c" pod="openshift-authentication/oauth-openshift-854d6fff9c-j7bw9"
I0820 05:46:44.757307       1 event.go:291] "Event occurred" object="openshift-authentication/oauth-openshift-854d6fff9c" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulDelete" message="Deleted pod: oauth-openshift-854d6fff9c-j7bw9"
I0820 05:46:44.770854       1 event.go:291] "Event occurred" object="openshift-authentication/oauth-openshift" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled up replica set oauth-openshift-9b79df6cb to 3"
I0820 05:46:44.772211       1 replica_set.go:563] "Too few replicas" replicaSet="openshift-authentication/oauth-openshift-9b79df6cb" need=3 creating=1
I0820 05:46:44.813635       1 event.go:291] "Event occurred" object="openshift-authentication/oauth-openshift-9b79df6cb" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: oauth-openshift-9b79df6cb-9wgf6"
```

From which:
```
I0820 05:45:45.721679       1 event.go:291] message="Scaled down replica set oauth-openshift-854d6fff9c to 2"
I0820 05:45:45.753403       1 event.go:291] message="Scaled up replica set oauth-openshift-9b79df6cb to 1"
I0820 05:46:15.170261       1 event.go:291] message="Scaled down replica set oauth-openshift-854d6fff9c to 1"
I0820 05:46:15.202449       1 event.go:291] message="Scaled up replica set oauth-openshift-9b79df6cb to 2"
I0820 05:46:44.745094       1 event.go:291] message="Scaled down replica set oauth-openshift-854d6fff9c to 0"
I0820 05:46:44.770854       1 event.go:291] message="Scaled up replica set oauth-openshift-9b79df6cb to 3"
```

From must-gather and host_service_logs/masters/crio_service.log:
```
Aug 20 05:46:41.880945 ip-10-0-221-30 crio[1645]: time="2021-08-20 05:46:41.880868895Z" level=info msg="Running pod sandbox: openshift-authentication/oauth-openshift-9b79df6cb-dhvrb/POD" id=0e7fc010-b3fe-483a-8b0a-fe15a7599f81 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Aug 20 05:46:41.902653 ip-10-0-221-30 crio[1645]: time="2021-08-20 05:46:41.902609242Z" level=info msg="Got pod network &{Name:oauth-openshift-9b79df6cb-dhvrb Namespace:openshift-authentication ID:a6bd66adbb6aef3355ebfdebde018989f63a77f15506da65ee3eb314550df94e UID:73f63020-ce8f-478d-9e86-a62aa217a300 NetNS:/var/run/netns/4e0e52d0-09f3-4a90-8e2c-940a2439e0e3 Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Aug 20 05:46:41.902653 ip-10-0-221-30 crio[1645]: time="2021-08-20 05:46:41.902654779Z" level=info msg="Adding pod openshift-authentication_oauth-openshift-9b79df6cb-dhvrb to CNI network \"multus-cni-network\" (type=multus)"
Aug 20 05:46:43.834063 ip-10-0-221-30 crio[1645]: 2021-08-20T05:46:43Z [verbose] Add: openshift-authentication:oauth-openshift-9b79df6cb-dhvrb:73f63020-ce8f-478d-9e86-a62aa217a300:ovn-kubernetes(ovn-kubernetes):eth0 {"cniVersion":"0.4.0","interfaces":[{"name":"a6bd66adbb6aef3","mac":"c2:d8:67:f4:55:dd"},{"name":"eth0","mac":"0a:58:0a:80:00:2d","sandbox":"/var/run/netns/4e0e52d0-09f3-4a90-8e2c-940a2439e0e3"}],"ips":[{"version":"4","interface":1,"address":"10.128.0.45/23","gateway":"10.128.0.1"}],"dns":{}}
Aug 20 05:46:43.834063 ip-10-0-221-30 crio[1645]: I0820 05:46:43.809646   53468 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-authentication", Name:"oauth-openshift-9b79df6cb-dhvrb", UID:"73f63020-ce8f-478d-9e86-a62aa217a300", APIVersion:"v1", ResourceVersion:"28289", FieldPath:""}): type: 'Normal' reason: 'AddedInterface' Add eth0 [10.128.0.45/23] from ovn-kubernetes
Aug 20 05:46:43.835370 ip-10-0-221-30 crio[1645]: time="2021-08-20 05:46:43.835334112Z" level=info msg="Got pod network &{Name:oauth-openshift-9b79df6cb-dhvrb Namespace:openshift-authentication ID:a6bd66adbb6aef3355ebfdebde018989f63a77f15506da65ee3eb314550df94e UID:73f63020-ce8f-478d-9e86-a62aa217a300 NetNS:/var/run/netns/4e0e52d0-09f3-4a90-8e2c-940a2439e0e3 Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Aug 20 05:46:43.835585 ip-10-0-221-30 crio[1645]: time="2021-08-20 05:46:43.835549249Z" level=info msg="Checking pod openshift-authentication_oauth-openshift-9b79df6cb-dhvrb for CNI network multus-cni-network (type=multus)"
Aug 20 05:46:43.870277 ip-10-0-221-30 crio[1645]: time="2021-08-20 05:46:43.870235309Z" level=info msg="Ran pod sandbox a6bd66adbb6aef3355ebfdebde018989f63a77f15506da65ee3eb314550df94e with infra container: openshift-authentication/oauth-openshift-9b79df6cb-dhvrb/POD" id=0e7fc010-b3fe-483a-8b0a-fe15a7599f81 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Aug 20 05:46:43.875169 ip-10-0-221-30 crio[1645]: time="2021-08-20 05:46:43.875145473Z" level=info msg="Creating container: openshift-authentication/oauth-openshift-9b79df6cb-dhvrb/oauth-openshift" id=7994b2e3-aae3-42bd-bbc6-7c3c1db50729 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Aug 20 05:46:44.154018 ip-10-0-221-30 crio[1645]: time="2021-08-20 05:46:44.153953325Z" level=info msg="Created container 354d411a59b72cc655cb7cc066defc1f6d0202bc3e70a7de7638fe8d05deaa13: openshift-authentication/oauth-openshift-9b79df6cb-dhvrb/oauth-openshift" id=7994b2e3-aae3-42bd-bbc6-7c3c1db50729 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Aug 20 05:46:44.175436 ip-10-0-221-30 crio[1645]: time="2021-08-20 05:46:44.175362799Z" level=info msg="Started container" PID=53567 containerID=354d411a59b72cc655cb7cc066defc1f6d0202bc3e70a7de7638fe8d05deaa13 description=openshift-authentication/oauth-openshift-9b79df6cb-dhvrb/oauth-openshift id=7cfed9b2-7f10-45c0-8286-e30ba809f1b9 name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=a6bd66adbb6aef3355ebfdebde018989f63a77f15506da65ee3eb314550df94e

Aug 20 05:47:11.817443 ip-10-0-179-152 crio[1642]: time="2021-08-20 05:47:11.817393617Z" level=info msg="Running pod sandbox: openshift-authentication/oauth-openshift-9b79df6cb-9wgf6/POD" id=438c5bb5-02a0-4cd9-8409-bcff85f85b06 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Aug 20 05:47:11.857327 ip-10-0-179-152 crio[1642]: time="2021-08-20 05:47:11.857284805Z" level=info msg="Got pod network &{Name:oauth-openshift-9b79df6cb-9wgf6 Namespace:openshift-authentication ID:d2062f1fad1bed879a3fdf284dc7f1e95e8b313ca96be0cc355e2a890310676c UID:323debbc-dd6e-42ca-aee2-499dea4789b3 NetNS:/var/run/netns/20389f00-d34a-4ce1-b8b4-cc523eed8f58 Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Aug 20 05:47:11.857327 ip-10-0-179-152 crio[1642]: time="2021-08-20 05:47:11.857327467Z" level=info msg="Adding pod openshift-authentication_oauth-openshift-9b79df6cb-9wgf6 to CNI network \"multus-cni-network\" (type=multus)"
Aug 20 05:47:13.890280 ip-10-0-179-152 crio[1642]: 2021-08-20T05:47:13Z [verbose] Add: openshift-authentication:oauth-openshift-9b79df6cb-9wgf6:323debbc-dd6e-42ca-aee2-499dea4789b3:ovn-kubernetes(ovn-kubernetes):eth0 {"cniVersion":"0.4.0","interfaces":[{"name":"d2062f1fad1bed8","mac":"6e:ac:a4:09:2c:0e"},{"name":"eth0","mac":"0a:58:0a:82:00:3d","sandbox":"/var/run/netns/20389f00-d34a-4ce1-b8b4-cc523eed8f58"}],"ips":[{"version":"4","interface":1,"address":"10.130.0.61/23","gateway":"10.130.0.1"}],"dns":{}}
Aug 20 05:47:13.890280 ip-10-0-179-152 crio[1642]: I0820 05:47:13.865011   71431 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-authentication", Name:"oauth-openshift-9b79df6cb-9wgf6", UID:"323debbc-dd6e-42ca-aee2-499dea4789b3", APIVersion:"v1", ResourceVersion:"28523", FieldPath:""}): type: 'Normal' reason: 'AddedInterface' Add eth0 [10.130.0.61/23] from ovn-kubernetes
Aug 20 05:47:13.891836 ip-10-0-179-152 crio[1642]: time="2021-08-20 05:47:13.891750971Z" level=info msg="Got pod network &{Name:oauth-openshift-9b79df6cb-9wgf6 Namespace:openshift-authentication ID:d2062f1fad1bed879a3fdf284dc7f1e95e8b313ca96be0cc355e2a890310676c UID:323debbc-dd6e-42ca-aee2-499dea4789b3 NetNS:/var/run/netns/20389f00-d34a-4ce1-b8b4-cc523eed8f58 Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Aug 20 05:47:13.892094 ip-10-0-179-152 crio[1642]: time="2021-08-20 05:47:13.892066862Z" level=info msg="Checking pod openshift-authentication_oauth-openshift-9b79df6cb-9wgf6 for CNI network multus-cni-network (type=multus)"
Aug 20 05:47:13.907960 ip-10-0-179-152 crio[1642]: time="2021-08-20 05:47:13.907919770Z" level=info msg="Ran pod sandbox d2062f1fad1bed879a3fdf284dc7f1e95e8b313ca96be0cc355e2a890310676c with infra container: openshift-authentication/oauth-openshift-9b79df6cb-9wgf6/POD" id=438c5bb5-02a0-4cd9-8409-bcff85f85b06 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Aug 20 05:47:13.913124 ip-10-0-179-152 crio[1642]: time="2021-08-20 05:47:13.913101443Z" level=info msg="Creating container: openshift-authentication/oauth-openshift-9b79df6cb-9wgf6/oauth-openshift" id=a6971e66-2849-4606-b3c9-a227d7fd4ca3 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Aug 20 05:47:14.115938 ip-10-0-179-152 crio[1642]: time="2021-08-20 05:47:14.115896673Z" level=info msg="Created container 0cefd44ed0ec845dd8e4b5c93a431462eea513987449df3cadb0fb12931f7e71: openshift-authentication/oauth-openshift-9b79df6cb-9wgf6/oauth-openshift" id=a6971e66-2849-4606-b3c9-a227d7fd4ca3 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Aug 20 05:47:14.141299 ip-10-0-179-152 crio[1642]: time="2021-08-20 05:47:14.141170202Z" level=info msg="Started container" PID=71508 containerID=0cefd44ed0ec845dd8e4b5c93a431462eea513987449df3cadb0fb12931f7e71 description=openshift-authentication/oauth-openshift-9b79df6cb-9wgf6/oauth-openshift id=3a4e3297-2276-4f48-9ee1-a4c6fcaec30b name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=d2062f1fad1bed879a3fdf284dc7f1e95e8b313ca96be0cc355e2a890310676c

Aug 20 05:46:12.490298 ip-10-0-181-79 crio[1643]: time="2021-08-20 05:46:12.490239160Z" level=info msg="Running pod sandbox: openshift-authentication/oauth-openshift-9b79df6cb-6f6hk/POD" id=7331708c-9b2d-43e5-9add-97a533157bc0 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Aug 20 05:46:12.521892 ip-10-0-181-79 crio[1643]: time="2021-08-20 05:46:12.521837624Z" level=info msg="Got pod network &{Name:oauth-openshift-9b79df6cb-6f6hk Namespace:openshift-authentication ID:f470dfa72b93597f82e20b5a3bdff04c5cdf8d3459abc52c649e757a05a9ad99 UID:b3689bf3-844e-4595-ad6c-e379879289bc NetNS:/var/run/netns/93d326a1-8b32-4d23-a18c-37ff86b2f4c6 Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Aug 20 05:46:12.521892 ip-10-0-181-79 crio[1643]: time="2021-08-20 05:46:12.521888708Z" level=info msg="Adding pod openshift-authentication_oauth-openshift-9b79df6cb-6f6hk to CNI network \"multus-cni-network\" (type=multus)"
Aug 20 05:46:14.010119 ip-10-0-181-79 crio[1643]: 2021-08-20T05:46:13Z [verbose] Add: openshift-authentication:oauth-openshift-9b79df6cb-6f6hk:b3689bf3-844e-4595-ad6c-e379879289bc:ovn-kubernetes(ovn-kubernetes):eth0 {"cniVersion":"0.4.0","interfaces":[{"name":"f470dfa72b93597","mac":"56:13:f6:51:b4:e3"},{"name":"eth0","mac":"0a:58:0a:81:00:23","sandbox":"/var/run/netns/93d326a1-8b32-4d23-a18c-37ff86b2f4c6"}],"ips":[{"version":"4","interface":1,"address":"10.129.0.35/23","gateway":"10.129.0.1"}],"dns":{}}
Aug 20 05:46:14.010119 ip-10-0-181-79 crio[1643]: I0820 05:46:13.981905   53298 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-authentication", Name:"oauth-openshift-9b79df6cb-6f6hk", UID:"b3689bf3-844e-4595-ad6c-e379879289bc", APIVersion:"v1", ResourceVersion:"28045", FieldPath:""}): type: 'Normal' reason: 'AddedInterface' Add eth0 [10.129.0.35/23] from ovn-kubernetes
Aug 20 05:46:14.010691 ip-10-0-181-79 crio[1643]: time="2021-08-20 05:46:14.010656394Z" level=info msg="Got pod network &{Name:oauth-openshift-9b79df6cb-6f6hk Namespace:openshift-authentication ID:f470dfa72b93597f82e20b5a3bdff04c5cdf8d3459abc52c649e757a05a9ad99 UID:b3689bf3-844e-4595-ad6c-e379879289bc NetNS:/var/run/netns/93d326a1-8b32-4d23-a18c-37ff86b2f4c6 Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Aug 20 05:46:14.010855 ip-10-0-181-79 crio[1643]: time="2021-08-20 05:46:14.010835979Z" level=info msg="Checking pod openshift-authentication_oauth-openshift-9b79df6cb-6f6hk for CNI network multus-cni-network (type=multus)"
Aug 20 05:46:14.020664 ip-10-0-181-79 crio[1643]: time="2021-08-20 05:46:14.020597705Z" level=info msg="Ran pod sandbox f470dfa72b93597f82e20b5a3bdff04c5cdf8d3459abc52c649e757a05a9ad99 with infra container: openshift-authentication/oauth-openshift-9b79df6cb-6f6hk/POD" id=7331708c-9b2d-43e5-9add-97a533157bc0 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Aug 20 05:46:14.027498 ip-10-0-181-79 crio[1643]: time="2021-08-20 05:46:14.027472074Z" level=info msg="Creating container: openshift-authentication/oauth-openshift-9b79df6cb-6f6hk/oauth-openshift" id=21c6b0d6-c28e-4fba-9848-ca6f4a7d12e8 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Aug 20 05:46:14.180079 ip-10-0-181-79 crio[1643]: time="2021-08-20 05:46:14.180040980Z" level=info msg="Created container 1ec7eaa560ae11b7470d6df481e27e5654bee1524f43308e664194e6f0b8ba01: openshift-authentication/oauth-openshift-9b79df6cb-6f6hk/oauth-openshift" id=21c6b0d6-c28e-4fba-9848-ca6f4a7d12e8 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Aug 20 05:46:14.193655 ip-10-0-181-79 crio[1643]: time="2021-08-20 05:46:14.193610435Z" level=info msg="Started container" PID=53397 containerID=1ec7eaa560ae11b7470d6df481e27e5654bee1524f43308e664194e6f0b8ba01 description=openshift-authentication/oauth-openshift-9b79df6cb-6f6hk/oauth-openshift id=639c66fd-50c7-4513-99f2-8363a2cbc82a name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=f470dfa72b93597f82e20b5a3bdff04c5cdf8d3459abc52c649e757a05a9ad99
```

The logs show that all of the newly spawned containers have been running since they were started.

The metrics confirm the same (see the attachment). The deployment has reported 3 replicas available since 05:47:54. Yet 73 seconds later (at 05:49:07.270768) the operator still reports that one replica is missing, which makes it flip the Degraded condition to True even though it should not.

Comment 7 Jan Chaloupka 2021-08-25 08:32:51 UTC
Created attachment 1817379 [details]
count(kube_pod_status_ready{namespace="openshift-authentication",pod=~"oauth-openshift-9b79df6cb-.*",condition="true"}) by (pods)

All three oauth-openshift-9b79df6cb-.* pods are ready since 05:47:50.

Comment 9 Sebastian Łaskawiec 2021-09-03 05:58:04 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 11 Sebastian Łaskawiec 2021-09-20 06:40:12 UTC
I analyzed the second case presented by jchaloup (https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-ovn/1428583979283386368), trying to find out what happens between 05:47:54 and 05:49:07+:

```
2021-08-20T05:47:03.435822885Z I0820 05:47:03.435771       1 status_controller.go:211] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2021-08-20T05:43:38Z","message":"OAuthServerDeploymentDegraded: 1 of 3 requested instances are unavailable for oauth-openshift.openshift-authentication ()\nWellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 2","reason":"OAuthServerDeployment_UnavailablePod::WellKnownReadyController_SyncError","status":"True","type":"Degraded"},{"lastTransitionTime":"2021-08-20T05:36:17Z","message":"WellKnownReadyControllerProgressing: kube-apiserver oauth endpoint https://10.0.179.152:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance)","reason":"WellKnownReadyController_OAuthMetadataNotYetServed","status":"True","type":"Progressing"},{"lastTransitionTime":"2021-08-20T05:23:52Z","message":"WellKnownAvailable: The well-known endpoint is not yet available: need at least 3 kube-apiservers, got 2","reason":"WellKnown_NotReady","status":"False","type":"Available"},{"lastTransitionTime":"2021-08-20T05:23:51Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
2021-08-20T05:47:03.459082069Z I0820 05:47:03.455324       1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"openshift-authentication-operator", Name:"openshift-authentication-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Degraded message changed from "WellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 2" to "OAuthServerDeploymentDegraded: 1 of 3 requested instances are unavailable for oauth-openshift.openshift-authentication ()\nWellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 2",Progressing message changed from "OAuthServerDeploymentProgressing: deployment/oauth-openshift.openshift-authentication: 2/3 pods have been updated to the latest generation\nWellKnownReadyControllerProgressing: kube-apiserver oauth endpoint https://10.0.179.152:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance)" to "WellKnownReadyControllerProgressing: kube-apiserver oauth endpoint https://10.0.179.152:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance)"
2021-08-20T05:47:26.146880265Z I0820 05:47:26.146833       1 request.go:668] Waited for 1.072004557s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-oauth-apiserver/configmaps/audit
2021-08-20T05:47:27.147733220Z I0820 05:47:27.147690       1 request.go:668] Waited for 1.5958309s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-authentication/secrets/v4-0-config-system-router-certs
2021-08-20T05:47:28.147879448Z I0820 05:47:28.147836       1 request.go:668] Waited for 1.796436361s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-oauth-apiserver/configmaps/etcd-serving-ca
2021-08-20T05:47:29.015757077Z E0820 05:47:29.015703       1 base_controller.go:272] WellKnownReadyController reconciliation failed: need at least 3 kube-apiservers, got 2
2021-08-20T05:47:29.347741995Z I0820 05:47:29.347693       1 request.go:668] Waited for 1.195671848s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-oauth-apiserver/configmaps/trusted-ca-bundle
2021-08-20T05:47:30.547362686Z I0820 05:47:30.547314       1 request.go:668] Waited for 1.506119023s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets/router-certs
2021-08-20T05:47:31.547740259Z I0820 05:47:31.547698       1 request.go:668] Waited for 1.396113997s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-authentication/secrets/v4-0-config-system-session
2021-08-20T05:47:32.747015572Z I0820 05:47:32.746972       1 request.go:668] Waited for 1.188849439s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-authentication/configmaps/v4-0-config-system-cliconfig
2021-08-20T05:47:59.011135667Z E0820 05:47:59.011092       1 base_controller.go:272] WellKnownReadyController reconciliation failed: need at least 3 kube-apiservers, got 2
2021-08-20T05:48:20.594090404Z I0820 05:48:20.592221       1 status_controller.go:211] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2021-08-20T05:43:38Z","message":"OAuthServerDeploymentDegraded: 1 of 3 requested instances are unavailable for oauth-openshift.openshift-authentication ()\nWellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 2","reason":"OAuthServerDeployment_UnavailablePod::WellKnownReadyController_SyncError","status":"True","type":"Degraded"},{"lastTransitionTime":"2021-08-20T05:48:20Z","message":"AuthenticatorCertKeyProgressing: All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-08-20T05:48:20Z","message":"All is well","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-08-20T05:23:51Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
2021-08-20T05:48:20.610361186Z I0820 05:48:20.610313       1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"openshift-authentication-operator", Name:"openshift-authentication-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Progressing changed from True to False ("AuthenticatorCertKeyProgressing: All is well"),Available changed from False to True ("All is well")
2021-08-20T05:48:20.663368755Z I0820 05:48:20.663321       1 status_controller.go:211] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2021-08-20T05:48:20Z","message":"OAuthServerDeploymentDegraded: 1 of 3 requested instances are unavailable for oauth-openshift.openshift-authentication ()","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-08-20T05:48:20Z","message":"AuthenticatorCertKeyProgressing: All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-08-20T05:48:20Z","message":"All is well","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-08-20T05:23:51Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
2021-08-20T05:48:20.709108344Z I0820 05:48:20.709042       1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"openshift-authentication-operator", Name:"openshift-authentication-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Degraded changed from True to False ("OAuthServerDeploymentDegraded: 1 of 3 requested instances are unavailable for oauth-openshift.openshift-authentication ()")
2021-08-20T05:48:21.793847210Z I0820 05:48:21.793805       1 request.go:668] Waited for 1.136380693s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-oauth-apiserver/configmaps/audit
2021-08-20T05:48:22.794617185Z I0820 05:48:22.794543       1 request.go:668] Waited for 1.394260215s due to client-side throttling, not priority and fairness, request: DELETE:https://172.30.0.1:443/api/v1/namespaces/openshift-oauth-apiserver/secrets/encryption-config
2021-08-20T05:48:26.194132017Z I0820 05:48:26.194089       1 request.go:668] Waited for 1.116075959s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets/router-certs
2021-08-20T05:48:27.194324546Z I0820 05:48:27.194270       1 request.go:668] Waited for 1.591906687s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-oauth-apiserver/configmaps/audit
2021-08-20T05:48:28.394654866Z I0820 05:48:28.394604       1 request.go:668] Waited for 1.797038312s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-oauth-apiserver/configmaps/etcd-serving-ca
2021-08-20T05:48:29.594191226Z I0820 05:48:29.594150       1 request.go:668] Waited for 1.374382932s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-authentication/configmaps/v4-0-config-system-service-ca
2021-08-20T05:48:30.594671125Z I0820 05:48:30.594518       1 request.go:668] Waited for 1.552775541s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets/router-certs
2021-08-20T05:48:31.794529674Z I0820 05:48:31.794489       1 request.go:668] Waited for 1.396213117s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-authentication/secrets/v4-0-config-system-session
2021-08-20T05:48:32.794677831Z I0820 05:48:32.794636       1 request.go:668] Waited for 1.187852265s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-authentication
2021-08-20T05:49:07.254257397Z I0820 05:49:07.254217       1 status_controller.go:211] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2021-08-20T05:49:07Z","message":"OAuthServerDeploymentDegraded: 1 of 3 requested instances are unavailable for oauth-openshift.openshift-authentication ()","reason":"OAuthServerDeployment_UnavailablePod","status":"True","type":"Degraded"},{"lastTransitionTime":"2021-08-20T05:48:20Z","message":"AuthenticatorCertKeyProgressing: All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-08-20T05:48:20Z","message":"All is well","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-08-20T05:23:51Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
2021-08-20T05:49:07.270816501Z I0820 05:49:07.270768       1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"openshift-authentication-operator", Name:"openshift-authentication-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Degraded changed from False to True ("OAuthServerDeploymentDegraded: 1 of 3 requested instances are unavailable for oauth-openshift.openshift-authentication ()")
2021-08-20T05:49:26.277476479Z I0820 05:49:26.277434       1 request.go:668] Waited for 1.051254939s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-authentication/serviceaccounts/oauth-openshift
2021-08-20T05:49:29.277667550Z I0820 05:49:29.277621       1 request.go:668] Waited for 1.030436518s due to client-side throttling, not priority and fairness, request: DELETE:https://172.30.0.1:443/api/v1/namespaces/openshift-oauth-apiserver/secrets/encryption-config
2021-08-20T05:49:30.477463987Z I0820 05:49:30.477424       1 request.go:668] Waited for 1.044396892s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config/secrets/webhook-authentication-integrated-oauth
2021-08-20T05:50:10.673277949Z I0820 05:50:10.673222       1 status_controller.go:211] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2021-08-20T05:50:10Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-08-20T05:48:20Z","message":"AuthenticatorCertKeyProgressing: All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-08-20T05:48:20Z","message":"All is well","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-08-20T05:23:51Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
2021-08-20T05:50:10.694601820Z I0820 05:50:10.693487       1 status_controller.go:211] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2021-08-20T05:50:10Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-08-20T05:48:20Z","message":"AuthenticatorCertKeyProgressing: All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-08-20T05:48:20Z","message":"All is well","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-08-20T05:23:51Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
2021-08-20T05:50:10.698154172Z I0820 05:50:10.698103       1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"openshift-authentication-operator", Name:"openshift-authentication-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Degraded changed from True to False ("All is well")
2021-08-20T05:50:10.704125566Z I0820 05:50:10.704068       1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"openshift-authentication-operator", Name:"openshift-authentication-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Degraded changed from True to False ("All is well")
```

I found two suspicious things:
- Events are emitted twice, which could suggest multiple worker threads. This is very unlikely, as the OAuthServerWorkloadController runs with a single worker.
- Client-side throttling occurred.

Comment 12 Sergiusz Urbaniak 2021-11-08 06:58:07 UTC
reviewed-in-sprint: not enough capacity to work on this bugzilla.

Comment 14 Sergiusz Urbaniak 2021-11-18 11:21:31 UTC
*** Bug 2024416 has been marked as a duplicate of this bug. ***

Comment 15 Sergiusz Urbaniak 2021-11-26 07:23:15 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 18 aharchin 2022-01-19 12:26:56 UTC
Hello Team,

Any update on this?

The customer is continuously asking for updates. Please let us know in case of any information required from our end.

Regards,
Akhil Harchinder
Red Hat

Comment 22 helwazer 2022-03-16 11:57:13 UTC
Hello Team,

Any update on this?

Regards,
Hala Elwazery
Red Hat

Comment 25 helwazer 2022-04-06 09:14:51 UTC
Hello @Sebastian , 

I'm following up again on this bugzilla, as the support case (#3057022 - Waiting on Red Hat) is still open because of this BZ and we haven't had any update since March 17.
Did you have any chance to get a stable reproducer for this issue?

Thank you, have a great day
Hala Elwazery - Technical Account Manager, Openshift

Comment 28 Xingxing Xia 2022-05-18 13:26:52 UTC
Synced with Ruiyang with necessary guide, he will help verify this bug.

Comment 29 gongruiyang 2022-05-19 14:42:29 UTC
A total of four tests were done, and the time required to obtain the expected results was as follows:
first: 62 seconds
second: 10 seconds
third: 44 seconds
fourth: 28 seconds
The test is basically as expected.

Comment 30 gongruiyang 2022-05-24 02:42:33 UTC
Test payload: 4.11.0-0.nightly-2022-05-18-171831

Test steps: OCP-22434 setup part (https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-22434)
1. Save the oauth cluster yaml to oauth_cluster.yaml.
2. Change the oauth cluster identityProviders part.
3. Measure the time it takes for authentication's PROGRESSING status to become True (oc get co | grep "authentication").
4. Restore the oauth cluster environment.

Repeat steps 2-4 four times.
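Step 3 above can be scripted. A hypothetical sketch (it assumes an `oc` session with cluster-admin against the test cluster; the function name and the one-second polling interval are my own, not part of the test case):

```shell
# Wait until the authentication clusteroperator reports Progressing=True,
# then print how many seconds the operator took to react.
wait_for_progressing() {
  local start
  start=$(date +%s)
  until [ "$(oc get co authentication \
      -o jsonpath='{.status.conditions[?(@.type=="Progressing")].status}')" = "True" ]; do
    sleep 1
  done
  echo $(( $(date +%s) - start ))
}
```

Typical use: apply the modified oauth config, then run `wait_for_progressing` and record the printed number of seconds.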

Test Result:
A total of four tests were done, and the time required to obtain the expected results was as follows:
first: 62 seconds
second: 10 seconds
third: 44 seconds
fourth: 28 seconds
The test is basically as expected.

Comment 34 errata-xmlrpc 2022-08-10 10:36:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
