Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2000004

Summary: [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Degraded: master nodes drained too quickly
Product: OpenShift Container Platform
Reporter: Jan Chaloupka <jchaloup>
Component: openshift-apiserver
Assignee: Jan Chaloupka <jchaloup>
Status: CLOSED WONTFIX
QA Contact: Xingxing Xia <xxia>
Severity: high
Docs Contact:
Priority: medium
Version: 4.9
CC: aos-bugs, kewang, mfojtik, sttts, xxia
Target Milestone: ---
Flags: mfojtik: needinfo?
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: LifecycleStale
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1998031
Environment: jobs=periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-ovirt-upgrade=all
Last Closed: 2023-01-16 11:36:58 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jan Chaloupka 2021-09-01 07:46:01 UTC
+++ This bug was initially created as a clone of Bug #1998031 +++

From https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-ovirt-upgrade/1430418602502131712:
```
[bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Degraded 

1 unexpected clusteroperator state transitions during e2e test run 

Aug 25 08:06:51.762 - 808s  E clusteroperator/openshift-apiserver condition/Degraded status/True reason/APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()
```

From loki (https://grafana-loki.ci.openshift.org/explore?orgId=1&left=%5B%221629842400000%22,%221629928799000%22,%22Grafana%20Cloud%22,%7B%22expr%22:%22%7Binvoker%3D%5C%22openshift-internal-ci%2Fperiodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-ovirt-upgrade%2F1430418602502131712%5C%22%7D%20%7C%20unpack%20%7C%20namespace%3D%5C%22openshift-authentication-operator%5C%22%20%7C%20container_name%3D%5C%22authentication-operator%5C%22%22%7D%5D):
```
2021-08-25 10:06:51
I0825 08:06:51.740397       1 status_controller.go:211] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2021-08-25T08:06:51Z","message":"APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()\nOAuthServerDeploymentDegraded: 1 of 3 requested instances are unavailable for oauth-openshift.openshift-authentication ()","reason":"APIServerDeployment_UnavailablePod::OAuthServerDeployment_UnavailablePod","status":"True","type":"Degraded"},{"lastTransitionTime":"2021-08-25T07:45:00Z","message":"AuthenticatorCertKeyProgressing: All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-08-25T07:57:01Z","message":"All is well","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-08-25T06:49:31Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
2021-08-25 10:06:24
I0825 08:06:24.738959       1 request.go:668] Waited for 1.015853827s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-oauth-apiserver/configmaps/audit
```

Checking the metrics (`kube_deployment_status_replicas_available{namespace="openshift-authentication"}`), the deployment had 1 replica missing at 08:06:51.740397.

There were 3 ReplicaSets (from oldest to youngest):
- oauth-openshift-7fb8f95bd8
- oauth-openshift-6f975f59c
- oauth-openshift-65d76d9bcd

- 08:04:43 -> 08:20:55: fewer than 3 replicas available

Checking events.json:
```
            "message": "Scaled up replica set oauth-openshift-65d76d9bcd to 1",
            "message": "Scaled up replica set oauth-openshift-65d76d9bcd to 2",
            "message": "Scaled up replica set oauth-openshift-65d76d9bcd to 3",
```
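
As a hedged illustration (the artifact layout is an assumption, not part of the original analysis), scaling events like the ones above could be pulled out of the gathered events.json, treating it as a core/v1 EventList, with something along these lines:
```
// sketch.go: hypothetical helper, not part of the CI tooling.
// It assumes events.json is a core/v1 EventList as gathered into the CI
// artifacts, and prints the deployment-controller scaling events for the
// ReplicaSet under investigation.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

type event struct {
	Reason         string `json:"reason"`
	Message        string `json:"message"`
	InvolvedObject struct {
		Namespace string `json:"namespace"`
		Name      string `json:"name"`
	} `json:"involvedObject"`
}

type eventList struct {
	Items []event `json:"items"`
}

func main() {
	raw, err := os.ReadFile("events.json")
	if err != nil {
		panic(err)
	}
	var list eventList
	if err := json.Unmarshal(raw, &list); err != nil {
		panic(err)
	}
	for _, ev := range list.Items {
		// "ScalingReplicaSet" is the reason the deployment controller uses
		// for its "Scaled up/down replica set ..." events.
		if ev.Reason == "ScalingReplicaSet" && strings.Contains(ev.Message, "oauth-openshift-65d76d9bcd") {
			fmt.Printf("%s/%s: %s\n", ev.InvolvedObject.Namespace, ev.InvolvedObject.Name, ev.Message)
		}
	}
}
```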
No events about scaling oauth-openshift-65d76d9bcd down. Checking kubelet:
```
ovirt14-vhnb7-master-1:18830:Aug 25 07:43:47.183806 ovirt14-vhnb7-master-1 hyperkube[1846]: I0825 07:43:47.183768    1846 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-mlj5s]
ovirt14-vhnb7-master-1:22979:Aug 25 08:09:43.697971 ovirt14-vhnb7-master-1 hyperkube[1846]: I0825 08:09:43.697943    1846 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-mlj5s]
ovirt14-vhnb7-master-1:24774:Aug 25 08:10:11.266126 ovirt14-vhnb7-master-1 hyperkube[1846]: I0825 08:10:11.266097    1846 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-mlj5s]
ovirt14-vhnb7-master-1:32270:Aug 25 08:14:43.473159 ovirt14-vhnb7-master-1 hyperkube[1971]: I0825 08:14:43.471314    1971 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-c8mv4]
ovirt14-vhnb7-master-2:16925:Aug 25 07:42:44.615329 ovirt14-vhnb7-master-2 hyperkube[1850]: I0825 07:42:44.615293    1850 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-k4g7p]
ovirt14-vhnb7-master-2:23385:Aug 25 08:14:50.475225 ovirt14-vhnb7-master-2 hyperkube[1850]: I0825 08:14:50.475018    1850 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-k4g7p]
ovirt14-vhnb7-master-2:26008:Aug 25 08:15:24.629063 ovirt14-vhnb7-master-2 hyperkube[1850]: I0825 08:15:24.629032    1850 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-k4g7p]
ovirt14-vhnb7-master-2:33998:Aug 25 08:19:59.335853 ovirt14-vhnb7-master-2 hyperkube[1965]: I0825 08:19:59.335828    1965 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-ptwm7]
ovirt14-vhnb7-master-0:21275:Aug 25 07:44:30.963045 ovirt14-vhnb7-master-0 hyperkube[1851]: I0825 07:44:30.962513    1851 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-snm9m]
ovirt14-vhnb7-master-0:24990:Aug 25 08:04:44.819607 ovirt14-vhnb7-master-0 hyperkube[1851]: I0825 08:04:44.819553    1851 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-snm9m]
ovirt14-vhnb7-master-0:26129:Aug 25 08:05:15.971355 ovirt14-vhnb7-master-0 hyperkube[1851]: I0825 08:05:15.971321    1851 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-snm9m]
ovirt14-vhnb7-master-0:34407:Aug 25 08:09:40.277931 ovirt14-vhnb7-master-0 hyperkube[1991]: I0825 08:09:40.277906    1991 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-mnvdm]
```

- oauth-openshift-65d76d9bcd-snm9m: <07:44:30.962513; 08:04:44.819553> -> oauth-openshift-65d76d9bcd-mnvdm: <08:09:40.277906; ???>
- oauth-openshift-65d76d9bcd-mlj5s: <07:43:47.183768; 08:09:43.697943> -> oauth-openshift-65d76d9bcd-c8mv4: <08:14:43.471314; ???>
- oauth-openshift-65d76d9bcd-k4g7p: <07:42:44.615293; 08:14:50.475018> -> oauth-openshift-65d76d9bcd-ptwm7: <08:19:59.335828; ???>

Checking `kube_replicaset_spec_replicas{namespace="openshift-authentication",replicaset=~"oauth-openshift-65d76d9bcd"}`, the value stayed at 3 the whole time, meaning the ReplicaSet was never scaled down; something simply deleted the pods one by one.

The same can be observed with the openshift-apiserver instances (see `kube_deployment_status_replicas_available{namespace="openshift-apiserver"}`):

```
Aug 25 07:33:31.075468 ovirt14-vhnb7-master-0 hyperkube[1851]: I0825 07:33:31.075433    1851 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-5mlx4]
Aug 25 08:04:44.754920 ovirt14-vhnb7-master-0 hyperkube[1851]: I0825 08:04:44.754791    1851 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-5mlx4]
Aug 25 08:05:57.242004 ovirt14-vhnb7-master-0 hyperkube[1851]: I0825 08:05:57.241938    1851 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-5mlx4]
Aug 25 08:09:40.255344 ovirt14-vhnb7-master-0 hyperkube[1991]: I0825 08:09:40.255312    1991 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-g6nt9]

Aug 25 07:31:59.161083 ovirt14-vhnb7-master-1 hyperkube[1846]: I0825 07:31:59.161061    1846 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-wktv4]
Aug 25 08:09:44.274744 ovirt14-vhnb7-master-1 hyperkube[1846]: I0825 08:09:44.274715    1846 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-wktv4]
Aug 25 08:10:59.127089 ovirt14-vhnb7-master-1 hyperkube[1846]: I0825 08:10:59.127057    1846 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-wktv4]
Aug 25 08:14:43.407859 ovirt14-vhnb7-master-1 hyperkube[1971]: I0825 08:14:43.407823    1971 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-kdvqw]
Aug 25 08:14:50.437017 ovirt14-vhnb7-master-1 hyperkube[1971]: I0825 08:14:50.436981    1971 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-apiserver-operator/openshift-apiserver-operator-7954c899b6-hq7xx]

Aug 25 07:30:31.271077 ovirt14-vhnb7-master-2 hyperkube[1850]: I0825 07:30:31.271018    1850 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-krq8w]
Aug 25 08:14:50.386938 ovirt14-vhnb7-master-2 hyperkube[1850]: I0825 08:14:50.386718    1850 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-apiserver-operator/openshift-apiserver-operator-7954c899b6-p2wbp]
Aug 25 08:14:50.493640 ovirt14-vhnb7-master-2 hyperkube[1850]: I0825 08:14:50.492425    1850 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-krq8w]
Aug 25 08:15:04.621295 ovirt14-vhnb7-master-2 hyperkube[1850]: I0825 08:15:04.621266    1850 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-apiserver-operator/openshift-apiserver-operator-7954c899b6-p2wbp]
Aug 25 08:16:14.597465 ovirt14-vhnb7-master-2 hyperkube[1850]: I0825 08:16:14.597413    1850 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-krq8w]
Aug 25 08:19:59.338981 ovirt14-vhnb7-master-2 hyperkube[1965]: I0825 08:19:59.338960    1965 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-gslsh]
```

- apiserver-7478d5665f-5mlx4: <07:33:31.075433; 08:04:44.754791> -> apiserver-7478d5665f-g6nt9: <08:09:40.255312; ???>
- apiserver-7478d5665f-wktv4: <07:31:59.161061; 08:09:44.274715> -> apiserver-7478d5665f-kdvqw: <08:14:43.407823; ???>
- apiserver-7478d5665f-krq8w: <07:30:31.271018; 08:14:50.386718> -> apiserver-7478d5665f-gslsh: <08:19:59.338960; ???>

Putting both side by side, grouped by node (master-0 -> master-1 -> master-2):
- apiserver-7478d5665f-5mlx4: <07:33:31.075433; 08:04:44.754791> -> apiserver-7478d5665f-g6nt9: <08:09:40.255312; ???>
- oauth-openshift-65d76d9bcd-snm9m: <07:44:30.962513; 08:04:44.819553> -> oauth-openshift-65d76d9bcd-mnvdm: <08:09:40.277906; ???>
- apiserver-7478d5665f-wktv4: <07:31:59.161061; 08:09:44.274715> -> apiserver-7478d5665f-kdvqw: <08:14:43.407823; ???>
- oauth-openshift-65d76d9bcd-mlj5s: <07:43:47.183768; 08:09:43.697943> -> oauth-openshift-65d76d9bcd-c8mv4: <08:14:43.471314; ???>
- apiserver-7478d5665f-krq8w: <07:30:31.271018; 08:14:50.386718> -> apiserver-7478d5665f-gslsh: <08:19:59.338960; ???>
- oauth-openshift-65d76d9bcd-k4g7p: <07:42:44.615293; 08:14:50.475018> -> oauth-openshift-65d76d9bcd-ptwm7: <08:19:59.335828; ???>

Checking kubelets for cordoning/drain/update:
```
Aug 25 08:04:42.977292 ovirt14-vhnb7-master-0 root[392852]: machine-config-daemon[382060]: Starting update from rendered-master-9faf006a78c47c3fb7ca6a9223f67108 to rendered-master-7242100dcf595a1cee714eaa5fee9736: &{osUpdate:true kargs:false fips:false passwd:false files:true units:true kernelType:false extensions:false}
Aug 25 08:04:43.016573 ovirt14-vhnb7-master-0 root[392853]: machine-config-daemon[382060]: Node has been successfully cordoned
Aug 25 08:04:43.020530 ovirt14-vhnb7-master-0 root[392854]: machine-config-daemon[382060]: Update prepared; beginning drain
Aug 25 08:05:57.764428 ovirt14-vhnb7-master-0 root[400279]: machine-config-daemon[382060]: drain complete
Aug 25 08:06:40.588616 ovirt14-vhnb7-master-0 root[403514]: machine-config-daemon[382060]: initiating reboot: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736
Aug 25 08:06:40.600122 ovirt14-vhnb7-master-0 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736.
Aug 25 08:06:40.706811 ovirt14-vhnb7-master-0 systemd[1]: Stopping machine-config-daemon: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736...
Aug 25 08:06:40.827596 ovirt14-vhnb7-master-0 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736.
Aug 25 08:09:40.255344 ovirt14-vhnb7-master-0 hyperkube[1991]: I0825 08:09:40.255312    1991 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-g6nt9]
Aug 25 08:09:40.277931 ovirt14-vhnb7-master-0 hyperkube[1991]: I0825 08:09:40.277906    1991 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-mnvdm]

Aug 25 08:09:41.927616 ovirt14-vhnb7-master-1 root[392706]: machine-config-daemon[357804]: Starting update from rendered-master-9faf006a78c47c3fb7ca6a9223f67108 to rendered-master-7242100dcf595a1cee714eaa5fee9736: &{osUpdate:true kargs:false fips:false passwd:false files:true units:true kernelType:false extensions:false}
Aug 25 08:09:41.968324 ovirt14-vhnb7-master-1 root[392714]: machine-config-daemon[357804]: Node has been successfully cordoned
Aug 25 08:09:41.971817 ovirt14-vhnb7-master-1 root[392715]: machine-config-daemon[357804]: Update prepared; beginning drain
Aug 25 08:10:59.273200 ovirt14-vhnb7-master-1 root[402449]: machine-config-daemon[357804]: drain complete
Aug 25 08:11:45.707314 ovirt14-vhnb7-master-1 root[405834]: machine-config-daemon[357804]: initiating reboot: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736
Aug 25 08:11:45.719771 ovirt14-vhnb7-master-1 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736.
Aug 25 08:11:45.767006 ovirt14-vhnb7-master-1 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736.
Aug 25 08:14:43.473159 ovirt14-vhnb7-master-1 hyperkube[1971]: I0825 08:14:43.471314    1971 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-c8mv4]
Aug 25 08:14:50.437017 ovirt14-vhnb7-master-1 hyperkube[1971]: I0825 08:14:50.436981    1971 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-apiserver-operator/openshift-apiserver-operator-7954c899b6-hq7xx]

Aug 25 08:14:48.609668 ovirt14-vhnb7-master-2 root[427626]: machine-config-daemon[354533]: Starting update from rendered-master-9faf006a78c47c3fb7ca6a9223f67108 to rendered-master-7242100dcf595a1cee714eaa5fee9736: &{osUpdate:true kargs:false fips:false passwd:false files:true units:true kernelType:false extensions:false}
Aug 25 08:14:48.646779 ovirt14-vhnb7-master-2 root[427627]: machine-config-daemon[354533]: Node has been successfully cordoned
Aug 25 08:14:48.652760 ovirt14-vhnb7-master-2 root[427628]: machine-config-daemon[354533]: Update prepared; beginning drain
Aug 25 08:16:15.397745 ovirt14-vhnb7-master-2 root[439943]: machine-config-daemon[354533]: drain complete
Aug 25 08:17:13.039866 ovirt14-vhnb7-master-2 root[443741]: machine-config-daemon[354533]: initiating reboot: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736
Aug 25 08:17:13.052337 ovirt14-vhnb7-master-2 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736.
Aug 25 08:17:13.159769 ovirt14-vhnb7-master-2 systemd[1]: Stopping machine-config-daemon: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736...
Aug 25 08:17:13.283820 ovirt14-vhnb7-master-2 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736.
Aug 25 08:19:59.335853 ovirt14-vhnb7-master-2 hyperkube[1965]: I0825 08:19:59.335828    1965 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-ptwm7]
Aug 25 08:19:59.338981 ovirt14-vhnb7-master-2 hyperkube[1965]: I0825 08:19:59.338960    1965 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-gslsh]
```

From the openshift-authentication/oauth-openshift deployment:
```
{
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {
        ...
        "name": "oauth-openshift",
        "namespace": "openshift-authentication",
        "resourceVersion": "28554",
        "uid": "ca7e25cc-c3d8-400f-89ef-49776b1e6094"
    },
    "spec": {
        ...
        "replicas": 3,
        "strategy": {
            "rollingUpdate": {
                "maxSurge": 0,
                "maxUnavailable": 1
            },
            "type": "RollingUpdate"
        },
        "template": {
            ...
            "spec": {
                "affinity": {
                    "podAntiAffinity": {
                        "requiredDuringSchedulingIgnoredDuringExecution": [
                            {
                                "labelSelector": {
                                    "matchLabels": {
                                        "app": "oauth-openshift",
                                        "oauth-openshift-anti-affinity": "true"
                                    }
                                },
                                "topologyKey": "kubernetes.io/hostname"
                            }
                        ]
                    }
                },
```

From the openshift-apiserver/apiserver deployment:
```
{
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {
        ...
        "name": "apiserver",
        "namespace": "openshift-apiserver",
    },
    "spec": {
        ...
        "replicas": 3,
        "strategy": {
            "rollingUpdate": {
                "maxSurge": 0,
                "maxUnavailable": 1
            },
            "type": "RollingUpdate"
        },
        "template": {
            ...
            "spec": {
                "affinity": {
                    "podAntiAffinity": {
                        "requiredDuringSchedulingIgnoredDuringExecution": [
                            {
                                "labelSelector": {
                                    "matchLabels": {
                                        "apiserver": "true",
                                        "app": "openshift-apiserver-a",
                                        "openshift-apiserver-anti-affinity": "true"
                                    }
                                },
                                "topologyKey": "kubernetes.io/hostname"
                            }
                        ]
                    }
                },
```

Conclusion:
1. Master nodes are drained one by one (approximately 5 minutes apart).
2. The apiserver and oauth-openshift pods do not have enough time to get back to running/available before the next node is drained.
3. There is no PDB that could prevent draining two apiserver/oauth-openshift instances at the same time, so the number of available instances dropped from 3 to 1 at some point.

Current behavior:
- Both the apiserver and oauth-openshift operators change condition/Degraded to True when at least one replica is missing.

Expected behavior:
- The operators allow 1 replica to be unavailable and change condition/Degraded to True only when at least 2 replicas are unavailable (since both deployments set maxUnavailable to 1)
- A PDB is defined to prevent 2 replicas from being unavailable at the same time (see the sketch below)
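
As a hedged illustration of the second point only, a PodDisruptionBudget along these lines would keep at least 2 of the 3 oauth-openshift replicas available, so a drain could not take down a second replica while one is still missing. The object name and selector here are assumptions, not something the operators currently ship:
```
// Hypothetical sketch only; the operators do not currently create this object.
package sketch

import (
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// oauthOpenshiftPDB keeps at least two oauth-openshift replicas available
// during voluntary disruptions such as a node drain.
func oauthOpenshiftPDB() *policyv1.PodDisruptionBudget {
	minAvailable := intstr.FromInt(2)
	return &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "oauth-openshift", // assumed name
			Namespace: "openshift-authentication",
		},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			Selector: &metav1.LabelSelector{
				// matches the pod label shown in the deployment template above
				MatchLabels: map[string]string{"app": "oauth-openshift"},
			},
		},
	}
}
```
An analogous PDB would be needed for the openshift-apiserver pods.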

Comment 1 Jan Chaloupka 2021-09-01 07:46:48 UTC
Cloning https://bugzilla.redhat.com/show_bug.cgi?id=1998031 to defer resolution of "The operators allow 1 replica to be unavailable and change condition/Degraded to true only when there are at least 2 replicas unavailable (since both deployments set maxUnavailable to 1)" part.

Comment 2 Xingxing Xia 2021-09-10 07:26:39 UTC
I don't quite agree. I think 1 replica unavailable should change Degraded to True. If 2 replicas are unavailable, it is "Available" that should be changed (to False) instead. E.g. bug 1999946 and bug 2001456 both hit "1 of 3 requested instances are unavailable for apiserver.openshift-apiserver" and changed Degraded, and they are indeed bugs. If this behavior is not changed, I am not sure whether such bugs might escape detection.

Comment 3 Jan Chaloupka 2021-09-14 09:50:08 UTC
> I think 1 replica unavailable should change Degraded to True

https://github.com/openshift/cluster-authentication-operator/blob/9efb3c1e5ac657aaa87f237d2c6aea586b7aad49/vendor/github.com/openshift/api/config/v1/types_cluster_operator.go#L161-L177

// Degraded indicates that the operator's current state does not match its
// desired state over a period of time resulting in a lower quality of service.
// The period of time may vary by component, but a Degraded state represents
// persistent observation of a condition.
...
// ... A service should not
// report Degraded during the course of a normal upgrade

Given the operator is going through an upgrade, reporting condition/Degraded=True is incorrect. The important piece of information here is "Degraded state represents persistent observation of a condition". The reported issue is not persistent, only temporary.
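
To make the deferred proposal from comment 1 concrete, a minimal sketch (not the operators' actual code) could look like this: since both deployments roll out with maxUnavailable=1, one missing replica is treated as a normal, temporary upgrade state, and only two or more missing replicas flip Degraded.
```
// Hypothetical sketch, not the operators' actual logic.
package sketch

// deploymentDegraded reports whether the Degraded condition should be set.
// With maxUnavailable=1, a single missing replica is tolerated as a normal,
// temporary state during an upgrade; anything beyond that is degraded.
func deploymentDegraded(specReplicas, availableReplicas, maxUnavailable int32) bool {
	unavailable := specReplicas - availableReplicas
	return unavailable > maxUnavailable
}
```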

Comment 4 Michal Fojtik 2021-10-20 17:38:35 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 5 Jan Chaloupka 2021-11-25 15:28:19 UTC
Will need more time to properly improve the logic for setting the operator Degraded condition

Comment 6 Jan Chaloupka 2022-01-07 11:43:14 UTC
Will need more time to properly improve the logic for setting the operator Degraded condition

Comment 7 Jan Chaloupka 2022-02-18 11:23:24 UTC
Due to higher priority tasks I have not been able to resolve this issue in time. Moving to the next sprint.

Comment 8 Jan Chaloupka 2022-03-11 14:22:35 UTC
Due to higher priority tasks I have not been able to resolve this issue in time. Moving to the next sprint.

Comment 9 Michal Fojtik 2023-01-16 11:36:58 UTC
Dear reporter, we greatly appreciate the bug you have reported here. Unfortunately, due to migration to a new issue-tracking system (https://issues.redhat.com/), we cannot continue triaging bugs reported in Bugzilla. Since this bug has been stale for multiple days, we, therefore, decided to close this bug.
If you think this is a mistake or this bug has a higher priority or severity as set today, please feel free to reopen this bug and tell us why. We are going to move every re-opened bug to https://issues.redhat.com. 

Thank you for your patience and understanding.