Bug 1926867

Summary: openshift-apiserver Available is False with 3 pods not ready for a while during upgrade
Product: OpenShift Container Platform Reporter: Luis Sanchez <sanchezl>
Component: kube-apiserver Assignee: Luis Sanchez <sanchezl>
Status: CLOSED ERRATA QA Contact: Ke Wang <kewang>
Severity: medium Docs Contact:
Priority: low    
Version: 4.7 CC: akashem, aos-bugs, fabian, mf.flip, mfojtik, wking, xxia
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1912820 Environment:
Last Closed: 2021-07-27 22:42:47 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1912820, 1946856    
Bug Blocks: 1927321    

Comment 2 Ke Wang 2021-02-24 09:39:53 UTC
To verify, I upgraded from OCP 4.7 GA to 4.8:

$ oc get clusterversion -o json|jq ".items[0].status.history"
[
  {
    "completionTime": "2021-02-23T21:18:35Z",
    "image": "registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-02-22-211839",
    "startedTime": "2021-02-23T19:18:05+08:00",
    "state": "Completed",
    "verified": false,
    "version": "4.8.0-0.nightly-2021-02-22-211839"
  },
  {
    "completionTime": "2021-02-23T18:57:36Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:d74b1cfa81f8c9cc23336aee72d8ae9c9905e62c4874b071317a078c316f8a70",
    "startedTime": "2021-02-23T18:30:46Z",
    "state": "Completed",
    "verified": false,
    "version": "4.7.0"
  }
]

During the upgrade, the script watch-apiserver-in-upgrade.sh was run to repeatedly check the `oc get project.project` command: ./watch-apiserver-in-upgrade.sh | tee watch.log. After the upgrade succeeded, I checked watch.log:

$ grep "failed" watch.log # 1 match in total
2021-02-23T21:10:24+08:00 oc get project.project failed
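
The watch script itself is not attached to this bug. As a rough illustration only, a loop that produces log lines of this shape might look like the sketch below. The function names `probe` and `dump_state`, the `WATCH_LOOP` and `WATCH_INTERVAL` variables, and the exact set of diagnostic commands are assumptions, not the contents of the real watch-apiserver-in-upgrade.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real watch-apiserver-in-upgrade.sh is not shown in this bug.

# Run a command, discard its stdout (stderr stays visible), and log
# "<timestamp> <command> succeeded" or "<timestamp> <command> failed".
probe() {
  local rc=0
  "$@" >/dev/null || rc=$?
  local ts
  ts=$(date +%Y-%m-%dT%H:%M:%S%z)
  if [ "$rc" -eq 0 ]; then
    echo "$ts $* succeeded"
  else
    echo "$ts $* failed"
  fi
  return "$rc"
}

# On a failed probe, capture operator, pod, and node state for later analysis.
# (Assumed selection of commands, chosen to resemble the excerpt in watch.log.)
dump_state() {
  oc describe clusteroperator openshift-apiserver
  oc get pods -n openshift-apiserver -o wide --show-labels --no-headers
  oc get clusteroperator openshift-apiserver --no-headers
  oc get nodes --no-headers
}

# The loop only starts when WATCH_LOOP=1, so the functions can be sourced alone.
if [ "${WATCH_LOOP:-0}" = 1 ]; then
  while true; do
    probe oc get cm
    oc get clusterversion --no-headers
    probe oc get project.project || dump_state
    sleep "${WATCH_INTERVAL:-10}"
  done
fi
```

Under these assumptions it would be started as `WATCH_LOOP=1 ./watch-apiserver-in-upgrade.sh | tee watch.log`, and `grep "failed" watch.log` then lists every probe that did not succeed.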

Checked the details of the above error in watch.log. The error is unrelated to this bug: it occurred because the master node hosting one of the apiserver pods was in SchedulingDisabled state. Once that node was ready again, there were no further errors.
...
2021-02-23T21:10:15+08:00 oc get cm succeeded
version   4.7.0   True   True   112m   Working towards 4.8.0-0.nightly-2021-02-22-211839: 561 of 669 done (83% complete), waiting on machine-config
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get projects.project.openshift.io)
2021-02-23T21:10:24+08:00 oc get project.project failed
Status:
  Conditions:
    Last Transition Time:  2021-02-23T13:06:11Z
    Message:               APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()
    Reason:                APIServerDeployment_UnavailablePod
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2021-02-23T13:10:08Z
    Message:               All is well
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2021-02-23T13:10:08Z
    Message:               All is well
    Reason:                AsExpected
    Status:                True
    Type:                  Available
apiserver-7dd9b5b8cc-87h8q   0/2   Pending   0     2m25s   <none>        <none>                      <none>   <none>   apiserver=true,app=openshift-apiserver-a,openshift-apiserver-anti-affinity=true,pod-template-hash=7dd9b5b8cc,revision=1
apiserver-7dd9b5b8cc-nv8hj   2/2   Running   0     91m     10.130.0.72   kewang2373-kjmhk-master-2   <none>   <none>   apiserver=true,app=openshift-apiserver-a,openshift-apiserver-anti-affinity=true,pod-template-hash=7dd9b5b8cc,revision=1
apiserver-7dd9b5b8cc-z9xl4   2/2   Running   0     6m43s   10.129.0.11   kewang2373-kjmhk-master-0   <none>   <none>   apiserver=true,app=openshift-apiserver-a,openshift-apiserver-anti-affinity=true,pod-template-hash=7dd9b5b8cc,revision=1
openshift-apiserver   4.8.0-0.nightly-2021-02-22-211839   True   False   True   39s
kewang2373-kjmhk-master-0         Ready                      master   154m   v1.20.0+01ab7fd
kewang2373-kjmhk-master-1         Ready,SchedulingDisabled   master   155m   v1.20.0+ba45583
kewang2373-kjmhk-master-2         Ready                      master   154m   v1.20.0+ba45583
kewang2373-kjmhk-worker-0-67k77   Ready                      worker   146m   v1.20.0+01ab7fd
kewang2373-kjmhk-worker-0-d5tp5   Ready,SchedulingDisabled   worker   146m   v1.20.0+ba45583
kewang2373-kjmhk-worker-0-tzfsc   Ready                      worker   141m   v1.20.0+ba45583
2021-02-23T21:10:49+08:00 oc get cm succeeded
...

Based on the above results, the bug is fixed as expected, so moving the bug to VERIFIED.

Comment 5 errata-xmlrpc 2021-07-27 22:42:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438