Bug 1894216 - Improve OpenShift Web Console availability
Summary: Improve OpenShift Web Console availability
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Management Console
Version: 4.5
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Jakub Hadvig
QA Contact: Yadan Pei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-11-03 18:21 UTC by Richard Theis
Modified: 2021-02-24 15:30 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The console pods' anti-affinity 'TopologyKey' was set to 'kubernetes.io/hostname'. Consequence: The console had availability problems during updates and zone outages. Fix: Use the 'TopologyKey' 'topology.kubernetes.io/zone' instead of 'kubernetes.io/hostname'. Result: The OpenShift Web Console has improved availability during updates and zone outages.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:30:47 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github openshift console-operator pull 483 0 None closed Bug 1894216: Improve OpenShift Console availability 2021-01-15 13:09:33 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:30:50 UTC

Description Richard Theis 2020-11-03 18:21:22 UTC
Description of problem:

Related to https://bugzilla.redhat.com/show_bug.cgi?id=1809665 and https://bugzilla.redhat.com/show_bug.cgi?id=1809667, the OpenShift Web Console has availability problems. While the OpenShift Web Console pods do have preferred pod anti-affinity rules, these rules could be expanded to include zone anti-affinity for multi-zone clusters. In addition, it would be preferred to have at least 3 pods.
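To see which anti-affinity rules the console pods currently carry, one could print the relevant stanza of the deployment. This is a minimal sketch, not part of the original report: the function name is hypothetical, and the deployment name "console" is an assumption inferred from the pod names shown later in this bug.

```shell
# Print the pod anti-affinity stanza of the console deployment.
# Assumption: the deployment is named "console" and lives in the
# "openshift-console" namespace (inferred from the pod names in this report).
show_console_anti_affinity() {
    oc get deployment console -n openshift-console \
        -o jsonpath='{.spec.template.spec.affinity.podAntiAffinity}{"\n"}'
}
```

Calling show_console_anti_affinity against a live cluster should show the topologyKey in use, which makes it easy to tell hostname-level spreading from zone-level spreading.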

Version-Release number of selected component (if applicable):

OpenShift versions 4.3, 4.4 and 4.5 were tested. This problem may impact other versions.

How reproducible:


Steps to Reproduce:
1. Create an OpenShift version 4.4 cluster with worker nodes in three zones.
2. Start the following script once the cluster is created:

while true; do
    curl -sk <console-url> | grep "Application is not available" && date -u
    sleep 1
done

3. Take down all of the nodes in a single zone to simulate a zone outage.
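The probe loop from step 2 can be sketched as a pair of small functions. This is an illustrative variant, not the reporter's script: the function names are hypothetical, and unlike the original it also treats a failed or timed-out connection as "unavailable", which is an added assumption about what should count as an outage.

```shell
# Succeeds when the page body on stdin is the router's error page.
is_unavailable() {
    grep -qi 'Application is not available'
}

# Poll the console URL once per second and print a UTC timestamp for
# every probe that fails. A curl error (connection refused, timeout)
# is counted as a failure as well; the original loop ignored those.
probe_console() {
    url=$1
    while true; do
        if ! body=$(curl -sk --max-time 5 "$url") ||
            printf '%s\n' "$body" | is_unavailable; then
            date -u
        fi
        sleep 1
    done
}
```

Usage would be something like probe_console "https://<console-url>", left running in a separate terminal while the zone is taken down.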

Actual results:

If the OpenShift Web Console pods are running in the impacted zone, then there will be an outage of several minutes while the pods are rescheduled to a healthy zone.

Expected results:

OpenShift Web Console has improved availability during the updates and zone outages.

Comment 1 Andrew McDermott 2020-11-05 17:13:28 UTC
Initially this looks to be the wrong component; moving to "dev console" as a first best guess. I don't see anything specifically related to edge routing in the description.

Comment 2 Jakub Hadvig 2020-11-13 16:32:58 UTC
Haven't got time to investigate this issue so far. Will get to it next sprint.

Comment 6 Yadan Pei 2020-12-23 08:15:22 UTC
1. Install a cluster with worker nodes in three different zones

$ oc get node -o wide
NAME                                         STATUS   ROLES    AGE     VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
ip-10-0-140-219.us-east-2.compute.internal   Ready    worker   8h      v1.20.0+87544c5   10.0.140.219   <none>        Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-148-60.us-east-2.compute.internal    Ready    master   8h      v1.20.0+87544c5   10.0.148.60    <none>        Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-175-182.us-east-2.compute.internal   Ready    worker   8h      v1.20.0+87544c5   10.0.175.182   <none>        Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-177-3.us-east-2.compute.internal     Ready    master   8h      v1.20.0+87544c5   10.0.177.3     <none>        Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-204-128.us-east-2.compute.internal   Ready    master   8h      v1.20.0+87544c5   10.0.204.128   <none>        Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-217-234.us-east-2.compute.internal   Ready    worker   7h58m   v1.20.0+87544c5   10.0.217.234   <none>        Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39

$ for node in $(oc get node --no-headers | awk -F ' ' '{print $1}'); do echo "getting $node"; oc get node $node -o yaml | grep 'topology.kubernetes.io/zone: us-east-2'; done
getting ip-10-0-140-219.us-east-2.compute.internal
    topology.kubernetes.io/zone: us-east-2a
getting ip-10-0-148-60.us-east-2.compute.internal
    topology.kubernetes.io/zone: us-east-2a
getting ip-10-0-175-182.us-east-2.compute.internal
    topology.kubernetes.io/zone: us-east-2b
getting ip-10-0-177-3.us-east-2.compute.internal
    topology.kubernetes.io/zone: us-east-2b
getting ip-10-0-204-128.us-east-2.compute.internal
    topology.kubernetes.io/zone: us-east-2c
getting ip-10-0-217-234.us-east-2.compute.internal
    topology.kubernetes.io/zone: us-east-2c
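The per-node loop above can likely be replaced by a single call that prints the zone label as an extra column. A minimal sketch, with a hypothetical function name; it relies on the -L (label columns) option of "oc get":

```shell
# List all nodes with the value of their zone label in an extra column.
list_node_zones() {
    oc get nodes -L topology.kubernetes.io/zone
}
```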

2. Check the console pods; they are running in the us-east-2b and us-east-2a zones
$ oc get pods -n openshift-console -o wide
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-tpxmz     1/1     Running   0          6h32m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>  
console-6685f6b866-x8bfc     1/1     Running   0          6h34m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>  
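Mapping each pod to its zone by hand means cross-referencing the two listings above. The lookup can be sketched as follows; the function name is hypothetical, and the sketch assumes every node carries the topology.kubernetes.io/zone label, as in the output above.

```shell
# Print "<pod> <node> <zone>" for every pod in the openshift-console
# namespace. Pods that are not yet scheduled (node "<none>") are skipped.
console_pod_zones() {
    oc get pods -n openshift-console --no-headers \
        -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName |
    while read -r pod node; do
        if [ "$node" = "<none>" ]; then
            continue
        fi
        # Dots inside the label key must be escaped in the jsonpath.
        zone=$(oc get node "$node" \
            -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}' \
            </dev/null)
        printf '%s %s %s\n' "$pod" "$node" "$zone"
    done
}
```

If two console pods report the same zone, a single zone outage can take both down at once, which is the situation this bug describes.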

3. Go to the AWS console and stop all nodes in the us-east-2a zone, that is, ip-10-0-140-219.us-east-2.compute.internal (one worker node) and ip-10-0-148-60.us-east-2.compute.internal (one master node). At the same time, watch the console pods and check console accessibility.

In one terminal, watch the console pods:

$ while true; do oc get pods -n openshift-console -o wide; sleep 5; done
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-tpxmz     1/1     Running   0          7h51m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Running   0          7h53m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running   0          7h49m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Running   0          7h53m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-tpxmz     1/1     Running   0          7h52m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Running   0          7h54m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running   0          7h49m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Running   0          7h54m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-tpxmz     1/1     Running   0          7h52m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Running   0          7h54m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running   0          7h49m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Running   0          7h54m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-tpxmz     1/1     Running   0          7h52m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Running   0          7h54m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running   0          7h49m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Running   0          7h54m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-tpxmz     1/1     Running   0          7h52m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Running   0          7h54m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running   0          7h49m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Running   0          7h54m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-tpxmz     1/1     Running   0          7h52m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Running   0          7h54m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running   0          7h49m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Running   0          7h54m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-tpxmz     1/1     Running   0          7h52m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Running   0          7h54m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running   0          7h50m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Running   0          7h54m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
NAME                         READY   STATUS              RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-mtzp4     0/1     ContainerCreating   0          3s      <none>        ip-10-0-204-128.us-east-2.compute.internal   <none>           <none>
console-6685f6b866-tpxmz     1/1     Running             0          7h52m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Terminating         0          7h54m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running             0          7h50m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Terminating         0          7h54m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-sxnk4   0/1     ContainerCreating   0          3s      <none>        ip-10-0-175-182.us-east-2.compute.internal   <none>           <none>
NAME                         READY   STATUS        RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-mtzp4     1/1     Running       0          11s     10.130.0.81   ip-10-0-204-128.us-east-2.compute.internal   <none>           <none>
console-6685f6b866-tpxmz     1/1     Running       0          7h53m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Terminating   0          7h54m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running       0          7h50m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Terminating   0          7h54m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-sxnk4   1/1     Running       0          11s     10.131.1.9    ip-10-0-175-182.us-east-2.compute.internal   <none>           <none>
NAME                         READY   STATUS        RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-mtzp4     1/1     Running       0          20s     10.130.0.81   ip-10-0-204-128.us-east-2.compute.internal   <none>           <none>
console-6685f6b866-tpxmz     1/1     Running       0          7h53m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Terminating   0          7h55m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running       0          7h50m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Terminating   0          7h55m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-sxnk4   1/1     Running       0          20s     10.131.1.9    ip-10-0-175-182.us-east-2.compute.internal   <none>           <none>
NAME                         READY   STATUS        RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-mtzp4     1/1     Running       0          28s     10.130.0.81   ip-10-0-204-128.us-east-2.compute.internal   <none>           <none>
console-6685f6b866-tpxmz     1/1     Running       0          7h53m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Terminating   0          7h55m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running       0          7h50m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Terminating   0          7h55m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-sxnk4   1/1     Running       0          28s     10.131.1.9    ip-10-0-175-182.us-east-2.compute.internal   <none>           <none>
NAME                         READY   STATUS        RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-mtzp4     1/1     Running       0          36s     10.130.0.81   ip-10-0-204-128.us-east-2.compute.internal   <none>           <none>
console-6685f6b866-tpxmz     1/1     Running       0          7h53m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Terminating   0          7h55m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running       0          7h50m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Terminating   0          7h55m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-sxnk4   1/1     Running       0          36s     10.131.1.9    ip-10-0-175-182.us-east-2.compute.internal   <none>           <none>

In another terminal, check console accessibility:
$ while true; do curl -sk https://console-openshift-console.apps.qe-ui47-1223.qe.devcluster.openshift.com | grep -i 'Application is not available' && date -u ; sleep 1; done

During this period, the console was always accessible.

Moving to VERIFIED on 4.7.0-0.nightly-2020-12-21-131655

Comment 8 errata-xmlrpc 2021-02-24 15:30:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

