Description of problem:
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1809665 and https://bugzilla.redhat.com/show_bug.cgi?id=1809667, the OpenShift Web Console has availability problems. While the OpenShift Web Console pods do have preferred pod anti-affinity rules, these rules could be expanded to include zone anti-affinity for multi-zone clusters. In addition, it would be preferable to run at least 3 pods.

Version-Release number of selected component (if applicable):
OpenShift versions 4.3, 4.4 and 4.5 were tested. This problem may impact other versions.

How reproducible:

Steps to Reproduce:
1. Create an OpenShift version 4.4 cluster with worker nodes in three zones.
2. Start the following script once the cluster is created:
   while true; do
     curl -sk <console-url> | grep "Application is not available" && date -u
     sleep 1
   done
3. Take down all of the nodes in a single zone to simulate a zone outage.

Actual results:
If the OpenShift Web Console pods are running in the impacted zone, there is an outage of several minutes while the pods are rescheduled to a healthy zone.

Expected results:
OpenShift Web Console has improved availability during updates and zone outages.
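For illustration, the zone anti-affinity suggested above could look roughly like the stanza printed by the sketch below. This is not the change that shipped for this bug. The function name zone_anti_affinity_stanza, the weight of 100, and the app: console pod label are assumptions made for the example; only the topology.kubernetes.io/zone label key is taken from this report.

```shell
#!/bin/sh
# Hypothetical helper: print a preferred, zone-level pod anti-affinity
# stanza for a Deployment pod template (spec.template.spec).
# The label selector and the weight are assumptions, not values from the bug.
zone_anti_affinity_stanza() {
  cat <<'EOF'
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100                      # assumed weight
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: console               # assumed pod label
        topologyKey: topology.kubernetes.io/zone
EOF
}

# Example: zone_anti_affinity_stanza > affinity-fragment.yaml
```

Because the rule is "preferred" rather than "required", the scheduler can still place two pods in one zone when no other zone has capacity, which matters when a whole zone is down.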
Initially this looks to be the wrong component; moving to "dev console" as a first best guess. I don't see anything specifically related to edge routing in the description.
Haven't had time to investigate this issue yet. Will get to it next sprint.
1. Install a cluster with worker nodes in three different zones

$ oc get node -o wide
NAME                                         STATUS   ROLES    AGE     VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
ip-10-0-140-219.us-east-2.compute.internal   Ready    worker   8h      v1.20.0+87544c5   10.0.140.219   <none>        Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-148-60.us-east-2.compute.internal    Ready    master   8h      v1.20.0+87544c5   10.0.148.60    <none>        Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-175-182.us-east-2.compute.internal   Ready    worker   8h      v1.20.0+87544c5   10.0.175.182   <none>        Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-177-3.us-east-2.compute.internal     Ready    master   8h      v1.20.0+87544c5   10.0.177.3     <none>        Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-204-128.us-east-2.compute.internal   Ready    master   8h      v1.20.0+87544c5   10.0.204.128   <none>        Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-217-234.us-east-2.compute.internal   Ready    worker   7h58m   v1.20.0+87544c5   10.0.217.234   <none>        Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa)   4.18.0-240.8.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39

$ for node in $(oc get node --no-headers | awk -F ' ' '{print $1}'); do echo "getting $node"; oc get node $node -o yaml | grep 'topology.kubernetes.io/zone: us-east-2'; done
getting ip-10-0-140-219.us-east-2.compute.internal
topology.kubernetes.io/zone: us-east-2a
getting ip-10-0-148-60.us-east-2.compute.internal
topology.kubernetes.io/zone: us-east-2a
getting ip-10-0-175-182.us-east-2.compute.internal
topology.kubernetes.io/zone: us-east-2b
getting ip-10-0-177-3.us-east-2.compute.internal
topology.kubernetes.io/zone: us-east-2b
getting ip-10-0-204-128.us-east-2.compute.internal
topology.kubernetes.io/zone: us-east-2c
getting ip-10-0-217-234.us-east-2.compute.internal
topology.kubernetes.io/zone: us-east-2c

2. Check the console pods; they are running in the us-east-2b and us-east-2a zones

$ oc get pods -n openshift-console -o wide
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-tpxmz     1/1     Running   0          6h32m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Running   0          6h34m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>

3. Go to the AWS console and stop all nodes in the us-east-2a zone, that is, ip-10-0-140-219.us-east-2.compute.internal (one worker node) and ip-10-0-148-60.us-east-2.compute.internal (one master node). At the same time, watch the console pods and check console accessibility.

In one terminal, watch the console pods:

$ while true; do oc get pods -n openshift-console -o wide; sleep 5; done
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-tpxmz     1/1     Running   0          7h51m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Running   0          7h53m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running   0          7h49m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Running   0          7h53m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-tpxmz     1/1     Running   0          7h52m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Running   0          7h54m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running   0          7h49m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Running   0          7h54m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-tpxmz     1/1     Running   0          7h52m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Running   0          7h54m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running   0          7h49m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Running   0          7h54m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-tpxmz     1/1     Running   0          7h52m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Running   0          7h54m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running   0          7h49m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Running   0          7h54m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-tpxmz     1/1     Running   0          7h52m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Running   0          7h54m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running   0          7h49m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Running   0          7h54m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-tpxmz     1/1     Running   0          7h52m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Running   0          7h54m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running   0          7h49m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Running   0          7h54m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-tpxmz     1/1     Running   0          7h52m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Running   0          7h54m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running   0          7h50m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Running   0          7h54m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
NAME                         READY   STATUS              RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-mtzp4     0/1     ContainerCreating   0          3s      <none>        ip-10-0-204-128.us-east-2.compute.internal   <none>           <none>
console-6685f6b866-tpxmz     1/1     Running             0          7h52m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Terminating         0          7h54m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running             0          7h50m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Terminating         0          7h54m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-sxnk4   0/1     ContainerCreating   0          3s      <none>        ip-10-0-175-182.us-east-2.compute.internal   <none>           <none>
NAME                         READY   STATUS              RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-mtzp4     1/1     Running             0          11s     10.130.0.81   ip-10-0-204-128.us-east-2.compute.internal   <none>           <none>
console-6685f6b866-tpxmz     1/1     Running             0          7h53m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Terminating         0          7h54m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running             0          7h50m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Terminating         0          7h54m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-sxnk4   1/1     Running             0          11s     10.131.1.9    ip-10-0-175-182.us-east-2.compute.internal   <none>           <none>
NAME                         READY   STATUS              RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-mtzp4     1/1     Running             0          20s     10.130.0.81   ip-10-0-204-128.us-east-2.compute.internal   <none>           <none>
console-6685f6b866-tpxmz     1/1     Running             0          7h53m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Terminating         0          7h55m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running             0          7h50m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Terminating         0          7h55m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-sxnk4   1/1     Running             0          20s     10.131.1.9    ip-10-0-175-182.us-east-2.compute.internal   <none>           <none>
NAME                         READY   STATUS              RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-mtzp4     1/1     Running             0          28s     10.130.0.81   ip-10-0-204-128.us-east-2.compute.internal   <none>           <none>
console-6685f6b866-tpxmz     1/1     Running             0          7h53m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Terminating         0          7h55m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running             0          7h50m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Terminating         0          7h55m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-sxnk4   1/1     Running             0          28s     10.131.1.9    ip-10-0-175-182.us-east-2.compute.internal   <none>           <none>
NAME                         READY   STATUS              RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
console-6685f6b866-mtzp4     1/1     Running             0          36s     10.130.0.81   ip-10-0-204-128.us-east-2.compute.internal   <none>           <none>
console-6685f6b866-tpxmz     1/1     Running             0          7h53m   10.129.0.72   ip-10-0-177-3.us-east-2.compute.internal     <none>           <none>
console-6685f6b866-x8bfc     1/1     Terminating         0          7h55m   10.128.0.71   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-674gr   1/1     Running             0          7h50m   10.129.2.21   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
downloads-5468b9795f-cksws   1/1     Terminating         0          7h55m   10.128.0.62   ip-10-0-148-60.us-east-2.compute.internal    <none>           <none>
downloads-5468b9795f-sxnk4   1/1     Running             0          36s     10.131.1.9    ip-10-0-175-182.us-east-2.compute.internal   <none>           <none>

In another terminal, check console accessibility:

$ while true; do curl -sk https://console-openshift-console.apps.qe-ui47-1223.qe.devcluster.openshift.com | grep -i 'Application is not available' && date -u ; sleep 1; done

During this period, the console was always accessible.

Moving to VERIFIED on 4.7.0-0.nightly-2020-12-21-131655
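As a possible shortcut for the manual zone check in the verification steps above, the sketch below counts Running pods per zone from saved command output. The function name pods_per_zone and the two input files are hypothetical. It assumes the pod listing comes from `oc get pods -o wide --no-headers` with NODE as the seventh whitespace-separated column, which holds for the output shown in this report but may not hold if the RESTARTS column contains spaces.

```shell
#!/bin/sh
# Hypothetical helper: count Running pods per zone.
#   $1  file with "<node> <zone>" pairs, one per line
#   $2  file with `oc get pods -o wide --no-headers` output
#   $3  pod name prefix to count, for example "console-"
pods_per_zone() {
  awk -v prefix="$3" '
    # First file: remember the zone of each node.
    NR == FNR { zone[$1] = $2; next }
    # Second file: count Running pods whose name starts with the prefix.
    index($1, prefix) == 1 && $3 == "Running" {
      z = ($7 in zone) ? zone[$7] : "unknown"
      count[z]++
    }
    END { for (z in count) print z, count[z] }
  ' "$1" "$2" | sort
}

# Example usage against a live cluster (not run here):
#   oc get nodes --no-headers -L topology.kubernetes.io/zone \
#     | awk "{print \$1, \$NF}" > nodes.txt
#   oc get pods -n openshift-console -o wide --no-headers > pods.txt
#   pods_per_zone nodes.txt pods.txt console-
```

Each output line is a zone followed by a pod count; a zone that is missing from the output has no Running pod with that prefix.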
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633