Bug 1894216
| Summary: | Improve OpenShift Web Console availability | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Richard Theis <rtheis> |
| Component: | Management Console | Assignee: | Jakub Hadvig <jhadvig> |
| Status: | CLOSED ERRATA | QA Contact: | Yadan Pei <yapei> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.5 | CC: | aballant, aos-bugs, jokerman, nmukherj, spadgett, yapei |
| Target Milestone: | --- | ||
| Target Release: | 4.7.0 | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause: The console pods' 'topologyKey' was set to 'kubernetes.io/hostname'.
Consequence: The console had availability problems during updates and zone outages.
Fix: Set 'topologyKey' to 'topology.kubernetes.io/zone' instead of 'kubernetes.io/hostname'.
Result: The OpenShift Web Console has improved availability during updates and zone outages.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-02-24 15:30:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
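To see which topology key a deployment actually uses, one option is to pull every `topologyKey` value out of its manifest. The following is a minimal sketch, not part of the original report: the helper name `topology_keys` is hypothetical, and the deployment name `console` in the usage comment is an assumption inferred from the pod names in this bug.

```shell
#!/bin/sh
# topology_keys (hypothetical helper): read a YAML manifest on stdin and
# print each distinct topologyKey value it contains, one per line.
topology_keys() {
  # Strip leading indentation and list dashes, keep only the value,
  # then drop any surrounding quotes and de-duplicate.
  sed -n 's/^[[:space:]-]*topologyKey:[[:space:]]*//p' | tr -d "\"'" | sort -u
}

# Usage against a live cluster (assumes the deployment is named "console"):
#   oc get deployment console -n openshift-console -o yaml | topology_keys
```

On a cluster with the fix, the expectation from the Doc Text above is that this prints 'topology.kubernetes.io/zone' rather than 'kubernetes.io/hostname'.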
Description
Richard Theis
2020-11-03 18:21:22 UTC
Initially this looks to be the wrong component; moving to "dev console" as a first best guess. I don't see anything specifically related to edge routing in the description.
Haven't had time to investigate this issue so far. Will get to it next sprint.
1. Install a cluster with worker nodes in three different zones
$ oc get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-10-0-140-219.us-east-2.compute.internal Ready worker 8h v1.20.0+87544c5 10.0.140.219 <none> Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa) 4.18.0-240.8.1.el8_3.x86_64 cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-148-60.us-east-2.compute.internal Ready master 8h v1.20.0+87544c5 10.0.148.60 <none> Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa) 4.18.0-240.8.1.el8_3.x86_64 cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-175-182.us-east-2.compute.internal Ready worker 8h v1.20.0+87544c5 10.0.175.182 <none> Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa) 4.18.0-240.8.1.el8_3.x86_64 cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-177-3.us-east-2.compute.internal Ready master 8h v1.20.0+87544c5 10.0.177.3 <none> Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa) 4.18.0-240.8.1.el8_3.x86_64 cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-204-128.us-east-2.compute.internal Ready master 8h v1.20.0+87544c5 10.0.204.128 <none> Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa) 4.18.0-240.8.1.el8_3.x86_64 cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-217-234.us-east-2.compute.internal Ready worker 7h58m v1.20.0+87544c5 10.0.217.234 <none> Red Hat Enterprise Linux CoreOS 47.83.202012190438-0 (Ootpa) 4.18.0-240.8.1.el8_3.x86_64 cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
$ for node in $(oc get node --no-headers | awk -F ' ' '{print $1}'); do echo "getting $node"; oc get node $node -o yaml | grep 'topology.kubernetes.io/zone: us-east-2'; done
getting ip-10-0-140-219.us-east-2.compute.internal
topology.kubernetes.io/zone: us-east-2a
getting ip-10-0-148-60.us-east-2.compute.internal
topology.kubernetes.io/zone: us-east-2a
getting ip-10-0-175-182.us-east-2.compute.internal
topology.kubernetes.io/zone: us-east-2b
getting ip-10-0-177-3.us-east-2.compute.internal
topology.kubernetes.io/zone: us-east-2b
getting ip-10-0-204-128.us-east-2.compute.internal
topology.kubernetes.io/zone: us-east-2c
getting ip-10-0-217-234.us-east-2.compute.internal
topology.kubernetes.io/zone: us-east-2c
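The per-node loop above can be condensed into a per-zone count, which makes it easier to confirm that nodes really are spread over three zones. This is a rough sketch under assumptions: `nodes_per_zone` is a hypothetical helper name, and it expects "NAME ZONE" pairs on stdin.

```shell
#!/bin/sh
# nodes_per_zone (hypothetical helper): read "NAME ZONE" pairs on stdin
# and print "ZONE COUNT" for each zone, sorted by zone name.
nodes_per_zone() {
  awk 'NF >= 2 { count[$2]++ } END { for (z in count) print z, count[z] }' | sort
}

# Usage against a live cluster. -L adds the label value as the last column;
# this assumes every node carries the zone label, otherwise $NF is a
# different column for that node:
#   oc get nodes --no-headers -L topology.kubernetes.io/zone \
#     | awk '{print $1, $NF}' | nodes_per_zone
```

For the six nodes listed above, this would report two nodes in each of us-east-2a, us-east-2b, and us-east-2c.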
2. Check the console pods; they are running on nodes in the us-east-2b and us-east-2a zones
$ oc get pods -n openshift-console -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-6685f6b866-tpxmz 1/1 Running 0 6h32m 10.129.0.72 ip-10-0-177-3.us-east-2.compute.internal <none> <none>
console-6685f6b866-x8bfc 1/1 Running 0 6h34m 10.128.0.71 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
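Reading zones off the node names by hand gets error-prone on larger clusters. A possible way to automate the lookup is to join the pod-to-node list against a node-to-zone map. The sketch below is illustrative only: `pods_to_zones` and the map file are hypothetical names, not something used in the original verification.

```shell
#!/bin/sh
# pods_to_zones (hypothetical helper):
#   $1    - file containing "NODE ZONE" lines
#   stdin - "POD NODE" lines
# Prints "POD ZONE" for each pod, or "unknown" if the node is not in the map.
pods_to_zones() {
  awk -v mapfile="$1" '
    BEGIN {
      # Load the node-to-zone map before processing any pod lines.
      while ((getline line < mapfile) > 0) {
        split(line, f, " ")
        zone[f[1]] = f[2]
      }
      close(mapfile)
    }
    NF >= 2 { print $1, (($2 in zone) ? zone[$2] : "unknown") }
  '
}

# Usage against a live cluster (nodes.txt is an assumed file name):
#   oc get nodes --no-headers -L topology.kubernetes.io/zone \
#     | awk '{print $1, $NF}' > nodes.txt
#   oc get pods -n openshift-console --no-headers \
#     -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName \
#     | pods_to_zones nodes.txt
```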
3. Go to the AWS console and stop all nodes in the us-east-2a zone, that is, ip-10-0-140-219.us-east-2.compute.internal (one worker node) and ip-10-0-148-60.us-east-2.compute.internal (one master node). At the same time, watch the console pods and check console accessibility.
In one terminal, we watch the console pods:
$ while true; do oc get pods -n openshift-console -o wide; sleep 5; done
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-6685f6b866-tpxmz 1/1 Running 0 7h51m 10.129.0.72 ip-10-0-177-3.us-east-2.compute.internal <none> <none>
console-6685f6b866-x8bfc 1/1 Running 0 7h53m 10.128.0.71 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-674gr 1/1 Running 0 7h49m 10.129.2.21 ip-10-0-217-234.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-cksws 1/1 Running 0 7h53m 10.128.0.62 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-6685f6b866-tpxmz 1/1 Running 0 7h52m 10.129.0.72 ip-10-0-177-3.us-east-2.compute.internal <none> <none>
console-6685f6b866-x8bfc 1/1 Running 0 7h54m 10.128.0.71 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-674gr 1/1 Running 0 7h49m 10.129.2.21 ip-10-0-217-234.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-cksws 1/1 Running 0 7h54m 10.128.0.62 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-6685f6b866-tpxmz 1/1 Running 0 7h52m 10.129.0.72 ip-10-0-177-3.us-east-2.compute.internal <none> <none>
console-6685f6b866-x8bfc 1/1 Running 0 7h54m 10.128.0.71 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-674gr 1/1 Running 0 7h49m 10.129.2.21 ip-10-0-217-234.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-cksws 1/1 Running 0 7h54m 10.128.0.62 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-6685f6b866-tpxmz 1/1 Running 0 7h52m 10.129.0.72 ip-10-0-177-3.us-east-2.compute.internal <none> <none>
console-6685f6b866-x8bfc 1/1 Running 0 7h54m 10.128.0.71 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-674gr 1/1 Running 0 7h49m 10.129.2.21 ip-10-0-217-234.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-cksws 1/1 Running 0 7h54m 10.128.0.62 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-6685f6b866-tpxmz 1/1 Running 0 7h52m 10.129.0.72 ip-10-0-177-3.us-east-2.compute.internal <none> <none>
console-6685f6b866-x8bfc 1/1 Running 0 7h54m 10.128.0.71 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-674gr 1/1 Running 0 7h49m 10.129.2.21 ip-10-0-217-234.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-cksws 1/1 Running 0 7h54m 10.128.0.62 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-6685f6b866-tpxmz 1/1 Running 0 7h52m 10.129.0.72 ip-10-0-177-3.us-east-2.compute.internal <none> <none>
console-6685f6b866-x8bfc 1/1 Running 0 7h54m 10.128.0.71 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-674gr 1/1 Running 0 7h49m 10.129.2.21 ip-10-0-217-234.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-cksws 1/1 Running 0 7h54m 10.128.0.62 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-6685f6b866-tpxmz 1/1 Running 0 7h52m 10.129.0.72 ip-10-0-177-3.us-east-2.compute.internal <none> <none>
console-6685f6b866-x8bfc 1/1 Running 0 7h54m 10.128.0.71 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-674gr 1/1 Running 0 7h50m 10.129.2.21 ip-10-0-217-234.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-cksws 1/1 Running 0 7h54m 10.128.0.62 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-6685f6b866-mtzp4 0/1 ContainerCreating 0 3s <none> ip-10-0-204-128.us-east-2.compute.internal <none> <none>
console-6685f6b866-tpxmz 1/1 Running 0 7h52m 10.129.0.72 ip-10-0-177-3.us-east-2.compute.internal <none> <none>
console-6685f6b866-x8bfc 1/1 Terminating 0 7h54m 10.128.0.71 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-674gr 1/1 Running 0 7h50m 10.129.2.21 ip-10-0-217-234.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-cksws 1/1 Terminating 0 7h54m 10.128.0.62 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-sxnk4 0/1 ContainerCreating 0 3s <none> ip-10-0-175-182.us-east-2.compute.internal <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-6685f6b866-mtzp4 1/1 Running 0 11s 10.130.0.81 ip-10-0-204-128.us-east-2.compute.internal <none> <none>
console-6685f6b866-tpxmz 1/1 Running 0 7h53m 10.129.0.72 ip-10-0-177-3.us-east-2.compute.internal <none> <none>
console-6685f6b866-x8bfc 1/1 Terminating 0 7h54m 10.128.0.71 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-674gr 1/1 Running 0 7h50m 10.129.2.21 ip-10-0-217-234.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-cksws 1/1 Terminating 0 7h54m 10.128.0.62 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-sxnk4 1/1 Running 0 11s 10.131.1.9 ip-10-0-175-182.us-east-2.compute.internal <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-6685f6b866-mtzp4 1/1 Running 0 20s 10.130.0.81 ip-10-0-204-128.us-east-2.compute.internal <none> <none>
console-6685f6b866-tpxmz 1/1 Running 0 7h53m 10.129.0.72 ip-10-0-177-3.us-east-2.compute.internal <none> <none>
console-6685f6b866-x8bfc 1/1 Terminating 0 7h55m 10.128.0.71 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-674gr 1/1 Running 0 7h50m 10.129.2.21 ip-10-0-217-234.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-cksws 1/1 Terminating 0 7h55m 10.128.0.62 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-sxnk4 1/1 Running 0 20s 10.131.1.9 ip-10-0-175-182.us-east-2.compute.internal <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-6685f6b866-mtzp4 1/1 Running 0 28s 10.130.0.81 ip-10-0-204-128.us-east-2.compute.internal <none> <none>
console-6685f6b866-tpxmz 1/1 Running 0 7h53m 10.129.0.72 ip-10-0-177-3.us-east-2.compute.internal <none> <none>
console-6685f6b866-x8bfc 1/1 Terminating 0 7h55m 10.128.0.71 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-674gr 1/1 Running 0 7h50m 10.129.2.21 ip-10-0-217-234.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-cksws 1/1 Terminating 0 7h55m 10.128.0.62 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-sxnk4 1/1 Running 0 28s 10.131.1.9 ip-10-0-175-182.us-east-2.compute.internal <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-6685f6b866-mtzp4 1/1 Running 0 36s 10.130.0.81 ip-10-0-204-128.us-east-2.compute.internal <none> <none>
console-6685f6b866-tpxmz 1/1 Running 0 7h53m 10.129.0.72 ip-10-0-177-3.us-east-2.compute.internal <none> <none>
console-6685f6b866-x8bfc 1/1 Terminating 0 7h55m 10.128.0.71 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-674gr 1/1 Running 0 7h50m 10.129.2.21 ip-10-0-217-234.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-cksws 1/1 Terminating 0 7h55m 10.128.0.62 ip-10-0-148-60.us-east-2.compute.internal <none> <none>
downloads-5468b9795f-sxnk4 1/1 Running 0 36s 10.131.1.9 ip-10-0-175-182.us-east-2.compute.internal <none> <none>
In another terminal, we check console accessibility:
$ while true; do curl -sk https://console-openshift-console.apps.qe-ui47-1223.qe.devcluster.openshift.com | grep -i 'Application is not available' && date -u ; sleep 1; done
During this period, the console remained accessible the whole time.
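One caveat with the curl loop above is that it only prints when the router's error page is returned, so an empty response (for example, a failed connection) goes unnoticed. A slightly stricter check might look like the sketch below; `console_available` and `CONSOLE_URL` are hypothetical names, and treating an empty body as unavailable is an assumption, not something the original verification did.

```shell
#!/bin/sh
# console_available (hypothetical helper): read a page body on stdin.
# Returns 0 if the body is non-empty and does not contain the router's
# "Application is not available" text; returns non-zero otherwise.
console_available() {
  body=$(cat)
  # Assumption: an empty body means the request failed.
  [ -n "$body" ] || return 1
  ! printf '%s\n' "$body" | grep -qi 'Application is not available'
}

# Usage: print a UTC timestamp for every failed probe.
# CONSOLE_URL is an assumed variable holding the console route URL.
#   while true; do
#     curl -sk "$CONSOLE_URL" | console_available || date -u
#     sleep 1
#   done
```

As with the original loop, no output during the outage window would indicate that the console stayed reachable.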
Moving to VERIFIED on 4.7.0-0.nightly-2020-12-21-131655
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:5633