Bug 1915295
Summary: | [BM][IP][Dualstack] Installation failed - operators report dial tcp 172.30.0.1:443: i/o timeout | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Yurii Prokulevych <yprokule> |
Component: | Networking | Assignee: | Antonio Ojea <aojeagar> |
Networking sub component: | ovn-kubernetes | QA Contact: | Yurii Prokulevych <yprokule> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | high | CC: | achernet, anbhat, aojeagar, bbennett, kquinn, mcornea, sasha, wsun, zzhao |
Version: | 4.7 | Keywords: | TestBlocker |
Target Milestone: | --- | ||
Target Release: | 4.7.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: The OVN-Kubernetes master node performs an initial synchronization to keep the OVN and Kubernetes system databases in sync. However, OVN-Kubernetes was using the wrong source of truth. This issue only affected clusters deployed in dual-stack mode.
Consequence: Race conditions during OVN-Kubernetes startup caused some Kubernetes services to become unreachable, because the bootstrap logic considered them orphans and deleted them.
Fix: Use Kubernetes as the source of truth.
Result: OVN-Kubernetes starts correctly and keeps both OVN and Kubernetes in sync on startup.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-02-24 15:52:05 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Yurii Prokulevych
2021-01-12 12:07:42 UTC
This is a race condition in the new service controller: it runs a function to clean up stale data, but that function wrongly deletes valid services. We can see it in the logs:

2021-01-12T10:44:42.718Z|00363|nbctl|INFO|Running command run --if-exists -- remove load_balancer 0d44f11a-6c61-457b-8065-251f39c9ce22 vips "\"172.30.0.10:9154\""
I0112 10:44:42.721772 1 repair.go:106] Deleting non-existing Kubernetes vip 172.30.0.1:443 from OVN load balancer [0d44f11a-6c61-457b-8065-251f39c9ce22 58d9cf35-1960-4220-b224-51a0158cc12d c0d0698d-1e05-43ec-85c2-6458d25d46a9 7186b91e-cfa0-4509-b68a-b4f7a6fdba67]
2021-01-12T10:44:42.731Z|00368|nbctl|INFO|Running command run --if-exists -- remove load_balancer 0d44f11a-6c61-457b-8065-251f39c9ce22 vips "\"172.30.0.1:443\""
I0112 10:44:42.917421 1 repair.go:106] Deleting non-existing Kubernetes vip 172.30.0.10:53 from OVN load balancer [342e0ee1-7d07-4990-bb0a-60bc2cc5a3ad adb10c2a-8bd5-4ba9-9400-f5df73286251 cfde3642-134d-4e82-9e73-2e20471d4fb3 ac843e5a-89ac-475b-ab61-997c13ba8087]

Fix in https://github.com/ovn-org/ovn-kubernetes/pull/1945

Adding the TestBlocker keyword since this is blocking the test for IPI on bare metal with dual stack.

This issue should be fixed. Moving this issue to Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
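
The stale-data cleanup discussed in this report can be illustrated with a minimal Python sketch. It is not the code from repair.go or from the linked pull request. The `Service` type, the function names, and the address `172.30.0.99:80` are hypothetical and exist only for illustration. The idea shown is the one named in the Doc Text: build the set of expected VIPs from the Kubernetes services (every cluster IP of every service, so both families in dual stack), and treat an OVN load balancer VIP as stale only if it is absent from that set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical, simplified stand-in for a Kubernetes Service."""
    name: str
    cluster_ips: tuple  # one entry per IP family in dual-stack mode
    ports: tuple


def format_vip(ip: str, port: int) -> str:
    """Return "ip:port", bracketing IPv6 addresses."""
    return f"[{ip}]:{port}" if ":" in ip else f"{ip}:{port}"


def expected_vips(services) -> set:
    """All VIPs that Kubernetes says should exist (the source of truth)."""
    return {
        format_vip(ip, port)
        for svc in services
        for ip in svc.cluster_ips
        for port in svc.ports
    }


def stale_vips(ovn_vips, services) -> set:
    """VIPs present in OVN that no Kubernetes service accounts for."""
    return set(ovn_vips) - expected_vips(services)


# VIPs taken from the log excerpt, plus one made-up leftover entry.
ovn_vips = {
    "172.30.0.1:443",
    "172.30.0.10:53",
    "172.30.0.10:9154",
    "172.30.0.99:80",  # hypothetical stale VIP
}
services = [
    Service("kubernetes", ("172.30.0.1",), (443,)),
    Service("dns", ("172.30.0.10",), (53, 9154)),
]

if __name__ == "__main__":
    print(sorted(stale_vips(ovn_vips, services)))
    # With an empty or incomplete view of the services, every VIP looks stale.
    print(sorted(stale_vips(ovn_vips, [])))
```

The second call shows why the choice of reference data matters: if the set of expected VIPs is computed from an incomplete view, valid entries such as 172.30.0.1:443 are classified as orphans and would be removed, which matches the symptom in the logs.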