Bug 1800969
Summary: | keepalived conf file generated for IPv6 cluster contains health checks via IPv4 | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Victor Voronkov <vvoronko> |
Component: | Installer | Assignee: | Yossi Boaron <yboaron> |
Installer sub component: | OpenShift on Bare Metal IPI | QA Contact: | Victor Voronkov <vvoronko> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | unspecified | ||
Priority: | unspecified | CC: | augol, bperkins, bschmaus, stbenjam, yboaron |
Version: | 4.3.z | Keywords: | Triaged |
Target Milestone: | --- | ||
Target Release: | 4.5.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause:
In baremetal IPI deployments, Keepalived provides IP failover for both the API VIP and the Ingress VIP. Keepalived repeatedly runs a script that monitors the status of a local component (e.g., the OCP API server) to decide which node should own the VIP. In IPv6 deployments, Keepalived used the IPv4 local address (127.0.0.1) to check local component status.
Consequence:
In IPv6 deployments, Keepalived may receive an incorrect component status.
Fix:
Update the Keepalived check scripts to use `localhost`, which resolves to 127.0.0.1 in IPv4 deployments and ::1 in IPv6 deployments.
Result:
Keepalived monitors local component status using the correct local IP address.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2020-07-13 17:14:25 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Victor Voronkov
2020-02-09 15:46:15 UTC
In the latest 4.4 tree, the Keepalived check scripts in keepalived.conf use 'localhost' and not '0'; see [1]. In the 4.3 tree, '0' is used. Did you run the test with 4.3 or 4.4 (since you filed the bug on 4.4)? In [2] you can find the keepalived.conf from my environment (see [3] for the OC version). If we need to run IPv6 on 4.3, I guess we should backport this fix to 4.3.

[1] https://github.com/openshift/machine-config-operator/blob/master/templates/master/00-master/baremetal/files/baremetal-keepalived-keepalived.yaml#L21

[2]

```
vrrp_script chk_ocp {
    script "/usr/bin/curl -o /dev/null -kLs https://localhost:6443/readyz"
    interval 1
    weight 50
}

vrrp_script chk_dns {
    script "/usr/bin/host -t SRV _etcd-server-ssl._tcp.ostest.test.metalkube.org localhost"
    interval 1
    weight 50
}

# TODO: Improve this check. The port is assumed to be alive.
# Need to assess what is the ramification if the port is not there.
vrrp_script chk_ingress {
    script "/usr/bin/curl -o /dev/null -kLs http://localhost:1936/healthz"
    interval 1
    weight 50
}
```

[3]

```
[kni@worker-0 dev-scripts]$ oc version
Client Version: 4.4.0-0.ci-2020-02-08-192852
Server Version: 4.4.0-0.ci-2020-02-08-192852
Kubernetes Version: v1.17.1
```

My bad, Yossi, I tested on 4.3.0-0.nightly-2020-02-03-115336-ipv6.1. Fixing the bug's OCP version, and yes, we test on IPv6, so a backport is required.

Moving this to 4.5. To get this change into 4.4 at this point, you'll need to fix it in 4.5 and clone this bug to 4.4.

Verified on 4.4.0-0.nightly-2020-03-11-212258: all health checks resolve localhost to the IPv6 loopback address (::1).

```
cat /etc/keepalived/keepalived.conf

vrrp_script chk_ocp {
    script "/usr/bin/curl -o /dev/null -kLs https://localhost:6443/readyz"
    interval 1
    weight 50
}

vrrp_script chk_dns {
    script "/usr/bin/host -t SRV _etcd-server-ssl._tcp.ocp-edge-cluster.qe.lab.redhat.com localhost"
    interval 1
    weight 50
}

# TODO: Improve this check. The port is assumed to be alive.
# Need to assess what is the ramification if the port is not there.
vrrp_script chk_ingress {
    script "/usr/bin/curl -o /dev/null -kLs http://localhost:1936/healthz"
    interval 1
    weight 50
}
```

```
curl -kLs https://localhost:6443/readyz -vvv
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 6443 (#0)
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
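The fix works because `localhost` is resolved by the system resolver to the loopback address of whichever address family the host uses, while a hard-coded `127.0.0.1` is IPv4-only. A minimal Python sketch (illustrative only, not part of the actual fix) that inspects this resolution:

```python
import socket

def resolve_localhost():
    """Return the set of IP addresses that 'localhost' resolves to.

    Check scripts that target 'localhost' instead of a hard-coded
    127.0.0.1 let the resolver pick the right address family:
    127.0.0.1 on IPv4 hosts, ::1 on IPv6 hosts.
    """
    infos = socket.getaddrinfo("localhost", None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual IP address.
    return {info[4][0] for info in infos}

if __name__ == "__main__":
    print(resolve_localhost())  # typically includes ::1 on a dual-stack host
```

Which addresses appear depends on the host's `/etc/hosts` and network configuration; on an IPv6-only deployment only `::1` would be returned, which is exactly why the curl check in the verification above connects to `(::1) port 6443`.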