Bug 2089775
Summary: | keepalived can keep ingress VIP on wrong node under certain circumstances | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | nsmirnov |
Component: | Machine Config Operator | Assignee: | Ben Nemec <bnemec> |
Machine Config Operator sub component: | platform-baremetal | QA Contact: | Silvia Serafini <sserafin> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | | |
Priority: | high | CC: | kgarriso, sserafin, tsedovic |
Version: | 4.10 | | |
Target Milestone: | --- | | |
Target Release: | 4.11.0 | | |
Hardware: | All | | |
OS: | All | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2022-08-10 11:13:40 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
nsmirnov
2022-05-24 12:10:01 UTC
This is a bug in https://github.com/openshift/machine-config-operator/blob/031234ceb6f641ade2aa7d4176000960080a9e09/templates/common/on-prem/files/keepalived-script-default-ingress.yaml#L6

We need to tighten up the grep so it doesn't match incorrectly.

Cluster: 3 masters + 3 workers

    [kni@provisionhost-0-0 ~]$ oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.11.0-0.nightly-2022-06-21-040754   True        False         50s     Cluster version is 4.11.0-0.nightly-2022-06-21-040754

Master node IPs: x.x.x.11-13; worker node IPs: x.x.x.110-112
ingressVIP: 192.168.123.10

    [kni@provisionhost-0-0 ~]$ ssh core@192.168.123.10 -- hostname -s
    worker-0-0

    [kni@provisionhost-0-0 ~]$ oc get pods -n openshift-ingress -o wide
    NAME                              READY   STATUS    RESTARTS   AGE   IP                NODE                                              NOMINATED NODE   READINESS GATES
    router-default-5d7fbdd474-sd72q   1/1     Running   0          24m   192.168.123.111   worker-0-1.ocp-edge-cluster-0.qe.lab.redhat.com   <none>           <none>
    router-default-5d7fbdd474-xwq6k   1/1     Running   0          24m   192.168.123.110   worker-0-0.ocp-edge-cluster-0.qe.lab.redhat.com   <none>           <none>

    [core@worker-0-0 ~]$ sudo cat /var/log/containers/keepalived-worker-0-0.ocp-edge-cluster-0.qe.lab.redhat.com_openshift-kni-infra_keepalived-3e26b364e065e94a5dfb5743a6722f5ea2b69780c6c266209fd170acd6c22083.log | grep chk_default_ingress
    2022-06-21T13:06:13.567216113+00:00 stderr F Tue Jun 21 13:06:13 2022: Script `chk_default_ingress` now returning 1
    2022-06-21T13:06:13.567252575+00:00 stderr F Tue Jun 21 13:06:13 2022: VRRP_Script(chk_default_ingress) failed (exited with status 1)
    2022-06-21T13:06:23.514668223+00:00 stderr F Tue Jun 21 13:06:23 2022: Script `chk_default_ingress` now returning 0
    2022-06-21T13:06:33.482149496+00:00 stderr F Tue Jun 21 13:06:33 2022: VRRP_Script(chk_default_ingress) succeeded
    2022-06-21T13:06:53.674191003+00:00 stderr F Tue Jun 21 13:06:53 2022: VRRP_Script(chk_default_ingress) considered successful on reload

The ingress VIP moved to worker-0-0, where a default router pod is running.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
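
For readers wondering how a loose grep can "match incorrectly" with the addresses used in the verification above, here is a minimal sketch. It does not reproduce the linked script; it only assumes that the check looks for a node's IP in a list of addresses, one per line. The variable `endpoint_ips` and the functions `loose_match` and `tight_match` are hypothetical names made up for this illustration.

```shell
#!/bin/bash
# Hypothetical list of router pod IPs, one per line (workers only).
endpoint_ips='192.168.123.110
192.168.123.111'

# Loose check: plain substring/regex match, so a master at
# 192.168.123.11 also matches the worker address 192.168.123.110.
loose_match() {
    printf '%s\n' "$endpoint_ips" | grep -q "$1"
}

# Tight check: -F treats the dots literally and -x requires the
# whole line to equal the pattern.
tight_match() {
    printf '%s\n' "$endpoint_ips" | grep -qFx "$1"
}

for ip in 192.168.123.11 192.168.123.110; do
    if loose_match "$ip"; then loose=yes; else loose=no; fi
    if tight_match "$ip"; then tight=yes; else tight=no; fi
    echo "$ip loose=$loose tight=$tight"
done
```

With the loose match, a master at x.x.x.11 would appear to host a router pod and could wrongly hold the ingress VIP; with the whole-line match it would not.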