Bug 1960469

Summary: [4.7] - vsphere keepalived fails with ingress controllers shards - causing incorrect routing
Product: OpenShift Container Platform
Component: Machine Config Operator
Version: 4.7
Reporter: Vladislav Walek <vwalek>
Assignee: Ben Nemec <bnemec>
QA Contact: Rio Liu <rioliu>
Status: CLOSED DUPLICATE
Severity: high
Priority: urgent
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: ---
CC: bbennett, bnemec, bperkins, bverschu, dgautam, dkulkarn, jcallen, jerzhang, pescorza, pnguyen, skumari, smilner, yboaron
Doc Type: If docs needed, set a value
Type: Bug
Regression: ---
Last Closed: 2021-08-23 14:30:26 UTC

Description Vladislav Walek 2021-05-14 00:19:15 UTC
Description of problem:

In the current configuration, keepalived is set up to add the ingress VIP to any node that runs any ingress controller.
When router sharding is used, keepalived therefore does not reliably place the ingress VIP on a node running the "default" router; it effectively picks at random among the nodes running any (shard) ingress controller.
The customer runs multiple ingress controller instances in the cluster for route sharding.

In the vSphere configuration, keepalived runs on each host as a static pod.
The static pod manifest comes from the machine config.

In the worker machine config, the keepalived template specifies that the ingress VIP should run on a node where the health check script finds a router listening on port 1936.

// keepalived.conf.tmpl

~~~
# TODO: Improve this check. The port is assumed to be alive.
# Need to assess what is the ramification if the port is not there.
vrrp_script chk_ingress {
    script "/usr/bin/timeout 0.9 /usr/bin/curl -o /dev/null -Lfs http://localhost:1936/healthz/ready"
    interval 1
    weight 50
}

{{$nonVirtualIP := .NonVirtualIP}}

vrrp_instance {{ .Cluster.Name }}_INGRESS {
    state BACKUP
    interface {{ .VRRPInterface }}
    virtual_router_id {{ .Cluster.IngressVirtualRouterID }}
    priority 40
    advert_int 1
    {{if .EnableUnicast}}
    unicast_src_ip {{.NonVirtualIP}}
    unicast_peer {
        {{range .IngressConfig.Peers}}
        {{if ne $nonVirtualIP .}}{{.}}{{end}}
        {{end}}
    }
    {{end}}
    authentication {
        auth_type PASS
        auth_pass {{ .Cluster.Name }}_ingress_vip
    }
    virtual_ipaddress {
        {{ .Cluster.IngressVIP }}/{{ .Cluster.VIPNetmask }}
    }
    track_script {
        chk_ingress
    }
}
~~~

This check has a major flaw when multiple ingress controllers are in use (and not only then: any hostNetwork application listening on port 1936 and answering under the path "/healthz/ready" will make keepalived think the node runs the router).
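The false positive is easy to demonstrate (a hypothetical illustration, not from this report; `nc` flags vary between netcat implementations). Any HTTP responder on the host network that returns 200 on port 1936 satisfies the check:

~~~
# Hypothetical stand-in "router": answers 200 OK to any request on port 1936.
while true; do
  printf 'HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok' | nc -l 1936
done
~~~

With that running, the same probe keepalived uses succeeds, so the node becomes a VIP candidate:

~~~
/usr/bin/timeout 0.9 /usr/bin/curl -o /dev/null -Lfs http://localhost:1936/healthz/ready && echo "chk_ingress passes"
~~~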

//Workarounds

The immediate workaround is to remove the other ingress shards, so that keepalived places the VIP on the infra nodes hosting the default ingress (a sketch of this follows below).
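For reference, a minimal sketch of the immediate workaround (`<shard>` is a placeholder; deleting an IngressController takes its routes out of service, so confirm the shard is safe to remove first):

~~~
# List the ingress controllers; anything besides "default" is a shard.
oc get ingresscontroller -n openshift-ingress-operator

# Remove a shard so only nodes running the default router answer on port 1936.
oc delete ingresscontroller <shard> -n openshift-ingress-operator
~~~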

The short-term workaround is to apply a new machine config that removes the keepalived component from machines that do not host the default ingress.

The long-term solution will most likely require a change in this behavior.


Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.7
vSphere IPI


How reproducible:
- Create a cluster with 4 worker nodes on vSphere IPI (which deploys keepalived).
- Create two ingress controllers, the default one and a shard, and schedule the router pods of each to 2 of the workers.
- Check which node the keepalived ingress VIP is configured on, and confirm whether it is the expected node (see the sketch after this list).
- Disable the default ingress controller and check whether the VIP bounces among the other worker nodes.
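One way to check where the VIP landed (a sketch; `<node>` and `<ingress-vip>` are placeholders for your cluster's values):

~~~
# Look for the ingress VIP among the node's addresses from a host debug shell.
oc debug node/<node> -- chroot /host ip -brief addr show | grep <ingress-vip>
~~~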



Comment 17 Ben Nemec 2021-08-03 17:35:13 UTC
I'm going to mark this as a duplicate of bug 1988102, since that's where the fix is being tracked.

Comment 18 Sinny Kumari 2021-08-19 14:15:50 UTC
Assigning this to Ben, as he would know this bug better; feel free to reassign to the relevant people on the networking team.

Comment 19 Ben Nemec 2021-08-23 14:30:26 UTC
Hmm, I said I was going to duplicate this and then didn't. :-/

Let's see if I get it right this time...

*** This bug has been marked as a duplicate of bug 1988102 ***