Description of problem:
If you create two ipfailover instances and try to run them on the same node, it fails because both try to use hostPort 1985.
Version-Release number of selected component (if applicable):
Origin 1.5 and below
Steps to Reproduce:
1. Assuming two nodes
2. oadm ipfailover ipf-1 --virtual-ips=10.1.1.1 --replicas=2
3. oadm ipfailover ipf-2 --virtual-ips=10.1.1.2 --replicas=2 --vrrp-id-offset=1
Actual results:
Half of the ipfailover containers will not be scheduled because they collide on port 1985.
Expected results:
All four containers should be scheduled and running.
Nothing actually uses port 1985. We think it was set that way to provide a cheap form of anti-affinity. We should instead use proper pod anti-affinity to spread the pods across nodes (it is described for the router in https://docs.openshift.org/latest/admin_guide/manage_nodes.html#pod-anti-affinity, BUT I am not certain it is still done with an annotation; please research whether it is supported as a core capability).
email@example.com: Digging around, I found:
Pod anti-affinity does not work in OpenShift (as of Jan 2017).
Also, port 1985 does not really provide anti-affinity: the pod is started and just gets stuck on the port; it is not moved to another node.
Do you know the intended use for port 1985?
@phil - there is nothing binding to port 1985, if that's what you are asking.
As Ben mentioned, it was a cheap way (and the only way when this was done) to ensure that two ipfailover pods are not placed on the same node/host when Kubernetes schedules the pods. You can't run two keepalived instances on the same node, as they would clash managing the same network/interfaces and the VRRP messages [src/dest would be the same for the 2 pods].
Also note that the ipfailover pod _has_ to run in host networking mode.
@ramr - We can run two _different_ configurations of keepalived (i.e. managing different addresses and with different virtual_router_ids) on the same node, right? The problem only arises when you run the same config twice on one node; then they fight. Phil and I tried both the same config and different configs: with the same config, keepalived detects a problem and logs vociferously; with different configs all was good (and we are already setting the virtual_router_id differently).
http://serverfault.com/questions/473058/keepaliveds-virtual-router-id-should-it-be-unique-per-node seems to back up this assessment.
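For illustration, a minimal sketch of two keepalived configs that should be able to coexist on one node; the interface name, priorities, and router IDs are hypothetical, with VIPs matching the reproduction steps above:

```
# Config for ipf-1 (hypothetical values)
vrrp_instance ipf_1 {
    interface eth0
    virtual_router_id 1
    priority 100
    virtual_ipaddress {
        10.1.1.1
    }
}

# Config for ipf-2 on the same node: a different VIP and a different
# virtual_router_id, so the VRRP advertisements do not clash.
vrrp_instance ipf_2 {
    interface eth0
    virtual_router_id 2
    priority 100
    virtual_ipaddress {
        10.1.1.2
    }
}
```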
Proposal: Continue to use a port number, with each ipfailover config having a different port. The port for a config could be port 1985 (the current port) + the vrrp_id in the config. vrrp_id is in the range 0-255, so the actual port would be in the range 1985-2240 (assuming that range is available). There is one port per config taken from that range.
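A back-of-the-envelope sketch of the proposed mapping (variable and function names are mine, not from the patch): each config's host port is the base port plus that config's vrrp-id offset, so two configs on one node claim different ports.

```shell
# Sketch of the proposal: hostPort = 1985 + vrrp_id.
BASE_PORT=1985

port_for_offset() {
    echo $((BASE_PORT + $1))
}

port_for_offset 0    # first config
port_for_offset 1    # second config (--vrrp-id-offset=1)
port_for_offset 255  # maximum vrrp_id, the top of the range
```

With the maximum vrrp_id of 255 this yields 2240, the top of the proposed range.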
When pod affinity/anti-affinity becomes GA we can switch to that. Affinity in 1.4 and 1.5 is alpha, configured via annotations; in 1.6 it becomes beta, configured via a field. When beta arrives, alpha is deprecated.
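For reference, the beta (1.6) form is a pod-spec field rather than an annotation. A sketch, with a hypothetical label, of what the future anti-affinity approach might look like for one ipfailover config:

```yaml
# Sketch only: spread the pods of one ipfailover config across nodes.
# The "ipfailover: ipf-1" label is hypothetical, not from the actual fix.
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            ipfailover: ipf-1
        topologyKey: kubernetes.io/hostname
```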
I think we need to figure out an upgrade path for customers that use this. Hopefully there are not very many. The port-based configurations would continue to work going forward. The affinity-based solution would require customer modifications to the dc as part of upgrades.
pcameron: That seems reasonable. If we do that in 1.5 (and perhaps also apply the change in 1.4) then we should not have an upgrade problem. We do need to flip to anti-affinity at some point, but the port hack doesn't hurt. So let's do that now, and make a card for the future anti-affinity change so we don't lose track of it.
Please consider the port range we are using... I'm not sure there's a good reason to start at 1985, and we need to make sure there's nothing between 1985 and 1985 + 255 that we care about. I suspect there is, so we need to work out whether there's a better range to use. Since nothing actually binds to the ports, we could use a high range (in the dynamically assigned area) to avoid conflicts.
In merge queue.
Commit pushed to master at https://github.com/openshift/origin
Allow multiple ipfailover configs on same node
The ipfailover pods for a given configuration must run on
different nodes. We are using the ServicePort as a mechanism
to prevent multiple pods for the same configuration from starting
on the same node. Since pods for different configurations can
run on the same node, a different ServicePort is used for each
configuration. In the future, this may be changed to pod anti-affinity.
Signed-off-by: Phil Cameron <firstname.lastname@example.org>
This has been merged into OCP and is in OCP v220.127.116.11 or newer.
Verified this bug on
When creating two ipfailover pods on the same node, both work well.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.