Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1794187

Summary: Documentation for EgressIP contains incomplete information
Product: OpenShift Container Platform
Component: Documentation
Version: 4.2.0
Target Milestone: ---
Target Release: 4.2.z
Hardware: Unspecified
OS: Unspecified
Reporter: Eric Sauer <esauer>
Assignee: Jason Boxman <jboxman>
QA Contact: huirwang
Docs Contact: Vikram Goyal <vigoyal>
CC: aos-bugs, danw, jokerman
Status: CLOSED CURRENTRELEASE
Severity: low
Priority: low
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-24 21:34:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Eric Sauer 2020-01-22 21:04:31 UTC
Document URL: 

https://docs.openshift.com/container-platform/4.2/networking/openshift-sdn/assigning-egress-ips.html#nw-enabling-automatic-egress-ips_egress-ips

Section Number and Name: 

Enabling automatically assigned egress IPs for a namespace

Describe the issue: 

The `Procedure` section walks through setting up egressIPs in a namespace, and contains several points that are incorrect.

1. They state that for each project you must "Specify a single egress IP address. Using multiple IP addresses is not supported." This is untrue, at least when running in public cloud environments, where you MUST specify multiple EgressIPs per project in order to achieve high availability of the solution.

2. They state that you can configure egressIPs to be hosted across multiple nodes. The example given is to apply the same CIDR to `node1` and `node2`. These instructions only work in on-prem clusters, where OpenShift can use the operating system to talk to switches and dynamically bind IP addresses. In public cloud the opposite is true: you must use the provider's networking APIs to get IPs provisioned and statically bound to an instance, which means an IP can be bound to only a single instance at a time.


Suggestions for improvement: 

I believe we need to document different EgressIP scenarios for on-prem/traditional infrastructure vs. public cloud/IaaS, as what's supported in one is almost the exact opposite of the other.

Additional information:

Comment 1 Eric Sauer 2020-01-22 21:14:52 UTC
Another note on this: we refer to the two "approaches" in the Egress doc as "the automated way" and "the manual way". I would argue that they are both equally automated, but what you automate and what you define are different. In the "automated way" you assign a single IP to the project and depend on the platform to float that IP dynamically across multiple nodes. In the "manual way" you pin specific IP addresses to specific nodes, but you provide a list of IPs to the project and depend on the cluster to automatically select one based on availability. Perhaps a better way to label them is that one is more applicable for traditional infrastructure, the other for cloud/IaaS.

Comment 2 Jason Boxman 2020-03-03 02:32:10 UTC
Hi Dan,

Does Eric's suggestion make sense?

Thanks!

Comment 3 Dan Winship 2020-03-03 13:19:20 UTC
Hm... I answered this question somewhere else recently... maybe someone had asked me on Slack?

There are two modes to egress IPs, which we used to call "semi-automatic" and "fully-automatic" (although I realize now that makes them sound like machine guns, but anyway...).

In the "semi automatic" mode, the administrator directly assigns particular egress IP addresses to particular nodes, by editing the egressIPs field on the nodes' HostSubnet objects. In this mode, as Eric notes, you can assign multiple egressIPs to a NetNamespace, where each egressIP has to be on a different node, and then if the first egressIP's node goes down, OCP will switch to using the second egressIP, etc.
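As a sketch of what this looks like in the API objects (node names, project name, and IP addresses here are illustrative, and only the fields relevant to egress IPs are shown):

```yaml
# Semi-automatic mode: the administrator sets egressIPs on each HostSubnet directly.
kind: HostSubnet
metadata:
  name: node1          # illustrative node name
egressIPs:
  - 192.168.1.100
---
kind: HostSubnet
metadata:
  name: node2
egressIPs:
  - 192.168.1.101
---
# The NetNamespace can then list multiple egress IPs, each hosted on a
# different node, so traffic fails over if the first node goes down.
kind: NetNamespace
metadata:
  name: project1       # illustrative project name
egressIPs:
  - 192.168.1.100
  - 192.168.1.101
```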

In the "fully automatic" mode, the administrator instead sets the "egressCIDRs" field on the HostSubnets, and then OCP assigns IPs to nodes based on that. In this mode, each NetNamespace can only have a single egress IP, and if the node hosting that egress IP goes down, OCP will move that egress IP to another node that also has a compatible egressCIDRs value.
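The fully-automatic mode, again as an illustrative fragment (names, CIDR, and IP are placeholders):

```yaml
# Fully-automatic mode: the administrator sets egressCIDRs and leaves
# egressIPs on the HostSubnets for OCP to fill in.
kind: HostSubnet
metadata:
  name: node1          # illustrative node name
egressCIDRs:
  - 192.168.1.0/24
---
kind: HostSubnet
metadata:
  name: node2
egressCIDRs:
  - 192.168.1.0/24     # a compatible CIDR lets OCP move the IP here on failure
---
# In this mode the NetNamespace may carry only a single egress IP.
kind: NetNamespace
metadata:
  name: project1       # illustrative project name
egressIPs:
  - 192.168.1.100
```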

The "fully automatic" mode is nicer, easier to use, requires fewer total spare IP addresses, etc., but it's not really cloud-friendly because OCP doesn't know how to make the cloud move the IP address assignment from one node to another.

Comment 4 Jason Boxman 2020-03-04 17:23:23 UTC
Hi Eric,

With regard to:

> 1. They state that for each project you must "Specify a single egress IP address. Using multiple IP addresses is not supported."
> This is untrue, at least when running in public cloud environments, where you MUST specify multiple EgressIPs per project
> in order to achieve high availability of the solution.

According to this BZ[0] it is not possible to use multiple IP addresses with netnamespace egressIPs.

But maybe I misunderstand what I'm reading? Or it depends on the platform the cluster is installed on? In any case, that documentation update was suggested and approved by Casey after surfacing from QE. So that's how that clarification came to be.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1709254

Comment 5 Dan Winship 2020-03-04 19:27:50 UTC
To clarify, the exact rule is:

- If you are using "fully-automatic" egress IPs (setting the egressCIDRs field on HostSubnets and letting OCP fill in the HostSubnet's egressIPs field), then the corresponding NetNamespaces must have only a single value in egressIPs

- If you are using "semi-automatic" egress IPs (leaving egressCIDRs unset and setting the egressIPs field on HostSubnet directly), then you can have NetNamespaces with multiple egressIPs values.
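The two rules above can be sketched as `oc patch` commands (object names, project name, and addresses are illustrative; these are merge patches against the fields named in this thread):

```shell
# Fully-automatic: set egressCIDRs on the HostSubnets; OCP fills in egressIPs.
oc patch hostsubnet node1 --type=merge -p '{"egressCIDRs": ["192.168.1.0/24"]}'
oc patch hostsubnet node2 --type=merge -p '{"egressCIDRs": ["192.168.1.0/24"]}'
# In this mode the NetNamespace may carry only ONE egress IP:
oc patch netnamespace project1 --type=merge -p '{"egressIPs": ["192.168.1.100"]}'

# Semi-automatic: leave egressCIDRs unset and set egressIPs on each HostSubnet.
oc patch hostsubnet node1 --type=merge -p '{"egressIPs": ["192.168.1.100"]}'
oc patch hostsubnet node2 --type=merge -p '{"egressIPs": ["192.168.1.101"]}'
# In this mode the NetNamespace may list MULTIPLE egress IPs for failover:
oc patch netnamespace project1 --type=merge \
  -p '{"egressIPs": ["192.168.1.100", "192.168.1.101"]}'
```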

Comment 7 Jason Boxman 2020-03-05 19:32:05 UTC
I created a PR[0] for this bug.

[0] https://github.com/openshift/openshift-docs/pull/20238

Comment 8 Jason Boxman 2020-03-09 18:52:50 UTC
I've made some progress on this and I think the result will be clearer than what we have currently.

I'm waiting on an engineering review, then it goes through a peer review process and a review by QE.

Thanks!