Bug 1607619

Summary: HA is required for Auto Egress IP
Product: OpenShift Container Platform
Reporter: Marc Curry <mcurry>
Component: Networking
Assignee: Casey Callendrello <cdc>
Status: CLOSED ERRATA
QA Contact: Meng Bo <bmeng>
Severity: high
Docs Contact: brice <bfallonf>
Priority: unspecified
Version: 3.10.0
CC: aos-bugs, bbilgin, henrychiang, hpeng, pasik, smunilla, vigoyal, xtian
Target Milestone: ---   
Target Release: 3.10.z   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2018-08-31 06:18:10 UTC
Type: Bug

Description Marc Curry 2018-07-23 20:22:36 UTC
Description of problem:
I want automatically assigned egress IPs to be highly available, so that if a node goes down, egress traffic from the namespace keeps working.

Version-Release number of selected component (if applicable):
3.10.0

How reproducible:
Always

Steps to Reproduce:
1. Create an egress IP for a namespace
2. Fail the node holding the IP

Actual results:
The egress IP is no longer available.

Expected results:
A namespace with multiple egress IPs can be made highly available by hosting the egress IPs on multiple nodes, using all of them, and ensuring that egress IPs on dead nodes are automatically removed or migrated.
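
The expected behavior can be illustrated with a minimal sketch. This is not the openshift-sdn implementation; it only models one possible failover policy (use the first listed egress IP whose hosting node is still alive). The function name, the node names, and the 192.0.2.x addresses are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Optional, Set


def pick_active_egress_ip(
    egress_ips: List[str],
    ip_to_node: Dict[str, str],
    live_nodes: Set[str],
) -> Optional[str]:
    """Return the first egress IP hosted on a live node, or None.

    egress_ips: the namespace's egress IPs, in preference order.
    ip_to_node: which node currently hosts each egress IP (hypothetical map).
    live_nodes: the nodes currently considered reachable.
    """
    for ip in egress_ips:
        node = ip_to_node.get(ip)
        if node is not None and node in live_nodes:
            return ip
    # No egress IP is hosted on a live node: egress traffic has no usable source IP.
    return None


if __name__ == "__main__":
    ips = ["192.0.2.10", "192.0.2.11"]  # hypothetical addresses
    hosts = {"192.0.2.10": "node-a", "192.0.2.11": "node-b"}  # hypothetical nodes
    print(pick_active_egress_ip(ips, hosts, {"node-a", "node-b"}))
    print(pick_active_egress_ip(ips, hosts, {"node-b"}))  # node-a has failed
```

With a single egress IP, losing its node leaves nothing to fall back to, which matches the actual results reported above.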

Additional info:

Comment 1 Marc Curry 2018-07-23 20:24:39 UTC
PR:  https://github.com/openshift/ose/pull/1338

Comment 2 Jason Peng 2018-08-13 09:53:19 UTC
Found the same issue on OCP v3.10.14 with the networkpolicy CNI plugin.

I tried the following scenario:

Step 1:
Project whose netnamespace is set to [192.168.0.71, 192.168.0.70], with the two IPs set on the hostsubnets of different nodes --> calling the external AP succeeds (with 192.168.0.71)

Step 2:
Shut down the node with egressIP 192.168.0.71 --> calling the external AP fails (timeout)
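
For anyone reproducing this setup, the following sketch builds the `oc patch` commands for a configuration of this shape. It is only an illustration: the project name `myproject` and the node names `node1` and `node2` are assumptions, since the comment does not give them, and the helper `patch_command` is hypothetical. The `egressIPs` field on the netnamespace and hostsubnet resources is the one used for manually assigned egress IPs in OpenShift 3.x.

```python
import json
import shlex
from typing import List


def patch_command(kind: str, name: str, egress_ips: List[str]) -> str:
    """Build an `oc patch` command that sets egressIPs on a resource."""
    payload = json.dumps({"egressIPs": egress_ips})
    # shlex.quote keeps the JSON payload intact when pasted into a shell.
    return f"oc patch {kind} {shlex.quote(name)} -p {shlex.quote(payload)}"


if __name__ == "__main__":
    # Namespace side: both egress IPs, in preference order.
    print(patch_command("netnamespace", "myproject", ["192.168.0.71", "192.168.0.70"]))
    # Node side: each IP hosted on a different node (node names are assumed).
    print(patch_command("hostsubnet", "node1", ["192.168.0.71"]))
    print(patch_command("hostsubnet", "node2", ["192.168.0.70"]))
```

The script only prints the commands; it does not contact a cluster.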

Comment 3 Samuel Munilla 2018-08-22 17:04:57 UTC
Moving to MODIFIED so that it can be attached to an advisory.

Comment 5 Meng Bo 2018-08-23 10:04:07 UTC
I have tested the HA egressIP feature on build v3.10.35.

The feature works as expected, and no regressions were found.

Marking the bug as VERIFIED.

Comment 7 errata-xmlrpc 2018-08-31 06:18:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2376