Bug 1802557 - EgressIP multiple static IPs, node with the egressIP will detect itself as offline
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Juan Luis de Sousa-Valadas
QA Contact: huirwang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-02-13 11:58 UTC by Juan Luis de Sousa-Valadas
Modified: 2020-07-13 17:15 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The node detects its own IP incorrectly.
Consequence: The node does not take ownership of the egress IPs assigned to it.
Fix: Get the node IP from the Kubernetes API instead.
Result: The problem is fixed in 4.5.
Clone Of:
Environment:
Last Closed: 2020-07-13 17:15:07 UTC
Target Upstream Version:
Embargoed:



Links
System                  ID                                           Private  Priority  Status  Summary                                                            Last Updated
GitHub                  openshift/cluster-network-operator pull 535  0        None      closed  Bug 1802557: Pass --node-name and --node-ip to openshift-sdn-node  2021-01-31 10:30:57 UTC
Red Hat Product Errata  RHBA-2020:2409                               0        None      None    None                                                               2020-07-13 17:15:44 UTC

Description Juan Luis de Sousa-Valadas 2020-02-13 11:58:01 UTC
Description of problem:
$ cat oc-hostsubnet.txt | grep infra0[12]
infra01-<removed>   infra01-<removed>    192.168.52.153   10.129.4.0/23    []             [192.168.52.166, 192.168.52.168, 192.168.52.233]
infra02-<removed>   infra02-<removed>    192.168.52.154   10.129.6.0/23    []             [192.168.52.167, 192.168.52.174, 192.168.52.234]

$ grep <project> oc-netnamespace.txt 
<project>   1716405    [192.168.52.168, 192.168.52.174]

1716405 = 0x1a30b5

$ cat ovs.dump-flows.txt | grep 0x1a30b5 | grep table=100
 cookie=0x0, duration=42740.175s, table=100, n_packets=65820, n_bytes=4911204, priority=100,ip,reg0=0x1a30b5 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:192.168.52.154->tun_dst,output:1

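Traffic is sent through the other node: the flow tunnels VNID 1716405 to infra02 (tun_dst=192.168.52.154), although the project's first egress IP, 192.168.52.168, is hosted on infra01.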
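The correlation between the NetNamespace, the OVS flow and the hostsubnets can be reproduced with a short script. This is a minimal Python sketch that assumes the text formats shown in this report; the helper names (`vnid_to_hex`, `egress_target`, `node_for_egress_ip`) are hypothetical, and the node names are shortened from the redacted `infra0N-<removed>` entries. It converts the NETID to the hexadecimal form matched by `reg0`, pulls the tunnel destination out of the table=100 flow, and looks both IPs up in the hostsubnet data.

```python
import re

# Shortened from the oc-hostsubnet.txt excerpt: host IP -> (node, egress IPs it hosts).
HOSTSUBNETS = {
    "192.168.52.153": ("infra01", ["192.168.52.166", "192.168.52.168", "192.168.52.233"]),
    "192.168.52.154": ("infra02", ["192.168.52.167", "192.168.52.174", "192.168.52.234"]),
}

# The table=100 flow for VNID 1716405 from ovs.dump-flows.txt.
FLOW = (
    "cookie=0x0, duration=42740.175s, table=100, n_packets=65820, n_bytes=4911204, "
    "priority=100,ip,reg0=0x1a30b5 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],"
    "set_field:192.168.52.154->tun_dst,output:1"
)


def vnid_to_hex(vnid: int) -> str:
    """Format a NETID the way OVS prints it in reg0 matches."""
    return f"0x{vnid:x}"


def egress_target(flow: str):
    """Return (VNID, tunnel destination) for an egress flow, or None if absent."""
    vnid = re.search(r"reg0=(0x[0-9a-f]+)", flow)
    dst = re.search(r"set_field:([\d.]+)->tun_dst", flow)
    if vnid is None or dst is None:
        return None
    return int(vnid.group(1), 16), dst.group(1)


def node_for_egress_ip(egress_ip: str):
    """Return the name of the node whose hostsubnet lists egress_ip, if any."""
    for name, egress_ips in HOSTSUBNETS.values():
        if egress_ip in egress_ips:
            return name
    return None


vnid, tun_dst = egress_target(FLOW)
print(vnid_to_hex(1716405))                  # 0x1a30b5
print(vnid, HOSTSUBNETS[tun_dst][0])         # 1716405 infra02
print(node_for_egress_ip("192.168.52.168"))  # infra01
```

With this data the flow sends VNID 1716405 to infra02, while the egress IP 192.168.52.168 belongs to infra01.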

Checking the logs shows that the node detects itself as offline:

$ grep egress -i sdn.logs.txt | grep -v 'firewall egress network policy' | fgrep -v 'Watch close - *v1.EgressNetworkPolicy' | tail -5
I0130 09:06:22.339162   12757 egressip.go:419] VNID 420850 cannot use egress IP 192.168.52.233 on offline node 192.168.52.153
I0130 09:06:22.339190   12757 egressip.go:419] VNID 12988830 cannot use egress IP 192.168.52.166 on offline node 192.168.52.153
I0130 09:36:22.338826   12757 egressip.go:419] VNID 1716405 cannot use egress IP 192.168.52.168 on offline node 192.168.52.153
I0130 09:36:22.338874   12757 egressip.go:419] VNID 420850 cannot use egress IP 192.168.52.233 on offline node 192.168.52.153
I0130 09:36:22.338959   12757 egressip.go:419] VNID 12988830 cannot use egress IP 192.168.52.166 on offline node 192.168.52.153
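The log lines can be summarized the same way. This is a minimal Python sketch, assuming the `egressip.go:419` message format shown above and that these logs come from infra01 (host IP 192.168.52.153 in the hostsubnet output); `offline_claims` is a hypothetical helper, not part of openshift-sdn. It groups the blocked (VNID, egress IP) pairs by the node reported offline.

```python
import re
from collections import defaultdict

# egressip.go:419 lines from sdn.logs.txt (logged by infra01).
LOG = """\
I0130 09:06:22.339162   12757 egressip.go:419] VNID 420850 cannot use egress IP 192.168.52.233 on offline node 192.168.52.153
I0130 09:06:22.339190   12757 egressip.go:419] VNID 12988830 cannot use egress IP 192.168.52.166 on offline node 192.168.52.153
I0130 09:36:22.338826   12757 egressip.go:419] VNID 1716405 cannot use egress IP 192.168.52.168 on offline node 192.168.52.153
I0130 09:36:22.338874   12757 egressip.go:419] VNID 420850 cannot use egress IP 192.168.52.233 on offline node 192.168.52.153
I0130 09:36:22.338959   12757 egressip.go:419] VNID 12988830 cannot use egress IP 192.168.52.166 on offline node 192.168.52.153
"""

OFFLINE_RE = re.compile(r"VNID (\d+) cannot use egress IP ([\d.]+) on offline node ([\d.]+)")
LOCAL_NODE_IP = "192.168.52.153"  # infra01's host IP in the hostsubnet output


def offline_claims(log: str) -> dict:
    """Map each node IP reported offline to the (VNID, egress IP) pairs it blocks."""
    claims = defaultdict(set)
    for match in OFFLINE_RE.finditer(log):
        vnid, egress_ip, node_ip = match.groups()
        claims[node_ip].add((int(vnid), egress_ip))
    return dict(claims)


claims = offline_claims(LOG)
print(sorted(claims))               # ['192.168.52.153']
print(LOCAL_NODE_IP in claims)      # True: the node reports itself offline
print(sorted(claims[LOCAL_NODE_IP]))
```

Every blocked egress IP in this excerpt (192.168.52.166, .168 and .233) is one that infra01 itself hosts.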


Version-Release number of selected component (if applicable):
3.11.129. I still have to check whether this is fixed in OCP 4 or in newer 3.11 releases.

How reproducible:
Intermittent; the trigger is unknown at this point.

Steps to Reproduce:
Unknown at this point.

Actual results:
infra01 detects itself as offline.

Expected results:
infra01 always detects itself as online.

Comment 6 Dan Winship 2020-03-03 21:07:48 UTC
The egress IP code only monitors the health of *other* nodes (pkg/network/node/egressip.go:ClaimEgressIP(); a node is only added to the vxlanMonitor if its IP is not eip.localIP). So... the node is failing to recognize its own IP here and considering its own egressIPs to be foreign...
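The rule described in this comment can be modeled in a few lines. This is a hypothetical Python model, not the openshift-sdn Go code: `EgressIPModel`, `claim` and `offline` are invented names, and it assumes that a monitored node counts as offline whenever its health probe does not succeed (modeled by the `reachable` set). It shows how a wrong local IP turns the node's own egress IPs into monitored, "foreign" ones.

```python
class EgressIPModel:
    """Toy model: only nodes other than the local one are health-checked."""

    def __init__(self, local_ip, reachable):
        self.local_ip = local_ip          # the IP the node believes is its own
        self.reachable = set(reachable)   # node IPs whose probe succeeds (assumption)
        self.monitored = set()

    def claim(self, node_ip):
        # Mirrors "a node is only added to the monitor if its IP is not the local IP".
        if node_ip != self.local_ip:
            self.monitored.add(node_ip)

    def offline(self, node_ip):
        # A node can only be offline if it is monitored and its probe fails.
        return node_ip in self.monitored and node_ip not in self.reachable


ok = EgressIPModel("192.168.52.153", reachable={"192.168.52.154"})
ok.claim("192.168.52.153")
ok.claim("192.168.52.154")
print(ok.offline("192.168.52.153"))   # False: the local node is never monitored

bad = EgressIPModel("10.0.0.5", reachable={"192.168.52.154"})  # hypothetical wrong self-IP
bad.claim("192.168.52.153")
bad.claim("192.168.52.154")
print(bad.offline("192.168.52.153"))  # True: infra01 now monitors, and fails, itself
```

The address 10.0.0.5 is made up; it stands for whatever IP the node wrongly took as its own.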

They're probably doing something slightly unusual with internal vs external node IPs or something which is confusing the egressip code. (Not sure if this is going to be something they can fix by changing their configuration or if this will require a bugfix to the egressIP code.)

Comment 12 huirwang 2020-03-23 07:56:37 UTC
New logs showing the node name and node IP:
$ oc logs sdn-prw9k -n openshift-sdn
I0323 07:29:22.272826  292779 node.go:147] Initializing SDN node "ip-10-0-173-216.us-east-2.compute.internal" (10.0.173.216) of type "redhat/openshift-ovs-networkpolicy"
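The fix (Doc Text: get the node IP from the Kubernetes API; PR 535: pass `--node-name` and `--node-ip` to openshift-sdn-node) can be illustrated by deriving both values from a Node object. This is a minimal Python sketch, not the cluster-network-operator implementation: only the node name and IP come from the log above, the rest of the `status.addresses` list is illustrative, and preferring `InternalIP` over `ExternalIP` is an assumption. `node_ip_from_api` is a hypothetical helper.

```python
import json

# Abbreviated Node object in the shape returned by `oc get node <name> -o json`.
NODE_JSON = """
{
  "metadata": {"name": "ip-10-0-173-216.us-east-2.compute.internal"},
  "status": {"addresses": [
    {"type": "Hostname", "address": "ip-10-0-173-216.us-east-2.compute.internal"},
    {"type": "InternalIP", "address": "10.0.173.216"}
  ]}
}
"""


def node_ip_from_api(node: dict, preferred=("InternalIP", "ExternalIP")) -> str:
    """Pick the node IP from status.addresses instead of guessing it locally."""
    addresses = node.get("status", {}).get("addresses", [])
    for addr_type in preferred:
        for addr in addresses:
            if addr.get("type") == addr_type:
                return addr["address"]
    name = node.get("metadata", {}).get("name", "<unknown>")
    raise ValueError(f"node {name} has no usable address")


node = json.loads(NODE_JSON)
name, ip = node["metadata"]["name"], node_ip_from_api(node)
print(f"--node-name={name} --node-ip={ip}")
```

Because the value comes from the API object rather than from local interface detection, the node's notion of its own IP matches the host IP that the hostsubnets and egress IP code use.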

Comment 19 errata-xmlrpc 2020-07-13 17:15:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

