Bug 1607395
| Summary: | Need to update egress IPs when node changes IP | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Dan Winship <danw> |
| Component: | Networking | Assignee: | Dan Winship <danw> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Meng Bo <bmeng> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.11.0 | CC: | aos-bugs, cdc, dcbw, xtian |
| Target Milestone: | --- | | |
| Target Release: | 3.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-12-21 15:16:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Doc Text:

Cause: The Egress IP code did not handle node IPs changing.

Consequence: In some cloud environments, a node that is removed and later brought back may be given a different IP address. If egress IPs were hosted on that node, other nodes would not update their OVS flows to use the new node IP. (This bug was noticed during code review and may not have actually affected any customers.)

Fix: The Egress IP code now tracks changes to node IPs.

Result: If an egress node changes IP, other nodes update their rules accordingly.
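For context on the mechanism described in the Doc Text: in OpenShift 3.x, an egress IP is hosted by listing it in the `egressIPs` field of a node's HostSubnet, and a project routes its traffic through it by listing it in the `egressIPs` field of its NetNamespace. A minimal sketch of that setup follows; the node name, project name, and IP are placeholders, not values from this bug:

```
# Host the egress IP on the chosen egress node (placeholder node name and IP)
oc patch hostsubnet egress-node --type=merge -p '{"egressIPs": ["10.66.140.101"]}'

# Route egress traffic from the project through that IP (placeholder project name)
oc patch netnamespace egress-test --type=merge -p '{"egressIPs": ["10.66.140.101"]}'
```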
Is this a valid scenario? When the node IP changes, some other things will break too. For example, the certificate the master generated for the node will become invalid, causing TLS handshake errors in node-to-master communication.

Hm... HostSubnets definitely get renumbered sometimes. If you have a cloud deployment with nodes coming and going, and nodes being dynamically assigned IPs via DHCP or something, then you can get something where two nodes reboot, and then come back up with their IP addresses swapped. We've had bugs involving that before. I don't know what happens with certificates in that case...

(In reply to Dan Winship from comment #3)
> Hm... HostSubnets definitely get renumbered sometimes. If you have a cloud
> deployment with nodes coming and going, and nodes being dynamically assigned
> IPs via DHCP or something, then you can get something where two nodes
> reboot, and then come back up with their IP addresses swapped. We've had
> bugs involving that before. I don't know what happens with certificates in
> that case...

Great question. I know I've seen this happen in the past too, but I don't know about the cert issue. Could the issue be add/delete nodes via Ansible? Don't we only support Ansible as the mechanism for modifying your cluster, including adding/removing nodes?

Checked on v3.11.0-0.32.0. The issue has been fixed: the tun_dst field value is updated when the node IP changes.

Before the egress node IP changed:

```
# ovs-ofctl dump-flows br0 -O openflow13 | grep table=100
 cookie=0x0, duration=1507.103s, table=100, n_packets=0, n_bytes=0, priority=200,tcp,nw_dst=10.66.140.72,tp_dst=53 actions=output:2
 cookie=0x0, duration=1507.103s, table=100, n_packets=0, n_bytes=0, priority=200,udp,nw_dst=10.66.140.72,tp_dst=53 actions=output:2
 cookie=0x0, duration=170.514s, table=100, n_packets=0, n_bytes=0, priority=100,ip,reg0=0x673eb6 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.140.77->tun_dst,output:1
 cookie=0x0, duration=1507.103s, table=100, n_packets=0, n_bytes=0, priority=0 actions=goto_table:101
```

After the egress node IP changed:

```
# ovs-ofctl dump-flows br0 -O openflow13 | grep table=100
 cookie=0x0, duration=2294.917s, table=100, n_packets=0, n_bytes=0, priority=200,tcp,nw_dst=10.66.140.72,tp_dst=53 actions=output:2
 cookie=0x0, duration=2294.917s, table=100, n_packets=0, n_bytes=0, priority=200,udp,nw_dst=10.66.140.72,tp_dst=53 actions=output:2
 cookie=0x0, duration=190.496s, table=100, n_packets=0, n_bytes=0, priority=100,ip,reg0=0x673eb6 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.140.211->tun_dst,output:1
 cookie=0x0, duration=2294.917s, table=100, n_packets=0, n_bytes=0, priority=0 actions=goto_table:101
```

Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.
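For anyone re-verifying, a narrower form of the check above watches only the egress flow's tunnel destination; this is a sketch assuming the same br0 bridge and table 100 layout shown in the dumps:

```
# On the app node: show only the egress flow's tun_dst in table 100
ovs-ofctl dump-flows br0 -O openflow13 'table=100' | grep tun_dst

# Or watch it update live while the egress node's IP changes
watch -n 2 "ovs-ofctl dump-flows br0 -O openflow13 'table=100' | grep tun_dst"
```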
Description of problem:
If a node with egress IPs reboots and comes up with a different IP address, other nodes will not update their OVS flows for the egress IP to point to the new node IP.

Steps to Reproduce:
1. Create two nodes ("egress node" and "app node") and assign an egress IP to the egress node.
2. Create a namespace, assign the egress IP to that namespace, create a pod in that namespace on the app node, and confirm that egress traffic from the pod is sent to the egress node so that it uses the egress IP.
3. Stop the egress node's atomic-openshift-node service, change the node's IP address, and reboot it.

Actual results:
The app node still has OVS flows in table 100 pointing to the original egress node IP. The pod can no longer send egress traffic.

Expected results:
The app node updates itself to reflect the node IP change, and pod egress traffic keeps working.
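A command-level sketch of these reproduction steps, assuming an OpenShift 3.11 cluster; the node names, project name, egress IP, image choice, and external host are placeholders, not values from this bug:

```
# Step 1: assign an egress IP to the egress node and to a test project
oc patch hostsubnet egress-node --type=merge -p '{"egressIPs": ["10.66.140.101"]}'
oc new-project egress-test
oc patch netnamespace egress-test --type=merge -p '{"egressIPs": ["10.66.140.101"]}'

# Step 2: run a pod pinned to the app node and confirm its external traffic
# leaves via the egress IP (check the source IP seen by the external host)
oc run test-pod --image=registry.access.redhat.com/rhel7/rhel-tools --restart=Never \
    --overrides='{"apiVersion": "v1", "spec": {"nodeName": "app-node"}}'
oc exec test-pod -- curl -s http://external-host.example.com/

# Step 3: on the egress node, stop the node service, change the node's IP
# address (via your platform's tooling), and reboot
systemctl stop atomic-openshift-node

# Check: on the app node, before the fix, tun_dst in table 100 keeps
# pointing at the old egress node IP
ovs-ofctl dump-flows br0 -O openflow13 'table=100' | grep tun_dst
```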