Bug 2107309

Summary: IPTable Chains that are created by OVNK are flushed on startup; causes traffic disruption
Product: OpenShift Container Platform Reporter: Surya Seetharaman <surya>
Component: Networking Assignee: Surya Seetharaman <surya>
Networking sub component: ovn-kubernetes QA Contact: Anurag saxena <anusaxen>
Status: CLOSED DEFERRED Docs Contact:
Severity: medium    
Priority: medium CC: akaris, mduarted, trozet
Version: 4.8   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-03-09 01:24:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Surya Seetharaman 2022-07-14 17:47:32 UTC
Description of problem:

Every time we restart ovnkube-node, either the chains themselves or the rules in them are flushed. Let's be smarter...

W0714 15:49:33.531277   13288 gateway_iptables.go:72] SURYA REINSERTING: OVN-KUBE-SNAT-MGMTPORT:--:nat:--:[-p TCP --dport 31248 -j RETURN]                                   
W0714 15:49:33.554023   13288 gateway_iptables.go:72] SURYA REINSERTING: OVN-KUBE-NODEPORT:--:nat:--:[-p TCP -m addrtype --dst-type LOCAL --dport 31248 -j DNAT --to-destination 10.96.42.192:80]
W0714 15:49:33.579462   13288 gateway_iptables.go:72] SURYA REINSERTING: OVN-KUBE-ETP:--:nat:--:[-p TCP -m addrtype --dst-type LOCAL --dport 31248 -j DNAT --to-destination 169.254.169.3:31248]
W0714 15:49:33.600767   13288 gateway_iptables.go:72] SURYA REINSERTING: OVN-KUBE-SNAT-MGMTPORT:--:nat:--:[-p TCP --dport 31248 -j RETURN]                                   
W0714 15:49:33.632875   13288 management-port_linux.go:234] SURYA CHAIN NOT FOUND OVN-KUBE-SNAT-MGMTPORT:--:false:--:<nil>:--:ovn-k8s-mp0:--:10.244.0.2:--:[-o ovn-k8s-mp0 -m comment --comment OVN SNAT to Management Port -j SNAT --to-source 10.244.0.2]                                                                                               
W0714 15:49:33.635952   13288 management-port_linux.go:312] missing management port nat rule in chain OVN-KUBE-SNAT-MGMTPORT, adding it
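
One way to read "be smarter" is to reconcile each chain against the desired rule set instead of flushing it and reinserting everything. The following is only a minimal sketch of that idea, not the ovn-kubernetes implementation: diffRules is a hypothetical helper, rules are compared as plain strings (a simplification), and the port numbers other than 31248 are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// diffRules compares the rules currently in a chain with the rules we want
// there. It returns the rules to add and the rules to delete. Rules present
// in both lists are left alone, so they never disappear from the chain.
// Hypothetical helper: rules are compared as plain strings.
func diffRules(existing, desired []string) (toAdd, toDelete []string) {
	have := make(map[string]bool, len(existing))
	for _, r := range existing {
		have[r] = true
	}
	want := make(map[string]bool, len(desired))
	for _, r := range desired {
		// Add each missing rule once, even if desired lists it twice.
		if !have[r] && !want[r] {
			toAdd = append(toAdd, r)
		}
		want[r] = true
	}
	for _, r := range existing {
		if !want[r] {
			toDelete = append(toDelete, r)
		}
	}
	sort.Strings(toAdd)
	sort.Strings(toDelete)
	return toAdd, toDelete
}

func main() {
	existing := []string{
		"-p TCP --dport 31248 -j RETURN",
		"-p TCP --dport 30000 -j RETURN", // illustrative stale rule
	}
	desired := []string{
		"-p TCP --dport 31248 -j RETURN",
		"-p TCP --dport 32000 -j RETURN", // illustrative new rule
	}
	toAdd, toDelete := diffRules(existing, desired)
	fmt.Println("add:", toAdd)
	fmt.Println("delete:", toDelete)
}
```

With this approach the rule for port 31248 is never removed during a restart; only the difference between the two sets is applied.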



Comment 1 Surya Seetharaman 2022-07-14 20:12:25 UTC
We seem to be having two problems:

1) Flushing in two places, SyncServices and tearDownManagementPortConfig.
2) Every 30 seconds we try to run setupManagementPortIPFamilyConfig.

[surya@hidden-temple github.com]$ oc logs -n ovn-kubernetes ovnkube-node-l9njw ovnkube-node | grep SURYA                                                                     
I0714 20:05:06.113419     874 management-port_linux.go:150] SURYA: cleaning chain                                                                                            
I0714 20:05:06.117330     874 management-port_linux.go:254] SURYA: setupManagementPortConfig running 10.244.2.1...                                                           
I0714 20:05:06.117364     874 management-port_linux.go:167] SURYA: setupManagementPortIPFamilyConfig running ovn-k8s-mp0...                                                  
W0714 20:05:06.126949     874 management-port_linux.go:234] SURYA: Rule to check if Exists: [-o ovn-k8s-mp0 -j SNAT --to-source 10.244.2.2 -m comment --comment OVN SNAT to Management Port]
W0714 20:05:06.131727     874 management-port_linux.go:238] SURYA: Reinserting rule [-o ovn-k8s-mp0 -j SNAT --to-source 10.244.2.2 -m comment --comment OVN SNAT to Management Port]:
I0714 20:05:07.280129     874 gateway_shared_intf.go:639] SURYA: Recreating IP Table Rules for chain OVN-KUBE-ITP: []
I0714 20:05:07.280140     874 gateway_iptables.go:397] SURYA: cleaning chain OVN-KUBE-ITP
I0714 20:05:07.284499     874 gateway_shared_intf.go:639] SURYA: Recreating IP Table Rules for chain OVN-KUBE-NODEPORT: []
I0714 20:05:07.284529     874 gateway_iptables.go:397] SURYA: cleaning chain OVN-KUBE-NODEPORT
I0714 20:05:07.290503     874 gateway_shared_intf.go:639] SURYA: Recreating IP Table Rules for chain OVN-KUBE-EXTERNALIP: []
I0714 20:05:07.290525     874 gateway_iptables.go:397] SURYA: cleaning chain OVN-KUBE-EXTERNALIP
I0714 20:05:07.294458     874 gateway_shared_intf.go:639] SURYA: Recreating IP Table Rules for chain OVN-KUBE-ETP: []
I0714 20:05:07.294483     874 gateway_iptables.go:397] SURYA: cleaning chain OVN-KUBE-ETP
I0714 20:05:07.298037     874 gateway_shared_intf.go:639] SURYA: Recreating IP Table Rules for chain OVN-KUBE-SNAT-MGMTPORT: []
I0714 20:05:07.298062     874 gateway_iptables.go:397] SURYA: cleaning chain OVN-KUBE-SNAT-MGMTPORT
I0714 20:05:07.301732     874 gateway_iptables.go:397] SURYA: cleaning chain OVN-KUBE-ITP
I0714 20:05:07.562033     874 management-port.go:105] SURYA: CheckManagementPortHealth startup with, let's kick start the goroutine...
I0714 20:05:07.562081     874 management-port_linux.go:314] SURYA: checkManagementPortHealth running...
I0714 20:05:07.562255     874 management-port_linux.go:254] SURYA: setupManagementPortConfig running 10.244.2.1...
I0714 20:05:07.562281     874 management-port_linux.go:167] SURYA: setupManagementPortIPFamilyConfig running ovn-k8s-mp0...
W0714 20:05:07.569380     874 management-port_linux.go:234] SURYA: Rule to check if Exists: [-o ovn-k8s-mp0 -j SNAT --to-source 10.244.2.2 -m comment --comment OVN SNAT to Management Port]
W0714 20:05:07.572651     874 management-port_linux.go:238] SURYA: Reinserting rule [-o ovn-k8s-mp0 -j SNAT --to-source 10.244.2.2 -m comment --comment OVN SNAT to Management Port]:
I0714 20:05:37.576018     874 management-port_linux.go:314] SURYA: checkManagementPortHealth running...
I0714 20:05:37.576112     874 management-port_linux.go:254] SURYA: setupManagementPortConfig running 10.244.2.1...
I0714 20:05:37.576155     874 management-port_linux.go:167] SURYA: setupManagementPortIPFamilyConfig running ovn-k8s-mp0...
W0714 20:05:37.597227     874 management-port_linux.go:234] SURYA: Rule to check if Exists: [-o ovn-k8s-mp0 -j SNAT --to-source 10.244.2.2 -m comment --comment OVN SNAT to Management Port]
I0714 20:06:07.602437     874 management-port_linux.go:314] SURYA: checkManagementPortHealth running...
I0714 20:06:07.602471     874 management-port_linux.go:254] SURYA: setupManagementPortConfig running 10.244.2.1...
I0714 20:06:07.602484     874 management-port_linux.go:167] SURYA: setupManagementPortIPFamilyConfig running ovn-k8s-mp0...
W0714 20:06:07.607201     874 management-port_linux.go:234] SURYA: Rule to check if Exists: [-o ovn-k8s-mp0 -j SNAT --to-source 10.244.2.2 -m comment --comment OVN SNAT to Management Port]
I0714 20:06:37.609740     874 management-port_linux.go:314] SURYA: checkManagementPortHealth running...
I0714 20:06:37.609767     874 management-port_linux.go:254] SURYA: setupManagementPortConfig running 10.244.2.1...
I0714 20:06:37.609782     874 management-port_linux.go:167] SURYA: setupManagementPortIPFamilyConfig running ovn-k8s-mp0...
W0714 20:06:37.615162     874 management-port_linux.go:234] SURYA: Rule to check if Exists: [-o ovn-k8s-mp0 -j SNAT --to-source 10.244.2.2 -m comment --comment OVN SNAT to Management Port]
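
The later runs in the log above (20:05:37 onward) print only the existence check and no reinsert, which is the check-then-insert pattern. A minimal sketch of that pattern follows, assuming a hypothetical ruleStore interface and an in-memory fakeStore in place of a real iptables client; none of these names come from ovn-kubernetes.

```go
package main

import "fmt"

// ruleStore is a hypothetical, minimal stand-in for an iptables client.
type ruleStore interface {
	Exists(table, chain, rule string) (bool, error)
	Insert(table, chain, rule string) error
}

// fakeStore keeps rules in memory and counts how many inserts happened.
type fakeStore struct {
	rules   map[string]bool
	inserts int
}

func key(table, chain, rule string) string {
	return table + "/" + chain + "/" + rule
}

func (f *fakeStore) Exists(table, chain, rule string) (bool, error) {
	return f.rules[key(table, chain, rule)], nil
}

func (f *fakeStore) Insert(table, chain, rule string) error {
	f.rules[key(table, chain, rule)] = true
	f.inserts++
	return nil
}

// ensureRule inserts the rule only when it is missing. It reports whether
// it had to write, so a periodic caller can tell a no-op from a repair.
func ensureRule(s ruleStore, table, chain, rule string) (bool, error) {
	ok, err := s.Exists(table, chain, rule)
	if err != nil {
		return false, err
	}
	if ok {
		return false, nil
	}
	if err := s.Insert(table, chain, rule); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	s := &fakeStore{rules: map[string]bool{}}
	rule := "-o ovn-k8s-mp0 -j SNAT --to-source 10.244.2.2"
	// Simulate three periodic health-check runs against the same chain.
	for i := 0; i < 3; i++ {
		changed, err := ensureRule(s, "nat", "OVN-KUBE-SNAT-MGMTPORT", rule)
		fmt.Println("run", i, "changed:", changed, "err:", err)
	}
	fmt.Println("inserts:", s.inserts)
}
```

Only the first run writes; as long as nothing flushes the chain in between, the following runs are read-only.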

Comment 2 Tim Rozet 2022-11-17 21:45:40 UTC
Miguel, Surya, is this still an issue or should we close this?

Comment 3 Miguel Duarte Barroso 2022-11-18 09:23:07 UTC
(In reply to Tim Rozet from comment #2)
> Miguel, Surya, is this still an issue or should we close this?

I think we can close; this would be an optimization to the current workflow, but as it is, I think we can live with it.

I originally pushed draft PR [0] to get some feedback, but it fell through the cracks. Nevertheless, there's possibly something salvageable there.

[0] - https://github.com/ovn-org/ovn-kubernetes/pull/3106

Comment 4 Miguel Duarte Barroso 2022-11-18 09:24:57 UTC
Opening an issue upstream with this info would be positive IMO.

If Surya is OK with closing this bugzilla, I can open the upstream issue.

Comment 5 Surya Seetharaman 2022-11-18 09:31:05 UTC
I don't think we should close this. This will have a great impact on our CI upgrades downstream. This is still an issue.
Having said that, Miguel, if you don't have cycles, please reassign to the default assignee and we can see if someone else has cycles. If you want to continue to work on this, that's fine too; Tim is just actively trying to purge our backlog.

Comment 6 Surya Seetharaman 2022-11-18 09:38:31 UTC
Also, it's partly my fault for not having reviewed your draft, which is already up... Once you rebase I will try to prioritize this. I think this is a needed cleanup which should be backported to 4.12, where OVNK is the default.

Comment 7 Shiftzilla 2023-03-09 01:24:48 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9397