Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2054838

Summary: ICNI2.0: removing the spk namespace and recreating it stops traffic to the external gateway
Product: OpenShift Container Platform
Component: Networking
Sub component: ovn-kubernetes
Reporter: Mohamed Mahmoud <mmahmoud>
Assignee: Mohamed Mahmoud <mmahmoud>
QA Contact: Anurag Saxena <anusaxen>
Status: CLOSED DUPLICATE
Severity: high
Priority: high
CC: kholtz, manrodri, raperez, surya, trozet, u.alonsocamaro
Version: 4.7
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2022-02-16 11:58:31 UTC
Attachments: must-gather

Description Mohamed Mahmoud 2022-02-15 19:29:20 UTC
Created attachment 1861342 [details]
must-gather


Description of problem:

Using the OCP 4.7.24 release: after manually deleting the spk namespace and then recreating it, traffic no longer goes to the external gateway.

Topology
                    (2)            (3)
          coreDNS ------> SPK-DNS46 ------> External DNS
           / 
          /   
   (1)  / webhook 
      /
    /      (4)              (5)           (6)              (7)
test POD -------> SPK-DATA ------> NAT46 -------> IPv6 VS -------> IPv4 HTTP pool 
(mttool)                                                            (hello world)

(4) is the problematic traffic path 
Steps to Reproduce:
1. On OCP 4.7.24, configure an ICNI2.0 external gateway for pods in the spk namespace.
2. Delete the spk namespace.
3. Recreate the namespace and its pods.

Actual results:
Traffic from the recreated pods no longer goes to the external gateway.

Expected results:
Traffic from the recreated pods reaches the external gateway as before.

Comment 2 Mohamed Mahmoud 2022-02-15 19:57:29 UTC
1. When the test started:
I0207 16:30:20.842505       1 kube.go:63] Setting annotations map[k8s.ovn.org/pod-networks:{"default":{"ip_addresses":["10.131.0.74/23"],"mac_address":"0a:58:0a:83:00:4a","gateway_ips":["10.131.0.1"],"ip_address":"10.131.0.74/23","gateway_ip":"10.131.0.1"}}] on pod spk-test/mttool

2022-02-07T16:30:20.928Z|06637|nbctl|INFO|Running command run -- add address_set 8d7c6de8-af1d-412a-b32f-c7a774362139 addresses "\"10.131.0.74\""

I0207 16:30:20.935650       1 pods.go:302] [spk-test/mttool] addLogicalPort took 93.874082ms

.....
2022-02-07T16:36:29.882Z|06681|nbctl|INFO|Running command run --may-exist --policy=src-ip --ecmp-symmetric-reply -- lr-route-add GR_worker-0 10.131.0.74/32 192.168.56.181

2022-02-07T16:36:29.897Z|06682|nbctl|INFO|Running command run -- lr-policy-add ovn_cluster_router 501 "inport == \"rtos-worker-0\" && ip4.src == 10.131.0.74 && ip4.dst != 10.128.0.0/14" reroute 100.64.0.5

2. The spk namespace is deleted:
I0208 13:59:35.576859       1 pods.go:91] Deleting pod: spk-test/mttool
2022-02-08T13:59:35.583Z|17256|nbctl|INFO|Running command run -- remove address_set 8d7c6de8-af1d-412a-b32f-c7a774362139 addresses "\"10.131.0.74\""
2022-02-08T13:59:35.601Z|17257|nbctl|INFO|Running command run -- lr-policy-del ovn_cluster_router 501 "inport == \"rtos-worker-0\" && ip4.src == 10.131.0.74 && ip4.dst != 10.128.0.0/14"
2022-02-08T13:59:35.611Z|17258|nbctl|INFO|Running command run --if-exists --policy=src-ip -- lr-route-del GR_worker-0 10.131.0.74/32 192.168.56.181

3. The namespace is then re-added:
I0208 14:54:09.770323       1 kube.go:63] Setting annotations map[k8s.ovn.org/pod-networks:{"default":{"ip_addresses":["10.129.2.139/23"],"mac_address":"0a:58:0a:81:02:8b","gateway_ips":["10.129.2.1"],"ip_address":"10.129.2.139/23","gateway_ip":"10.129.2.1"}}] on pod spk-test/mttool
2022-02-08T14:54:09.857Z|20105|nbctl|INFO|Running command run -- add address_set 16cd2322-0490-4192-ae33-85f6afac345d addresses "\"10.129.2.139\""
I0208 14:54:09.864570       1 pods.go:302] [spk-test/mttool] addLogicalPort took 94.850926ms
....
2022-02-08T16:53:56.274Z|21101|nbctl|INFO|Running command run --may-exist --policy=src-ip --ecmp-symmetric-reply -- lr-route-add GR_worker-2 10.129.2.139/32 192.168.56.181
2022-02-08T16:53:56.288Z|21102|nbctl|INFO|Running command run -- lr-policy-add ovn_cluster_router 501 "inport == \"rtos-worker-2\" && ip4.src == 10.129.2.139 && ip4.dst != 10.128.0.0/14" reroute 100.64.0.7
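The paired log entries above show that the external-gateway configuration is per-pod state keyed by the pod's IP and node: deleting the namespace tears down the entries for 10.131.0.74 on GR_worker-0, and recreation must program entirely fresh ones for 10.129.2.139 on GR_worker-2. A minimal Python sketch of that bookkeeping (the `routes_for_pod` helper is hypothetical, not ovn-kubernetes source):

```python
# Illustrative sketch only -- models the nbctl add/remove pairs seen in the
# logs above, not actual ovn-kubernetes code.

def routes_for_pod(pod_ip: str, gw_router: str, gw_ip: str) -> list[str]:
    """The two per-pod entries programmed for an external gateway: a src-ip
    ECMP route on the node's gateway router, and a priority-501 reroute
    policy on ovn_cluster_router (simplified match expression)."""
    return [
        f"lr-route-add {gw_router} {pod_ip}/32 {gw_ip}",
        f"lr-policy-add ovn_cluster_router 501 ip4.src == {pod_ip}",
    ]

# pod created the first time
state = set(routes_for_pod("10.131.0.74", "GR_worker-0", "192.168.56.181"))
# namespace deleted: both entries are removed (lr-route-del / lr-policy-del)
state -= set(routes_for_pod("10.131.0.74", "GR_worker-0", "192.168.56.181"))
# namespace recreated: the pod gets a new IP on a different node, so entirely
# new entries must be added -- nothing from the old state can be reused
state |= set(routes_for_pod("10.129.2.139", "GR_worker-2", "192.168.56.181"))
print(sorted(state))
```

This is why a recreated namespace depends on ovn-kubernetes fully re-processing the namespace and gateway-pod events; none of the previously programmed state carries over.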

Comment 4 Mohamed Mahmoud 2022-02-15 22:43:34 UTC
I0214 20:52:59.053922       1 egressgw.go:38] External gateway pod: f5-tmm-cf5bcbfc5-b68pq, detected for namespace(s) spk-test
E0214 20:53:09.054337       1 ovn.go:647] timeout waiting for namespace event
I0214 20:53:09.054439       1 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"spk-data", Name:"f5-tmm-cf5bcbfc5-b68pq", UID:"441a06c7-d7ca-4212-af57-b22290934cdc", APIVersion:"v1", ResourceVersion:"7407544", FieldPath:""}): type: 'Warning' reason: 'ErrorAddingLogicalPort' timeout waiting for namespace event

I0214 20:54:09.342332       1 ovn.go:584] [441a06c7-d7ca-4212-af57-b22290934cdc/spk-data/f5-tmm-cf5bcbfc5-b68pq] retry pod setup
I0214 20:54:09.342412       1 pods.go:338] LSP already exists for port: spk-data_f5-tmm-cf5bcbfc5-b68pq
I0214 20:54:09.346778       1 egressgw.go:38] External gateway pod: f5-tmm-cf5bcbfc5-b68pq, detected for namespace(s) spk-test
I0214 20:54:19.347183       1 pods.go:302] [spk-data/f5-tmm-cf5bcbfc5-b68pq] addLogicalPort took 10.00478674s
E0214 20:54:19.347238       1 ovn.go:635] failed to handle external GW check: timeout waiting for namespace event
I0214 20:54:19.347289       1 ovn.go:590] [441a06c7-d7ca-4212-af57-b22290934cdc/spk-data/f5-tmm-cf5bcbfc5-b68pq] setup retry failed; will try again later
I0214 20:54:19.347406       1 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"spk-data", Name:"f5-tmm-cf5bcbfc5-b68pq", UID:"441a06c7-d7ca-4212-af57-b22290934cdc", APIVersion:"v1", ResourceVersion:"7407544", FieldPath:""}): type: 'Warning' reason: 'ErrorAddingLogicalPort' failed to handle external GW check: timeout waiting for namespace event

……………

I0214 21:16:19.451081       1 ovn.go:584] [441a06c7-d7ca-4212-af57-b22290934cdc/spk-data/f5-tmm-cf5bcbfc5-b68pq] retry pod setup
I0214 21:16:19.451265       1 pods.go:338] LSP already exists for port: spk-data_f5-tmm-cf5bcbfc5-b68pq
I0214 21:16:19.456325       1 egressgw.go:38] External gateway pod: f5-tmm-cf5bcbfc5-b68pq, detected for namespace(s) spk-test
I0214 21:16:19.457157       1 pods.go:302] [spk-data/f5-tmm-cf5bcbfc5-b68pq] addLogicalPort took 6.021397ms
I0214 21:16:19.457191       1 ovn.go:587] [441a06c7-d7ca-4212-af57-b22290934cdc/spk-data/f5-tmm-cf5bcbfc5-b68pq] pod setup successful

So pod setup stayed stuck on those "timeout waiting for namespace event" errors for more than 16 minutes before finally succeeding.
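The long gap between the first failure (~20:54) and eventual success (21:16) is consistent with a periodic retry whose every attempt blocks on the same 10-second namespace-event timeout. A rough Python model of that loop (the 60-second retry cadence is an assumption for illustration, estimated from the log timestamps, not taken from the source):

```python
# Illustrative sketch only -- models the retry behavior visible in the logs,
# not actual ovn-kubernetes code.

RETRY_PERIOD_S = 60      # assumed retry cadence between pod-setup attempts
NS_EVENT_TIMEOUT_S = 10  # "timeout waiting for namespace event" after 10s

def add_logical_port(ns_event_ready: bool) -> bool:
    """One setup attempt: succeeds only once the namespace event has fired."""
    return ns_event_ready

def retry_until_ready(ready_after_s: int) -> int:
    """Total simulated seconds until pod setup succeeds, retrying forever."""
    elapsed = 0
    while True:
        if add_logical_port(elapsed >= ready_after_s):
            return elapsed
        elapsed += NS_EVENT_TIMEOUT_S  # each failed attempt blocks ~10s
        elapsed += RETRY_PERIOD_S      # then waits for the next retry pass

# Logs show failures from ~20:54 until success at 21:16 (~22 minutes):
print(retry_until_ready(ready_after_s=22 * 60))
```

The model makes the failure mode concrete: no single attempt can succeed until the namespace handler finally delivers its event, so the pod accumulates identical timeout errors until then.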

Comment 6 Mohamed Mahmoud 2022-02-16 11:58:31 UTC

*** This bug has been marked as a duplicate of bug 1991445 ***