1790407 – [3.11] No new ovs flows add to table 80 after restart sdn pod and create a allow-all networkpolicy

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	3.11.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	3.11.z
Assignee:	Juan Luis de Sousa-Valadas
QA Contact:	huirwang
Docs Contact:
URL:
Whiteboard:	SDN-QA-IMPACT
Depends On:	1790440 1790805 1821986
Blocks:

Reported:	2020-01-13 10:25 UTC by huirwang
Modified:	2020-10-22 11:02 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: During an SDN restart, the NetworkPolicy cache was not initialized if a project already contained a deny-all rule. Consequence: In that scenario, rules created after the SDN pod was restarted were not detected. Fix: NetworkPolicy initialization was fixed for that scenario by setting npNameSpace.inUse = true. Result: This scenario works as expected.
Clone Of:
Clones:	1790440
Environment:
Last Closed:	2020-10-22 11:02:22 UTC
Target Upstream Version:
Embargoed:

Attachments
sdn log, openflow (136.13 KB, application/gzip) 2020-01-13 10:30 UTC, huirwang	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift origin pull 24620	0	None	closed	Bug 1790407: [3.11] Fix reinitialization of deny-all NetworkPolicy state on restart	2021-01-22 11:40:15 UTC
Red Hat Product Errata	RHBA-2020:4170	0	None	None	None	2020-10-22 11:02:46 UTC

Description huirwang 2020-01-13 10:25:41 UTC

Description of problem:
No new OVS flows are added to table 80 after restarting the SDN pod and creating an allow-all NetworkPolicy.

Version-Release number of selected component (if applicable):
Versions:
oc v3.11.160
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://huirwang-311master-etcd-nfs-1.int.0113-qmp.qe.rhcloud.com:8443
openshift v3.11.161
kubernetes v1.11.0+d4cacc0

How reproducible:
Always

Steps to Reproduce:
1. Create a project named test3.
2. Create two pods in this project.
oc get pods -o wide
NAME            READY     STATUS    RESTARTS   AGE       IP            NODE                 NOMINATED NODE
test-rc-9xmc6   1/1       Running   0          8s        10.129.0.53   huirwang-311node-1   <none>
test-rc-w9krb   1/1       Running   0          9s        10.129.0.52   huirwang-311node-1   <none>

3. From one pod, try to access the other pod.
oc rsh test-rc-9xmc6
/ $ ping 10.129.0.52
PING 10.129.0.52 (10.129.0.52) 56(84) bytes of data.
64 bytes from 10.129.0.52: icmp_seq=1 ttl=64 time=0.422 ms
64 bytes from 10.129.0.52: icmp_seq=2 ttl=64 time=0.069 ms

4. Create a deny-all NetworkPolicy in the project.
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/networkpolicy/defaultdeny-v1-semantic.yaml 
networkpolicy.extensions/default-deny created

5. Restart the SDN pod on the node where the above pods are located.
oc get pods -n openshift-sdn -o wide -l app=sdn
NAME        READY     STATUS    RESTARTS   AGE       IP             NODE                                 NOMINATED NODE
sdn-ffj6p   1/1       Running   0          18m       10.0.148.24    huirwang-311node-1                   <none>
sdn-l4mpk   1/1       Running   9          4h        10.0.151.45    huirwang-311master-etcd-nfs-1        <none>
sdn-qtv77   1/1       Running   10         4h        10.0.149.247   huirwang-311node-registry-router-1   <none>

oc delete pod sdn-ffj6p -n openshift-sdn
pod "sdn-ffj6p" deleted

6. After the new SDN pod is running, create an allow-all NetworkPolicy in the project.
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/networkpolicy/allow-all.yaml 
networkpolicy.extensions/allow-all created

oc get networkpolicy -n test3
NAME           POD-SELECTOR   AGE
allow-all      <none>         25m
default-deny   <none>         27m

7. From the same pod, try to access the other pod again.
oc rsh test-rc-9xmc6
/ $ ping 10.129.0.52
PING 10.129.0.52 (10.129.0.52) 56(84) bytes of data.
^C
--- 10.129.0.52 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
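
Steps 4 and 6 load policy files from the openshift-qe repository, and their contents are not included in this report. As an assumption, the shell sketch below emits the standard Kubernetes deny-all and allow-all ingress policies, which should have the same effect but are not necessarily the exact contents of defaultdeny-v1-semantic.yaml or allow-all.yaml. The helper names deny_all_policy and allow_all_policy are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; assumed equivalents of the linked test files.

# Deny-all ingress: an empty podSelector selects every pod in the namespace,
# and having no ingress rules means no inbound traffic is allowed.
deny_all_policy() {
    cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
spec:
  podSelector: {}
EOF
}

# Allow-all ingress: a single empty ingress rule ({}) matches traffic
# from any source on any port, for every pod in the namespace.
allow_all_policy() {
    cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all
spec:
  podSelector: {}
  ingress:
  - {}
EOF
}
```

For example, deny_all_policy | oc create -n test3 -f - would create the deny-all policy for step 4 without fetching the remote file, and allow_all_policy can be used the same way for step 6.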

Actual Result:
The pods in the test3 project cannot talk to each other.

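Checking the OpenFlow flows for project test3 shows that no new flows were added to table 80. The project's NETID, 2873715, is 0x2bd973 in hexadecimal, which is the value that appears in the flows below.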
oc get netnamespace test3
NAME      NETID     EGRESS IPS
test3     2873715   []
ovs-ofctl -O OpenFlow13 dump-flows br0 |  grep 2bd973
 cookie=0x0, duration=231.884s, table=20, n_packets=4, n_bytes=168, priority=100,arp,in_port=21,arp_spa=10.129.0.52,arp_sha=00:00:0a:81:00:34/00:00:ff:ff:ff:ff actions=load:0x2bd973->NXM_NX_REG0[],goto_table:21
 cookie=0x0, duration=231.820s, table=20, n_packets=4, n_bytes=168, priority=100,arp,in_port=22,arp_spa=10.129.0.53,arp_sha=00:00:0a:81:00:35/00:00:ff:ff:ff:ff actions=load:0x2bd973->NXM_NX_REG0[],goto_table:21
 cookie=0x0, duration=231.884s, table=20, n_packets=2, n_bytes=196, priority=100,ip,in_port=21,nw_src=10.129.0.52 actions=load:0x2bd973->NXM_NX_REG0[],goto_table:21
 cookie=0x0, duration=231.820s, table=20, n_packets=6, n_bytes=588, priority=100,ip,in_port=22,nw_src=10.129.0.53 actions=load:0x2bd973->NXM_NX_REG0[],goto_table:21
 cookie=0x0, duration=231.884s, table=25, n_packets=0, n_bytes=0, priority=100,ip,nw_src=10.129.0.52 actions=load:0x2bd973->NXM_NX_REG0[],goto_table:30
 cookie=0x0, duration=231.820s, table=25, n_packets=0, n_bytes=0, priority=100,ip,nw_src=10.129.0.53 actions=load:0x2bd973->NXM_NX_REG0[],goto_table:30
 cookie=0x0, duration=231.884s, table=70, n_packets=6, n_bytes=588, priority=100,ip,nw_dst=10.129.0.52 actions=load:0x2bd973->NXM_NX_REG1[],load:0x15->NXM_NX_REG2[],goto_table:80
 cookie=0x0, duration=231.820s, table=70, n_packets=2, n_bytes=196, priority=100,ip,nw_dst=10.129.0.53 actions=load:0x2bd973->NXM_NX_REG1[],load:0x16->NXM_NX_REG2[],goto_table:80
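
The NETID shown by oc get netnamespace is decimal, while the OVS flows print it as a hex register value (load:0x2bd973->NXM_NX_REG0[]). The check above can be sketched in shell as follows, assuming the standard text format of ovs-ofctl dump-flows output; the helper names netid_hex and flows_for_netid_in_table are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting per-project OVS flows.

# Convert a decimal NETID (as shown by `oc get netnamespace`) to the lowercase
# hex form used in OVS flow dumps, e.g. 2873715 -> 2bd973.
netid_hex() {
    printf '%x\n' "$1"
}

# Read `ovs-ofctl -O OpenFlow13 dump-flows br0` output on stdin and print only
# the flows that live in table $1 and mention the hex NETID $2.
# Matching "table=N," (not "goto_table:N") selects flows located in table N.
# The trailing character class avoids matching a longer value such as 0x2bd9731.
flows_for_netid_in_table() {
    table="$1"
    hex="$2"
    grep "table=${table}," | grep -E "0x${hex}([^0-9a-fA-F]|\$)"
}
```

On the affected node, ovs-ofctl -O OpenFlow13 dump-flows br0 | flows_for_netid_in_table 80 "$(netid_hex 2873715)" printing nothing would match what was observed in this report: the project's flows stop at table 70 with goto_table:80, and table 80 has no entry for it.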


Actual results:
The pods cannot talk to each other.

Expected results:
The pods in the project should be able to talk to each other after the allow-all policy is added.

Additional info:
Note:
If the above steps are repeated without restarting the SDN pod, the issue does not occur.

Comment 1 huirwang 2020-01-13 10:30:48 UTC

Created attachment 1651809
sdn log, openflow

Comment 2 Juan Luis de Sousa-Valadas 2020-01-13 15:22:32 UTC

Hi Huir,
Do you still have this cluster running? If so can you please give me access?

Comment 3 huirwang 2020-01-14 03:22:21 UTC

Hi Juan,
I sent you the environment information by email.

Comment 4 Juan Luis de Sousa-Valadas 2020-01-15 15:26:36 UTC

The environment is gone. Can you reproduce it again, preferably on 4.4?
I'll need the environment for a few days in order to understand the issue.

Comment 15 errata-xmlrpc 2020-10-22 11:02:22 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.306 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4170
