Bug 1790407

Summary: [3.11] No new OVS flows are added to table 80 after restarting the SDN pod and creating an allow-all networkpolicy
Product: OpenShift Container Platform
Reporter: huirwang
Component: Networking
Assignee: Juan Luis de Sousa-Valadas <jdesousa>
Networking sub component: openshift-sdn
QA Contact: huirwang
Status: CLOSED ERRATA
Docs Contact:
Severity: urgent
Priority: unspecified
CC: anusaxen, bbennett, jdesousa, scuppett
Version: 3.11.0
Target Milestone: ---
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Whiteboard: SDN-QA-IMPACT
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: During SDN restart, the network policy cache was not initialized for a project that already contained a deny-all rule. Consequence: Rules created after the SDN pod was restarted were not detected in that scenario. Fix: Network policy initialization was corrected for that scenario by setting npNamespace.inUse = true (a sketch follows the attachments list below). Result: The scenario works as expected.
Story Points: ---
Clone Of:
: 1790440 (view as bug list)
Environment:
Last Closed: 2020-10-22 11:02:22 UTC
Type: Bug
Bug Depends On: 1790440, 1790805, 1821986    
Bug Blocks:    
Attachments:
sdn log, openflow (flags: none)
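
The Doc Text above describes the fix only at a high level. Below is a minimal, self-contained sketch of the idea in Go; the type, field, and function names are hypothetical stand-ins modeled on the Doc Text, not copied from the openshift-sdn source.

// sketch.go -- illustrative only; names are hypothetical.
package main

import "fmt"

type npNamespace struct {
    name     string
    inUse    bool
    policies []string // stand-in for the cached NetworkPolicy objects
}

// initNamespace replays the policies that already exist in a project
// when the SDN pod starts. Per the Doc Text, the bug was that inUse
// stayed false for a namespace that already held a deny-all policy,
// so policies added after the restart never triggered a flow sync and
// no new rules reached OVS table 80.
func initNamespace(npns *npNamespace, existing []string) {
    npns.policies = append(npns.policies, existing...)
    if len(existing) > 0 {
        npns.inUse = true // the one-line fix described in the Doc Text
    }
}

func main() {
    ns := &npNamespace{name: "test3"}
    initNamespace(ns, []string{"default-deny"})
    fmt.Println(ns.name, "inUse:", ns.inUse) // test3 inUse: true
}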

Description huirwang 2020-01-13 10:25:41 UTC
Description of problem:
No new OVS flows are added to table 80 after the SDN pod is restarted and an allow-all networkpolicy is created.

Version-Release number of selected component (if applicable):
Versions:
oc v3.11.160
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://huirwang-311master-etcd-nfs-1.int.0113-qmp.qe.rhcloud.com:8443
openshift v3.11.161
kubernetes v1.11.0+d4cacc0

How reproducible:
Always

Steps to Reproduce:
1. Create a project test3
2. Create two pods in this project.
oc get pods -o wide
NAME            READY     STATUS    RESTARTS   AGE       IP            NODE                 NOMINATED NODE
test-rc-9xmc6   1/1       Running   0          8s        10.129.0.53   huirwang-311node-1   <none>
test-rc-w9krb   1/1       Running   0          9s        10.129.0.52   huirwang-311node-1   <none>

3. Try to access one pod from the other.
oc rsh test-rc-9xmc6
/ $ ping 10.129.0.52
PING 10.129.0.52 (10.129.0.52) 56(84) bytes of data.
64 bytes from 10.129.0.52: icmp_seq=1 ttl=64 time=0.422 ms
64 bytes from 10.129.0.52: icmp_seq=2 ttl=64 time=0.069 ms

4. Create a deny-all networkpolicy in the project.
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/networkpolicy/defaultdeny-v1-semantic.yaml 
networkpolicy.extensions/default-deny created
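
The linked manifest is not reproduced in this report; a deny-all policy with v1 semantics selects every pod in the project and allows no ingress, along these lines (the exact file contents may differ):

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny
spec:
  podSelector: {}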

5. Restart the SDN pod on the node where the above pods are located.
oc get pods -n openshift-sdn -o wide -l app=sdn
NAME        READY     STATUS    RESTARTS   AGE       IP             NODE                                 NOMINATED NODE
sdn-ffj6p   1/1       Running   0          18m       10.0.148.24    huirwang-311node-1                   <none>
sdn-l4mpk   1/1       Running   9          4h        10.0.151.45    huirwang-311master-etcd-nfs-1        <none>
sdn-qtv77   1/1       Running   10         4h        10.0.149.247   huirwang-311node-registry-router-1   <none>

oc delete pod sdn-ffj6p -n openshift-sdn
pod "sdn-ffj6p" deleted

6. After the new SDN pod is running, create an allow-all networkpolicy in the project.
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/networkpolicy/allow-all.yaml 
networkpolicy.extensions/allow-all created

oc get networkpolicy -n test3
NAME           POD-SELECTOR   AGE
allow-all      <none>         25m
default-deny   <none>         27m
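
For comparison, an allow-all ingress policy of the kind created above selects every pod and allows all ingress, roughly (again, the exact contents of the linked file may differ):

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-all
spec:
  podSelector: {}
  ingress:
  - {}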

7. Try to access one pod from the other again.
oc rsh test-rc-9xmc6
/ $ ping 10.129.0.52
PING 10.129.0.52 (10.129.0.52) 56(84) bytes of data.
^C
--- 10.129.0.52 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms

Actual Result:
The pods in the test3 project cannot talk to each other.

Check the OpenFlow rules for the test3 project. No new OVS flows were added to table 80 for project test3.
oc get netnamespace test3
NAME      NETID     EGRESS IPS
test3     2873715   []
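
The NETID appears in decimal in the oc output but in hexadecimal in the OVS flows; the grep pattern below can be derived with:

printf '%x\n' 2873715    # prints 2bd973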
ovs-ofctl -O OpenFlow13 dump-flows br0 | grep 2bd973
 cookie=0x0, duration=231.884s, table=20, n_packets=4, n_bytes=168, priority=100,arp,in_port=21,arp_spa=10.129.0.52,arp_sha=00:00:0a:81:00:34/00:00:ff:ff:ff:ff actions=load:0x2bd973->NXM_NX_REG0[],goto_table:21
 cookie=0x0, duration=231.820s, table=20, n_packets=4, n_bytes=168, priority=100,arp,in_port=22,arp_spa=10.129.0.53,arp_sha=00:00:0a:81:00:35/00:00:ff:ff:ff:ff actions=load:0x2bd973->NXM_NX_REG0[],goto_table:21
 cookie=0x0, duration=231.884s, table=20, n_packets=2, n_bytes=196, priority=100,ip,in_port=21,nw_src=10.129.0.52 actions=load:0x2bd973->NXM_NX_REG0[],goto_table:21
 cookie=0x0, duration=231.820s, table=20, n_packets=6, n_bytes=588, priority=100,ip,in_port=22,nw_src=10.129.0.53 actions=load:0x2bd973->NXM_NX_REG0[],goto_table:21
 cookie=0x0, duration=231.884s, table=25, n_packets=0, n_bytes=0, priority=100,ip,nw_src=10.129.0.52 actions=load:0x2bd973->NXM_NX_REG0[],goto_table:30
 cookie=0x0, duration=231.820s, table=25, n_packets=0, n_bytes=0, priority=100,ip,nw_src=10.129.0.53 actions=load:0x2bd973->NXM_NX_REG0[],goto_table:30
 cookie=0x0, duration=231.884s, table=70, n_packets=6, n_bytes=588, priority=100,ip,nw_dst=10.129.0.52 actions=load:0x2bd973->NXM_NX_REG1[],load:0x15->NXM_NX_REG2[],goto_table:80
 cookie=0x0, duration=231.820s, table=70, n_packets=2, n_bytes=196, priority=100,ip,nw_dst=10.129.0.53 actions=load:0x2bd973->NXM_NX_REG1[],load:0x16->NXM_NX_REG2[],goto_table:80
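
ovs-ofctl can also filter on a single table, which makes it easy to confirm that table 80 contains no entries for the project's VNID (loaded into reg1 by the table 70 flows above):

ovs-ofctl -O OpenFlow13 dump-flows br0 table=80

After the fix, creating the allow-all policy should populate this table for the project.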


Actual results:
The pods cannot talk to each other.

Expected results:
The pods in that project should be able to talk to each other after the allow-all policy is added.

Additional info:
Note:
If the above steps are repeated without restarting the SDN pod, this issue does not occur.

Comment 1 huirwang 2020-01-13 10:30:48 UTC
Created attachment 1651809 [details]
sdn log, openflow

Comment 2 Juan Luis de Sousa-Valadas 2020-01-13 15:22:32 UTC
Hi Huir,
Do you still have this cluster running? If so, can you please give me access?

Comment 3 huirwang 2020-01-14 03:22:21 UTC
Hi Juan,
Sent the environment information to you by email.

Comment 4 Juan Luis de Sousa-Valadas 2020-01-15 15:26:36 UTC
Environment is gone. Can you reproduce it again, preferably on 4.4?
I'll need the environment for a few days in order to understand the issue.

Comment 15 errata-xmlrpc 2020-10-22 11:02:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.306 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4170