Bug 1790407 - [3.11] No new OVS flows added to table 80 after restarting the SDN pod and creating an allow-all NetworkPolicy
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.11.z
Assignee: Juan Luis de Sousa-Valadas
QA Contact: huirwang
URL:
Whiteboard: SDN-QA-IMPACT
Depends On: 1790440 1790805 1821986
Blocks:
Reported: 2020-01-13 10:25 UTC by huirwang
Modified: 2020-10-22 11:02 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: During an SDN restart, the NetworkPolicy cache was not initialized if a project already contained a deny-all rule.
Consequence: In that scenario, rules created after the SDN pod was restarted were not detected.
Fix: NetworkPolicy initialization for that scenario was fixed by setting npNameSpace.inUse = true.
Result: This scenario works as expected.
Clone Of:
Clones: 1790440
Environment:
Last Closed: 2020-10-22 11:02:22 UTC
Target Upstream Version:
Embargoed:
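The following toy model in Go (the language openshift-sdn is written in) makes the Doc Text's cause and fix concrete. It is not the openshift-sdn implementation. The `npNamespace` and `plugin` types are loosely named after the `npNameSpace.inUse` field mentioned in the Doc Text. Beyond that, they are hypothetical simplifications, as are the `initNamespace`, `addPolicy` and `syncNamespace` functions and the rule that flows are only regenerated for namespaces marked `inUse`. They only mirror the behavior the Doc Text describes.

```go
package main

import "fmt"

// npNamespace is a HYPOTHETICAL, heavily simplified stand-in for the
// per-namespace policy state named in the Doc Text; it is not the real
// openshift-sdn type.
type npNamespace struct {
	name     string
	vnid     uint32
	inUse    bool     // assumption: only "in use" namespaces get flows regenerated
	policies []string // names of the NetworkPolicies in the namespace
}

// plugin is a hypothetical model of the node-side policy cache.
type plugin struct {
	namespaces map[string]*npNamespace
	table80    map[uint32][]string // modeled table-80 entries per VNID (not real OVS flows)
}

func newPlugin() *plugin {
	return &plugin{
		namespaces: map[string]*npNamespace{},
		table80:    map[uint32][]string{},
	}
}

// initNamespace models rebuilding state after an SDN restart for a namespace
// that already has policies (here, a deny-all). With fixed=false it reproduces
// the described bug by leaving inUse unset; with fixed=true it applies the
// described fix.
func (p *plugin) initNamespace(name string, vnid uint32, existing []string, fixed bool) {
	ns := &npNamespace{name: name, vnid: vnid, policies: existing}
	if fixed {
		ns.inUse = true
	}
	p.namespaces[name] = ns
}

// addPolicy models a NetworkPolicy created after the restart.
func (p *plugin) addPolicy(nsName, policy string) {
	ns := p.namespaces[nsName]
	ns.policies = append(ns.policies, policy)
	p.syncNamespace(ns)
}

// syncNamespace regenerates the modeled table-80 entries, but silently skips
// namespaces that are not marked inUse, so the new policy never takes effect.
func (p *plugin) syncNamespace(ns *npNamespace) {
	if !ns.inUse {
		return
	}
	p.table80[ns.vnid] = append([]string(nil), ns.policies...)
}

func main() {
	for _, fixed := range []bool{false, true} {
		p := newPlugin()
		p.initNamespace("test3", 2873715, []string{"default-deny"}, fixed)
		p.addPolicy("test3", "allow-all")
		fmt.Printf("fixed=%v: %d modeled table-80 entries for test3\n",
			fixed, len(p.table80[2873715]))
	}
}
```

Running it prints `fixed=false: 0 modeled table-80 entries for test3` followed by `fixed=true: 2 modeled table-80 entries for test3`. In the model, the allow-all policy only has an effect once the restored namespace is marked in use.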


Attachments
sdn log, openflow (136.13 KB, application/gzip)
2020-01-13 10:30 UTC, huirwang


Links
Github openshift origin pull 24620 (closed): Bug 1790407: [3.11] Fix reinitialization of deny-all NetworkPolicy state on restart (last updated 2021-01-22 11:40:15 UTC)
Red Hat Product Errata RHBA-2020:4170 (last updated 2020-10-22 11:02:46 UTC)

Description huirwang 2020-01-13 10:25:41 UTC
Description of problem:
No new OVS flows are added to table 80 after restarting the SDN pod and creating an allow-all NetworkPolicy.

Version-Release number of selected component (if applicable):
Versions:
oc v3.11.160
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://huirwang-311master-etcd-nfs-1.int.0113-qmp.qe.rhcloud.com:8443
openshift v3.11.161
kubernetes v1.11.0+d4cacc0

How reproducible:
Always

Steps to Reproduce:
1. Create a project named test3.
2. Create two pods in this project.
oc get pods -o wide
NAME            READY     STATUS    RESTARTS   AGE       IP            NODE                 NOMINATED NODE
test-rc-9xmc6   1/1       Running   0          8s        10.129.0.53   huirwang-311node-1   <none>
test-rc-w9krb   1/1       Running   0          9s        10.129.0.52   huirwang-311node-1   <none>

3. From one pod, ping the other pod.
oc rsh test-rc-9xmc6
/ $ ping 10.129.0.52
PING 10.129.0.52 (10.129.0.52) 56(84) bytes of data.
64 bytes from 10.129.0.52: icmp_seq=1 ttl=64 time=0.422 ms
64 bytes from 10.129.0.52: icmp_seq=2 ttl=64 time=0.069 ms

4. Create a deny-all NetworkPolicy in the project.
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/networkpolicy/defaultdeny-v1-semantic.yaml 
networkpolicy.extensions/default-deny created

5. Restart the SDN pod on the node where the above pods are located.
oc get pods -n openshift-sdn -o wide -l app=sdn
NAME        READY     STATUS    RESTARTS   AGE       IP             NODE                                 NOMINATED NODE
sdn-ffj6p   1/1       Running   0          18m       10.0.148.24    huirwang-311node-1                   <none>
sdn-l4mpk   1/1       Running   9          4h        10.0.151.45    huirwang-311master-etcd-nfs-1        <none>
sdn-qtv77   1/1       Running   10         4h        10.0.149.247   huirwang-311node-registry-router-1   <none>

oc delete pod sdn-ffj6p -n openshift-sdn
pod "sdn-ffj6p" deleted

6. After the new SDN pod is running, create an allow-all NetworkPolicy in the project.
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/networkpolicy/allow-all.yaml 
networkpolicy.extensions/allow-all created

oc get networkpolicy -n test3
NAME           POD-SELECTOR   AGE
allow-all      <none>         25m
default-deny   <none>         27m

7. From the same pod, ping the other pod again.
oc rsh test-rc-9xmc6
/ $ ping 10.129.0.52
PING 10.129.0.52 (10.129.0.52) 56(84) bytes of data.
^C
--- 10.129.0.52 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms

Actual Result:
The pods in the test3 project cannot talk to each other.

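Checking the OpenFlow rules for project test3 shows that no new OVS flows were added to table 80 for it. (NETID 2873715 is 0x2bd973 in hexadecimal, which is the form used in the flows and the value grepped for below.)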
oc get netnamespace test3
NAME      NETID     EGRESS IPS
test3     2873715   []
ovs-ofctl -O OpenFlow13 dump-flows br0 |  grep 2bd973
 cookie=0x0, duration=231.884s, table=20, n_packets=4, n_bytes=168, priority=100,arp,in_port=21,arp_spa=10.129.0.52,arp_sha=00:00:0a:81:00:34/00:00:ff:ff:ff:ff actions=load:0x2bd973->NXM_NX_REG0[],goto_table:21
 cookie=0x0, duration=231.820s, table=20, n_packets=4, n_bytes=168, priority=100,arp,in_port=22,arp_spa=10.129.0.53,arp_sha=00:00:0a:81:00:35/00:00:ff:ff:ff:ff actions=load:0x2bd973->NXM_NX_REG0[],goto_table:21
 cookie=0x0, duration=231.884s, table=20, n_packets=2, n_bytes=196, priority=100,ip,in_port=21,nw_src=10.129.0.52 actions=load:0x2bd973->NXM_NX_REG0[],goto_table:21
 cookie=0x0, duration=231.820s, table=20, n_packets=6, n_bytes=588, priority=100,ip,in_port=22,nw_src=10.129.0.53 actions=load:0x2bd973->NXM_NX_REG0[],goto_table:21
 cookie=0x0, duration=231.884s, table=25, n_packets=0, n_bytes=0, priority=100,ip,nw_src=10.129.0.52 actions=load:0x2bd973->NXM_NX_REG0[],goto_table:30
 cookie=0x0, duration=231.820s, table=25, n_packets=0, n_bytes=0, priority=100,ip,nw_src=10.129.0.53 actions=load:0x2bd973->NXM_NX_REG0[],goto_table:30
 cookie=0x0, duration=231.884s, table=70, n_packets=6, n_bytes=588, priority=100,ip,nw_dst=10.129.0.52 actions=load:0x2bd973->NXM_NX_REG1[],load:0x15->NXM_NX_REG2[],goto_table:80
 cookie=0x0, duration=231.820s, table=70, n_packets=2, n_bytes=196, priority=100,ip,nw_dst=10.129.0.53 actions=load:0x2bd973->NXM_NX_REG1[],load:0x16->NXM_NX_REG2[],goto_table:80
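This check can also be scripted. The Go helper below is a minimal sketch that renders a decimal NETID in hex and counts, per table, the `ovs-ofctl dump-flows` lines that mention it. The program, the file name `flowcount.go` and the `netidHex`/`flowsPerTable` helpers are hypothetical. The helper also rests on an assumption: table-80 policy flows for a namespace are taken to reference its NETID (for example through a register match), so they would contain the same hex string.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"sort"
	"strconv"
)

// tableRe captures a flow's own table number ("table=20"); it does not match
// "goto_table:80", which uses a colon.
var tableRe = regexp.MustCompile(`table=(\d+)`)

// netidHex renders a decimal NETID in the hex form used in the flows.
func netidHex(netid uint32) string {
	return "0x" + strconv.FormatUint(uint64(netid), 16)
}

// flowsPerTable counts, per table, the flows whose text contains the NETID in
// hex. The \b anchor avoids matching a longer value such as 0x2bd9731.
func flowsPerTable(lines []string, netid uint32) map[int]int {
	needle := regexp.MustCompile(regexp.QuoteMeta(netidHex(netid)) + `\b`)
	counts := map[int]int{}
	for _, line := range lines {
		if !needle.MatchString(line) {
			continue
		}
		m := tableRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		if n, err := strconv.Atoi(m[1]); err == nil {
			counts[n]++
		}
	}
	return counts
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: flowcount <decimal NETID> < dump-flows.txt")
		return
	}
	id, err := strconv.ParseUint(os.Args[1], 10, 32)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid NETID:", err)
		os.Exit(2)
	}
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // flow lines can be long
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	counts := flowsPerTable(lines, uint32(id))
	tables := make([]int, 0, len(counts))
	for t := range counts {
		tables = append(tables, t)
	}
	sort.Ints(tables)
	fmt.Printf("NETID %d = %s\n", id, netidHex(uint32(id)))
	for _, t := range tables {
		fmt.Printf("table=%d: %d flow(s)\n", t, counts[t])
	}
	if counts[80] == 0 {
		fmt.Println("no table=80 flows reference this NETID")
	}
}
```

Usage: `ovs-ofctl -O OpenFlow13 dump-flows br0 | go run flowcount.go 2873715`. Fed the flows shown above, it reports 4 flows in table 20 and 2 each in tables 25 and 70. It then notes that no table=80 flows reference the NETID, which is the symptom described in this report.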



Expected results:
The pods in the project should be able to talk to each other after the allow-all policy is added.

Additional info:
Note:
 If the above steps are repeated without restarting the SDN pod, the issue does not occur.

Comment 1 huirwang 2020-01-13 10:30:48 UTC
Created attachment 1651809
sdn log, openflow

Comment 2 Juan Luis de Sousa-Valadas 2020-01-13 15:22:32 UTC
Hi Huir,
Do you still have this cluster running? If so, can you please give me access?

Comment 3 huirwang 2020-01-14 03:22:21 UTC
Hi Juan,
I sent you the environment information by email.

Comment 4 Juan Luis de Sousa-Valadas 2020-01-15 15:26:36 UTC
The environment is gone. Can you reproduce the issue again, preferably on 4.4?
I'll need the environment for a few days in order to understand the issue.

Comment 15 errata-xmlrpc 2020-10-22 11:02:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.306 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4170

