Bug 1411712 - [3.3] [Critical] Need help in investigating SF#01767652. "openvswitch rules are not applied"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.3.1
Assignee: Dan Winship
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-01-10 11:15 UTC by Alexander Koksharov
Modified: 2017-02-28 07:59 UTC (History)

Fixed In Version: atomic-openshift-3.3.1.11-1.git.0.cba037c.el7
Doc Type: Bug Fix
Doc Text:
Previously, the EgressNetworkPolicy functionality might stop working on a node after restarting the node service. This has been fixed.
Clone Of:
Environment:
Last Closed: 2017-01-26 20:43:36 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift ose pull 560 0 None None None 2017-01-18 13:58:29 UTC
Red Hat Product Errata RHBA-2017:0199 0 normal SHIPPED_LIVE OpenShift Container Platform 3.3.1.11 and 3.2.1.23 bug fix update 2017-01-27 01:41:56 UTC

Comment 1 Alexander Koksharov 2017-01-10 11:31:51 UTC
Description of problem:

The customer creates an EgressNetworkPolicy in a project, but the system removes all OVS rules for that netnamespace and adds a single "drop all" rule. In the node logs we see:
atomic-openshift-node[39469]: E0106 17:40:05.187734   39469 controller.go:506] multiple EgressNetworkPolicies in same network namespace (vwc-rec:default, m4d-rec:default) is not allowed; dropping all traffic

We have checked:
- no global projects have egress policy defined.
- there are no joined projects.
- none of the projects have more than one egress policy defined.
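
The third check can be scripted. The following is a minimal sketch, assuming the policies are exported with something like `oc get egressnetworkpolicy --all-namespaces -o json` and that the output has the usual Kubernetes List shape (`items[].metadata.namespace` and `items[].metadata.name`). The function name and the sample data are illustrative only and are not part of any OpenShift tooling.

```python
import json
from collections import defaultdict


def namespaces_with_multiple_policies(policy_list_json):
    """Return {namespace: [policy names]} for namespaces holding more than one policy."""
    by_namespace = defaultdict(list)
    for item in json.loads(policy_list_json).get("items", []):
        meta = item["metadata"]
        by_namespace[meta["namespace"]].append(meta["name"])
    return {ns: names for ns, names in by_namespace.items() if len(names) > 1}


# Hypothetical sample mirroring the two policies named in the log message:
# one policy called "default" in each of two different projects.
sample = json.dumps({"items": [
    {"metadata": {"namespace": "vwc-rec", "name": "default"}},
    {"metadata": {"namespace": "m4d-rec", "name": "default"}},
]})
print(namespaces_with_multiple_policies(sample))  # {}
```

An empty result matches what was observed here: no single project holds two policies, even though the node reports two policies in the same network namespace.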

Two separate environments (3.3.1.7 and 3.3.1.3) suffer from the issue.
Initially only one node was affected, but now both nodes have the issue. It appears to be project-related rather than node-related.

Please advise on what to check/trace.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Dan Winship 2017-01-10 14:18:46 UTC
This is https://github.com/openshift/origin/pull/12045 and it's fixed in 3.4 (v3.4.0.32). We did not backport the fix to 3.3. The relevant code didn't change much between 3.3 and 3.4 so it would be possible to do, but I don't know what the policy is for 3.3 bugfixes at this point...

(There is no way to work around the bug other than backporting the bugfix.)

Comment 11 Meng Bo 2017-01-22 08:08:34 UTC
Tested on OCP 3.3.1.11

After adding multiple EgressNetworkPolicy objects to a single namespace, the existing OpenFlow rules are not affected; a new rule is added to drop the traffic for that specific project.

From node log:
Jan 22 03:03:10 node1 atomic-openshift-node[27026]: E0122 03:03:10.885757   27026 controller.go:506] multiple EgressNetworkPolicies in same network namespace (bmengp1:default, bmengp1:default2) is not allowed; dropping all traffic
Jan 22 03:03:10 node1 atomic-openshift-node[27026]: I0122 03:03:10.885809   27026 ovs.go:37] Executing: /usr/bin/ovs-ofctl -O OpenFlow13 del-flows br0 table=9, reg0=720494
Jan 22 03:03:10 node1 atomic-openshift-node[27026]: I0122 03:03:10.891489   27026 ovs.go:37] Executing: /usr/bin/ovs-ofctl -O OpenFlow13 add-flow br0 table=9, reg0=720494, priority=1, actions=drop


Check the openflow rules:
# ovs-ofctl dump-flows br0 -O openflow13
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x0, duration=221.113s, table=0, n_packets=0, n_bytes=0, priority=200,arp,in_port=1,arp_spa=10.1.0.0/16,arp_tpa=10.1.1.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:1
 cookie=0x0, duration=221.110s, table=0, n_packets=0, n_bytes=0, priority=200,ip,in_port=1,nw_src=10.1.0.0/16,nw_dst=10.1.1.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:1
 cookie=0x0, duration=221.105s, table=0, n_packets=45, n_bytes=1890, priority=200,arp,in_port=2,arp_spa=10.1.1.1,arp_tpa=10.1.0.0/16 actions=goto_table:5
 cookie=0x0, duration=221.102s, table=0, n_packets=3871, n_bytes=2493051, priority=200,ip,in_port=2 actions=goto_table:5
 cookie=0x0, duration=221.095s, table=0, n_packets=2, n_bytes=84, priority=200,arp,in_port=3,arp_spa=10.1.1.0/24 actions=goto_table:5
 cookie=0x0, duration=221.085s, table=0, n_packets=0, n_bytes=0, priority=200,ip,in_port=3,nw_src=10.1.1.0/24 actions=goto_table:5
 cookie=0x0, duration=221.108s, table=0, n_packets=0, n_bytes=0, priority=150,in_port=1 actions=drop
 cookie=0x0, duration=221.098s, table=0, n_packets=16, n_bytes=1296, priority=150,in_port=2 actions=drop
 cookie=0x0, duration=221.058s, table=0, n_packets=38, n_bytes=3132, priority=150,in_port=3 actions=drop
 cookie=0x0, duration=221.050s, table=0, n_packets=41, n_bytes=1722, priority=100,arp actions=goto_table:2
 cookie=0x0, duration=221.044s, table=0, n_packets=2231, n_bytes=239877, priority=100,ip actions=goto_table:2
 cookie=0x0, duration=221.004s, table=0, n_packets=45, n_bytes=3558, priority=0 actions=drop
 cookie=0x0, duration=220.782s, table=1, n_packets=0, n_bytes=0, priority=100,tun_src=10.8.174.9 actions=goto_table:5
 cookie=0x0, duration=221.001s, table=1, n_packets=0, n_bytes=0, priority=0 actions=drop
 cookie=0x0, duration=220.613s, table=2, n_packets=2, n_bytes=84, priority=100,arp,in_port=11,arp_spa=10.1.1.5,arp_sha=02:42:0a:01:01:05 actions=load:0->NXM_NX_REG0[],goto_table:5
 cookie=0x0, duration=220.604s, table=2, n_packets=318, n_bytes=28460, priority=100,ip,in_port=11,nw_src=10.1.1.5 actions=load:0->NXM_NX_REG0[],goto_table:3
 cookie=0x0, duration=220.992s, table=2, n_packets=0, n_bytes=0, priority=0 actions=drop
 cookie=0x0, duration=220.990s, table=3, n_packets=299, n_bytes=75681, priority=100,ip,nw_dst=172.30.0.0/16 actions=goto_table:4
 cookie=0x0, duration=220.981s, table=3, n_packets=1932, n_bytes=164196, priority=0 actions=goto_table:5
 cookie=0x0, duration=220.958s, table=4, n_packets=299, n_bytes=75681, priority=200,reg0=0 actions=output:2
...
...
...
 cookie=0x0, duration=220.913s, table=8, n_packets=0, n_bytes=0, priority=0 actions=drop
 cookie=0x0, duration=22.721s, table=9, n_packets=0, n_bytes=0, priority=1,reg0=0xafe6e actions=drop
 cookie=0x0, duration=220.911s, table=9, n_packets=496, n_bytes=35890, priority=0 actions=output:2
 cookie=0x0, duration=220.822s, table=253, n_packets=0, n_bytes=0, actions=note:01.01.00.00.00.00
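
The log and the flow dump refer to the same reg0 value, written in two bases: the `ovs-ofctl add-flow` command in the log uses decimal `reg0=720494`, while `dump-flows` prints it in hex as `reg0=0xafe6e`. A quick check of the conversion (the variable names are illustrative):

```python
reg0_from_log = 720494       # decimal value passed to ovs-ofctl in the node log
reg0_from_dump = "0xafe6e"   # hex value printed by dump-flows in table 9

# Both notations denote the same register value.
assert int(reg0_from_dump, 16) == reg0_from_log
print(hex(reg0_from_log))  # 0xafe6e
```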

Comment 12 Meng Bo 2017-01-22 11:05:32 UTC
Please ignore comment #11 above.

Tested with the following steps.

To reproduce (tested on build 3.3.1.9):
1. Create 10 projects
2. Add an egress policy to each project
3. Check the OpenFlow rules
4. Restart the OpenShift node service
5. Check the OpenFlow rules again
Result:
In step 3, the OpenFlow rules for the project are created in table 9 with the following contents:
 cookie=0x0, duration=1.722s, table=9, n_packets=0, n_bytes=0, priority=2,ip,reg0=0x5d687c,nw_dst=172.16.120.0/24 actions=output:2
 cookie=0x0, duration=1.715s, table=9, n_packets=0, n_bytes=0, priority=1,ip,reg0=0x5d687c,nw_dst=10.66.140.0/24 actions=drop
In step 5, after the restart, the rules have been replaced by a single drop rule:
 cookie=0x0, duration=1.704s, table=9, n_packets=0, n_bytes=0, priority=1,reg0=0x5d687c actions=drop

To verify, repeated the same steps on build 3.3.1.11.
The OpenFlow rules for projects with an EgressNetworkPolicy are no longer corrupted by the restart.
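
The comparison in steps 3 and 5 can be automated by scanning `ovs-ofctl dump-flows` output for table 9 rules that match only on reg0 and drop everything. This is a rough sketch, assuming the text layout shown in this report; the function name is hypothetical, and the parsing is deliberately simple rather than a general OpenFlow parser.

```python
def drop_all_reg0_values(flow_dump):
    """Return reg0 values that have a table 9 rule matching only on reg0 with actions=drop."""
    found = set()
    for line in flow_dump.splitlines():
        head, sep, actions = line.strip().partition(" actions=")
        if not sep:
            continue  # not a flow line (e.g. the OFPST_FLOW reply header)
        parts = [p.strip() for p in head.split(", ")]
        if "table=9" not in parts:
            continue
        # The last comma-space separated chunk holds priority plus match fields.
        match = [f for f in parts[-1].split(",") if not f.startswith("priority=")]
        if actions == "drop" and len(match) == 1 and match[0].startswith("reg0="):
            found.add(match[0][len("reg0="):])
    return found


# Flow lines copied from steps 3 and 5 above.
before_restart = """\
 cookie=0x0, duration=1.722s, table=9, n_packets=0, n_bytes=0, priority=2,ip,reg0=0x5d687c,nw_dst=172.16.120.0/24 actions=output:2
 cookie=0x0, duration=1.715s, table=9, n_packets=0, n_bytes=0, priority=1,ip,reg0=0x5d687c,nw_dst=10.66.140.0/24 actions=drop
"""
after_restart = """\
 cookie=0x0, duration=1.704s, table=9, n_packets=0, n_bytes=0, priority=1,reg0=0x5d687c actions=drop
"""
print(drop_all_reg0_values(before_restart))  # set()
print(drop_all_reg0_values(after_restart))   # {'0x5d687c'}
```

A non-empty result after the restart, for a project that has exactly one policy, indicates the corrupted state described in this bug.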

Comment 16 errata-xmlrpc 2017-01-26 20:43:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0199

