Bug 1619650

Summary: Networkpolicy rules don't update reliably after a service restart
Product: OpenShift Container Platform Reporter: Sunil Choudhary <schoudha>
Component: Networking Assignee: Dan Winship <danw>
Status: CLOSED CURRENTRELEASE QA Contact: Meng Bo <bmeng>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.7.1 CC: adeshpan, andreas_rother, aos-bugs, bbennett, danw, mhepburn, rkant, tmanor, weliang
Target Milestone: ---   
Target Release: 3.7.z   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1694704 (view as bug list) Environment:
Last Closed: 2019-04-25 13:36:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sunil Choudhary 2018-08-21 11:53:31 UTC
Description of problem:

After a few days, TCP connections between newly created namespaces and already existing namespaces fail, although the corresponding namespace label and network policy are correctly applied. Recreating the destination pod seems to resolve the issue, though.

This looks like a bug where the OVS flow table is not updated reliably when a new namespace with a matching label is deployed.

Version-Release number of selected component (if applicable):

OCP 3.7.42-1

How reproducible:

It occurs randomly, after a few days.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 7 Andreas Rother 2018-09-17 07:47:41 UTC
Any news about it?

Comment 8 Dan Winship 2018-09-19 17:28:33 UTC
Sorry there hasn't been any update here.

The sosreport doesn't reveal much more than what the reporter already figured out; for some reason, nodes are sometimes missing OVS flows that they ought to have. At default debug levels there isn't enough information logged to figure out why. We'll have to try to reproduce this locally.
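
For anyone trying to confirm the "missing flows" symptom on a node, here is a rough sketch of how a saved flow dump could be checked against the namespace pairs a policy should allow. It is only an illustration under assumptions: it assumes the policy rules sit in OpenFlow table 80 and that reg0/reg1 carry the source and destination namespace IDs, and the helper names policy_pairs and missing_pairs are hypothetical, not part of any OpenShift tooling.

```python
import re

# Patterns for standard OVS flow-dump fields. Which table holds the
# policy rules (80 below) and the meaning of reg0/reg1 are assumptions.
_TABLE = re.compile(r"\btable=(\d+)")
_REG0 = re.compile(r"\breg0=(0x[0-9a-fA-F]+|\d+)")
_REG1 = re.compile(r"\breg1=(0x[0-9a-fA-F]+|\d+)")


def _to_int(text):
    """Parse a register value printed as hex (0x...) or decimal."""
    return int(text, 16) if text.lower().startswith("0x") else int(text)


def policy_pairs(dump_text, table=80):
    """Return the set of (reg0, reg1) pairs matched by flows in `table`."""
    pairs = set()
    for line in dump_text.splitlines():
        t = _TABLE.search(line)
        if not t or int(t.group(1)) != table:
            continue  # flow belongs to another table, or is not a flow line
        r0 = _REG0.search(line)
        r1 = _REG1.search(line)
        if r0 and r1:
            pairs.add((_to_int(r0.group(1)), _to_int(r1.group(1))))
    return pairs


def missing_pairs(dump_text, expected, table=80):
    """Return the expected (src, dst) pairs that have no flow in `table`."""
    return sorted(set(expected) - policy_pairs(dump_text, table))
```

Running missing_pairs over flow dumps taken on a working node and on a failing node, with the same expected pairs, would show which source/destination combinations the failing node lacks.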

Comment 10 Ben Bennett 2019-03-26 19:23:57 UTC
Weibin, can you please try to reproduce this?

Comment 12 Weibin Liang 2019-03-27 22:01:23 UTC
Dan,

Following your steps above, the final check failed in 3.7.70 but passed in 4.0.

Thanks for your detailed information.

Comment 14 Dan Winship 2019-04-25 13:36:37 UTC
Fixed in master, and the customer case is closed.