Bug 1464250

Summary: [3.5] Connections to services are not allowed when network policy enabled
Product: OpenShift Container Platform
Component: Networking
Version: 3.5.0
Status: CLOSED NEXTRELEASE
Severity: high
Priority: high
Hardware: All
OS: Unspecified
Reporter: Veer Muchandi <veer>
Assignee: Dan Winship <danw>
QA Contact: Meng Bo <bmeng>
CC: aloughla, aos-bugs, bbennett, danw
Keywords: Reopened, UpcomingRelease
Target Milestone: ---
Target Release: ---
Type: Bug
Last Closed: 2017-06-29 19:54:51 UTC

Description Veer Muchandi 2017-06-22 19:39:06 UTC
Description of problem:
With network policy enabled, a connection from a pod to a service doesn't go through.

Version-Release number of selected component (if applicable):
 $ openshift version
   openshift v3.5.5.26
   kubernetes v1.5.2+43a9be4
   etcd 3.1.0

How reproducible:
can be reproduced

Steps to Reproduce:
1. Deploy a multi-tiered application. As an example, I used the lab and code from
https://github.com/RedHatWorkshops/openshiftv3-workshop/blob/master/5_Using_templates.md
My intent is to add a network policy that allows connections to the mysql pod from the frontend pod.

2. Add network policies
a) Enable default-deny isolation on the namespace
# oc annotate namespace ${ns} \
    'net.beta.kubernetes.io/network-policy={"ingress":{"isolation":"DefaultDeny"}}'
b) Allow the router to connect to your frontend pod by allowing traffic on port 8080
# cat allow-8080.yaml 
kind: NetworkPolicy
apiVersion: extensions/v1beta1
metadata:
  name: allow-8080-and-8443 
spec:
  podSelector:
  ingress:
  - ports:
    - protocol: TCP
      port: 8080
c) Allow connections to your mysql pod from the frontend pod. Remember to replace <<your label>> with the frontend pod's app label
# cat allow-3306.yaml 
kind: NetworkPolicy
apiVersion: extensions/v1beta1
metadata:
  name: allow-3306
spec:
 podSelector:
 ingress:
 - from: 
   - podSelector:
       matchLabels:
         app: <<your label>>
 - ports:
   - protocol: TCP
     port: 3306



3. Call the application URL. In my case it was http://frontend-networkpolicy.apps.devday.ocpcloud.com/, and that works.
But http://frontend-networkpolicy.apps.devday.ocpcloud.com/dbtest.php, which connects to the database (via the MYSQL service that proxies the pod), does not work.

4. Now find the mysql pod's IP address (run oc get pods -o wide). From a terminal in the frontend pod, change the code in dbtest.php to connect to that pod IP directly, then test <<yoururl>>/dbtest.php again.
This time the call goes through.
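
The comparison in steps 3 and 4 can also be made without editing dbtest.php. What follows is only a minimal sketch, assuming Python 3 is available inside the frontend pod (this report does not say so). The helper name can_connect is hypothetical, and the service name and pod IP passed to it are placeholders, not values taken from this report.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout.

    Any failure (name resolution error, refused connection, or timeout)
    is reported as False.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.timeout and socket.gaierror are both subclasses of OSError.
        return False
```

For example, can_connect("mysql", 3306) would test the path through the service (assuming the service is named mysql), and can_connect("<mysql pod IP>", 3306) would test the pod directly. With the behaviour described in this report, the first call would be expected to return False and the second True.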

Actual results:

Connections via the service don't work, but connections directly to the pod do.

Expected results:
Connections via the service should work.

Additional info:
Look at the recordings to see how I tested it. Note there are 2 chapters. See the first 6 minutes of Chapter 1 and then Chapter 2.
https://bluejeans.com/s/URf12

Comment 1 Ben Bennett 2017-06-23 13:10:06 UTC
This is fixed in 3.6 by https://github.com/openshift/origin/pull/14466

We aren't back-porting the fix because network policy is in tech preview in 3.5 (and there are some hairy technical problems that make it risky).

Comment 2 Dan Winship 2017-06-23 15:37:09 UTC
(In reply to Ben Bennett from comment #1)
> This is fixed in 3.6 by https://github.com/openshift/origin/pull/14466

The documented limitation in 3.5 is that service IPs did not work if they were only allowed by a NetworkPolicy whose spec.podSelector was non-empty. That's not the case here.

This bug may also end up being fixed by the same change, but it's not the known bug that the change was written to fix.

Comment 3 Dan Winship 2017-06-29 19:54:51 UTC
OK, we can reproduce the bug on 3.5, but it works fine in 3.6.

(In 3.5, the packets make it from the php pod to the mysql pod, but the response packets get dropped, presumably due to conntrack not working the way we wanted it to. Anyway, the conntrack-related rules are totally redone in 3.6, and we have regression tests that should cover this case now too.)