Bug 1835646 - build failing due to egressnetworkpolicy
Summary: build failing due to egressnetworkpolicy
Keywords:
Status: CLOSED DUPLICATE of bug 1850060
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.6.0
Assignee: Juan Luis de Sousa-Valadas
QA Contact: huirwang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-05-14 09:11 UTC by Sudarshan Chaudhari
Modified: 2024-10-01 16:35 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-07 14:37:49 UTC
Target Upstream Version:
Embargoed:


Description Sudarshan Chaudhari 2020-05-14 09:11:30 UTC
Description of problem:
We have a requirement to block access to the internet and to many addresses from the application pods.

For this we use an EgressNetworkPolicy (project-level firewall), which is created in each project whenever a developer requests one.

The default egressnetworkpolicy we are using is:
~~~
apiVersion: network.openshift.io/v1
kind: EgressNetworkPolicy
metadata:
  name: default-rules
spec:
  egress:
  - to:
      dnsName: registry.redhat.io
    type: Allow
  - to:
      dnsName: registry.connect.redhat.com
    type: Allow
  - to:
      dnsName: github.com
    type: Allow
  - to:
      dnsName: registry.access.redhat.com
    type: Allow
  - to:
      dnsName: quay.io
    type: Allow
  - to:
      dnsName: image-registry.openshift-image-registry.svc
    type: Allow
  - to:
      cidrSelector: 172.30.0.0/16
    type: Allow
  - to:
      cidrSelector: 10.128.0.0/14
    type: Allow
  - to:
      cidrSelector: 10.0.0.0/16
    type: Allow
  - to:
      cidrSelector: 0.0.0.0/0
    type: Deny
~~~

After the policy is created, the build fails for any new app we deploy:
~~~
$ oc new-project policy-test
$ oc create -f egresspolicy.yaml 
egressnetworkpolicy.network.openshift.io/default-rules created
$ oc new-app --name=httpd httpd:2.4~https://github.com/sclorg/httpd-ex.git
--> Found image 156bc0f (6 weeks old) in image stream "openshift/httpd" under tag "2.4" for "httpd:2.4"
. . . .
--> Creating resources ...
    imagestream.image.openshift.io "httpd" created
    buildconfig.build.openshift.io "httpd" created
    deploymentconfig.apps.openshift.io "httpd" created
    service "httpd" created
--> Success
    Build scheduled, use 'oc logs -f bc/httpd' to track its progress.
    Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
     'oc expose svc/httpd' 
    Run 'oc status' to view your app.
$ oc get pods
NAME                 READY   STATUS       RESTARTS   AGE
httpd-1-build        0/1     Init:0/2     0          2m42s
$ oc logs httpd-1-build
Error from server (BadRequest): container "sti-build" in pod "httpd-1-build" is waiting to start: PodInitializing
~~~

The build pod stays in the initializing state and eventually fails.

Version-Release number of selected component (if applicable):
Any OCP version (tested on OCP 4.3 and OCP 4.4)

How reproducible:
Always

Actual results:
App deployment is failing.

Expected results:
The app should be deployed successfully.


Additional info:
The documentation gives an overview of how to add a project-level firewall but does not mention which default networks must be allowed for apps to deploy properly.

Is there any other dnsName or CIDR that needs to be added? We have tried adding the ClusterCIDR, MachineCIDR, and ServiceCIDR, as well as other registry URLs that are used to access the application resources.

Comment 6 Gabe Montero 2020-05-20 13:42:15 UTC
I was able to reproduce this locally using the customer's configuration.

It did in fact get hung up in the git clone.

Based on the error messages, it may be a misconfiguration of the egressnetworkpolicy.
I am attempting to correct it now.

If I can, I'll post my working copy here.

Otherwise, I'll send to the SDN team for assistance.

Comment 7 Gabe Montero 2020-05-20 14:14:47 UTC
OK, sending to the SDN team for guidance.  What I attempted:

0) The error message from the build pod is: F0520 13:37:55.646394       1 helpers.go:115] error: fatal: unable to access 'https://github.com/sclorg/httpd-ex.git/': Failed connect to github.com:443; Connection timed out

So I suspect there is an issue with mapping "github.com:443" to the "dnsName: github.com" entry in the EgressNetworkPolicy.

1) I could not successfully create an EgressNetworkPolicy with a port after reviewing https://github.com/openshift/api/blob/master/network/v1/types.go#L201-L211 and https://docs.openshift.com/container-platform/4.4/networking/openshift_sdn/configuring-egress-firewall.html#egressnetworkpolicy-example_configuring-an-egress-firewall

I tried:

a) still using dnsName, specifying a port via "github.com/443" and "github.com:443"

b) I ran nslookup on github.com and tried various forms of that IP with segments replaced by 0, followed by /443, but could not get that to work

2) I do see in the upstream https://kubernetes.io/docs/concepts/services-networking/network-policies/ examples that allow for explicit citing of a port ... should the customer use that?

3) Lastly, I tried updating the BC so the git url was http:// based vs. https:// based to try and remove the port element.  The git operation still failed, though with a very odd error reported:

          F0520 14:07:12.827653       1 helpers.go:115] error: RPC failed; result=7, HTTP code = 0
          fatal: The remote end hung up unexpectedly


Again, to reproduce:
- oc new-app --name=httpd httpd:2.4~https://github.com/sclorg/httpd-ex.git
will initially create the build/deployment artifacts
- to launch subsequent builds, you can run 

oc start-build httpd --build-loglevel=10

running oc get pod <build pod name> -o yaml should show the error messages I noted in this update.

Also, oc logs bc/httpd 
will dump the detailed trace of the latest build.

Comment 12 Ricardo Carrillo Cruz 2020-06-16 17:40:48 UTC
Hi there

Apologies, I was out on vacation.
I will try to reproduce this tomorrow; apparently there are issues in CI right now with deploying Azure clusters.

Comment 13 Ricardo Carrillo Cruz 2020-06-22 10:25:05 UTC
I'm unable to provision clusters in Azure for the past few days.
I will raise this internally, maybe I can get one to debug this from QE.

Comment 14 Ricardo Carrillo Cruz 2020-06-23 13:52:18 UTC
I also get the issue while cloning the repo:

[ricky@ricky-laptop ~]$ oc describe pod httpd-1-build
Name:         httpd-1-build                                                                                                                                                                   
Namespace:    policy-test                                                                                                                                                                     
Priority:     0                                                                                                                                                                               
Node:         ip-10-0-164-10.us-west-2.compute.internal/10.0.164.10                                                                                                                           
Start Time:   Tue, 23 Jun 2020 15:46:10 +0200                                                                                                                                                 
Labels:       openshift.io/build.name=httpd-1                                                                                                                                                 
Annotations:  k8s.v1.cni.cncf.io/network-status:                                                                                                                                              
                [{                                                                                                                                                                            
                    "name": "openshift-sdn",                                                                                                                                                  
                    "interface": "eth0",                                                                                                                                                      
                    "ips": [                                                                                                                                                                  
                        "10.129.2.10"                                                                                                                                                         
                    ],                                                                                                                                                                        
                    "default": true,                                                                                                                                                          
                    "dns": {}                                                                                                                                                                 
                }]                                                                                                                                                                            
              k8s.v1.cni.cncf.io/networks-status:                                                                                                                                             
                [{                                                                                                                                                                            
                    "name": "openshift-sdn",                                                                                                                                                  
                    "interface": "eth0",                                                                                                                                                      
                    "ips": [                                                                                                                                                                  
                        "10.129.2.10"                                                                                                                                                         
                    ],                                                                                                                                                                        
                    "default": true,                                                                                                                                                          
                    "dns": {}                                                                                                                                                                 
                }]                                                                                                                                                                            
              openshift.io/build.name: httpd-1                                                                                                                                                
              openshift.io/scc: privileged                                                                                                                                                    
Status:       Failed                                                                                                                                                                          
IP:           10.129.2.10                                                                                                                                                                     
IPs:                                                                                                                                                                                          
  IP:           10.129.2.10                                                                                                                                                                   
Controlled By:  Build/httpd-1                                                                                                                                                                 
Init Containers:                                                                               
  git-clone:                                                                                   
    Container ID:  cri-o://2fbfcb603efc557913918e22e89aa7115c082ece040bab01368807954640423e    
    Image:         registry.svc.ci.openshift.org/ocp/4.5-2020-06-23-043949@sha256:579be1a4b551c32690f221641c5f4c18a54022e4571a45055696b3bada85fd1a
    Image ID:      registry.svc.ci.openshift.org/ocp/4.5-2020-06-23-043949@sha256:579be1a4b551c32690f221641c5f4c18a54022e4571a45055696b3bada85fd1a
    Port:          <none>                                                                      
    Host Port:     <none>                                                                      
    Command:                                                                                   
      openshift-git-clone                                                                      
    Args:                                                                                      
      --loglevel=0                                                                             
    State:      Terminated                                                                     
      Reason:   Error                                                                          
      Message:  Cloning "https://github.com/sclorg/httpd-ex.git" ...                           
error: RPC failed; result=7, HTTP code = 0                                                                                                                                                    
fatal: The remote end hung up unexpectedly

Comment 15 Ricardo Carrillo Cruz 2020-06-24 09:44:38 UTC
I think the issue is that github.com resolves to different IPs.
When you create the policy, the rule resolves to the IP at that moment, but that doesn't necessarily
mean the pod will resolve the same IP when it runs the git clone.

Egress network policies are typically used when the endpoint is well-known and doesn't change.
As a workaround, I'd run 'nslookup github.com' in a loop from within a pod in the cluster to gather the IPs,
then add those to the egress network policy as IPs, not as dnsName.
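
The sampling loop suggested above could be sketched roughly as follows. This is only a minimal sketch: `sample_dns` and `unique_addresses` are hypothetical helper names, it assumes the `host` utility is available in the pod image (nslookup output would need a different filter), and the sample count and interval are arbitrary. It only reports addresses that happened to be returned during sampling, so an address that never showed up would still be blocked.

```shell
#!/bin/sh
# Hypothetical helper: read `host -tA` style output on stdin and print
# each distinct IPv4 address once, sorted.
unique_addresses() {
    awk '/ has address / { print $NF }' | sort -u
}

# Hypothetical helper: query DNS for a name several times.
# Arguments: hostname, number of samples, seconds between samples.
sample_dns() {
    name=$1
    count=$2
    pause=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        host -tA "$name"
        i=$((i + 1))
        sleep "$pause"
    done
}

# Example (run from inside a pod on the cluster):
#   sample_dns github.com 100 5 | unique_addresses
```

Running the example from inside a pod matters because the pod's resolver may return different answers than a workstation would.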

Comment 16 Sudarshan Chaudhari 2020-07-07 03:26:11 UTC
Hello Ricardo, 


I have checked the EgressNetworkPolicy documentation and the CIDR section in the CRD; it is expected that we explicitly add a network range, not a specific IP address.

As per your update, I understand that you are asking us to add specific IP addresses to the policy.

Can you please confirm this?
https://github.com/openshift/api/blob/c3161eb8205e1ee8a63b32269ae9d7283041bbfc/network/v1/004-egressnetworkpolicy-crd.yaml#L60-L73

I tried running a loop to list the github.com IP addresses and observed that the address changed frequently, within seconds, and that the addresses came from different networks:
############################################
$ for i in {1..100}; do date;host -tA github.com; sleep 5; done
Tue Jul  7 08:44:07 IST 2020
github.com has address 140.82.114.3
Tue Jul  7 08:44:12 IST 2020
github.com has address 140.82.112.4
Tue Jul  7 08:44:17 IST 2020
github.com has address 140.82.112.4
--Skipped-Duplicate--
Tue Jul  7 08:46:09 IST 2020
github.com has address 140.82.114.3
--Skipped-Duplicate--
Tue Jul  7 08:47:10 IST 2020
github.com has address 140.82.112.4
--Skipped-Duplicate--
Tue Jul  7 08:47:30 IST 2020
github.com has address 13.234.176.102
############################################

At any given time, curl and git pull/clone connected to the same IP address:
############################################
$ GIT_CURL_VERBOSE=1 git pull origin master
* Couldn't find host github.com in the .netrc file; using defaults
* About to connect() to github.com port 443 (#0)
*   Trying 140.82.112.4...
* Connected to github.com (140.82.112.4) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
* 	subject: CN=github.com,O="GitHub, Inc.",L=San Francisco,ST=California,C=US
* 	start date: May 05 00:00:00 2020 GMT
* 	expire date: May 10 12:00:00 2022 GMT
* 	common name: github.com
* 	issuer: CN=DigiCert SHA2 High Assurance Server CA,OU=www.digicert.com,O=DigiCert Inc,C=US
> GET /openshift/csi-driver-nfs.git/info/refs?service=git-upload-pack HTTP/1.1
User-Agent: git/1.8.3.1
Host: github.com
Accept: */*
Accept-Encoding: gzip
Pragma: no-cache

< HTTP/1.1 200 OK
< Server: GitHub Babel 2.0
< Content-Type: application/x-git-upload-pack-advertisement
< Transfer-Encoding: chunked
< Expires: Fri, 01 Jan 1980 00:00:00 GMT
< Pragma: no-cache
< Cache-Control: no-cache, max-age=0, must-revalidate
< Vary: Accept-Encoding
< X-GitHub-Request-Id: 0554:037F:F7CDFC:1BD24A9:5F03E7E3
< X-Frame-Options: DENY
< 
* Connection #0 to host github.com left intact
From https://github.com/openshift/csi-driver-nfs
 * branch            master     -> FETCH_HEAD
Already up-to-date.
$ host -tA github.com
github.com has address 140.82.112.4
############################################

Can you please help clarify:
- How can we add specific IP addresses to the EgressNetworkPolicy, given that the policy does not allow an IP without a network mask?
- Can you provide a sample policy config?

Comment 17 Ricardo Carrillo Cruz 2020-07-14 07:51:57 UTC
Have you tried using /32 mask?
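
For illustration, a single address can be written as a CIDR with a /32 mask. The sketch below prints `cidrSelector` Allow stanzas in the same layout as the policy in the description. `emit_allow_rules` is a hypothetical helper, and the addresses in the usage line are just ones observed earlier in this bug, not a stable or complete list. Since egress rules are matched in order, the generated stanzas would need to go above the final `0.0.0.0/0` Deny rule.

```shell
#!/bin/sh
# Hypothetical helper: read one IPv4 address per line on stdin and print an
# EgressNetworkPolicy Allow stanza for each, using a /32 mask (single host).
emit_allow_rules() {
    while IFS= read -r ip; do
        [ -n "$ip" ] || continue      # skip blank lines
        printf '  - to:\n'
        printf '      cidrSelector: %s/32\n' "$ip"
        printf '    type: Allow\n'
    done
}

# Example:
#   printf '140.82.112.4\n140.82.114.3\n' | emit_allow_rules
```

The helper does no validation of its input, so anything that is not an IPv4 address would be passed straight into the output.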

Comment 21 Ricardo Carrillo Cruz 2020-08-03 11:48:28 UTC
Unassigning as I'm on long vacation.

Comment 24 Juan Luis de Sousa-Valadas 2020-08-07 14:37:49 UTC
I'm closing this as a duplicate. Once bug 1850060 is closed, please request a backport of that fix for the version your customer needs.

Code is similar between versions so backporting to older versions shouldn't be too problematic.

*** This bug has been marked as a duplicate of bug 1850060 ***

