Description of problem:
We have a requirement to block access to the internet and to many addresses from the application pods. For this we use an EgressNetworkPolicy (project-level firewall), created in the project every time a developer requests a project. The default EgressNetworkPolicy we are using is:
~~~
apiVersion: network.openshift.io/v1
kind: EgressNetworkPolicy
metadata:
  name: default-rules
spec:
  egress:
  - to:
      dnsName: registry.redhat.io
    type: Allow
  - to:
      dnsName: registry.connect.redhat.com
    type: Allow
  - to:
      dnsName: github.com
    type: Allow
  - to:
      dnsName: registry.access.redhat.com
    type: Allow
  - to:
      dnsName: quay.io
    type: Allow
  - to:
      dnsName: image-registry.openshift-image-registry.svc
    type: Allow
  - to:
      cidrSelector: 172.30.0.0/16
    type: Allow
  - to:
      cidrSelector: 10.128.0.0/14
    type: Allow
  - to:
      cidrSelector: 10.0.0.0/16
    type: Allow
  - to:
      cidrSelector: 0.0.0.0/0
    type: Deny
~~~
After the policy is created, if we deploy any new app, the build fails:
~~~
$ oc new-project policy-test
$ oc create -f egresspolicy.yaml
egressnetworkpolicy.network.openshift.io/default-rules created
$ oc new-app --name=httpd httpd:2.4~https://github.com/sclorg/httpd-ex.git
--> Found image 156bc0f (6 weeks old) in image stream "openshift/httpd" under tag "2.4" for "httpd:2.4"
. . . .
--> Creating resources ...
    imagestream.image.openshift.io "httpd" created
    buildconfig.build.openshift.io "httpd" created
    deploymentconfig.apps.openshift.io "httpd" created
    service "httpd" created
--> Success
    Build scheduled, use 'oc logs -f bc/httpd' to track its progress.
    Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
     'oc expose svc/httpd'
    Run 'oc status' to view your app.

$ oc get pods
NAME            READY   STATUS     RESTARTS   AGE
httpd-1-build   0/1     Init:0/2   0          2m42s
$ oc logs httpd-1-build
Error from server (BadRequest): container "sti-build" in pod "httpd-1-build" is waiting to start: PodInitializing
~~~
The pods stay in the initializing state and eventually fail.

Version-Release number of selected component (if applicable):
Any OCP version (tested on OCP 4.4 and OCP 4.3)

How reproducible:
Always

Actual results:
App deployment fails.

Expected results:
The app should be deployed successfully.

Additional info:
The documentation gives an overview of how to add a project-level firewall but does not mention which default networks must be added to make sure apps are deployed properly. Is there any other dnsName or CIDR that needs to be added? We have tried adding the ClusterCIDR, MachineCIDR, and ServiceCIDR, and tried adding other registry URLs that are used to access the application resources.
I was able to reproduce this locally using the customer's configuration. It did in fact get hung up in the git clone. Based on the error messages, it may be a misconfiguration of the egressnetworkpolicy. I am attempting to correct it now. If I can, I'll post my working copy here. Otherwise, I'll send to the SDN team for assistance.
OK, sending to the SDN team for guidance. What I attempted:

0) The error message from the build pod is

     F0520 13:37:55.646394 1 helpers.go:115] error: fatal: unable to access 'https://github.com/sclorg/httpd-ex.git/': Failed connect to github.com:443; Connection timed out

   so I suspect there is an issue with mapping "github.com:443" to the "dnsName: github.com" entry in the EgressNetworkPolicy.

1) I could not successfully create an EgressNetworkPolicy with a port after reviewing https://github.com/openshift/api/blob/master/network/v1/types.go#L201-L211 and https://docs.openshift.com/container-platform/4.4/networking/openshift_sdn/configuring-egress-firewall.html#egressnetworkpolicy-example_configuring-an-egress-firewall
   I tried:
   a) still using dnsName, specifying a port via "github.com/443" and "github.com:443"
   b) I did an nslookup of github.com and tried various flavors replacing segments of that IP with 0, followed by a /443, but could not get that to work

2) I do see in the upstream https://kubernetes.io/docs/concepts/services-networking/network-policies/ examples that allow for explicit citing of a port ... should the customer use that?

3) Lastly, I tried updating the BC so the git URL was http:// based instead of https:// based, to try to remove the port element. The git operation still failed, though with a very odd error reported:

     F0520 14:07:12.827653 1 helpers.go:115] error: RPC failed; result=7, HTTP code = 0
     fatal: The remote end hung up unexpectedly

Again, for repros:
- oc new-app --name=httpd httpd:2.4~https://github.com/sclorg/httpd-ex.git will initially create the build/deployment artifacts
- to launch subsequent builds, you can run oc start-build httpd --build-loglevel=10
- running oc get pod <build pod name> -o yaml should show the error messages I noted in this update. Also, oc logs bc/httpd will dump the detailed trace of the latest build.
Hi there. Apologies, I was out on vacation. I will try to reproduce this tomorrow; apparently there are issues in CI right now with deploying Azure clusters.
I have been unable to provision clusters in Azure for the past few days. I will raise this internally; maybe I can get one from QE to debug this.
I also get the issue while cloning the repo:

[ricky@ricky-laptop ~]$ oc describe pod httpd-1-build
Name:         httpd-1-build
Namespace:    policy-test
Priority:     0
Node:         ip-10-0-164-10.us-west-2.compute.internal/10.0.164.10
Start Time:   Tue, 23 Jun 2020 15:46:10 +0200
Labels:       openshift.io/build.name=httpd-1
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.129.2.10"
                    ],
                    "default": true,
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.129.2.10"
                    ],
                    "default": true,
                    "dns": {}
                }]
              openshift.io/build.name: httpd-1
              openshift.io/scc: privileged
Status:       Failed
IP:           10.129.2.10
IPs:
  IP:           10.129.2.10
Controlled By:  Build/httpd-1
Init Containers:
  git-clone:
    Container ID:  cri-o://2fbfcb603efc557913918e22e89aa7115c082ece040bab01368807954640423e
    Image:         registry.svc.ci.openshift.org/ocp/4.5-2020-06-23-043949@sha256:579be1a4b551c32690f221641c5f4c18a54022e4571a45055696b3bada85fd1a
    Image ID:      registry.svc.ci.openshift.org/ocp/4.5-2020-06-23-043949@sha256:579be1a4b551c32690f221641c5f4c18a54022e4571a45055696b3bada85fd1a
    Port:          <none>
    Host Port:     <none>
    Command:
      openshift-git-clone
    Args:
      --loglevel=0
    State:      Terminated
      Reason:   Error
      Message:  Cloning "https://github.com/sclorg/httpd-ex.git" ...
error: RPC failed; result=7, HTTP code = 0
fatal: The remote end hung up unexpectedly
I think the issue is that github.com resolves to different IPs. When you create the policy, the rule resolves to the IP at that moment, but that doesn't necessarily mean the git clone in the pod will resolve to the same IP when it runs. Egress network policies are typically used when the endpoint is well known and doesn't change. As a workaround, I'd run 'nslookup github.com' in a loop from within a pod in the cluster to gather the IPs, then add those to the egress network policy as IPs, not as dnsName.
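A minimal sketch of the gathering step, for illustration only. It assumes lookups are done with `host -tA` rather than nslookup, because `host` prints one `<name> has address <IP>` line per address, which is easier to filter. The function name `unique_addresses` is hypothetical; it only filters text on stdin and performs no lookups itself.

```shell
#!/bin/sh
# Hypothetical helper: read `host -tA <name>` output on stdin and print
# each distinct IPv4 address once, sorted.
unique_addresses() {
    # Keep only the "... has address <IP>" lines and print their last field.
    awk '/ has address / { print $NF }' | sort -u
}
```

It could be fed from a lookup loop run inside a pod, for example `for i in $(seq 1 100); do host -tA github.com; sleep 5; done | unique_addresses`. The resulting list is only a snapshot of what was observed during the run.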
Hello Ricardo,

I have checked the EgressNetworkPolicy documentation and the CIDR section in the CRD. It expects a network range to be added explicitly, not a specific IP address. As per your update, I understand that you are asking us to add specific IP addresses to the policy. Can you please confirm this?

https://github.com/openshift/api/blob/c3161eb8205e1ee8a63b32269ae9d7283041bbfc/network/v1/004-egressnetworkpolicy-crd.yaml#L60-L73

I ran a loop listing the github.com IP addresses and observed that the address changed frequently, within seconds, and that the addresses came from different networks:

############################################
$ for i in {1..100}; do date;host -tA github.com; sleep 5; done
Tue Jul 7 08:44:07 IST 2020
github.com has address 140.82.114.3
Tue Jul 7 08:44:12 IST 2020
github.com has address 140.82.112.4
Tue Jul 7 08:44:17 IST 2020
github.com has address 140.82.112.4
--Skipped-Duplicate--
Tue Jul 7 08:46:09 IST 2020
github.com has address 140.82.114.3
--Skipped-Duplicate--
Tue Jul 7 08:47:10 IST 2020
github.com has address 140.82.112.4
--Skipped-Duplicate--
Tue Jul 7 08:47:30 IST 2020
github.com has address 13.234.176.102
############################################

At a given point in time, curl and git pull/clone used the same IP address:

############################################
$ GIT_CURL_VERBOSE=1 git pull origin master
* Couldn't find host github.com in the .netrc file; using defaults
* About to connect() to github.com port 443 (#0)
*   Trying 140.82.112.4...
* Connected to github.com (140.82.112.4) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*   subject: CN=github.com,O="GitHub, Inc.",L=San Francisco,ST=California,C=US
*   start date: May 05 00:00:00 2020 GMT
*   expire date: May 10 12:00:00 2022 GMT
*   common name: github.com
*   issuer: CN=DigiCert SHA2 High Assurance Server CA,OU=www.digicert.com,O=DigiCert Inc,C=US
> GET /openshift/csi-driver-nfs.git/info/refs?service=git-upload-pack HTTP/1.1
User-Agent: git/1.8.3.1
Host: github.com
Accept: */*
Accept-Encoding: gzip
Pragma: no-cache

< HTTP/1.1 200 OK
< Server: GitHub Babel 2.0
< Content-Type: application/x-git-upload-pack-advertisement
< Transfer-Encoding: chunked
< Expires: Fri, 01 Jan 1980 00:00:00 GMT
< Pragma: no-cache
< Cache-Control: no-cache, max-age=0, must-revalidate
< Vary: Accept-Encoding
< X-GitHub-Request-Id: 0554:037F:F7CDFC:1BD24A9:5F03E7E3
< X-Frame-Options: DENY
<
* Connection #0 to host github.com left intact
From https://github.com/openshift/csi-driver-nfs
 * branch            master     -> FETCH_HEAD
Already up-to-date.
$ host -tA github.com
github.com has address 140.82.112.4
############################################

Can you please help clarify:
- How can we add specific IP addresses to the EgressNetworkPolicy, given that the policy does not allow an IP without a network mask?
- Can you provide a sample config of the policy?
Have you tried using a /32 mask?
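To illustrate the /32 suggestion, here is a rough sketch that turns a list of gathered addresses into single-host rules. The helper name `render_allow_rules` is hypothetical. It only prints list items meant to sit under `spec.egress`, above the final 0.0.0.0/0 Deny rule; the rest of the policy still has to be written by hand. Since the github.com addresses were seen to rotate, any such list may go stale.

```shell
#!/bin/sh
# Hypothetical helper: read one IPv4 address per line on stdin and print
# an Allow rule with a /32 cidrSelector (a single host) for each.
render_allow_rules() {
    while IFS= read -r ip; do
        [ -n "$ip" ] || continue   # skip blank lines
        printf '  - to:\n      cidrSelector: %s/32\n    type: Allow\n' "$ip"
    done
}
```

For example, `printf '140.82.112.4\n140.82.114.3\n' | render_allow_rules` prints two Allow rules, one per address, indented to match the `egress:` list in the policy from the description.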
Unassigning as I'm on long vacation.
I'm closing this as a duplicate. Once this one is closed, please request a backport for the version your customer needs. The code is similar between versions, so backporting to older versions shouldn't be too problematic.

*** This bug has been marked as a duplicate of bug 1850060 ***