Description of problem:
After upgrading to RHOCP v4.8.17, EgressIP breaks due to the default network policy.

Version-Release number of selected component (if applicable): v4.8.20

How reproducible: Always

Steps to Reproduce:
1. The issue can be reproduced using the default network policy YAML attached to the Bugzilla.

Actual results:
EgressIP does not work when the network policies are present. If we remove the policy, the application starts working using the EgressIP.

Expected results:
EgressIP should work in the presence of the default network policy.

Additional info:
As asked by the engineering team, I am raising this Bugzilla. The previous discussion took place on this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2008987
Verified in 4.10.0-0.nightly-2021-11-24-030137; EgressIP worked with the networkpolicy configured:

```
$ oc get hostsubnet
NAME                                  HOST                                  HOST IP          SUBNET          EGRESS CIDRS          EGRESS IPS
qe-huirwang1124b-x2z4m-master-0       qe-huirwang1124b-x2z4m-master-0       172.31.249.55    10.128.0.0/23
qe-huirwang1124b-x2z4m-master-1       qe-huirwang1124b-x2z4m-master-1       172.31.249.160   10.129.0.0/23
qe-huirwang1124b-x2z4m-master-2       qe-huirwang1124b-x2z4m-master-2       172.31.249.121   10.130.0.0/23
qe-huirwang1124b-x2z4m-worker-5rh7p   qe-huirwang1124b-x2z4m-worker-5rh7p   172.31.249.32    10.128.2.0/23   ["172.31.249.0/24"]   ["172.31.249.201"]
qe-huirwang1124b-x2z4m-worker-8x5c5   qe-huirwang1124b-x2z4m-worker-8x5c5   172.31.249.3     10.131.0.0/23

$ oc get netnamespace test
NAME   NETID     EGRESS IPS
test   3821487   ["172.31.249.201"]
```

```yaml
$ oc get networkpolicy -n test -oyaml
apiVersion: v1
items:
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    creationTimestamp: "2021-11-24T08:47:06Z"
    generation: 1
    managedFields:
    - apiVersion: networking.k8s.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:ingress: {}
          f:policyTypes: {}
      manager: kubectl-create
      operation: Update
      time: "2021-11-24T08:47:06Z"
    name: test-podselector-and-ipblock
    namespace: test
    resourceVersion: "33366"
    uid: b920fada-395d-4039-94cd-0d777bdc87dd
  spec:
    ingress:
    - from:
      - ipBlock:
          cidr: 10.129.2.32/32
      - ipBlock:
          cidr: 10.131.0.0/24
      - ipBlock:
          cidr: 10.128.2.38/32
    podSelector: {}
    policyTypes:
    - Ingress
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
```

```
$ oc rsh -n test test-rc-897bw
~ $ curl 172.31.249.80:9095
172.31.249.201
~ $ curl www.google.com -I
HTTP/1.1 200 OK
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Date: Wed, 24 Nov 2021 08:50:54 GMT
Server: gws
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Transfer-Encoding: chunked
Expires: Wed, 24 Nov 2021 08:50:54 GMT
Cache-Control: private
Set-Cookie: 1P_JAR=2021-11-24-08; expires=Fri, 24-Dec-2021 08:50:54 GMT; path=/; domain=.google.com; Secure
Set-Cookie: NID=511=ZchGIK5lR5eNtv-2BdT8K277sBGv9JhR9wDeAxdHpp77mT78NUzUJ6KGkt0kcwBIGZ5DX4TBqCFpPYOx0-DTX8O5_4zkDhYvzuMuhvinKeh7VV0SYsnj7oiB2bAaKrHIDsUEKAxNlJm0gUxxC8NlXsH__YoK2MUdktyR6Ob2ec8; expires=Thu, 26-May-2022 08:50:54 GMT; path=/; domain=.google.com; HttpOnly
```
We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context, and the UpgradeBlocker flag has been added to this bug. It will be removed if the assessment indicates that this should not block upgrade edges. The expectation is that the assignee answers these questions.

Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?
- example: Customers upgrading from 4.y.z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
- example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time

What is the impact? Is it serious enough to warrant blocking edges?
- example: Up to 2 minute disruption in edge routing
- example: Up to 90 seconds of API downtime
- example: etcd loses quorum and you have to restore from backup

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
- example: Issue resolves itself after five minutes
- example: Admin uses oc to fix things
- example: Admin must SSH to hosts, restore from backups, or other non-standard admin activities

Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
- example: No, it's always been like this, we just never noticed
- example: Yes, from 4.y.z to 4.y+1.z, or from 4.y.z to 4.y.z+1
> Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?

Customers using Egress IPs in namespaces with network policies applied which do not explicitly allow access from the endpoints the pods in those namespaces are trying to connect to.

> What is the impact? Is it serious enough to warrant blocking edges?

The egress-IP-matching pods will not have external connectivity unless the network policy is removed or modified to explicitly allow connectivity from those endpoints.

> How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?

The only remediation is removing the network policy or modifying it.

> Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?

All 4.8.z and 4.9.z versions are impacted, and customers upgrading from 4.7 to 4.8 will most likely hit this issue if they have this network policy configuration.
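To illustrate the "modify the policy" remediation, a minimal sketch of what an amended policy could look like, assuming the external endpoint is the 172.31.249.80 host seen in the verification output (the policy name and CIDR here are illustrative, not from the attached YAML):

```yaml
# Hypothetical remediation sketch: instead of deleting the ingress policy,
# add an ipBlock rule so return traffic from the external endpoint the
# egress pods talk to is explicitly allowed.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-endpoints   # illustrative name
  namespace: test
spec:
  podSelector: {}                # applies to all pods in the namespace
  ingress:
  - from:
    - ipBlock:
        cidr: 172.31.249.80/32   # example endpoint; adjust to your environment
  policyTypes:
  - Ingress
```

Whether a single-host CIDR or a wider range is appropriate depends on which endpoints the egress-IP pods actually need to reach.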
> All 4.8.z and 4.9.z versions are impacted, and customers upgrading from 4.7 to 4.8 will most likely hit this issue if they have this network policy configuration.

Because all 4.8.z releases have this issue and it took this long to surface, it seems only a small proportion of customers are likely to face it. Also, removing edges to all of 4.8.z would impact customers very negatively, as those edges have been present for so long. So we are not planning to block upgrade edges for this bug. However, if the bug starts impacting more clusters, we will reconsider blocking the edges.
Hi, I have tested on my OCP 4.8.10; egressIP does not work with the network policy below:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: NAMESPACE
spec:
  ingress:
  - from:
    - podSelector: {}
  podSelector: {}
  policyTypes:
  - Ingress
```

Only after modifying the networkPolicy to allow traffic from the default namespace does the egressIP start to work. Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1700431; it looks like this is not yet fixed.
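A sketch of the workaround mentioned above, additionally allowing ingress from the default namespace. This assumes the `kubernetes.io/metadata.name` namespace label (set automatically on Kubernetes 1.21+/OCP 4.8+); on clusters without it, an equivalent label on the default namespace would be needed:

```yaml
# Sketch: allow ingress both from pods in the same namespace and from the
# "default" namespace, which BZ 1700431 suggests is needed for egress IP
# return traffic on OpenShift SDN.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-and-default-namespace   # illustrative name
  namespace: NAMESPACE
spec:
  podSelector: {}
  ingress:
  - from:
    - podSelector: {}                      # same-namespace pods
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: default   # assumed label, see lead-in
  policyTypes:
  - Ingress
```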
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056