Bug 2024880
| Summary: | Egress IP breaks when network policies are applied | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Mridul Markandey <mmarkand> |
| Component: | Networking | Assignee: | Ben Bennett <bbennett> |
| Networking sub component: | openshift-sdn | QA Contact: | huirwang |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | agabriel, alchan, anbhat, huirwang, jnordell, jwennerberg, lmohanty, nsu, pbertera, sdodson, shujadha, shzhou, vrutkovs, wking |
| Version: | 4.8 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-10 16:29:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2026302 | | |
Description
Mridul Markandey
2021-11-19 11:24:42 UTC
Verified in 4.10.0-0.nightly-2021-11-24-030137. EgressIP worked with the network policy configured.

```
$ oc get hostsubnet
NAME                                  HOST                                  HOST IP          SUBNET          EGRESS CIDRS          EGRESS IPS
qe-huirwang1124b-x2z4m-master-0       qe-huirwang1124b-x2z4m-master-0       172.31.249.55    10.128.0.0/23
qe-huirwang1124b-x2z4m-master-1       qe-huirwang1124b-x2z4m-master-1       172.31.249.160   10.129.0.0/23
qe-huirwang1124b-x2z4m-master-2       qe-huirwang1124b-x2z4m-master-2       172.31.249.121   10.130.0.0/23
qe-huirwang1124b-x2z4m-worker-5rh7p   qe-huirwang1124b-x2z4m-worker-5rh7p   172.31.249.32    10.128.2.0/23   ["172.31.249.0/24"]   ["172.31.249.201"]
qe-huirwang1124b-x2z4m-worker-8x5c5   qe-huirwang1124b-x2z4m-worker-8x5c5   172.31.249.3     10.131.0.0/23

$ oc get netnamespace test
NAME   NETID     EGRESS IPS
test   3821487   ["172.31.249.201"]

$ oc get networkpolicy -n test -oyaml
apiVersion: v1
items:
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    creationTimestamp: "2021-11-24T08:47:06Z"
    generation: 1
    managedFields:
    - apiVersion: networking.k8s.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:ingress: {}
          f:policyTypes: {}
      manager: kubectl-create
      operation: Update
      time: "2021-11-24T08:47:06Z"
    name: test-podselector-and-ipblock
    namespace: test
    resourceVersion: "33366"
    uid: b920fada-395d-4039-94cd-0d777bdc87dd
  spec:
    ingress:
    - from:
      - ipBlock:
          cidr: 10.129.2.32/32
      - ipBlock:
          cidr: 10.131.0.0/24
      - ipBlock:
          cidr: 10.128.2.38/32
    podSelector: {}
    policyTypes:
    - Ingress
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

$ oc rsh -n test test-rc-897bw
~ $ curl 172.31.249.80:9095
172.31.249.201
~ $ curl www.google.com -I
HTTP/1.1 200 OK
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Date: Wed, 24 Nov 2021 08:50:54 GMT
Server: gws
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Transfer-Encoding: chunked
Expires: Wed, 24 Nov 2021 08:50:54 GMT
Cache-Control: private
Set-Cookie: 1P_JAR=2021-11-24-08; expires=Fri, 24-Dec-2021 08:50:54 GMT; path=/; domain=.google.com; Secure
Set-Cookie: NID=511=ZchGIK5lR5eNtv-2BdT8K277sBGv9JhR9wDeAxdHpp77mT78NUzUJ6KGkt0kcwBIGZ5DX4TBqCFpPYOx0-DTX8O5_4zkDhYvzuMuhvinKeh7VV0SYsnj7oiB2bAaKrHIDsUEKAxNlJm0gUxxC8NlXsH__YoK2MUdktyR6Ob2ec8; expires=Thu, 26-May-2022 08:50:54 GMT; path=/; domain=.google.com; HttpOnly
```

We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context, and the UpgradeBlocker flag has been added to this bug. It will be removed if the assessment indicates that this should not block upgrade edges. The expectation is that the assignee answers these questions.

Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?
example: Customers upgrading from 4.y.Z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time

What is the impact? Is it serious enough to warrant blocking edges?
example: Up to 2 minute disruption in edge routing
example: Up to 90 seconds of API downtime
example: etcd loses quorum and you have to restore from backup

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
example: Issue resolves itself after five minutes
example: Admin uses oc to fix things
example: Admin must SSH to hosts, restore from backups, or other non-standard admin activities

Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
example: No, it's always been like this; we just never noticed
example: Yes, from 4.y.z to 4.y+1.z or 4.y.z to 4.y.z+1

> Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?

Customers using Egress IPs in namespaces with network policies applied which do not explicitly allow access from the endpoints the pods in those namespaces are trying to connect to.

> What is the impact? Is it serious enough to warrant blocking edges?

The egress IP matching pods will not have external connectivity unless the network policy is removed or modified to explicitly allow connectivity from those endpoints.

> How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?

The only remediation is removing the network policy or modifying it.

> Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?

All 4.8.z and 4.9.z versions are impacted, and customers upgrading from 4.7 to 4.8 will most likely hit this issue if they have this network policy configuration.
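As a rough illustration of the "modify the policy" remediation (not taken from this bug report), the ingress rules of the existing policy can be extended with ipBlock entries that explicitly allow the endpoints the egress-IP pods need to reach, similar in shape to the test-podselector-and-ipblock policy used for verification above. The policy name, namespace, and CIDR below are placeholders:

```yaml
# Hypothetical sketch only: name, NAMESPACE, and the CIDR are placeholders,
# not values from this bug. The shape mirrors the verified policy above.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egressip-reply-traffic
  namespace: NAMESPACE
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}          # keep allowing traffic from the same namespace
    - ipBlock:
        cidr: 203.0.113.0/24   # placeholder: the external endpoints the pods connect to
```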
> All 4.8.z and 4.9.z versions are impacted and customers upgrading from 4.7 to 4.8 will most likely hit this issue if they have this network policy configuration

Because all 4.8.z releases have this issue and it took this long to surface, it seems only a small proportion of customers might face it. Also, removing edges to all of 4.8.z would impact customers very negatively, as those edges have been present for so long. So we are not planning to block upgrade edges for this bug. However, if the bug starts impacting more clusters, we will reconsider blocking the edge.
Hi, I have tested this on my OCP 4.8.10; the egress IP does not work with the network policy below:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: NAMESPACE
spec:
  ingress:
  - from:
    - podSelector: {}
  podSelector: {}
  policyTypes:
  - Ingress
```

Only after modifying the NetworkPolicy to allow traffic from the default namespace does the egress IP start to work (a sketch of such a policy appears after the closure note below). Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1700431; it looks like this is not yet fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
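A hedged sketch of the "allow from the default namespace" modification mentioned in the comment above (not part of the original report): it follows the common allow-from-default-namespace pattern from the OpenShift documentation and assumes the default namespace has been labeled name=default beforehand (for example with `oc label namespace default name=default`). The policy name and NAMESPACE are placeholders:

```yaml
# Hypothetical sketch of the workaround described in the comment above.
# Assumes the default namespace carries the label name=default.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-default-namespace
  namespace: NAMESPACE
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}          # traffic from pods in the same namespace
  - from:
    - namespaceSelector:
        matchLabels:
          name: default        # traffic from the default namespace
```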