Bug 1860774 - CSRs for vSphere egress nodes were not approved automatically during cert renewal
Summary: CSRs for vSphere egress nodes were not approved automatically during cert renewal
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Joel Speed
QA Contact: Milind Yadav
URL:
Whiteboard:
Duplicates: 1901873 1917690 1942120 (view as bug list)
Depends On:
Blocks: 2024216
 
Reported: 2020-07-27 06:05 UTC by Avinash Bodhe
Modified: 2023-10-06 21:15 UTC
CC List: 36 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Egress IPs on vSphere were picked up by the vSphere cloud provider within the kubelet. These were not expected by the CSR approver.
Consequence: Nodes with egress IPs would not have their CSR renewals approved.
Fix: Allow the CSR approver to account for egress IPs.
Result: Nodes with egress IPs on vSphere SDN clusters now continue to function and have valid CSR renewals.
Clone Of:
Cloned As: 2024216 (view as bug list)
Environment:
Last Closed: 2022-03-12 04:34:40 UTC
Target Upstream Version:
Embargoed:


Attachments
CSR List (7.57 KB, text/plain)
2020-07-27 06:05 UTC, Avinash Bodhe
machine-approver-controller-pod.log (2.08 MB, text/plain)
2020-07-27 06:07 UTC, Avinash Bodhe
Pending CSR (127.40 KB, text/plain)
2020-07-27 06:08 UTC, Avinash Bodhe


Links
Github openshift/cluster-machine-approver pull 137: Bug 1860774: Allow fallback to serving cert renewal accounting for egress IPs on SDN (open, last updated 2021-10-18 13:22:29 UTC)
Red Hat Knowledge Base (Solution) 5251441 (last updated 2021-06-23 14:27:42 UTC)
Red Hat Product Errata RHSA-2022:0056 (last updated 2022-03-12 04:34:54 UTC)

Description Avinash Bodhe 2020-07-27 06:05:42 UTC
Created attachment 1702472 [details]
CSR List

Description of problem:
CSRs for egress nodes are not approved automatically during renewal.
The cluster has two dedicated nodes for egress, so all egress IPs are assigned to those nodes as additional IPs.
When a node requests a new CSR during renewal, the CSR is not honored because the SAN names in the previously approved certificate do not match the SAN names in the new CSR; the egress IPs differ (added/removed). The certificates then have to be approved manually.

This issue only occurs when egress IPs are assigned to a node, or when the additional (egress) IPs change compared to the previously approved certificate. The logs show that the SAN name check between the current and new certificate fails because of the change in additional IPs.

How reproducible:

Steps to Reproduce:
1. Deploy a 4.4 cluster
2. Wait until the deployment completes successfully
3. Verify all nodes are in Ready state
4. Renew the certificates
5. Verify the CSR status: oc get csr

Actual results:
The renewal CSRs are not auto-approved, but only for egress nodes that have additional egress IPs.

Expected results:
The renewal CSRs should be auto-approved for the egress nodes as well.

Additional info:
Please see the attached logs and the output below from the "machine-approver-controller-pod.log" log:

I0723 07:23:45.321710       1 csr_check.go:183] Falling back to machine-api authorization for x18sospsee102a.iocp1e.uat.dbs.com
I0723 07:23:45.321727       1 main.go:181] CSR csr-x7sjn not authorized: No target machine for node "x18sospsee102a.iocp1e.uat.dbs.com"
I0723 07:23:45.321745       1 main.go:217] Error syncing csr csr-x7sjn: No target machine for node "x18sospsee102a.iocp1e.uat.dbs.com"
I0723 07:25:07.242173       1 main.go:146] CSR csr-x7sjn added
I0723 07:25:07.298395       1 csr_check.go:418] retrieving serving cert from x18sospsee102a.iocp1e.uat.dbs.com (10.93.84.2:10250)
I0723 07:25:07.303832       1 csr_check.go:163] Found existing serving cert for x18sospsee102a.iocp1e.uat.dbs.com
W0723 07:25:07.304101       1 csr_check.go:172] Could not use current serving cert for renewal: CSR Subject Alternate Name values do not match current certificate
W0723 07:25:07.304126       1 csr_check.go:173] Current SAN Values: [x18sospsee102a.iocp1e.uat.dbs.com 10.93.84.2 10.93.84.35], CSR SAN Values: [x18sospsee102a.iocp1e.uat.dbs.com 10.93.84.15 10.93.84.2]
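
For reference, a minimal sketch of how to compare the two SAN sets described in these logs (the CSR name and node IP are taken from the log excerpt above; standard oc and openssl commands):

# Decode the pending CSR and print the SANs the kubelet is requesting
oc get csr csr-x7sjn -o jsonpath='{.spec.request}' | base64 -d \
  | openssl req -noout -text | grep -A1 'Subject Alternative Name'

# Compare with the SANs on the certificate the kubelet is currently serving
# (10250 is the kubelet serving port)
echo | openssl s_client -connect 10.93.84.2:10250 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'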

Comment 1 Avinash Bodhe 2020-07-27 06:07:56 UTC
Created attachment 1702473 [details]
machine-approver-controller-pod.log

Comment 2 Avinash Bodhe 2020-07-27 06:08:42 UTC
Created attachment 1702474 [details]
Pending CSR

Comment 4 Alberto 2020-07-29 09:57:07 UTC
> I0723 07:23:45.321710       1 csr_check.go:183] Falling back to machine-api authorization for x18sospsee102a.iocp1e.uat.dbs.com
> I0723 07:23:45.321727       1 main.go:181] CSR csr-x7sjn not authorized: No target machine for node "x18sospsee102a.iocp1e.uat.dbs.com"
> I0723 07:23:45.321745       1 main.go:217] Error syncing csr csr-x7sjn: No target machine for node "x18sospsee102a.iocp1e.uat.dbs.com"

I assume this is a UPI cluster as the machine approver fallback can't find a backing machine either?

>This issue only occurs when there are egress IP's assigned to node or the additional IP's(egress) changes compared to previously approved cert

This seems like a fair assumption for the approver to make, and not something it should explicitly approve. When that happens, you should get an alert for too many pending CSRs so you can approve them manually. This is, by design, the gating criterion for renewals on UPI; relaxing it would defeat its purpose.

Avinash, I'm closing this for now. Please feel free to reopen if my reasoning above does not make sense or you have any further questions.

Comment 5 Leonid Titov 2020-09-16 13:58:19 UTC
Hello,

As I have received a similar question, I feel it is necessary to reopen this bug.

The described issue happens when a node presents additional IP addresses in its Subject Alternative Name values. These additional IP addresses are assigned to the node by a cluster operator, so the assignment can be verified to confirm that they are legitimate IPs for that specific node.

Failures of CSR auto-approval in such situations bring operational overhead and introduce a security risk due to possible human error: a cluster admin can miss an illegitimate node in the pending CSR queue and approve it by accident.

So I kindly ask you to review this bug report and introduce CSR auto-approval for egress-IP-enabled nodes.

Thanks in advance,
Leonid.

Comment 6 Michael Gugino 2020-09-21 23:45:35 UTC
Moving this to the node team.  If the kubelet is going to pick up new IPs in this scenario, it needs to either exclude them from the CSR request or provide some interface for us to interact with.

Comment 7 Ryan Phillips 2020-10-23 15:17:48 UTC
As Alberto mentioned, approving these CSRs is a manual process, and an administrator has to be diligent about what they are approving.

Closing as NOTABUG.

Comment 8 Michael Gugino 2020-10-23 16:24:56 UTC
This is most definitely a bug.  If it's an IPI install, there is no expectation that users ever manually approve CSRs.

Comment 9 Ryan Phillips 2020-10-23 18:15:52 UTC
Alberto: Do you think we can make the certificate approver allow for more IP addresses?

Comment 10 Joel Speed 2020-10-26 11:51:00 UTC
> These additional IP addresses are assigned to that node by a cluster operator, so this assignment can be verified 

@ltitov Do you know which operator is assigning these IPs? Are they noted anywhere in a status that we could check? I know the Machine has the Node's IP addresses synced to its status; I'm not sure whether we keep that up to date or not.

I guess it would be safe to still approve the CSR if the SANs had changed but we could verify that the IP address list was up to date (checked against the cloud provider, ideally via the status of some object).

Comment 11 Michael Gugino 2020-10-26 13:43:39 UTC
Sending this back to kubelet team to determine why these egress IPs are ending up in certs.

Comment 12 Leonid Titov 2020-11-02 16:41:45 UTC
@jspeed I suppose it is the SDN plugin that assigns the egress IP:

- https://github.com/openshift/sdn/blob/7150d119798d915068b6f896ff0d3ffa3b4065f5/pkg/network/node/egressip.go#L161
- https://github.com/openshift/ovn-kubernetes/blob/a7a4c587be818bf1e5a88e555172a32de9d19ca3/go-controller/pkg/ovn/egressip.go#L35

and I expect something similar is implemented in the other plugins.

Thanks!

Comment 13 Ryan Phillips 2020-11-03 20:00:50 UTC
The CSR code within Kubernetes and the kubelet dynamically sets the egress and local IP addresses within the CSR request. The machine approver needs to be resilient to the updated addresses and allow all the IPs within the CSR.

https://github.com/kubernetes/kubernetes/blob/4fdfbc718cfd854178c037e0609a1917b4647841/pkg/kubelet/kubelet_node_status.go#L516
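
As an illustration (a sketch, not part of the original report beyond the node name taken from the logs above), the extra IPs surface in node.status.addresses, which is what feeds the SAN list in the kubelet-serving CSR:

# Print the addresses the kubelet has published for the affected node
oc get node x18sospsee102a.iocp1e.uat.dbs.com \
  -o jsonpath='{range .status.addresses[*]}{.type}{"\t"}{.address}{"\n"}{end}'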

Comment 14 Eric Rich 2020-11-03 23:19:17 UTC
Relevant product docs on adding egress IPs: https://docs.openshift.com/container-platform/4.6/networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.html#nw-egress-ips-node-assignment_configuring-egress-ips-ovn
It's important to note this, as docs improvements or callouts may be needed to improve the current customer experience for anyone using this feature.

Comment 15 Michael Gugino 2020-11-04 00:17:16 UTC
These egress IPs are not meant for communicating with the node's server ports; instead, they are for pod traffic.  As such, I don't think they should be included in the node's CSR request.

Comment 16 Ryan Phillips 2020-11-10 16:41:11 UTC
Egress IPs are new in 4.6: https://docs.openshift.com/container-platform/4.6/networking/ovn_kubernetes_network_provider/configuring-egress

The Kubelet should not have the Egress IPs in the CSR, but the kubelet does not have a way to filter out egress IPs. Since OVN is using a CRD (EgressIP), the Kubelet would need custom code to list the Egress IPs and filter them out of the CSR request.

I do see a short term path forward here for the Kubelet, and it will require a change to the machine approver.
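
For reference, a sketch of where the egress IP assignments can be seen from the API, depending on the network plugin (resource names as discussed in this report; output formats may vary by version):

# OVN-Kubernetes: egress IPs are declared via the EgressIP CRD
oc get egressip -o yaml

# openshift-sdn: per-node HostSubnet objects carry egressIPs/egressCIDRs
oc get hostsubnet -o custom-columns=NAME:.metadata.name,EGRESSIPS:.egressIPs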

Comment 17 Joel Speed 2020-11-10 17:34:16 UTC
@rphillips Could you expand on your idea and the changes you expect to see in the cluster-machine-approver so that we know what to expect?

Comment 18 Ryan Phillips 2020-12-01 14:33:13 UTC
The kubelet does not have the means to filter out the egress IPs. It is not ideal for the egress IPs to be in the CSR, but the machine-approver will need to allow the IP update for the time being.

Comment 19 Ryan Phillips 2020-12-01 14:51:39 UTC
Upstream issue: https://github.com/kubernetes/kubernetes/issues/96981

Comment 20 Michael McCune 2020-12-04 19:01:41 UTC
Adding the UpcomingSprint tag; this is still under investigation.

Comment 21 Michael Gugino 2020-12-04 19:29:01 UTC
I disagree that the kubelet does not have the means to filter out these IPs.  Since these are serving certs, the kubelet's client is already part of the cluster, therefore you could use the API or some other means to discover this information.

An alternative would be to have whatever is configuring these IPs indicate somewhere/somehow that they are egress IPs so the kubelet can filter them.  E.g., the network component that is changing the interfaces adds all egress IPs to a file in /var/lib/kubernetes/egressips or similar.

Comment 22 Michael Gugino 2020-12-15 15:03:39 UTC
*** Bug 1901873 has been marked as a duplicate of this bug. ***

Comment 26 Alexander Constantinescu 2021-02-24 14:27:49 UTC
FWIW: I haven't seen this bug earlier, but I just want to point out that __I think__ this only applies to openshift-sdn (though I see OVN-Kubernetes mentioned in several comments). OVN-Kubernetes never adds an egress IP to the node's interfaces (any egress IP configuration made by OVN-Kubernetes is only done within the OVN "dataplane"), so I don't see how it would be picked up by the kubelet. 

Adding the egress IPs to the interfaces, as in `ip addr add $EGRESS_IP dev eth0` is a technique openshift-sdn uses. Now, openshift-sdn has done this since 3.7, so I am not sure why it is a problem today and in version >= 4.4...but I just wanted to raise this point.
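
A quick way to confirm whether openshift-sdn has attached an egress IP directly to the host interface on a given node (a sketch; <node> is a placeholder and the interface name may differ per cluster):

# List IPv4 addresses on the host's interfaces from a node debug shell
oc debug node/<node> -- chroot /host ip -4 addr show
# An egress IP managed by openshift-sdn shows up as a secondary address on
# the same interface as the node's primary IP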

Comment 27 Michael Gugino 2021-02-24 16:37:25 UTC
(In reply to Alexander Constantinescu from comment #26)
> FWIW: I haven't seen this bug earlier, but I just want to point out that __I
> think__ this only applies to openshift-sdn (though I see OVN-Kubernetes
> mentioned in several comments). OVN-Kubernetes never adds an egress IP to
> the node's interfaces (any egress IP configuration made by OVN-Kubernetes is
> only done within the OVN "dataplane"), so I don't see how it would be picked
> up by the kubelet. 
> 
> Adding the egress IPs to the interfaces, as in `ip addr add $EGRESS_IP dev
> eth0` is a technique openshift-sdn uses. Now, openshift-sdn has done this
> since 3.7, so I am not sure why it is a problem today and in version >=
> 4.4...but I just wanted to raise this point.

3.x uses a much different CSR approval process.

In 4.x, if the IP shows up on the cloud provider, we'll add it to the machine record and there will be no problem.  This probably isn't the case for all platforms.

Perhaps adding a virtual interface like eth0:1 would prevent the kubelet from picking up these superfluous IPs?

Comment 28 Alexander Constantinescu 2021-02-24 20:10:59 UTC
> Perhaps adding a virtual interface like eth0:1 would prevent to kubelet from picking up this superfluous IPs?

I will do some experiments and see whether that might work with egress IPs. But openshift-sdn SNATs egress packets on nodes using iptables on eth0, and those packets come from its tun0 interface. I am thus not sure 1) whether it will work, or 2) what the extent of such a refactor would be for openshift-sdn.

I don't have the kubelet knowledge, nor knowledge of the CSR process, to tell completely, but this bug seems somewhat related to https://bugzilla.redhat.com/show_bug.cgi?id=1872632 (where OVN was setting up additional interfaces and IPs on the node, which were later picked up by the kubelet on bare-metal deployments and caused `node.status.addresses[?(@.type=="InternalIP")].address` to be the OVN-configured IP instead of the correct node IP). I'm not sure whether the kubelet mechanism used for setting that address field is related to the "kubelet filtering" mentioned above as needed for the CSR approval process, but: could we re-use that solution for this problem?

If this problem is completely unrelated to that one, then feel free to overlook this comment.

From [1]:

> it seems like there needs to be some way for platform-none UPI users to override an incorrect guess by kubelet regarding the node IP. Currently we allowing overriding it (based on nodeip-configuration.service) in bare metal IPI, openstack, and vsphere, but not anywhere else.

[1]: https://github.com/openshift/machine-config-operator/pull/2088

Comment 30 Dan Winship 2021-03-04 17:36:22 UTC
The problem here is that we never meant to claim that egress IPs work or were supported on clouds, but some people discovered that you could make it more or less work by adding IPs to the node behind OCP's back using AWS/Azure APIs directly. But then kubelet sees those new IPs being reported by the CloudProvider and so adds them to node.Status.Addresses, which results in them being added to the CSR, etc.

So, the problem (AFAICT) is that people are using egress IPs in an unsupported way (ie, on clouds), and it turns out that this doesn't work. Neither kubelet nor the CSR approver is doing anything wrong.

Comment 31 Dan Winship 2021-03-04 18:04:18 UTC
(In reply to Michael Gugino from comment #27)
> In 4.x, if the IP shows up on the cloud provider, we'll add it to the
> machine record and there will be no problem.  This probably isn't the case
> for all platforms.

Oh, wait, that seems to break my theory; I was assuming that the problem was that they added the IP in the cloud provider, and it thus got added to the Node but NOT the Machine record. Hm...

> Perhaps adding a virtual interface like eth0:1 would prevent to kubelet from
> picking up this superfluous IPs?

No, when using AWS, GCP, Azure, or OpenStack, kubelet copies the list of IPs from the cloud provider to node.status.addresses, and I don't think it even bothers to look at the actual IPs on the node.

When using bare metal, kubelet puts exactly one IP (or two dual-stack IPs) into node.status.addresses regardless of how many IPs and how many interfaces there are.

IIRC, vSphere's behavior is that it publishes some or all of the actual interface IPs to node.status.addresses. I'm not sure what sets the Machine addresses under vSphere, though...

Comment 32 Dan Winship 2021-03-04 18:05:46 UTC
What platform are the customers who reported the bug using? (eg, AWS, vSphere, bare metal. Also IPI vs UPI)

Comment 33 Michael Gugino 2021-03-04 19:22:48 UTC
The CSR approver doesn't consider node.status.addresses, only what the machine-api discovers via the cloud.  The reason for this is security related: if someone hijacks the kubelet and adds nefarious entries to node.status.addresses, we don't want to hand out a certificate for hijacked-domain.com.  If we're going to look at node.status.addresses, then we might as well look at nothing and approve all serving cert requests as long as the username matches the kubelet (this is injected by the admission controller for CSRs).
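
To see the discrepancy the approver acts on, one can compare the addresses recorded by the machine-api with those published by the kubelet (a sketch; <machine> and <node> are placeholders):

# Addresses the machine-api discovered via the cloud for a machine
oc -n openshift-machine-api get machine <machine> \
  -o jsonpath='{range .status.addresses[*]}{.type}{"\t"}{.address}{"\n"}{end}'

# Addresses the kubelet published on the corresponding node
oc get node <node> \
  -o jsonpath='{range .status.addresses[*]}{.type}{"\t"}{.address}{"\n"}{end}'

# Any IP present only on the node side (such as an egress IP) is one the
# approver cannot corroborate against the cloud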

Comment 34 Leonid Titov 2021-03-04 20:08:13 UTC
@danw My customer reported this on VMware UPI. I don't see anything unsupported in their setup; they are just not getting auto-approved CSRs for nodes where an egress IP was allocated.

Comment 35 Dan Winship 2021-03-04 21:02:11 UTC
Blah. OK, so on VMware, Node.Status.Addresses (and thus the kubelet CSR) will include all local IPs (or more specifically, all local IPs that are on a network interface that was created by VMware). So if the Machine object only includes IPs known to the VMware API, then it would not match the CSR when there were manually-added IPs (such as egress IPs). I wonder how this never blew up before now?

I'm not sure how to fix this... I'm also not sure why the vSphere CloudProvider behaves this way; it has code to get the list of IPs from the VMware API instead of getting them from the network interfaces, but it intentionally doesn't use that code for this case. But if we think that behavior doesn't make sense in OCP (given that it will make CSR approval fail, and we don't want to change that) then maybe we should patch the vSphere CloudProvider to work the "right" way in OCP?

Comment 36 Michael Gugino 2021-03-05 01:51:41 UTC
Resolvable hostnames are a requirement in OCP 4.x.  The kubelet should only use its hostname in the CSR request; there's no reason for any other data to be there.  All communication should be happening behind that hostname.

In the case of UPI clusters, there might not be a machine-api Machine object to reference.  We built some logic into the CSR approver to approve any serving cert renewal that has identical SANs to the previous serving cert.

Comment 37 Michael Gugino 2021-03-25 14:38:47 UTC
A second virtual NIC is causing the kubelet to add the second NIC's IP address to the CSR request.  There's absolutely no way for the machine-api to detect this.

https://bugzilla.redhat.com/show_bug.cgi?id=1942120

Comment 38 Elana Hashman 2021-03-26 23:33:33 UTC
Can someone provide a summary of possible fixes with pros/cons? This discussion is somewhat difficult for me to follow as I lack the context.

Comment 39 Michael Gugino 2021-03-27 13:20:54 UTC
Since we now have evidence the kubelet is picking up secondary NICs during CSR requests ( https://bugzilla.redhat.com/show_bug.cgi?id=1942120 ), something the cluster would have no way to detect, the only reasonable fix is to patch the kubelet and only provide the nodename.  This could be done with a CLI flag easily.  DNS is a mandatory requirement for OCP 4, so it should work in 100% of cases.

Comment 40 Elana Hashman 2021-04-05 21:12:04 UTC
This would require an upstream fix. Assigned https://github.com/kubernetes/kubernetes/issues/96981

Kubernetes development is currently closed for the 1.21 freeze but should reopen sometime next week or so.

Comment 42 Elana Hashman 2021-04-15 22:31:38 UTC
I think this is a feature request. The addition of a new feature in 4.6 (egress IPs) caused unexpected side effects that have led to this need. I believe the kubelet is working as intended.

Filed an RFE: https://issues.redhat.com/browse/RFE-1793

Please check that over to ensure I've filed the request accurately.

Closing this bug.

Comment 43 Leonid Titov 2021-04-16 06:59:17 UTC
@ehashman Sorry, I would insist on keeping this as a bug. There is nothing here specific to egress IPs in OVN-Kubernetes, which were introduced in 4.6. The bug report was opened for OCP 4.4 and OpenShift SDN, and egress IP functionality also exists in OCP 3.

I'm pretty sure this bug can be reproduced in the very first OCP 4.x release where CSR auto-approval was introduced (I guess it was 4.2), and the auto-approval procedure is not working as expected for nodes where an egress IP is allocated.

Please re-evaluate this BZ, considering those comments.

Thanks!

Comment 44 Alexander Constantinescu 2021-04-16 08:11:39 UTC
Also, this bug is not only caused by egress IPs, as comment 39 points out.

Comment 48 Michael Gugino 2021-04-22 14:38:39 UTC
*** Bug 1942120 has been marked as a duplicate of this bug. ***

Comment 54 Elana Hashman 2021-05-10 20:11:13 UTC
Dan, Ryan and I met today and agreed that as a short term solution, we will need to fix this by making the vSphere Cloud Provider not return the egress IPs, and backport that fix to earlier versions of OpenShift.

For a longer term, upstream solution, we won't be able to get a fix into 1.22, so we will go through prioritization for the next cycle and propose the possible solutions to upstream Kubernetes (make kubelet ignore most of the addresses returned from the CloudProvider and only set 1-2 IPs in node.Status.Addresses, don't change node.Status.Addresses but make CSR generation use 1-2 IPs, add a config option to not include IPs in CSRs).

We also considered making a change to the certificate approver to be more flexible with automatic approvals (e.g. approve all IPs on the subnet in a CSR for the kubelet), but this has security implications, so we would rather not go down that route without a security review.


Transferring this bug to cloud provider for the short-term fix. Long term work is tracked in https://issues.redhat.com/browse/RFE-1793

Comment 55 Michael Gugino 2021-05-10 20:20:40 UTC
We're not patching the cloud provider; this affects other platforms.  Sending back to kubelet.

Comment 57 Dan Winship 2021-05-11 13:25:35 UTC
So to clarify, the problem is that this is going to require a KEP-level change to kubelet, and we don't want to fork kubelet, and we don't want to wait a year to get the fix. So it seemed like the best plan was to make vSphere not include the egress IPs, while the change to kubelet gets figured out. That's also likely to be a local fork, but it's more like a bugfix and doesn't add API, etc.

(There are two possible vSphere fixes: one is to make it return only IPs that are known to the vSphere API, which brings it into alignment with all the other cloud providers, so maybe upstream would want that fix. The simpler fix would be to add a local hack to the vSphere provider in OCP to recognize and ignore egress IPs specifically, which is easy; I could write the PR for that.)

Comment 60 Joel Speed 2021-06-25 13:27:43 UTC
*** Bug 1917690 has been marked as a duplicate of this bug. ***

Comment 72 Christian Affolter 2021-09-14 15:00:40 UTC
Is there any update on this? We're still facing the same issue on 4.7. Thanks a lot!

Comment 75 Joel Speed 2021-09-22 11:35:44 UTC
An update from a customer case I'm working on, with a potential (unverified) workaround that might be useful to others

> From the logs, we can see that the CSR approver is correctly fetching the current certificate from the hosts. It has a mechanism built in so that if the CSR requests an identical set of IPs/SANs, it will auto approve because this isn't an escalation of privilege.
> So, in this case, because the egress IPs were added after the first serving certificates were created, the CSR approver is rightfully saying that this CSR doesn't match the previous certificate, as such kubelet is asking for more privilege and therefore, shouldn't be auto-approved.
> If the customer were to manually validate the current set of CSRs, and the egress IPs were to remain the same going forward, then the CSRs would be auto-approved in the future, as each new CSR would then match the existing issued certificate.
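
A sketch of that one-time manual step (verify the requesting node and the requested SANs are legitimate before approving anything; <csr-name> is a placeholder):

# List CSRs that are still pending (no status set yet)
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}'

# Approve a specific CSR after reviewing its requested SANs
oc adm certificate approve <csr-name>

Once a certificate with the new SAN set (including the egress IPs) has been issued, subsequent renewals requesting the same SANs should be auto-approved again, as described above.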

Comment 103 Milind Yadav 2021-11-11 04:29:32 UTC
Validated on - 
[miyadav@miyadav ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-11-10-212548   True        False         110m    Cluster version is 4.10.0-0.nightly-2021-11-10-212548

Steps :
[miyadav@miyadav ~]$ oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.jima1111a.qe.devcluster.openshift.com:6443".

[miyadav@miyadav ~]$ oc create ns testproject
namespace/testproject created

[miyadav@miyadav ~]$ oc patch netnamespace testproject --type=merge -p '{"egressIPs": ["10.128.2.1"]}'
netnamespace.network.openshift.io/testproject patched

[miyadav@miyadav ~]$ oc get netnamespace testproject 
NAME          NETID     EGRESS IPS
testproject   1280607   ["10.128.2.1"]

[miyadav@miyadav ~]$ oc get hostsubnet
NAME              HOST              HOST IP         SUBNET          EGRESS CIDRS   EGRESS IPS
compute-0         compute-0         172.31.246.45   10.131.0.0/23                  
compute-1         compute-1         172.31.246.26   10.128.2.0/23                  
control-plane-0   control-plane-0   172.31.246.25   10.130.0.0/23                  
control-plane-1   control-plane-1   172.31.246.44   10.128.0.0/23                  
control-plane-2   control-plane-2   172.31.246.48   10.129.0.0/23    
              
[miyadav@miyadav ~]$ oc patch hostsubnet compute-1 --type=merge -p '{"egressCIDRs": ["10.128.2.1/24"]}' 
hostsubnet.network.openshift.io/compute-1 patched

[miyadav@miyadav ~]$ oc get hostsubnet
NAME              HOST              HOST IP         SUBNET          EGRESS CIDRS        EGRESS IPS
compute-0         compute-0         172.31.246.45   10.131.0.0/23                       
compute-1         compute-1         172.31.246.26   10.128.2.0/23   ["10.128.2.1/24"]   
control-plane-0   control-plane-0   172.31.246.25   10.130.0.0/23                       
control-plane-1   control-plane-1   172.31.246.44   10.128.0.0/23                       
control-plane-2   control-plane-2   172.31.246.48   10.129.0.0/23  
                     
[miyadav@miyadav ~]$ oc get csr
No resources found

[miyadav@miyadav ~]$ oc debug node/compute-1
Starting pod/compute-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.246.26
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# ifconfig | grep 10.128.2.1
        inet 10.128.2.1  netmask 255.255.254.0  broadcast 10.128.3.255


Additional Info:
Below are machine-approver logs - 
https://url.corp.redhat.com/machine-approverlogs

@Joel, thanks for reviewing the steps earlier. Please review the logs as well. I am not sure whether it is expected that no CSR is generated, or whether a CSR should be generated and approved immediately.

Comment 105 Joel Speed 2021-11-11 10:22:16 UTC
Ahh, I can see the issue with the test: you need to use an IP in the host IP range for this issue to manifest. So if you could try again with the egress IP set to something like 172.31.246.200, it should end up on the same interface as the host IP and cause the issue to manifest.
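
A sketch of the adjusted test step, assuming the host network is 172.31.246.0/24 as shown in the hostsubnet output above (the exact IP is only an example):

# Assign an egress IP from the host network range to the node and namespace
oc patch hostsubnet compute-1 --type=merge -p '{"egressIPs": ["172.31.246.200"]}'
oc patch netnamespace testproject --type=merge -p '{"egressIPs": ["172.31.246.200"]}'

openshift-sdn should then add 172.31.246.200 to the node's primary interface, which is what makes the kubelet include it in the next serving-cert CSR.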

Comment 107 Milind Yadav 2021-11-11 10:45:06 UTC
Joel, if I use the range you suggested, I am not getting the expected result.

Starting pod/compute-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.246.26
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# ifconfig | grep 172.31.246.200
sh-4.4# exit
exit
sh-4.4# exit
exit

Comment 108 Milind Yadav 2021-11-11 12:08:07 UTC
After checking with Joel and adding IPs within the host subnet range, I was able to get the CSRs approved on the same cluster mentioned in the earlier comment.

[miyadav@miyadav ~]$ oc get csr
NAME        AGE    SIGNERNAME                      REQUESTOR               REQUESTEDDURATION   CONDITION
csr-l4tlm   114s   kubernetes.io/kubelet-serving   system:node:compute-0   <none>              Approved,Issued
csr-nr5z8   47m    kubernetes.io/kubelet-serving   system:node:compute-1   <none>              Approved,Issued
csr-zh6bp   43m    kubernetes.io/kubelet-serving   system:node:compute-0   <none>              Approved,Issued


Additional Info:
Added a test case as well. Moving to VERIFIED.

Comment 126 errata-xmlrpc 2022-03-12 04:34:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

