Bug 2030733 - wrong IP selected to connect to the nodes when ExternalCloudProvider enabled
Summary: wrong IP selected to connect to the nodes when ExternalCloudProvider enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Matthew Booth
QA Contact: rlobillo
URL:
Whiteboard:
Duplicates: 2043659 2045493
Depends On:
Blocks:
 
Reported: 2021-12-09 15:30 UTC by rlobillo
Modified: 2022-08-10 10:40 UTC
CC: 8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 10:40:31 UTC
Target Upstream Version:
Embargoed:


Links
- Github kubernetes/kubernetes pull 107750 (open): "Prefer user-provided node IP" (last updated 2022-01-26 13:03:59 UTC)
- Github openshift/cloud-provider-openstack pull 114 (open): "Bug 2030733: CARRY [occm] Bump k8s deps to 0.24.0-beta.0 for --node-ip fix" (last updated 2022-03-31 17:08:17 UTC)
- Red Hat Product Errata RHSA-2022:5069 (last updated 2022-08-10 10:40:47 UTC)

Description rlobillo 2021-12-09 15:30:23 UTC
Description of problem: The wrong interface is selected when connecting to the worker node.

The openstack servers created are:

$ openstack server list
+--------------------------------------+-----------------------------+--------+--------------------------------------------------------------+--------------------+--------+
| ID                                   | Name                        | Status | Networks                                                     | Image              | Flavor |
+--------------------------------------+-----------------------------+--------+--------------------------------------------------------------+--------------------+--------+
| c9eb09a2-adf5-4906-b57b-d0cdd0835ee9 | ostest-xp2wj-worker-0-vtnvj | ACTIVE | StorageNFS=172.17.5.213; ostest-xp2wj-openshift=10.196.1.42  | ostest-xp2wj-rhcos |        |
| 50dee8e8-9bd4-46d2-89f4-1939108e9a48 | ostest-xp2wj-worker-0-8kp42 | ACTIVE | StorageNFS=172.17.5.181; ostest-xp2wj-openshift=10.196.1.247 | ostest-xp2wj-rhcos |        |
| b6c16079-8117-48db-a777-2e10545587e9 | ostest-xp2wj-worker-0-5nbxp | ACTIVE | StorageNFS=172.17.5.199; ostest-xp2wj-openshift=10.196.1.151 | ostest-xp2wj-rhcos |        |
| 7c43bc0a-bcca-429c-bbd3-fabe9901dd35 | ostest-xp2wj-master-2       | ACTIVE | ostest-xp2wj-openshift=10.196.3.145                          | ostest-xp2wj-rhcos |        |
| 3cbb090c-96c5-4f0b-98a8-75707504d3d7 | ostest-xp2wj-master-1       | ACTIVE | ostest-xp2wj-openshift=10.196.0.41                           | ostest-xp2wj-rhcos |        |
| a6bc10d9-866f-4864-b9c6-e54b5853d0ed | ostest-xp2wj-master-0       | ACTIVE | ostest-xp2wj-openshift=10.196.2.254                          | ostest-xp2wj-rhcos |        |
+--------------------------------------+-----------------------------+--------+--------------------------------------------------------------+--------------------+--------+

The workers have two IPs: one on StorageNFS (for Manila) and one on the regular machine subnet. However, one of the workers uses the StorageNFS address when creating a debug pod on it, and the connection fails:

$ oc debug node/ostest-xp2wj-worker-0-vtnvj
Starting pod/ostest-xp2wj-worker-0-vtnvj-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.17.5.213
If you don't see a command prompt, try pressing enter.

Removing debug pod ...
Error from server: error dialing backend: dial tcp 172.17.5.213:10250: i/o timeout

Moreover, the pods running on that problematic worker cannot be accessed:

$ oc get pods -n demo -o wide
NAME                    READY   STATUS    RESTARTS   AGE    IP            NODE                          NOMINATED NODE   READINESS GATES
demo-7897db69cc-4zlvj   1/1     Running   0          3h3m   10.131.0.26   ostest-xp2wj-worker-0-8kp42   <none>           <none>
demo-7897db69cc-d2g2n   1/1     Running   0          3h3m   10.129.2.46   ostest-xp2wj-worker-0-vtnvj   <none>           <none>
demo-7897db69cc-zdngv   1/1     Running   0          3h3m   10.128.2.13   ostest-xp2wj-worker-0-5nbxp   <none>           <none>
(shiftstack) [stack@undercloud-0 ~]$ oc rsh -n demo demo-7897db69cc-d2g2n
Error from server: error dialing backend: dial tcp 172.17.5.213:10250: i/o timeout
(shiftstack) [stack@undercloud-0 ~]$ 


The other two workers work fine:

$ oc debug node/ostest-xp2wj-worker-0-5nbxp
Starting pod/ostest-xp2wj-worker-0-5nbxp-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.196.1.151
If you don't see a command prompt, try pressing enter.
sh-4.4# 


I observed that the workers list their IPs in a different order in their status sections:
- Problematic worker:

(shiftstack) [stack@undercloud-0 ~]$ oc get node/ostest-xp2wj-worker-0-vtnvj -o json | jq .status.addresses                                                                                                                                  
[
  {
    "address": "172.17.5.213",
    "type": "InternalIP"
  },
  {
    "address": "10.196.1.42",
    "type": "InternalIP"
  },
  {
    "address": "ostest-xp2wj-worker-0-vtnvj",
    "type": "Hostname"
  }
]
- The other one:

$ oc get node/ostest-xp2wj-worker-0-5nbxp -o json | jq .status.addresses
[
  {
    "address": "10.196.1.151",
    "type": "InternalIP"
  },
  {
    "address": "172.17.5.199",
    "type": "InternalIP"
  },
  {
    "address": "ostest-xp2wj-worker-0-5nbxp",
    "type": "Hostname"
  }
]
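
For illustration, a one-liner that surfaces the first-listed InternalIP of each node (the address that oc debug ends up dialing) makes affected nodes easy to spot; this jq query is a sketch, not output captured from the affected cluster:

$ oc get nodes -o json | jq -r '.items[] | .metadata.name + " " + ([.status.addresses[] | select(.type=="InternalIP")][0].address)'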


It is also observed that a node has only one InternalIP in its status when the external CCM is not enabled:
$ openstack server list
+--------------------------------------+-----------------------------+--------+--------------------------------------------------------------+--------------------+--------+
| ID                                   | Name                        | Status | Networks                                                     | Image              | Flavor |
+--------------------------------------+-----------------------------+--------+--------------------------------------------------------------+--------------------+--------+
| 6192993a-cc9d-4e65-b0e3-ddf4828e2e24 | ostest-ngz6v-worker-0-f7xzq | ACTIVE | StorageNFS=172.17.5.158; ostest-ngz6v-openshift=10.196.1.154 | ostest-ngz6v-rhcos |        | 
and:
$ oc get node/ostest-ngz6v-worker-0-f7xzq -o json | jq .status.addresses
[
  {
    "address": "10.196.1.154",
    "type": "InternalIP"
  },
  {
    "address": "ostest-ngz6v-worker-0-f7xzq",
    "type": "Hostname"
  }
]


Version-Release number of selected component (if applicable): 4.10.0-0.nightly-2021-12-06-123512


How reproducible: Random


Steps to Reproduce: Install OCP 4.10 with externalCCM enabled (see the FeatureGate sketch below).
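
For reference, a minimal FeatureGate manifest matching the configuration later shown in comment 24 (assumed sufficient to enable the external CCM on this platform):

apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: CustomNoUpgrade
  customNoUpgrade:
    enabled:
    - ExternalCloudProvider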

Actual results: The worker node cannot be reached.

Expected results: The worker node and the pods running on it can be accessed.


Additional info: must gather: http://file.rdu.redhat.com/rlobillo/must-gather-install.tar.gz

Comment 4 Matthew Booth 2021-12-10 11:07:21 UTC
This bug feels like a can of worms. Firstly, I don't actually know how oc debug works, but to make my assumptions explicit, I am guessing it:

* Starts a pod on a node with host networking
* Uses the PodIP assigned to that pod by kubelet
* Connects to the PodIP internally from... presumably an apiserver.

The docs [1] describe InternalIP as "Typically the IP address of the node that is routable only within the cluster", but also "The usage of these fields varies depending on your cloud provider or bare metal configuration.". This seems a bit vague, but my interpretation is that it lists endpoint addresses for this node for internal communication, and a storage network exposed on a subset of nodes probably doesn't meet that criterion.

The comment on GetNodeHostIPs() in kubernetes/kubernetes [2] suggests it's the immediate source of oc debug's errant IP.

We should separate the two different lists of Addresses here:

The Machine object has a list of Addresses. These are not directly used by kubernetes, but define a list of IP addresses which will be approved if requested in a CSR generated by the kubelet running on the host. This list is generated by CAPO.

The Node object has a list of Addresses. These are the actual addresses used by kubernetes. These must be a (non-strict) subset of the Addresses defined on the Machine object, or kubelet will fail to come up when its CSR is not approved. These addresses are written by either kubelet (legacy cloud provider) or the CCM (external cloud provider).

This bug concerns the list of addresses on the Node object, and is therefore a cloud provider issue, not a CAPO issue.
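
For illustration, the two lists can be compared directly (names taken from the report above; a Machine exposes its addresses under .status.addresses as well):

$ oc get machine/ostest-xp2wj-worker-0-vtnvj -n openshift-machine-api -o json | jq .status.addresses
$ oc get node/ostest-xp2wj-worker-0-vtnvj -o json | jq .status.addresses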

For investigation:

* What's the implementation difference between the legacy cloud provider (OpenStack) and the CCM (OpenStack)?
* What metadata is available to CCM to distinguish 'cluster' network(s) from infra networks?

[1] https://kubernetes.io/docs/concepts/architecture/nodes/#addresses
[2] https://github.com/kubernetes/kubernetes/blob/cc6f12583f2b611e9469a6b2e0247f028aae246b/pkg/util/node/node.go#L89-L93
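
For illustration, the first-listed InternalIP, which per [2] is what GetNodeHostIPs returns as the node's primary IP, can be extracted directly; on the problematic worker this yields the StorageNFS address:

$ oc get node/ostest-xp2wj-worker-0-vtnvj -o json | jq -r '[.status.addresses[] | select(.type=="InternalIP")][0].address'
172.17.5.213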

Comment 6 Pierre Prinetti 2021-12-22 15:27:54 UTC
Setting the priority to "medium". This must be properly investigated (and possibly resolved) before GA, which is not imminent.

Comment 8 Matthew Booth 2022-01-06 17:44:37 UTC
From git spelunking, this seems relevant: https://github.com/kubernetes/cloud-provider-openstack/issues/407

Comment 9 Matthew Booth 2022-01-06 18:09:15 UTC
Ok, we need to fix this by setting internal-network-name in cloud.conf.

We currently don't have a mechanism to customise cloud.conf, so this is not yet fixable. However, we need to allow such customisation anyway as a matter of urgency before we can GA this feature. Once we can customise cloud.conf, this should be a relatively simple fix.
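
For reference, a minimal sketch of the kind of cloud.conf stanza this would need, assuming the internal-network-name option from upstream cloud-provider-openstack and using this cluster's machine network name purely as an example:

[Networking]
# Treat this network as the node-internal network when building node
# addresses (name varies per cluster; shown here for illustration only)
internal-network-name=ostest-xp2wj-openshift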

Comment 10 ShiftStack Bugwatcher 2022-01-07 07:03:37 UTC
Removing the Triaged keyword because:
* the QE automation assessment (flag qe_test_coverage) is missing

Comment 15 Matthew Booth 2022-01-26 13:03:38 UTC
*** Bug 2045493 has been marked as a duplicate of this bug. ***

Comment 19 Matthew Booth 2022-02-02 16:24:04 UTC
*** Bug 2043659 has been marked as a duplicate of this bug. ***

Comment 24 rlobillo 2022-04-11 15:56:02 UTC
Verified on 4.11.0-0.nightly-2022-04-08-205307 on top of RHOS-16.2-RHEL-8-20220311.n.1 by installing an OCP cluster with eCCM enabled via featureGate:

$ oc get featureGate/cluster -o yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
  creationTimestamp: "2022-04-11T13:29:12Z"
  generation: 1
  name: cluster
  resourceVersion: "1379"
  uid: 5766ece9-3dcc-4982-b355-f4c37d739ab9
spec:
  customNoUpgrade:
    enabled:
    - ExternalCloudProvider
  featureSet: CustomNoUpgrade

$ oc get pods -n openshift-cloud-controller-manager
NAME                                                  READY   STATUS    RESTARTS   AGE
openstack-cloud-controller-manager-7f7f67c5f8-lb4wx   1/1     Running   0          65m
openstack-cloud-controller-manager-7f7f67c5f8-sh9h9   1/1     Running   0          65m

---------------------------

Once the cluster is deployed through IPI, all the machines include the primarySubnet:

$ oc get machine -n openshift-machine-api -o json | jq .items[].spec.providerSpec.value.primarySubnet
"f94ecb70-604a-447f-896b-6fc40b045e4c"
"f94ecb70-604a-447f-896b-6fc40b045e4c"
"f94ecb70-604a-447f-896b-6fc40b045e4c"
"f94ecb70-604a-447f-896b-6fc40b045e4c"
"f94ecb70-604a-447f-896b-6fc40b045e4c"
"f94ecb70-604a-447f-896b-6fc40b045e4c"

And the workers have two networks defined:

$ oc get machines -n openshift-machine-api -l machine.openshift.io/cluster-api-machine-role=worker -o json | jq .items[].spec.providerSpec.value.networks
[
  {
    "filter": {},
    "subnets": [
      {
        "filter": {},
        "uuid": "f94ecb70-604a-447f-896b-6fc40b045e4c"
      }
    ]
  },
  {
    "filter": {},
    "noAllowedAddressPairs": true,
    "uuid": "b37bbd3d-e5f9-45ce-a9f9-6934f8f7d0ac"
  }
]
[
  {
    "filter": {},
    "subnets": [
      {
        "filter": {},
        "uuid": "f94ecb70-604a-447f-896b-6fc40b045e4c"
      }
    ]
  },
  {
    "filter": {},
    "noAllowedAddressPairs": true,
    "uuid": "b37bbd3d-e5f9-45ce-a9f9-6934f8f7d0ac"
  }
]
[
  {
    "filter": {},
    "subnets": [
      {
        "filter": {},
        "uuid": "f94ecb70-604a-447f-896b-6fc40b045e4c"
      }
    ]
  },
  {
    "filter": {},
    "noAllowedAddressPairs": true,
    "uuid": "b37bbd3d-e5f9-45ce-a9f9-6934f8f7d0ac"
  }
]

where the primarySubnet is:

$ openstack subnet list | grep f94ecb70-604a-447f-896b-6fc40b045e4c
| f94ecb70-604a-447f-896b-6fc40b045e4c | restricted_subnet     | 059e58b8-fd1c-41d1-b44c-d7fced04d078 | 172.16.0.0/24  |

and the secondary network is the one used for integrating with Manila:

$ openstack subnet list | grep b37bbd3d-e5f9-45ce-a9f9-6934f8f7d0ac
| 5dbe57da-73ea-457f-b044-8f05459d9368 | StorageNFSSubnet      | b37bbd3d-e5f9-45ce-a9f9-6934f8f7d0ac | 172.17.5.0/24  |


As expected, the nodes only include the IP defined as primary:

$ oc get nodes -o json | jq '.items[].status.addresses[]'                                                                                                                                                                            
{
  "address": "172.16.0.67",
  "type": "InternalIP"
}
{
  "address": "ostest-f97g7-master-0",
  "type": "Hostname"
}
{
  "address": "172.16.0.50",
  "type": "InternalIP"
}
{
  "address": "ostest-f97g7-master-1",
  "type": "Hostname"
}
{
  "address": "172.16.0.87",
  "type": "InternalIP"
}
{
  "address": "ostest-f97g7-master-2",
  "type": "Hostname"
}
{
  "address": "172.16.0.59",
  "type": "InternalIP"
}
{
  "address": "ostest-f97g7-worker-0-5m8x2",
  "type": "Hostname"
}
{
  "address": "172.16.0.210",
  "type": "InternalIP"
}
{
  "address": "ostest-f97g7-worker-0-7tzcq",
  "type": "Hostname"
}
{
  "address": "172.16.0.203",
  "type": "InternalIP"
}
{
  "address": "ostest-f97g7-worker-0-rh8x8",
  "type": "Hostname"
}

and the OpenStack instances have the two interfaces attached:

$ openstack server list -c Name -c Networks
+-----------------------------+-----------------------------------------------------------------------------------+
| Name                        | Networks                                                                          |
+-----------------------------+-----------------------------------------------------------------------------------+
| ostest-f97g7-worker-0-7tzcq | StorageNFS=172.17.5.220; restricted_network=172.16.0.210                          |
| ostest-f97g7-worker-0-5m8x2 | StorageNFS=172.17.5.175; restricted_network=172.16.0.59                           |
| ostest-f97g7-worker-0-rh8x8 | StorageNFS=172.17.5.162; restricted_network=172.16.0.203                          |
| ostest-f97g7-master-2       | restricted_network=172.16.0.87                                                    |
| ostest-f97g7-master-1       | restricted_network=172.16.0.50                                                    |
| ostest-f97g7-master-0       | restricted_network=172.16.0.67                                                    |
| installer_host              | installer_host-network=172.16.40.208, 10.46.44.182; restricted_network=172.16.0.3 |
+-----------------------------+-----------------------------------------------------------------------------------+

The cluster is fully operational, as observed by running the tests.

The same works with all networkTypes: OpenShiftSDN, Kuryr, and OVNKubernetes.

Comment 26 errata-xmlrpc 2022-08-10 10:40:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

