Bug 2030733
| Summary: | wrong IP selected to connect to the nodes when ExternalCloudProvider enabled | | |
| --- | --- | --- | --- |
| Product: | OpenShift Container Platform | Reporter: | rlobillo |
| Component: | Cloud Compute | Assignee: | Matthew Booth <mbooth> |
| Cloud Compute sub component: | OpenStack Provider | QA Contact: | rlobillo |
| Status: | CLOSED ERRATA | Type: | Bug |
| Severity: | high | Priority: | high |
| Version: | 4.10 | Target Release: | 4.11.0 |
| Keywords: | Triaged | CC: | aos-bugs, m.andre, mbooth, mdulko, mfedosin, mfojtik, pprinett, stephenfin |
| Hardware: | Unspecified | OS: | Unspecified |
| Doc Type: | No Doc Update | Last Closed: | 2022-08-10 10:40:31 UTC |
Description (rlobillo, 2021-12-09 15:30:23 UTC)
This bug feels like a can of worms. Firstly, I don't actually know how oc debug works, but to make my assumptions explicit, I am guessing it:

* Starts a pod on a node with host networking
* Uses the PodIP assigned to that pod by kubelet
* Connects to the PodIP internally from... presumably an apiserver.

The docs [1] describe InternalIP as "Typically the IP address of the node that is routable only within the cluster", but also say "The usage of these fields varies depending on your cloud provider or bare metal configuration." This seems a bit vague, but my interpretation is that it lists endpoint addresses for this node for internal communication, and a storage network exposed on a subset of nodes probably doesn't meet that criterion. The comment on GetNodeHostIPs() in kubernetes/kubernetes [2] suggests it is the immediate source of oc debug's errant IP.

We should separate the two different lists of Addresses here:

* The Machine object has a list of Addresses. These are not directly used by kubernetes, but define the list of IP addresses which will be approved if requested in a CSR generated by the kubelet running on the host. This list is generated by CAPO.
* The Node object has a list of Addresses. These are the actual addresses used by kubernetes. They must be a (non-strict) subset of the Addresses defined on the Machine object, or kubelet will fail to come up when its CSR is not approved. These addresses are written by either kubelet (legacy cloud provider) or the CCM (external cloud provider).

This bug concerns the list of addresses on the Node object, and is therefore a cloud provider issue, not a CAPO issue.

For investigation:

* What is the implementation difference between the legacy cloud provider (OpenStack) and the CCM (OpenStack)?
* What metadata is available to the CCM to distinguish 'cluster' network(s) from infra networks?

[1] https://kubernetes.io/docs/concepts/architecture/nodes/#addresses
[2] https://github.com/kubernetes/kubernetes/blob/cc6f12583f2b611e9469a6b2e0247f028aae246b/pkg/util/node/node.go#L89-L93

Setting the priority as "medium". This must be properly investigated (and possibly resolved) before GA, which is not imminent.

NodeAddresses are generated quite differently on the legacy vs the external cloud provider.

Legacy cloud provider:
https://github.com/kubernetes/legacy-cloud-providers/blob/1a100831c5a0718b3ef6ae88bf506d383d387b45/openstack/openstack.go#L565-L626

External cloud provider:
https://github.com/kubernetes/cloud-provider-openstack/blob/d46aa87534042ad1e26b812d1ef1aa140317a25e/pkg/openstack/instances.go#L458-L565

where interfaces is provided by:
https://github.com/kubernetes/cloud-provider-openstack/blob/d46aa87534042ad1e26b812d1ef1aa140317a25e/pkg/openstack/instances.go#L611-L629

From git spelunking, this seems relevant:
https://github.com/kubernetes/cloud-provider-openstack/issues/407

Ok, we need to fix this by setting internal-network-name in cloud.conf. We currently don't have a mechanism to customise cloud.conf, so it's not yet possible to fix. However, we need to allow this anyway as a matter of urgency before we can GA this feature. Once we have the ability to customise cloud.conf, this should be a relatively simple fix (a sketch of such a cloud.conf fragment follows this comment thread).

Removing the Triaged keyword because:
* the QE automation assessment (flag qe_test_coverage) is missing

*** Bug 2045493 has been marked as a duplicate of this bug. ***
*** Bug 2043659 has been marked as a duplicate of this bug. ***
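For reference, a minimal sketch of the kind of cloud.conf fragment described above, assuming cloud-provider-openstack's [Networking] section and its internal-network-name option. The network name (restricted_network) is taken from the verification output further down, and the mechanism for delivering the customised file to the CCM, which as noted above did not exist yet, is deliberately out of scope:

# Hypothetical cloud.conf fragment (network name illustrative, taken from this
# cluster): tell the OpenStack cloud provider which Neutron network carries the
# nodes' internal traffic, so that addresses on the StorageNFS network are not
# reported as InternalIP on the Node object.
[Networking]
internal-network-name = restricted_network

In a deployment like the one verified below, a setting along these lines would be expected to keep the 172.17.5.x storage addresses out of the Node's address list.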
Verified on 4.11.0-0.nightly-2022-04-08-205307 on top of RHOS-16.2-RHEL-8-20220311.n.1, installing an OCP cluster with eCCM enabled via feature gate:

$ oc get featureGate/cluster -o yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
  creationTimestamp: "2022-04-11T13:29:12Z"
  generation: 1
  name: cluster
  resourceVersion: "1379"
  uid: 5766ece9-3dcc-4982-b355-f4c37d739ab9
spec:
  customNoUpgrade:
    enabled:
    - ExternalCloudProvider
  featureSet: CustomNoUpgrade

$ oc get pods -n openshift-cloud-controller-manager
NAME                                                  READY   STATUS    RESTARTS   AGE
openstack-cloud-controller-manager-7f7f67c5f8-lb4wx   1/1     Running   0          65m
openstack-cloud-controller-manager-7f7f67c5f8-sh9h9   1/1     Running   0          65m

---------------------------

Once the cluster is deployed through IPI, all the machines include the primarySubnet:

$ oc get machine -n openshift-machine-api -o json | jq .items[].spec.providerSpec.value.primarySubnet
"f94ecb70-604a-447f-896b-6fc40b045e4c"
"f94ecb70-604a-447f-896b-6fc40b045e4c"
"f94ecb70-604a-447f-896b-6fc40b045e4c"
"f94ecb70-604a-447f-896b-6fc40b045e4c"
"f94ecb70-604a-447f-896b-6fc40b045e4c"
"f94ecb70-604a-447f-896b-6fc40b045e4c"

And the workers have two networks defined:

$ oc get machines -n openshift-machine-api -l machine.openshift.io/cluster-api-machine-role=worker -o json | jq .items[].spec.providerSpec.value.networks
[
  { "filter": {}, "subnets": [ { "filter": {}, "uuid": "f94ecb70-604a-447f-896b-6fc40b045e4c" } ] },
  { "filter": {}, "noAllowedAddressPairs": true, "uuid": "b37bbd3d-e5f9-45ce-a9f9-6934f8f7d0ac" }
]
[
  { "filter": {}, "subnets": [ { "filter": {}, "uuid": "f94ecb70-604a-447f-896b-6fc40b045e4c" } ] },
  { "filter": {}, "noAllowedAddressPairs": true, "uuid": "b37bbd3d-e5f9-45ce-a9f9-6934f8f7d0ac" }
]
[
  { "filter": {}, "subnets": [ { "filter": {}, "uuid": "f94ecb70-604a-447f-896b-6fc40b045e4c" } ] },
  { "filter": {}, "noAllowedAddressPairs": true, "uuid": "b37bbd3d-e5f9-45ce-a9f9-6934f8f7d0ac" }
]

where the primarySubnet is:

$ openstack subnet list | grep f94ecb70-604a-447f-896b-6fc40b045e4c
| f94ecb70-604a-447f-896b-6fc40b045e4c | restricted_subnet | 059e58b8-fd1c-41d1-b44c-d7fced04d078 | 172.16.0.0/24 |

and the secondary network is the one used for integrating with Manila:

$ openstack subnet list | grep b37bbd3d-e5f9-45ce-a9f9-6934f8f7d0ac
| 5dbe57da-73ea-457f-b044-8f05459d9368 | StorageNFSSubnet | b37bbd3d-e5f9-45ce-a9f9-6934f8f7d0ac | 172.17.5.0/24 |

As expected, the nodes only include the IP defined as primary:

$ oc get nodes -o json | jq '.items[].status.addresses[]'
{ "address": "172.16.0.67", "type": "InternalIP" }
{ "address": "ostest-f97g7-master-0", "type": "Hostname" }
{ "address": "172.16.0.50", "type": "InternalIP" }
{ "address": "ostest-f97g7-master-1", "type": "Hostname" }
{ "address": "172.16.0.87", "type": "InternalIP" }
{ "address": "ostest-f97g7-master-2", "type": "Hostname" }
{ "address": "172.16.0.59", "type": "InternalIP" }
{ "address": "ostest-f97g7-worker-0-5m8x2", "type": "Hostname" }
{ "address": "172.16.0.210", "type": "InternalIP" }
{ "address": "ostest-f97g7-worker-0-7tzcq", "type": "Hostname" }
{ "address": "172.16.0.203", "type": "InternalIP" }
{ "address": "ostest-f97g7-worker-0-rh8x8", "type": "Hostname" }

and the OpenStack instances have the two interfaces attached:

$ openstack server list -c Name -c Networks
+-----------------------------+------------------------------------------------------------------------------------+
| Name                        | Networks                                                                             |
+-----------------------------+------------------------------------------------------------------------------------+
| ostest-f97g7-worker-0-7tzcq | StorageNFS=172.17.5.220; restricted_network=172.16.0.210                             |
| ostest-f97g7-worker-0-5m8x2 | StorageNFS=172.17.5.175; restricted_network=172.16.0.59                              |
| ostest-f97g7-worker-0-rh8x8 | StorageNFS=172.17.5.162; restricted_network=172.16.0.203                             |
| ostest-f97g7-master-2       | restricted_network=172.16.0.87                                                       |
| ostest-f97g7-master-1       | restricted_network=172.16.0.50                                                       |
| ostest-f97g7-master-0       | restricted_network=172.16.0.67                                                       |
| installer_host              | installer_host-network=172.16.40.208, 10.46.44.182; restricted_network=172.16.0.3    |
+-----------------------------+------------------------------------------------------------------------------------+

The cluster is fully operational, as observed by running the tests. The same behaviour is observed with all networkTypes: OpenShiftSDN, Kuryr and OVNKubernetes.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
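As a cross-check of the Machine-versus-Node address distinction discussed in the comments above, a hedged sketch of commands that could be run against this cluster. The node/machine name is taken from the verification output (assuming, as is usual for IPI, that Machine and Node names match), and the jsonpath expressions rely only on the standard Machine API and Node status fields:

# Addresses CAPO wrote on the Machine object: the candidate IPs whose CSRs
# will be approved, potentially including the StorageNFS address.
$ oc get machine -n openshift-machine-api ostest-f97g7-worker-0-7tzcq -o jsonpath='{.status.addresses}'

# Addresses the external cloud provider wrote on the Node object: these are
# what oc debug and the apiserver actually use. Per the output above, only
# the primary-subnet address (172.16.0.210) is expected here.
$ oc get node ostest-f97g7-worker-0-7tzcq -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'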