Description of problem:

Starting from OpenShift 4.8, it is possible to create a cluster with a MachineSet that does not define networks, as in the example below:

spec:
  template:
    spec:
      metadata: {}
      providerSpec:
        value:
          networks:
          - filter: {}
            subnets:
            - filter:
                name: <machines_subnet_name>

but instead defines ports:

spec:
  template:
    spec:
      metadata: {}
      providerSpec:
        value:
          ports:
          - networkID: <radio_network_UUID>
            nameSuffix: radio
            fixedIPs:
            - subnetID: <radio_subnet_UUID>
            tags:
            - sriov
            - radio
            vnicType: direct
            portSecurity: false
          primarySubnet: <machines_subnet_UUID>

primarySubnet should be defined when there are multiple networks. Currently kuryr-kubernetes only supports the providerSpec.value.networks definition, while it should also support ports and primarySubnet.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
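To make the request concrete, here is a minimal sketch (hypothetical helper, illustrative names only, not Kuryr's actual API) of how a controller could derive the worker subnets from a providerSpec value that uses either networks or ports, giving primarySubnet precedence:

```python
def resolve_worker_subnets(provider_spec):
    """Sketch: derive node subnet IDs from a providerSpec value dict.

    Prefers an explicit primarySubnet; otherwise collects subnet IDs
    from `ports` fixedIPs and from `networks` subnet entries.
    """
    primary = provider_spec.get("primarySubnet")
    if primary:
        return [primary]
    subnets = []
    # ports-style definition: subnet IDs live under fixedIPs
    for port in provider_spec.get("ports", []):
        for fixed_ip in port.get("fixedIPs", []):
            if "subnetID" in fixed_ip:
                subnets.append(fixed_ip["subnetID"])
    # networks-style definition: subnet IDs (when given by uuid)
    for net in provider_spec.get("networks", []):
        for sub in net.get("subnets", []):
            if "uuid" in sub:
                subnets.append(sub["uuid"])
    return subnets
```

With the ports example above, this would return only the `<machines_subnet_UUID>` given as primarySubnet; without primarySubnet it falls back to the order the ports are listed in, which is exactly the ambiguity this bug is about.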
Failed on 4.10.0-0.nightly-2021-11-02-191632 on top of OSP16.1 (RHOS-16.1-RHEL-8-20210818.n.0) with Kuryr and manila enabled.

While creating the below MachineSet, the new worker is correctly added to the cluster, but the primary subnet is being determined by its position in the list (the first entry):

$ oc get machineset/ostest-7p4jn-worker-new -n openshift-machine-api -o json | jq .spec.template.spec.providerSpec.value
{
  "apiVersion": "openstackproviderconfig.openshift.io/v1alpha1",
  "cloudName": "openstack",
  "cloudsSecret": {
    "name": "openstack-cloud-credentials",
    "namespace": "openshift-machine-api"
  },
  "flavor": "m4.xlarge",
  "image": "ostest-7p4jn-rhcos",
  "kind": "OpenstackProviderSpec",
  "metadata": {
    "creationTimestamp": null
  },
  "ports": [
    {
      "fixedIPs": [
        {
          "subnetID": "f77b86cd-2c7d-4ef6-bbb2-368279925354"   # ostest-7p4jn-nodes
        }
      ],
      "networkID": "6c9dfdf8-1bdd-465d-bfbe-9ec3643f1866",
      "securityGroups": [
        "fce9960a-76e9-4ba3-b631-338f712e2a1c"
      ]
    },
    {
      "fixedIPs": [
        {
          "subnetID": "33f48333-caa1-415c-8aea-1ddfad7f2318"   # StorageNFSSubnet
        }
      ],
      "networkID": "671634d1-c06f-433f-878f-745244a1f803",
      "securityGroups": [
        "fce9960a-76e9-4ba3-b631-338f712e2a1c"
      ]
    }
  ],
  "primarySubnet": "f77b86cd-2c7d-4ef6-bbb2-368279925354",   # ostest-7p4jn-nodes
  "serverGroupName": "ostest-7p4jn-worker",
  "serverMetadata": {
    "Name": "ostest-7p4jn-worker",
    "openshiftClusterID": "ostest-7p4jn"
  },
  "tags": [
    "openshiftClusterID=ostest-7p4jn"
  ],
  "trunk": true,
  "userDataSecret": {
    "name": "worker-user-data"
  }
}

NAMESPACE               NAME                            PHASE     TYPE        REGION      ZONE   AGE
openshift-machine-api   ostest-7p4jn-master-0           Running   m4.xlarge   regionOne   nova   5h57m
openshift-machine-api   ostest-7p4jn-master-1           Running   m4.xlarge   regionOne   nova   5h57m
openshift-machine-api   ostest-7p4jn-master-2           Running   m4.xlarge   regionOne   nova   5h57m
openshift-machine-api   ostest-7p4jn-worker-0-cwhtl     Running   m4.xlarge   regionOne   nova   5h48m
openshift-machine-api   ostest-7p4jn-worker-0-zhwc4     Running   m4.xlarge   regionOne   nova   5h48m
openshift-machine-api   ostest-7p4jn-worker-new-hqqcj   Running   m4.xlarge   regionOne   nova   15m

NAME                            STATUS   ROLES    AGE     VERSION
ostest-7p4jn-master-0           Ready    master   5h55m   v1.22.1+674f31e
ostest-7p4jn-master-1           Ready    master   5h54m   v1.22.1+674f31e
ostest-7p4jn-master-2           Ready    master   5h55m   v1.22.1+674f31e
ostest-7p4jn-worker-0-cwhtl     Ready    worker   5h38m   v1.22.1+674f31e
ostest-7p4jn-worker-0-zhwc4     Ready    worker   5h37m   v1.22.1+674f31e
ostest-7p4jn-worker-new-hqqcj   Ready    worker   6m4s    v1.22.1+674f31e

[core@ostest-7p4jn-worker-new-hqqcj ~]$ ip r
default via 10.196.0.1 dev ens3 proto dhcp metric 100
default via 172.17.5.1 dev ens4 proto dhcp metric 101
10.196.0.0/16 dev ens3 proto kernel scope link src 10.196.0.239 metric 100
169.254.169.254 via 10.196.0.10 dev ens3 proto dhcp metric 100
169.254.169.254 via 172.17.5.150 dev ens4 proto dhcp metric 101
172.17.5.0/24 dev ens4 proto kernel scope link src 172.17.5.167 metric 101

where:

$ openstack subnet list | grep StorageNFS
| 33f48333-caa1-415c-8aea-1ddfad7f2318 | StorageNFSSubnet | 671634d1-c06f-433f-878f-745244a1f803 | 172.17.5.0/24 |
$ openstack subnet list | grep ostest-7p4jn-nodes
| f77b86cd-2c7d-4ef6-bbb2-368279925354 | ostest-7p4jn-nodes | 6c9dfdf8-1bdd-465d-bfbe-9ec3643f1866 | 10.196.0.0/16 |

The worker is operational and can handle pods running on it:

$ oc get pods -o wide
NAME                   READY   STATUS    RESTARTS   AGE     IP               NODE                            NOMINATED NODE   READINESS GATES
demo-66cdc7b66-57wrp   1/1     Running   0          3h34m   10.128.135.81    ostest-7p4jn-worker-0-zhwc4     <none>           <none>
demo-66cdc7b66-9s64x   1/1     Running   0          4h19m   10.128.134.55    ostest-7p4jn-worker-0-zhwc4     <none>           <none>
demo-66cdc7b66-cwwjb   1/1     Running   0          93s     10.128.135.222   ostest-7p4jn-worker-new-hqqcj   <none>           <none>
demo-66cdc7b66-kzkts   1/1     Running   0          4h19m   10.128.135.142   ostest-7p4jn-worker-0-cwhtl     <none>           <none>

However, as stated in [1], the primarySubnet should be treated as the default network, but this does not happen when the networks are defined in a different order:

$ oc get machineset/ostest-7p4jn-worker-new -n openshift-machine-api -o json | jq .spec.template.spec.providerSpec.value
{
  "apiVersion": "openstackproviderconfig.openshift.io/v1alpha1",
  "cloudName": "openstack",
  "cloudsSecret": {
    "name": "openstack-cloud-credentials",
    "namespace": "openshift-machine-api"
  },
  "flavor": "m4.xlarge",
  "image": "ostest-7p4jn-rhcos",
  "kind": "OpenstackProviderSpec",
  "metadata": {
    "creationTimestamp": null
  },
  "ports": [
    {
      "fixedIPs": [
        {
          "subnetID": "33f48333-caa1-415c-8aea-1ddfad7f2318"   # StorageNFSSubnet
        }
      ],
      "networkID": "671634d1-c06f-433f-878f-745244a1f803",
      "securityGroups": [
        "fce9960a-76e9-4ba3-b631-338f712e2a1c"
      ]
    },
    {
      "fixedIPs": [
        {
          "subnetID": "f77b86cd-2c7d-4ef6-bbb2-368279925354"   # ostest-7p4jn-nodes
        }
      ],
      "networkID": "6c9dfdf8-1bdd-465d-bfbe-9ec3643f1866",
      "securityGroups": [
        "fce9960a-76e9-4ba3-b631-338f712e2a1c"
      ]
    }
  ],
  "primarySubnet": "f77b86cd-2c7d-4ef6-bbb2-368279925354",   # ostest-7p4jn-nodes
  "serverGroupName": "ostest-7p4jn-worker",
  "serverMetadata": {
    "Name": "ostest-7p4jn-worker",
    "openshiftClusterID": "ostest-7p4jn"
  },
  "tags": [
    "openshiftClusterID=ostest-7p4jn"
  ],
  "trunk": true,
  "userDataSecret": {
    "name": "worker-user-data"
  }
}

In that case, the new worker remains in Provisioned status:

+--------------------------------------+-------------------------------+--------+--------------------------------------------------------------+--------------------+--------+
| ID                                   | Name                          | Status | Networks                                                     | Image              | Flavor |
+--------------------------------------+-------------------------------+--------+--------------------------------------------------------------+--------------------+--------+
| 2fa30b1b-36ed-47f9-a393-ccf1fcc961c8 | ostest-7p4jn-worker-new-5n9qc | ACTIVE | StorageNFS=172.17.5.237; ostest-7p4jn-openshift=10.196.2.135 | ostest-7p4jn-rhcos |        |
| 8d2929b8-8d44-42c0-84ec-705f71137889 | ostest-7p4jn-worker-0-zhwc4   | ACTIVE | StorageNFS=172.17.5.217; ostest-7p4jn-openshift=10.196.3.27  | ostest-7p4jn-rhcos |        |
| c25d92a3-ed7b-441c-9b98-9d3eeeec582f | ostest-7p4jn-worker-0-cwhtl   | ACTIVE | StorageNFS=172.17.5.235; ostest-7p4jn-openshift=10.196.2.132 | ostest-7p4jn-rhcos |        |
| 161f3bb4-eb90-4e95-8031-bc2ad3cd1fc4 | ostest-7p4jn-master-2         | ACTIVE | ostest-7p4jn-openshift=10.196.1.128                          | ostest-7p4jn-rhcos |        |
| b7d5c5ec-92e3-4b6d-bd03-a788ec163677 | ostest-7p4jn-master-1         | ACTIVE | ostest-7p4jn-openshift=10.196.3.30                           | ostest-7p4jn-rhcos |        |
| 47406c5c-4ed0-4c26-a408-5e0cab4d665c | ostest-7p4jn-master-0         | ACTIVE | ostest-7p4jn-openshift=10.196.1.133                          | ostest-7p4jn-rhcos |        |
+--------------------------------------+-------------------------------+--------+--------------------------------------------------------------+--------------------+--------+

NAMESPACE               NAME                            PHASE         TYPE        REGION      ZONE   AGE
openshift-machine-api   ostest-7p4jn-master-0           Running       m4.xlarge   regionOne   nova   6h34m
openshift-machine-api   ostest-7p4jn-master-1           Running       m4.xlarge   regionOne   nova   6h34m
openshift-machine-api   ostest-7p4jn-master-2           Running       m4.xlarge   regionOne   nova   6h34m
openshift-machine-api   ostest-7p4jn-worker-0-cwhtl     Running       m4.xlarge   regionOne   nova   6h25m
openshift-machine-api   ostest-7p4jn-worker-0-zhwc4     Running       m4.xlarge   regionOne   nova   6h25m
openshift-machine-api   ostest-7p4jn-worker-new-5n9qc   Provisioned   m4.xlarge   regionOne   nova   6m8s

and the worker is using StorageNFS as the default subnet:

[core@localhost ~]$ ip r
default via 172.17.5.1 dev ens3 proto dhcp metric 100
default via 10.196.0.1 dev ens4 proto dhcp metric 101
10.196.0.0/16 dev ens4 proto kernel scope link src 10.196.2.135 metric 101
169.254.169.254 via 172.17.5.150 dev ens3 proto dhcp metric 100
169.254.169.254 via 10.196.0.10 dev ens4 proto dhcp metric 101
172.17.5.0/24 dev ens3 proto kernel scope link src 172.17.5.237 metric 100

and from that network it is not possible to reach the outside world:

$ journalctl -f
[...]
Nov 03 14:54:52 localhost bash[2287]: Error: Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c1b47d4bcbaaf9921d918e448c3d40f510d3f0e8dadc8d100fc5d238bf501012: error pinging docker registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.46.0.31:53: read udp 172.17.5.237:46076->10.46.0.31:53: i/o timeout
Nov 03 14:54:52 localhost sh[2295]: Error: Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ee63562212520714c41f65b1976c61939a94077b2ae325e2b64e94571a215416: error pinging docker registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.46.0.31:53: read udp 172.17.5.237:60250->10.46.0.31:53: i/o timeout
Nov 03 14:54:57 localhost bash[2287]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c1b47d4bcbaaf9921d918e448c3d40f510d3f0e8dadc8d100fc5d238bf501012...

openshift-kuryr remains stable during this process:

$ oc get pods -n openshift-kuryr
NAME                               READY   STATUS    RESTARTS        AGE
kuryr-cni-4jczj                    1/1     Running   0               6h33m
kuryr-cni-9979d                    1/1     Running   2 (4h5m ago)    6h16m
kuryr-cni-gcvk6                    1/1     Running   2 (4h6m ago)    6h17m
kuryr-cni-hnv5t                    1/1     Running   0               6h33m
kuryr-cni-lrfv2                    1/1     Running   0               6h33m
kuryr-controller-77c56f46d-bsfj7   1/1     Running   3 (4h20m ago)   6h33m

[1] https://docs.openshift.com/container-platform/4.8/machine_management/creating_machinesets/creating-machineset-osp.html#machineset-yaml-osp-sr-iov-port-security_creating-machineset-osp
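The broken behavior above comes down to which default route the kernel prefers: among multiple 'default via' entries, the one with the lowest metric wins, and here DHCP assigned the lower metric to the StorageNFS NIC because it came first. A minimal sketch of that selection rule (illustrative only, with interface names taken from the `ip r` output above):

```python
def preferred_default_route(routes):
    """Return the interface whose default route the kernel will use.

    `routes` maps interface name -> metric of its 'default via' entry;
    the lowest metric wins (ties are broken arbitrarily here).
    """
    return min(routes, key=routes.get)

# Mirroring the failing node: ens3 is the StorageNFS NIC (metric 100),
# ens4 is the machines network (metric 101), so traffic leaves via ens3.
broken_node = {"ens3": 100, "ens4": 101}
```

This is why the node can still join the cluster over the machines network but fails DNS lookups and registry pulls: those go out via the StorageNFS default route, which has no external connectivity.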
Hi Ramon,

I did some testing regarding Kuryr and its behavior, and it seems that (at least) Kuryr works as intended.

I have a cluster with a single worker node, as follows:

$ openstack server list
+--------------------------------------+-----------------------------+--------+-------------------------------------+-------+-----------+
| ID                                   | Name                        | Status | Networks                            | Image | Flavor    |
+--------------------------------------+-----------------------------+--------+-------------------------------------+-------+-----------+
| 635187da-e0e8-4a54-b377-54f9e3a01d86 | ostest-7kgn4-worker-0-mqqt2 | ACTIVE | ostest-7kgn4-openshift=10.196.3.150 | rhcos | m1.xlarge |
| 276663f9-f217-41fa-987d-d4f0ee7ac2a3 | ostest-7kgn4-master-2       | ACTIVE | ostest-7kgn4-openshift=10.196.2.3   | rhcos | m1.xlarge |
| b8989d53-af9c-4ee5-b9f6-4daba50dfa18 | ostest-7kgn4-master-1       | ACTIVE | ostest-7kgn4-openshift=10.196.3.111 | rhcos | m1.xlarge |
| fd5640b1-166e-42ed-b165-091dae38d371 | ostest-7kgn4-master-0       | ACTIVE | ostest-7kgn4-openshift=10.196.3.57  | rhcos | m1.xlarge |
+--------------------------------------+-----------------------------+--------+-------------------------------------+-------+-----------+

The kuryr-controller is healthy and the corresponding CNI pods are placed on each node:

$ kubectl get pod -A | grep kuryr
openshift-kuryr   kuryr-cni-5hpkz                     1/1   Running   0   7d2h
openshift-kuryr   kuryr-cni-6zmwt                     1/1   Running   0   7d2h
openshift-kuryr   kuryr-cni-tl84v                     1/1   Running   0   7d2h
openshift-kuryr   kuryr-controller-7d797b9d89-75rp4   1/1   Running   0   4m23s

Now, I created a test network (85d22c28-53e5-4f20-814c-11b664b91969) with a subnet (637f9b42-cb60-4b10-8490-fc0b99dcd8fa, CIDR 10.197.0.0/16), and attached that subnet to the router.
Next, I added primarySubnet to the MachineSet YAML, pointing to the newly created subnet:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    machine.openshift.io/memoryMb: "16384"
    machine.openshift.io/vCPU: "4"
  creationTimestamp: "2022-01-14T10:16:18Z"
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: ostest-7kgn4
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
  name: ostest-7kgn4-worker-0
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: ostest-7kgn4
      machine.openshift.io/cluster-api-machineset: ostest-7kgn4-worker-0
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: ostest-7kgn4
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: ostest-7kgn4-worker-0
    spec:
      metadata: {}
      providerSpec:
        value:
          apiVersion: openstackproviderconfig.openshift.io/v1alpha1
          cloudName: openstack
          cloudsSecret:
            name: openstack-cloud-credentials
            namespace: openshift-machine-api
          flavor: m1.xlarge
          image: rhcos
          kind: OpenstackProviderSpec
          metadata:
            creationTimestamp: null
          networks:
          - filter: {}
            subnets:
            - filter:
                name: ostest-7kgn4-nodes
                tags: openshiftClusterID=ostest-7kgn4
          primarySubnet: 637f9b42-cb60-4b10-8490-fc0b99dcd8fa
          securityGroups:
          - filter: {}
            name: ostest-7kgn4-worker
          serverGroupName: ostest-7kgn4-worker
          serverMetadata:
            Name: ostest-7kgn4-worker
            openshiftClusterID: ostest-7kgn4
          tags:
          - openshiftClusterID=ostest-7kgn4
          trunk: true
          userDataSecret:
            name: worker-user-data

I applied it to the MachineSet and scaled it down and up, which eventually recreates the worker:

$ openstack server list
+--------------------------------------+-----------------------------+--------+-------------------------------------+-------+-----------+
| ID                                   | Name                        | Status | Networks                            | Image | Flavor    |
+--------------------------------------+-----------------------------+--------+-------------------------------------+-------+-----------+
| a900dcfc-9c50-45ae-8e45-b0aef114c191 | ostest-7kgn4-worker-0-xrds5 | ACTIVE | ostest-7kgn4-openshift=10.196.2.182 | rhcos | m1.xlarge |
| 276663f9-f217-41fa-987d-d4f0ee7ac2a3 | ostest-7kgn4-master-2       | ACTIVE | ostest-7kgn4-openshift=10.196.2.3   | rhcos | m1.xlarge |
| b8989d53-af9c-4ee5-b9f6-4daba50dfa18 | ostest-7kgn4-master-1       | ACTIVE | ostest-7kgn4-openshift=10.196.3.111 | rhcos | m1.xlarge |
| fd5640b1-166e-42ed-b165-091dae38d371 | ostest-7kgn4-master-0       | ACTIVE | ostest-7kgn4-openshift=10.196.3.57  | rhcos | m1.xlarge |
+--------------------------------------+-----------------------------+--------+-------------------------------------+-------+-----------+

After a while I can see this line in the controller log:

2022-01-21 13:27:03.498 1 INFO kuryr_kubernetes.controller.drivers.node_subnets [-] Adding subnet 637f9b42-cb60-4b10-8490-fc0b99dcd8fa to the worker nodes subnets as machine ostest-7kgn4-worker-0-xrds5 runs in it.

which proves that Kuryr selects the right subnet, because the primarySubnet key takes precedence in the MachineSet definition.
The same holds for multiple networks without primarySubnet:

$ openstack --os-cloud standalone_openshift subnet create --network 85d22c28-53e5-4f20-814c-11b664b91969 --subnet-range 10.198.0.0/16 test_another_subnet

$ cat machineset.yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    machine.openshift.io/memoryMb: "16384"
    machine.openshift.io/vCPU: "4"
  creationTimestamp: "2022-01-14T10:16:18Z"
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: ostest-7kgn4
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
  name: ostest-7kgn4-worker-0
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: ostest-7kgn4
      machine.openshift.io/cluster-api-machineset: ostest-7kgn4-worker-0
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: ostest-7kgn4
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: ostest-7kgn4-worker-0
    spec:
      metadata: {}
      providerSpec:
        value:
          apiVersion: openstackproviderconfig.openshift.io/v1alpha1
          cloudName: openstack
          cloudsSecret:
            name: openstack-cloud-credentials
            namespace: openshift-machine-api
          flavor: m1.xlarge
          image: rhcos
          kind: OpenstackProviderSpec
          metadata:
            creationTimestamp: null
          networks:
          - filter: {}
            subnets:
            - filter:
                name: test_another_subnet   # 8c12c9e2-c031-44ac-b206-67a681699d92
          - filter: {}
            subnets:
            - filter:
                name: ostest-7kgn4-nodes   # 365bc5c3-fafd-42dc-8e84-d070495f274e
                tags: openshiftClusterID=ostest-7kgn4
          securityGroups:
          - filter: {}
            name: ostest-7kgn4-worker
          serverGroupName: ostest-7kgn4-worker
          serverMetadata:
            Name: ostest-7kgn4-worker
            openshiftClusterID: ostest-7kgn4
          tags:
          - openshiftClusterID=ostest-7kgn4
          trunk: true
          userDataSecret:
            name: worker-user-data

and the corresponding line in the Kuryr logs:

2022-01-21 13:49:12.258 1 INFO kuryr_kubernetes.controller.drivers.node_subnets [-] Adding subnet 8c12c9e2-c031-44ac-b206-67a681699d92 to the worker nodes subnets as machine ostest-7kgn4-worker-0-5hbt4 runs in it.

So I believe Kuryr works as intended.
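The log line above ("Adding subnet ... as machine ... runs in it") suggests the node_subnets driver picks the subnet the machine actually runs in. A minimal sketch of such matching (illustrative only, not Kuryr's actual code), using the subnet IDs and CIDRs from this test environment:

```python
import ipaddress

def subnet_for_machine(node_ip, candidate_subnets):
    """Return the ID of the candidate subnet whose CIDR contains node_ip.

    `candidate_subnets` maps subnet ID -> CIDR string; returns None if
    the address falls in none of them.
    """
    addr = ipaddress.ip_address(node_ip)
    for subnet_id, cidr in candidate_subnets.items():
        if addr in ipaddress.ip_network(cidr):
            return subnet_id
    return None

# Subnet IDs from the machineset comments above; CIDRs from this environment.
candidates = {
    "8c12c9e2-c031-44ac-b206-67a681699d92": "10.198.0.0/16",  # test_another_subnet
    "365bc5c3-fafd-42dc-8e84-d070495f274e": "10.196.0.0/16",  # ostest-7kgn4-nodes
}
```

Under this reading, a worker that boots with an address in 10.198.0.0/16 gets test_another_subnet added to the worker nodes subnets, matching the log line above.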
Thanks Roman. I created this bug on MAPO: https://bugzilla.redhat.com/show_bug.cgi?id=2043659
Removing the Triaged keyword because:
* the QE automation assessment (flag qe_test_coverage) is missing
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056