Version:

$ openshift-install version
4.5

Platform: OpenStack

Please specify:
* IPI (automated install with `openshift-install`)

What happened?

One of the steps in the documentation on replacing an unhealthy etcd member [1] requires a master Machine to be recreated manually. When this Machine is recreated, Kuryr expects the Machine's port name to follow the same pattern as, e.g., "ostest-vxgmv-master-port-1". However, the port for that Machine is created without the "port" fragment in its name: "ostest-vxgmv-master-1". This breaks Kuryr's detection of the master ports, so it cannot create the members of the API load balancer, which should have the IPs of those master ports.

[1] https://docs.openshift.com/container-platform/4.6/backup_and_restore/replacing-unhealthy-etcd-member.html#restore-replace-stopped-etcd-member_replacing-unhealthy-etcd-member

What did you expect to happen?

Master ports created with names matching the pattern "^%s-master-port-[0-9]", as on a regular IPI installation without day 2 operations.

How to reproduce it (as minimally and precisely as possible)?
After the cluster is up, copy one master machine manifest:

$ oc get machines ostest-vxgmv-master-0 -n openshift-machine-api -o yaml > new-master-machine.yaml

Remove the status and change the name:

apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  name: ostest-vxgmv-master-3
  namespace: openshift-machine-api
spec:
  metadata: {}
  providerSpec:
    value:
      apiVersion: openstackproviderconfig.openshift.io/v1alpha1
      cloudName: openstack
      cloudsSecret:
        name: openstack-cloud-credentials
        namespace: openshift-machine-api
      flavor: m4.xlarge
      image: ostest-vxgmv-rhcos
      kind: OpenstackProviderSpec
      metadata:
        creationTimestamp: null
      networks:
      - filter: {}
        subnets:
        - filter:
            name: ostest-vxgmv-nodes
            tags: openshiftClusterID=ostest-vxgmv
      securityGroups:
      - filter: {}
        name: ostest-vxgmv-master
      serverGroupName: ostest-vxgmv-master
      serverMetadata:
        Name: ostest-vxgmv-master
        openshiftClusterID: ostest-vxgmv
      tags:
      - openshiftClusterID=ostest-vxgmv
      trunk: true
      userDataSecret:
        name: master-user-data

Check that the new port's name differs from the names of the other master VM ports:

(shiftstack) [stack@undercloud-0 ~]$ openstack port list |grep master
| 242af9fe-6147-40b3-98e8-64e05006d438 | ostest-vxgmv-master-port-1 | fa:16:3e:37:64:f3 | ip_address='x.x.x.132', subnet_id='8bbbeee9-76e9-446c-bfde-1ebe3c2fe3e8' | ACTIVE |
| 66876b95-7427-4750-babb-2ae806a2bc30 | ostest-vxgmv-master-port-2 | fa:16:3e:34:58:ac | ip_address='x.x.x.74', subnet_id='8bbbeee9-76e9-446c-bfde-1ebe3c2fe3e8' | ACTIVE |
| d441eab2-5acf-463b-8237-23552484f32c | ostest-vxgmv-master-3 | fa:16:3e:13:5e:49 | ip_address='x.x.x.16', subnet_id='8bbbeee9-76e9-446c-bfde-1ebe3c2fe3e8' | ACTIVE |
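To illustrate the mismatch, here is a minimal sketch that applies the pattern quoted in this report to the three port names from the listing above. The helper name `is_detected_master_port` is hypothetical and this is not Kuryr's actual code; it only assumes that the `%s` placeholder is filled with the cluster ID.

```python
import re

# Pattern quoted in the report; %s stands for the cluster ID (assumption).
MASTER_PORT_PATTERN = "^%s-master-port-[0-9]"


def is_detected_master_port(cluster_id: str, port_name: str) -> bool:
    """Hypothetical helper: True if port_name matches the quoted pattern."""
    pattern = MASTER_PORT_PATTERN % re.escape(cluster_id)
    return re.match(pattern, port_name) is not None


# Port names taken from the `openstack port list` output in this report.
port_names = [
    "ostest-vxgmv-master-port-1",
    "ostest-vxgmv-master-port-2",
    "ostest-vxgmv-master-3",
]
detected = {
    name: is_detected_master_port("ostest-vxgmv", name) for name in port_names
}

for name, matched in detected.items():
    print(f"{name}: {'matched' if matched else 'NOT matched'}")
```

Under these assumptions, only the port of the recreated Machine fails to match, which is consistent with the missing load balancer member described above.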
As discussed out-of-band:

On installation, Terraform names ports "${var.cluster_id}-master-port-${count.index}" [1], while CAPO just reuses the machine name (without "-port") [2].

With this premise, until it's clear what happens on upgrade/downgrade, the ideal Kuryr behaviour is probably to accept both naming patterns, if at all possible.

And I agree that now that we know, we should probably make the naming consistent anyway.

[1]: https://github.com/openshift/installer/blob/7e02fe75a583242e4cbb8c60472b105acf7a8266/data/data/openstack/topology/private-network.tf#L41
[2]: https://github.com/openshift/cluster-api-provider-openstack/blob/c4807294a92e315c766256fb2d691dc6b8a08219/pkg/cloud/openstack/clients/machineservice.go#L490-L492
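As a rough sketch of what "accept both naming patterns" could look like, the snippet below builds one regular expression that matches both the Terraform-style name and the CAPO-style name. The function name `master_port_regex` is hypothetical, and this is not the actual Kuryr or cluster-network-operator change. Unlike the pattern quoted in the report, it also anchors the end of the name, which is a design choice made here for illustration.

```python
import re


def master_port_regex(cluster_id: str) -> re.Pattern:
    """Hypothetical helper: match master port names with or without '-port'.

    Accepts both "<cluster_id>-master-port-<n>" (Terraform, day 1) and
    "<cluster_id>-master-<n>" (CAPO, day 2).
    """
    return re.compile(rf"^{re.escape(cluster_id)}-master-(port-)?[0-9]+$")


regex = master_port_regex("ostest-vxgmv")

# Names from this report, plus two that should not be treated as master ports.
candidates = [
    "ostest-vxgmv-master-port-1",
    "ostest-vxgmv-master-3",
    "ostest-vxgmv-master",
    "ostest-vxgmv-api-port",
]
accepted = [name for name in candidates if regex.match(name)]
print(accepted)
```

Making the "port-" group optional keeps a single expression for both cases, so the matching logic does not need to know whether a port was created on day 1 or day 2.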
Hi Pierre,

The fix for checking for both names is already on the cluster-network-operator master branch (https://bugzilla.redhat.com/show_bug.cgi?id=1933269). From the OCP upgrade perspective, I believe the nodes are not re-created, so we should not need to worry, but of course this needs double-checking.
Maysa,

I have filed a patch [1] to get the same port names on day 1 and day 2. Since the Kuryr patch is already tracked in Bug 1933269, I wanted to check with you: can we close this BZ once this Installer patch is merged?

[1]: https://github.com/openshift/installer/pull/4734
Checked with 4.8.0-0.nightly-2021-03-18-000857; the issue is fixed, moving to VERIFIED.

$ oc get machine -A
NAMESPACE               NAME                                PHASE     TYPE        REGION      ZONE   AGE
openshift-machine-api   wj48ios318ay-f6zqd-master-0         Running   m1.xlarge   regionOne   nova   58m
openshift-machine-api   wj48ios318ay-f6zqd-master-1         Running   m1.xlarge   regionOne   nova   58m
openshift-machine-api   wj48ios318ay-f6zqd-master-2         Running   m1.xlarge   regionOne   nova   58m
openshift-machine-api   wj48ios318ay-f6zqd-worker-0-gb65k   Running   m1.large    regionOne   nova   55m
openshift-machine-api   wj48ios318ay-f6zqd-worker-0-w7k6z   Running   m1.large    regionOne   nova   55m
openshift-machine-api   wj48ios318ay-f6zqd-worker-0-z658h   Running   m1.large    regionOne   nova   55m

# openstack port list |grep -i wj48ios318ay-f6zqd-
| 2220d98b-ef0c-4c33-90a3-e2ed843e2315 | wj48ios318ay-f6zqd-worker-0-z658h | fa:16:3e:e0:c1:be | ip_address='192.168.3.21', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | ACTIVE |
| 4350e7fa-fdd2-4a74-93a0-5072e3f634ce | wj48ios318ay-f6zqd-master-2 | fa:16:3e:e1:c4:96 | ip_address='192.168.1.62', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | ACTIVE |
| 50f9e1f3-7261-455b-807c-046309e38aba | wj48ios318ay-f6zqd-worker-0-w7k6z | fa:16:3e:14:8a:39 | ip_address='192.168.0.81', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | ACTIVE |
| 5e671f6f-9ca2-4a71-8195-a7088441b7c7 | wj48ios318ay-f6zqd-master-0 | fa:16:3e:11:33:8d | ip_address='192.168.0.166', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | ACTIVE |
| 633445db-1389-44e1-8ff7-3bed2eb68124 | wj48ios318ay-f6zqd-api-port | fa:16:3e:f8:42:7e | ip_address='192.168.0.5', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | DOWN |
| 8713b2ab-f518-429b-b59d-cbf317f5336c | wj48ios318ay-f6zqd-master-1 | fa:16:3e:98:20:33 | ip_address='192.168.1.30', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | ACTIVE |
| f45162d8-73ae-421a-bfed-adb787d2fe9d | wj48ios318ay-f6zqd-worker-0-gb65k | fa:16:3e:26:68:c6 | ip_address='192.168.3.35', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | ACTIVE |
| fa111223-cf98-4ec4-82e0-9288eaad2053 | wj48ios318ay-f6zqd-ingress-port | fa:16:3e:91:cd:8f | ip_address='192.168.0.7', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | DOWN |

After creating a new master Machine (wj48ios318ay-f6zqd-master-4):

$ oc get machine -A
NAMESPACE               NAME                                PHASE     TYPE        REGION      ZONE   AGE
openshift-machine-api   wj48ios318ay-f6zqd-master-0         Running   m1.xlarge   regionOne   nova   67m
openshift-machine-api   wj48ios318ay-f6zqd-master-1         Running   m1.xlarge   regionOne   nova   67m
openshift-machine-api   wj48ios318ay-f6zqd-master-2         Running   m1.xlarge   regionOne   nova   67m
openshift-machine-api   wj48ios318ay-f6zqd-master-4         Running   m1.xlarge   regionOne   nova   7m8s
openshift-machine-api   wj48ios318ay-f6zqd-worker-0-gb65k   Running   m1.large    regionOne   nova   64m
openshift-machine-api   wj48ios318ay-f6zqd-worker-0-w7k6z   Running   m1.large    regionOne   nova   64m
openshift-machine-api   wj48ios318ay-f6zqd-worker-0-z658h   Running   m1.large    regionOne   nova   64m

# openstack port list |grep -i wj48ios318ay-f6zqd-
| 2220d98b-ef0c-4c33-90a3-e2ed843e2315 | wj48ios318ay-f6zqd-worker-0-z658h | fa:16:3e:e0:c1:be | ip_address='192.168.3.21', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | ACTIVE |
| 4350e7fa-fdd2-4a74-93a0-5072e3f634ce | wj48ios318ay-f6zqd-master-2 | fa:16:3e:e1:c4:96 | ip_address='192.168.1.62', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | ACTIVE |
| 50f9e1f3-7261-455b-807c-046309e38aba | wj48ios318ay-f6zqd-worker-0-w7k6z | fa:16:3e:14:8a:39 | ip_address='192.168.0.81', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | ACTIVE |
| 5e671f6f-9ca2-4a71-8195-a7088441b7c7 | wj48ios318ay-f6zqd-master-0 | fa:16:3e:11:33:8d | ip_address='192.168.0.166', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | ACTIVE |
| 633445db-1389-44e1-8ff7-3bed2eb68124 | wj48ios318ay-f6zqd-api-port | fa:16:3e:f8:42:7e | ip_address='192.168.0.5', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | DOWN |
| 8713b2ab-f518-429b-b59d-cbf317f5336c | wj48ios318ay-f6zqd-master-1 | fa:16:3e:98:20:33 | ip_address='192.168.1.30', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | ACTIVE |
| c4775a20-7085-4694-805f-32db40dd475a | wj48ios318ay-f6zqd-master-4 | fa:16:3e:12:4f:0e | ip_address='192.168.1.209', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | ACTIVE |
| f45162d8-73ae-421a-bfed-adb787d2fe9d | wj48ios318ay-f6zqd-worker-0-gb65k | fa:16:3e:26:68:c6 | ip_address='192.168.3.35', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | ACTIVE |
| fa111223-cf98-4ec4-82e0-9288eaad2053 | wj48ios318ay-f6zqd-ingress-port | fa:16:3e:91:cd:8f | ip_address='192.168.0.7', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | DOWN |
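The verification above compares Machine names with port names by eye. A minimal sketch of automating that comparison follows. It assumes the pipe-delimited row layout shown in the `openstack port list | grep` output above (port name in the second column, no header row), and both function names are hypothetical helpers rather than part of any existing tool.

```python
from typing import Iterable, List


def port_names_from_table(output: str) -> List[str]:
    """Extract the second column (port name) from pipe-delimited rows."""
    names = []
    for line in output.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip blank lines and anything without at least ID and name columns.
        if len(cells) >= 2 and cells[1]:
            names.append(cells[1])
    return names


def machines_without_matching_port(
    machine_names: Iterable[str], port_names: Iterable[str]
) -> List[str]:
    """Return the Machines for which no port carries the same name."""
    return sorted(set(machine_names) - set(port_names))


# Three rows copied from the verification output in this report.
SAMPLE = """
| 5e671f6f-9ca2-4a71-8195-a7088441b7c7 | wj48ios318ay-f6zqd-master-0 | fa:16:3e:11:33:8d | ip_address='192.168.0.166', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | ACTIVE |
| c4775a20-7085-4694-805f-32db40dd475a | wj48ios318ay-f6zqd-master-4 | fa:16:3e:12:4f:0e | ip_address='192.168.1.209', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | ACTIVE |
| 633445db-1389-44e1-8ff7-3bed2eb68124 | wj48ios318ay-f6zqd-api-port | fa:16:3e:f8:42:7e | ip_address='192.168.0.5', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3' | DOWN |
"""

ports = port_names_from_table(SAMPLE)
missing = machines_without_matching_port(
    ["wj48ios318ay-f6zqd-master-0", "wj48ios318ay-f6zqd-master-4"], ports
)
print("Machines without a same-named port:", missing)
```

With the fixed build, the day-1 Machine (master-0) and the day-2 Machine (master-4) each have a port carrying the Machine's own name, so the list of mismatches is empty for this sample.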
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438