Bug 1933414 - Machines are created with unexpected name for Ports
Summary: Machines are created with unexpected name for Ports
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Pierre Prinetti
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-02-27 15:09 UTC by Maysa Macedo
Modified: 2024-10-01 17:34 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:48:28 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift installer pull 4734 0 None open Bug 1933414: openstack: Consistent port names 2021-03-11 12:17:49 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:48:48 UTC

Description Maysa Macedo 2021-02-27 15:09:12 UTC
Version:

$ openshift-install version
4.5

Platform:

Openstack

Please specify:
* IPI (automated install with `openshift-install`).

What happened?

One of the steps in the docs about replacing an unhealthy etcd member [1] requires a master Machine to be manually recreated. When this Machine is recreated, Kuryr expects the name of the Machine's port to follow the install-time pattern, e.g. "ostest-vxgmv-master-port-1". However, the Port for that Machine is created with the "port" fragment missing from its name, e.g. "ostest-vxgmv-master-1". This breaks Kuryr's detection of the master Ports and leaves it unable to create the members of the API Load Balancer, which should have the IPs of those master Ports.

[1] https://docs.openshift.com/container-platform/4.6/backup_and_restore/replacing-unhealthy-etcd-member.html#restore-replace-stopped-etcd-member_replacing-unhealthy-etcd-member

What did you expect to happen?

Master ports should be created with names matching the pattern "^%s-master-port-[0-9]", as happens on a regular IPI installation without day 2 operations.
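
The mismatch can be illustrated with a minimal sketch. It assumes that the "%s" placeholder in the pattern above is filled with the cluster ID ("ostest-vxgmv" in this report); master_port_pattern is a hypothetical helper written for illustration, not code taken from Kuryr or the installer.

```python
import re


def master_port_pattern(cluster_id: str) -> "re.Pattern[str]":
    # Fill the %s placeholder from the report with the escaped cluster ID
    # (assumption: %s stands for the cluster ID).
    return re.compile("^%s-master-port-[0-9]" % re.escape(cluster_id))


pattern = master_port_pattern("ostest-vxgmv")

# One install-time name and one day 2 name, both taken from this report.
for name in ("ostest-vxgmv-master-port-1", "ostest-vxgmv-master-3"):
    print(name, "matches" if pattern.match(name) else "does not match")
```

The install-time name matches and the day 2 name does not, which is why the recreated master is invisible to a component that relies on this pattern.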


How to reproduce it (as minimally and precisely as possible)?

After the cluster is UP, copy one master machine manifest:

$ oc get machines ostest-vxgmv-master-0 -n openshift-machine-api -o yaml > new-master-machine.yaml

Remove the status and change the name:

apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  name: ostest-vxgmv-master-3
  namespace: openshift-machine-api
spec:
  metadata: {}
  providerSpec:
    value:
      apiVersion: openstackproviderconfig.openshift.io/v1alpha1
      cloudName: openstack
      cloudsSecret:
        name: openstack-cloud-credentials
        namespace: openshift-machine-api
      flavor: m4.xlarge
      image: ostest-vxgmv-rhcos
      kind: OpenstackProviderSpec
      metadata:
        creationTimestamp: null
      networks:
      - filter: {}
        subnets:
        - filter:
            name: ostest-vxgmv-nodes
            tags: openshiftClusterID=ostest-vxgmv
      securityGroups:
      - filter: {}
        name: ostest-vxgmv-master
      serverGroupName: ostest-vxgmv-master
      serverMetadata:
        Name: ostest-vxgmv-master
        openshiftClusterID: ostest-vxgmv
      tags:
      - openshiftClusterID=ostest-vxgmv
      trunk: true
      userDataSecret:
        name: master-user-data
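
The manual edit described above (remove the status, change the name) could also be scripted. This is only a sketch under assumptions: it supposes the manifest was exported with "-o json" instead of "-o yaml" so that it can be handled as a plain dictionary, and prepare_replacement is a hypothetical helper, not part of oc or the Machine API. It keeps only metadata.name and metadata.namespace because those are the only metadata fields retained in the trimmed manifest above; whether other metadata should be preserved is not covered by this report.

```python
import copy
import json


def prepare_replacement(machine: dict, new_name: str) -> dict:
    """Return a copy of a Machine manifest prepared for re-creation."""
    new_machine = copy.deepcopy(machine)  # leave the caller's dict untouched
    new_machine.pop("status", None)       # "Remove the status"
    namespace = new_machine.get("metadata", {}).get("namespace")
    # "change the name"; keep only the fields the trimmed manifest retains
    new_machine["metadata"] = {"name": new_name, "namespace": namespace}
    return new_machine


# Stand-in for the exported manifest (heavily abbreviated, illustration only).
exported = {
    "apiVersion": "machine.openshift.io/v1beta1",
    "kind": "Machine",
    "metadata": {
        "name": "ostest-vxgmv-master-0",
        "namespace": "openshift-machine-api",
        "resourceVersion": "12345",  # made-up value
    },
    "spec": {"providerSpec": {"value": {"flavor": "m4.xlarge"}}},
    "status": {"phase": "Running"},
}

replacement = prepare_replacement(exported, "ostest-vxgmv-master-3")
print(json.dumps(replacement, indent=2))
```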

Check that the name of the new port differs from the names of the other master VM ports:

(shiftstack) [stack@undercloud-0 ~]$ openstack port list |grep master
| 242af9fe-6147-40b3-98e8-64e05006d438 | ostest-vxgmv-master-port-1                           | fa:16:3e:37:64:f3 | ip_address='x.x.x.132', subnet_id='8bbbeee9-76e9-446c-bfde-1ebe3c2fe3e8'   | ACTIVE |
| 66876b95-7427-4750-babb-2ae806a2bc30 | ostest-vxgmv-master-port-2                           | fa:16:3e:34:58:ac | ip_address='x.x.x.74', subnet_id='8bbbeee9-76e9-446c-bfde-1ebe3c2fe3e8'    | ACTIVE |
| d441eab2-5acf-463b-8237-23552484f32c | ostest-vxgmv-master-3                                | fa:16:3e:13:5e:49 | ip_address='x.x.x.16', subnet_id='8bbbeee9-76e9-446c-bfde-1ebe3c2fe3e8'    | ACTIVE |
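
The same check can be automated by scanning the table output. The sketch below is an assumption-laden illustration: it treats the second column of the table as the port name, and unexpected_master_ports is a hypothetical helper. The sample rows are the three rows shown above with the column padding removed.

```python
import re
from typing import List

SAMPLE = """\
| 242af9fe-6147-40b3-98e8-64e05006d438 | ostest-vxgmv-master-port-1 | fa:16:3e:37:64:f3 | ip_address='x.x.x.132', subnet_id='8bbbeee9-76e9-446c-bfde-1ebe3c2fe3e8' | ACTIVE |
| 66876b95-7427-4750-babb-2ae806a2bc30 | ostest-vxgmv-master-port-2 | fa:16:3e:34:58:ac | ip_address='x.x.x.74', subnet_id='8bbbeee9-76e9-446c-bfde-1ebe3c2fe3e8' | ACTIVE |
| d441eab2-5acf-463b-8237-23552484f32c | ostest-vxgmv-master-3 | fa:16:3e:13:5e:49 | ip_address='x.x.x.16', subnet_id='8bbbeee9-76e9-446c-bfde-1ebe3c2fe3e8' | ACTIVE |
"""


def unexpected_master_ports(table: str, cluster_id: str) -> List[str]:
    """List master port names that lack the install-time "-port-" fragment."""
    prefix = re.escape(cluster_id) + "-master-"
    any_master = re.compile("^" + prefix)
    expected = re.compile("^" + prefix + "port-[0-9]")
    found = []
    for line in table.splitlines():
        # Split a "| a | b | c |" row into stripped cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) < 2:
            continue  # not a data row (for example a border line)
        name = cells[1]  # assumption: the second column is the port name
        if any_master.match(name) and not expected.match(name):
            found.append(name)
    return found


print(unexpected_master_ports(SAMPLE, "ostest-vxgmv"))
```

With the rows from this report, only the port of the recreated Machine is reported.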

Anything else we need to know?


Comment 7 Pierre Prinetti 2021-03-03 16:16:21 UTC
As discussed out-of-band:

On installation, Terraform names ports "${var.cluster_id}-master-port-${count.index}" [1],

while CAPO just reuses the machine name (without "-port") [2].

With this premise, until it's clear what happens in upgrade/downgrade, the ideal Kuryr behaviour is probably to accept both naming patterns... if at all possible. And I agree that now that we know, we should probably make naming consistent anyway.

[1]: https://github.com/openshift/installer/blob/7e02fe75a583242e4cbb8c60472b105acf7a8266/data/data/openstack/topology/private-network.tf#L41

[2]: https://github.com/openshift/cluster-api-provider-openstack/blob/c4807294a92e315c766256fb2d691dc6b8a08219/pkg/cloud/openstack/clients/machineservice.go#L490-L492
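
To make the two naming schemes and the "accept both" idea concrete, here is a minimal sketch. All three functions are hypothetical illustrations: the first two only mirror the naming rules as described in this comment, and is_master_port is one possible tolerant check, not the actual Kuryr or cluster-network-operator code. Unlike the pattern quoted in the description, this check also anchors the end of the name, which is a design choice of the sketch.

```python
import re


def terraform_port_name(cluster_id: str, index: int) -> str:
    # Mirrors "${var.cluster_id}-master-port-${count.index}" (day 1).
    return f"{cluster_id}-master-port-{index}"


def capo_port_name(machine_name: str) -> str:
    # As described above, CAPO reuses the machine name unchanged (day 2).
    return machine_name


def is_master_port(cluster_id: str, port_name: str) -> bool:
    # Tolerant check: the "-port" fragment is optional.
    pattern = "^%s-master(-port)?-[0-9]+$" % re.escape(cluster_id)
    return re.match(pattern, port_name) is not None


cluster = "ostest-vxgmv"
candidates = [
    terraform_port_name(cluster, 1),
    capo_port_name("ostest-vxgmv-master-3"),
    "ostest-vxgmv-api-port",
]
for name in candidates:
    print(name, is_master_port(cluster, name))
```

Both master port names are accepted, while the API port is still rejected.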

Comment 8 Maysa Macedo 2021-03-08 14:56:31 UTC
Hi Pierre,

The fix that checks for both names is already on cluster-network-operator master (https://bugzilla.redhat.com/show_bug.cgi?id=1933269). From the OCP upgrade perspective, I believe the nodes are not re-created, so we wouldn't need to worry, but of course this needs double-checking.

Comment 9 Pierre Prinetti 2021-03-11 12:21:30 UTC
Maysa,
I have filed a patch[1] for getting the same port names day-1 and day-2. Since the Kuryr patch is already tracked in Bug 1933269, I wanted to check with you: can we close this BZ once this Installer patch is merged?

[1]: https://github.com/openshift/installer/pull/4734

Comment 13 weiwei jiang 2021-03-18 07:53:32 UTC
Checked with 4.8.0-0.nightly-2021-03-18-000857: the issue is fixed. Moving to VERIFIED.

$ oc get machine -A
NAMESPACE               NAME                                PHASE     TYPE        REGION      ZONE   AGE
openshift-machine-api   wj48ios318ay-f6zqd-master-0         Running   m1.xlarge   regionOne   nova   58m
openshift-machine-api   wj48ios318ay-f6zqd-master-1         Running   m1.xlarge   regionOne   nova   58m
openshift-machine-api   wj48ios318ay-f6zqd-master-2         Running   m1.xlarge   regionOne   nova   58m
openshift-machine-api   wj48ios318ay-f6zqd-worker-0-gb65k   Running   m1.large    regionOne   nova   55m
openshift-machine-api   wj48ios318ay-f6zqd-worker-0-w7k6z   Running   m1.large    regionOne   nova   55m
openshift-machine-api   wj48ios318ay-f6zqd-worker-0-z658h   Running   m1.large    regionOne   nova   55m
# openstack port list |grep -i wj48ios318ay-f6zqd-
| 2220d98b-ef0c-4c33-90a3-e2ed843e2315 | wj48ios318ay-f6zqd-worker-0-z658h  | fa:16:3e:e0:c1:be | ip_address='192.168.3.21', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                     | ACTIVE |
| 4350e7fa-fdd2-4a74-93a0-5072e3f634ce | wj48ios318ay-f6zqd-master-2        | fa:16:3e:e1:c4:96 | ip_address='192.168.1.62', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                     | ACTIVE |
| 50f9e1f3-7261-455b-807c-046309e38aba | wj48ios318ay-f6zqd-worker-0-w7k6z  | fa:16:3e:14:8a:39 | ip_address='192.168.0.81', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                     | ACTIVE |
| 5e671f6f-9ca2-4a71-8195-a7088441b7c7 | wj48ios318ay-f6zqd-master-0        | fa:16:3e:11:33:8d | ip_address='192.168.0.166', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                    | ACTIVE |
| 633445db-1389-44e1-8ff7-3bed2eb68124 | wj48ios318ay-f6zqd-api-port        | fa:16:3e:f8:42:7e | ip_address='192.168.0.5', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                      | DOWN   |
| 8713b2ab-f518-429b-b59d-cbf317f5336c | wj48ios318ay-f6zqd-master-1        | fa:16:3e:98:20:33 | ip_address='192.168.1.30', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                     | ACTIVE |
| f45162d8-73ae-421a-bfed-adb787d2fe9d | wj48ios318ay-f6zqd-worker-0-gb65k  | fa:16:3e:26:68:c6 | ip_address='192.168.3.35', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                     | ACTIVE |
| fa111223-cf98-4ec4-82e0-9288eaad2053 | wj48ios318ay-f6zqd-ingress-port    | fa:16:3e:91:cd:8f | ip_address='192.168.0.7', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                      | DOWN   |

$ oc get machine -A
NAMESPACE               NAME                                PHASE     TYPE        REGION      ZONE   AGE
openshift-machine-api   wj48ios318ay-f6zqd-master-0         Running   m1.xlarge   regionOne   nova   67m
openshift-machine-api   wj48ios318ay-f6zqd-master-1         Running   m1.xlarge   regionOne   nova   67m
openshift-machine-api   wj48ios318ay-f6zqd-master-2         Running   m1.xlarge   regionOne   nova   67m
openshift-machine-api   wj48ios318ay-f6zqd-master-4         Running   m1.xlarge   regionOne   nova   7m8s
openshift-machine-api   wj48ios318ay-f6zqd-worker-0-gb65k   Running   m1.large    regionOne   nova   64m
openshift-machine-api   wj48ios318ay-f6zqd-worker-0-w7k6z   Running   m1.large    regionOne   nova   64m
openshift-machine-api   wj48ios318ay-f6zqd-worker-0-z658h   Running   m1.large    regionOne   nova   64m
# openstack port list |grep -i wj48ios318ay-f6zqd-
| 2220d98b-ef0c-4c33-90a3-e2ed843e2315 | wj48ios318ay-f6zqd-worker-0-z658h  | fa:16:3e:e0:c1:be | ip_address='192.168.3.21', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                     | ACTIVE |
| 4350e7fa-fdd2-4a74-93a0-5072e3f634ce | wj48ios318ay-f6zqd-master-2        | fa:16:3e:e1:c4:96 | ip_address='192.168.1.62', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                     | ACTIVE |
| 50f9e1f3-7261-455b-807c-046309e38aba | wj48ios318ay-f6zqd-worker-0-w7k6z  | fa:16:3e:14:8a:39 | ip_address='192.168.0.81', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                     | ACTIVE |
| 5e671f6f-9ca2-4a71-8195-a7088441b7c7 | wj48ios318ay-f6zqd-master-0        | fa:16:3e:11:33:8d | ip_address='192.168.0.166', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                    | ACTIVE |
| 633445db-1389-44e1-8ff7-3bed2eb68124 | wj48ios318ay-f6zqd-api-port        | fa:16:3e:f8:42:7e | ip_address='192.168.0.5', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                      | DOWN   |
| 8713b2ab-f518-429b-b59d-cbf317f5336c | wj48ios318ay-f6zqd-master-1        | fa:16:3e:98:20:33 | ip_address='192.168.1.30', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                     | ACTIVE |
| c4775a20-7085-4694-805f-32db40dd475a | wj48ios318ay-f6zqd-master-4        | fa:16:3e:12:4f:0e | ip_address='192.168.1.209', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                    | ACTIVE |
| f45162d8-73ae-421a-bfed-adb787d2fe9d | wj48ios318ay-f6zqd-worker-0-gb65k  | fa:16:3e:26:68:c6 | ip_address='192.168.3.35', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                     | ACTIVE |
| fa111223-cf98-4ec4-82e0-9288eaad2053 | wj48ios318ay-f6zqd-ingress-port    | fa:16:3e:91:cd:8f | ip_address='192.168.0.7', subnet_id='3deeb10c-330a-42b5-8615-61d87f35f7e3'                      | DOWN   |

Comment 17 errata-xmlrpc 2021-07-27 22:48:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

