Bug 1734460 - [OSP] Floating IP specified with lbFloatingIP in install-config.yaml fails to attach to master node causing install to fail
Keywords:
Status: POST
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.2.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.4.0
Assignee: Mike Fedosin
QA Contact: David Sanz
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-30 15:11 UTC by Jon Uriarte
Modified: 2020-01-08 12:50 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:




Links
System | ID | Priority | Status | Summary | Last Updated
Github | openshift installer pull 2868 | None | open | Bug 1734460: OpenStack: check that port exists before creating a fip | 2020-02-05 13:51:41 UTC

Description Jon Uriarte 2019-07-30 15:11:26 UTC
Description of problem:

During the OpenShift on OpenStack install process, the installer fails to attach the floating IP specified in install-config.yaml to the API node. Instead, another floating IP from the pool is attached to the bootstrap node.

Taken from openshift-install release-4.2 branch.

How reproducible: always

Steps to Reproduce:
1. Deploy OSP
2. Run openshift-install


Actual results:


DEBUG module.masters.openstack_compute_instance_v2.master_conf[0]: Creation complete after 54s [id=4d641422-0e3d-4db5-90b1-baf9f031f55e] 
ERROR                                              
ERROR Error: Error associating openstack_networking_floatingip_associate_v2 floating_ip e8b95280-334a-4988-bf52-f491412d6fb0 with port d887f6f0-a695-42ba-8de4-859a60d6d2f6: Resource not found 
ERROR                                              
ERROR   on ../../../../../../tmp/openshift-install-380596170/topology/private-network.tf line 84, in resource "openstack_networking_floatingip_associate_v2" "service_fip": 
ERROR   84: resource "openstack_networking_floatingip_associate_v2" "service_fip" { 
ERROR                                              
ERROR                                              
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply using Terraform 

real	2m13.242s
user	0m40.423s
sys	0m3.888s

Expected results:
No error and floating IP attached to api node.
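
When triaging the failure above, it can help to pull the floating IP ID and the port ID out of the error line so that each can be looked up separately. The following is a minimal sketch, assuming the message keeps the wording quoted in the actual results; the function name `parse_association_error` is hypothetical and not part of the installer.

```python
import re

# Pattern for the provider error quoted in this report (plain ERROR format).
ASSOC_ERROR = re.compile(
    r"Error associating openstack_networking_floatingip_associate_v2 "
    r"floating_ip (?P<fip>[0-9a-f-]{36}) with port (?P<port>[0-9a-f-]{36}): "
    r"(?P<reason>.+?)\s*$"
)


def parse_association_error(line):
    """Return (fip_id, port_id, reason), or None if the line does not match."""
    match = ASSOC_ERROR.search(line)
    if match is None:
        return None
    return match.group("fip"), match.group("port"), match.group("reason")


# Error line copied from the actual results in this report.
line = (
    "ERROR Error: Error associating openstack_networking_floatingip_associate_v2 "
    "floating_ip e8b95280-334a-4988-bf52-f491412d6fb0 with port "
    "d887f6f0-a695-42ba-8de4-859a60d6d2f6: Resource not found "
)
print(parse_association_error(line))
```

The error does not say which of the two resources was not found, so both IDs would need to be checked against the cloud.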

Comment 1 August Simonelli 2019-07-30 15:22:52 UTC
Further to this we see it happening like this:

We pre-allocate an FIP in openstack as usual:

openstack floating ip create --floating-ip-address IP public

We then add that FIP to lbFloatingIP in install-config.yaml

We then install and this results in the error listed above.

However, if we leave the lbFloatingIP line out of install-config.yaml we get past the error, but we then need to manually assign the preallocated FIP to the API node for the install to continue.

It seems like the presence of lbFloatingIP in install-config.yaml is triggering the error but we still need the FIP assigned to allow the install to work.
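
Before running the installer, one could confirm that the address placed in lbFloatingIP is visible to the project and not yet attached to a port. This is a rough sketch, assuming the rows come from `openstack floating ip list -f json` and that the JSON keys match the column headers of the table output; the helper names are hypothetical, and the sample rows are copied from a listing later in this report.

```python
import json


def find_floating_ip(rows, address):
    """Return the row for `address`, or None if it is not listed."""
    for row in rows:
        if row.get("Floating IP Address") == address:
            return row
    return None


def is_unattached(row):
    """True when the floating IP has no port associated with it."""
    # Assumption: an empty port is null in JSON; "None" covers table-style text.
    return row.get("Port") in (None, "", "None")


# Sample shaped like the JSON listing (an assumption about the output format).
sample = json.loads("""
[
  {"ID": "4a5af48e-d921-427a-9a72-a6b86a9f8873",
   "Floating IP Address": "192.168.122.160",
   "Fixed IP Address": null,
   "Port": null},
  {"ID": "e36d8556-09ce-4c2f-b8ac-6cde47968d4f",
   "Floating IP Address": "192.168.122.175",
   "Fixed IP Address": "10.0.128.16",
   "Port": "cb54a30f-1a0b-49ec-91a9-34d4a83aad5b"}
]
""")

row = find_floating_ip(sample, "192.168.122.160")
if row is None:
    print("floating IP is not visible to this project")
elif is_unattached(row):
    print("floating IP is allocated and free")
else:
    print("floating IP is already attached to port", row["Port"])
```

A check like this only rules out a missing or already-attached FIP; it would not catch the case where the port the installer targets does not exist yet.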

Comment 2 August Simonelli 2019-07-30 15:23:59 UTC
Additionally, we also see another FIP (NOT the preallocated one) assigned to the bootstrap.

Comment 3 Tomas Sedovic 2019-07-31 15:41:16 UTC
The IP address being assigned to the bootstrap node is now expected. We've added that recently to let the installer gather logs on bootstrap failure. You can safely ignore that.

The fact that you can't use a pre-allocated FIP as `lbFloatingIP` is a genuine issue though. I'm unable to reproduce this locally, but it's something we do want to address.

Could you please paste your install-config.yaml (with the sensitive items such as pull secrets snipped out)?

Comment 4 August Simonelli 2019-07-31 17:16:25 UTC
Here's my install from the non-service-vm code which is doing the same.

Failed with

time="2019-07-31T12:23:26-04:00" level=debug msg="module.bootstrap.openstack_compute_instance_v2.bootstrap: Creation complete after 3m4s [id=69821aff-c950-489f-82bc-5d83d0da78a4]"
time="2019-07-31T12:23:27-04:00" level=debug msg="module.masters.openstack_compute_instance_v2.master_conf[0]: Still creating... [3m0s elapsed]"
time="2019-07-31T12:23:29-04:00" level=debug msg="module.masters.openstack_compute_instance_v2.master_conf[0]: Creation complete after 3m2s [id=cab5d901-129c-4ef9-88ec-08413127b92d]"
time="2019-07-31T12:23:29-04:00" level=error
time="2019-07-31T12:23:29-04:00" level=error msg="Error: Error associating openstack_networking_floatingip_associate_v2 floating_ip 4a5af48e-d921-427a-9a72-a6b86a9f8873 with port f8e5f7b4-5525-4254-b936-fbb987f9cf9d: Resource not found"
time="2019-07-31T12:23:29-04:00" level=error
time="2019-07-31T12:23:29-04:00" level=error msg="  on ../../tmp/openshift-install-113850301/topology/private-network.tf line 114, in resource \"openstack_networking_floatingip_associate_v2\" \"api_fip\":"
time="2019-07-31T12:23:29-04:00" level=error msg=" 114: resource \"openstack_networking_floatingip_associate_v2\" \"api_fip\" {"
time="2019-07-31T12:23:29-04:00" level=error
time="2019-07-31T12:23:29-04:00" level=error
time="2019-07-31T12:23:29-04:00" level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply using Terraform"

But I do see this:

[stack@undercloud ~]$ openstack --os-cloud openstack floating ip list
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+----------------------------------+
| ID                                   | Floating IP Address | Fixed IP Address | Port                                 | Floating Network                     | Project                          |
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+----------------------------------+
| 4a5af48e-d921-427a-9a72-a6b86a9f8873 | 192.168.122.160     | None             | None                                 | 10529ced-278b-4c95-ae88-5f84a4a0e236 | 0ae48cc6f6a44ee0ac0f1099d8996e9b |
| e36d8556-09ce-4c2f-b8ac-6cde47968d4f | 192.168.122.175     | 10.0.128.16      | cb54a30f-1a0b-49ec-91a9-34d4a83aad5b | 10529ced-278b-4c95-ae88-5f84a4a0e236 | 0ae48cc6f6a44ee0ac0f1099d8996e9b |
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+----------------------------------+
[stack@undercloud ~]$ openstack --os-cloud openstack server list
+--------------------------------------+---------------------------+--------+-----------------------------------------------------+-------+----------+
| ID                                   | Name                      | Status | Networks                                            | Image | Flavor   |
+--------------------------------------+---------------------------+--------+-----------------------------------------------------+-------+----------+
| 478aca2c-1fcc-45f6-accf-9c9c78638432 | ostest-pw6ng-worker-gvcwk | ACTIVE | ostest-pw6ng-openshift=10.0.128.39                  | rhcos | m1.large |
| cab5d901-129c-4ef9-88ec-08413127b92d | ostest-pw6ng-master-0     | ACTIVE | ostest-pw6ng-openshift=10.0.128.26                  | rhcos | m1.large |
| 69821aff-c950-489f-82bc-5d83d0da78a4 | ostest-pw6ng-bootstrap    | ACTIVE | ostest-pw6ng-openshift=10.0.128.16, 192.168.122.175 | rhcos | m1.large |
+--------------------------------------+---------------------------+--------+-----------------------------------------------------+-------+----------+

and my install-config.yaml is this:

apiVersion: v1
baseDomain: shiftstack.com
clusterID: 8d582648-4ed7-45ca-be5b-c5b1c435e834
compute:
- name: worker
  platform: {}
  replicas: 1
controlPlane:
  name: master
  platform: {}
  replicas: 1
metadata:
  creationTimestamp: null
  name: ostest
Networking:
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostSubnetLength: 9
  machineCIDR: 10.0.128.0/17
  serviceCIDR: 172.30.0.0/16
  type: OpenShiftSDN
platform:
  openstack:
    cloud: openstack
    computeFlavor: m1.large
    externalNetwork: external
    lbFloatingIP: 192.168.122.160
    region: regionOne
    trunkSupport: "1"
pullSecret: 'AAAAAAAAAAAAAA'
sshKey: ssh-rsa AAAAAAAAA

Comment 5 August Simonelli 2019-07-31 17:17:50 UTC
A little info about the OSP install I'm using:

#!/bin/bash

exec openstack overcloud deploy \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --timeout 90 \
  --verbose \
  -r /home/stack/templates/roles_data.yaml \
  -e /home/stack/templates/node-count.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/octavia.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-rgw.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml \
  -e /home/stack/templates/environments/10-ntp.yaml \
  -e /home/stack/templates/environments/11-network.yaml \
  -e /home/stack/templates/environments/20-network-environment.yaml \
  -e /home/stack/templates/environments/24-octavia-timeout.yaml \
  -e /home/stack/templates/environments/25-hostname-map.yaml \
  -e /home/stack/templates/environments/30-ips-from-pool-all.yaml \
  -e /home/stack/templates/environments/35-ceph-config.yaml \
  -e /home/stack/templates/environments/40-storage-config.yaml \
  -e /home/stack/templates/environments/50-vip.yaml \
  -e /home/stack/templates/environments/55-rsvd_host_memory.yaml \
  --log-file /home/stack/overcloud-deploy.log

I can give more info on each yaml etc if/as required.

Comment 6 August Simonelli 2019-07-31 18:02:40 UTC
And this is really odd. 
I did another deployment testing a new value for lbFloatingIP of 192.168.122.156 (I was trying to assign the FIP as an admin).
I hit an error suggesting that the FIP could not be seen (which is what I expected, as I set it as admin).
But that's not the odd part ... keep reading ...

So on this failed run I noticed the bootstrap got 192.168.122.175

So I deleted my cluster. However, I noticed the 192.168.122.175 address was not removed from my allocation pool:

(openstack) [stack@undercloud ~]$ openstack --os-cloud openstack floating ip list
+--------------------------------------+---------------------+------------------+------+--------------------------------------+----------------------------------+
| ID                                   | Floating IP Address | Fixed IP Address | Port | Floating Network                     | Project                          |
+--------------------------------------+---------------------+------------------+------+--------------------------------------+----------------------------------+
| 4a5af48e-d921-427a-9a72-a6b86a9f8873 | 192.168.122.160     | None             | None | 10529ced-278b-4c95-ae88-5f84a4a0e236 | 0ae48cc6f6a44ee0ac0f1099d8996e9b |
| 5b64be33-ae69-4500-b462-183e9c86993b | 192.168.122.175     | None             | None | 10529ced-278b-4c95-ae88-5f84a4a0e236 | 0ae48cc6f6a44ee0ac0f1099d8996e9b |
+--------------------------------------+---------------------+------------------+------+--------------------------------------+----------------------------------+

So I decided to use 192.168.122.175 for my lbFloatingIP value on the next run and guess what? It worked.

(openstack) [stack@undercloud ~]$ openstack --os-cloud openstack server list
+--------------------------------------+------------------------+--------+-----------------------------------------------------+-------+----------+
| ID                                   | Name                   | Status | Networks                                            | Image | Flavor   |
+--------------------------------------+------------------------+--------+-----------------------------------------------------+-------+----------+
| f6c798ed-0567-4cfd-8636-9063e8f7f02e | ostest-mnxzv-master-0  | ACTIVE | ostest-mnxzv-openshift=10.0.128.15, 192.168.122.175 | rhcos | m1.large |
| 9b974ae3-9bf8-441a-90bb-dc76a8009cca | ostest-mnxzv-bootstrap | ACTIVE | ostest-mnxzv-openshift=10.0.128.30, 192.168.122.178 | rhcos | m1.large |
+--------------------------------------+------------------------+--------+-----------------------------------------------------+-------+----------+

It was correctly assigned to the master, as I'd expected, but this was not working when I allocated it manually before.

I manually allocated using 

openstack floating ip create external

Comment 8 August Simonelli 2019-08-01 14:11:52 UTC
On a recent run with:

(openstack) [stack@undercloud ~]$ openstack floating ip list
+--------------------------------------+---------------------+------------------+------+--------------------------------------+----------------------------------+
| ID                                   | Floating IP Address | Fixed IP Address | Port | Floating Network                     | Project                          |
+--------------------------------------+---------------------+------------------+------+--------------------------------------+----------------------------------+
| 20f4d4af-733c-4c74-9b07-002fb00656fc | 192.168.122.151     | None             | None | 10529ced-278b-4c95-ae88-5f84a4a0e236 | 0ae48cc6f6a44ee0ac0f1099d8996e9b |
| 58f9205c-814f-439b-a590-2bdbf9f3213e | 192.168.122.168     | None             | None | 10529ced-278b-4c95-ae88-5f84a4a0e236 | 0ae48cc6f6a44ee0ac0f1099d8996e9b |
| 92ad0e4d-2a8b-4687-b0b2-51f1836f8202 | 192.168.122.152     | None             | None | 10529ced-278b-4c95-ae88-5f84a4a0e236 | 0ae48cc6f6a44ee0ac0f1099d8996e9b |
+--------------------------------------+---------------------+------------------+------+--------------------------------------+----------------------------------+
(openstack) [stack@undercloud ~]$ openstack network list
+--------------------------------------+----------+--------------------------------------+
| ID                                   | Name     | Subnets                              |
+--------------------------------------+----------+--------------------------------------+
| 10529ced-278b-4c95-ae88-5f84a4a0e236 | external | dacf2dce-c4a4-4b14-a3a4-ebe65c06654e |
+--------------------------------------+----------+--------------------------------------+
(openstack) [stack@undercloud ~]$ openstack subnet list
+--------------------------------------+-----------------+--------------------------------------+------------------+
| ID                                   | Name            | Network                              | Subnet           |
+--------------------------------------+-----------------+--------------------------------------+------------------+
| dacf2dce-c4a4-4b14-a3a4-ebe65c06654e | external_subnet | 10529ced-278b-4c95-ae88-5f84a4a0e236 | 192.168.122.0/24 |
+--------------------------------------+-----------------+--------------------------------------+------------------+

I set   lbFloatingIP: "192.168.122.151"

and it failed with:


DEBUG module.masters.openstack_compute_instance_v2.master_conf[1]: Creation complete after 21s [id=55cc90bd-83f0-4018-acc6-1be6cfe5bf7f]
DEBUG module.masters.openstack_compute_instance_v2.master_conf[2]: Creation complete after 23s [id=cd71ec82-53e7-4fd3-98fe-c62c17fc31dc]
ERROR
ERROR Error: Error associating openstack_networking_floatingip_associate_v2 floating_ip 20f4d4af-733c-4c74-9b07-002fb00656fc with port 27706d75-4122-4ed0-ac69-f9e707a8233b: Resource not found
ERROR
ERROR   on ../../tmp/openshift-install-298254184/topology/private-network.tf line 114, in resource "openstack_networking_floatingip_associate_v2" "api_fip":
ERROR  114: resource "openstack_networking_floatingip_associate_v2" "api_fip" {
ERROR
ERROR
ERROR
ERROR Error: Error creating openstack_networking_floatingip_v2: Resource not found
ERROR
ERROR   on ../../tmp/openshift-install-298254184/topology/private-network.tf line 120, in resource "openstack_networking_floatingip_v2" "bootstrap_fip":
ERROR  120: resource "openstack_networking_floatingip_v2" "bootstrap_fip" {
ERROR
ERROR
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply using Terraform

install-config.yaml (generated by create install-config using the openstack option):

apiVersion: v1
baseDomain: shiftstack.com
compute:
- hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
metadata:
  creationTimestamp: null
  name: ostest
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineCIDR: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  openstack:
    apiVIP: 10.0.0.5
    cloud: openstack
    computeFlavor: m1.large
    dnsVIP: 10.0.0.6
    externalNetwork: external
    ingressVIP: 10.0.0.7
    lbFloatingIP: ""
    octaviaSupport: "1"
    region: regionOne
    trunkSupport: "1"
pullSecret: 'XXXXXXX'
sshKey: |
  ssh-rsa XXXXXXX

Comment 9 August Simonelli 2019-08-01 14:26:40 UTC
terraform.tfstate shows
  
  {
      "module": "module.topology",
      "mode": "managed",
      "type": "openstack_networking_floatingip_associate_v2",
      "name": "api_fip",
      "each": "list",
      "provider": "provider.openstack",
      "instances": []
    },
    {
      "module": "module.topology",
      "mode": "managed",
      "type": "openstack_networking_floatingip_v2",
      "name": "bootstrap_fip",
      "each": "list",
      "provider": "provider.openstack",
      "instances": []
    },
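
To spot which resources Terraform recorded without creating anything, the state file can be scanned for entries whose `instances` list is empty. The snippet below is a sketch under the assumption that the entries shown above sit in a top-level `resources` array, as in the Terraform 0.12 state format; the function name is hypothetical.

```python
import json


def resources_without_instances(state):
    """Return (module, type, name) for each resource with an empty instance list."""
    found = []
    for res in state.get("resources", []):
        if not res.get("instances"):
            found.append((res.get("module", ""), res["type"], res["name"]))
    return found


# The two entries quoted in this comment, wrapped in an assumed "resources" key.
STATE_TEXT = """
{
  "resources": [
    {"module": "module.topology", "mode": "managed",
     "type": "openstack_networking_floatingip_associate_v2",
     "name": "api_fip", "each": "list",
     "provider": "provider.openstack", "instances": []},
    {"module": "module.topology", "mode": "managed",
     "type": "openstack_networking_floatingip_v2",
     "name": "bootstrap_fip", "each": "list",
     "provider": "provider.openstack", "instances": []}
  ]
}
"""

state = json.loads(STATE_TEXT)
for module, rtype, name in resources_without_instances(state):
    print(f"{module}.{rtype}.{name} has no instances")
```

For a real run, `state` would be loaded from the terraform.tfstate file in the install directory instead of the inline text.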

Comment 10 rstarr 2019-08-20 19:15:28 UTC
I get a similar error to the one above. Simply changing the format seems to fix this issue for me.
This fails:
platform:
  openstack:
    cloud: openstack
    computeFlavor: ocp1.node
    externalNetwork: public
    lbFloatingIP: "192.168.247.57"
    octaviaSupport: "1"
    region: regionOne
    trunkSupport: "1"

But this works:
platform:
  openstack:
    cloud:            openstack 
    externalNetwork:  public
    region:           regionOne
    computeFlavor:    ocp1.node 
    lbFloatingIP:     "192.168.247.57"
    octaviaSupport: "1"
----
The second observation is that lbFloatingIP doesn't get assigned to the master but does work: it gets assigned to the 10.0.0.5 port in the new machineCIDR network. After deploying, api.cluster.domain:6443 seems to work as expected, and if you look at the new network it has a port called clustername-api-port (10.0.0.5) mapped to lbFloatingIP. But the console and *.apps.domain do not work. Adding a floating IP to 10.0.0.7, which is listed as clustername-ingress-port, and updating the *.apps.domain DNS to point to the new floating IP resolved the console and apps issue.

Comment 11 Tomas Sedovic 2019-08-26 12:25:19 UTC
Oh, that is strange. Maybe there are some issues with the YAML parser. It would also be interesting to know whether the issue is the spacing between keys and values or the ordering (both have changed).
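
One way to narrow this down is to compare the two fragments from the previous comment after discarding spacing and ordering, so that only real key or value differences remain. This is a minimal sketch: `parse_leaf_pairs` is a hypothetical helper that only understands flat `key: value` lines, not a YAML parser, and the result says nothing definite about the root cause.

```python
def parse_leaf_pairs(text):
    """Collect `key: value` leaf pairs, ignoring indentation, padding and quotes."""
    pairs = {}
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        value = value.strip().strip('"')
        if sep and value:  # skip blank lines and parent keys such as `platform:`
            pairs[key.strip()] = value
    return pairs


# Fragments copied from the previous comment.
FAILS = """
platform:
  openstack:
    cloud: openstack
    computeFlavor: ocp1.node
    externalNetwork: public
    lbFloatingIP: "192.168.247.57"
    octaviaSupport: "1"
    region: regionOne
    trunkSupport: "1"
"""

WORKS = """
platform:
  openstack:
    cloud:            openstack
    externalNetwork:  public
    region:           regionOne
    computeFlavor:    ocp1.node
    lbFloatingIP:     "192.168.247.57"
    octaviaSupport: "1"
"""

failing = parse_leaf_pairs(FAILS)
working = parse_leaf_pairs(WORKS)

only_in_failing = sorted(set(failing) - set(working))
only_in_working = sorted(set(working) - set(failing))
changed = sorted(k for k in set(failing) & set(working) if failing[k] != working[k])

print("only in failing:", only_in_failing)
print("only in working:", only_in_working)
print("different values:", changed)
```

Beyond spacing and ordering, the two fragments also differ in one key: trunkSupport appears only in the failing one. Whether that matters would still need to be tested.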

As for the second observation, that is the current expected behaviour. The ports are managed internally by Keepalived and configuring the wildcard *.apps access is a day two operation (due to IPI preferring minimal configuration -- we have to set the API floating IP up front but the ingress one can be done afterwards).

The upstream docs have more details on both the networking configuration: https://github.com/openshift/installer/blob/master/docs/design/openstack/networking-infrastructure.md

and how to set up external access to the cluster: https://github.com/openshift/installer/tree/master/docs/user/openstack#using-floating-ips.

Comment 13 Tomas Sedovic 2019-09-05 15:36:19 UTC
Moving to 4.3. If I understand it correctly, this only happens when you write the install-config.yaml by hand (which should not be required for the majority of the use cases) and even then there is a way to work around this.

We will need more time to reproduce this and come up with some sanitisation and error reporting, but that will not likely happen in 4.3.

Comment 14 David Sanz 2019-09-11 09:31:25 UTC
API FIP is assigned to a port correctly and APPS FIP is not managed by the installer.

