Bug 1997704 - [osp][octavia lb] given loadBalancerIP is ignored when creating a LoadBalancer type svc
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Maysa Macedo
QA Contact: Jon Uriarte
Reported: 2021-08-25 16:22 UTC by Jon Uriarte
Modified: 2022-08-10 10:37 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 10:37:25 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 6033 | 0 | None | open | Bug 1997704: [OpenStack] Document in-tree limitation for external LBs | 2022-06-21 10:37:01 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 10:37:45 UTC

Description Jon Uriarte 2021-08-25 16:22:53 UTC
Description of problem:

When a svc is created with a given (pre-created) floating IP (fip), the fip is ignored: it is not assigned to the LB, so the service is not reachable externally.

e.g.:
$ oc expose deployment test1-dep --name test1-svc --type=LoadBalancer --port 80 --target-port=8080 --load-balancer-ip 10.0.0.254

Setting the --load-balancer-ip parameter should assign that floating IP to the service LB [1], but the parameter is ignored.

Version-Release number of selected component (if applicable):
OCP 4.9.0-0.nightly-2021-08-25-010624
OSP 16.1.6 GA (RHOS-16.1-RHEL-8-20210604.n.0)

How reproducible: always


Steps to Reproduce:
1. Install OCP 4.9 on OSP

2. Create a floating ip in OSP for later assignment
openstack floating ip create <external network>

3. Create a new project and a deployment in OCP
oc new-project test
oc create deployment test1-dep --image=quay.io/kuryr/demo

4. Create a service setting the desired LB IP address
oc expose deployment test1-dep --name test1-svc --type=LoadBalancer --port 80 --target-port=8080 --load-balancer-ip <precreated fip>

5. Check a LB is created in OSP (wait until it's in ACTIVE status):
openstack loadbalancer list
+--------------------------------------+----------------------------------+----------------------------------+--------------+---------------------+----------+
| id                                   | name                             | project_id                       | vip_address  | provisioning_status | provider |
+--------------------------------------+----------------------------------+----------------------------------+--------------+---------------------+----------+
| a2a0f053-7539-4758-bcdf-a33773e8fbd6 | a0f27ca88bdee4028ac6e288c76efec1 | c0316b3530e64b909f9451a857b404d0 | 10.196.2.216 | ACTIVE              | amphora  |
+--------------------------------------+----------------------------------+----------------------------------+--------------+---------------------+----------+

6. Check the svc in OCP:

oc get svc
NAME        TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
test1-svc   LoadBalancer   172.30.172.19   <pending>     80:31624/TCP   39m

The EXTERNAL-IP stays in <pending>, although the fip should have been assigned.

oc get svc -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: "2021-08-25T15:35:15Z"
    finalizers:
    - service.kubernetes.io/load-balancer-cleanup
    labels:
      app: test1-dep
    name: test1-svc
    namespace: test
    resourceVersion: "192336"
    uid: 0f27ca88-bdee-4028-ac6e-288c76efec1c
  spec:
    allocateLoadBalancerNodePorts: true
    clusterIP: 172.30.172.19
    clusterIPs:
    - 172.30.172.19
    externalTrafficPolicy: Cluster
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    loadBalancerIP: <precreated fip>   <<<<<-------
    ports:
    - nodePort: 31624
      port: 80
      protocol: TCP
      targetPort: 8080
    selector:
      app: test1-dep
    sessionAffinity: None
    type: LoadBalancer
  status:
    loadBalancer: {}                  <<<<<-------
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""


Actual results:
The loadBalancerIP is present in the spec section, but the status section is empty:
  ...
  status:
    loadBalancer: {}
  ...


Expected results:

The fip is added to the status section:
  ...
  status:
    loadBalancer:
      ingress:
      - ip: <precreated fip>
  ...

The fip is shown in the EXTERNAL-IP column:
oc get svc
NAME        TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)        AGE
test1-svc   LoadBalancer   172.30.172.19   <precreated fip>  80:31624/TCP   39m
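
The difference between the actual and expected results can also be checked mechanically. The following is a minimal Python sketch, assuming the Service has been exported as JSON (for example with `oc get svc test1-svc -o json`). The helper names `assigned_ingress_ips` and `requested_ip_assigned` are hypothetical, and 10.0.0.254 is simply the address used in the example command in the description.

```python
import json


def assigned_ingress_ips(service: dict) -> list:
    """Return the IPs listed under status.loadBalancer.ingress (empty if none)."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    return [entry["ip"] for entry in ingress if "ip" in entry]


def requested_ip_assigned(service: dict) -> bool:
    """True only when spec.loadBalancerIP also appears in the status section."""
    requested = service.get("spec", {}).get("loadBalancerIP")
    return requested is not None and requested in assigned_ingress_ips(service)


# Shape of the Service in this report: the IP is in spec, status is empty.
actual = json.loads(
    '{"spec": {"loadBalancerIP": "10.0.0.254"}, "status": {"loadBalancer": {}}}'
)
# Shape described under "Expected results".
expected = json.loads(
    '{"spec": {"loadBalancerIP": "10.0.0.254"}, '
    '"status": {"loadBalancer": {"ingress": [{"ip": "10.0.0.254"}]}}}'
)

print(requested_ip_assigned(actual))    # False
print(requested_ip_assigned(expected))  # True
```

The check compares spec against status on purpose: in this bug the spec is accepted as written, so only the status section shows that the address was never attached.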

Additional info:

It can also be reproduced by creating the following resource directly:
$ cat svc_resource.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    run: demo
  name: demo
  namespace: test
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    run: demo
  type: LoadBalancer
  loadBalancerIP: <precreated fip>


[1] https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer

Comment 2 egarcia 2021-08-25 18:05:17 UTC
Just to clarify, the load balancer is not just not being updated, but also is not being created correctly, right?

Comment 3 Michał Dulko 2021-08-26 07:15:43 UTC
Why would you want to pre-create a FIP in OpenStack? What's the use case there? I'm almost sure the problem is that cloud provider attempts to create the FIP with the given address and fails because the address is already taken (we can confirm that by looking at events in [1]). This might totally be considered an expected behavior - external networks might be shared and you don't want to mess with stuff created by others. Also cloud provider doesn't do FIP tagging and we're supposed to support running multiple clusters on a single OpenStack project. Implementing the behavior you described we could end up with clusters "stealing" FIPs from one another.

[1] `oc -n test describe svc demo`

Comment 4 Jon Uriarte 2021-08-26 07:21:00 UTC
(In reply to egarcia from comment #2)
> Just to clarify, the load balancer is not just not being updated, but also
> is not being created correctly, right?

The load balancer is being created correctly but the svc is not being assigned the external IP (floating ip).

Comment 5 Jon Uriarte 2021-08-26 07:35:23 UTC
(In reply to Michał Dulko from comment #3)
> Why would you want to pre-create a FIP in OpenStack? What's the use case
> there? I'm almost sure the problem is that cloud provider attempts to create
> the FIP with the given address and fails because the address is already
> taken (we can confirm that by looking at events in [1]). This might totally
> be considered an expected behavior - external networks might be shared and
> you don't want to mess with stuff created by others. Also cloud provider
> doesn't do FIP tagging and we're supposed to support running multiple
> clusters on a single OpenStack project. Implementing the behavior you
> described we could end up with clusters "stealing" FIPs from one another.
> 
> [1] `oc -n test describe svc demo`

That was a use case supported in Kuryr if I'm not wrong, please see the related BZs [1] and [2].
I don't have a cluster with Kuryr atm but that's how it used to work. As it depends on the
cloud provider [3] implementation I guess it's up to us to decide if we want to support it.

The events in the svc confirm your thoughts about the issue:
Events:
  Type     Reason                  Age                    From                Message
  ----     ------                  ----                   ----                -------
  Warning  SyncLoadBalancerFailed  3h3m (x149 over 15h)   service-controller  Error syncing load balancer: failed to ensure load balancer: error creating LB floatingip {Description: FloatingNetworkID:4634cf2c-056f-4dee-98de-6b4e68b7af5b FloatingIP:x.x.x.x PortID:2ebaae02-43d0-4adc-bca1-9fc32b52df99 FixedIP: SubnetID: TenantID: ProjectID:}: Request forbidden: [POST https://10.0.0.101:13696/v2.0/floatingips], error message: {"NeutronError": {"type": "PolicyNotAuthorized", "message": "(rule:create_floatingip and rule:create_floatingip:floating_ip_address) is disallowed by policy", "detail": ""}}                                                                                                               
  Normal   EnsuringLoadBalancer    2m44s (x185 over 15h)  service-controller  Ensuring load balancer



[1] https://bugzilla.redhat.com/show_bug.cgi?id=1875352
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1503963#c21
[3] https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer

Comment 6 Michał Dulko 2021-08-26 08:11:43 UTC
Hm, I see. I'm not 100% sure this should be a behavior considered supported by Kuryr; it's subject to the risks I mentioned. I'd rather expect Kuryr to create the FIP instead.

But let's focus on the cloud provider here - the error listed in `oc describe svc` does not back my suspicion; it's a policy error. Apparently, by default, OSP policy does not allow non-admin users to create FIPs with a specific IP address. Can you try it without pre-creating the FIP? I think we'll see the same error, which I'd consider a bug we need to document.

It's also worth trying to change the policy to allow it and retrying the test with a pre-created FIP.

Comment 7 Jon Uriarte 2021-08-26 10:32:30 UTC
I've tried setting a fip which I haven't created previously:

oc expose deployment test1-dep --name test1-svc --type=LoadBalancer --port 80 --target-port=8080 --load-balancer-ip 10.0.0.111

and it shows the same policy error:

Events:        
  Type     Reason                  Age                 From                Message
  ----     ------                  ----                ----                -------
  Normal   EnsuringLoadBalancer    60s (x10 over 23m)  service-controller  Ensuring load balancer
  Warning  SyncLoadBalancerFailed  59s (x10 over 21m)  service-controller  Error syncing load balancer: failed to ensure load balancer: error creating LB floatingip {Description: FloatingNetworkID:4634cf2c-056f-4dee-98de-6b4e68b7af5b FloatingIP:10.0.0.111 PortID:868389a8-92a4-4760-83cd-dd5e26171b74 FixedIP: SubnetID: TenantID: ProjectID:}: Request forbidden: [POST https://10.0.0.101:13696/v2.0/floatingips], error message: {"NeutronError": {"type": "PolicyNotAuthorized", "message": "(rule:create_floatingip and rule:create_floatingip:floating_ip_address) is disallowed by policy", "detail": ""}}

So it fails in both cases: (1) with a pre-created fip and (2) without a pre-created fip.
It seems the neutron policy doesn't allow creating a fip with a specific IP address (and in case 1 that creation shouldn't even be necessary, as the fip already exists).
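
To tell a policy denial apart from other failures (such as an address that is already taken), the Neutron error embedded in the event message can be parsed. This is a minimal Python sketch, assuming the message keeps the `error message: {...}` suffix seen in the events above. The function names `parse_neutron_error` and `denied_rules` are hypothetical helpers, not part of any client library.

```python
import json
import re

# Event message copied from the SyncLoadBalancerFailed warning above.
EVENT_MESSAGE = (
    "Error syncing load balancer: failed to ensure load balancer: "
    "error creating LB floatingip {Description: "
    "FloatingNetworkID:4634cf2c-056f-4dee-98de-6b4e68b7af5b "
    "FloatingIP:10.0.0.111 PortID:868389a8-92a4-4760-83cd-dd5e26171b74 "
    "FixedIP: SubnetID: TenantID: ProjectID:}: Request forbidden: "
    "[POST https://10.0.0.101:13696/v2.0/floatingips], error message: "
    '{"NeutronError": {"type": "PolicyNotAuthorized", "message": '
    '"(rule:create_floatingip and rule:create_floatingip:floating_ip_address) '
    'is disallowed by policy", "detail": ""}}'
)


def parse_neutron_error(message: str) -> dict:
    """Extract the NeutronError object that follows 'error message: '."""
    _, found, payload = message.partition("error message: ")
    if not found:
        raise ValueError("no Neutron error payload in message")
    return json.loads(payload)["NeutronError"]


def denied_rules(neutron_error: dict) -> list:
    """List the policy rule names mentioned in the error text."""
    # Rule names may themselves contain colons, so allow ':' in the capture.
    return re.findall(r"rule:([\w:]+)", neutron_error["message"])


error = parse_neutron_error(EVENT_MESSAGE)
print(error["type"])        # PolicyNotAuthorized
print(denied_rules(error))  # ['create_floatingip', 'create_floatingip:floating_ip_address']
```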

I've tried it from CLI as well (as non-admin user), and got the same policy error:
$ openstack floating ip create --port 868389a8-92a4-4760-83cd-dd5e26171b74 --floating-ip-address 10.0.0.111 nova                                                                                          
Error while executing command: HttpException: 403, (rule:create_floatingip and rule:create_floatingip:floating_ip_address) is disallowed by policy 

It works as admin user, though.

As non-admin user I can create fips as long as I don't specify an IP address:
$ openstack floating ip create nova

So the policy only restricts creating a fip with a specific IP address.
From openstack neutron docs [1]:

create_floatingip
    Default
        (role:admin and system_scope:all) or (role:member and project_id:%(project_id)s)
    Operations
            POST /floatingips
    Scope Types
            system
            project
    Create a floating IP

create_floatingip:floating_ip_address
    Default
        role:admin and system_scope:all
    Operations
            POST /floatingips
    Scope Types
            system
            project
    Create a floating IP with a specific IP address

The second rule is only satisfied by the admin user.
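
How the two rules combine can be illustrated with a simplified model. This is only a sketch in Python, not Neutron's actual policy engine: it hard-codes the two defaults quoted above (taken from the latest Neutron docs, which may differ from the release deployed here), and the `Caller` type and function names are hypothetical. The request bodies follow the `POST /v2.0/floatingips` shape, using the network, port and project IDs that appear earlier in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    roles: frozenset
    project_id: str
    system_scope_all: bool = False


def create_floatingip_allowed(caller: Caller, target_project_id: str) -> bool:
    # (role:admin and system_scope:all) or (role:member and project_id:%(project_id)s)
    as_admin = "admin" in caller.roles and caller.system_scope_all
    as_member = "member" in caller.roles and caller.project_id == target_project_id
    return as_admin or as_member


def create_with_address_allowed(caller: Caller) -> bool:
    # role:admin and system_scope:all
    return "admin" in caller.roles and caller.system_scope_all


def request_allowed(caller: Caller, body: dict) -> bool:
    """Both rules must pass when the body carries floating_ip_address."""
    fip = body["floatingip"]
    target_project = fip.get("project_id", caller.project_id)
    allowed = create_floatingip_allowed(caller, target_project)
    if "floating_ip_address" in fip:
        allowed = allowed and create_with_address_allowed(caller)
    return allowed


member = Caller(frozenset({"member"}), "c0316b3530e64b909f9451a857b404d0")
admin = Caller(frozenset({"admin"}), "c0316b3530e64b909f9451a857b404d0", True)

without_address = {"floatingip": {
    "floating_network_id": "4634cf2c-056f-4dee-98de-6b4e68b7af5b",
}}
with_address = {"floatingip": {
    "floating_network_id": "4634cf2c-056f-4dee-98de-6b4e68b7af5b",
    "port_id": "868389a8-92a4-4760-83cd-dd5e26171b74",
    "floating_ip_address": "10.0.0.111",
}}

print(request_allowed(member, without_address))  # True
print(request_allowed(member, with_address))     # False
print(request_allowed(admin, with_address))      # True
```

Under this model the cloud provider's request fails for a non-admin user as soon as it includes `floating_ip_address`, whether or not the fip was pre-created, which matches both test results above.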

I haven't been able to find out how to change the policy to allow this for non-admin users.

[1] https://docs.openstack.org/neutron/latest/configuration/policy.html

Comment 8 Michał Dulko 2021-08-27 09:25:51 UTC
Alright, so that's clearly a bug that'd require at least a docs update.

Comment 9 egarcia 2021-08-27 18:17:12 UTC
+1 I think the best way forward is to document it as a known issue.

Comment 10 ShiftStack Bugwatcher 2021-11-25 16:12:12 UTC
Removing the Triaged keyword because:

* the QE automation assessment (flag qe_test_coverage) is missing

Comment 23 errata-xmlrpc 2022-08-10 10:37:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

