Bug 1997704

Summary: [osp][octavia lb] given loadBalancerIP is ignored when creating a LoadBalancer type svc
Product: OpenShift Container Platform Reporter: Jon Uriarte <juriarte>
Component: Cloud Compute Assignee: Maysa Macedo <mdemaced>
Cloud Compute sub component: OpenStack Provider QA Contact: Jon Uriarte <juriarte>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: m.andre, mbridges, mdemaced, mdulko, mfedosin, pprinett, stephenfin
Version: 4.9 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-10 10:37:25 UTC Type: Bug

Description Jon Uriarte 2021-08-25 16:22:53 UTC
Description of problem:

When a service is created with a given (pre-created) floating IP (FIP), the FIP is ignored: it is not assigned to the LB, and the service is not reachable externally.

For example:
$ oc expose deployment test1-dep --name test1-svc --type=LoadBalancer --port 80 --target-port=8080 --load-balancer-ip 10.0.0.254

Setting the --load-balancer-ip parameter should assign the given floating IP to the service LB [1], but the parameter is ignored.

Version-Release number of selected component (if applicable):
OCP 4.9.0-0.nightly-2021-08-25-010624
OSP 16.1.6 GA (RHOS-16.1-RHEL-8-20210604.n.0)

How reproducible: always


Steps to Reproduce:
1. Install OCP 4.9 on OSP

2. Create a floating ip in OSP for later assignment
openstack floating ip create <external network>

3. Create a new project and a deployment in OCP
oc new-project test
oc create deployment test1-dep --image=quay.io/kuryr/demo

4. Create a service setting the desired LB IP address
oc expose deployment test1-dep --name test1-svc --type=LoadBalancer --port 80 --target-port=8080 --load-balancer-ip <precreated fip>

5. Check a LB is created in OSP (wait until it's in ACTIVE status):
openstack loadbalancer list
+--------------------------------------+----------------------------------+----------------------------------+--------------+---------------------+----------+
| id                                   | name                             | project_id                       | vip_address  | provisioning_status | provider |
+--------------------------------------+----------------------------------+----------------------------------+--------------+---------------------+----------+
| a2a0f053-7539-4758-bcdf-a33773e8fbd6 | a0f27ca88bdee4028ac6e288c76efec1 | c0316b3530e64b909f9451a857b404d0 | 10.196.2.216 | ACTIVE              | amphora  |
+--------------------------------------+----------------------------------+----------------------------------+--------------+---------------------+----------+

6. Check the svc in OCP:

oc get svc
NAME        TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
test1-svc   LoadBalancer   172.30.172.19   <pending>     80:31624/TCP   39m

The EXTERNAL-IP stays <pending>, although the FIP should have been assigned.

oc get svc -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: "2021-08-25T15:35:15Z"
    finalizers:
    - service.kubernetes.io/load-balancer-cleanup
    labels:
      app: test1-dep
    name: test1-svc
    namespace: test
    resourceVersion: "192336"
    uid: 0f27ca88-bdee-4028-ac6e-288c76efec1c
  spec:
    allocateLoadBalancerNodePorts: true
    clusterIP: 172.30.172.19
    clusterIPs:
    - 172.30.172.19
    externalTrafficPolicy: Cluster
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    loadBalancerIP: <precreated fip>   <<<<<-------
    ports:
    - nodePort: 31624
      port: 80
      protocol: TCP
      targetPort: 8080
    selector:
      app: test1-dep
    sessionAffinity: None
    type: LoadBalancer
  status:
    loadBalancer: {}                  <<<<<-------
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""


Actual results:
The loadBalancerIP is present in the spec section, but the status section is empty:
  ...
  status:
    loadBalancer: {}
  ...


Expected results:

Fip added to the status section
  ...
  status:
    loadBalancer:
      ingress:
      - ip: <precreated fip>
  ...

Fip assigned in the external-IP
oc get svc
NAME        TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)        AGE
test1-svc   LoadBalancer   172.30.172.19   <precreated fip>  80:31624/TCP   39m
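
The difference between the actual and expected results can be checked programmatically. The following is a minimal sketch, assuming the service has been dumped as JSON (for instance with `oc get svc test1-svc -o json`) and parsed into a dict. The helper names `assigned_ips` and `requested_ip_assigned` are hypothetical, and 10.0.0.254 is the example address used at the top of this report.

```python
def assigned_ips(service: dict) -> list:
    """Return the IPs listed under status.loadBalancer.ingress (empty if none)."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    return [entry["ip"] for entry in ingress if "ip" in entry]


def requested_ip_assigned(service: dict) -> bool:
    """True only if spec.loadBalancerIP also shows up in the status section."""
    requested = service.get("spec", {}).get("loadBalancerIP")
    return requested is not None and requested in assigned_ips(service)


# Trimmed-down versions of the two states described in this report.
actual = {
    "spec": {"type": "LoadBalancer", "loadBalancerIP": "10.0.0.254"},
    "status": {"loadBalancer": {}},
}
expected = {
    "spec": {"type": "LoadBalancer", "loadBalancerIP": "10.0.0.254"},
    "status": {"loadBalancer": {"ingress": [{"ip": "10.0.0.254"}]}},
}

print(requested_ip_assigned(actual))    # False
print(requested_ip_assigned(expected))  # True
```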

Additional info:

It can also be reproduced by creating the following resource directly:
$ cat svc_resource.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    run: demo
  name: demo
  namespace: test
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    run: demo
  type: LoadBalancer
  loadBalancerIP: <precreated fip>


[1] https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer

Comment 2 egarcia 2021-08-25 18:05:17 UTC
Just to clarify, the load balancer is not just not being updated, but also is not being created correctly, right?

Comment 3 Michał Dulko 2021-08-26 07:15:43 UTC
Why would you want to pre-create a FIP in OpenStack? What's the use case there? I'm almost sure the problem is that cloud provider attempts to create the FIP with the given address and fails because the address is already taken (we can confirm that by looking at events in [1]). This might totally be considered an expected behavior - external networks might be shared and you don't want to mess with stuff created by others. Also cloud provider doesn't do FIP tagging and we're supposed to support running multiple clusters on a single OpenStack project. Implementing the behavior you described we could end up with clusters "stealing" FIPs from one another.

[1] `oc -n test describe svc demo`

Comment 4 Jon Uriarte 2021-08-26 07:21:00 UTC
(In reply to egarcia from comment #2)
> Just to clarify, the load balancer is not just not being updated, but also
> is not being created correctly, right?

The load balancer is being created correctly but the svc is not being assigned the external IP (floating ip).

Comment 5 Jon Uriarte 2021-08-26 07:35:23 UTC
(In reply to Michał Dulko from comment #3)
> Why would you want to pre-create a FIP in OpenStack? What's the use case
> there? I'm almost sure the problem is that cloud provider attempts to create
> the FIP with the given address and fails because the address is already
> taken (we can confirm that by looking at events in [1]). This might totally
> be considered an expected behavior - external networks might be shared and
> you don't want to mess with stuff created by others. Also cloud provider
> doesn't do FIP tagging and we're supposed to support running multiple
> clusters on a single OpenStack project. Implementing the behavior you
> described we could end up with clusters "stealing" FIPs from one another.
> 
> [1] `oc -n test describe svc demo`

If I'm not wrong, that use case was supported in Kuryr; please see the related BZs [1] and [2].
I don't have a cluster with Kuryr at the moment, but that's how it used to work. As it depends on the
cloud provider [3] implementation, I guess it's up to us to decide whether we want to support it.

The events in the svc confirm your thoughts about the issue:
Events:
  Type     Reason                  Age                    From                Message
  ----     ------                  ----                   ----                -------
  Warning  SyncLoadBalancerFailed  3h3m (x149 over 15h)   service-controller  Error syncing load balancer: failed to ensure load balancer: error creating LB floatingip {Description: FloatingNetworkID:4634cf2c-056f-4dee-98de-6b4e68b7af5b FloatingIP:x.x.x.x PortID:2ebaae02-43d0-4adc-bca1-9fc32b52df99 FixedIP: SubnetID: TenantID: ProjectID:}: Request forbidden: [POST https://10.0.0.101:13696/v2.0/floatingips], error message: {"NeutronError": {"type": "PolicyNotAuthorized", "message": "(rule:create_floatingip and rule:create_floatingip:floating_ip_address) is disallowed by policy", "detail": ""}}                                                                                                               
  Normal   EnsuringLoadBalancer    2m44s (x185 over 15h)  service-controller  Ensuring load balancer



[1] https://bugzilla.redhat.com/show_bug.cgi?id=1875352
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1503963#c21
[3] https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer
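
To tell a policy rejection apart from other failures (such as an address that is already taken), the Neutron error embedded in the event message can be pulled out. This is only a sketch: it assumes the message keeps the `error message: {...}` suffix seen in the event above, the function name `neutron_error` is hypothetical, and the sample message abbreviates the floating IP request struct as `{...}`.

```python
import json


def neutron_error(message: str):
    """Return the NeutronError dict embedded in an event message, or None."""
    marker = "error message: "
    idx = message.find(marker)
    if idx == -1:
        return None
    try:
        payload = json.loads(message[idx + len(marker):].strip())
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    return payload.get("NeutronError")


# Event message from above, with the request struct abbreviated.
sample = (
    "Error syncing load balancer: failed to ensure load balancer: "
    "error creating LB floatingip {...}: Request forbidden: "
    "[POST https://10.0.0.101:13696/v2.0/floatingips], error message: "
    '{"NeutronError": {"type": "PolicyNotAuthorized", "message": '
    '"(rule:create_floatingip and rule:create_floatingip:floating_ip_address) '
    'is disallowed by policy", "detail": ""}}'
)

err = neutron_error(sample)
print(err["type"])  # PolicyNotAuthorized
```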

Comment 6 Michał Dulko 2021-08-26 08:11:43 UTC
Hm, I see. I'm not 100% sure this should be considered behavior supported by Kuryr; it's subject to the risks I mentioned. I'd rather expect Kuryr to create the FIP instead.

But let's focus on the cloud provider here - the error listed in `oc describe svc` is not backing my suspicion, it's a policy error. Apparently by default OSP policy does not allow non-admin users to create FIPs providing an IP. Can you try it without pre-creating the FIP? I think we'll see the same and that I'd consider to be a bug we need to document.

It's also worth trying to change the policy to allow it and retry the test with pre-created FIP.

Comment 7 Jon Uriarte 2021-08-26 10:32:30 UTC
I've tried setting a fip which I haven't created previously:

oc expose deployment test1-dep --name test1-svc --type=LoadBalancer --port 80 --target-port=8080 --load-balancer-ip 10.0.0.111

and it shows the same policy error:

Events:        
  Type     Reason                  Age                 From                Message
  ----     ------                  ----                ----                -------
  Normal   EnsuringLoadBalancer    60s (x10 over 23m)  service-controller  Ensuring load balancer
  Warning  SyncLoadBalancerFailed  59s (x10 over 21m)  service-controller  Error syncing load balancer: failed to ensure load balancer: error creating LB floatingip {Description: FloatingNetworkID:4634cf2c-056f-4dee-98de-6b4e68b7af5b FloatingIP:10.0.0.111 PortID:868389a8-92a4-4760-83cd-dd5e26171b74 FixedIP: SubnetID: TenantID: ProjectID:}: Request forbidden: [POST https://10.0.0.101:13696/v2.0/floatingips], error message: {"NeutronError": {"type": "PolicyNotAuthorized", "message": "(rule:create_floatingip and rule:create_floatingip:floating_ip_address) is disallowed by policy", "detail": ""}}

So it fails in both cases: (1) with a pre-created FIP and (2) without one.
It seems the Neutron policy doesn't allow creating a FIP with a specific IP address (which wouldn't even be necessary in case 1, as the FIP already exists).

I've tried it from CLI as well (as non-admin user), and got the same policy error:
$ openstack floating ip create --port 868389a8-92a4-4760-83cd-dd5e26171b74 --floating-ip-address 10.0.0.111 nova                                                                                          
Error while executing command: HttpException: 403, (rule:create_floatingip and rule:create_floatingip:floating_ip_address) is disallowed by policy 

It does work as an admin user, though.

I can create fips as non-admin user if I don't specify any IP value:
$ openstack floating ip create nova

So the policy restricts only the creation of a FIP with a specific IP address.
From the OpenStack Neutron docs [1]:

create_floatingip
    Default
        (role:admin and system_scope:all) or (role:member and project_id:%(project_id)s)
    Operations
            POST /floatingips
    Scope Types
            system
            project
    Create a floating IP

create_floatingip:floating_ip_address
    Default
        role:admin and system_scope:all
    Operations
            POST /floatingips
    Scope Types
            system
            project
    Create a floating IP with a specific IP address

The second case is allowed only for admin users.

I haven't been able to find out how to change the policy to allow it for non-admin users.

[1] https://docs.openstack.org/neutron/latest/configuration/policy.html
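
How the two rules combine can be illustrated with a toy model. This is a simplified sketch of the defaults quoted above, not the actual oslo.policy engine: the function names, the credential dicts, and the `demo` project ID are all hypothetical. It assumes a request that carries a specific address must pass both rules, as the `(rule:create_floatingip and rule:create_floatingip:floating_ip_address)` error suggests.

```python
def is_admin(creds: dict) -> bool:
    # Models "role:admin and system_scope:all"
    return "admin" in creds.get("roles", []) and creds.get("system_scope") == "all"


def is_project_member(creds: dict, target: dict) -> bool:
    # Models "role:member and project_id:%(project_id)s"
    return ("member" in creds.get("roles", [])
            and creds.get("project_id") == target.get("project_id"))


RULES = {
    "create_floatingip": lambda c, t: is_admin(c) or is_project_member(c, t),
    "create_floatingip:floating_ip_address": lambda c, t: is_admin(c),
}


def can_create_floatingip(creds: dict, target: dict) -> bool:
    """Check the base rule, plus the attribute rule when an address is given."""
    required = ["create_floatingip"]
    if target.get("floating_ip_address") is not None:
        required.append("create_floatingip:floating_ip_address")
    return all(RULES[name](creds, target) for name in required)


member = {"roles": ["member"], "project_id": "demo"}   # hypothetical non-admin user
admin = {"roles": ["admin"], "system_scope": "all"}    # hypothetical admin user

print(can_create_floatingip(member, {"project_id": "demo"}))
# True: no address requested
print(can_create_floatingip(member, {"project_id": "demo", "floating_ip_address": "10.0.0.111"}))
# False: specific address, non-admin
print(can_create_floatingip(admin, {"project_id": "demo", "floating_ip_address": "10.0.0.111"}))
# True: specific address, admin
```

This matches the behavior observed above: a non-admin user can create a FIP only when no address is specified.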

Comment 8 Michał Dulko 2021-08-27 09:25:51 UTC
Alright, so that's clearly a bug that'd require at least a docs update.

Comment 9 egarcia 2021-08-27 18:17:12 UTC
+1 I think the best way forward is to document it as a known issue.

Comment 10 ShiftStack Bugwatcher 2021-11-25 16:12:12 UTC
Removing the Triaged keyword because:

* the QE automation assessment (flag qe_test_coverage) is missing

Comment 23 errata-xmlrpc 2022-08-10 10:37:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069