Bug 2118563

Summary: [OSP][SDN] The displayed IP capacity is not consistent with the port's maximum number of allowed addresses
Product: OpenShift Container Platform
Reporter: huirwang
Component: Networking
Assignee: Andreas Karis <akaris>
Networking sub component: openshift-sdn
QA Contact: huirwang
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: akaris
Version: 4.12
Target Release: 4.12.0   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Doc Text:
Cause: The CloudNetworkConfigController (CNCC) used a hardcoded capacity of 64, which is higher than the default Red Hat OpenStack Platform (RHOSP) max_allowed_address_pairs value of 10.
Consequence: The capacity was misreported, and the EgressIP controller created CloudPrivateIPConfig objects beyond what OpenStack could accept. The surplus CloudPrivateIPConfig objects failed with "The number of allowed address pair exceeds the maximum 10".
Fix: The default max_allowed_address_pairs value for RHOSP is 10. This value cannot be retrieved through any API request; it is exposed only in neutron's configuration files. As a heuristic, the CNCC now assumes that every RHOSP environment sets it to 10. Consequently, any RHOSP environment used with Red Hat OpenShift Container Platform and the EgressIP feature must set max_allowed_address_pairs to 10 or higher in neutron's configuration.
Result: The port capacity is now capped at 10 minus the number of allowed_address_pairs already configured on the port.
Last Closed: 2023-01-17 19:54:55 UTC
Type: Bug

Description huirwang 2022-08-16 06:59:03 UTC
Description of problem:
[OSP][SDN] The displayed IP capacity is not consistent with the port's maximum number of allowed addresses

Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2022-08-10-200611

How reproducible:
Always

Steps to Reproduce:
1. Check the IP capacity of one worker node. The IPv4 capacity is 61:
$ oc describe node ostest-4jpzf-worker-0-dvkbj
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.196.3.218
                    cloud.network.openshift.io/egress-ipconfig:
                      [{"interface":"b3ae3fb8-63e3-4cd3-9d53-79022fa97c6e","ifaddr":{"ipv4":"10.196.0.0/16"},"capacity":{"ipv4":61}}]
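Because the capacity is published as a JSON node annotation, it can also be read programmatically. A minimal Go sketch, assuming only the fields visible in the annotation above; the type `egressIPConfig` and the function `parseEgressIPConfig` are hypothetical names, not taken from the CNCC source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// egressIPConfig models one entry of the cloud.network.openshift.io/egress-ipconfig
// annotation, limited to the fields shown in this report (hypothetical type).
type egressIPConfig struct {
	Interface string `json:"interface"`
	IfAddr    struct {
		IPv4 string `json:"ipv4"`
	} `json:"ifaddr"`
	Capacity struct {
		IPv4 int `json:"ipv4"`
	} `json:"capacity"`
}

// parseEgressIPConfig decodes the annotation value, which is a JSON array.
func parseEgressIPConfig(annotation string) ([]egressIPConfig, error) {
	var cfgs []egressIPConfig
	if err := json.Unmarshal([]byte(annotation), &cfgs); err != nil {
		return nil, err
	}
	return cfgs, nil
}

func main() {
	// Annotation value copied from the node in step 1.
	const annotation = `[{"interface":"b3ae3fb8-63e3-4cd3-9d53-79022fa97c6e","ifaddr":{"ipv4":"10.196.0.0/16"},"capacity":{"ipv4":61}}]`
	cfgs, err := parseEgressIPConfig(annotation)
	if err != nil {
		panic(err)
	}
	for _, c := range cfgs {
		fmt.Printf("port %s subnet %s ipv4 capacity %d\n", c.Interface, c.IfAddr.IPv4, c.Capacity.IPv4)
	}
}
```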

2. Assign 11 egress IPs to the node's HostSubnet:
$ oc patch hostsubnet ostest-4jpzf-worker-0-dvkbj --type=merge -p  '{"egressIPs": ["10.196.3.1","10.196.3.2","10.196.3.3","10.196.3.4","10.196.3.5","10.196.3.6","10.196.3.7","10.196.3.8","10.196.3.9","10.196.3.10","10.196.3.11"]}'
hostsubnet.network.openshift.io/ostest-4jpzf-worker-0-dvkbj patched

3. Create 11 namespaces and assign one egress IP to each namespace:
$ for i in {1..11}; do oc create -f test$i; oc patch netnamespace test$i --type=merge -p "{\"egressIPs\":[\"10.196.3.$i\"]}"; done
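Hand-escaping JSON inside a shell loop is error-prone. One alternative is to generate each merge-patch body with a JSON encoder. A Go sketch: `netNamespacePatch` and `patchFor` are hypothetical names, and only the `egressIPs` key and the 10.196.3.<i> addressing come from this report.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// netNamespacePatch is the merge-patch body applied to each NetNamespace (hypothetical type).
type netNamespacePatch struct {
	EgressIPs []string `json:"egressIPs"`
}

// patchFor builds the payload for namespace test<i>, which receives 10.196.3.<i>.
func patchFor(i int) (string, error) {
	b, err := json.Marshal(netNamespacePatch{EgressIPs: []string{fmt.Sprintf("10.196.3.%d", i)}})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Print the equivalent oc commands for the 11 namespaces.
	for i := 1; i <= 11; i++ {
		p, err := patchFor(i)
		if err != nil {
			panic(err)
		}
		fmt.Printf("oc patch netnamespace test%d --type=merge -p '%s'\n", i, p)
	}
}
```

For namespace test1 the payload is {"egressIPs":["10.196.3.1"]}.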

Check the resulting CloudPrivateIPConfig objects; all 11 are created:
$ oc get cloudprivateipconfig
NAME          AGE
10.196.3.1    16s
10.196.3.10   14s
10.196.3.11   13s
10.196.3.2    16s
10.196.3.3    16s
10.196.3.4    15s
10.196.3.5    15s
10.196.3.6    15s
10.196.3.7    15s
10.196.3.8    14s
10.196.3.9    14s

Actual results:
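The CloudPrivateIPConfig for 10.196.3.11 is not assigned; neutron rejects it with AllowedAddressPairExhausted:
$ oc get cloudprivateipconfig 10.196.3.11 -o yaml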
apiVersion: cloud.network.openshift.io/v1
kind: CloudPrivateIPConfig
metadata:
  creationTimestamp: "2022-08-16T06:24:37Z"
  finalizers:
  - cloudprivateipconfig.cloud.network.openshift.io/finalizer
  generation: 1
  name: 10.196.3.11
  resourceVersion: "2607765"
  uid: 32bc2e79-a197-4fda-83af-03ea3baf1fb8
spec:
  node: ostest-4jpzf-worker-0-dvkbj
status:
  conditions:
  - lastTransitionTime: "2022-08-16T06:30:33Z"
    message: 'Error processing cloud assignment request, err: could not allow IP address
      10.196.3.11 on port b3ae3fb8-63e3-4cd3-9d53-79022fa97c6e, err: "Bad request
      with: [PUT https://10.0.0.101:13696/v2.0/ports/b3ae3fb8-63e3-4cd3-9d53-79022fa97c6e],
      error message: {\"NeutronError\": {\"type\": \"AllowedAddressPairExhausted\",
      \"message\": \"The number of allowed address pair exceeds the maximum 10.\",
      \"detail\": \"\"}}". Released neutron port reservation.'
    observedGeneration: 1
    reason: CloudResponseError
    status: "False"
    type: Assigned
  node: ostest-4jpzf-worker-0-dvkbj

On the node, all 11 egress IPs, including 10.196.3.11, are nevertheless configured on ens3:
sh-4.4# ip a | grep 10.196.3.
    inet 10.196.3.218/16 brd 10.196.255.255 scope global dynamic noprefixroute ens3
    inet 10.196.3.1/16 brd 10.196.255.255 scope global secondary ens3:eip
    inet 10.196.3.2/16 brd 10.196.255.255 scope global secondary ens3:eip
    inet 10.196.3.3/16 brd 10.196.255.255 scope global secondary ens3:eip
    inet 10.196.3.4/16 brd 10.196.255.255 scope global secondary ens3:eip
    inet 10.196.3.5/16 brd 10.196.255.255 scope global secondary ens3:eip
    inet 10.196.3.6/16 brd 10.196.255.255 scope global secondary ens3:eip
    inet 10.196.3.7/16 brd 10.196.255.255 scope global secondary ens3:eip
    inet 10.196.3.8/16 brd 10.196.255.255 scope global secondary ens3:eip
    inet 10.196.3.9/16 brd 10.196.255.255 scope global secondary ens3:eip
    inet 10.196.3.10/16 brd 10.196.255.255 scope global secondary ens3:eip
    inet 10.196.3.11/16 brd 10.196.255.255 scope global secondary ens3:eip


Check the OpenStack port: 10.196.3.11 is not among its allowed_address_pairs.
| admin_state_up          | UP                                                        |
| allowed_address_pairs   | ip_address='10.196.0.5', mac_address='fa:16:3e:5f:b5:1b'  |
|                         | ip_address='10.196.0.7', mac_address='fa:16:3e:5f:b5:1b'  |
|                         | ip_address='10.196.3.1', mac_address='fa:16:3e:5f:b5:1b'  |
|                         | ip_address='10.196.3.10', mac_address='fa:16:3e:5f:b5:1b' |
|                         | ip_address='10.196.3.2', mac_address='fa:16:3e:5f:b5:1b'  |
|                         | ip_address='10.196.3.3', mac_address='fa:16:3e:5f:b5:1b'  |
|                         | ip_address='10.196.3.4', mac_address='fa:16:3e:5f:b5:1b'  |
|                         | ip_address='10.196.3.5', mac_address='fa:16:3e:5f:b5:1b'  |
|                         | ip_address='10.196.3.7', mac_address='fa:16:3e:5f:b5:1b'  |
|                         | ip_address='10.196.3.9', mac_address='fa:16:3e:5f:b5:1b'  |
| binding_host_id         | None                                                      |
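To see which requested egress IPs actually reached neutron, the allowed_address_pairs in the port listing can be compared with the 11 requested addresses. A small Go sketch; `missingFromPort` is a hypothetical helper, and the input lists are transcribed by hand from this report.

```go
package main

import "fmt"

// missingFromPort returns the requested egress IPs that do not appear in the
// port's allowed_address_pairs, preserving request order (hypothetical helper).
func missingFromPort(requested, allowed []string) []string {
	present := make(map[string]bool, len(allowed))
	for _, ip := range allowed {
		present[ip] = true
	}
	var missing []string
	for _, ip := range requested {
		if !present[ip] {
			missing = append(missing, ip)
		}
	}
	return missing
}

func main() {
	// The 11 egress IPs requested in steps 2 and 3.
	var requested []string
	for i := 1; i <= 11; i++ {
		requested = append(requested, fmt.Sprintf("10.196.3.%d", i))
	}
	// ip_address values from the port's allowed_address_pairs.
	allowed := []string{
		"10.196.0.5", "10.196.0.7", "10.196.3.1", "10.196.3.10", "10.196.3.2",
		"10.196.3.3", "10.196.3.4", "10.196.3.5", "10.196.3.7", "10.196.3.9",
	}
	fmt.Println(len(allowed), missingFromPort(requested, allowed))
}
```

Against the listing above this prints 10 [10.196.3.6 10.196.3.8 10.196.3.11]: the port holds exactly 10 pairs, neutron's limit, and three requested egress IPs are absent from it.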

The cloud-network-config-controller logs show the same neutron error:
$ oc logs cloud-network-config-controller-8475c54d8-j2mbq -n openshift-cloud-network-config-controller
I0816 06:24:55.359627       1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.196.3.8" will be added to node: "ostest-4jpzf-worker-0-dvkbj"
E0816 06:24:55.952890       1 controller.go:165] error syncing '10.196.3.11': error assigning CloudPrivateIPConfig: "10.196.3.11" to node: "ostest-4jpzf-worker-0-dvkbj", err: could not allow IP address 10.196.3.11 on port b3ae3fb8-63e3-4cd3-9d53-79022fa97c6e, err: "Bad request with: [PUT https://10.0.0.101:13696/v2.0/ports/b3ae3fb8-63e3-4cd3-9d53-79022fa97c6e], error message: {\"NeutronError\": {\"type\": \"AllowedAddressPairExhausted\", \"message\": \"The number of allowed address pair exceeds the maximum 10.\", \"detail\": \"\"}}". Released neutron port reservation., requeuing in cloud-private-ip-config workqueue
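The condition message and this log line embed neutron's JSON error body, so a caller could detect the exhaustion case by its error type rather than by matching free text. A minimal Go sketch under stated assumptions: `neutronErrorType` is a hypothetical helper, it relies on the "error message: " prefix seen in this report (not a stable interface), and it works on the unescaped message text (the log prints the inner quotes as \").

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// neutronError matches the error body embedded in the message.
type neutronError struct {
	NeutronError struct {
		Type    string `json:"type"`
		Message string `json:"message"`
		Detail  string `json:"detail"`
	} `json:"NeutronError"`
}

// neutronErrorType extracts the neutron error type from a message that
// embeds the JSON body after "error message: " (hypothetical helper).
func neutronErrorType(msg string) (string, error) {
	const marker = "error message: "
	i := strings.Index(msg, marker)
	if i < 0 {
		return "", fmt.Errorf("no %q in message", marker)
	}
	var ne neutronError
	// A json.Decoder stops after the first complete JSON value, so any
	// text following the closing brace is ignored.
	dec := json.NewDecoder(strings.NewReader(msg[i+len(marker):]))
	if err := dec.Decode(&ne); err != nil {
		return "", err
	}
	return ne.NeutronError.Type, nil
}

func main() {
	msg := `Bad request with: [PUT https://10.0.0.101:13696/v2.0/ports/b3ae3fb8-63e3-4cd3-9d53-79022fa97c6e], error message: {"NeutronError": {"type": "AllowedAddressPairExhausted", "message": "The number of allowed address pair exceeds the maximum 10.", "detail": ""}}`
	t, err := neutronErrorType(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(t) // AllowedAddressPairExhausted
}
```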


Create a test pod in namespace test11. Its egress traffic is blocked:
$ oc get netnamespace test11
NAME     NETID      EGRESS IPS
test11   10564249   ["10.196.3.11"]
$ oc rsh -n test11 hello-pod
/ # curl -I www.google.com --connect-timeout  2
curl: (28) Failed to connect to www.google.com port 80 after 1316 ms: Operation timed out

Expected results:
1. The node's IP capacity should be consistent with the port's maximum number of allowed address pairs, because EgressIP currently supports only the primary interface.
2. If the number of egress IPs exceeds the allowed number, the additional IPs should not be added to the node.
3. `oc get cloudprivateipconfig` should not show the extra addresses.
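Expected result 2 amounts to splitting the requested egress IPs at the node's capacity. A hypothetical Go sketch, assuming a corrected capacity of 8 (10 minus the two non-egress pairs 10.196.0.5 and 10.196.0.7 seen on the port) and that IPs are taken in request order; the real EgressIP controller's selection logic is not modeled.

```go
package main

import "fmt"

// splitByCapacity separates requested egress IPs into those that fit within
// the advertised capacity and the surplus that should not be assigned
// (hypothetical helper).
func splitByCapacity(requested []string, capacity int) (assign, reject []string) {
	if capacity < 0 {
		capacity = 0
	}
	if capacity > len(requested) {
		capacity = len(requested)
	}
	// The full slice expression caps assign, so appending to it cannot
	// overwrite the elements shared with reject.
	return requested[:capacity:capacity], requested[capacity:]
}

func main() {
	var requested []string
	for i := 1; i <= 11; i++ {
		requested = append(requested, fmt.Sprintf("10.196.3.%d", i))
	}
	assign, reject := splitByCapacity(requested, 8)
	fmt.Println(len(assign), reject)
}
```

For the 11 IPs in this report it assigns 8 and leaves 10.196.3.9, 10.196.3.10 and 10.196.3.11 unassigned.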


Additional info:

Comment 2 Andreas Karis 2022-08-17 11:13:16 UTC
Hi,

a) There is an intrinsic problem with how we currently determine capacity (only once, when a node is added), given that neutron's quotas can be changed by administrators at any point in time. That is why we currently set the capacity to min{subnet size; 64} and ignore any quotas. I'll see if we can improve this.

b) The fact that inet 10.196.3.11/16 brd 10.196.255.255 scope global secondary ens3:eip is added to the node, even though the cloud assignment failed, is a minor order-of-operations problem in OpenShiftSDN.

- Andreas

Comment 3 Andreas Karis 2022-08-17 13:13:45 UTC
Can you test with openshift/cloud-network-config-controller/pull/53? It should now report the correct capacity: 10 minus the number of allowed_address_pairs already on the port.
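The capacity rule described in this comment and in the Doc Text can be written as a small function. This is a sketch, not the code from pull/53: `portCapacity`, `assumedMaxAllowedAddressPairs` and the clamp at zero are illustrative assumptions; only the constant 10 and the "10 minus existing allowed_address_pairs" rule come from this report.

```go
package main

import "fmt"

// assumedMaxAllowedAddressPairs mirrors the heuristic in the Doc Text: neutron's
// max_allowed_address_pairs cannot be read over the API, so the RHOSP default
// of 10 is assumed (hypothetical constant name).
const assumedMaxAllowedAddressPairs = 10

// portCapacity returns how many more egress IPs a port can take, given the
// number of allowed_address_pairs already on it. Clamped at zero so a full
// port never reports a negative capacity (hypothetical helper).
func portCapacity(existingPairs int) int {
	c := assumedMaxAllowedAddressPairs - existingPairs
	if c < 0 {
		return 0
	}
	return c
}

func main() {
	// A port already carrying two pairs, such as 10.196.0.5 and 10.196.0.7
	// on the worker port in this report, has room for 8 more.
	fmt.Println(portCapacity(2))
}
```

Because the value 10 is assumed rather than read from neutron, an environment configured with a lower max_allowed_address_pairs would still be over-reported, which is why the Doc Text requires a value of 10 or above.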

Comment 10 errata-xmlrpc 2023-01-17 19:54:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399