Bug 1897526 - [Kuryr] Available port count not correctly calculated for alerts
Summary: [Kuryr] Available port count not correctly calculated for alerts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.6.z
Assignee: Maysa Macedo
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On: 1890630
Blocks:
Reported: 2020-11-13 11:02 UTC by OpenShift BugZilla Robot
Modified: 2021-04-27 14:21 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-27 14:20:49 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-network-operator pull 907 0 None open [release-4.6] Bug 1897526: Ensure LimitedPortsOnNetwork is only triggered when needed 2021-02-19 11:56:05 UTC
Github openshift kuryr-kubernetes pull 427 0 None open Bug 1897526: Fix ports count on subnet 2021-02-19 11:56:04 UTC
Red Hat Product Errata RHBA-2021:1232 0 None None None 2021-04-27 14:21:11 UTC

Description OpenShift BugZilla Robot 2020-11-13 11:02:08 UTC
+++ This bug was initially created as a clone of Bug #1890630 +++

Description of problem:

The Kuryr alerts LimitedPortsOnNetwork and InsuficientPortsOnNetwork are triggered based on the number of free ports in a given namespace subnet.

LimitedPortsOnNetwork is triggered when there are fewer than 11 available ports (it should be triggered when there are fewer than 10).
InsuficientPortsOnNetwork is triggered when there is 1 available port (it should be triggered when there are no available ports).

Version-Release number of selected component (if applicable):
OCP 4.6.0-0.nightly-2020-10-20-101225
OSP13 2020-10-06.2


How reproducible: always


Steps to Reproduce:
1. Create new project (will create a /23 subnet -> room for 510 ports)
   $ oc new-project test

2. Create pods until ~490 ports are created
   $ for i in `seq 1 490`; do oc run --image kuryr/demo demo-$i; sleep 4; done
   
3. Create ports manually in the subnet until there are 11 available ports (499 ports in use)
   $ openstack port create --network <network-id> <port-name>

The alert LimitedPortsOnNetwork is raised, although it should only be raised when there are fewer than 10 available ports.

4. Create ports manually in the subnet until there is 1 available port (509 ports in use)
   $ openstack port create --network <network-id> <port-name>

The alert InsuficientPortsOnNetwork is raised, although it should only be raised when there are 0 available ports.


Additional info:
    def _record_ports_quota_per_subnet_metric(self):
        """Records the ports quota per subnet to the registry"""
        subnets = self._os_net.subnets(project_id=self._project_id)
        namespace_prefix = 'ns/'
        for subnet in subnets:
            if namespace_prefix not in subnet.name:
                continue
            total_num_addresses = 0
            ports_availability = 0
            for allocation in subnet.allocation_pools:
                total_num_addresses += netaddr.IPRange(
                    netaddr.IPAddress(allocation['start']),
                    netaddr.IPAddress(allocation['end'])).size
                ports_count = len(list(self._os_net.ports(
                    network_id=subnet.network_id,
                    project_id=self._project_id)))
            labels = {'subnet_id': subnet.id, 'subnet_name': subnet.name}
            ports_availability = total_num_addresses-ports_count
            self.port_quota_per_subnet.labels(**labels).set(ports_availability)

total_num_addresses is calculated based on the allocation pool:
allocation_pools  | 10.128.116.2-10.128.117.254
which does not include the .1 address, so the total is 509.

ports_count (the ports in use), on the other hand, does include the port for .1, so it can grow up to 510.

The calculation could be fixed by removing 1 port from ports_count (the one belonging to .1), or adding 1 port to total_num_addresses.
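
The mismatch can be checked with a few lines of arithmetic. The following is a minimal sketch that uses Python's standard ipaddress module instead of netaddr (which the Kuryr code uses); the helper name pool_size is hypothetical, and the values are taken from the allocation pool and the /23 subnet quoted in this report.

```python
import ipaddress


def pool_size(start: str, end: str) -> int:
    """Number of addresses in an inclusive allocation pool range."""
    return int(ipaddress.IPv4Address(end)) - int(ipaddress.IPv4Address(start)) + 1


# Allocation pool reported for the namespace subnet (does not include .1).
total_num_addresses = pool_size("10.128.116.2", "10.128.117.254")

# Usable host addresses in the /23 (network and broadcast excluded).
usable_hosts = ipaddress.ip_network("10.128.116.0/23").num_addresses - 2

print(total_num_addresses)  # 509
print(usable_hosts)         # 510

# With every address taken, ports_count includes the .1 port, so the
# metric computed as in the snippet above ends at -1 instead of 0.
ports_count = 510
print(total_num_addresses - ports_count)        # -1

# Either adjustment proposed above brings the result back to 0.
print(total_num_addresses - (ports_count - 1))  # 0
print((total_num_addresses + 1) - ports_count)  # 0
```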

--- Additional comment from juriarte on 2020-10-23 09:43:40 UTC ---

Adding an easier reproducer:

1. Create new project (will create a /23 subnet -> room for 510 ports)
   $ oc new-project test
This also creates the port for .1.

2. Create 499 ports
$ for i in `seq 2 500`; do openstack port create --network <network_id> port--$i; sleep 3; done

This will create 499 ports and the alert LimitedPortsOnNetwork will be raised, but there are still 10 available ports.


3. Create 9 more ports
$ for i in `seq 501 509`; do openstack port create --network <network_id> port--$i; sleep 3; done

This will create 9 additional ports and the alert InsuficientPortsOnNetwork will be raised, but there is still 1 available port.


4. Create an additional port to check there is still one available port
$ openstack port create --network <network_id> port--510

The InsuficientPortsOnNetwork alert is cleared because the computed number of available ports is now -1 (no longer equal to 0).

5. Try creating an additional port to check it's not possible
$ openstack port create --network <network_id> port--511
HttpException: 409: Client Error for url: https://10.46.44.10:13696/v2.0/ports, {"NeutronError": {"message": "No more IP addresses available on network <network_id>.", "type": "IpAddressGenerationFailure", "detail": ""}}
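
The sequence observed in this reproducer follows directly from the off-by-one. As an illustration only, the sketch below compares the metric as computed by the code in the description with the real number of free addresses. reported_available and actual_available are hypothetical helper names; 509 and 510 are the pool size and the usable address count given in the description.

```python
POOL_SIZE = 509         # allocation pool 10.128.116.2-10.128.117.254
USABLE_ADDRESSES = 510  # usable addresses in the /23, including .1


def reported_available(ports_in_use: int) -> int:
    """Metric value before the fix: pool size minus all ports in use."""
    return POOL_SIZE - ports_in_use


def actual_available(ports_in_use: int) -> int:
    """Addresses that can really still be allocated."""
    return USABLE_ADDRESSES - ports_in_use


# Ports in use (including the .1 port) after steps 2, 3 and 4.
for ports_in_use in (500, 509, 510):
    print(ports_in_use,
          reported_available(ports_in_use),
          actual_available(ports_in_use))
# 500 9 10
# 509 0 1
# 510 -1 0
```

The reported value is always one lower than the real one, which is why LimitedPortsOnNetwork fires with 10 free ports and InsuficientPortsOnNetwork clears again once the subnet is actually full.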

Comment 3 Jon Uriarte 2021-04-21 17:26:32 UTC
Verified in OCP 4.6.0-0.nightly-2021-04-17-182039 on top of OSP 13.0.16 (2021-04-09.1).

Verification steps:

To check the alerts in Prometheus, make sure the following entry is present in /etc/hosts:
<APPS_FIP> prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com

How to check the alerts from CLI:
$ token=`oc sa get-token prometheus-k8s -n openshift-monitoring`

List all the alerts:
$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname'

Get a specific alert (e.g. LimitedPortsOnNetwork):
$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname == "LimitedPortsOnNetwork")'
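
The same filtering can be done without jq. This is a minimal sketch, assuming the response has the data.alerts[].labels.alertname shape that the jq filters above rely on. alert_names and the sample payload are hypothetical and only illustrate the parsing step, not the HTTP request.

```python
import json


def alert_names(payload: str, ignore=()) -> list:
    """Return the alertname label of each alert, skipping ignored names."""
    alerts = json.loads(payload)["data"]["alerts"]
    names = [
        alert["labels"]["alertname"]
        for alert in alerts
        if "alertname" in alert.get("labels", {})
    ]
    return [name for name in names if name not in ignore]


# Hypothetical sample payload, trimmed to the fields used here.
sample = json.dumps({
    "status": "success",
    "data": {"alerts": [
        {"labels": {"alertname": "Watchdog"}, "state": "firing"},
        {"labels": {"alertname": "LimitedPortsOnNetwork"}, "state": "firing"},
    ]},
})

print(alert_names(sample, ignore=("Watchdog",)))  # ['LimitedPortsOnNetwork']
```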


## Create new project (will create a /23 subnet -> room for 510 ports)
$ oc new-project test

## Create 499 ports in the ns/test network (so there are 500 in use in total)
$ openstack subnet list | grep test
| 29ea4014-b75d-435c-8be3-0e1d3a6b5e12 | ns/test-subnet                                             | 2359dce9-28bf-4f19-8639-1fe2cbd8fcd9 | 10.128.116.0/23 |

$ openstack port list | grep 10.128.116
| 44276942-d449-47b5-856f-4bbbdfeb2f47 |                                                      | fa:16:3e:33:fa:43 | ip_address='10.128.116.1', subnet_id='29ea4014-b75d-435c-8be3-0e1d3a6b5e12'   | ACTIVE |

$ for i in `seq 2 500`; do openstack port create --network 2359dce9-28bf-4f19-8639-1fe2cbd8fcd9 port--$i; sleep 3; done

## Check LimitedPortsOnNetwork alarm is not raised, as there are still 10 available ports
$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
$

## Create one more port, so there are 9 available ports

$ openstack port create --network 2359dce9-28bf-4f19-8639-1fe2cbd8fcd9 port--501

## Check LimitedPortsOnNetwork alarm is raised, as there are fewer than 10 available ports (there are 9)
$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
"LimitedPortsOnNetwork"

## Create 8 more ports, so there will be only 1 available port
$ for i in `seq 502 509`; do openstack port create --network 2359dce9-28bf-4f19-8639-1fe2cbd8fcd9 port--$i; sleep 3; done

## Check InsuficientPortsOnNetwork alarm is not raised, as there is still 1 available port
$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
"LimitedPortsOnNetwork"

## Create one more port, so there are 0 available ports
$ openstack port create --network 2359dce9-28bf-4f19-8639-1fe2cbd8fcd9 port--510

## Check InsuficientPortsOnNetwork alarm is raised, as there are no available ports
$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
"InsuficientPortsOnNetwork"

Removing the ports will clear the alarms as expected.
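
The thresholds exercised by these verification steps can be summarised in a small function. This is only a sketch of the behaviour observed above (no alert at 10 free ports, LimitedPortsOnNetwork at 9 and at 1, InsuficientPortsOnNetwork at 0), not the actual Prometheus rule expressions. expected_alert is a hypothetical name, and the behaviour for values between 1 and 9 is assumed from the "fewer than 10" wording.

```python
from typing import Optional


def expected_alert(available_ports: int) -> Optional[str]:
    """Alert seen in the verification run for a given free port count."""
    if available_ports == 0:
        return "InsuficientPortsOnNetwork"
    if 0 < available_ports < 10:
        # Assumption: every value from 1 to 9 behaves like the tested 9 and 1.
        return "LimitedPortsOnNetwork"
    return None


for available in (10, 9, 1, 0):
    print(available, expected_alert(available))
```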

Comment 5 errata-xmlrpc 2021-04-27 14:20:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.26 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1232

