Bug 1890630 - [Kuryr] Available port count not correctly calculated for alerts
Summary: [Kuryr] Available port count not correctly calculated for alerts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.7.0
Assignee: Maysa Macedo
QA Contact: GenadiC
URL:
Whiteboard:
Depends On:
Blocks: 1897526
 
Reported: 2020-10-22 16:02 UTC by Jon Uriarte
Modified: 2021-02-24 15:28 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:27:41 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-network-operator pull 895 | 0 | None | closed | Bug 1890630: Ensure LimitedPortsOnNetwork is only triggered when needed | 2021-01-08 17:55:24 UTC
Github | openshift kuryr-kubernetes pull 403 | 0 | None | closed | Bug 1890630: Fix calculation of Ports availability in Subnet | 2021-01-08 17:55:22 UTC
Github | openshift kuryr-kubernetes pull 421 | 0 | None | closed | Bug 1890630: Ensure Ports in use per Subnet calculation is correct | 2021-01-08 17:55:22 UTC
Github | openshift kuryr-kubernetes pull 422 | 0 | None | closed | Revert "Bug 1890630: Ensure Ports in use per Subnet calculation is correct" | 2021-01-08 17:56:02 UTC
Github | openshift kuryr-kubernetes pull 423 | 0 | None | closed | Bug 1890630: Fix alert value for ports available on Subnet | 2021-01-08 17:56:02 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:28:13 UTC

Description Jon Uriarte 2020-10-22 16:02:07 UTC
Description of problem:

Kuryr alerts LimitedPortsOnNetwork and InsuficientPortsOnNetwork are triggered based on the free port count in a given namespace subnet.

LimitedPortsOnNetwork is triggered when there are fewer than 11 available ports (it should be triggered when there are fewer than 10).
InsuficientPortsOnNetwork is triggered when there is 1 available port (it should be triggered when there are no available ports).

Version-Release number of selected component (if applicable):
OCP 4.6.0-0.nightly-2020-10-20-101225
OSP13 2020-10-06.2


How reproducible: always


Steps to Reproduce:
1. Create new project (will create a /23 subnet -> room for 510 ports)
   $ oc new-project test

2. Create pods until ~490 ports are created
   $ for i in `seq 1 490`; do oc run --image kuryr/demo demo-$i; sleep 4; done
   
3. Create ports manually in the subnet until there are 11 available ports (499 ports in use)
   $ openstack port create --network <network-id> <port-name>

The alert LimitedPortsOnNetwork is raised, while it should be raised when there are 10 available ports.

4. Create ports manually in the subnet until there is 1 available port (509 ports in use)
   $ openstack port create --network <network-id> <port-name>

The alert InsuficientPortsOnNetwork is raised, while it should be raised when there are 0 available ports.


Additional info:
    def _record_ports_quota_per_subnet_metric(self):
        """Records the ports quota per subnet to the registry"""
        subnets = self._os_net.subnets(project_id=self._project_id)
        namespace_prefix = 'ns/'
        for subnet in subnets:
            if namespace_prefix not in subnet.name:
                continue
            total_num_addresses = 0
            ports_availability = 0
            for allocation in subnet.allocation_pools:
                total_num_addresses += netaddr.IPRange(
                    netaddr.IPAddress(allocation['start']),
                    netaddr.IPAddress(allocation['end'])).size
                ports_count = len(list(self._os_net.ports(
                    network_id=subnet.network_id,
                    project_id=self._project_id)))
            labels = {'subnet_id': subnet.id, 'subnet_name': subnet.name}
            ports_availability = total_num_addresses-ports_count
            self.port_quota_per_subnet.labels(**labels).set(ports_availability)

The total_num_addresses is calculated based on the allocation pool:
allocation_pools  | 10.128.116.2-10.128.117.254
which doesn't contain the .1 address (so it's 509).

The ports_count (ports in use), however, does count the port for .1 (so it can increase up to 510).

The calculation could be fixed by removing 1 port from ports_count (the one belonging to .1), or adding 1 port to total_num_addresses.
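
The off-by-one described above can be checked with a few lines of arithmetic. This is a minimal sketch, not the Kuryr code: it uses the standard-library ipaddress module in place of netaddr, and the helper names (pool_size, reported_availability, corrected_availability) and the gateway_ports parameter are hypothetical. It assumes exactly one port (the one for .1) holds an address outside the allocation pool.

```python
import ipaddress


def pool_size(start, end):
    """Number of addresses in an inclusive allocation pool."""
    first = int(ipaddress.ip_address(start))
    last = int(ipaddress.ip_address(end))
    return last - first + 1


def reported_availability(pools, ports_in_use):
    """Mirror the calculation quoted above: pool size minus all ports."""
    total = sum(pool_size(start, end) for start, end in pools)
    return total - ports_in_use


def corrected_availability(pools, ports_in_use, gateway_ports=1):
    """One of the fixes suggested above: do not count the .1 port as in use.

    gateway_ports is a hypothetical parameter; it assumes a single port
    whose address lies outside the allocation pools.
    """
    total = sum(pool_size(start, end) for start, end in pools)
    return total - (ports_in_use - gateway_ports)


pools = [("10.128.116.2", "10.128.117.254")]
print(pool_size(*pools[0]))                # 509
print(reported_availability(pools, 500))   # 9
print(corrected_availability(pools, 500))  # 10
```

Adding 1 to the pool size instead of subtracting 1 from the port count gives the same result under the same assumption.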

Comment 1 Jon Uriarte 2020-10-23 09:43:40 UTC
Adding an easier reproducer:

1. Create new project (will create a /23 subnet -> room for 510 ports)
   $ oc new-project test
It will create the port for .1

2. Create 499 ports
$ for i in `seq 2 500`; do openstack port create --network <network_id> port--$i; sleep 3; done

This will create 499 ports and the alert LimitedPortsOnNetwork will be raised, but there are still 10 available ports.


3. Create 9 more ports
$ for i in `seq 501 509`; do openstack port create --network <network_id> port--$i; sleep 3; done

This will create 9 additional ports and the alert InsuficientPortsOnNetwork will be raised, but there is still 1 available port.


4. Create an additional port to check there is still one available port
$ openstack port create --network <network_id> port--510

The InsuficientPortsOnNetwork alert is cleared because the reported number of available ports is now -1 (not equal to 0).

5. Try creating an additional port to check it's not possible
$ openstack port create --network <network_id> port--511
HttpException: 409: Client Error for url: https://10.46.44.10:13696/v2.0/ports, {"NeutronError": {"message": "No more IP addresses available on network <network_id>.", "type": "IpAddressGenerationFailure", "detail": ""}}
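
The reproducer steps above can be walked through numerically. This is a rough sketch under stated assumptions: the alert conditions (reported value below 10 for LimitedPortsOnNetwork, reported value equal to 0 for InsuficientPortsOnNetwork) are inferred from the behaviour described in this report, not copied from the actual alert rules, and the names POOL_SIZE, USABLE and walk are hypothetical.

```python
POOL_SIZE = 509  # addresses in the pool 10.128.116.2-10.128.117.254
USABLE = 510     # ports that fit in the /23 subnet, including the one for .1


def walk(ports_in_use):
    """Compare real and reported free ports for a given port count."""
    reported = POOL_SIZE - ports_in_use  # value the buggy metric exposes
    return {
        "actual_free": USABLE - ports_in_use,
        "reported_free": reported,
        # Assumed alert conditions, inferred from the observed behaviour.
        "limited": reported < 10,
        "insufficient": reported == 0,
    }


# Port counts after steps 2, 3 and 4 (each count includes the .1 port).
for ports in (500, 509, 510):
    print(ports, walk(ports))
```

With 510 ports in use the reported value drops to -1, so a condition that tests for equality with 0 no longer matches even though the subnet is full.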

Comment 3 rlobillo 2020-11-20 19:23:56 UTC
Failed on OCP4.7.0-0.nightly-2020-11-18-203317 over OSP16.1 with OVN-Octavia (RHOS-16.1-RHEL-8-20201110.n.1)

InsuficientPortsOnNetwork is raised when 1 port is available and it is cleared when 0 ports are available.

# Creating 499 ports on the ns/test2-subnet:

	$ oc new-project test2
	$ for i in `seq 2 499`; do openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--$i; sleep 3; done

	$ openstack subnet list | grep test2
	| eddc9862-1f09-4fd7-992d-dc01a76f9ef9 | ns/test2-subnet                                            | e0d9964f-b3d5-464d-81ac-b4bc49fcb75b | 10.128.120.0/23 |
	$ openstack port list -f value | grep eddc9862-1f09-4fd7-992d-dc01a76f9ef9 | wc -l
	499

# No alarms raised:
	$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | 
	select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog

# Alarm LimitedPortsOnNetwork raised when there are 10 available ports:

	$ openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--500

	$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | 
	select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
	"LimitedPortsOnNetwork"

# Creating 9 ports so 1 port available only. Alarm InsuficientPortsOnNetwork is wrongly raised.

	$ for i in `seq 501 509`; do openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--$i; sleep 3; done

	$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
	"LimitedPortsOnNetwork"
	"InsuficientPortsOnNetwork"

# When creating the remaining port (0 ports available), the InsuficientPortsOnNetwork is cleared.

	$ openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--510

	$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
	"LimitedPortsOnNetwork"

	$ openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--511
	ConflictException: 409: Client Error for url: https://overcloud.redhat.local:13696/v2.0/ports, No more IP addresses available on network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b.
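
The curl, jq and grep pipeline used in the checks above could also be expressed in Python. The following is only a sketch of the filtering step: it assumes the response has already been fetched and parsed, the function name alert_names and the sample payload are hypothetical, and it matches ignored names exactly, whereas grep -v matches substrings.

```python
# Alerts filtered out by the grep -v in the commands above.
IGNORED = {
    "AlertmanagerReceiversNotConfigured",
    "CannotRetrieveUpdates",
    "Watchdog",
}


def alert_names(payload, ignored=IGNORED):
    """Return alert names from a parsed /api/v1/alerts response."""
    alerts = payload.get("data", {}).get("alerts", [])
    names = [alert.get("labels", {}).get("alertname") for alert in alerts]
    # Drop alerts without a name and the ones excluded by the filter.
    return [name for name in names if name and name not in ignored]


# Hypothetical payload for illustration, not captured from a cluster.
sample = {
    "status": "success",
    "data": {
        "alerts": [
            {"labels": {"alertname": "Watchdog"}, "state": "firing"},
            {"labels": {"alertname": "LimitedPortsOnNetwork"}, "state": "firing"},
            {"labels": {"alertname": "InsuficientPortsOnNetwork"}, "state": "firing"},
        ]
    },
}

print(alert_names(sample))
```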

Comment 5 rlobillo 2020-12-10 15:17:22 UTC
Verified on 4.7.0-0.nightly-2020-12-09-112139 on top of OSP16.1 with OVN-Octavia (RHOS-16.1-RHEL-8-20201124.n.0).

# Creating 499 ports on the ns/test2-subnet:

	$ oc new-project test2
	$ for i in `seq 2 500`; do openstack port create --network 9bf10b51-1014-4629-b55b-cc39b43c1544 port--$i; sleep 3; done

	$ openstack subnet list | grep test2
	| eddc9862-1f09-4fd7-992d-dc01a76f9ef9 | ns/test2-subnet                                            | e0d9964f-b3d5-464d-81ac-b4bc49fcb75b | 10.128.120.0/23 |
	$ openstack port list -f value | grep eddc9862-1f09-4fd7-992d-dc01a76f9ef9 | wc -l
	500

# No alarms raised:
	$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | 
	select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog

# Alarm LimitedPortsOnNetwork raised when there are 9 available ports:

	$ openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--501

	$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | 
	select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
	"LimitedPortsOnNetwork"

# Creating 8 ports so 1 port available only. Alarm InsuficientPortsOnNetwork not raised.

	$ for i in `seq 501 509`; do openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--$i; sleep 3; done

	$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
	"LimitedPortsOnNetwork"

# When creating the remaining port (0 ports available), the InsuficientPortsOnNetwork is raised.

	$ openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--510

	$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
	"InsuficientPortsOnNetwork"

	$ openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--511
	ConflictException: 409: Client Error for url: https://overcloud.redhat.local:13696/v2.0/ports, No more IP addresses available on network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b.


Removing the ports will clear the alarms as expected.

Comment 8 errata-xmlrpc 2021-02-24 15:27:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

