+++ This bug was initially created as a clone of Bug #1890630 +++

Description of problem:

Kuryr alerts LimitedPortsOnNetwork and InsuficientPortsOnNetwork are triggered based on the free port count in a given namespace subnet.

LimitedPortsOnNetwork is triggered when there are less than 11 available ports (it should be when there are less than 10).
InsuficientPortsOnNetwork is triggered when there is 1 available port (it should be when there are no available ports).

Version-Release number of selected component (if applicable):
OCP 4.6.0-0.nightly-2020-10-20-101225
OSP13 2020-10-06.2

How reproducible: always

Steps to Reproduce:

1. Create new project (will create a /23 subnet -> room for 510 ports)
   $ oc new-project test

2. Create pods until ~490 ports are created
   $ for i in `seq 1 490`; do oc run --image kuryr/demo demo-$i; sleep 4; done

3. Create ports manually in the subnet until there are 11 available ports (499 ports in use)
   $ openstack port create --network <network-id> <port-name>
   The alert LimitedPortsOnNetwork is raised, while it should be raised when there are 10 available ports.

4. Create ports manually in the subnet until there is 1 available port (509 ports in use)
   $ openstack port create --network <network-id> <port-name>
   The alert InsuficientPortsOnNetwork is raised, while it should be raised when there are 0 available ports.
Additional info:

    def _record_ports_quota_per_subnet_metric(self):
        """Records the ports quota per subnet to the registry"""
        subnets = self._os_net.subnets(project_id=self._project_id)
        namespace_prefix = 'ns/'
        for subnet in subnets:
            if namespace_prefix not in subnet.name:
                continue
            total_num_addresses = 0
            ports_availability = 0
            for allocation in subnet.allocation_pools:
                total_num_addresses += netaddr.IPRange(
                    netaddr.IPAddress(allocation['start']),
                    netaddr.IPAddress(allocation['end'])).size
            ports_count = len(list(self._os_net.ports(
                network_id=subnet.network_id,
                project_id=self._project_id)))
            labels = {'subnet_id': subnet.id, 'subnet_name': subnet.name}
            ports_availability = total_num_addresses - ports_count
            self.port_quota_per_subnet.labels(**labels).set(ports_availability)

The total_num_addresses is calculated based on the allocation pool:

    allocation_pools | 10.128.116.2-10.128.117.254

which doesn't contain the allocation for the .1 address (so it's 509).
The ports_count (ports in use), on the other hand, does count the .1 port (so it can increase up to 510).

The calculation could be fixed by removing 1 port from ports_count (the one belonging to .1), or by adding 1 port to total_num_addresses.

--- Additional comment from juriarte on 2020-10-23 09:43:40 UTC ---

Adding an easier reproducer:

1. Create new project (will create a /23 subnet -> room for 510 ports)
   $ oc new-project test
   It will create the port for .1

2. Create 499 ports
   $ for i in `seq 2 500`; do openstack port create --network <network_id> port--$i; sleep 3; done
   This will create 499 ports and the alert LimitedPortsOnNetwork will be raised, but there are still 10 available ports.

3. Create 9 more ports
   $ for i in `seq 501 509`; do openstack port create --network <network_id> port--$i; sleep 3; done
   This will create 9 additional ports and the alert InsuficientPortsOnNetwork will be raised, but there is still 1 available port.

4. Create an additional port to check there is still one available port
   $ openstack port create --network <network_id> port--510
   The InsuficientPortsOnNetwork alert is cleared because the reported number of available ports is now -1 (no longer 0).

5. Try creating an additional port to check it's not possible
   $ openstack port create --network <network_id> port--511
   HttpException: 409: Client Error for url: https://10.46.44.10:13696/v2.0/ports, {"NeutronError": {"message": "No more IP addresses available on network <network_id>.", "type": "IpAddressGenerationFailure", "detail": ""}}
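The off-by-one described above can be checked with a little arithmetic. The following is only a sketch, not the Kuryr code: it swaps netaddr for the standard-library ipaddress module, and the names pool_size, reported_availability, POOLS and USABLE_HOSTS are hypothetical helpers introduced for illustration. It assumes the .1 address is the only usable host address excluded from the allocation pool, as the report states.

```python
import ipaddress


def pool_size(start: str, end: str) -> int:
    """Number of addresses in an inclusive allocation pool."""
    return int(ipaddress.ip_address(end)) - int(ipaddress.ip_address(start)) + 1


def reported_availability(pools, ports_count: int) -> int:
    """Mirror of the metric as described in the report:
    sum of allocation pool sizes minus the number of ports in use."""
    total = sum(pool_size(p["start"], p["end"]) for p in pools)
    return total - ports_count


# Allocation pool quoted in the report (the .1 address is not part of it).
POOLS = [{"start": "10.128.116.2", "end": "10.128.117.254"}]

# Usable host addresses in the /23: all addresses minus network and broadcast.
USABLE_HOSTS = ipaddress.ip_network("10.128.116.0/23").num_addresses - 2

# ports in use -> (value the metric reports, ports that are really free)
for in_use in (500, 509, 510):
    print(in_use, reported_availability(POOLS, in_use), USABLE_HOSTS - in_use)
```

With 500, 509 and 510 ports in use, the reported value is one lower than the number of ports that can still be created, which matches the behaviour seen in the reproducer steps.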
Verified in OCP 4.6.0-0.nightly-2021-04-17-182039 on top of OSP 13.0.16 (2021-04-09.1).

Verification steps:

To check the alerts in Prometheus, make sure you have the following entry in /etc/hosts:

    <APPS_FIP> prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com

How to check the alerts from the CLI:

$ token=`oc sa get-token prometheus-k8s -n openshift-monitoring`

List all the alerts:
$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname'

Get a specific alert (e.g. LimitedPortsOnNetwork):
$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname == "LimitedPortsOnNetwork")'

## Create new project (will create a /23 subnet -> room for 510 ports)
$ oc new-project test

## Create 499 ports in the ns/test network (so there are 500 in use in total)
$ openstack subnet list | grep test
| 29ea4014-b75d-435c-8be3-0e1d3a6b5e12 | ns/test-subnet | 2359dce9-28bf-4f19-8639-1fe2cbd8fcd9 | 10.128.116.0/23 |
$ openstack port list | grep 10.128.116
| 44276942-d449-47b5-856f-4bbbdfeb2f47 | | fa:16:3e:33:fa:43 | ip_address='10.128.116.1', subnet_id='29ea4014-b75d-435c-8be3-0e1d3a6b5e12' | ACTIVE |
$ for i in `seq 2 500`; do openstack port create --network 2359dce9-28bf-4f19-8639-1fe2cbd8fcd9 port--$i; sleep 3; done

## Check LimitedPortsOnNetwork alarm is not raised, as there are still 10 available ports
$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
$

## Create one more port, so there are 9 available ports
$ openstack port create --network 2359dce9-28bf-4f19-8639-1fe2cbd8fcd9 port--501

## Check LimitedPortsOnNetwork alarm is raised, as there are less than 10 available ports (there are 9)
$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
"LimitedPortsOnNetwork"

## Create 8 more ports, so there will be only 1 available port
$ for i in `seq 502 509`; do openstack port create --network 2359dce9-28bf-4f19-8639-1fe2cbd8fcd9 port--$i; sleep 3; done

## Check InsuficientPortsOnNetwork alarm is not raised, as there is still 1 available port
$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
"LimitedPortsOnNetwork"

## Create one more port, so there are 0 available ports
$ openstack port create --network 2359dce9-28bf-4f19-8639-1fe2cbd8fcd9 port--510

## Check InsuficientPortsOnNetwork alarm is raised, as there are no available ports
$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
"InsuficientPortsOnNetwork"

Removing the ports clears the alarms as expected.
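For scripted checks, the jq and grep filtering used in the verification steps could also be expressed in Python. This is a minimal sketch that operates on an already-fetched /api/v1/alerts response; the function name alert_names and the sample payload are hypothetical and only mimic the fields the jq filter reads (.data.alerts[].labels.alertname), they are not output captured from the cluster.

```python
import json

# Alert names filtered out with `grep -v` in the verification steps.
IGNORED = {"AlertmanagerReceiversNotConfigured", "CannotRetrieveUpdates", "Watchdog"}


def alert_names(payload: dict, ignored=IGNORED) -> list:
    """Return the alertname label of every alert, skipping ignored ones."""
    names = []
    for alert in payload.get("data", {}).get("alerts", []):
        name = alert.get("labels", {}).get("alertname")
        if name and name not in ignored:
            names.append(name)
    return names


# Illustrative payload only (assumed shape, made-up content).
sample = json.loads("""
{
  "status": "success",
  "data": {
    "alerts": [
      {"labels": {"alertname": "Watchdog"}, "state": "firing"},
      {"labels": {"alertname": "LimitedPortsOnNetwork"}, "state": "firing"}
    ]
  }
}
""")

print(alert_names(sample))
```

An empty list corresponds to the step where the curl pipeline prints nothing, that is, no Kuryr port alert is active.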
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.26 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:1232