Bug 1890630
| Summary: | [Kuryr] Available port count not correctly calculated for alerts | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jon Uriarte <juriarte> |
| Component: | Networking | Assignee: | Maysa Macedo <mdemaced> |
| Networking sub component: | kuryr | QA Contact: | GenadiC <gcheresh> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | low | ||
| Priority: | low | CC: | ltomasbo, rlobillo |
| Version: | 4.6 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.7.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-02-24 15:27:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1897526 | ||
Adding an easier reproducer:

1. Create new project (will create a /23 subnet -> room for 510 ports)
   $ oc new-project test
   It will create the port for .1

2. Create 499 ports
   $ for i in `seq 2 500`; do openstack port create --network <network_id> port--$i; sleep 3; done
   This will create 499 ports and the alert LimitedPortsOnNetwork will be raised, but there are still 10 available ports.

3. Create 9 more ports
   $ for i in `seq 501 509`; do openstack port create --network <network_id> port--$i; sleep 3; done
   This will create 9 additional ports and the alert InsuficientPortsOnNetwork will be raised, but there is still 1 available port.

4. Create an additional port to check there is still one available port
   $ openstack port create --network <network_id> port--510
   The InsuficientPortsOnNetwork alert is cleared because the reported available port count is now -1 (not equal to 0).

5. Try creating an additional port to check it's not possible
   $ openstack port create --network <network_id> port--511
   HttpException: 409: Client Error for url: https://10.46.44.10:13696/v2.0/ports, {"NeutronError": {"message": "No more IP addresses available on network <network_id>.", "type": "IpAddressGenerationFailure", "detail": ""}}

Failed on OCP4.7.0-0.nightly-2020-11-18-203317 over OSP16.1 with OVN-Octavia (RHOS-16.1-RHEL-8-20201110.n.1).
InsuficientPortsOnNetwork is raised when 1 port is available and it is cleared when 0 ports are available.
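The off-by-one behaviour in the reproducer can be checked with a small simulation. This is only a sketch: the two constants come from the numbers in this report, and the alert conditions (`metric < 10` and `metric == 0`) are inferred from the observed behaviour, not copied from the actual Prometheus rules. The function names are hypothetical.

```python
from typing import List

# Numbers taken from this report.
SUBNET_CAPACITY = 510  # usable addresses in the /23, including .1
POOL_SIZE = 509        # allocation pool size, which excludes .1


def reported_metric(ports_in_use: int) -> int:
    """Value published before the fix: pool size minus all ports, .1 included."""
    return POOL_SIZE - ports_in_use


def firing_alerts(metric: int) -> List[str]:
    """Alerts assumed to fire for a metric value (conditions are inferred)."""
    alerts = []
    if metric < 10:
        alerts.append("LimitedPortsOnNetwork")
    if metric == 0:
        alerts.append("InsuficientPortsOnNetwork")
    return alerts


for in_use in (499, 500, 509, 510):
    metric = reported_metric(in_use)
    really_free = SUBNET_CAPACITY - in_use
    print(f"{in_use} in use: {really_free} free, metric={metric}, "
          f"alerts={firing_alerts(metric)}")
```

Under these assumptions the metric is always one lower than the real number of free ports, which is why each alert fires one port too early and InsuficientPortsOnNetwork clears once the subnet is actually full.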
    # Creating 499 ports on the ns/test2-subnet:
    $ oc new-project test2
    $ for i in `seq 2 499`; do openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--$i; sleep 3; done
    $ openstack subnet list | grep test2
    | eddc9862-1f09-4fd7-992d-dc01a76f9ef9 | ns/test2-subnet | e0d9964f-b3d5-464d-81ac-b4bc49fcb75b | 10.128.120.0/23 |
    $ openstack port list -f value | grep eddc9862-1f09-4fd7-992d-dc01a76f9ef9 | wc -l
    499

    # No alarms raised:
    $ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog

    # Alarm LimitedPortsOnNetwork raised when there are 10 available ports:
    $ openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--500
    $ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
    "LimitedPortsOnNetwork"

    # Creating 9 ports so 1 port available only. Alarm InsuficientPortsOnNetwork is wrongly raised.
    $ for i in `seq 501 509`; do openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--$i; sleep 3; done
    $ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
    "LimitedPortsOnNetwork"
    "InsuficientPortsOnNetwork"

    # When creating the remaining port (0 ports available), the InsuficientPortsOnNetwork is cleared.
    $ openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--510
    $ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
    "LimitedPortsOnNetwork"

    $ openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--511
    ConflictException: 409: Client Error for url: https://overcloud.redhat.local:13696/v2.0/ports, No more IP addresses available on network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b.

Verified on 4.7.0-0.nightly-2020-12-09-112139 on top of OSP16.1 with OVN-Octavia (RHOS-16.1-RHEL-8-20201124.n.0).

    # Creating 499 ports on the ns/test2-subnet:
    $ oc new-project test2
    $ for i in `seq 2 500`; do openstack port create --network 9bf10b51-1014-4629-b55b-cc39b43c1544 port--$i; sleep 3; done
    $ openstack subnet list | grep test2
    | eddc9862-1f09-4fd7-992d-dc01a76f9ef9 | ns/test2-subnet | e0d9964f-b3d5-464d-81ac-b4bc49fcb75b | 10.128.120.0/23 |
    $ openstack port list -f value | grep eddc9862-1f09-4fd7-992d-dc01a76f9ef9 | wc -l
    500

    # No alarms raised:
    $ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog

    # Alarm LimitedPortsOnNetwork raised when there are 9 available ports:
    $ openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--501
    $ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
    "LimitedPortsOnNetwork"

    # Creating 8 ports so 1 port available only. Alarm InsuficientPortsOnNetwork not raised.
    $ for i in `seq 501 509`; do openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--$i; sleep 3; done
    $ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
    "LimitedPortsOnNetwork"

    # When creating the remaining port (0 ports available), the InsuficientPortsOnNetwork is raised.
    $ openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--510
    $ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' | grep -v -e AlertmanagerReceiversNotConfigured -e CannotRetrieveUpdates -e Watchdog
    "InsuficientPortsOnNetwork"

    $ openstack port create --network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b port--511
    ConflictException: 409: Client Error for url: https://overcloud.redhat.local:13696/v2.0/ports, No more IP addresses available on network e0d9964f-b3d5-464d-81ac-b4bc49fcb75b.

Removing the ports will clear the alarms as expected.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
Description of problem:
Kuryr alerts LimitedPortsOnNetwork and InsuficientPortsOnNetwork are triggered based on the free port count in a given namespace subnet.

LimitedPortsOnNetwork is triggered when there are fewer than 11 available ports (it should be triggered when there are fewer than 10).
InsuficientPortsOnNetwork is triggered when there is 1 available port (it should be triggered when there are no available ports).

Version-Release number of selected component (if applicable):
OCP 4.6.0-0.nightly-2020-10-20-101225
OSP13 2020-10-06.2

How reproducible: always

Steps to Reproduce:

1. Create new project (will create a /23 subnet -> room for 510 ports)
   $ oc new-project test

2. Create pods until ~490 ports are created
   $ for i in `seq 1 490`; do oc run --image kuryr/demo demo-$i; sleep 4; done

3. Create ports manually in the subnet until there are 11 available ports (499 ports in use)
   $ openstack port create --network <network-id> <port-name>
   The alert LimitedPortsOnNetwork is raised, while it should be raised when there are 10 available ports.

4. Create ports manually in the subnet until there is 1 available port (509 ports in use)
   $ openstack port create --network <network-id> <port-name>
   The alert InsuficientPortsOnNetwork is raised, while it should be raised when there are 0 available ports.
Additional info:

    def _record_ports_quota_per_subnet_metric(self):
        """Records the ports quota per subnet to the registry"""
        subnets = self._os_net.subnets(project_id=self._project_id)
        namespace_prefix = 'ns/'
        for subnet in subnets:
            if namespace_prefix not in subnet.name:
                continue
            total_num_addresses = 0
            ports_availability = 0
            for allocation in subnet.allocation_pools:
                total_num_addresses += netaddr.IPRange(
                    netaddr.IPAddress(allocation['start']),
                    netaddr.IPAddress(allocation['end'])).size
            ports_count = len(list(self._os_net.ports(
                network_id=subnet.network_id,
                project_id=self._project_id)))
            labels = {'subnet_id': subnet.id, 'subnet_name': subnet.name}
            ports_availability = total_num_addresses-ports_count
            self.port_quota_per_subnet.labels(**labels).set(ports_availability)

The total_num_addresses is calculated based on the allocation pool:

    allocation_pools | 10.128.116.2-10.128.117.254

which doesn't contain the allocation for the .1 (so it's 509).

The ports_count (ports in use), on the other hand, counts the .1 port (so it can increase up to 510).

The calculation could be fixed by removing 1 port from ports_count (the one belonging to .1), or by adding 1 port to total_num_addresses.
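The 509 versus 510 mismatch can be reproduced with the standard library alone. The sketch below uses `ipaddress` in place of `netaddr` so it runs without extra packages. The subnet CIDR `10.128.116.0/23` is inferred from the allocation pool shown in this report, and `ports_availability` with its `gateway_ports` parameter is a hypothetical illustration of the first suggested fix, not the change that was actually merged.

```python
import ipaddress

# Pool boundaries come from the report; the CIDR is inferred from them.
subnet = ipaddress.ip_network("10.128.116.0/23")
pool_start = ipaddress.ip_address("10.128.116.2")
pool_end = ipaddress.ip_address("10.128.117.254")

# Inclusive range size, the same count netaddr.IPRange(start, end).size gives.
pool_size = int(pool_end) - int(pool_start) + 1

# Addresses that can hold a port: everything except network and broadcast.
usable = subnet.num_addresses - 2


def ports_availability(ports_count: int, gateway_ports: int = 1) -> int:
    """Hypothetical corrected metric: do not count the port that owns .1."""
    return pool_size - (ports_count - gateway_ports)


print(pool_size, usable)
print(ports_availability(500), ports_availability(510))
```

With the .1 port excluded from the count, 500 ports in use report 10 free ports and a full subnet (510 ports) reports 0, which matches the behaviour the report expects.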