Bug 2001317 - OCP Platform Quota Check - Inaccurate MissingQuota error
Summary: OCP Platform Quota Check - Inaccurate MissingQuota error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.10.0
Assignee: Pierre Prinetti
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-05 13:56 UTC by Itay Matza
Modified: 2022-03-10 16:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The pre-flight checks did not take into account the current utilisation of OpenStack resources.
Consequence: Pre-flight checks would fail with the wrong error message when utilisation, and not overall quota, was impeding installation.
Fix: With this change, the pre-flight checks pass information about both quota and utilisation.
Result: Pre-flight checks now fail with one message when there is not enough quota, and with a different message when the quota would be sufficient but the available resources are not.
Clone Of:
Environment:
Last Closed: 2022-03-10 16:07:08 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift installer pull 5197 0 None None None 2021-09-07 15:24:48 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:07:24 UTC

Description Itay Matza 2021-09-05 13:56:23 UTC
Version:
OSP: RHOS-16.1-RHEL-8-20210604.n.0
OCP: OCP 4.9.0-0.nightly-2021-08-07-175228.


Platform:
OpenShift on OpenStack.


Description:
The OCP installation requires the available resources listed at https://github.com/openshift/installer/tree/master/docs/user/openstack#openstack-requirements and, when using Kuryr, those listed at https://github.com/openshift/installer/blob/master/docs/user/openstack/kuryr.md#requirements-when-enabling-kuryr.

The installer computes the available resources as: quota.Limit - quota.InUse - quota.Reserved.
If the OCP project does not have enough available resources, the Platform Quota Check will print the following MissingQuota error:
> level=fatal msg=failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Quota Check": error(MissingQuota): Port is not available because the required number of resources (1499) is more than the limit of 1498
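
To illustrate why the wording is misleading, here is a minimal Go sketch of the availability calculation described above. The `Quota` type, its `Available` method and the numbers are assumptions made for illustration; they are not taken from the installer code or from this environment.

```go
package main

import "fmt"

// Quota is a hypothetical stand-in for the per-resource figures this report
// refers to as quota.Limit, quota.InUse and quota.Reserved.
type Quota struct {
	Limit    int64 // configured quota for the project
	InUse    int64 // resources already consumed
	Reserved int64 // resources reserved but not yet consumed
}

// Available returns what is left for new resources.
func (q Quota) Available() int64 {
	return q.Limit - q.InUse - q.Reserved
}

func main() {
	// Illustrative numbers only: a port quota of 1500 with 2 ports in use.
	ports := Quota{Limit: 1500, InUse: 2, Reserved: 0}
	fmt.Println(ports.Available()) // prints 1498
}
```

With figures like these, a message that reports "the limit of 1498" would be quoting the available amount, while the configured quota is 1500.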

This error message is confusing because "limit" can be read as the configured quota rather than the available resources.

Emilien previously suggested changing this error (https://bugzilla.redhat.com/show_bug.cgi?id=1989917#c2):
> e.g. because the required number of resources (1000) is superior than the available resources (996)


Example of a long MissingQuota error message:
> time="2021-09-02T11:33:42Z" level=debug msg="  Generating Platform Quota Check..."
> time="2021-09-02T11:33:42Z" level=fatal msg="failed to fetch Cluster: failed to fetch dependency of \"Cluster\": failed to generate asset \"Platform Quota Check\": error(MissingQuota): RAM is not available because the required number of resources (98304) is more than the limit of 49152, Port is not available because the required number of resources (1496) is more than the limit of 1490, Network is not available because the required number of resources (250) is more than the limit of 249, Subnet is not available because the required number of resources (250) is more than the limit of 249, SecurityGroup is not available because the required number of resources (249) is more than the limit of 247, SecurityGroupRule is not available because the required number of resources (996) is more than the limit of 954"

Comment 1 Pierre Prinetti 2021-09-07 14:47:44 UTC
The OpenStack usage fetcher ("cloudinfo") indeed conflates the number of in-use resources with the quota[1] and passes that to the checker[2].
If instead we passed the detailed data, the checker would output a more precise error message[3]. For example: "the required number of resources (1499) is more than remaining quota of 1498".

This bug is valid.

[1]: https://github.com/openshift/installer/blob/f3f77ccf8b870d6e5a2648f8cf87a01e2a62e6fe/pkg/asset/installconfig/openstack/validation/cloudinfo.go#L471
[2]: https://github.com/openshift/installer/blob/f3f77ccf8b870d6e5a2648f8cf87a01e2a62e6fe/pkg/asset/installconfig/openstack/validation/cloudinfo.go#L536-L538
[3]: https://github.com/openshift/installer/blob/f3f77ccf8b870d6e5a2648f8cf87a01e2a62e6fe/pkg/quota/quota.go#L103

Comment 32 Pierre Prinetti 2021-09-13 09:59:03 UTC
@Itay

Comment 33 Pierre Prinetti 2021-09-13 10:12:27 UTC
Please ignore comment #32.

Itay:

The algorithm is:
* If the required resources exceed quota, the message follows this template:
  *** "the required number of resources (${required}) is more than the limit of ${quota}" ***

* If the required resources do not exceed the OpenStack quota, but they do once summed with the resources that are currently in-use in the cluster, the message follows this template:
  *** "the required number of resources (${required}) is more than remaining quota of ${ quota MINUS currently-in-use }" ***
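
The two cases above can be sketched in Go roughly as follows. This is an illustration of the described templates, not the installer's code: the function name `explain` and its parameters are hypothetical, reserved resources are ignored for brevity, and the in-use count of 13 is an assumed value.

```go
package main

import "fmt"

// explain returns the reason clause for one resource, or an empty string
// when the request fits. All names here are hypothetical.
func explain(required, limit, inUse int64) string {
	switch {
	case required > limit:
		// Case 1: the requirement exceeds the quota itself.
		return fmt.Sprintf("the required number of resources (%d) is more than the limit of %d", required, limit)
	case required > limit-inUse:
		// Case 2: the quota would suffice, but current usage leaves too little.
		return fmt.Sprintf("the required number of resources (%d) is more than remaining quota of %d", required, limit-inUse)
	default:
		return ""
	}
}

func main() {
	fmt.Println(explain(1496, 1400, 13)) // quota too low: "limit" wording
	fmt.Println(explain(1496, 1500, 13)) // quota high enough, usage too high: "remaining quota" wording
}
```

Checking the quota first means the "limit" wording is only used when raising the quota is the actual remedy.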

---

This algorithm is set in a platform-independent part of the code base, and is therefore consistent with what happens in AWS or GCP.

---

To test, I suggest preparing your environment like this:

$ OS_CLOUD=${admin}  openstack quota set shiftstack --ports 1500
$ OS_CLOUD=${tenant} openstack network create test1 && for i in {1..10}; do OS_CLOUD=${tenant} openstack port create --network test1 "testport_${i}"; done

and only then run the installation. The error message should then mention "remaining quota" instead of "limit", because the quota itself is higher than the requirement.

Comment 34 Itay Matza 2021-09-13 13:48:05 UTC
(In reply to Pierre Prinetti from comment #33)
Thanks Pierre,

Verified successfully on openshift-install version 4.10.0-0.nightly-2021-09-10-083647:

1) For Kuryr:
>(shiftstack) [stack@undercloud-0 ~]$ grep type install-config.yaml 
>  type: "Kuryr"
>(overcloud) [stack@undercloud-0 ~]$ openstack quota set shiftstack --ports 1500
>(shiftstack) [stack@undercloud-0 ~]$ openstack network create test1 && for i in {1..10}; do openstack port create --network test1 "testport_${i}"; done

The two types of errors on the same Platform Quota Check:
>(shiftstack) [stack@undercloud-0 ~]$ openshift-install create cluster --dir ostest/
>INFO Credentials loaded from file "/home/stack/clouds.yaml" 
>INFO Consuming Install Config from target directory 
>INFO Obtaining RHCOS image file from 'https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.9/49.84.202107010027-0/x86_64/rhcos-49.84.202107010027-0-openstack.x86_64.qcow2.gz?sha256=00cb56c8711686255744646394e22a8ca5f27e059016f6758f14388e5a0a14cb' 
>FATAL failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Quota Check": error(MissingQuota): Network is not available because the required number of resources (250) is more than remaining quota of 249, Port is not available because the required number of resources (1496) is more than remaining quota of 1487 

The first type of error - in this case, the required resources exceed quota:
>(overcloud) [stack@undercloud-0 ~]$ openstack quota set shiftstack --networks 251
>(overcloud) [stack@undercloud-0 ~]$ openstack quota set shiftstack --ports 1400
>(overcloud) [stack@undercloud-0 ~]$ openshift-install create cluster --dir ostest/
>FATAL failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Quota Check": error(MissingQuota): Port is not available because the required number of resources (1496) is more than the limit of 1400 

The second type of error - in this case, the required resources do not exceed the OpenStack quota, but they do once summed with the resources that are currently in-use in the cluster:
>(overcloud) [stack@undercloud-0 ~]$ openstack quota set shiftstack --ports 1500
>(overcloud) [stack@undercloud-0 ~]$ openshift-install create cluster --dir ostest/
>FATAL failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Quota Check": error(MissingQuota): Port is not available because the required number of resources (1496) is more than remaining quota of 1487 


2) For other types of networks:
>(overcloud) [stack@undercloud-0 ~]$ grep type install-config.yaml 
>  type: "OpenShiftSDN"

The first type of error - in this case, the required resources exceed quota:
>(overcloud) [stack@undercloud-0 ~]$ openstack quota set shiftstack --routers 0
>(overcloud) [stack@undercloud-0 ~]$ openshift-install create cluster --dir ostest/
>FATAL failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Quota Check": error(MissingQuota): Port is not available because the required number of resources (1496) is more than remaining quota of 1487, Router is not available because the required number of resources (1) is more than the limit of 0 
>(overcloud) [stack@undercloud-0 ~]$ openstack quota set shiftstack --routers 1

The second type of error - in this case, the required resources do not exceed the OpenStack quota, but they do once summed with the resources that are currently in-use in the cluster:
>(shiftstack) [stack@undercloud-0 ~]$ openstack router create test1 
>(shiftstack) [stack@undercloud-0 ~]$ openshift-install create cluster --dir ostest/
>FATAL failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Quota Check": error(MissingQuota): Router is not available because the required number of resources (1) is more than remaining quota of 0, Port is not available because the required number of resources (1496) is more than remaining quota of 1487

Comment 35 Itay Matza 2021-09-13 14:15:15 UTC
(In reply to Itay Matza from comment #34)

Please ignore the "2) For other types of networks" part in comment #34.
Correction for the other types of networks verification:

2) For other types of networks:
>(overcloud) [stack@undercloud-0 ~]$ grep type install-config.yaml 
>  type: "OpenShiftSDN"


The first type of error - in this case, the required resources exceed quota:
>(overcloud) [stack@undercloud-0 ~]$ openstack quota set shiftstack --routers 0
>(overcloud) [stack@undercloud-0 ~]$ openshift-install create cluster --dir ostest/
>FATAL failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Quota Check": error(MissingQuota): Router is not available because the required number of resources (1) is more than the limit of 0 

The second type of error - in this case, the required resources do not exceed the OpenStack quota, but they do once summed with the resources that are currently in-use in the cluster:
>(overcloud) [stack@undercloud-0 ~]$ openstack quota set shiftstack --routers 1
>(shiftstack) [stack@undercloud-0 ~]$ openstack router create test1 
>(shiftstack) [stack@undercloud-0 ~]$ openshift-install create cluster --dir ostest/
>FATAL failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Quota Check": error(MissingQuota): Router is not available because the required number of resources (1) is more than remaining quota of 0

Comment 36 ShiftStack Bugwatcher 2021-11-25 16:12:24 UTC
Removing the Triaged keyword because:

* the QE automation assessment (flag qe_test_coverage) is missing

Comment 40 errata-xmlrpc 2022-03-10 16:07:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

