Bug 1473181 - Pod quota not reporting correctly
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Version: 3.1.1
Assigned To: Derek Carr
QA Contact: DeShuai Ma
Depends On:
Reported: 2017-07-20 04:04 EDT by Neeraj
Modified: 2017-08-16 17:31 EDT
CC List: 6 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2017-08-16 17:31:34 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description Neeraj 2017-07-20 04:04:20 EDT
Description of problem: Only one pod is actually running, but the project's quota reports 6 pods used.

[root@lpdospeu00060 ~]# oc get pods -n vng-vpaycoreengine
NAME                           READY     STATUS      RESTARTS   AGE
vpaycore-build-83-build        0/1       Completed   0          3d
vpaycore-deployment-83-2lffj   1/1       Running     1073       3d
vpaycore-deployment-83-unvzl   0/1       Running     1087       3d

oc describe project vng-vpaycoreengine
Name:           vng-vpaycoreengine
Created:        5 months ago
Labels:         api-service/templateSource=e1-templates
Annotations:    adsGroups={"authentication":{},"logging":{"e1":"GG-ADS-E1-ePaaS-logging-vng-vpaycoreengine","e2":"GG-ADS-E2-ePaaS-logging-vng-vpaycoreengine","e3":"GG-ADS-E3-ePaaS-logging-vng-vpaycoreengine"},"monitoring":{"e1":"GG-ADS-E1-ePaaS-AppView-vng-vpaycoreengine","e2":"GG-ADS-E2-ePaaS-AppView-vng-vpaycoreengine","e3":"GG-ADS-E3-ePaaS-AppView-vng-vpaycoreengine"}}
                openshift.io/description=Project for application vng-vpaycoreengine
Display Name:   vng-vpaycoreengine
Description:    Project for application vng-vpaycoreengine
Status:         Active
Node Selector:  tier=app,region=intranet
        Name:           quota
        Resource        Used    Hard
        --------        ----    ----
        cpu             4800m   8
        memory          10500M  17G
        pods            6       6                     <<-----------HERE
Resource limits:
        Name:           limit
        Type            Resource        Min     Max     Default
        ----            --------        ---     ---     ---
        Container       cpu             -       1       250m
        Container       memory          -       2500M   500M

Version-Release number of selected component (if applicable): OSE 3.1

Actual results:

        Name:           quota
        Resource        Used    Hard
        --------        ----    ----
        cpu             4800m   8
        memory          10500M  17G
        pods            6       6                     <<-----------HERE

Expected results:
        Name:           quota
        Resource        Used    Hard
        --------        ----    ----
        cpu             4800m   8
        memory          10500M  17G
        pods            1       6
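For illustration, here is a minimal Python sketch (hypothetical data structures, not the actual controller code) of the Kubernetes rule that only non-terminal pods are charged against the "pods" quota. By that rule, the three pods listed above would count as 2, since the Completed build pod is excluded — well below the stale "Used: 6" the quota reports:

```python
# Hypothetical sketch: how a quota controller counts pod usage.
# Pods in a terminal phase (Succeeded/Failed) do not consume quota,
# so stale accounting of completed/deleted pods inflates "Used".

TERMINAL_PHASES = {"Succeeded", "Failed"}

def pods_charged_to_quota(pods):
    """Count only non-terminal pods against the 'pods' quota."""
    return sum(1 for p in pods if p["phase"] not in TERMINAL_PHASES)

pods = [
    {"name": "vpaycore-build-83-build", "phase": "Succeeded"},   # Completed
    {"name": "vpaycore-deployment-83-2lffj", "phase": "Running"},
    {"name": "vpaycore-deployment-83-unvzl", "phase": "Running"},
]
print(pods_charged_to_quota(pods))  # 2: the Completed build pod is excluded
```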
Comment 1 Derek Carr 2017-07-20 10:05:19 EDT
In the enterprise-3.1 release, the resource quota controller ran a reconciliation loop that performed a full resync over the entire system every 10s by default.

Depending on the number of quotas in the system, and the number of resources tracked by each quota, you may have observed longer latencies in the quota system's ability to replenish quota.
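As a back-of-the-envelope sketch of why that polling model scales poorly (the per-quota recalculation cost below is an assumed, illustrative number, not a measurement):

```python
# Hypothetical model of replenishment latency under a full-resync
# polling loop: a deletion is observed no sooner than the next resync,
# and the resync itself must recalculate every quota in the system.

RESYNC_PERIOD_S = 10.0  # default full-resync interval in the 3.1 controller

def worst_case_replenishment_s(num_quotas, per_quota_recalc_s):
    """Upper-bound time before a freed pod slot reappears in 'Used'."""
    return RESYNC_PERIOD_S + num_quotas * per_quota_recalc_s

# e.g. 500 quotas at an assumed 50ms each -> 35s before replenishment
print(worst_case_replenishment_s(500, 0.05))  # 35.0
```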

To better understand this environment, could you provide the following:

- how many quotas are in the entire system?

$ oc get quotas --all-namespaces

- what resources are being tracked under quota in the system?

The referenced quota shows pod-related resources.

Are other quotas tracking additional items?

If so, what are they, and how many are there in aggregate?

- how long did you wait without observing replenishment?

There were a number of feature enhancements in subsequent releases to improve the rate at which quota replenishes released resources, by modifying the controller to use shared informers and watches rather than the polling model present in 3.1. This means the quota system observes the delete and adds the affected quota to a queue for reprocessing, rather than polling.
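The watch-driven model described above can be sketched as follows (a hypothetical simplification, not the actual Kubernetes controller code):

```python
from collections import deque

# Hypothetical sketch of watch-driven quota replenishment: instead of
# resyncing every quota on a fixed interval, a pod deletion event
# immediately enqueues only the affected namespace's quotas, with
# deduplication so repeated events don't cause redundant recalculation.

class ReplenishmentQueue:
    def __init__(self):
        self._queue = deque()
        self._pending = set()

    def on_pod_deleted(self, namespace):
        """Called by a watch handler when a pod deletion is observed."""
        if namespace not in self._pending:
            self._pending.add(namespace)
            self._queue.append(namespace)

    def pop(self):
        """Return the next namespace whose quotas need recalculation."""
        ns = self._queue.popleft()
        self._pending.discard(ns)
        return ns

q = ReplenishmentQueue()
q.on_pod_deleted("vng-vpaycoreengine")
q.on_pod_deleted("vng-vpaycoreengine")  # duplicate event, deduplicated
print(len(q._queue))  # 1
```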
Comment 9 Derek Carr 2017-08-16 17:31:34 EDT
For reference, the quota system was re-written in the Kubernetes 1.2 release to improve its responsiveness. The major change was to update the quota framework with the concept of a replenishment controller that watches for deletion events for particular kinds and, in response, adds the associated quotas to a queue for processing.

Upstream PR:

An upgrade to version 3.2+ is needed to pick up these improvements to quota replenishment responsiveness.
