Description of problem:

Only one pod is actually running, but the project description reports 6 pods in use against the quota.

[root@lpdospeu00060 ~]# oc get pods -n vng-vpaycoreengine
NAME                           READY     STATUS      RESTARTS   AGE
vpaycore-build-83-build        0/1       Completed   0          3d
vpaycore-deployment-83-2lffj   1/1       Running     1073       3d
vpaycore-deployment-83-unvzl   0/1       Running     1087       3d

oc describe project vng-vpaycoreengine
Name:           vng-vpaycoreengine
Created:        5 months ago
Labels:         api-service/templateSource=e1-templates
Annotations:    adsGroups={"authentication":{},"logging":{"e1":"GG-ADS-E1-ePaaS-logging-vng-vpaycoreengine","e2":"GG-ADS-E2-ePaaS-logging-vng-vpaycoreengine","e3":"GG-ADS-E3-ePaaS-logging-vng-vpaycoreengine"},"monitoring":{"e1":"GG-ADS-E1-ePaaS-AppView-vng-vpaycoreengine","e2":"GG-ADS-E2-ePaaS-AppView-vng-vpaycoreengine","e3":"GG-ADS-E3-ePaaS-AppView-vng-vpaycoreengine"}}
                api-service/approvalStatus=e3_ready
                openshift.io/description=Project for application vng-vpaycoreengine
                openshift.io/display-name=vng-vpaycoreengine
                openshift.io/node-selector=tier=app,region=intranet
                openshift.io/requester=epaas.admin
                openshift.io/sa.scc.mcs=s0:c173,c167
                openshift.io/sa.scc.supplemental-groups=1030090000/10000
                openshift.io/sa.scc.uid-range=1030090000/10000
                releases={"ecn":"6296942","emailID":"vidyasagar.guduru","appURL":"vpaymentcore-dev.aexp.com","releaseStatuses":{"jboss-app-template":{"releaseID":"Release2944566","onboardStatus":"Deployed","xlrUrl":"https://cd-paas.aexp.com/#releases/Release2944566"}},"currentapps":{},"cdVersions":{"jboss-app-template:":{"lastBuildVersions":"89","lastDeploymentVersions":"83"}},"isBGDeployment":false,"applicationName":"vpaycore"}
Display Name:   vng-vpaycoreengine
Description:    Project for application vng-vpaycoreengine
Status:         Active
Node Selector:  tier=app,region=intranet
Quota:
        Name:           quota
        Resource        Used    Hard
        --------        ----    ----
        cpu             4800m   8
        memory          10500M  17G
        pods            6       6    <<----------- HERE
Resource limits:
        Name:           limit
        Type            Resource        Min     Max     Default
        ----            --------        ---     ---     -------
        Container       cpu             -       1       250m
        Container       memory          -       2500M   500M

Version-Release number of selected component (if applicable):
OSE 3.1

Actual results:

Quota:
        Name:           quota
        Resource        Used    Hard
        --------        ----    ----
        cpu             4800m   8
        memory          10500M  17G
        pods            6       6    <<----------- HERE

Expected results:

Quota:
        Name:           quota
        Resource        Used    Hard
        --------        ----    ----
        cpu             4800m   8
        memory          10500M  17G
        pods            1       6
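One quick way to confirm the discrepancy is to compare the usage the quota controller has recorded against what the API server actually returns for the namespace. A minimal sketch, reusing the namespace and quota object names from the paste above:

# Usage recorded by the quota controller (see status.used.pods in the output):
$ oc get quota quota -n vng-vpaycoreengine -o yaml

# Pods the API server actually returns for the namespace:
$ oc get pods -n vng-vpaycoreengine --no-headers | wc -l

If status.used.pods stays at 6 while only three pods exist, the controller has not replenished usage released by pods that no longer exist.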
The resource quota controller in the enterprise-3.1 release ran a reconciliation loop that did a full resync over the entire system every 10s by default. Depending on the number of quotas in the system, and the number of resources potentially tracked by each quota, longer latencies in the quota system's ability to replenish quota could be observed.

To better understand this environment, can you provide the following:

- How many quotas are in the entire system?

  $ oc get quotas --all-namespaces

- What resources are being tracked under quota in the system? The referenced quota shows pod-related resources. Are other quotas tracking additional items? If so, what, and how many of those are there in aggregate? (A sketch for gathering these figures follows this list.)

- How long did you wait without ever observing replenishment?

There were a number of feature enhancements in subsequent releases to improve the rate at which quota can replenish released resources, by modifying the controller to use shared informers and watches rather than the polling model present in 3.1. This means the quota system observes the delete and adds the affected quota to a queue for reprocessing, rather than polling.
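For convenience, a minimal sketch for gathering the figures requested above (assumes cluster-admin access):

# 1. Count the quotas in the entire system:
$ oc get quotas --all-namespaces --no-headers | wc -l

# 2. Dump every quota so the tracked resources (spec.hard) and current
#    usage (status.used) can be tallied in aggregate:
$ oc get quotas --all-namespaces -o yaml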
For reference, the quota system was rewritten in the Kubernetes 1.2 release to improve its responsiveness. The major change was to give the quota framework the concept of a replenishment controller that watches for deletion events on particular kinds and, in response, adds the associated quotas to a queue for processing.

Upstream PR: https://github.com/kubernetes/kubernetes/pull/20446

An upgrade to version 3.2+ is needed to pick up these replenishment improvements to the quota system.
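After upgrading, a simple way to verify watch-based replenishment is to delete a pod and re-check the quota immediately. A minimal sketch using the names from this report (note the pod is managed by a replication controller and will be recreated, so run the describe right after the delete; the specific pod name will differ in practice):

# Delete a pod, then confirm Used falls promptly instead of lagging:
$ oc delete pod vpaycore-deployment-83-unvzl -n vng-vpaycoreengine
$ oc describe quota quota -n vng-vpaycoreengine | grep pods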