Bug 1473181

Summary: Pod quota not reporting correctly
Product: OpenShift Container Platform
Component: Node
Version: 3.1.1
Target Release: 3.1.1
Hardware: All
OS: Linux
Severity: high
Priority: unspecified
Status: CLOSED WONTFIX
Reporter: Neeraj <nbhatt>
Assignee: Derek Carr <decarr>
QA Contact: DeShuai Ma <dma>
CC: aos-bugs, cjo, decarr, jokerman, mmccomas, nbhatt
Type: Bug
Last Closed: 2017-08-16 21:31:34 UTC

Description Neeraj 2017-07-20 08:04:20 UTC
Description of problem: Only one pod is actually running, but the project description reports 6 pods used against the quota.


[root@lpdospeu00060 ~]# oc get pods -n vng-vpaycoreengine
NAME                           READY     STATUS      RESTARTS   AGE
vpaycore-build-83-build        0/1       Completed   0          3d
vpaycore-deployment-83-2lffj   1/1       Running     1073       3d
vpaycore-deployment-83-unvzl   0/1       Running     1087       3d


oc describe project vng-vpaycoreengine
Name:           vng-vpaycoreengine
Created:        5 months ago
Labels:         api-service/templateSource=e1-templates
Annotations:    adsGroups={"authentication":{},"logging":{"e1":"GG-ADS-E1-ePaaS-logging-vng-vpaycoreengine","e2":"GG-ADS-E2-ePaaS-logging-vng-vpaycoreengine","e3":"GG-ADS-E3-ePaaS-logging-vng-vpaycoreengine"},"monitoring":{"e1":"GG-ADS-E1-ePaaS-AppView-vng-vpaycoreengine","e2":"GG-ADS-E2-ePaaS-AppView-vng-vpaycoreengine","e3":"GG-ADS-E3-ePaaS-AppView-vng-vpaycoreengine"}}
                api-service/approvalStatus=e3_ready
                openshift.io/description=Project for application vng-vpaycoreengine
                openshift.io/display-name=vng-vpaycoreengine
                openshift.io/node-selector=tier=app,region=intranet
                openshift.io/requester=epaas.admin
                openshift.io/sa.scc.mcs=s0:c173,c167
                openshift.io/sa.scc.supplemental-groups=1030090000/10000
                openshift.io/sa.scc.uid-range=1030090000/10000
                releases={"ecn":"6296942","emailID":"vidyasagar.guduru","appURL":"vpaymentcore-dev.aexp.com","releaseStatuses":{"jboss-app-template":{"releaseID":"Release2944566","onboardStatus":"Deployed","xlrUrl":"https://cd-paas.aexp.com/#releases/Release2944566"}},"currentapps":{},"cdVersions":{"jboss-app-template:":{"lastBuildVersions":"89","lastDeploymentVersions":"83"}},"isBGDeployment":false,"applicationName":"vpaycore"}
Display Name:   vng-vpaycoreengine
Description:    Project for application vng-vpaycoreengine
Status:         Active
Node Selector:  tier=app,region=intranet
Quota:
        Name:           quota
        Resource        Used    Hard
        --------        ----    ----
        cpu             4800m   8
        memory          10500M  17G
        pods            6       6                     <<-----------HERE
Resource limits:
        Name:           limit
        Type            Resource        Min     Max     Default
        ----            --------        ---     ---     ---
        Container       cpu             -       1       250m
        Container       memory          -       2500M   500M

Version-Release number of selected component (if applicable): OSE 3.1


Actual results:

Quota:
        Name:           quota
        Resource        Used    Hard
        --------        ----    ----
        cpu             4800m   8
        memory          10500M  17G
        pods            6       6                     <<-----------HERE

Expected results:
Quota:
        Name:           quota
        Resource        Used    Hard
        --------        ----    ----
        cpu             4800m   8
        memory          10500M  17G
        pods            1       6

Comment 1 Derek Carr 2017-07-20 14:05:19 UTC
The resource quota controller in the enterprise-3.1 release ran a reconciliation loop that did a full resync over the entire system every 10s by default.

Depending on the number of quotas in the system, and the number of resources potentially tracked by each quota, longer latencies in the quota system's ability to replenish quota may have been observed.
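For reference, the resync interval on a 3.1 master could in principle be tuned through the controller arguments in master-config.yaml. The snippet below is a sketch assuming the standard /etc/origin/master path; verify the path and the exact flag plumbing against the actual install before applying:

# excerpt from /etc/origin/master/master-config.yaml (assumed path)
kubernetesMasterConfig:
  controllerArguments:
    resource-quota-sync-period:
    - "30s"

A master restart is needed afterwards for the controller to pick up the new interval. Note the trade-off: a shorter interval speeds up replenishment at the cost of more API-server load, which is exactly the scaling problem described above.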

To better understand this environment, I'd like to know the following:

- how many quotas are in the entire system?

$ oc get quotas --all-namespaces

- what resources are being tracked under quota in the system?

The referenced quota shows pod-related resources.

Are other quotas tracking additional items?

If so, what are they, and how many are there in aggregate?

- how long did you wait without observing replenishment? (see the commands sketched below)
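A couple of commands that could gather this data (a sketch; the first counts quotas across the cluster, the second dumps each quota so the tracked resources are visible under spec.hard):

$ oc get quotas --all-namespaces --no-headers | wc -l
$ oc get quotas --all-namespaces -o yaml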

Subsequent releases included a number of feature enhancements that improve the rate at which quota can replenish released resources, modifying the controller to use shared informers and watches rather than the polling model present in 3.1. This means the quota system observes the delete event and adds the affected quota to a queue for reprocessing, rather than waiting for the next poll.
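To confirm which model a given cluster is running, checking the reported server version is enough, since 3.2+ corresponds to the watch-based controller:

$ oc version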

Comment 9 Derek Carr 2017-08-16 21:31:34 UTC
For reference, the quota system was rewritten in the Kubernetes 1.2 release to improve its responsiveness. The major change was to update the quota framework with the concept of a replenishment controller that watches for deletion events for particular kinds and, in response, adds the associated quotas to a queue for processing.

Upstream PR:
https://github.com/kubernetes/kubernetes/pull/20446

An upgrade to version 3.2+ is needed to improve the responsiveness of the quota system for replenishment.
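Once upgraded, replenishment responsiveness can be spot-checked by deleting a pod and re-describing the quota. This sketch reuses the pod, quota, and namespace names from the report above; with the watch-based controller, the Used count for pods should drop within seconds rather than minutes:

$ oc delete pod vpaycore-deployment-83-unvzl -n vng-vpaycoreengine
$ oc describe quota quota -n vng-vpaycoreengine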