Description of problem:

Only one pod is actually running, but the project description reports 6 pods in use against the quota.

[root@lpdospeu00060 ~]# oc get pods -n vng-vpaycoreengine
NAME                           READY     STATUS      RESTARTS   AGE
vpaycore-build-83-build        0/1       Completed   0          3d
vpaycore-deployment-83-2lffj   1/1       Running     1073       3d
vpaycore-deployment-83-unvzl   0/1       Running     1087       3d

oc describe project vng-vpaycoreengine
Name:           vng-vpaycoreengine
Created:        5 months ago
Labels:         api-service/templateSource=e1-templates
Annotations:    adsGroups={"authentication":{},"logging":{"e1":"GG-ADS-E1-ePaaS-logging-vng-vpaycoreengine","e2":"GG-ADS-E2-ePaaS-logging-vng-vpaycoreengine","e3":"GG-ADS-E3-ePaaS-logging-vng-vpaycoreengine"},"monitoring":{"e1":"GG-ADS-E1-ePaaS-AppView-vng-vpaycoreengine","e2":"GG-ADS-E2-ePaaS-AppView-vng-vpaycoreengine","e3":"GG-ADS-E3-ePaaS-AppView-vng-vpaycoreengine"}}
                api-service/approvalStatus=e3_ready
                openshift.io/description=Project for application vng-vpaycoreengine
                openshift.io/display-name=vng-vpaycoreengine
                openshift.io/node-selector=tier=app,region=intranet
                openshift.io/requester=epaas.admin
                openshift.io/sa.scc.mcs=s0:c173,c167
                openshift.io/sa.scc.supplemental-groups=1030090000/10000
                openshift.io/sa.scc.uid-range=1030090000/10000
                releases={"ecn":"6296942","emailID":"vidyasagar.guduru","appURL":"vpaymentcore-dev.aexp.com","releaseStatuses":{"jboss-app-template":{"releaseID":"Release2944566","onboardStatus":"Deployed","xlrUrl":"https://cd-paas.aexp.com/#releases/Release2944566"}},"currentapps":{},"cdVersions":{"jboss-app-template:":{"lastBuildVersions":"89","lastDeploymentVersions":"83"}},"isBGDeployment":false,"applicationName":"vpaycore"}
Display Name:   vng-vpaycoreengine
Description:    Project for application vng-vpaycoreengine
Status:         Active
Node Selector:  tier=app,region=intranet
Quota:
        Name:           quota
        Resource        Used    Hard
        --------        ----    ----
        cpu             4800m   8
        memory          10500M  17G
        pods            6       6    <<----------- HERE
Resource limits:
        Name:           limit
        Type            Resource        Min     Max     Default
        ----            --------        ---     ---     -------
        Container       cpu             -       1       250m
        Container       memory          -       2500M   500M

Version-Release number of selected component (if applicable):
OSE 3.1

Actual results:

Quota:
        Name:           quota
        Resource        Used    Hard
        --------        ----    ----
        cpu             4800m   8
        memory          10500M  17G
        pods            6       6    <<----------- HERE

Expected results:

Quota:
        Name:           quota
        Resource        Used    Hard
        --------        ----    ----
        cpu             4800m   8
        memory          10500M  17G
        pods            1       6
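One quick way to confirm the discrepancy is to compare the usage the quota controller has recorded against what the API server actually returns for the namespace. A minimal sketch, reusing the namespace and quota object names from the paste above:

# Usage recorded by the quota controller (see status.used.pods in the output):
$ oc get quota quota -n vng-vpaycoreengine -o yaml

# Pods the API server actually returns for the namespace:
$ oc get pods -n vng-vpaycoreengine --no-headers | wc -l

If status.used.pods stays at 6 while only three pods exist, the controller has not replenished usage released by pods that no longer exist.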
The resource quota controller in the enterprise-3.1 release ran a reconciliation loop that did a full resync over the entire system every 10s by default. Depending on the number of quotas in the system, and the number of resources potentially tracked by each quota, longer latencies in the quota system's ability to replenish quota could be observed.

To better understand this environment, can you provide the following:

- How many quotas are in the entire system?

  $ oc get quotas --all-namespaces

- What resources are being tracked under quota in the system? The referenced quota shows pod-related resources. Are other quotas tracking additional items? If so, what, and how many of those are there in aggregate? (A sketch for gathering these figures follows this list.)

- How long did you wait without ever observing replenishment?

There were a number of feature enhancements in subsequent releases to improve the rate at which quota can replenish released resources, by modifying the controller to use shared informers and watches rather than the polling model present in 3.1. This means the quota system observes the delete and adds the affected quota to a queue for reprocessing, rather than polling.
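For convenience, a minimal sketch for gathering the figures requested above (assumes cluster-admin access):

# 1. Count the quotas in the entire system:
$ oc get quotas --all-namespaces --no-headers | wc -l

# 2. Dump every quota so the tracked resources (spec.hard) and current
#    usage (status.used) can be tallied in aggregate:
$ oc get quotas --all-namespaces -o yaml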
For reference, the quota system was rewritten in the Kubernetes 1.2 release to improve its responsiveness. The major change was to give the quota framework the concept of a replenishment controller that watches for deletion events on particular kinds and, in response, adds the associated quotas to a queue for processing.

Upstream PR: https://github.com/kubernetes/kubernetes/pull/20446

An upgrade to version 3.2+ is needed to pick up these replenishment improvements to the quota system.
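After upgrading, a simple way to verify watch-based replenishment is to delete a pod and re-check the quota immediately. A minimal sketch using the names from this report (note the pod is managed by a replication controller and will be recreated, so run the describe right after the delete; the specific pod name will differ in practice):

# Delete a pod, then confirm Used falls promptly instead of lagging:
$ oc delete pod vpaycore-deployment-83-unvzl -n vng-vpaycoreengine
$ oc describe quota quota -n vng-vpaycoreengine | grep pods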