Bug 1259531

Summary:	quota synchronization timer not documented
Product:	OpenShift Container Platform	Reporter:	Erik M Jacobs <ejacobs>
Component:	Documentation	Assignee:	brice <bfallonf>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Vikram Goyal <vigoyal>
Severity:	high	Docs Contact:	Vikram Goyal <vigoyal>
Priority:	medium
Version:	3.0.0	CC:	aos-bugs, decarr, ejacobs, jokerman, mmccomas
Target Milestone:	---	Keywords:	Reopened
Target Release:	---	Flags:	decarr: needinfo-
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-09-23 23:10:49 UTC	Type:	Bug

Description Erik M Jacobs 2015-09-02 23:53:21 UTC

openshift-master-3.0.1.0-1.git.525.eddc479.el7ose.x86_64

Using a quota of max 3 pods and a JSON file that creates 4 pods, do the following:

echo "First create"; oc create -f ~/training/content/hello-quota.json; sleep 10; echo "Delete everything"; oc delete pods --all -n demo; echo "Second create"; oc create -f ~/training/content/hello-quota.json

You'll see:

First create
pods/hello-openshift-1
pods/hello-openshift-2
pods/hello-openshift-3
Error from server: Pod "hello-openshift-4" is forbidden: Limited to 3 pods
Delete everything
pods/hello-openshift-1
pods/hello-openshift-2
pods/hello-openshift-3
Second create
Error from server: Pod "hello-openshift-1" is forbidden: Limited to 3 pods
Error from server: Pod "hello-openshift-2" is forbidden: Limited to 3 pods
Error from server: Pod "hello-openshift-3" is forbidden: Limited to 3 pods
Error from server: Pod "hello-openshift-4" is forbidden: Limited to 3 pods

So, even though the pods are gone, the quota usage has not yet been recalculated when the second create runs. This could cause problems in automation scenarios.

I used the following to estimate how long it takes for the quota to get updated:

oc create -f ~/training/content/hello-quota.json; sleep 10; oc delete pods --all -n demo; time watch oc describe quota test-quota
pods/hello-openshift-1
pods/hello-openshift-2
pods/hello-openshift-3
Error from server: Pod "hello-openshift-4" is forbidden: Limited to 3 pods
pods/hello-openshift-1
pods/hello-openshift-2
pods/hello-openshift-3

real    0m8.012s
user    0m1.180s
sys     0m0.037s

8 seconds is a *long* time.
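
For anyone repeating this measurement, here is a minimal sketch of a polling loop that stops on its own once the quota reports no used pods. The function names (wait_until, pods_released, quota_pods_released) are hypothetical, and the awk pattern assumes that oc describe quota prints one row per resource in the form "pods <used> <hard>"; adjust it if your output differs.

```shell
# Poll a command once per second until it succeeds or the timeout expires.
# Prints the whole seconds waited; returns 1 on timeout.
wait_until() {
  timeout_s=$1
  shift
  start=$(date +%s)
  while ! "$@" >/dev/null 2>&1; do
    if [ $(( $(date +%s) - start )) -ge "$timeout_s" ]; then
      echo "timed out after ${timeout_s}s" >&2
      return 1
    fi
    sleep 1
  done
  echo $(( $(date +%s) - start ))
}

# Hypothetical check: reads `oc describe quota` output on stdin and succeeds
# only if a "pods" row reports 0 used. The row layout is an assumption.
pods_released() {
  awk '$1 == "pods" && $2 == 0 { found = 1 } END { exit !found }'
}

# Hypothetical wrapper using the quota and project names from this report.
quota_pods_released() {
  oc describe quota test-quota -n demo | pods_released
}
```

Running "wait_until 60 quota_pods_released" right after the delete would then print roughly how many seconds the quota took to catch up, at one-second resolution.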

Comment 2 Paul Weil 2015-09-03 17:41:13 UTC

Quota usage is synchronized asynchronously in Kubernetes by a controller. You can change the synchronization period (10s by default) through controllerArguments in the master config. For example:

kubernetesMasterConfig:
  apiLevels:
  - v1beta3
  - v1
  apiServerArguments: null
  controllerArguments:
    resource-quota-sync-period:
      - "5s"

Derek, sending this your way for confirmation that this is correct.
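
To double-check which value a given master config file actually carries after editing it, something like the following sketch could be used. The function name quota_sync_period is hypothetical, the config path is whatever your install uses, and the parsing assumes the key is laid out as in the snippet above (the key on one line, the quoted value as a list item on the next). The 10s fallback is simply the default mentioned in this comment.

```shell
# Print the resource-quota-sync-period found in the given master config file.
# Falls back to "10s" (the default cited in this thread) when the key is absent.
# Assumes the value is the first "- ..." list item after the key line.
quota_sync_period() {
  awk '
    /^[ \t]*resource-quota-sync-period:/ { want = 1; next }
    want && /^[ \t]*-/ {
      sub(/^[ \t]*-[ \t]*/, "")   # drop the list marker
      gsub(/"/, "")               # drop surrounding quotes
      print
      found = 1
      exit
    }
    END { if (!found) print "10s" }
  ' "$1"
}
```

For example, "quota_sync_period /path/to/master-config.yaml" would print 5s for the configuration shown above. The new value normally only takes effect once the master process has been restarted.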

Comment 3 Erik M Jacobs 2015-09-03 17:49:57 UTC

Should this be a docs bug?

Comment 4 Derek Carr 2015-09-03 18:22:44 UTC

Paul, your summary is correct.
Erik, the doc makes note of the asynchronous deletion behavior.

Doc here:
https://docs.openshift.org/latest/dev_guide/quota.html

Quota enforcement:
Once a quota is created and usage statistics are up-to-date, the project accepts the creation of new content. When you create resources, your quota usage is incremented immediately upon the request to create or modify the resource. When you delete a resource, your quota use is decremented during the next full recalculation of quota statistics for the project. As a result, it may take a moment for your quota usage statistics to be reduced to their current observed system value when you delete resources.

As for improving the latency, there is a card to look at shortening the interval. In practice, things like replication controllers just retry, so that style of automation works well. In the latest upstream, graceful deletion of pods went in, so a pod actually hangs around for 30s in a terminating state after you attempt to delete it; the quota sync period is therefore not the longest delay involved.

The reason we cannot capture deletes synchronously is that etcd lacks multi-object transaction support.
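
For scripted automation, the same retry approach that replication controllers take can be sketched in shell. This is only an illustration: the function name retry and its argument order are hypothetical, and the attempt count and delay should be chosen to cover the quota sync period in use.

```shell
# Run a command up to <attempts> times, sleeping <delay_s> seconds between
# tries. Returns 0 as soon as the command succeeds, 1 if every try fails.
retry() {
  attempts=$1
  delay_s=$2
  shift 2
  n=1
  while :; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay_s"
  done
}
```

A call such as "retry 6 5 oc create -f pod.json -n demo" (pod.json being a placeholder for a single-pod definition) would keep trying for about 25 seconds, which is longer than the default 10s sync period.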

Comment 5 Erik M Jacobs 2015-09-03 18:32:40 UTC

Missing config doc, though..?

Comment 6 brice 2015-09-15 06:23:05 UTC

Erik, Derek,

I've submitted a PR for this:

https://github.com/openshift/openshift-docs/pull/963

I've added a new section on changing the sync period setting, and linked to it from an earlier paragraph so the reader can get some more context.

Can I get an ack so I know I'm on the right track? I don't think I have any other questions.

Comment 7 Derek Carr 2015-09-16 18:24:35 UTC

The updated text looks good if you note that increasing the frequency of quota calculation will increase the load on the openshift-master, so the setting should be chosen to balance system performance against end-user goals.
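
As a rough way to reason about that trade-off, the number of recalculations per hour can be estimated from the period. This sketch assumes one full recalculation per quota per sync period, which is a simplification and not a measured figure; the function name recalcs_per_hour is hypothetical.

```shell
# Estimate quota recalculations per hour.
#   $1: sync period in seconds, with or without a trailing "s" (e.g. 5s)
#   $2: number of quotas in the cluster (defaults to 1)
# Assumes one full recalculation per quota per period (integer division).
recalcs_per_hour() {
  period_s=${1%s}
  quotas=${2:-1}
  echo $(( 3600 / period_s * quotas ))
}
```

Under that assumption, going from 10s to 5s doubles the work: "recalcs_per_hour 10s" gives 360 and "recalcs_per_hour 5s" gives 720 per quota.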

As noted in the doc, there is a plan to observe deletes for cpu, memory, and pods more rapidly by adding a watch in the quota controller for pod-related resources. When that happens, we will most likely increase the value from 30s to something much larger.

Comment 8 brice 2015-09-21 04:45:51 UTC

Suggestion fixed, and PR has merged.

Putting this to closed.