Bug 1324825
Summary: Better to adjust min value shown in web console Set Resource Limits page when ClusterResourceOverride request/limit ratio is used

| Field | Value |
|---|---|
| Product | OpenShift Online |
| Component | Management Console |
| Version | 3.x |
| Status | CLOSED CURRENTRELEASE |
| Severity | medium |
| Priority | medium |
| Reporter | Xingxing Xia <xxia> |
| Assignee | Abhishek Gupta <abhgupta> |
| QA Contact | Yadan Pei <yapei> |
| CC | abhgupta, aos-bugs, dmcphers, jforrest, jokerman, lmeyer, mmccomas, pweil, qixuan.wang, spadgett, tiwillia, wmeng |
| Keywords | Reopened |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| | 1439818 (view as bug list) |
| Last Closed | 2016-06-23 17:32:57 UTC |
| Type | Bug |
Description
Xingxing Xia
2016-04-07 11:34:42 UTC
I can reproduce this issue. It appears that setting a memory limit below 250Mi triggers it; anything higher works without issue:

```
$ oc run hello9 --image=openshift/hello-openshift --generator=run-controller/v1 --requests='memory=160Mi' --limits='memory=249Mi'
replicationcontroller "hello9" created
$ oc describe rc hello9
Name:         hello9
Namespace:    timbo
Image(s):     openshift/hello-openshift
Selector:     run=hello9
Labels:       run=hello9
Replicas:     0 current / 1 desired
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
No volumes.
Events:
  FirstSeen  LastSeen  Count  From                      SubobjectPath  Reason        Message
  ─────────  ────────  ─────  ────                      ─────────────  ──────        ───────
  3s         3s        2      {replication-controller}                 FailedCreate  Error creating: pods "hello9-" is forbidden: [Minimum memory usage per Pod is 150Mi, but request is 156657255., Minimum memory usage per Container is 150Mi, but request is 156657254400m.]

$ oc run hello10 --image=openshift/hello-openshift --generator=run-controller/v1 --requests='memory=160Mi' --limits='memory=250Mi'
replicationcontroller "hello10" created
$ oc describe rc hello10
Name:         hello10
Namespace:    timbo
Image(s):     openshift/hello-openshift
Selector:     run=hello10
Labels:       run=hello10
Replicas:     1 current / 1 desired
Pods Status:  0 Running / 1 Waiting / 0 Succeeded / 0 Failed
No volumes.
Events:
  FirstSeen  LastSeen  Count  From                      SubobjectPath  Reason            Message
  ─────────  ────────  ─────  ────                      ─────────────  ──────            ───────
  2s         2s        1      {replication-controller}                 SuccessfulCreate  Created pod: hello10-bm973
```

master-config.yaml has the override ratios set as follows:

```yaml
ClusterResourceOverride:
  configuration:
    apiVersion: v1
    kind: ClusterResourceOverrideConfig
    limitCPUToMemoryPercent: 200
    cpuRequestToLimitPercent: 6
    memoryRequestToLimitPercent: 60
```

And dev-preview-int has a limitrange set as follows:

```
# oc get limits
NAME              AGE
resource-limits   5m
# oc describe limits resource-limits
Name:       resource-limits
Namespace:  qwang2
Type       Resource  Min    Max  Default Request  Default Limit  Max Limit/Request Ratio
----       --------  ---    ---  ---------------  -------------  -----------------------
Pod        memory    150Mi  1Gi  -                -              -
Pod        cpu       30m    2    -                -              -
Container  cpu       30m    2    60m              1              -
Container  memory    150Mi  1Gi  307Mi            512Mi          -
```

With request.memory = 0.6 * limit.memory and request.cpu = 0.06 * limit.cpu, requiring (request.memory = 0.6 * limit.memory) >= (limitrange.Min = 150Mi) means limit.memory must be at least 250Mi. CPU works the same way. So the calculation in the error message is correct, although it is not clear at a glance.

The problem is the prompt users see on the website: "The amount of Memory the container is limited to use. Memory 150 MiB min to 1 GiB max." Users don't know the ratio or the calculation behind it. What users see is: "OK, you said 150MiB-1GiB and I entered 200MiB, so why can't the pod run? The error message isn't clear to me, and what number will actually work?" IMO, should we update that value on the website, or adjust the numbers in the limitrange?

Thanks, Qixuan Wang. Indeed, the web console suggestion confuses users. Steps to see the web console suggestion:

1. Create a dc or rc.
2. In the web console project overview page, click "Browse" --> "Deployments".
3. Click the created dc or rc --> "Actions" --> "Set Resource Limits".
The page then displays this suggestion above the memory input box: "Memory 150 MiB min to 1 GiB max". Users may be unaware of ClusterResourceOverride and its mechanism; if they input a value < 250Mi, they may not easily figure out why the pod cannot run. A similar web console suggestion appears when creating a new app from an image, and the "Settings" page is equally prone to causing confusion.

Jessica/Sam, isn't this a presentation issue in both the Origin console and the Kube limit range describer? I'm not sure I understand how this can be an issue with Online specifically.

Dan, the request is being set automatically from the limit by the admission controller Luke wrote: https://github.com/openshift/origin/pull/6901. I already opened a bug for the request value units and large numbers: https://github.com/openshift/origin/issues/8391. The web console should prevent you from setting a value that won't work, but the CLI won't block you from it.

> 200Mi is valid, because in limitrange, memory requires at least 150Mi. See 'Additional Info'

It won't be valid for the request value that's generated (60% of 200 is 120Mi, which is less than 150Mi).

(In reply to Samuel Padgett from comment #5)
> Dan, the request is being set automatically from the limit by the
> admission controller Luke wrote:
> https://github.com/openshift/origin/pull/6901. I already opened a bug for
> the request value units and large numbers:
> https://github.com/openshift/origin/issues/8391.
>
> The web console should prevent you from setting a value that won't work,
> but the CLI won't block you from it.
>
> > 200Mi is valid, because in limitrange, memory requires at least 150Mi. See 'Additional Info'
>
> It won't be valid for the request value that's generated (60% of 200 is
> 120Mi, which is less than 150Mi).

So the admission controller is ultimately allowed to persist a resource request which violates the minimum? In any case, the system is right: the requested limit violates the minimum. Whether the CLI or admission control prevents it, Origin is allowing this state to occur and is then correctly reporting on the invalid state. I'm still not sure how this is an Online issue. Wouldn't this be the concern of the admission controller and quota validation logic and/or the CLI? I'm going to go ahead and close this as NOTABUG.

It's simply a poor user experience. The way the system is limiting/mutating the requested limit means the system might actually create pods which violate the configured minimums, and there's currently no way to report that nicely to the user via the CLI (at least, not synchronously). The CLI also isn't indicating to the user that their request will be modified, so there's an element of surprise. The minimums are still correctly enforced, and the after-the-fact reporting is still accurate. This is a problem in Origin and is not specific to Online. A follow-up GH issue may be warranted to try to address the UX concerns.

@Dan Agreed that the UX isn't great when creating a pod template that will violate the minimums. The issue is that the admission controller only operates on pods, not pod templates, which only become pods at some later time when a controller creates them from the template. The admission controller could be updated to operate on anything that contains a pod template, but that would be complicated. BTW, I'm pretty sure this problem already existed with the LimitRanger, which also only operates on pods; it's just that nothing was automatically adjusting requests downward before. Technically, the only reliable way to determine from a pod template that it will create an invalid pod would be to instantiate a pod from it and run it through the admission controller chain.

I thought the override percent was only for Online; sorry for filing against the wrong product.
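The arithmetic the thread keeps returning to can be sketched in a few lines of Python (an illustrative sketch only; the function name is not from the origin code). The override sets request = (percent/100) * limit, while the LimitRange still requires request >= Min, so the smallest workable limit is Min divided by the ratio:

```python
# Illustrative sketch (not origin code): the ClusterResourceOverride admission
# controller derives request = (percent/100) * limit, but the LimitRange still
# requires request >= Min, so the smallest workable limit is Min / (percent/100).

def min_workable_limit(limitrange_min, request_to_limit_percent):
    """Smallest limit whose derived request still meets the LimitRange minimum."""
    return limitrange_min * 100.0 / request_to_limit_percent

# Memory: request = 60% of limit, LimitRange min = 150Mi  ->  limit >= 250Mi
assert min_workable_limit(150, 60) == 250.0
# CPU: request = 6% of limit, LimitRange min = 30m  ->  limit >= 500m
assert min_workable_limit(30, 6) == 500.0
```

This reproduces the 250Mi boundary observed with hello9/hello10 above.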
@Samuel
> The problem is the prompt users see on the website: "The amount of Memory
> the container is limited to use. Memory 150 MiB min to 1 GiB max." Users
> don't know the ratio or the calculation behind it. What users see is: "OK,
> you said 150MiB-1GiB and I entered 200MiB, so why can't the pod run? The
> error message isn't clear to me, and what number will actually work?" IMO,
> should we update that value on the website, or adjust the numbers in the
> limitrange?
This is still a problem for the web console when ClusterResourceOverride is used. Will this be fixed? Thanks.
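A console-side check of the kind being requested could look roughly like this (a hypothetical sketch; the function names and structure are illustrative, not the actual console code): derive the request the override admission controller would generate from the user-entered limit, and reject values whose derived request falls under the LimitRange minimum.

```python
# Hypothetical validation sketch (not the real console code): with
# memoryRequestToLimitPercent = 60 and a LimitRange minimum of 150Mi,
# flag any limit whose derived request would be rejected at admission.

def memory_limit_error(limit_mib, request_percent=60.0, limitrange_min_mib=150.0):
    derived_request = limit_mib * request_percent / 100.0
    if derived_request < limitrange_min_mib:
        return ("derived request %gMi is below the LimitRange minimum %gMi"
                % (derived_request, limitrange_min_mib))
    return None

# 200Mi passes the displayed "150 MiB min" but fails admission (60% of 200 = 120Mi):
assert memory_limit_error(200) is not None
# 250Mi is the smallest value that works (60% of 250 = 150Mi):
assert memory_limit_error(250) is None
```

The same check, inverted, yields the adjusted minimum ("250 MiB min") that the console eventually displays.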
Looks like the web console does not adjust the min value for the request/limit ratio. We should probably make that change to improve the user experience, although the current behavior is working as intended.

@Samuel It seems there is no related GitHub issue/PR tracking comment 9/10, so I am converting this bug to track it. Though the problem belongs to Origin, Online is most impacted because ClusterResourceOverride is mainly applied in Online. Quite a few people have hit this, were confused, and reported the same problem in other bugs: bug 1331816, bug 1333029, bug 1333079.

@Samuel Verified in the Origin console:

1. With the master started with the configuration from comment 2, create a project, limitrange, and dc:

```
$ oc new-project xxia-proj
$ oc create -f limitrange_online.yaml -n xxia-proj --config aws/admin.kubeconfig
$ oc run origin --image=openshift/origin --command=true -- sleep 30d
```

2. Log in to the web console, click Browse --> Deployments --> origin --> Actions --> Set Resource Limits.

The page shows the correct range: "Memory 250 MiB min to 1 GiB max".

Hi all, I have the following thoughts:

A. In terms of the range displayed in the web console, the problem is fixed. However, will the display on the Settings page still leave room for confusion?

```
Resource type     Min      Max    Default Request  Default Limit  Max Limit/Request Ratio
----snipped----
Container Memory  150 MiB  1 GiB  307 MiB          512 MiB        —
```

From a user's perspective: "Why is the Min here 150 MiB, but the min there 250 MiB?"

B. It seems the CLI confusion in bug 1331816 is untouched by https://github.com/openshift/origin/pull/8775; how will that bug be treated? Thanks!

I am quite curious about the ratios: the memory request/limit ratio is 60%, so why is the cpu request/limit ratio 6%? Is that reasonable? Is it too small? A typo?

```yaml
cpuRequestToLimitPercent: 6
memoryRequestToLimitPercent: 60
```

Marking ON_QA to verify the web console changes. The limit range and CLI changes are potentially large and will require some thought.
I don't think it's something we can address for 1.2.x.

Verified in Origin with:

```
openshift v1.3.0-alpha.0-669-g6e74aa4
kubernetes v1.3.0-alpha.1-331-g0522e63
etcd 2.3.0
```

The page shows the correct range, "Memory 250 MiB min to 1 GiB max", and disables Save/Create with an error prompt when the input is < 250 MiB.

(In reply to Samuel Padgett from comment #17)
> The limit range and CLI changes are potentially large and will require some
> thought. I don't think it's something we can address for 1.2.x.

If there is a GitHub issue/PR tracking this right now, could you please paste it here? Thanks.

(In reply to Xingxing Xia from comment #18)
> The page will show correct range: Memory 250 MiB min to 1 GiB max. And will
> disable Save/Create and give error prompt when input is < 250MiB.

Oh, sorry, there is still a minor problem. When the input is between 250 and 255 MiB, the pod still cannot be started, due to the CPU request:

```
$ oc describe rc dctest-3
Name:         dctest-3
......
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
No volumes.
Events:
  FirstSeen  LastSeen  Count  From                      SubobjectPath  Type     Reason        Message
  ---------  --------  -----  ----                      -------------  ----     ------        -------
  2m         29s       7      {replication-controller}                 Warning  FailedCreate  Error creating: pods "dctest-3-" is forbidden: [Minimum cpu usage per Container is 30m, but request is 29m., Minimum cpu usage per Container is 30m, but request is 29m.]
```

Only when the input is >= 256Mi can the pod run. Is there a mistake somewhere in the PR's calculation?

Will move it to ON_QA once the fix is applied to INT/STG. This fix has now been included in DevPreview INT; can you please test? Yes, the fix is included in INT. Moving to VERIFIED.
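The 250-255 MiB failure above is consistent with the CPU limit being derived from the memory limit rather than a calculation mistake. A sketch of that arithmetic, under assumed ClusterResourceOverride semantics (not the actual origin code): limitCPUToMemoryPercent = 200 gives 2 cores of CPU limit per GiB of memory limit, cpuRequestToLimitPercent = 6 then takes 6% of that, and truncating to whole millicores matches the observed 29m:

```python
# Sketch of the derived CPU request (assumed semantics, not origin code):
# cpu limit (millicores) = memory limit (MiB) / 1024 * limitCPUToMemoryPercent * 10
# cpu request            = cpuRequestToLimitPercent% of that, floored to whole m

def derived_cpu_request_m(memory_limit_mib,
                          limit_cpu_to_memory_percent=200.0,
                          cpu_request_to_limit_percent=6.0):
    cpu_limit_m = memory_limit_mib / 1024.0 * limit_cpu_to_memory_percent * 10.0
    return int(cpu_limit_m * cpu_request_to_limit_percent / 100.0)

# 250Mi memory -> ~488m cpu limit -> 29m request, under the 30m LimitRange min:
assert derived_cpu_request_m(250) == 29
# 256Mi memory -> 500m cpu limit -> exactly 30m, the smallest memory input that works:
assert derived_cpu_request_m(256) == 30
```

Under these assumptions the 30m CPU minimum translates into a 256Mi effective memory minimum once both override percentages are applied, which matches the "only >= 256Mi works" observation.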