Bug 1324825
Summary: Better to adjust min value shown in web console Set Resource Limits page when ClusterResourceOverride request/limit ratio is used

| Field | Value |
|---|---|
| Product | OpenShift Online |
| Component | Management Console |
| Version | 3.x |
| Status | CLOSED CURRENTRELEASE |
| Severity | medium |
| Priority | medium |
| Reporter | Xingxing Xia <xxia> |
| Assignee | Abhishek Gupta <abhgupta> |
| QA Contact | Yadan Pei <yapei> |
| CC | abhgupta, aos-bugs, dmcphers, jforrest, jokerman, lmeyer, mmccomas, pweil, qixuan.wang, spadgett, tiwillia, wmeng |
| Keywords | Reopened |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| | 1439818 (view as bug list) |
| Last Closed | 2016-06-23 17:32:57 UTC |
| Type | Bug |
Description
Xingxing Xia
2016-04-07 11:34:42 UTC
I can reproduce this issue. It appears that setting a memory limit below 250Mi triggers it; anything higher works without issue:

```
$ oc run hello9 --image=openshift/hello-openshift --generator=run-controller/v1 --requests='memory=160Mi' --limits='memory=249Mi'
replicationcontroller "hello9" created
$ oc describe rc hello9
Name:         hello9
Namespace:    timbo
Image(s):     openshift/hello-openshift
Selector:     run=hello9
Labels:       run=hello9
Replicas:     0 current / 1 desired
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
No volumes.
Events:
  FirstSeen  LastSeen  Count  From                      SubobjectPath  Reason        Message
  ─────────  ────────  ─────  ────                      ─────────────  ──────        ───────
  3s         3s        2      {replication-controller}                 FailedCreate  Error creating: pods "hello9-" is forbidden: [Minimum memory usage per Pod is 150Mi, but request is 156657255., Minimum memory usage per Container is 150Mi, but request is 156657254400m.]

$ oc run hello10 --image=openshift/hello-openshift --generator=run-controller/v1 --requests='memory=160Mi' --limits='memory=250Mi'
replicationcontroller "hello10" created
$ oc describe rc hello10
Name:         hello10
Namespace:    timbo
Image(s):     openshift/hello-openshift
Selector:     run=hello10
Labels:       run=hello10
Replicas:     1 current / 1 desired
Pods Status:  0 Running / 1 Waiting / 0 Succeeded / 0 Failed
No volumes.
Events:
  FirstSeen  LastSeen  Count  From                      SubobjectPath  Reason            Message
  ─────────  ────────  ─────  ────                      ─────────────  ──────            ───────
  2s         2s        1      {replication-controller}                 SuccessfulCreate  Created pod: hello10-bm973
```

master-config.yaml has the override ratios set as follows:

```yaml
ClusterResourceOverride:
  configuration:
    apiVersion: v1
    kind: ClusterResourceOverrideConfig
    limitCPUToMemoryPercent: 200
    cpuRequestToLimitPercent: 6
    memoryRequestToLimitPercent: 60
```

And dev-preview-int has a limitrange set as follows:

```
# oc get limits
NAME              AGE
resource-limits   5m
# oc describe limits resource-limits
Name:       resource-limits
Namespace:  qwang2
Type       Resource  Min    Max  Default Request  Default Limit  Max Limit/Request Ratio
----       --------  ---    ---  ---------------  -------------  -----------------------
Pod        memory    150Mi  1Gi  -                -              -
Pod        cpu       30m    2    -                -              -
Container  cpu       30m    2    60m              1              -
Container  memory    150Mi  1Gi  307Mi            512Mi          -
```

With request.memory = 0.6 * limit.memory and request.cpu = 0.06 * limit.cpu, requiring (request.memory = 0.6 * limit.memory) >= (limitrange.Min = 150Mi) means limit.memory must be at least 250Mi. CPU works the same way. So the calculation in the error message is correct, although it is not clear at a glance.

The problem is the prompt users see on the website: "The amount of Memory the container is limited to use. Memory 150 MiB min to 1 GiB max." Users don't know the ratio or the calculation behind it. What users see is: "OK, you said 150MiB-1GiB and I entered 200MiB, so why can't the pod run? The error message isn't clear to me, and what number will actually work?" IMO, should we update that value on the website, or adjust the numbers in the limitrange?

Thanks, Qixuan Wang. Indeed, the web console suggestion confuses users. Steps to see the web console suggestion:

1. Create a dc or rc.
2. In the web console project overview page, click "Browse" --> "Deployments".
3. Click the created dc or rc --> "Actions" --> "Set Resource Limits".
The page then displays this suggestion above the memory input box: "Memory 150 MiB min to 1 GiB max". Users may be unaware of ClusterResourceOverride and its mechanism; if they input a value < 250Mi, they may not easily figure out why the pod cannot run. A similar web console suggestion appears when creating a new app from an image, and the "Settings" page is equally prone to causing confusion.

Jessica/Sam, isn't this a presentation issue in both the Origin console and the Kube limit range describer? I'm not sure I understand how this can be an issue with Online specifically.

Dan, the request is being set automatically from the limit by the admission controller Luke wrote: https://github.com/openshift/origin/pull/6901. I already opened a bug for the request value units and large numbers: https://github.com/openshift/origin/issues/8391. The web console should prevent you from setting a value that won't work, but the CLI won't block you from it.

> 200Mi is valid, because in limitrange, memory requires at least 150Mi. See 'Additional Info'

It won't be valid for the request value that's generated (60% of 200 is 120Mi, which is less than 150Mi).

(In reply to Samuel Padgett from comment #5)
> Dan, the request is being set automatically from the limit by the
> admission controller Luke wrote:
> https://github.com/openshift/origin/pull/6901. I already opened a bug for
> the request value units and large numbers:
> https://github.com/openshift/origin/issues/8391.
>
> The web console should prevent you from setting a value that won't work,
> but the CLI won't block you from it.
>
> > 200Mi is valid, because in limitrange, memory requires at least 150Mi. See 'Additional Info'
>
> It won't be valid for the request value that's generated (60% of 200 is
> 120Mi, which is less than 150Mi).

So the admission controller is ultimately allowed to persist a resource request which violates the minimum? In any case, the system is right: the requested limit violates the minimum. Whether the CLI or admission control prevents it, Origin is allowing this state to occur and is then correctly reporting on the invalid state. I'm still not sure how this is an Online issue. Wouldn't this be the concern of the admission controller and quota validation logic and/or the CLI? I'm going to go ahead and close this as NOTABUG.

It's simply a poor user experience. The way the system is limiting/mutating the requested limit means the system might actually create pods which violate the configured minimums, and there's currently no way to report that nicely to the user via the CLI (at least, not synchronously). The CLI also isn't indicating to the user that their request will be modified, so there's an element of surprise. The minimums are still correctly enforced, and the after-the-fact reporting is still accurate. This is a problem in Origin and is not specific to Online. A follow-up GH issue may be warranted to try to address the UX concerns.

@Dan Agreed that the UX isn't great when creating a pod template that will violate the minimums. The issue is that the admission controller only operates on pods, not pod templates, which only become pods at some later time when a controller creates them from the template. The admission controller could be updated to operate on anything that contains a pod template, but that would be complicated. BTW, I'm pretty sure this problem already existed with the LimitRanger, which also only operates on pods; it's just that nothing was automatically adjusting requests downward before. Technically, the only reliable way to determine from a pod template that it will create an invalid pod would be to instantiate a pod from it and run it through the admission controller chain.

I thought the override percent was only for Online; sorry for filing against the wrong product.
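The arithmetic the thread keeps returning to can be sketched in a few lines of Python (an illustrative sketch only; the function name is not from the origin code). The override sets request = (percent/100) * limit, while the LimitRange still requires request >= Min, so the smallest workable limit is Min divided by the ratio:

```python
# Illustrative sketch (not origin code): the ClusterResourceOverride admission
# controller derives request = (percent/100) * limit, but the LimitRange still
# requires request >= Min, so the smallest workable limit is Min / (percent/100).

def min_workable_limit(limitrange_min, request_to_limit_percent):
    """Smallest limit whose derived request still meets the LimitRange minimum."""
    return limitrange_min * 100.0 / request_to_limit_percent

# Memory: request = 60% of limit, LimitRange min = 150Mi  ->  limit >= 250Mi
assert min_workable_limit(150, 60) == 250.0
# CPU: request = 6% of limit, LimitRange min = 30m  ->  limit >= 500m
assert min_workable_limit(30, 6) == 500.0
```

This reproduces the 250Mi boundary observed with hello9/hello10 above.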
@Samuel
> The problem is the prompt users see on the website: "The amount of Memory
> the container is limited to use. Memory 150 MiB min to 1 GiB max." Users
> don't know the ratio or the calculation behind it. What users see is: "OK,
> you said 150MiB-1GiB and I entered 200MiB, so why can't the pod run? The
> error message isn't clear to me, and what number will actually work?" IMO,
> should we update that value on the website, or adjust the numbers in the
> limitrange?
This is still a problem for the web console when ClusterResourceOverride is used. Will this be fixed? Thanks.
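A console-side check of the kind being requested could look roughly like this (a hypothetical sketch; the function names and structure are illustrative, not the actual console code): derive the request the override admission controller would generate from the user-entered limit, and reject values whose derived request falls under the LimitRange minimum.

```python
# Hypothetical validation sketch (not the real console code): with
# memoryRequestToLimitPercent = 60 and a LimitRange minimum of 150Mi,
# flag any limit whose derived request would be rejected at admission.

def memory_limit_error(limit_mib, request_percent=60.0, limitrange_min_mib=150.0):
    derived_request = limit_mib * request_percent / 100.0
    if derived_request < limitrange_min_mib:
        return ("derived request %gMi is below the LimitRange minimum %gMi"
                % (derived_request, limitrange_min_mib))
    return None

# 200Mi passes the displayed "150 MiB min" but fails admission (60% of 200 = 120Mi):
assert memory_limit_error(200) is not None
# 250Mi is the smallest value that works (60% of 250 = 150Mi):
assert memory_limit_error(250) is None
```

The same check, inverted, yields the adjusted minimum ("250 MiB min") that the console eventually displays.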
Looks like the web console does not adjust the min value for the request/limit ratio. We should probably make that change to improve the user experience, although the current behavior is working as intended.

@Samuel It seems there is no related GitHub issue/PR tracking comment 9/10, so I am converting this bug to track it. Though the problem belongs to Origin, Online is most impacted because ClusterResourceOverride is mainly applied in Online. Quite a few people have hit this, were confused, and reported the same problem in other bugs: bug 1331816, bug 1333029, bug 1333079.

@Samuel Verified in the Origin console:

1. With the master started with the configuration from comment 2, create a project, limitrange, and dc:

```
$ oc new-project xxia-proj
$ oc create -f limitrange_online.yaml -n xxia-proj --config aws/admin.kubeconfig
$ oc run origin --image=openshift/origin --command=true -- sleep 30d
```

2. Log in to the web console, click Browse --> Deployments --> origin --> Actions --> Set Resource Limits.

The page shows the correct range: "Memory 250 MiB min to 1 GiB max".

Hi all, I have the following thoughts:

A. In terms of the range displayed in the web console, the problem is fixed. However, will the display on the Settings page still leave room for confusion?

```
Resource type     Min      Max    Default Request  Default Limit  Max Limit/Request Ratio
----snipped----
Container Memory  150 MiB  1 GiB  307 MiB          512 MiB        —
```

From a user's perspective: "Why is the Min here 150 MiB, but the min there 250 MiB?"

B. It seems the CLI confusion in bug 1331816 is untouched by https://github.com/openshift/origin/pull/8775; how will that bug be treated? Thanks!

I am quite curious about the ratios: the memory request/limit ratio is 60%, so why is the cpu request/limit ratio 6%? Is that reasonable? Is it too small? A typo?

```yaml
cpuRequestToLimitPercent: 6
memoryRequestToLimitPercent: 60
```

Marking ON_QA to verify the web console changes. The limit range and CLI changes are potentially large and will require some thought.
I don't think it's something we can address for 1.2.x.

Verified in Origin with:

```
openshift v1.3.0-alpha.0-669-g6e74aa4
kubernetes v1.3.0-alpha.1-331-g0522e63
etcd 2.3.0
```

The page shows the correct range, "Memory 250 MiB min to 1 GiB max", and disables Save/Create with an error prompt when the input is < 250 MiB.

(In reply to Samuel Padgett from comment #17)
> The limit range and CLI changes are potentially large and will require some
> thought. I don't think it's something we can address for 1.2.x.

If there is a GitHub issue/PR tracking this right now, could you please paste it here? Thanks.

(In reply to Xingxing Xia from comment #18)
> The page will show correct range: Memory 250 MiB min to 1 GiB max. And will
> disable Save/Create and give error prompt when input is < 250MiB.

Oh, sorry, there is still a minor problem. When the input is between 250 and 255 MiB, the pod still cannot be started, due to the CPU request:

```
$ oc describe rc dctest-3
Name:         dctest-3
......
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
No volumes.
Events:
  FirstSeen  LastSeen  Count  From                      SubobjectPath  Type     Reason        Message
  ---------  --------  -----  ----                      -------------  ----     ------        -------
  2m         29s       7      {replication-controller}                 Warning  FailedCreate  Error creating: pods "dctest-3-" is forbidden: [Minimum cpu usage per Container is 30m, but request is 29m., Minimum cpu usage per Container is 30m, but request is 29m.]
```

Only when the input is >= 256Mi can the pod run. Is there a mistake somewhere in the PR's calculation?

Will move it to ON_QA once the fix is applied to INT/STG. This fix has now been included in DevPreview INT; can you please test? Yes, the fix is included in INT. Moving to VERIFIED.
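The 250-255 MiB failure above is consistent with the CPU limit being derived from the memory limit rather than a calculation mistake. A sketch of that arithmetic, under assumed ClusterResourceOverride semantics (not the actual origin code): limitCPUToMemoryPercent = 200 gives 2 cores of CPU limit per GiB of memory limit, cpuRequestToLimitPercent = 6 then takes 6% of that, and truncating to whole millicores matches the observed 29m:

```python
# Sketch of the derived CPU request (assumed semantics, not origin code):
# cpu limit (millicores) = memory limit (MiB) / 1024 * limitCPUToMemoryPercent * 10
# cpu request            = cpuRequestToLimitPercent% of that, floored to whole m

def derived_cpu_request_m(memory_limit_mib,
                          limit_cpu_to_memory_percent=200.0,
                          cpu_request_to_limit_percent=6.0):
    cpu_limit_m = memory_limit_mib / 1024.0 * limit_cpu_to_memory_percent * 10.0
    return int(cpu_limit_m * cpu_request_to_limit_percent / 100.0)

# 250Mi memory -> ~488m cpu limit -> 29m request, under the 30m LimitRange min:
assert derived_cpu_request_m(250) == 29
# 256Mi memory -> 500m cpu limit -> exactly 30m, the smallest memory input that works:
assert derived_cpu_request_m(256) == 30
```

Under these assumptions the 30m CPU minimum translates into a 256Mi effective memory minimum once both override percentages are applied, which matches the "only >= 256Mi works" observation.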