1364403 – [platformmanagement_public_713] Should give proper message and prevent further creation when resources usage exceed cluster quota

Status:	CLOSED NOTABUG
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Master
Version:	3.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	David Eads
QA Contact:	weiwei jiang

Reported:	2016-08-05 09:43 UTC by Qixuan Wang
Modified:	2017-03-08 18:26 UTC
CC List:	8 users
Last Closed:	2016-08-11 12:17:34 UTC

Attachments
Exceeded quota (154.01 KB, image/png) 2016-08-05 10:27 UTC, Qixuan Wang
master config (5.21 KB, text/plain) 2016-08-08 10:48 UTC, Qixuan Wang
ha-master-config.yaml (4.88 KB, text/plain) 2016-08-09 09:56 UTC, Qixuan Wang
non-ha-master-config.yaml (4.44 KB, text/plain) 2016-08-09 09:56 UTC, Qixuan Wang
ha-atomic-openshift-master-controllers.log (2.40 MB, text/x-vhdl) 2016-08-09 09:57 UTC, Qixuan Wang
ha-atomic-openshift-master-api.log (470.57 KB, text/x-vhdl) 2016-08-09 09:57 UTC, Qixuan Wang

Description Qixuan Wang 2016-08-05 09:43:35 UTC

Description of problem:
When resources are created beyond the cluster quota's Hard limit, there is no CLI warning; the resources are still created and counted against the quota.

Version-Release number of selected component (if applicable):
openshift v1.3.0-alpha.2+89b7193
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

How reproducible:
Always

Steps to Reproduce:
1. Create 2 projects
# oc new-project project-a
# oc new-project project-b

2. Label projects
# oc label namespace project-a user=dev --config=./admin.kubeconfig  
# oc label namespace project-b user=qe --config=./admin.kubeconfig

3. Create a clusterquota with label selector "user=dev"
# oc create clusterresourcequota crq --project-label-selector=user=dev --hard=pods=10 --hard=services=15 --hard=secrets=10 --config=./admin.kubeconfig

4. Check clusterquota via CLI and web console
# oc describe clusterresourcequota crq --config=./admin.kubeconfig

5. Create a secret in project-a and check secrets via CLI and web console
# oc secrets new mysecret-1 /root/.ssh/xxx

6. Create a secret in project-a and check secrets via CLI and web console again
# oc secrets new mysecret-2 /root/.ssh/xxx
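For reference, the quota from step 3 can be expressed as a manifest instead of command-line flags. This is a sketch assuming the v1 `ClusterResourceQuota` schema of OpenShift 3.3 (`spec.quota.hard` for the limits, `spec.selector.labels` for the project label selector); the field layout is not taken from this report, so compare it against `oc get clusterresourcequota crq -o yaml --config=./admin.kubeconfig` on the test cluster before using it with `oc create -f`.

```yaml
# Assumed equivalent of step 3's "oc create clusterresourcequota crq ..." command.
apiVersion: v1
kind: ClusterResourceQuota
metadata:
  name: crq
spec:
  quota:
    hard:
      pods: "10"
      services: "15"
      secrets: "10"
  selector:
    # Matches every project labeled user=dev (project-a in these steps).
    labels:
      matchLabels:
        user: dev
```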


Actual results:
4. [root@dhcp-141-95 qwang]# oc describe clusterresourcequota crq --config=./admin.kubeconfig
Name:		crq
Namespace:	<none>
Created:	About an hour ago
Labels:		<none>
Annotations:	<none>
Label Selector: user=dev
AnnotationSelector: map[]
Resource	Used	Hard
--------	----	----
pods		0	10
secrets		9	10
services	0	15

5. [root@dhcp-141-95 qwang]# oc describe clusterresourcequota crq --config=./admin.kubeconfig
Name:		crq
Namespace:	<none>
Created:	About an hour ago
Labels:		<none>
Annotations:	<none>
Label Selector: user=dev
AnnotationSelector: map[]
Resource	Used	Hard
--------	----	----
pods		0	10
secrets		10	10
services	0	15
 
[root@dhcp-141-95 qwang]# oc get secrets
NAME                       TYPE                                  DATA      AGE
builder-dockercfg-dolu0    kubernetes.io/dockercfg               1         57m
builder-token-rfssn        kubernetes.io/service-account-token   3         57m
builder-token-tjej4        kubernetes.io/service-account-token   3         57m
default-dockercfg-uotsj    kubernetes.io/dockercfg               1         57m
default-token-ks826        kubernetes.io/service-account-token   3         57m
default-token-y6qou        kubernetes.io/service-account-token   3         57m
deployer-dockercfg-sibgt   kubernetes.io/dockercfg               1         57m
deployer-token-1i3rt       kubernetes.io/service-account-token   3         57m
deployer-token-bjljm       kubernetes.io/service-account-token   3         57m
mysecret-1                 Opaque                                1         31m

There are 9 secrets by default. When the secret count reaches Hard=10, the web console shows a "Quota limit reached" warning.

6. The 11th secret is created without any CLI warning:
[root@dhcp-141-95 qwang]# oc secrets new mysecret-2 /root/.ssh/xxx
secret/mysecret-2

[root@dhcp-141-95 qwang]# oc describe clusterresourcequota crq --config=./admin.kubeconfig
Name:		crq
Namespace:	<none>
Created:	About an hour ago
Labels:		<none>
Annotations:	<none>
Label Selector: user=dev
AnnotationSelector: map[]
Resource	Used	Hard
--------	----	----
pods		0	10
secrets		11	10
services	0	15


Expected results:
The CLI should warn that the quota limit has been reached and prevent further creation.

Additional info:

Comment 1 Qixuan Wang 2016-08-05 10:27:11 UTC

Created attachment 1187825
Exceeded quota

Comment 2 Qixuan Wang 2016-08-05 10:36:19 UTC

Update: the problem occurs in OSE (openshift v3.3.0.14, kubernetes v1.3.0+57fb9ac, etcd 2.3.0+git).

On Origin(openshift v1.3.0-alpha.2+89b7193, kubernetes v1.3.0+507d3a7, etcd 2.3.0+git), the problem can't be reproduced. Origin has the correct warning:
Error from server: secrets "mysecret-2" is forbidden: Exceeded quota: crq, requested: secrets=1, used: secrets=10, limited: secrets=10

Comment 3 David Eads 2016-08-05 12:59:31 UTC

Are you running OSE from a config file? If so, can you provide the config? It's possible to specify a different set of admission plugins, and that can prevent new ones from taking effect.

Comment 4 Qixuan Wang 2016-08-08 10:41:24 UTC

This problem can't be reproduced in a non-HA environment, but it exists in HA (2 masters + 2 infra nodes + 2 nodes + 3 etcd). master-config.yaml attached.

Comment 5 Qixuan Wang 2016-08-08 10:48:27 UTC

Created attachment 1188618
master config

Comment 6 David Eads 2016-08-08 12:04:55 UTC

Ok, I suspect that you're using a different master-config.yaml in your HA and non-HA configuration.  In the one you linked, you're specifying:

```yaml
  admissionConfig:
    pluginOrderOverride:
      - NamespaceLifecycle
      - OriginPodNodeEnvironment
      - LimitRanger
      - ServiceAccount
      - SecurityContextConstraint
      - BuildDefaults
      - BuildOverrides
      - ResourceQuota
      - SCCExecRestrictions
      - AlwaysPullImages
```

That takes control of the admission chain. You should be getting a warning like this in your log: "specified admission ordering is being phased out". Because the list is specified, you don't get new admission plugins, including "ClusterResourceQuota".

You can add "ClusterResourceQuota", but you really shouldn't be specifying the chain at all.  Did you have to do it for some reason?  Was it set up that way automatically?
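If the explicit list has to stay for now, the stopgap described above would look like the fragment below: the same `pluginOrderOverride` as in the attached config, with `ClusterResourceQuota` appended as the last entry. Only that last entry is new; putting it at the bottom follows the advice in this thread. Whether other newly added plugins are also missing from this list is not checked here, which is one reason dropping the list is the better fix.

```yaml
kubernetesMasterConfig:
  admissionConfig:
    pluginOrderOverride:
      - NamespaceLifecycle
      - OriginPodNodeEnvironment
      - LimitRanger
      - ServiceAccount
      - SecurityContextConstraint
      - BuildDefaults
      - BuildOverrides
      - ResourceQuota
      - SCCExecRestrictions
      - AlwaysPullImages
      # Added: the plugin that enforces ClusterResourceQuota objects such as "crq".
      - ClusterResourceQuota
```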

Comment 7 Qixuan Wang 2016-08-09 09:50:16 UTC

Yes, QE's testing environment is set up by Jenkins. The HA environment config template sets "openshift_master_kube_admission_plugin_order" and "openshift_master_kube_admission_plugin_config" in its "openshift_ansible_vars" options, but the non-HA config template does not.


Part of the Jenkins log from the job that sets up the HA environment:

#The following parameters is used by openshift-ansible
openshift_master_kube_admission_plugin_order=["NamespaceLifecycle","OriginPodNodeEnvironment","LimitRanger","ServiceAccount","SecurityContextConstraint","BuildDefaults","BuildOverrides","ResourceQuota","SCCExecRestrictions","AlwaysPullImages"]
openshift_master_kube_admission_plugin_config={"RunOnceDuration":{"configuration":{"apiVersion":"v1","kind":"RunOnceDurationConfig","activeDeadlineSecondsOverride":"3600"}},"ClusterResourceOverride":{"configuration":{"apiVersion":"v1","kind":"ClusterResourceOverrideConfig","limitCPUToMemoryPercent":"200","cpuRequestToLimitPercent":"6","memoryRequestToLimitPercent":"60"}},"PodNodeConstraints":{"configuration":{"apiVersion":"v1","kind":"PodNodeConstraintsConfig"}},"BuildOverrides":{"configuration":{"apiVersion":"v1","kind":"BuildOverridesConfig","forcePull":True}}}



Captured master log:

Aug 09 02:56:16 ip-172-18-14-59.ec2.internal atomic-openshift-master-controllers[13217]: W0809 02:56:16.506089   13217 start_master.go:272] kubernetesMasterConfig.admissionConfig.pluginOrderOverride: Invalid value: ["NamespaceLifecycle","OriginPodNodeEnvironment","LimitRanger","ServiceAccount","SecurityContextConstraint","BuildDefaults","BuildOverrides","ResourceQuota","SCCExecRestrictions","AlwaysPullImages"]: specified admission ordering is being phased out. Convert to DefaultAdmissionConfig in admissionConfig.pluginConfig.


I think perhaps QE wrote incomplete Ansible variables. The log says "Convert to DefaultAdmissionConfig in admissionConfig.pluginConfig". The "DefaultAdmissionConfig" should be the same as in the non-HA master-config, but the admission plugins above are still added to master-config. Doesn't the conversion happen?

HA:

```yaml
kubernetesMasterConfig:
  admissionConfig:
    pluginOrderOverride:
      - NamespaceLifecycle
      - OriginPodNodeEnvironment
      - LimitRanger
      - ServiceAccount
      - SecurityContextConstraint
      - BuildDefaults
      - BuildOverrides
      - ResourceQuota
      - SCCExecRestrictions
      - AlwaysPullImages
    pluginConfig:
      BuildOverrides:
        configuration:
          apiVersion: v1
          forcePull: true
          kind: BuildOverridesConfig
      ClusterResourceOverride:
        configuration:
          apiVersion: v1
          cpuRequestToLimitPercent: '6'
          kind: ClusterResourceOverrideConfig
          limitCPUToMemoryPercent: '200'
          memoryRequestToLimitPercent: '60'
      PodNodeConstraints:
        configuration:
          apiVersion: v1
          kind: PodNodeConstraintsConfig
      RunOnceDuration:
        configuration:
          activeDeadlineSecondsOverride: '3600'
          apiVersion: v1
          kind: RunOnceDurationConfig
```


Non-HA:

```yaml
kubernetesMasterConfig:
  admissionConfig:
    pluginConfig:
      {}
```


Files attached; I hope these help.

Comment 8 Qixuan Wang 2016-08-09 09:56:00 UTC

Created attachment 1189189
ha-master-config.yaml

Comment 9 Qixuan Wang 2016-08-09 09:56:33 UTC

Created attachment 1189190
non-ha-master-config.yaml

Comment 10 Qixuan Wang 2016-08-09 09:57:20 UTC

Created attachment 1189192
ha-atomic-openshift-master-controllers.log

Comment 11 Qixuan Wang 2016-08-09 09:57:51 UTC

Created attachment 1189193
ha-atomic-openshift-master-api.log

Comment 12 Qixuan Wang 2016-08-09 10:13:47 UTC

> "DefaultAdmissionConfig" should be the same with Non-HA master-config, but
> these above admission plugins are still added into master-config.

Sorry, please ignore "these above admission plugins are still added into master-config". What I mean is: since it's an invalid configuration, the behavior should be the same as with "DefaultAdmissionConfig", but it does not seem to be converted to DefaultAdmissionConfig.
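The warning does not appear to rewrite the file: the override in the HA config is still honored, which is why `ClusterResourceQuota` never runs, so the conversion is a manual step. A minimal sketch of the converted form, assuming the only reason for the list was to switch on a plugin that is off by default (`AlwaysPullImages` is used here as that assumed example): delete `pluginOrderOverride` and give the plugin a `DefaultAdmissionConfig` entry under `admissionConfig.pluginConfig`, the location the warning names. The `disable` field and the top-level placement are assumptions about the 3.3 config schema, not something shown in this report.

```yaml
# Top-level master-config.yaml key (assumed location; see the note above).
admissionConfig:
  pluginConfig:
    AlwaysPullImages:
      configuration:
        apiVersion: v1
        kind: DefaultAdmissionConfig
        # Assumed field: false keeps this normally-off plugin enabled.
        disable: false
# kubernetesMasterConfig.admissionConfig.pluginOrderOverride is removed
# entirely, so the default chain (including ClusterResourceQuota) runs.
```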

Comment 13 David Eads 2016-08-09 12:32:31 UTC

@Scott: are we encouraging people to set these admission values?

@Qixuan Wang: You need to either add `ClusterResourceQuota` to the bottom of your list or you need to stop specifying the values.  The current configuration is saying to *NOT* run the admission plugin that enforces quota.

Comment 14 Scott Dodson 2016-08-09 13:23:47 UTC

(In reply to Qixuan Wang from comment #7)
> Yes QE's testing environment is setup by jenkins. There are
> "openshift_master_kube_admission_plugin_order" and
> "openshift_master_kube_admission_plugin_config" in "openshift_ansible_vars"
> options of HA environment config template but not in Non-HA config template.

Ok, that's an installer bug we should fix.


(In reply to David Eads from comment #13)
> @Scott: are we encouraging people to set these admission values?

Encourage, no, but we do let them set admission plugin config. If they shoot themselves in the foot, there's not much we can do about that.

Comment 15 David Eads 2016-08-09 13:26:55 UTC

@scott: I want to remove that knob from the master-config in two releases.  What does it take to get there from here in ansible?

We're combining the admission chains and we're providing a different on/off mechanism.

Comment 16 Scott Dodson 2016-08-09 13:41:58 UTC

(In reply to David Eads from comment #15)
> @scott: I want to remove that knob from the master-config in two releases. 
> What does it take to get there from here in ansible?
> 
> We're combining the admission chains and we're providing a different on/off
> mechanism.

When the time comes, file an issue in openshift-ansible and link it to the origin PR that drops it from the config.

Comment 17 David Eads 2016-08-11 12:17:34 UTC

The ClusterResourceQuota admission plugin needs to be enabled.  This can be done by adding to the list or by not specifying the list.  Not specifying is preferred.
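Applied to the HA config from Comment 7, the preferred fix amounts to deleting the `pluginOrderOverride` block and leaving `pluginConfig` untouched, so the master falls back to its default admission chain, which per this thread includes `ClusterResourceQuota`. A sketch with values copied unchanged from the attached HA config; if any plugin from the old list is off by default, it would need its own enable entry, which is not shown here. On the installer side, the matching change would presumably be dropping `openshift_master_kube_admission_plugin_order` from the HA template's Ansible variables.

```yaml
kubernetesMasterConfig:
  admissionConfig:
    # No pluginOrderOverride: the master uses its default admission chain.
    pluginConfig:
      BuildOverrides:
        configuration:
          apiVersion: v1
          forcePull: true
          kind: BuildOverridesConfig
      ClusterResourceOverride:
        configuration:
          apiVersion: v1
          cpuRequestToLimitPercent: '6'
          kind: ClusterResourceOverrideConfig
          limitCPUToMemoryPercent: '200'
          memoryRequestToLimitPercent: '60'
      PodNodeConstraints:
        configuration:
          apiVersion: v1
          kind: PodNodeConstraintsConfig
      RunOnceDuration:
        configuration:
          activeDeadlineSecondsOverride: '3600'
          apiVersion: v1
          kind: RunOnceDurationConfig
```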

Comment 18 Qixuan Wang 2016-08-15 10:02:57 UTC

Adding "ClusterResourceQuota" instead of "ResourceQuota" gives the expected result. Thanks.
