Bug 1320433

Summary: [infrastructure_public_284]Node reservation argument should not be negative or invalid value.
Product: OpenShift Container Platform Reporter: DeShuai Ma <dma>
Component: NodeAssignee: Jan Chaloupka <jchaloup>
Status: CLOSED ERRATA QA Contact: DeShuai Ma <dma>
Severity: low Docs Contact:
Priority: medium    
Version: 3.2.0CC: agoldste, aos-bugs, jchaloup, jokerman, mmccomas, tdawson, wmeng
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-09-27 09:36:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description DeShuai Ma 2016-03-23 08:34:14 UTC
Description of problem:
Node reservation argument should not be negative or invalid value.

Version-Release number of selected component (if applicable):
openshift v3.2.0.6
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

How reproducible:
Always

Steps to Reproduce:
1.Set reservation with negative value.
kubeletArguments:
  system-reserved:
    - "cpu=-100,memory=-2G"
  kube-reserved:
    - "cpu=-100m,memory=-3G"
2.Restart node and check node status
  allocatable:
    cpu: 102100m
    memory: "8975499776"
    pods: "110"
  capacity:
    cpu: "2"
    memory: 3882324Ki
    pods: "110"

3.Set reservation with invalid value
kubeletArguments:
  system-reserved:
    - "cpu=20wx,memory=100Gm"
  kube-reserved:
    - "cpu=20wx,memory=100Gm"

4.Restart node and check node status
  allocatable:
    cpu: "0"
    memory: "0"
    pods: "110"
  capacity:
    cpu: "2"
    memory: 3882324Ki
    pods: "110"

5.Set reservation > node capacity
kubeletArguments:
  system-reserved:
    - "cpu=20,memory=100G"
  kube-reserved:
    - "cpu=20,memory=100G"
6.Restart node and check node status
  allocatable:
    cpu: "0"
    memory: "0"
    pods: "110"
  capacity:
    cpu: "2"
    memory: 3882324Ki
    pods: "110"

Actual results:
2.allocatable > capacity
4.allocatable = 0
6.allocatable = 0

Expected results:
2.Should restart node fail and tip the value should not be negative
allocatable should be never larger than capacity.
4.Should fail restart node, tip invalid value
6 should fail restart node(the result for step6 may reasonable,not confirm)

Additional info:

Comment 1 Jan Chaloupka 2016-03-23 13:44:31 UTC
Based on the regexp for ResourceQuantity, negative values are allowed. I believe there should be a check the resource is non-negative. Let me check the code again.

Comment 2 Jan Chaloupka 2016-03-23 13:56:30 UTC
It is a bug. The quantity is parsed but not checked for negativity. Once set, negative value is subtracted from capacity resulting in addition. Thus, allocatable can grow over node's actual capacity.

[1] https://github.com/kubernetes/kubernetes/commit/e2ffd007f7794f06b9a2f39d62280bd974d747c6

Comment 3 Jan Chaloupka 2016-03-23 14:25:37 UTC
For the values with invalid units (wx, Gm): if a unit is not recognized, quantity unit parser automatically returns error. So kubelet should fail with "unable to parse quantity's suffix".

The last usecase is correct. Allocated resource for both cpu and memory computed by the formula is negative. Thus set to 0.

Comment 4 Jan Chaloupka 2016-03-23 15:06:16 UTC
For the negative values [1]

[1] https://github.com/kubernetes/kubernetes/pull/23379

Comment 5 Jan Chaloupka 2016-03-29 10:50:27 UTC
Merged upstream

Comment 6 Jan Chaloupka 2016-03-30 12:54:35 UTC
Just create unit test for testing invalid resource quantity unit and the parse reports "unable to parse quantity's suffix".

go test -v k8s.io/kubernetes/cmd/kubelet/app
=== RUN   TestValueOfAllocatableResources
--- PASS: TestValueOfAllocatableResources (0.00s)
	server_test.go:39: err: "resource quantity for \"memory\" cannot be negative: -150G"
	server_test.go:39: err: "unable to parse quantity's suffix"
PASS
ok  	k8s.io/kubernetes/cmd/kubelet/app	0.020s

Comment 7 Jan Chaloupka 2016-03-30 13:11:16 UTC
Upstream PR here [1]

https://github.com/kubernetes/kubernetes/pull/23627

Comment 8 Jan Chaloupka 2016-03-30 13:15:34 UTC
> 3.Set reservation with invalid value
> kubeletArguments:
>  system-reserved:
>    - "cpu=20wx,memory=100Gm"
>  kube-reserved:
>    - "cpu=20wx,memory=100Gm"

go test -v k8s.io/kubernetes/cmd/kubelet/app
=== RUN   TestValueOfAllocatableResources
--- PASS: TestValueOfAllocatableResources (0.00s)
	server_test.go:47: Returned err: "resource quantity for \"memory\" cannot be negative: -150G"
	server_test.go:47: Returned err: "quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'"
PASS
ok  	k8s.io/kubernetes/cmd/kubelet/app	0.329s

Comment 9 Andy Goldstein 2016-04-14 13:45:04 UTC
Upstream PR has merged. Changing to POST

Comment 10 Jan Chaloupka 2016-06-07 18:12:00 UTC
Merged upstream

Comment 11 Jan Chaloupka 2016-06-24 07:42:54 UTC
Merged in the origin master [1]. What is the next step, close it as the current release?

[1] https://github.com/openshift/origin/blob/master/Godeps/_workspace/src/k8s.io/kubernetes/cmd/kubelet/app/server.go#L919

Comment 12 DeShuai Ma 2016-06-24 09:25:38 UTC
(In reply to Jan Chaloupka from comment #11)
> Merged in the origin master [1]. What is the next step, close it as the
> current release?
> 
> [1]
> https://github.com/openshift/origin/blob/master/Godeps/_workspace/src/k8s.io/
> kubernetes/cmd/kubelet/app/server.go#L919

you can set ON_QA, then QE will verify this bug.

Comment 13 Andy Goldstein 2016-06-24 11:31:43 UTC
This bug is against OSE, not Origin. Set it to MODIFIED and Productization will set it to ON_QA once there's a new puddle with the fix to be tested.

Comment 14 Troy Dawson 2016-09-01 15:34:40 UTC
This has been merged into ose and is in OSE v3.3.0.28 or newer.

Comment 16 Weihua Meng 2016-09-02 04:16:18 UTC
Verified.
openshift v3.3.0.28
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

results:
2. restart node fail
9月 01 23:54:19 qe-wmeng-master-1 atomic-openshift-node[52214]: F0901 23:54:19.260774   52214 start_node.go:126] resource quantity for "cpu" cannot be negative: -100m

4. restart node fail
9月 01 23:48:10 qe-wmeng-master-1 atomic-openshift-node[51413]: F0901 23:48:10.320714   51413 start_node.go:126] quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'

6. restart node OK
Capacity:
 alpha.kubernetes.io/nvidia-gpu:	0
 cpu:					1
 memory:				3624520Ki
 pods:					110
Allocatable:
 alpha.kubernetes.io/nvidia-gpu:	0
 cpu:					0
 memory:				0
 pods:					110

Comment 18 errata-xmlrpc 2016-09-27 09:36:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933