Bug 1455420 - Error updating node status when resource reservation is set larger than node capacity
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.6.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Derek Carr
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-05-25 05:51 UTC by Zhang Cheng
Modified: 2017-08-16 19:51 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-10 05:25:32 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1716 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.6 RPM Release Advisory 2017-08-10 09:02:50 UTC

Description Zhang Cheng 2017-05-25 05:51:49 UTC
Description of problem: Updating the node status fails when the resource reservation is set larger than the node capacity.

Version-Release number of selected component (if applicable):
openshift v3.6.84
kubernetes v1.6.1+5115d708d7

How reproducible:
Always

Steps to Reproduce:
1. Set the reservation values higher than the node capacity, then restart the atomic-openshift-node service (the restart succeeds).
kubeletArguments:
  system-reserved:
  - "cpu=200,memory=1000G"
  kube-reserved:
  - "cpu=200,memory=1000G"

2. Check the node status with `oc get node` and `oc describe node`.
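
The reservation strings in step 1 can be compared against node capacity before the node service is restarted. The following is a minimal Python sketch, not kubelet code: `to_bytes`, `parse_reserved` and `exceeds_capacity` are hypothetical helpers, only the quantity forms that appear in this report (plain integers, `G`, `Ki`) are handled, and the capacity figures are the ones `oc describe node` reports under Actual results.

```python
# Hypothetical helpers for illustration; not part of OpenShift or Kubernetes.
# Only the quantity forms that appear in this report are handled.
SUFFIX_BYTES = {"Ki": 1024, "G": 10**9}

def to_bytes(quantity):
    """Convert a memory quantity such as '1000G' or '3881920Ki' to bytes."""
    for suffix, factor in SUFFIX_BYTES.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # no suffix: treat the value as bytes

def parse_reserved(arg):
    """Parse 'cpu=200,memory=1000G' into {'cpu': 200, 'memory': <bytes>}."""
    pairs = dict(item.split("=", 1) for item in arg.split(","))
    return {"cpu": int(pairs["cpu"]), "memory": to_bytes(pairs["memory"])}

def exceeds_capacity(capacity, *reservations):
    """Return the resources whose total reservation is larger than capacity."""
    return [
        name for name in ("cpu", "memory")
        if sum(r[name] for r in reservations) > capacity[name]
    ]

capacity = {"cpu": 2, "memory": to_bytes("3881920Ki")}
system_reserved = parse_reserved("cpu=200,memory=1000G")
kube_reserved = parse_reserved("cpu=200,memory=1000G")
print(exceeds_capacity(capacity, system_reserved, kube_reserved))
```

With the values from this report, both `cpu` and `memory` are flagged: 400 cores are reserved on a 2-core node, and 2000G of memory on a node with under 4Gi.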


Actual results:
2. Checking the node status with `oc get node` and `oc describe node` shows the node as NotReady, and the Allocatable values did not change.
[root@host-8-175-81 ~]# oc get node host-8-175-189.host.centralci.eng.rdu2.redhat.com
NAME                                                STATUS     AGE       VERSION
host-8-175-189.host.centralci.eng.rdu2.redhat.com   NotReady   1h        v1.6.1+5115d708d7


Name:            host-8-175-189.host.centralci.eng.rdu2.redhat.com
Role:            
Labels:            beta.kubernetes.io/arch=amd64
            beta.kubernetes.io/os=linux
            kubernetes.io/hostname=host-8-175-189.host.centralci.eng.rdu2.redhat.com
            registry=enabled
            role=node
            router=enabled
Annotations:        volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:            <none>
CreationTimestamp:    Wed, 24 May 2017 21:48:42 -0400
Phase:            
Conditions:
  Type            Status        LastHeartbeatTime            LastTransitionTime            Reason            Message
  ----            ------        -----------------            ------------------            ------            -------
  OutOfDisk         Unknown     Wed, 24 May 2017 23:13:19 -0400     Wed, 24 May 2017 23:14:01 -0400     NodeStatusUnknown     Kubelet stopped posting node status.
  MemoryPressure     Unknown     Wed, 24 May 2017 23:13:19 -0400     Wed, 24 May 2017 23:14:01 -0400     NodeStatusUnknown     Kubelet stopped posting node status.
  DiskPressure         Unknown     Wed, 24 May 2017 23:13:19 -0400     Wed, 24 May 2017 23:14:01 -0400     NodeStatusUnknown     Kubelet stopped posting node status.
  Ready         Unknown     Wed, 24 May 2017 23:13:19 -0400     Wed, 24 May 2017 23:14:01 -0400     NodeStatusUnknown     Kubelet stopped posting node status.
Addresses:        10.8.175.189,10.8.175.189,host-8-175-189.host.centralci.eng.rdu2.redhat.com
Capacity:
 cpu:        2
 memory:    3881920Ki
 pods:        250
Allocatable:
 cpu:        2
 memory:    3779520Ki
 pods:        250
System Info:
 Machine ID:            1754a4957d8a442c8d2362df57fa5626
 System UUID:            E9A93021-F312-4F49-B47D-6488A09656B8
 Boot ID:            479bd1fc-9ca4-4da7-92d4-4d143d437e0a
 Kernel Version:        3.10.0-514.10.2.el7.x86_64
 OS Image:            Red Hat Enterprise Linux Server 7.3 (Maipo)
 Operating System:        linux
 Architecture:            amd64
 Container Runtime Version:    docker://1.12.6
 Kubelet Version:        v1.6.1+5115d708d7
 Kube-Proxy Version:        v1.6.1+5115d708d7
ExternalID:            host-8-175-189.host.centralci.eng.rdu2.redhat.com
Non-terminated Pods:        (4 in total)
  Namespace            Name                    CPU Requests    CPU Limits    Memory Requests    Memory Limits
  ---------            ----                    ------------    ----------    ---------------    -------------
  default            docker-registry-1-5rftx            100m (5%)    0 (0%)        256Mi (6%)    0 (0%)
  default            router-1-gc006                100m (5%)    0 (0%)        256Mi (6%)    0 (0%)
  install-test            cakephp-mysql-example-1-q0crb        0 (0%)        0 (0%)        512Mi (13%)    512Mi (13%)
  install-test            mysql-1-zb4jr                0 (0%)        0 (0%)        512Mi (13%)    512Mi (13%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests    CPU Limits    Memory Requests    Memory Limits
  ------------    ----------    ---------------    -------------
  200m (10%)    0 (0%)        1536Mi (41%)    1Gi (27%)
Events:
  FirstSeen    LastSeen    Count    From                                SubObjectPath    Type        Reason                Message
  ---------    --------    -----    ----                                -------------    --------    ------                -------
  1h        1h        1    kubelet, host-8-175-189.host.centralci.eng.rdu2.redhat.com            Normal        Starting            Starting kubelet.
  1h        1h        1    kubelet, host-8-175-189.host.centralci.eng.rdu2.redhat.com            Warning        ImageGCFailed            unable to find data for container /
  1h        1h        2    kubelet, host-8-175-189.host.centralci.eng.rdu2.redhat.com            Normal        NodeHasSufficientDisk        Node host-8-175-189.host.centralci.eng.rdu2.redhat.com status is now: NodeHasSufficientDisk
  1h        1h        2    kubelet, host-8-175-189.host.centralci.eng.rdu2.redhat.com            Normal        NodeHasSufficientMemory        Node host-8-175-189.host.centralci.eng.rdu2.redhat.com status is now: NodeHasSufficientMemory
  1h        1h        2    kubelet, host-8-175-189.host.centralci.eng.rdu2.redhat.com            Normal        NodeHasNoDiskPressure        Node host-8-175-189.host.centralci.eng.rdu2.redhat.com status is now: NodeHasNoDiskPressure
  1h        1h        1    kubelet, host-8-175-189.host.centralci.eng.rdu2.redhat.com            Normal        NodeReady            Node host-8-175-189.host.centralci.eng.rdu2.redhat.com status is now: NodeReady
  58m        58m        1    kubelet, host-8-175-189.host.centralci.eng.rdu2.redhat.com            Normal        Starting            Starting kubelet.
  58m        58m        1    kubelet, host-8-175-189.host.centralci.eng.rdu2.redhat.com            Warning        ImageGCFailed            unable to find data for container /


Expected results: 
2. When checking the node status with `oc get node` and `oc describe node`, the node should be in Ready status and both cpu and memory under Allocatable should be "0".
Such as:
Capacity:
 alpha.kubernetes.io/nvidia-gpu:    0
 cpu:                    2
 memory:                3881920Ki
 pods:                    250
Allocatable:
 alpha.kubernetes.io/nvidia-gpu:    0
 cpu:                    0
 memory:                0
 pods:                    250 
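
The expected Allocatable values amount to flooring the result of the subtraction at zero. A minimal Python sketch of that behaviour follows; `floor_at_zero` is a hypothetical name, and this illustrates the expectation stated in this report rather than the code of the actual fix.

```python
def floor_at_zero(capacity, reserved):
    """Return capacity minus reserved, but never less than zero."""
    return max(capacity - reserved, 0)

# CPU values from this report: 2 cores of capacity, 200 + 200 cores reserved.
print(floor_at_zero(2, 200 + 200))
```

A reservation smaller than capacity is unaffected: `floor_at_zero(2, 1)` still returns 1.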


Additional info:
1. In a comparison test on oc v3.5.5.15, the same steps to reproduce give the expected results.
2. Below is the detailed error from the logs on OCP 3.6:
May 25 01:39:19 host-8-175-189 journal: E0525 01:39:19.594371   22779 kubelet_node_status.go:357] Error updating node status, will retry: failed to patch status "{\"status\":{\"allocatable\":{\"cpu\":\"-398\",\"memory\":\"-1949345480Ki\"},\"conditions\":[{\"lastHeartbeatTime\":\"2017-05-25T05:39:19Z\",\"lastTransitionTime\":\"2017-05-25T05:39:19Z\",\"message\":\"kubelet has no disk pressure\",\"reason\":\"KubeletHasNoDiskPressure\",\"status\":\"False\",\"type\":\"DiskPressure\"},{\"lastHeartbeatTime\":\"2017-05-25T05:39:19Z\",\"lastTransitionTime\":\"2017-05-25T05:39:19Z\",\"message\":\"kubelet has sufficient memory available\",\"reason\":\"KubeletHasSufficientMemory\",\"status\":\"False\",\"type\":\"MemoryPressure\"},{\"lastHeartbeatTime\":\"2017-05-25T05:39:19Z\",\"lastTransitionTime\":\"2017-05-25T05:39:19Z\",\"message\":\"kubelet has sufficient disk space available\",\"reason\":\"KubeletHasSufficientDisk\",\"status\":\"False\",\"type\":\"OutOfDisk\"},{\"lastHeartbeatTime\":\"2017-05-25T05:39:19Z\",\"lastTransitionTime\":\"2017-05-25T05:39:19Z\",\"message\":\"kubelet is posting ready status\",\"reason\":\"KubeletReady\",\"status\":\"True\",\"type\":\"Ready\"}]}}" for node "host-8-175-189.host.centralci.eng.rdu2.redhat.com": Node "host-8-175-189.host.centralci.eng.rdu2.redhat.com" is invalid: [status.allocatable.cpu: Invalid value: "-398": must be greater than or equal to 0, status.allocatable.memory: Invalid value: "-1949345480Ki": must be greater than or equal to 0]
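
The negative values in the rejected patch can be reproduced by hand. The sketch below assumes allocatable is computed as capacity minus `kube-reserved` minus `system-reserved` minus a hard eviction threshold, and that the threshold is 100Mi of memory. The 100Mi figure is an inference from the 102400Ki gap between Capacity and Allocatable memory in the `oc describe node` output above; the report itself does not state it.

```python
KI = 1024
G = 10**9  # decimal suffix, as written in "memory=1000G"

# Capacity as reported by `oc describe node`.
capacity_cpu = 2
capacity_mem_ki = 3881920

# system-reserved plus kube-reserved from the steps to reproduce.
reserved_cpu = 200 + 200
reserved_mem_ki = (1000 * G + 1000 * G) // KI

# ASSUMPTION: hard eviction threshold of 100Mi, inferred from the report.
eviction_mem_ki = 100 * 1024

alloc_cpu = capacity_cpu - reserved_cpu
alloc_mem_ki = capacity_mem_ki - reserved_mem_ki - eviction_mem_ki
print(alloc_cpu, f"{alloc_mem_ki}Ki")
```

Under these assumptions the result is -398 and -1949345480Ki, the same values the API server rejects in the log line above.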

Comment 1 Derek Carr 2017-05-26 15:58:20 UTC
Upstream PR:
https://github.com/kubernetes/kubernetes/pull/46516

Comment 2 Derek Carr 2017-05-26 17:46:39 UTC
Origin PR:
https://github.com/openshift/origin/pull/14379

Comment 7 errata-xmlrpc 2017-08-10 05:25:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716

