Description of problem:
When creating a deployment with memory requests and scaling it up, pods go "OutOfmemory", and the kubelet logs the following:

Sep 17 09:39:29 share-0916c-8vp8z-worker-rdtdw hyperkube[1264]: I0917 09:39:29.406474 1264 predicate.go:136] Predicate failed on Pod: test-56cf6cdb48-5nlrn_default(0b067e0c-d92f-11e9-89a5-fa163eb3bdb0), for reason: Node didn't have enough resource: memory, requested: 2147483648, used: 15223226368, capacity: 16185528320

But the node should have enough memory for the pod:

➜ ~ oc describe nodes -l node-role.kubernetes.io/worker= | grep -i -A 7 allocate
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests       Limits
  --------                   --------       ------
  cpu                        570m (7%)      0 (0%)
  memory                     11446Mi (74%)  512Mi (3%)
  ephemeral-storage          0 (0%)         0 (0%)
  attachable-volumes-cinder  0              0
--
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests       Limits
  --------                   --------       ------
  cpu                        2150m (28%)    700m (9%)
  memory                     11369Mi (73%)  687Mi (4%)
  ephemeral-storage          0 (0%)         0 (0%)
  attachable-volumes-cinder  0              0

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-09-16-151155

How reproducible:
unknown

Steps to Reproduce:
1. Create a deployment with spec.containers[*].resources.requests.memory set (see the command sketch after this comment)
2. Scale the deployment up beyond the node's allocatable memory

Actual results:
Pods end up in OutOfmemory status:

test-56cf6cdb48-k85kv   0/1   OutOfmemory   0   28m   <none>   share-0916c-8vp8z-worker-sc4nc   <none>   <none>
test-56cf6cdb48-lmptq   0/1   OutOfmemory   0   28m   <none>   share-0916c-8vp8z-worker-sc4nc   <none>   <none>
test-56cf6cdb48-lq4jz   0/1   OutOfmemory   0   28m   <none>   share-0916c-8vp8z-worker-sc4nc   <none>   <none>
test-56cf6cdb48-m2rw8   0/1   OutOfmemory   0   28m   <none>   share-0916c-8vp8z-worker-sc4nc   <none>   <none>
test-56cf6cdb48-m6cvf   0/1   OutOfmemory   0   28m   <none>   share-0916c-8vp8z-worker-sc4nc   <none>   <none>
test-56cf6cdb48-mmzz9   0/1   OutOfmemory   0   28m   <none>

Expected results:
The node should not fail to admit the pod.

Additional info:
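For reference, a minimal way to reproduce along the lines above. The deployment name "test" and the 2Gi request mirror the kubelet log; the pause image and the replica count of 30 are placeholders I'm assuming here, not values from the original report:

oc create deployment test --image=k8s.gcr.io/pause:3.1       # placeholder image (assumption)
oc set resources deployment test --requests=memory=2Gi       # per-container memory request
oc scale deployment test --replicas=30                        # enough replicas to exceed allocatable memory
oc get pods -o wide | grep OutOfmemory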
Found the root cause: the coredns, keepalived, and mdns-publisher static pods request an additional 3Gi of memory on the worker, so the kubelet's admission calculation fails:

sh-4.4# cat /etc/kubernetes/manifests/* | grep -A 3 resources:
    resources: {}
    volumeMounts:
    - name: kubeconfig
      mountPath: "/etc/kubernetes/kubeconfig"
--
    resources:
      requests:
        cpu: 150m
        memory: 1Gi
--
    resources: {}
    volumeMounts:
    - name: resource-dir
      mountPath: "/config"
--
    resources:
      requests:
        cpu: 150m
        memory: 1Gi
--
    resources: {}
    volumeMounts:
    - name: kubeconfig
      mountPath: "/etc/kubernetes/kubeconfig"
--
    resources:
      requests:
        cpu: 150m
        memory: 1Gi
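For what it's worth, a rough back-of-the-envelope comparison of the two views, using the numbers from the kubelet log and "oc describe nodes" above (values approximate):

Scheduler view (static-pod requests not reflected in the API):
  11446Mi already requested + 2048Mi new pod = 13494Mi  <  ~15436Mi allocatable   -> pod gets scheduled
Kubelet view (static pods counted locally):
  15223226368 + 2147483648 = 17370710016 bytes  >  16185528320 bytes allocatable  -> OutOfmemory

The gap between the kubelet's "used" figure (15223226368 bytes, ~14518Mi) and the 11446Mi the scheduler sees is ~3072Mi, which matches the 3Gi of static-pod requests; that would also explain why creating the missing namespace (next comment) helps.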
Found a workaround: after creating the namespace manually, everything works as expected.

oc adm new-project openshift-kni-infra
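If it helps others hitting this: once the namespace exists, the static-pod mirror pods should appear in the API and their requests should show up in the node's allocated resources. The exact infra namespace is an assumption that depends on platform (openshift-kni-infra here; openshift-openstack-infra on the OpenStack cluster used for verification below), so adjust accordingly:

oc adm new-project openshift-openstack-infra    # or openshift-kni-infra, depending on platform
oc get pods -n openshift-openstack-infra -o wide
oc describe nodes -l node-role.kubernetes.io/worker= | grep -i -A 8 "allocated resources"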
Verified on 4.2.0-0.nightly-2019-09-19-040356

➜ ~ oc get pods -n openshift-openstack-infra
NAME                                            READY   STATUS    RESTARTS   AGE
coredns-share-0919a-qn4pn-master-0              1/1     Running   2          2m56s
coredns-share-0919a-qn4pn-master-2              1/1     Running   3          118m
coredns-share-0919a-qn4pn-worker-9mw25          1/1     Running   3          118m
coredns-share-0919a-qn4pn-worker-gzt8w          0/1     Pending   0          2s
coredns-share-0919a-qn4pn-worker-v7g8v          1/1     Running   3          2m56s
haproxy-share-0919a-qn4pn-master-0              2/2     Running   0          2m56s
haproxy-share-0919a-qn4pn-master-2              2/2     Running   2          118m
keepalived-share-0919a-qn4pn-master-0           1/1     Running   0          2m56s
keepalived-share-0919a-qn4pn-master-2           1/1     Running   1          118m
keepalived-share-0919a-qn4pn-worker-9mw25       1/1     Running   1          118m
keepalived-share-0919a-qn4pn-worker-gzt8w       0/1     Pending   0          2s
keepalived-share-0919a-qn4pn-worker-v7g8v       1/1     Running   1          2m56s
mdns-publisher-share-0919a-qn4pn-master-0       1/1     Running   0          2m56s
mdns-publisher-share-0919a-qn4pn-master-2       1/1     Running   1          118m
mdns-publisher-share-0919a-qn4pn-worker-9mw25   1/1     Running   1          118m
mdns-publisher-share-0919a-qn4pn-worker-gzt8w   0/1     Pending   0          2s
mdns-publisher-share-0919a-qn4pn-worker-v7g8v   1/1     Running   1          2m56s

➜ ~ oc get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE                             NOMINATED NODE   READINESS GATES
h-1-265hh    0/1     Pending   0          17s   <none>        <none>                           <none>           <none>
h-1-4sjmn    0/1     Pending   0          17s   <none>        <none>                           <none>           <none>
h-1-678m8    0/1     Pending   0          17s   <none>        <none>                           <none>           <none>
h-1-8tsgh    0/1     Pending   0          17s   <none>        <none>                           <none>           <none>
h-1-9md7j    1/1     Running   0          17s   10.128.2.20   share-0919a-qn4pn-worker-v7g8v   <none>           <none>
h-1-c957g    0/1     Pending   0          17s   <none>        <none>                           <none>           <none>
h-1-cj6mk    1/1     Running   0          17s   10.128.2.22   share-0919a-qn4pn-worker-v7g8v   <none>           <none>
h-1-ctpd8    0/1     Pending   0          17s   <none>        <none>                           <none>           <none>
h-1-deploy   1/1     Running   0          29s   10.131.0.28   share-0919a-qn4pn-worker-9mw25   <none>           <none>
h-1-h7rzz    0/1     Pending   0          17s   <none>        <none>                           <none>           <none>
h-1-hvh9v    1/1     Running   0          17s   10.131.0.29   share-0919a-qn4pn-worker-9mw25   <none>           <none>
h-1-jvnjw    1/1     Running   0          17s   10.131.0.30   share-0919a-qn4pn-worker-9mw25   <none>           <none>
h-1-nwdlx    1/1     Running   0          17s   10.131.0.31   share-0919a-qn4pn-worker-9mw25   <none>           <none>
h-1-pkmmm    0/1     Pending   0          17s   <none>        <none>                           <none>           <none>
h-1-pppsl    1/1     Running   0          17s   10.128.2.23   share-0919a-qn4pn-worker-v7g8v   <none>           <none>
h-1-px7ls    0/1     Pending   0          17s   <none>        <none>                           <none>           <none>
h-1-r7cbl    0/1     Pending   0          17s   <none>        <none>                           <none>           <none>
h-1-rn5tg    1/1     Running   0          17s   10.128.2.21   share-0919a-qn4pn-worker-v7g8v   <none>           <none>
h-1-rz2x2    1/1     Running   0          17s   10.131.0.32   share-0919a-qn4pn-worker-9mw25   <none>           <none>
h-1-tnxlb    0/1     Pending   0          17s   <none>        <none>                           <none>           <none>
h-1-x8bxq    0/1     Pending   0          17s   <none>        <none>                           <none>           <none>
Since this is targeted to 4.3.0, we need to wait for a 4.3 nightly build to verify.
Checked with 4.3.0-0.nightly-2019-10-15-021732, and the issue is fixed.

➜ ~ oc get pods -n openshift-openstack-infra -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP             NODE                       NOMINATED NODE   READINESS GATES
coredns-qe-wj-6lx69-master-0              1/1     Running   0          45m   192.168.0.29   qe-wj-6lx69-master-0       <none>           <none>
coredns-qe-wj-6lx69-master-1              1/1     Running   0          45m   192.168.0.15   qe-wj-6lx69-master-1       <none>           <none>
coredns-qe-wj-6lx69-master-2              1/1     Running   0          46m   192.168.0.20   qe-wj-6lx69-master-2       <none>           <none>
coredns-qe-wj-6lx69-worker-64svj          1/1     Running   0          29m   192.168.0.35   qe-wj-6lx69-worker-64svj   <none>           <none>
coredns-qe-wj-6lx69-worker-g7pvh          1/1     Running   0          37m   192.168.0.12   qe-wj-6lx69-worker-g7pvh   <none>           <none>
coredns-qe-wj-6lx69-worker-hdgql          1/1     Running   0          37m   192.168.0.41   qe-wj-6lx69-worker-hdgql   <none>           <none>
haproxy-qe-wj-6lx69-master-0              2/2     Running   0          45m   192.168.0.29   qe-wj-6lx69-master-0       <none>           <none>
haproxy-qe-wj-6lx69-master-1              2/2     Running   0          45m   192.168.0.15   qe-wj-6lx69-master-1       <none>           <none>
haproxy-qe-wj-6lx69-master-2              2/2     Running   0          45m   192.168.0.20   qe-wj-6lx69-master-2       <none>           <none>
keepalived-qe-wj-6lx69-master-0           1/1     Running   0          45m   192.168.0.29   qe-wj-6lx69-master-0       <none>           <none>
keepalived-qe-wj-6lx69-master-1           1/1     Running   0          45m   192.168.0.15   qe-wj-6lx69-master-1       <none>           <none>
keepalived-qe-wj-6lx69-master-2           1/1     Running   0          45m   192.168.0.20   qe-wj-6lx69-master-2       <none>           <none>
keepalived-qe-wj-6lx69-worker-64svj       1/1     Running   0          29m   192.168.0.35   qe-wj-6lx69-worker-64svj   <none>           <none>
keepalived-qe-wj-6lx69-worker-g7pvh       1/1     Running   0          37m   192.168.0.12   qe-wj-6lx69-worker-g7pvh   <none>           <none>
keepalived-qe-wj-6lx69-worker-hdgql       1/1     Running   0          37m   192.168.0.41   qe-wj-6lx69-worker-hdgql   <none>           <none>
mdns-publisher-qe-wj-6lx69-master-0       1/1     Running   0          45m   192.168.0.29   qe-wj-6lx69-master-0       <none>           <none>
mdns-publisher-qe-wj-6lx69-master-1       1/1     Running   0          46m   192.168.0.15   qe-wj-6lx69-master-1       <none>           <none>
mdns-publisher-qe-wj-6lx69-master-2       1/1     Running   0          45m   192.168.0.20   qe-wj-6lx69-master-2       <none>           <none>
mdns-publisher-qe-wj-6lx69-worker-64svj   1/1     Running   0          29m   192.168.0.35   qe-wj-6lx69-worker-64svj   <none>           <none>
mdns-publisher-qe-wj-6lx69-worker-g7pvh   1/1     Running   0          37m   192.168.0.12   qe-wj-6lx69-worker-g7pvh   <none>           <none>
mdns-publisher-qe-wj-6lx69-worker-hdgql   1/1     Running   0          37m   192.168.0.41   qe-wj-6lx69-worker-hdgql   <none>           <none>

➜ ~ oc get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE                       NOMINATED NODE   READINESS GATES
h-1-8vsk7    1/1     Running   0          40s   10.131.0.34   qe-wj-6lx69-worker-hdgql   <none>           <none>
h-1-ctdnp    1/1     Running   0          40s   10.129.2.25   qe-wj-6lx69-worker-64svj   <none>           <none>
h-1-deploy   1/1     Running   0          48s   10.129.2.24   qe-wj-6lx69-worker-64svj   <none>           <none>
h-1-dkbzb    0/1     Pending   0          40s   <none>        <none>                     <none>           <none>
h-1-fhckn    1/1     Running   0          40s   10.128.2.27   qe-wj-6lx69-worker-g7pvh   <none>           <none>
h-1-gxj98    1/1     Running   0          40s   10.128.2.28   qe-wj-6lx69-worker-g7pvh   <none>           <none>
h-1-mhddx    1/1     Running   0          40s   10.131.0.35   qe-wj-6lx69-worker-hdgql   <none>           <none>
h-1-njdrm    1/1     Running   0          40s   10.128.2.29   qe-wj-6lx69-worker-g7pvh   <none>           <none>
h-1-w477k    1/1     Running   0          40s   10.129.2.27   qe-wj-6lx69-worker-64svj   <none>           <none>
h-1-x27zn    0/1     Pending   0          40s   <none>        <none>                     <none>           <none>
h-1-z5vwf    1/1     Running   0          40s   10.129.2.26   qe-wj-6lx69-worker-64svj   <none>           <none>

➜ ~ oc version
Client Version: v4.3.0
Server Version: 4.3.0-0.nightly-2019-10-15-021732
Kubernetes Version: v1.16.0-beta.2+a6ff814
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062