Bug 1771016
| Summary: | Pod fails to start because of Node didn't have enough resource but UI console shows it has enough | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Filip Brychta <fbrychta> |
| Component: | Node | Assignee: | Ryan Phillips <rphillips> |
| Status: | CLOSED DUPLICATE | QA Contact: | Sunil Choudhary <schoudha> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.2.z | CC: | alegrand, anpicker, aos-bugs, erooth, jokerman, kakkoyun, lcosic, mloibl, pkrupa, surbania |
| Target Milestone: | --- | | |
| Target Release: | 4.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-02-25 15:48:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

I don't have the original environment, but I reproduced it in a new one:
Events on the pod that failed to start:
Generated from default-scheduler
Successfully assigned bookinfo2/reviews-v3-6595c9dcb-8lr9r to fbr-42-s-c2nmq-worker-7km6z
Generated from kubelet on fbr-42-s-c2nmq-worker-7km6z
Node didn't have enough resource: memory, requested: 134217728, used: 7622098944, capacity: 7730569216
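
The byte counts in the kubelet event above are easier to compare once converted to MiB. The following is a minimal sketch that parses the message text as quoted in this report and works out how far the request overshoots what the kubelet considers free. The function name `parse_admission_message` is a hypothetical helper introduced for illustration; it is not part of any OpenShift or Kubernetes tooling.

```python
import re

MIB = 1024 * 1024

# Event text copied verbatim from the kubelet event quoted above.
MESSAGE = (
    "Node didn't have enough resource: memory, "
    "requested: 134217728, used: 7622098944, capacity: 7730569216"
)


def parse_admission_message(message):
    """Return the byte counts named in the event as a dict of ints."""
    fields = re.findall(r"(requested|used|capacity): (\d+)", message)
    return {name: int(value) for name, value in fields}


values = parse_admission_message(MESSAGE)

# Memory the kubelet still considers free, and how much the request exceeds it.
free = values["capacity"] - values["used"]
shortfall = values["requested"] - free

for name, value in values.items():
    print(f"{name:>9}: {value / MIB:8.2f} MiB")
print(f"{'free':>9}: {free / MIB:8.2f} MiB")
print(f"{'shortfall':>9}: {shortfall / MIB:8.2f} MiB")
```

With these numbers, the pod asks for 128 MiB while the kubelet counts only about 103 MiB as free, so the request is rejected by a margin of roughly 25 MiB.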
Node stats:
oc adm top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
fbr-42-s-c2nmq-master-0 1388m 18% 4031Mi 26%
fbr-42-s-c2nmq-master-1 697m 9% 2761Mi 17%
fbr-42-s-c2nmq-master-2 1207m 16% 4302Mi 27%
fbr-42-s-c2nmq-worker-7km6z 1006m 28% 3433Mi 46%
fbr-42-s-c2nmq-worker-hw4d4 1138m 32% 3374Mi 45%
fbr-42-s-c2nmq-worker-pt9p7 1497m 42% 3782Mi 51%
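
To put the `oc adm top node` figure for fbr-42-s-c2nmq-worker-7km6z next to the kubelet's "used" value, the sketch below recomputes the MEMORY% column from the Capacity and Allocatable values that `oc describe node` reports for the same worker in this comment. That the percentage is taken relative to Allocatable is an inference from the numbers, not something stated in this report.

```python
KIB_PER_MIB = 1024
BYTES_PER_MIB = 1024 * 1024

# Values copied from this comment for worker fbr-42-s-c2nmq-worker-7km6z.
TOP_USAGE_MI = 3433               # MEMORY(bytes) column of `oc adm top node`
ALLOCATABLE_KI = 7549384          # Allocatable memory from `oc describe node`
CAPACITY_KI = 8163784             # Capacity memory from `oc describe node`
KUBELET_USED_BYTES = 7622098944   # "used" in the kubelet event


def percent_of(usage_mi, total_ki):
    """Usage in MiB as a percentage of a total given in KiB."""
    return 100 * usage_mi / (total_ki / KIB_PER_MIB)


pct_of_allocatable = percent_of(TOP_USAGE_MI, ALLOCATABLE_KI)
pct_of_capacity = percent_of(TOP_USAGE_MI, CAPACITY_KI)
kubelet_used_mi = KUBELET_USED_BYTES / BYTES_PER_MIB

print(f"top usage vs allocatable: {pct_of_allocatable:.1f}%")
print(f"top usage vs capacity:    {pct_of_capacity:.1f}%")
print(f"kubelet 'used':           {kubelet_used_mi:.0f} MiB")
print(f"top usage:                {TOP_USAGE_MI} MiB")
```

Under these assumptions the live usage comes to about 46.6% of Allocatable and about 43.1% of Capacity, so the reported 46% is consistent with Allocatable as the base. The kubelet's "used" figure of 7269 MiB is more than twice the 3433 MiB that `oc adm top node` reports.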
Node details:
oc describe node fbr-42-s-c2nmq-worker-7km6z
Name: fbr-42-s-c2nmq-worker-7km6z
Roles: worker
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=ci.w1.large
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=regionOne
failure-domain.beta.kubernetes.io/zone=nova
kubernetes.io/arch=amd64
kubernetes.io/hostname=fbr-42-s-c2nmq-worker-7km6z
kubernetes.io/os=linux
node-role.kubernetes.io/worker=
node.openshift.io/os_id=rhcos
Annotations: machine.openshift.io/machine: openshift-machine-api/fbr-42-s-c2nmq-worker-7km6z
machineconfiguration.openshift.io/currentConfig: rendered-worker-7d0c404aee63b69d895dd1bf28a8cda7
machineconfiguration.openshift.io/desiredConfig: rendered-worker-7d0c404aee63b69d895dd1bf28a8cda7
machineconfiguration.openshift.io/reason:
machineconfiguration.openshift.io/state: Done
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 12 Nov 2019 09:07:39 +0100
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Tue, 12 Nov 2019 11:06:34 +0100 Tue, 12 Nov 2019 09:28:40 +0100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 12 Nov 2019 11:06:34 +0100 Tue, 12 Nov 2019 09:28:40 +0100 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 12 Nov 2019 11:06:34 +0100 Tue, 12 Nov 2019 09:28:40 +0100 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 12 Nov 2019 11:06:34 +0100 Tue, 12 Nov 2019 09:29:00 +0100 KubeletReady kubelet is posting ready status
Addresses:
Hostname: fbr-42-s-c2nmq-worker-7km6z
InternalIP: 192.168.0.35
Capacity:
attachable-volumes-cinder: 256
cpu: 4
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8163784Ki
pods: 250
Allocatable:
attachable-volumes-cinder: 256
cpu: 3500m
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 7549384Ki
pods: 250
System Info:
Machine ID: cc0990d2e47544e48514d5092cd824fd
System UUID: cc0990d2-e475-44e4-8514-d5092cd824fd
Boot ID: 1ab7a161-be81-4ee5-8b84-79fc176bf15b
Kernel Version: 4.18.0-80.11.2.el8_0.x86_64
OS Image: Red Hat Enterprise Linux CoreOS 42.80.20191022.0 (Ootpa)
Operating System: linux
Architecture: amd64
Container Runtime Version: cri-o://1.14.11-0.23.dev.rhaos4.2.gitc41de67.el8
Kubelet Version: v1.14.6+7e13ab9a7
Kube-Proxy Version: v1.14.6+7e13ab9a7
ProviderID: openstack://cc0990d2-e475-44e4-8514-d5092cd824fd
Non-terminated Pods: (28 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
bookinfo reviews-v3-6595c9dcb-j99fp 10m (0%) 0 (0%) 128Mi (1%) 0 (0%) 11m
bookinfo2 details-v1-5b6d97f647-m7bfc 10m (0%) 0 (0%) 128Mi (1%) 0 (0%) 6m21s
bookinfo2 reviews-v1-5bb5b76576-scswl 10m (0%) 0 (0%) 128Mi (1%) 0 (0%) 6m20s
bookinfo2 reviews-v3-6595c9dcb-87wlp 10m (0%) 0 (0%) 128Mi (1%) 0 (0%) 1s
istio-operator istio-node-p45k8 10m (0%) 0 (0%) 100Mi (1%) 0 (0%) 24m
istio-system 3scale-istio-adapter-585bbcb595-6h8zx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20m
istio-system istio-ingressgateway-8657d8cfff-tbfw9 10m (0%) 0 (0%) 128Mi (1%) 0 (0%) 21m
istio-system istio-pilot-8d85c5ddd-c76m4 20m (0%) 0 (0%) 256Mi (3%) 0 (0%) 22m
istio-system istio-pilot-8d85c5ddd-sr4wl 20m (0%) 0 (0%) 256Mi (3%) 0 (0%) 2s
istio-system istio-sidecar-injector-bb8b5554b-26k6x 10m (0%) 0 (0%) 128Mi (1%) 0 (0%) 21m
kiali-test-mesh-operator kiali-test-mesh-operator-69c5b8bb8-fqpmq 0 (0%) 0 (0%) 0 (0%) 0 (0%) 12m
openshift-cluster-node-tuning-operator tuned-pl72p 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 93m
openshift-console downloads-df59f64db-h97lw 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 97m
openshift-dns dns-default-dcqlq 110m (3%) 0 (0%) 70Mi (0%) 512Mi (6%) 119m
openshift-image-registry image-registry-69bcb5c874-vxfbw 100m (2%) 0 (0%) 256Mi (3%) 0 (0%) 97m
openshift-image-registry node-ca-2xw44 10m (0%) 0 (0%) 10Mi (0%) 0 (0%) 106m
openshift-ingress router-default-64f68cd7b7-7qf4v 100m (2%) 0 (0%) 256Mi (3%) 0 (0%) 97m
openshift-machine-config-operator machine-config-daemon-v8829 20m (0%) 0 (0%) 50Mi (0%) 0 (0%) 119m
openshift-monitoring alertmanager-main-1 100m (2%) 100m (2%) 225Mi (3%) 25Mi (0%) 97m
openshift-monitoring grafana-69f4f95645-gwgt4 100m (2%) 0 (0%) 100Mi (1%) 0 (0%) 97m
openshift-monitoring node-exporter-tbdjt 10m (0%) 0 (0%) 20Mi (0%) 0 (0%) 106m
openshift-monitoring openshift-state-metrics-7f4bdfbdf9-wv9xl 120m (3%) 0 (0%) 190Mi (2%) 0 (0%) 97m
openshift-monitoring prometheus-adapter-5668d4848f-7gdck 10m (0%) 0 (0%) 20Mi (0%) 0 (0%) 97m
openshift-monitoring prometheus-k8s-1 430m (12%) 200m (5%) 1134Mi (15%) 50Mi (0%) 97m
openshift-monitoring telemeter-client-7bf667c5-g2hg4 10m (0%) 0 (0%) 20Mi (0%) 0 (0%) 97m
openshift-multus multus-xtthk 10m (0%) 0 (0%) 150Mi (2%) 0 (0%) 119m
openshift-sdn ovs-w964g 200m (5%) 0 (0%) 400Mi (5%) 0 (0%) 119m
openshift-sdn sdn-lbbj4 100m (2%) 0 (0%) 200Mi (2%) 0 (0%) 119m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1560m (44%) 300m (8%)
memory 4581Mi (62%) 587Mi (7%)
ephemeral-storage 0 (0%) 0 (0%)
attachable-volumes-cinder 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeNotSchedulable 101m kubelet, fbr-42-s-c2nmq-worker-7km6z Node fbr-42-s-c2nmq-worker-7km6z status is now: NodeNotSchedulable
Normal Starting 98m kubelet, fbr-42-s-c2nmq-worker-7km6z Starting kubelet.
Normal NodeAllocatableEnforced 98m kubelet, fbr-42-s-c2nmq-worker-7km6z Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 98m (x8 over 98m) kubelet, fbr-42-s-c2nmq-worker-7km6z Node fbr-42-s-c2nmq-worker-7km6z status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 98m (x8 over 98m) kubelet, fbr-42-s-c2nmq-worker-7km6z Node fbr-42-s-c2nmq-worker-7km6z status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 98m (x7 over 98m) kubelet, fbr-42-s-c2nmq-worker-7km6z Node fbr-42-s-c2nmq-worker-7km6z status is now: NodeHasSufficientPID
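
The figures in the `oc describe node` output above can be cross-checked against the kubelet event with a few lines of arithmetic. The sketch below sums the Memory Requests column of the pod table, converts Allocatable to bytes, and compares both with the "capacity" and "used" values from the event. The pod list in this snapshot is probably not the exact set of pods the kubelet saw when it rejected reviews-v3-6595c9dcb-8lr9r (the pod names differ), so the final difference is only indicative.

```python
MIB = 1024 * 1024

# Memory Requests column (Mi) of the 28 non-terminated pods, in table order.
REQUESTS_MI = [
    128, 128, 128, 128, 100, 0, 128, 256, 256, 128, 0, 50, 50, 70,
    256, 10, 256, 50, 225, 100, 20, 190, 20, 1134, 20, 150, 400, 200,
]

ALLOCATABLE_KI = 7549384              # Allocatable memory
CAPACITY_KI = 8163784                 # Capacity memory
EVENT_CAPACITY_BYTES = 7730569216     # "capacity" in the kubelet event
EVENT_USED_BYTES = 7622098944         # "used" in the kubelet event

total_requests_mi = sum(REQUESTS_MI)
allocatable_bytes = ALLOCATABLE_KI * 1024

# Memory held back from pods: Capacity minus Allocatable.
held_back_mi = (CAPACITY_KI - ALLOCATABLE_KI) / 1024

# Requests as a share of Allocatable, as in the "Allocated resources" block.
requests_pct = 100 * total_requests_mi * MIB / allocatable_bytes

# Gap between what the kubelet counted as used and the requests listed here.
gap_mi = EVENT_USED_BYTES / MIB - total_requests_mi

print(f"sum of memory requests: {total_requests_mi} Mi ({requests_pct:.1f}%)")
print(f"allocatable in bytes:   {allocatable_bytes}")
print(f"capacity - allocatable: {held_back_mi:.0f} Mi")
print(f"event used - requests:  {gap_mi:.0f} Mi")
```

The sum reproduces the 4581Mi (62%) shown under "Allocated resources", and Allocatable converted to bytes is exactly the "capacity" value in the kubelet event, so the event compares against Allocatable rather than the 8163784Ki Capacity. Capacity and Allocatable differ by 600 Mi. The kubelet's "used" value is 2688 Mi higher than the requests listed in this snapshot.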

Are you sure the workers have 8 GB of memory and 4 CPUs? The attached screenshot shows the node with 4 GB and 2 CPUs. The filesystem appears to be 8 GB as well, which would be extremely small for OpenShift 4.

Yes, I'm sure that the worker VM has 8 GB of memory and 4 vCPUs. It's also visible in the output of oc describe node fbr-42-s-c2nmq-worker-7km6z in comment 2:
memory: 8163784Ki
cpu: 4
I guess that the highest y-axis values in the graphs on the attached screenshot do NOT show the maximum available value but are relative to the currently consumed value. For example, the Network Out graph shows 800 KBps, which is definitely NOT the maximum possible value for the network.

The root of this bug is that the UI graph for memory usage shows only ~3.5 GB of memory consumed, but the pod fails to start with "Node didn't have enough resource: memory, requested: 134217728, used: 7622098944, capacity: 7730569216", which is basically saying that ~7.6 GB of memory is already used on the node.

This ticket is likely a duplicate; a fix is going into the tree under another bug: https://github.com/openshift/machine-config-operator/pull/1459
Duplicate BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1801826

*** This bug has been marked as a duplicate of bug 1801826 ***

Created attachment 1634963
node overview screenshot

Description of problem:
Pod is failing to start with the following error:
Successfully assigned istio-system/istio-policy-659bc7b88c-4cs4l to fbr-42-m-6psqn-worker-24sfj
istio-policy-659bc7b88c-4cs4l
Node didn't have enough resource: memory, requested: 268435456, used: 7632243200, capacity: 7730569216

It says "used: 7632243200", but UI console -> Compute -> Nodes -> fbr-42-m-6psqn-worker-24sfj shows that the memory consumed on the host is only ~3.5 GB. See the attached screenshot.

oc also shows that the node is not consuming 7632243200 bytes:
oc adm top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
fbr-42-m-6psqn-master-0 1660m 22% 4961Mi 32%
fbr-42-m-6psqn-master-1 1109m 14% 4438Mi 28%
fbr-42-m-6psqn-master-2 779m 10% 3183Mi 20%
fbr-42-m-6psqn-worker-24sfj 1962m 56% 3648Mi 49%
fbr-42-m-6psqn-worker-8dm7p 753m 21% 3379Mi 45%
fbr-42-m-6psqn-worker-g6n65 2078m 59% 4434Mi 60%
fbr-42-m-6psqn-worker-js4fm 591m 16% 3228Mi 43%

Version-Release number of selected component (if applicable):
OCP 4.2.2

How reproducible:
Always

Steps to Reproduce:
1. Install OCP 4.2 on OpenStack with 3 masters (16 GB, 8 CPUs) and 4 workers (8 GB, 4 CPUs)
2. Install OpenShift Service Mesh with two control planes

Actual results:
Pods fail to start because of:
Node didn't have enough resource: memory, requested: 268435456, used: 7632243200, capacity: 7730569216
But the UI shows that there is enough free memory on the given host.

Expected results:
The UI should show correct memory usage on hosts.

Additional info:
Not sure if the error message is incorrect or if the UI shows incorrect values.