Created attachment 1634963 [details]
node overview screenshot

Description of problem:
A pod is failing to start with the following error:

  Successfully assigned istio-system/istio-policy-659bc7b88c-4cs4l to fbr-42-m-6psqn-worker-24sfj
  istio-policy-659bc7b88c-4cs4l: Node didn't have enough resource: memory, requested: 268435456, used: 7632243200, capacity: 7730569216

The event says "used: 7632243200", but the UI console (Compute -> Nodes -> fbr-42-m-6psqn-worker-24sfj) shows that the consumed memory on the host is only ~3.5 GB. See the attached screenshot.

oc also does not show 7632243200 bytes in use:

  oc adm top node
  NAME                          CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
  fbr-42-m-6psqn-master-0       1660m        22%    4961Mi          32%
  fbr-42-m-6psqn-master-1       1109m        14%    4438Mi          28%
  fbr-42-m-6psqn-master-2       779m         10%    3183Mi          20%
  fbr-42-m-6psqn-worker-24sfj   1962m        56%    3648Mi          49%
  fbr-42-m-6psqn-worker-8dm7p   753m         21%    3379Mi          45%
  fbr-42-m-6psqn-worker-g6n65   2078m        59%    4434Mi          60%
  fbr-42-m-6psqn-worker-js4fm   591m         16%    3228Mi          43%

Version-Release number of selected component (if applicable):
OCP 4.2.2

How reproducible:
Always

Steps to Reproduce:
1. Install OCP 4.2 on OpenStack with 3 masters (16 GB, 8 CPUs) and 4 workers (8 GB, 4 CPUs).
2. Install OpenShift Service Mesh with two control planes.

Actual results:
Pods fail to start because of:

  Node didn't have enough resource: memory, requested: 268435456, used: 7632243200, capacity: 7730569216

but the UI shows that there is enough free memory on the given host.

Expected results:
The UI should show correct memory usage on hosts.

Additional info:
It is not clear whether the error message is incorrect or whether the UI shows incorrect values.
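For reference, the raw byte counts in the kubelet event can be converted to the units the console and `oc adm top` report. A quick sketch (the constants are simply the values from the event above):

```python
# Values copied from the kubelet event above (all in bytes).
requested = 268435456
used = 7632243200
capacity = 7730569216

def gib(n):
    """Convert bytes to binary gigabytes (GiB)."""
    return n / 2**30

print(f"requested: {gib(requested):.2f} GiB")  # 0.25 GiB (256 MiB)
print(f"used:      {gib(used):.2f} GiB")       # ~7.11 GiB
print(f"capacity:  {gib(capacity):.2f} GiB")   # ~7.20 GiB
```

So the kubelet believes ~7.1 GiB of a ~7.2 GiB capacity is already used, while the console and `oc adm top node` report only ~3.5 GiB for the same worker.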
I don't have the original environment, but I reproduced it in a new one.

Events on the pod which failed to start:

  Generated from default-scheduler
  Successfully assigned bookinfo2/reviews-v3-6595c9dcb-8lr9r to fbr-42-s-c2nmq-worker-7km6z

  Generated from kubelet on fbr-42-s-c2nmq-worker-7km6z
  Node didn't have enough resource: memory, requested: 134217728, used: 7622098944, capacity: 7730569216

Node stats:

  oc adm top node
  NAME                          CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
  fbr-42-s-c2nmq-master-0       1388m        18%    4031Mi          26%
  fbr-42-s-c2nmq-master-1       697m         9%     2761Mi          17%
  fbr-42-s-c2nmq-master-2       1207m        16%    4302Mi          27%
  fbr-42-s-c2nmq-worker-7km6z   1006m        28%    3433Mi          46%
  fbr-42-s-c2nmq-worker-hw4d4   1138m        32%    3374Mi          45%
  fbr-42-s-c2nmq-worker-pt9p7   1497m        42%    3782Mi          51%

Node details:

  oc describe node fbr-42-s-c2nmq-worker-7km6z
  Name:               fbr-42-s-c2nmq-worker-7km6z
  Roles:              worker
  Labels:             beta.kubernetes.io/arch=amd64
                      beta.kubernetes.io/instance-type=ci.w1.large
                      beta.kubernetes.io/os=linux
                      failure-domain.beta.kubernetes.io/region=regionOne
                      failure-domain.beta.kubernetes.io/zone=nova
                      kubernetes.io/arch=amd64
                      kubernetes.io/hostname=fbr-42-s-c2nmq-worker-7km6z
                      kubernetes.io/os=linux
                      node-role.kubernetes.io/worker=
                      node.openshift.io/os_id=rhcos
  Annotations:        machine.openshift.io/machine: openshift-machine-api/fbr-42-s-c2nmq-worker-7km6z
                      machineconfiguration.openshift.io/currentConfig: rendered-worker-7d0c404aee63b69d895dd1bf28a8cda7
                      machineconfiguration.openshift.io/desiredConfig: rendered-worker-7d0c404aee63b69d895dd1bf28a8cda7
                      machineconfiguration.openshift.io/reason:
                      machineconfiguration.openshift.io/state: Done
                      volumes.kubernetes.io/controller-managed-attach-detach: true
  CreationTimestamp:  Tue, 12 Nov 2019 09:07:39 +0100
  Taints:             <none>
  Unschedulable:      false
  Conditions:
    Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
    ----             ------  -----------------                 ------------------                ------                       -------
    MemoryPressure   False   Tue, 12 Nov 2019 11:06:34 +0100   Tue, 12 Nov 2019 09:28:40 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
    DiskPressure     False   Tue, 12 Nov 2019 11:06:34 +0100   Tue, 12 Nov 2019 09:28:40 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
    PIDPressure      False   Tue, 12 Nov 2019 11:06:34 +0100   Tue, 12 Nov 2019 09:28:40 +0100   KubeletHasSufficientPID      kubelet has sufficient PID available
    Ready            True    Tue, 12 Nov 2019 11:06:34 +0100   Tue, 12 Nov 2019 09:29:00 +0100   KubeletReady                 kubelet is posting ready status
  Addresses:
    Hostname:    fbr-42-s-c2nmq-worker-7km6z
    InternalIP:  192.168.0.35
  Capacity:
    attachable-volumes-cinder:  256
    cpu:                        4
    hugepages-1Gi:              0
    hugepages-2Mi:              0
    memory:                     8163784Ki
    pods:                       250
  Allocatable:
    attachable-volumes-cinder:  256
    cpu:                        3500m
    hugepages-1Gi:              0
    hugepages-2Mi:              0
    memory:                     7549384Ki
    pods:                       250
  System Info:
    Machine ID:                 cc0990d2e47544e48514d5092cd824fd
    System UUID:                cc0990d2-e475-44e4-8514-d5092cd824fd
    Boot ID:                    1ab7a161-be81-4ee5-8b84-79fc176bf15b
    Kernel Version:             4.18.0-80.11.2.el8_0.x86_64
    OS Image:                   Red Hat Enterprise Linux CoreOS 42.80.20191022.0 (Ootpa)
    Operating System:           linux
    Architecture:               amd64
    Container Runtime Version:  cri-o://1.14.11-0.23.dev.rhaos4.2.gitc41de67.el8
    Kubelet Version:            v1.14.6+7e13ab9a7
    Kube-Proxy Version:         v1.14.6+7e13ab9a7
  ProviderID:                   openstack://cc0990d2-e475-44e4-8514-d5092cd824fd
  Non-terminated Pods:          (28 in total)
    Namespace                                Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
    ---------                                ----                                        ------------  ----------  ---------------  -------------  ---
    bookinfo                                 reviews-v3-6595c9dcb-j99fp                  10m (0%)      0 (0%)      128Mi (1%)       0 (0%)         11m
    bookinfo2                                details-v1-5b6d97f647-m7bfc                 10m (0%)      0 (0%)      128Mi (1%)       0 (0%)         6m21s
    bookinfo2                                reviews-v1-5bb5b76576-scswl                 10m (0%)      0 (0%)      128Mi (1%)       0 (0%)         6m20s
    bookinfo2                                reviews-v3-6595c9dcb-87wlp                  10m (0%)      0 (0%)      128Mi (1%)       0 (0%)         1s
    istio-operator                           istio-node-p45k8                            10m (0%)      0 (0%)      100Mi (1%)       0 (0%)         24m
    istio-system                             3scale-istio-adapter-585bbcb595-6h8zx       0 (0%)        0 (0%)      0 (0%)           0 (0%)         20m
    istio-system                             istio-ingressgateway-8657d8cfff-tbfw9       10m (0%)      0 (0%)      128Mi (1%)       0 (0%)         21m
    istio-system                             istio-pilot-8d85c5ddd-c76m4                 20m (0%)      0 (0%)      256Mi (3%)       0 (0%)         22m
    istio-system                             istio-pilot-8d85c5ddd-sr4wl                 20m (0%)      0 (0%)      256Mi (3%)       0 (0%)         2s
    istio-system                             istio-sidecar-injector-bb8b5554b-26k6x      10m (0%)      0 (0%)      128Mi (1%)       0 (0%)         21m
    kiali-test-mesh-operator                 kiali-test-mesh-operator-69c5b8bb8-fqpmq    0 (0%)        0 (0%)      0 (0%)           0 (0%)         12m
    openshift-cluster-node-tuning-operator   tuned-pl72p                                 10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         93m
    openshift-console                        downloads-df59f64db-h97lw                   10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         97m
    openshift-dns                            dns-default-dcqlq                           110m (3%)     0 (0%)      70Mi (0%)        512Mi (6%)     119m
    openshift-image-registry                 image-registry-69bcb5c874-vxfbw             100m (2%)     0 (0%)      256Mi (3%)       0 (0%)         97m
    openshift-image-registry                 node-ca-2xw44                               10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         106m
    openshift-ingress                        router-default-64f68cd7b7-7qf4v             100m (2%)     0 (0%)      256Mi (3%)       0 (0%)         97m
    openshift-machine-config-operator        machine-config-daemon-v8829                 20m (0%)      0 (0%)      50Mi (0%)        0 (0%)         119m
    openshift-monitoring                     alertmanager-main-1                         100m (2%)     100m (2%)   225Mi (3%)       25Mi (0%)      97m
    openshift-monitoring                     grafana-69f4f95645-gwgt4                    100m (2%)     0 (0%)      100Mi (1%)       0 (0%)         97m
    openshift-monitoring                     node-exporter-tbdjt                         10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         106m
    openshift-monitoring                     openshift-state-metrics-7f4bdfbdf9-wv9xl    120m (3%)     0 (0%)      190Mi (2%)       0 (0%)         97m
    openshift-monitoring                     prometheus-adapter-5668d4848f-7gdck         10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         97m
    openshift-monitoring                     prometheus-k8s-1                            430m (12%)    200m (5%)   1134Mi (15%)     50Mi (0%)      97m
    openshift-monitoring                     telemeter-client-7bf667c5-g2hg4             10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         97m
    openshift-multus                         multus-xtthk                                10m (0%)      0 (0%)      150Mi (2%)       0 (0%)         119m
    openshift-sdn                            ovs-w964g                                   200m (5%)     0 (0%)      400Mi (5%)       0 (0%)         119m
    openshift-sdn                            sdn-lbbj4                                   100m (2%)     0 (0%)      200Mi (2%)       0 (0%)         119m
  Allocated resources:
    (Total limits may be over 100 percent, i.e., overcommitted.)
    Resource                   Requests      Limits
    --------                   --------      ------
    cpu                        1560m (44%)   300m (8%)
    memory                     4581Mi (62%)  587Mi (7%)
    ephemeral-storage          0 (0%)        0 (0%)
    attachable-volumes-cinder  0             0
  Events:
    Type    Reason                   Age                From                                  Message
    ----    ------                   ---                ----                                  -------
    Normal  NodeNotSchedulable       101m               kubelet, fbr-42-s-c2nmq-worker-7km6z  Node fbr-42-s-c2nmq-worker-7km6z status is now: NodeNotSchedulable
    Normal  Starting                 98m                kubelet, fbr-42-s-c2nmq-worker-7km6z  Starting kubelet.
    Normal  NodeAllocatableEnforced  98m                kubelet, fbr-42-s-c2nmq-worker-7km6z  Updated Node Allocatable limit across pods
    Normal  NodeHasSufficientMemory  98m (x8 over 98m)  kubelet, fbr-42-s-c2nmq-worker-7km6z  Node fbr-42-s-c2nmq-worker-7km6z status is now: NodeHasSufficientMemory
    Normal  NodeHasNoDiskPressure    98m (x8 over 98m)  kubelet, fbr-42-s-c2nmq-worker-7km6z  Node fbr-42-s-c2nmq-worker-7km6z status is now: NodeHasNoDiskPressure
    Normal  NodeHasSufficientPID     98m (x7 over 98m)  kubelet, fbr-42-s-c2nmq-worker-7km6z  Node fbr-42-s-c2nmq-worker-7km6z status is now: NodeHasSufficientPID
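One detail worth noting from the describe output: the "capacity" value in the kubelet error (7730569216 bytes) matches the node's Allocatable memory (7549384Ki), not its total Capacity (8163784Ki). A quick sketch to verify the arithmetic (values copied from the output above):

```python
# Ki values from `oc describe node`; bytes value from the kubelet event.
allocatable_ki = 7549384          # Allocatable: memory
capacity_ki = 8163784             # Capacity: memory
error_capacity_bytes = 7730569216  # "capacity" in the kubelet error

# "capacity" in the error is Allocatable, converted Ki -> bytes (x1024).
assert allocatable_ki * 1024 == error_capacity_bytes
# It is NOT the node's total Capacity.
assert capacity_ki * 1024 != error_capacity_bytes
```

So the kubelet's admission check is comparing its "used" figure against Allocatable, which is expected; the surprising part is the "used" figure itself.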
Are you sure the workers have 8 GB and 4 CPUs? The attached screenshot shows the node with 4 GB and 2 CPUs.
The filesystem appears to be 8 GB as well, which would be extremely small for OpenShift 4.
Yes, I'm sure the worker VM has 8 GB of memory and 4 vCPUs. It is also visible in the oc describe node fbr-42-s-c2nmq-worker-7km6z output in comment 2:

  memory: 8163784Ki
  cpu: 4

I suspect the highest y-axis values in the graphs on the attached screenshot do NOT show the maximum available value but scale relative to the currently consumed value; e.g. the "Network out" graph tops out at 800 KBps, which is definitely not the maximum possible value for the network.

The root of this bug is that the UI graph for memory usage shows only ~3.5 GB of memory consumed, yet the pod fails to start with:

  Node didn't have enough resource: memory, requested: 134217728, used: 7622098944, capacity: 7730569216

which says that ~7.6 GB of memory is already used on the node.
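The size of the discrepancy can be quantified from the two numbers above (a sketch; the constants are copied from this comment and the oc adm top output in comment 2):

```python
kubelet_used_bytes = 7622098944  # "used" from the kubelet event (bytes)
top_node_mib = 3433              # MEMORY(bytes) column from `oc adm top node`

# Convert the kubelet figure to MiB; it divides exactly.
kubelet_used_mib = kubelet_used_bytes // 2**20  # 7269 MiB
gap_mib = kubelet_used_mib - top_node_mib

print(f"kubelet thinks {kubelet_used_mib}Mi is used; "
      f"metrics report {top_node_mib}Mi; gap = {gap_mib}Mi")
```

That is an unexplained gap of roughly 3.7 GiB between what the kubelet's admission check counts as used and what the metrics pipeline (and hence the console graph) reports.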
This ticket is likely a duplicate of another bug whose fix is going into the tree: https://github.com/openshift/machine-config-operator/pull/1459

Duplicate BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1801826
*** This bug has been marked as a duplicate of bug 1801826 ***