
Bug 1846805

Summary: KubeletTooManyPods seems to take into account completed pods.
Product: OpenShift Container Platform
Reporter: David Hernández Fernández <dahernan>
Component: Monitoring
Assignee: Sergiusz Urbaniak <surbania>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: low
Docs Contact:
Priority: medium
Version: 4.4
CC: abodhe, alegrand, anpicker, ChetRHosey, erooth, kakkoyun, ksathe, lcosic, mloibl, pkrupa, surbania
Target Milestone: ---
Keywords: Reopened
Target Release: 4.6.0
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Currently, kubelet_running_pod_count includes Completed pods, which is incorrect from the point of view of the KubeletTooManyPods alert. Since every running pod has container_memory_rss exposed, it can be leveraged to find the actual number of pods running on a node.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-27 16:06:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description David Hernández Fernández 2020-06-14 15:11:53 UTC
########################################################################
Description of problem: The following alert is showing up:
The KubeletTooManyPods alert is firing with the message “Kubelet 'worker.example.com' is running at 103.2% of its Pod capacity”, while a large number of Completed pods have not yet been deleted.
########################################################################

$ oc get PrometheusRule prometheus-k8s-rules -n openshift-monitoring -o yaml | less
    - alert: KubeletTooManyPods
      annotations:
        message: Kubelet '{{ $labels.node }}' is running at {{ $value | humanizePercentage
          }} of its Pod capacity.
      expr: |
        max(max(kubelet_running_pod_count{job="kubelet"}) by(instance) * on(instance) group_left(node) kubelet_node_name{job="kubelet"}) by(node) / max(kube_node_status_capacity_pods{job="kube-state-metrics"}) by(node) > 0.95
      for: 15m
      labels:
        severity: warning
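
For reference, the two counts can be compared directly with ad-hoc PromQL queries. The first query is the numerator lifted from the alert expression above (it includes Completed pods); the second is a sketch of a Running-only count that uses the same kube-state-metrics pattern later used for verification in comment 9, shown here as an illustration rather than the shipped rule:

    max(max(kubelet_running_pod_count{job="kubelet"}) by(instance) * on(instance) group_left(node) kubelet_node_name{job="kubelet"}) by(node)

    count by(node) ((kube_pod_status_phase{job="kube-state-metrics",phase="Running"} == 1) * on(instance,pod,namespace,cluster) group_left(node) topk by(instance,pod,namespace,cluster) (1, kube_pod_info{job="kube-state-metrics"}))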

# oc get pods -A -o wide | grep worker03 | wc -l
251

# oc get pods -A -o wide | grep worker03 | grep -v  'Completed|Running' | wc -l
256

# oc get pods -A -o wide | grep worker03 | grep -v  'Running' | wc -l
201
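
The same per-node counts can also be taken without grep by using field selectors (a sketch; "worker03" stands in for the node's full name, and pods shown as Completed have phase Succeeded):

# oc get pods -A --field-selector spec.nodeName=worker03,status.phase=Running --no-headers | wc -l
# oc get pods -A --field-selector spec.nodeName=worker03,status.phase=Succeeded --no-headers | wc -l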

# oc describe node worker03
...
Capacity:
  cpu:                4
  ephemeral-storage:  314020844Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32936388Ki
  pods:               250
Allocatable:
  cpu:                3500m
  ephemeral-storage:  289401609352
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32321988Ki
  pods:               250
...
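
With the 250-pod capacity shown above, the 103.2% in the alert message corresponds to 250 * 1.032 = 258 pods reported by kubelet_running_pod_count, i.e. more pods than the node can actually run; this is only possible because the count includes Completed pods.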

########################################################################
Actual results: Completed pods are taken into account, even though they do not impact the resources used.
########################################################################

#######################################################################
Expected results: Completed pods should not be taken into account.
#######################################################################

#######################################################################
Additional info: It does not affect replica scalability; it is just confusing.
#######################################################################

Comment 1 Pawel Krupa 2020-06-15 09:20:43 UTC
This alert originates from kubernetes-mixin upstream, hence I created https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/442.

@David: let's discuss the issue there, as it has ramifications for the wider community.

Comment 9 Junqi Zhao 2020-08-07 13:38:07 UTC
Checked on one node, for example qe-anusaxen10-nhx4f-master-2:
count by(node) ((kube_pod_status_phase{job="kube-state-metrics",phase="Running"} == 1) * on(instance,pod,namespace,cluster) group_left(node) topk by(instance,pod,namespace,cluster) (1, kube_pod_info{job="kube-state-metrics"})) / max by(node) (kube_node_status_capacity_pods{job="kube-state-metrics",node="qe-anusaxen10-nhx4f-master-2"} != 1)
Element 	Value
{node="qe-anusaxen10-nhx4f-master-2"}	0.112

This node can allocate 250 pods:
kube_node_status_capacity_pods{job="kube-state-metrics",node="qe-anusaxen10-nhx4f-master-2"} != 1
Element 	Value
kube_node_status_capacity_pods{endpoint="https-main",environment="vSphere",instance="10.128.2.10:8443",job="kube-state-metrics",namespace="openshift-monitoring",node="qe-anusaxen10-nhx4f-master-2",pod="kube-state-metrics-75bcb99ff6-6td44",prometheus="openshift-monitoring/k8s",region="unknown",service="kube-state-metrics"}	250

There are 38 pods on this node: 28 Running and 10 Completed. 250 * 0.112 = 28, so the expression does not count Completed pods.
# oc get pod --all-namespaces -o wide --no-headers| grep "qe-anusaxen10-nhx4f-master-2" |  wc -l
38
# oc get pod --all-namespaces -o wide | grep "qe-anusaxen10-nhx4f-master-2" | grep Running |  wc -l
28
# oc get pod --all-namespaces -o wide | grep "qe-anusaxen10-nhx4f-master-2" | grep Completed |  wc -l
10

Comment 10 David Hernández Fernández 2020-08-07 14:00:50 UTC
LGTM! The merged PR is also perfect, thanks! Looking forward to seeing this included in the next errata.

Comment 13 errata-xmlrpc 2020-10-27 16:06:58 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196