Description of problem:

Previously [1] we attempted the same change, but a misunderstanding of the kubelet garbage collection behavior meant the alert could still fire before GC comes into play. According to [2][3], kubelet image GC only kicks in once `imageGCHighThresholdPercent` is reached, which is 85% by default. However, `NodeFilesystemSpaceFillingUp` is set to fire as soon as 80% usage is hit. A quick check of a node's effective threshold is sketched under Additional info below.

[1] https://github.com/prometheus-operator/kube-prometheus/pull/1357
[2] https://docs.openshift.com/container-platform/4.10/nodes/nodes/nodes-nodes-garbage-collection.html#nodes-nodes-garbage-collection-images_nodes-nodes-configuring
[3] https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
NodeFilesystemSpaceFillingUp fires before kubelet GC kicks in.

Expected results:
NodeFilesystemSpaceFillingUp should not fire before kubelet GC kicks in.

Additional info:
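A minimal sketch for confirming the kubelet's effective imageGCHighThresholdPercent on a node, assuming cluster-admin access and that the first node is representative (the node selection and jq filter are just examples):

% node=$(oc get nodes -o jsonpath='{.items[0].metadata.name}')
% oc get --raw "/api/v1/nodes/$node/proxy/configz" | jq '.kubeletconfig.imageGCHighThresholdPercent'

This should report 85 unless a KubeletConfig has overridden the default.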
Waiting for the PR to land in a payload.
Tested with payload 4.11.0-0.nightly-2022-04-23-153426:

% host=$(oc -n openshift-monitoring get route thanos-querier -ojsonpath={.spec.host})
% token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
% curl -H "Authorization: Bearer $token" -k "https://$host/api/v1/rules" | jq | grep -A10 NodeFilesystemSpaceFillingUp
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  252k    0  252k    0     0   140k      0 --:--:-- 0:00:01 --:--:--  140k
        "name": "NodeFilesystemSpaceFillingUp",
        "query": "(node_filesystem_avail_bytes{fstype!=\"\",job=\"node-exporter\"} / node_filesystem_size_bytes{fstype!=\"\",job=\"node-exporter\"} * 100 < 10 and predict_linear(node_filesystem_avail_bytes{fstype!=\"\",job=\"node-exporter\"}[6h], 4 * 60 * 60) < 0 and node_filesystem_readonly{fstype!=\"\",job=\"node-exporter\"} == 0)",
        "duration": 3600,
        "labels": {
          "prometheus": "openshift-monitoring/k8s",
          "severity": "critical"
        },
        "annotations": {
          "description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available space left and is filling up fast.",
          "runbook_url": "https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/NodeFilesystemSpaceFillingUp.md",
          "summary": "Filesystem is predicted to run out of space within the next 4 hours."
        },
        "alerts": [],
        "health": "ok",
        "evaluationTime": 0.00215177,
        "lastEvaluation": "2022-04-24T02:15:36.216317682Z",
        "type": "alerting"
      },
      {
        "state": "inactive",
        "name": "NodeFilesystemSpaceFillingUp",
        "query": "(node_filesystem_avail_bytes{fstype!=\"\",job=\"node-exporter\"} / node_filesystem_size_bytes{fstype!=\"\",job=\"node-exporter\"} * 100 < 15 and predict_linear(node_filesystem_avail_bytes{fstype!=\"\",job=\"node-exporter\"}[6h], 24 * 60 * 60) < 0 and node_filesystem_readonly{fstype!=\"\",job=\"node-exporter\"} == 0)",
        "duration": 3600,
        "labels": {
          "prometheus": "openshift-monitoring/k8s",
          "severity": "warning"
        },
        "annotations": {
          "description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available space left and is filling up.",
          "runbook_url": "https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/NodeFilesystemSpaceFillingUp.md",
          "summary": "Filesystem is predicted to run out of space within the next 24 hours."
        },
        "alerts": [],
        "health": "ok",
        "evaluationTime": 0.002492956,
        "lastEvaluation": "2022-04-24T02:15:36.21382262Z",
        "type": "alerting"
      },
      {
        "state": "inactive",
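With the fix, the rules only match when available space drops below 15% (warning) or 10% (critical), i.e. above 85% / 90% usage, so neither rule can fire before the default imageGCHighThresholdPercent of 85% is reached. To cross-check current usage against those thresholds, the same route and token can be reused with the instant query API; a minimal sketch, where the jq projection is only illustrative:

% curl -sk -H "Authorization: Bearer $token" "https://$host/api/v1/query" \
    --data-urlencode 'query=100 - node_filesystem_avail_bytes{fstype!="",job="node-exporter"} / node_filesystem_size_bytes{fstype!="",job="node-exporter"} * 100' \
    | jq '.data.result[] | {instance: .metric.instance, mountpoint: .metric.mountpoint, used_pct: .value[1]}'

Any filesystem reporting less than 85% used should leave both NodeFilesystemSpaceFillingUp rules inactive.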
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069