Bug 2007677
| Field | Value |
|---|---|
| Summary | Regression: core container io performance metrics are missing for pod, qos, and system slices on nodes |
| Product | OpenShift Container Platform |
| Component | Monitoring |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | high |
| Version | 4.9 |
| Target Release | 4.10.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Type | Bug |
| Reporter | Clayton Coleman <ccoleman> |
| Assignee | Philip Gough <pgough> |
| QA Contact | Junqi Zhao <juzhao> |
| CC | amuller, anpicker, aos-bugs, arajkuma, erooth, pgough |
| Last Closed | 2022-03-12 04:38:27 UTC |
| Bug Blocks | 2008120 |

Description
Clayton Coleman
2021-09-24 14:41:11 UTC
This applies to most container_* metrics that we decide to keep on the pod scope.

OK, so reviewing the drop rule:

    - action: drop
      regex: (container_fs_.*|container_spec_.*|container_blkio_device_usage_total|container_file_descriptors|container_sockets|container_threads_max|container_threads|container_start_time_seconds|container_last_seen);;
      sourceLabels:
      - __name__
      - pod
      - namespace

container_fs_* and container_blkio_device_usage_total are pod centric (the drop rule is wrong, and must change to drop only container series, relying on slice summarization from cgroups to cadvisor).

container_last_seen, container_start_time_seconds, container_threads_max, container_file_descriptors, and container_sockets are all container centric (the drop rule is correct, since cgroups doesn't sum these).

container_spec_* is probably no longer used (in a future release we can review and drop), but it is container centric and the drop rule is correct.

Checked with 4.10.0-0.nightly-2021-09-26-233013: searching for "count(container_fs_writes_total) by (id)" in Prometheus finds the core cadvisor metrics mentioned in Comment 0.

Moving back to assigned because of the discussion in https://github.com/openshift/cluster-monitoring-operator/pull/1395

Checked with 4.10.0-0.nightly-2021-09-28-220911:

    # oc -n openshift-monitoring get servicemonitor kubelet -oyaml
    ...
    metricRelabelings:
    - action: drop
      regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
      sourceLabels:
      - __name__
    - action: drop
      regex: (container_spec_.*|container_file_descriptors|container_sockets|container_threads_max|container_threads|container_start_time_seconds|container_last_seen);;
      sourceLabels:
      - __name__
      - pod
      - namespace
    - action: drop
      regex: (container_blkio_device_usage_total);.+
      sourceLabels:
      - __name__
      - container
    - action: drop
      regex: container_memory_failures_total
      sourceLabels:
      - __name__
    - action: drop
      regex: (container_fs_.*);.+
      sourceLabels:
      - __name__
      - container

Apart from the results mentioned in Comment 6 and Comment 7, there is no container label on the container_blkio_device_usage_total and container_fs_.* series:

    count(container_blkio_device_usage_total) by (container)
    {}    1770

    count(container_fs_writes_total) by (container)
    {}    337

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
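
The difference between the original drop rule and the rule in the later ServiceMonitor can be illustrated with a small sketch. It assumes the usual Prometheus relabeling semantics (source label values joined with `;`, missing labels treated as empty strings, regex anchored at both ends) and approximates them with Python's `re.fullmatch`. The helper `is_dropped` and the sample series are hypothetical and are not taken from this bug.

```python
import re


def is_dropped(regex, source_labels, series, separator=";"):
    """Approximate a Prometheus `drop` relabel rule (illustrative helper, not a real API)."""
    # A label that is absent contributes an empty string to the joined value.
    joined = separator.join(series.get(label, "") for label in source_labels)
    # Relabel regexes are anchored at both ends; re.fullmatch mimics that here.
    return re.fullmatch(regex, joined) is not None


# Original rule from the description: matches when pod and namespace are both empty.
OLD_REGEX = (
    r"(container_fs_.*|container_spec_.*|container_blkio_device_usage_total"
    r"|container_file_descriptors|container_sockets|container_threads_max"
    r"|container_threads|container_start_time_seconds|container_last_seen);;"
)
OLD_LABELS = ["__name__", "pod", "namespace"]

# Rule from the later ServiceMonitor: matches only when `container` is non-empty.
NEW_REGEX = r"(container_fs_.*);.+"
NEW_LABELS = ["__name__", "container"]

# Hypothetical sample series; the label values are made up for illustration.
slice_series = {"__name__": "container_fs_writes_total", "id": "/system.slice"}
container_series = {
    "__name__": "container_fs_writes_total",
    "pod": "example-pod",
    "namespace": "example-ns",
    "container": "example",
}

for name, series in (("slice", slice_series), ("container", container_series)):
    old = is_dropped(OLD_REGEX, OLD_LABELS, series)
    new = is_dropped(NEW_REGEX, NEW_LABELS, series)
    print(f"{name}: old rule drops={old}, new rule drops={new}")
```

Under these assumptions the old rule discards the slice-level series (no pod or namespace label) and keeps the per-container one, while the new rule does the opposite, which is the behavior the description asks for.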