Bug 1913096 - backport: cadvisor machine metrics are missing in k8s 1.19
Summary: backport: cadvisor machine metrics are missing in k8s 1.19
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Elana Hashman
QA Contact: Weinan Liu
URL:
Whiteboard:
Depends On:
Blocks: 1913543
Reported: 2021-01-05 23:59 UTC by Elana Hashman
Modified: 2021-02-24 15:50 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: https://github.com/kubernetes/kubernetes/issues/95204
Consequence: machine_* metrics from cadvisor disappeared.
Fix: https://github.com/kubernetes/kubernetes/pull/97006
Result: Metrics are restored.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:50:15 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift kubernetes pull 507 0 None closed Bug 1913096: UPSTREAM: 97006: kubelet: Fix cadvisor machine metrics 2021-02-01 19:06:43 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:50:37 UTC

Description Elana Hashman 2021-01-05 23:59:07 UTC
See upstream bug: https://github.com/kubernetes/kubernetes/issues/95204


Description of problem:

Machine metrics from cadvisor are missing in Kubernetes 1.19+ (OpenShift 4.6+).

I believe OpenShift does not use the machine_* metrics to calculate machine resource stats, instead relying on the stable metrics provided by kube-state-metrics: https://github.com/kubernetes/kubernetes/issues/95204#issuecomment-719445180

However, it is possible our customers are using these metrics directly.


Version-Release number of selected component (if applicable): 4.6+


How reproducible:

Expected output from a metrics query:

# kubectl get --raw "/api/v1/nodes/NODE_NAME/proxy/metrics/cadvisor" | grep -i machine_cpu_cores
# HELP machine_cpu_cores Number of CPU cores on the machine.
# TYPE machine_cpu_cores gauge
machine_cpu_cores 4

Actual output: 

No matching metrics, as these metrics are not produced.
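
The check above can be scripted so that it gives a clear pass/fail result instead of empty grep output. This is a minimal sketch, not an official tool: the function name has_machine_metrics is hypothetical, and NODE_NAME is assumed to be set to a real node name by the caller. It only looks for metric sample lines, so HELP and TYPE comment lines alone do not count as a match.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl or cadvisor): reads cadvisor
# metrics text on stdin and succeeds only if at least one machine_* sample
# line is present. Sample lines begin with the metric name, followed by
# either a space or an opening brace for labels; HELP/TYPE lines begin
# with '#' and are therefore ignored.
has_machine_metrics() {
    grep -q '^machine_[a-z_]*[ {]'
}

# Run against a live node only when NODE_NAME is set (assumption: the
# caller exports it and kubectl is configured for the cluster).
if [ -n "${NODE_NAME:-}" ]; then
    if kubectl get --raw "/api/v1/nodes/${NODE_NAME}/proxy/metrics/cadvisor" \
        | has_machine_metrics; then
        echo "machine_* metrics present"
    else
        echo "machine_* metrics missing"
    fi
fi
```

On an affected 4.6 node this would be expected to report the metrics as missing, and on a fixed build to report them as present.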


Additional info:

Patch here: https://github.com/kubernetes/kubernetes/pull/97006

Backport for 1.19: https://github.com/kubernetes/kubernetes/pull/97692
Backport for 1.20: https://github.com/kubernetes/kubernetes/pull/97691

Comment 2 Weinan Liu 2021-01-15 09:28:03 UTC
$ kubectl get --raw "/api/v1/nodes/$NODE_NAME/proxy/metrics/cadvisor" | grep -i machine_cpu_cores
# HELP machine_cpu_cores Number of logical CPU cores.
# TYPE machine_cpu_cores gauge
machine_cpu_cores{boot_id="6e0992fd-4663-4453-93c7-1e19ecb5f2c5",machine_id="ec254bc850b75418b882621232566930",system_uuid="ec254bc8-50b7-5418-b882-621232566930"} 4

Verified as fixed on 4.7.0-0.nightly-2021-01-13-124141.

Comment 5 errata-xmlrpc 2021-02-24 15:50:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

