Bug 1913096

Summary: backport: cadvisor machine metrics are missing in k8s 1.19
Product: OpenShift Container Platform Reporter: Elana Hashman <ehashman>
Component: Node Assignee: Elana Hashman <ehashman>
Node sub component: Kubelet QA Contact: Weinan Liu <weinliu>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: aos-bugs, weinliu
Version: 4.6   
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: https://github.com/kubernetes/kubernetes/issues/95204
Consequence: machine_* metrics from cadvisor disappeared
Fix: https://github.com/kubernetes/kubernetes/pull/97006
Result: metrics are restored
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:50:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1913543    

Description Elana Hashman 2021-01-05 23:59:07 UTC
See upstream bug: https://github.com/kubernetes/kubernetes/issues/95204


Description of problem:

Machine metrics from cadvisor are missing in Kubernetes 1.19+ (OpenShift 4.6+).

I believe OpenShift does not use the machine_* metrics to calculate machine resource stats, instead relying on the stable metrics provided by kube-state-metrics: https://github.com/kubernetes/kubernetes/issues/95204#issuecomment-719445180

However, it is possible our customers are using these metrics directly.


Version-Release number of selected component (if applicable): 4.6+


How reproducible:

Expected output from a metrics query:

# kubectl get --raw "/api/v1/nodes/NODE_NAME/proxy/metrics/cadvisor" | grep -i machine_cpu_cores
# HELP machine_cpu_cores Number of CPU cores on the machine.
# TYPE machine_cpu_cores gauge
machine_cpu_cores 4

Actual output: 

No matching metrics, as these metrics are not produced.
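
To check for the whole machine_* family rather than a single metric, something like the following may help. It is only a sketch: has_machine_metrics is a hypothetical helper name, not part of kubectl or cadvisor, and it assumes the endpoint returns the usual Prometheus text format. Unlike the plain grep above, it ignores the "# HELP" and "# TYPE" comment lines and only counts actual sample lines.

```shell
# Hypothetical helper (not part of any tool): succeeds when stdin contains
# at least one machine_* sample line. Comment lines starting with '#'
# do not match because the pattern is anchored at the start of the line.
has_machine_metrics() {
  grep -Eq '^machine_[a-zA-Z0-9_]+(\{| )'
}

# Usage against a live cluster (NODE_NAME must be set by the caller):
# kubectl get --raw "/api/v1/nodes/$NODE_NAME/proxy/metrics/cadvisor" \
#   | has_machine_metrics && echo "machine metrics present" \
#   || echo "machine metrics missing"
```

On an affected node the helper would be expected to report the metrics as missing.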


Additional info:

Patch here: https://github.com/kubernetes/kubernetes/pull/97006

Backport for 1.19: https://github.com/kubernetes/kubernetes/pull/97692
Backport for 1.20: https://github.com/kubernetes/kubernetes/pull/97691

Comment 2 Weinan Liu 2021-01-15 09:28:03 UTC
$ kubectl get --raw "/api/v1/nodes/$NODE_NAME/proxy/metrics/cadvisor" | grep -i machine_cpu_cores
# HELP machine_cpu_cores Number of logical CPU cores.
# TYPE machine_cpu_cores gauge
machine_cpu_cores{boot_id="6e0992fd-4663-4453-93c7-1e19ecb5f2c5",machine_id="ec254bc850b75418b882621232566930",system_uuid="ec254bc8-50b7-5418-b882-621232566930"} 4

Verified to be fixed on 4.7.0-0.nightly-2021-01-13-124141
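
Note that the restored sample carries boot_id, machine_id and system_uuid labels, while the expected output in the description shows an unlabeled sample. For anyone scraping the value directly, a rough sketch that accepts either form is below. machine_cpu_cores_value is a hypothetical helper name, and the sketch assumes label values contain no "}" character.

```shell
# Hypothetical helper: prints the value of machine_cpu_cores from
# Prometheus text on stdin, whether or not the sample has a label set.
machine_cpu_cores_value() {
  # Skip the optional {label="..."} block, then print the first field
  # after the metric name. Comment lines ('# HELP', '# TYPE') never match.
  sed -n -E 's/^machine_cpu_cores(\{[^}]*\})? ([^ ]+).*$/\2/p'
}

# Usage against a live cluster (NODE_NAME must be set by the caller):
# kubectl get --raw "/api/v1/nodes/$NODE_NAME/proxy/metrics/cadvisor" \
#   | machine_cpu_cores_value
```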

Comment 5 errata-xmlrpc 2021-02-24 15:50:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633