Bug 1913543

Summary: backport: cadvisor machine metrics are missing in k8s 1.19
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Node
Assignee: Elana Hashman <ehashman>
Node sub component: Kubelet
QA Contact: Weinan Liu <weinliu>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: medium
CC: aos-bugs, tsweeney
Version: 4.6
Target Milestone: ---
Target Release: 4.6.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-03-09 20:16:08 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1913096
Bug Blocks:

Description OpenShift BugZilla Robot 2021-01-07 03:49:10 UTC
+++ This bug was initially created as a clone of Bug #1913096 +++

See upstream bug: https://github.com/kubernetes/kubernetes/issues/95204


Description of problem:

Machine metrics from cadvisor are missing in Kubernetes 1.19+ (OpenShift 4.6+).

I believe OpenShift does not use the machine_* metrics to calculate machine resource stats, instead relying on the stable metrics provided by kube-state-metrics: https://github.com/kubernetes/kubernetes/issues/95204#issuecomment-719445180

However, it is possible our customers are using these metrics directly.


Version-Release number of selected component (if applicable): 4.6+


How reproducible:

Expected output from a metrics query:

# kubectl get --raw "/api/v1/nodes/NODE_NAME/proxy/metrics/cadvisor" | grep -i machine_cpu_cores
# HELP machine_cpu_cores Number of CPU cores on the machine.
# TYPE machine_cpu_cores gauge
machine_cpu_cores 4

Actual output: 

No matching metrics, as these metrics are not produced.
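
A minimal sketch for scripting the check above, assuming a POSIX shell with grep. The function name missing_metrics is hypothetical and not part of kubectl or cadvisor. It reads metrics text on stdin and prints each requested metric name that has no sample line, so empty output means every requested metric is present.

```shell
# Hypothetical helper: reads Prometheus-style metrics text on stdin and
# prints every metric name given as an argument that has no sample line.
missing_metrics() {
    input=$(cat)
    for name in "$@"; do
        # A sample line starts with the metric name followed by a space
        # (no labels) or "{" (labels). "# HELP" and "# TYPE" lines start
        # with "#", so they are not counted as samples.
        if ! printf '%s\n' "$input" | grep -q "^${name}[ {]"; then
            printf '%s\n' "$name"
        fi
    done
}
```

Usage, with NODE_NAME as a placeholder as in the query above:

# kubectl get --raw "/api/v1/nodes/NODE_NAME/proxy/metrics/cadvisor" | missing_metrics machine_cpu_cores

On an affected node this would be expected to print machine_cpu_cores; on a fixed node it should print nothing.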


Additional info:

Patch here: https://github.com/kubernetes/kubernetes/pull/97006

Backport for 1.19: https://github.com/kubernetes/kubernetes/pull/97692
Backport for 1.20: https://github.com/kubernetes/kubernetes/pull/97691

Comment 3 Elana Hashman 2021-02-25 16:26:48 UTC
Bumped severity - cadvisor machine metrics will be totally missing on a 4.5 -> 4.6 upgrade without this.

Comment 11 errata-xmlrpc 2021-03-09 20:16:08 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.20 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0674