Description of problem:
Currently the CPU utilization for nodes and containers/pods is computed (in a complex and error-prone way) from the CPU time and the number of cores of each machine. We should switch to the following metrics (draft, subject to change):
- cpu/node_utilization for the percentage of CPU usage on nodes
- cpu/usage_rate for the millicores used by nodes, containers and pods

Using these metrics instead of complex computations gives higher precision and more reliability, because the result no longer depends on the number of cores of a node or on changes to that number.

It will also get rid of spurious errors that show up from time to time as a side effect of the complex computation and its dependency on node cores:

WARN -- : ManageIQ::Providers::Kubernetes::ContainerManager::ContainerNode name: [ocp-c07-node02.10.35.48.141.nip.io], id: [2] Timestamp: [2017-05-23T08:00:20Z], Column [cpu_usage_rate_average]: 'percent value 103.94 is out of range, resetting to 100.0'
ERROR -- : MIQ(ManageIQ::Providers::Kubernetes::ContainerManager::MetricsCapture#perf_collect_metrics) ContainerGroup(1000000019955) is not valid: Validation error: cores not defined

Moving to these two new metrics depends on the DB schema (we probably need extra columns to store this information) and affects chargeback reports. In any case, the change can be made backward-compatible.
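To illustrate why a derived percentage can drift out of range, here is a minimal Python sketch. It assumes the old path divides the CPU-time delta (in nanoseconds) by the elapsed wall time and by the node's core count; the function names (`percent_from_cpu_time`, `clamp_percent`, `millicores_to_cores`) are hypothetical and are not ManageIQ code. A 124.728 s CPU-time delta over an assumed 60 s window on a 2-core node yields 103.94%, the kind of value the warning above resets to 100.0. A missing core count fails outright, much like the "cores not defined" validation error.

```python
# Hypothetical helpers; a simplified sketch, not the actual ManageIQ implementation.

NANOSECONDS_PER_SECOND = 1_000_000_000


def percent_from_cpu_time(cpu_ns_start, cpu_ns_end, interval_s, cores):
    """Assumed old-style derivation: CPU-time delta / (wall time * cores) * 100."""
    if not cores:
        # Mirrors the "cores not defined" failure: no core count, no percentage.
        raise ValueError("cores not defined")
    used_core_seconds = (cpu_ns_end - cpu_ns_start) / NANOSECONDS_PER_SECOND
    # If the sample window does not match interval_s, or the core count is stale,
    # this ratio can exceed 100.
    return 100.0 * used_core_seconds / (interval_s * cores)


def clamp_percent(value):
    """Reset out-of-range values into [0, 100], like the warning in the log."""
    return max(0.0, min(100.0, value))


def millicores_to_cores(usage_rate_millicores):
    """cpu/usage_rate is reported in millicores; 1000 millicores = 1 core."""
    return usage_rate_millicores / 1000.0


if __name__ == "__main__":
    pct = percent_from_cpu_time(0, 124_728_000_000, 60, 2)
    print(round(pct, 2), clamp_percent(pct), millicores_to_cores(1500))
```

With cpu/usage_rate and cpu/node_utilization the values arrive already normalized, so neither the core count nor the interval alignment enters the calculation on the ManageIQ side.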
Submitted upstream: https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/47
Submitted upstream: https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/159 (under development). The old patch is deprecated: https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/47 (closed).
Merged upstream: https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/159