Bug 1955247

Summary: Gaps in metrics in Prometheus data
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.8
Status: CLOSED DUPLICATE
Reporter: Alex Krzos <akrzos>
Assignee: Sergiusz Urbaniak <surbania>
QA Contact: Junqi Zhao <juzhao>
CC: alegrand, anpicker, erooth, kakkoyun, lcosic, pkrupa, spasquie, surbania
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2021-04-30 09:08:16 UTC
Attachments:
  1777314: Grafana showing gaps in data (flags: none)
  1777315: Prometheus showing gaps in data (flags: none)

Description Alex Krzos 2021-04-29 18:29:14 UTC
Created attachment 1777314
Grafana showing gaps in data

Description of problem:
While reviewing pod metrics for issues during some testing we are running, we found gaps in the collected metrics, mainly from worker nodes. Master nodes also appear to have some gaps.

Version-Release number of selected component (if applicable):
4.8.0.fc.1

How reproducible:
If we rebuild the cluster, I will recheck whether the issue shows up again.


Steps to Reproduce:
1. Deploy a bare metal cluster with the Assisted Installer. (It is unclear whether the Assisted Installer has anything to do with the issue; I mention it only because that is how we built this cluster.)
2. Review the metrics in Prometheus (e.g. container_memory_working_set_bytes).
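Step 2 can also be checked programmatically instead of by eye. The following is a minimal sketch, assuming Python 3 and the standard Prometheus HTTP API endpoint `/api/v1/query_range`. It flags intervals where consecutive samples of a series are more than 1.5 steps apart. The base URL, the 1.5x tolerance, and the function names are illustrative assumptions, not part of the original report.

```python
import json
from urllib.parse import urlencode


def query_range_url(base_url, query, start, end, step):
    """Build a Prometheus /api/v1/query_range URL (base_url is a placeholder)."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return base_url.rstrip("/") + "/api/v1/query_range?" + params


def find_gaps(values, step, tolerance=1.5):
    """Return (start, end) pairs where consecutive samples are further apart
    than tolerance * step seconds. `values` is a series' [[ts, "val"], ...] list.
    The tolerance absorbs small jitter in fractional timestamps (an assumption)."""
    gaps = []
    for (prev_ts, _), (ts, _) in zip(values, values[1:]):
        if float(ts) - float(prev_ts) > tolerance * step:
            gaps.append((float(prev_ts), float(ts)))
    return gaps


def gaps_per_series(response_body, step):
    """Map each series' sorted label pairs to its gaps, for a JSON response
    whose resultType is "matrix". Series without gaps are omitted."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error"))
    gaps_by_series = {}
    for series in payload["data"]["result"]:
        labels = tuple(sorted(series["metric"].items()))
        gaps = find_gaps(series["values"], step)
        if gaps:
            gaps_by_series[labels] = gaps
    return gaps_by_series
```

The result is keyed by label set, which makes it easy to see whether the gaps cluster on particular nodes (mostly workers, with some masters, in this report). The same function also accepts the matrix returned by an instant query for a range selector such as `container_memory_working_set_bytes[1h]`, which returns raw samples. In that case, pass the scrape interval as `step`.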

Actual results:
Gaps are visible in the data (the attached screenshots show the gaps in Prometheus and Grafana).



Expected results:
Metrics are collected continuously, with no gaps.

Additional info:

An initial review of the logs found only a few errors from the kubelets, which seem to suggest that cAdvisor is having problems collecting metrics.

Examples:

Apr 29 17:20:33 f19-h03-000-r640 hyperkube[3689]: E0429 17:20:33.611504    3689 cadvisor_stats_provider.go:415] "Partial failure issuing cadvisor.ContainerInfoV2" err="partial failures: [\"/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podab6d0823_1b04_4f1e_86b9_eefc839041a0.slice/crio-9b80aa737caad2147e903311140991cbec9034ed47b2d825d1efb6bb6498a021.scope\": RecentStats: unable to find data in memory cache], [\"/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podab6d0823_1b04_4f1e_86b9_eefc839041a0.slice/crio-eab74d53bad7a5ae49d87844c8139c7f0d871f14be024eff9b39cc99ae47ffe9.scope\": RecentStats: unable to find data in memory cache]"
Apr 29 17:20:33 f19-h03-000-r640 hyperkube[3689]: E0429 17:20:33.611599    3689 cadvisor_stats_provider.go:415] "Partial failure issuing cadvisor.ContainerInfoV2" err="partial failures: [\"/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podab6d0823_1b04_4f1e_86b9_eefc839041a0.slice/crio-eab74d53bad7a5ae49d87844c8139c7f0d871f14be024eff9b39cc99ae47ffe9.scope\": RecentStats: unable to find data in memory cache], [\"/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podab6d0823_1b04_4f1e_86b9_eefc839041a0.slice/crio-9b80aa737caad2147e903311140991cbec9034ed47b2d825d1efb6bb6498a021.scope\": RecentStats: unable to find data in memory cache]"


Apr 29 18:28:04 f19-h03-000-r640 hyperkube[3689]: E0429 18:28:04.807172    3689 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/6c49e7df-ffa3-462c-8bb5-45cda6c3d0c0/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-4z578"
Apr 29 18:28:14 f19-h03-000-r640 hyperkube[3689]: E0429 18:28:14.981119    3689 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/6c49e7df-ffa3-462c-8bb5-45cda6c3d0c0/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-4z578"
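When a node produces many lines like these, it can help to tally them by message and to collect the affected CRI-O container IDs. The following is a minimal sketch, assuming Python 3 and kubelet lines in the klog format shown above, where the message is quoted after the `file.go:line]` header. The regular expressions and the `summarize` name are illustrative assumptions, not an existing tool.

```python
import re
from collections import Counter

# The klog message follows the "file.go:line]" header, in double quotes.
MSG_RE = re.compile(r'\w+\.go:\d+\] "([^"]+)"')
# CRI-O container cgroups appear as ".../crio-<64 hex chars>.scope".
CRIO_SCOPE_RE = re.compile(r'crio-([0-9a-f]{64})\.scope')


def summarize(lines):
    """Count kubelet error messages and collect the CRI-O container IDs
    mentioned in them. Returns (Counter of messages, set of container IDs)."""
    messages = Counter()
    containers = set()
    for line in lines:
        match = MSG_RE.search(line)
        if match:
            messages[match.group(1)] += 1
        containers.update(CRIO_SCOPE_RE.findall(line))
    return messages, containers
```

For example, the output of `oc adm node-logs <node> -u kubelet` could be saved to a file and passed to `summarize` as an iterable of lines. This shows whether the "RecentStats: unable to find data in memory cache" failures are limited to a few containers or are widespread.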

Comment 1 Alex Krzos 2021-04-29 18:31:38 UTC
Created attachment 1777315
Prometheus showing gaps in data

Comment 3 Simon Pasquier 2021-04-30 09:08:16 UTC
This looks very similar to bug 1950993 (see comment [1]). Closing as a DUPLICATE; feel free to reopen if you disagree.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1950993#c4

*** This bug has been marked as a duplicate of bug 1950993 ***