Bug 1955247 - Gaps in metrics in Prometheus data
Summary: Gaps in metrics in Prometheus data
Keywords:
Status: CLOSED DUPLICATE of bug 1950993
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Sergiusz Urbaniak
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-29 18:29 UTC by Alex Krzos
Modified: 2021-04-30 09:08 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-30 09:08:16 UTC
Target Upstream Version:
Embargoed:


Attachments
Grafana showing gaps in data (420.40 KB, image/png)
2021-04-29 18:29 UTC, Alex Krzos
Prometheus showing gaps in data (620.52 KB, image/png)
2021-04-29 18:31 UTC, Alex Krzos

Description Alex Krzos 2021-04-29 18:29:14 UTC
Created attachment 1777314
Grafana showing gaps in data

Description of problem:
While reviewing pod metrics to check for issues with tests we are running, we found gaps in the collected metrics. The gaps appear mainly on worker nodes, but master nodes also seem to have some.

Version-Release number of selected component (if applicable):
4.8.0.fc.1

How reproducible:
If we rebuild the cluster, I will recheck whether the issue shows up again.


Steps to Reproduce:
1. Deploy a bare-metal cluster with the Assisted Installer (it is unclear whether the Assisted Installer has anything to do with the issue; it is mentioned only because that is how this cluster was built).
2. Review metrics in Prometheus (for example, container_memory_working_set_bytes).

Actual results:
Gaps appear in the data (the attached screenshots show the gaps in both Prometheus and Grafana).
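
To quantify the gaps rather than reading them off a graph, something like the following could be used. It is only a rough sketch: it assumes the series has already been fetched as the "values" array of a Prometheus range-query result (pairs of timestamp and string value), and the 30-second scrape interval, the tolerance factor, the function name find_gaps and the sample numbers are all made up for illustration, not taken from this cluster.

```python
def find_gaps(values, step_seconds, tolerance=1.5):
    """Return (last_seen, next_seen) timestamp pairs around each gap.

    A gap is reported when two consecutive samples are further apart
    than tolerance * step_seconds. Both parameters are assumptions and
    should be adjusted to the real scrape interval.
    """
    gaps = []
    for (prev_ts, _), (next_ts, _) in zip(values, values[1:]):
        if next_ts - prev_ts > tolerance * step_seconds:
            gaps.append((prev_ts, next_ts))
    return gaps


# Hypothetical samples in the [timestamp, "value"] shape of a range query.
sample_series = [
    [1619717000, "1048576"],
    [1619717030, "1048576"],
    [1619717060, "1050000"],
    [1619717240, "1050000"],  # 180 s after the previous sample
    [1619717270, "1051000"],
]

for start, end in find_gaps(sample_series, step_seconds=30):
    print(f"gap of {end - start} s between {start} and {end}")
```

Running this per node over container_memory_working_set_bytes might help confirm whether workers are affected more than masters.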



Expected results:


Additional info:

An initial look at the logs found only a few errors from the kubelets, which seem to suggest problems with cadvisor collecting metrics.

Examples:

Apr 29 17:20:33 f19-h03-000-r640 hyperkube[3689]: E0429 17:20:33.611504    3689 cadvisor_stats_provider.go:415] "Partial failure issuing cadvisor.ContainerInfoV2" err="partial failures: [\"/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podab6d0823_1b04_4f1e_86b9_eefc839041a0.slice/crio-9b80aa737caad2147e903311140991cbec9034ed47b2d825d1efb6bb6498a021.scope\": RecentStats: unable to find data in memory cache], [\"/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podab6d0823_1b04_4f1e_86b9_eefc839041a0.slice/crio-eab74d53bad7a5ae49d87844c8139c7f0d871f14be024eff9b39cc99ae47ffe9.scope\": RecentStats: unable to find data in memory cache]"
Apr 29 17:20:33 f19-h03-000-r640 hyperkube[3689]: E0429 17:20:33.611599    3689 cadvisor_stats_provider.go:415] "Partial failure issuing cadvisor.ContainerInfoV2" err="partial failures: [\"/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podab6d0823_1b04_4f1e_86b9_eefc839041a0.slice/crio-eab74d53bad7a5ae49d87844c8139c7f0d871f14be024eff9b39cc99ae47ffe9.scope\": RecentStats: unable to find data in memory cache], [\"/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podab6d0823_1b04_4f1e_86b9_eefc839041a0.slice/crio-9b80aa737caad2147e903311140991cbec9034ed47b2d825d1efb6bb6498a021.scope\": RecentStats: unable to find data in memory cache]"


Apr 29 18:28:04 f19-h03-000-r640 hyperkube[3689]: E0429 18:28:04.807172    3689 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/6c49e7df-ffa3-462c-8bb5-45cda6c3d0c0/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-4z578"
Apr 29 18:28:14 f19-h03-000-r640 hyperkube[3689]: E0429 18:28:14.981119    3689 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/6c49e7df-ffa3-462c-8bb5-45cda6c3d0c0/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-4z578"
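
To see how often each of these kubelet errors occurs, the journal lines could be tallied by source location and message. This is a minimal sketch, not a tested tool: the regular expression is an assumption based on the line layout shown above, the names KLOG_RE and count_errors are hypothetical, and the sample lines are the ones from this report with the err field shortened to "...".

```python
import re
from collections import Counter

# Assumed layout: <level><MMDD> <HH:MM:SS.micros> <pid> <file>:<line>] "<message>"
KLOG_RE = re.compile(
    r'(?P<level>[IWEF])\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+ '
    r'(?P<source>[\w.]+:\d+)\] "(?P<message>[^"]*)"'
)


def count_errors(lines):
    """Count error-level lines, keyed by (source location, message)."""
    counts = Counter()
    for line in lines:
        match = KLOG_RE.search(line)
        if match and match.group("level") == "E":
            counts[(match.group("source"), match.group("message"))] += 1
    return counts


# Lines from this report, with the err="..." payload shortened.
sample_lines = [
    'Apr 29 17:20:33 f19-h03-000-r640 hyperkube[3689]: E0429 17:20:33.611504    3689 cadvisor_stats_provider.go:415] "Partial failure issuing cadvisor.ContainerInfoV2" err="..."',
    'Apr 29 17:20:33 f19-h03-000-r640 hyperkube[3689]: E0429 17:20:33.611599    3689 cadvisor_stats_provider.go:415] "Partial failure issuing cadvisor.ContainerInfoV2" err="..."',
    'Apr 29 18:28:04 f19-h03-000-r640 hyperkube[3689]: E0429 18:28:04.807172    3689 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="..."',
    'Apr 29 18:28:14 f19-h03-000-r640 hyperkube[3689]: E0429 18:28:14.981119    3689 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="..."',
]

for (source, message), count in count_errors(sample_lines).items():
    print(f"{count:4d}  {source}  {message}")
```

Comparing the timestamps of the counted errors against the gap windows may show whether the cadvisor failures line up with the missing samples.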

Comment 1 Alex Krzos 2021-04-29 18:31:38 UTC
Created attachment 1777315
Prometheus showing gaps in data

Comment 3 Simon Pasquier 2021-04-30 09:08:16 UTC
This looks very similar to bug 1950993 (see comment [1]). Closing as a DUPLICATE, feel free to reopen if you disagree.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1950993#c4

*** This bug has been marked as a duplicate of bug 1950993 ***

