Bug 1910140

Summary: fix the api dashboard with changes in upstream kube 1.20
Product: OpenShift Container Platform Reporter: Abu Kashem <akashem>
Component: kube-apiserver Assignee: Abu Kashem <akashem>
Status: CLOSED ERRATA QA Contact: Xingxing Xia <xxia>
Severity: high Docs Contact:
Priority: high    
Version: 4.7 CC: aos-bugs, kewang, mfojtik, sttts, xxia
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:48:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description | Flags
topk(20, histogram_quantile... displays more than 20 | none
50 occurrences of {{ | none
Rate of TLS Handshake Error has no legend name due to using empty by() | none
Request Rate by Resource and Verb screenshot: topk(20, sum(rate(apiserver_request_total{apiserver=\"$apiserver\"}[$period])) by(resource,verb)) | none

Description Abu Kashem 2020-12-22 19:51:07 UTC
Description of problem:

- adjust graph titles 
- update the dashboard with changes in kube 1.20
- add a couple of new metrics to the dashboard

Comment 2 Xingxing Xia 2021-01-18 11:54:15 UTC
I read the PR and see that it updated some metrics and added others, but testing 4.7.0-0.nightly-2021-01-17-211555 hit some more issues:
1. Some panels define "topk(20, ...", but display more than 20 series, see attachment.
2. There are no {{flowSchema}}:{{priorityLevel}} legend names now, but it still has other {{ legends such as {{resource}}-{{verb}}, {{component}}-{{resource}}, and {{group}}:{{kind}}. There are as many as 50 occurrences of {{, more than in the bug 1911173#c1 attachment, so bug 1911173 needs a separate fix and is not a DUP of this bug, see attachment.
3. Rate of TLS Handshake Error has no legend, see attachment.

Comment 3 Xingxing Xia 2021-01-18 11:56:22 UTC
Created attachment 1748437 [details]
topk(20, histogram_quantile... displays more than 20

Comment 4 Xingxing Xia 2021-01-18 11:57:50 UTC
Created attachment 1748439 [details]
50 occurrences of {{

Comment 5 Xingxing Xia 2021-01-18 12:01:12 UTC
Created attachment 1748441 [details]
Rate of TLS Handshake Error has no legend name due to using empty by()

Comment 6 Abu Kashem 2021-02-03 22:48:38 UTC
xxia

> Some places defined "topk(20, ...", but displayed more than 20, see attachment.
This is a known issue; topk does not seem to work well with histograms. We can look at it in 4.8. Do you want to open a separate BZ for this targeting 4.8?

> It has no {{flowSchema}}:{{priorityLevel}} legend names now, but still has other {{ like {{resource}}-{{verb}}, {{component}}-{{resource}}, {{group}}:{{kind}}, the {{ occurrences are as many as 50, more than bug 1911173#c1 attachment, so 1911173 needs separate fix, it is not DUP of this bug, see attachment.

Yeah, it seems to work fine in grafana, so probably a console dashboard issue.

> 3. Rate of TLS Handshake Error has no legend, see attachment.
it does not have any label.
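
To illustrate why: aggregating with an empty by() collapses every label, so the single resulting series has nothing to build a legend name from. A minimal Python sketch with hypothetical samples and helper names (a toy model, not Prometheus or console code):

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]
GroupKey = Tuple[Tuple[str, str], ...]

def sum_by(samples: List[Tuple[Labels, float]], by: Tuple[str, ...]) -> Dict[GroupKey, float]:
    """Toy version of sum(...) by(<labels>): keep only the `by` labels, add values."""
    groups: Dict[GroupKey, float] = {}
    for labels, value in samples:
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key] = groups.get(key, 0.0) + value
    return groups

def legend_name(key: GroupKey) -> str:
    """Hypothetical legend: label values joined with '-', empty if no labels."""
    return "-".join(value for _, value in key)

# Made-up per-instance error rates; the label name and values are assumptions.
samples = [
    ({"instance": "master-0"}, 1.5),
    ({"instance": "master-1"}, 2.5),
]

by_instance = sum_by(samples, ("instance",))  # one series per instance
by_nothing = sum_by(samples, ())              # empty by(): one series, no labels

names_by_instance = sorted(legend_name(k) for k in by_instance)
names_by_nothing = [legend_name(k) for k in by_nothing]
```

With by(instance) each series keeps a label to name it; with the empty by() the one remaining series has an empty label set, so its legend name is an empty string.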

Comment 7 Xingxing Xia 2021-02-04 03:30:07 UTC
Abu Kashem, OK, let me move this bug to VERIFIED.
> This is a known issue, topk ... Do you want to open a separate BZ for this targeting 4.8?
I'll contact Monitoring QE about it.

> it seems to work fine in grafana, so probably a console dashboard issue
I'm not sure how to check this in Grafana given that bug 1911182 is not fixed. I'm asking Monitoring QE how to manually provision a dashboard, and will move bug 1911173 to Management Console once confirmed.

> it does not have any label
I know. But it shows a small square legend bullet without a legend name, which is ** ugly **

Comment 8 Xingxing Xia 2021-02-07 10:21:27 UTC
Created attachment 1755448 [details]
Request Rate by Resource and Verb screenshot: topk(20, sum(rate(apiserver_request_total{apiserver=\"$apiserver\"}[$period])) by(resource,verb))

>> This is a known issue; topk does not seem to work well with histograms. We can look at it in 4.8. Do you want to open a separate BZ for this targeting 4.8?
> I'll contact Monitoring QE about it.
I contacted them. They pointed out that when hovering over the graph, the graph already correctly displays 20 series; only the legend does not.
I checked, and this happens in the Prometheus UI too.
I also checked topk with sum instead of a histogram, and it happens too, see attachment.
So the legend may work by design to show all scraped series.
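
This is consistent with how a range query evaluates topk: independently at every step, so the series returned over the whole range are the union of each step's winners. A minimal Python sketch of that, using made-up series names and values (a toy model, not Prometheus code):

```python
from typing import Dict, List, Set

def topk_at_step(values: Dict[str, float], k: int) -> Set[str]:
    """Return the names of the k series with the highest value at one step."""
    ranked = sorted(values, key=lambda name: values[name], reverse=True)
    return set(ranked[:k])

def series_in_range_result(steps: List[Dict[str, float]], k: int) -> Set[str]:
    """Union of the per-step winners: every series that would get a legend entry."""
    shown: Set[str] = set()
    for values in steps:
        shown |= topk_at_step(values, k)
    return shown

# Hypothetical request rates for three resource-verb pairs over three steps.
steps = [
    {"pods-GET": 9.0, "nodes-LIST": 5.0, "secrets-WATCH": 1.0},
    {"pods-GET": 2.0, "nodes-LIST": 8.0, "secrets-WATCH": 7.0},
    {"pods-GET": 6.0, "nodes-LIST": 1.0, "secrets-WATCH": 4.0},
]

per_step = [sorted(topk_at_step(values, 2)) for values in steps]
legend = sorted(series_in_range_result(steps, 2))
```

Here every step shows exactly 2 series, matching what hovering over the graph shows, while the legend lists all 3 because each of them was in the top 2 at some step.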

Comment 11 errata-xmlrpc 2021-02-24 15:48:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633