Bug 1798215

Summary: Send apiserver request-in-flight metrics to telemeter
Product: OpenShift Container Platform
Component: openshift-apiserver
Version: 4.4
Target Release: 4.4.0
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Abu Kashem <akashem>
Assignee: Abu Kashem <akashem>
QA Contact: Ke Wang <kewang>
CC: aos-bugs, kewang, mfojtik, xxia
Clone Of: 1798214
Bug Depends On: 1798214, 1799057
Last Closed: 2020-05-04 11:33:13 UTC

Description Abu Kashem 2020-02-04 20:23:25 UTC
Send apiserver request-in-flight metrics to telemeter

We want to know how loaded our API servers are. The metric apiserver_current_inflight_requests lets us track the peak number of requests in flight over time.

Similar request for kube-apiserver: Bug #1798214

Comment 2 Ke Wang 2020-03-05 04:25:15 UTC
Verified in the following OCP environment:
$ oc version
Client Version: 4.4.0-202002282323-bc08a48
Server Version: 4.4.0-0.nightly-2020-03-04-143604
Kubernetes Version: v1.17.1

Verification steps:

1. Check that the code changes from PR https://github.com/openshift/cluster-openshift-apiserver-operator/pull/307 are in:
$ oc get ServiceMonitor -n openshift-kube-apiserver -o yaml
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  ...
      relabelings:
      - action: replace
        replacement: openshift-apiserver
        targetLabel: apiserver
  ...

$ oc get PrometheusRule -n openshift-kube-apiserver -o yaml
...
    - name: apiserver-requests-in-flight
      rules:
      - expr: |
          max_over_time(sum(apiserver_current_inflight_requests{apiserver=~"openshift-apiserver|kube-apiserver"}) by (apiserver,requestKind)[2m:])
        record: cluster:apiserver_current_inflight_requests:sum:max_over_time:2m
...
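
For reference, a minimal Python sketch of what the recording rule above appears to compute: sum the in-flight gauge per (apiserver, requestKind) at each timestamp, then take the maximum of those sums over a 2-minute window. The sample data, the "instance" label, and the helper names are made up for illustration, and the window boundaries are simplified compared to real Prometheus subquery evaluation.

```python
from collections import defaultdict


def sum_by(samples, keys):
    """Sum sample values per timestamp, grouped by the given label keys."""
    grouped = defaultdict(lambda: defaultdict(float))
    for ts, labels, value in samples:
        group = tuple(labels[k] for k in keys)
        grouped[group][ts] += value
    return grouped


def max_over_window(series, now, window):
    """Max of the summed values whose timestamp lies in [now - window, now]."""
    values = [v for ts, v in series.items() if now - window <= ts <= now]
    return max(values) if values else None


def recorded_value(samples, now, window=120):
    """Approximate max_over_time(sum(...) by (apiserver, requestKind)[2m:])."""
    keys = ("apiserver", "requestKind")
    return {
        group: max_over_window(series, now, window)
        for group, series in sum_by(samples, keys).items()
    }


def sample(ts, instance, value):
    """Build one made-up sample: (timestamp in seconds, labels, gauge value)."""
    labels = {
        "apiserver": "kube-apiserver",
        "requestKind": "readOnly",
        "instance": instance,  # hypothetical per-pod label
    }
    return (ts, labels, value)


# Two hypothetical instances scraped at t = 0s, 60s, and 120s.
samples = [
    sample(0, "a", 2), sample(0, "b", 3),
    sample(60, "a", 4), sample(60, "b", 2),
    sample(120, "a", 1), sample(120, "b", 1),
]

result = recorded_value(samples, now=120)
print(result)
```

The per-timestamp sums here are 5, 6, and 2, so the recorded value for the group is the peak, 6, rather than the latest value, 2. That matches the intent of looking at the peak of requests in flight.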

$ oc -n openshift-monitoring get cm telemetry-config -oyaml | grep "cluster:apiserver_current_inflight_requests:sum:max_over_time:2m" 
    # cluster:apiserver_current_inflight_requests:sum:max_over_time:2m gives maximum number of requests in flight
    - '{__name__="cluster:apiserver_current_inflight_requests:sum:max_over_time:2m"}'

The code changes are present as expected.

2. Check that the feature works with Metrics.

Open the OCP cluster web console, navigate to Monitoring -> Metrics in the left panel, enter 'cluster:apiserver_current_inflight_requests:sum:max_over_time:2m' in the query text area, and click 'Run Queries'.
Four series are displayed, two each for openshift-apiserver and kube-apiserver. The Value column shows the peak number of in-flight requests over the last 2 minutes.

Element                                                                                                                   Value
cluster:apiserver_current_inflight_requests:sum:max_over_time:2m{apiserver="kube-apiserver",requestKind="mutating"}       4
cluster:apiserver_current_inflight_requests:sum:max_over_time:2m{apiserver="kube-apiserver",requestKind="readOnly"}       6
cluster:apiserver_current_inflight_requests:sum:max_over_time:2m{apiserver="openshift-apiserver",requestKind="mutating"}  1
cluster:apiserver_current_inflight_requests:sum:max_over_time:2m{apiserver="openshift-apiserver",requestKind="readOnly"}  3

The feature works as expected.
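
As an alternative to the web console, the same recorded series could be fetched through the Prometheus HTTP API instant-query endpoint (/api/v1/query). The sketch below only builds the request URL. The base URL is a placeholder, not a real cluster route, and authentication (for example a bearer token) is assumed to be handled separately.

```python
from urllib.parse import urlencode

RECORDED_SERIES = "cluster:apiserver_current_inflight_requests:sum:max_over_time:2m"


def instant_query_url(base_url, apiserver=None):
    """Build a Prometheus instant-query URL for the recorded series.

    base_url is a hypothetical Prometheus endpoint; apiserver optionally
    restricts the selector to one value of the 'apiserver' label.
    """
    selector = RECORDED_SERIES
    if apiserver is not None:
        selector += '{apiserver="%s"}' % apiserver
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": selector})


# Placeholder host; substitute the cluster's actual Prometheus route.
url = instant_query_url("https://prometheus.example.invalid", "openshift-apiserver")
print(url)
```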

Comment 4 errata-xmlrpc 2020-05-04 11:33:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581