Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1852355

Summary: Missing namespaces in cpu report and memory report
Product: OpenShift Container Platform
Reporter: David <dsquirre>
Component: Metering Operator
Assignee: Emily Moss <emoss>
Status: CLOSED WONTFIX
QA Contact: Peter Ruan <pruan>
Severity: medium
Docs Contact:
Priority: high
Version: 4.3.z
CC: amdas, btofel, emoss, pweil, sd-operator-metering, tflannag, zhigwang
Target Milestone: ---   
Target Release: 4.3.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1872425 (view as bug list)
Environment:
Last Closed: 2020-09-18 20:59:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1875042    
Bug Blocks:    
Attachments:
Description (Flags)
OpenShift Metering Report (none)
report-proc (none)
ng-report (none)
Image describing samples operator namespace (none)

Description David 2020-06-30 08:43:39 UTC
Description of problem:  Two reports, one for cpu and one for memory, that have been run for the same period do not show exactly the same namespaces.

For example, openshift-cluster-samples-operator is missing from the memory report, while openshift-kube-scheduler, openshift-kube-scheduler-operator and openshift-service-catalog-apiserver-operator are missing from the cpu report.


Version-Release number of selected component (if applicable):  OSDv4.3 or OCPv4.3.19


How reproducible: Not reproducible by me: 1) I don't have an environment to reproduce it in, 2) it might be related to specific CU data and only reproducible on the CU cluster.

So I am not sure if I would be able to reproduce the issue.


Steps to Reproduce: N/a

Actual results:  Missing namespaces


Expected results: The namespaces in both reports, run for the same period, should match.  I assume that if a namespace is consuming memory it must be consuming CPU at the same time, or vice versa, based on the fact that a pod must consume both memory and cpu when running.


Additional info:
Spreadsheet attached with both reports included.

The CU and I would have expected that a namespace would consume both cpu and memory at the same time and not one OR the other.

One thing that I can think of that would cause this is that the reports were generated at different times or only for a short amount of time.  But this is not the case.  

Another thought is that the Metering Operator was only just switched on, so the data may not truly reflect a full 24 hours.  This has not been checked.

I will check when they enabled the Metering Operator to see if it is the same day as the report.  I will also ask them if they can run the report again for a different and more recent 24 hour period, where they expect the Metering Operator to have run for a full 24 hours.

Comment 1 David 2020-06-30 08:46:19 UTC
Created attachment 1699258 [details]
OpenShift Metering Report

Comment 3 tflannag 2020-06-30 14:33:29 UTC
Going to move this to 4.6 to take it off the 4.5 blocker bugs list. I haven't had too much time to review this besides the description so I'm going to set the severity to 'medium' as it sounds like it may be contingent on further CU information.

Comment 4 Zhigang Wang 2020-07-06 14:06:20 UTC
Customer has provided the following information:


> When was the Metering Operator installed?
NonProd cluster - sometime early May
Prod cluster - last week

> Is this repeatable?  Could you try running the report(s) again to see if the same issue occurs?  And see if it is the same namespaces or a different set.  Don't run it for the current day, but perhaps for the previous day where there will be a full 24 hour time period of metrics to collect

Yes, refer to new attachment, there are different rows for CPU and memory

> Are you using your own CRD? 
no, we use the natural one from osd.

> Are those reports part of the examples installed with cost management?
> Are those reports part of the examples installed with the Metering Operator?
Yes, all reports are parts of existing examples, we have not customized anything on metering

Comment 5 Zhigang Wang 2020-07-06 14:08:01 UTC
Created attachment 1700029 [details]
report-proc

Comment 6 Zhigang Wang 2020-07-06 14:08:46 UTC
Created attachment 1700030 [details]
ng-report

Comment 7 David 2020-07-08 04:48:47 UTC
In Summary
-----------
Why are some namespaces missing from the standard reports?

Details
-------
When running standard (un-modified) Metering reports from the Metering Operator, some namespaces do not appear in some reports even though they exist on the cluster.

The memory report is missing namespace:
- openshift-cluster-samples-operator

While the cpu report is missing a number of namespaces:
- openshift-kube-scheduler
- openshift-kube-scheduler-operator
- openshift-service-catalog-apiserver-operator
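
A minimal sketch of how the mismatch listed above could be checked mechanically, assuming the namespace column of each report has already been loaded into a set. The function name `report_mismatch` and the placeholder `example-shared-namespace` are hypothetical; only the four openshift-* namespaces come from this bug.

```python
# Namespace columns of the two reports, reduced to the namespaces named in
# this bug plus one hypothetical namespace that appears in both reports.
cpu_report = {
    "openshift-cluster-samples-operator",
    "example-shared-namespace",  # hypothetical placeholder
}
memory_report = {
    "openshift-kube-scheduler",
    "openshift-kube-scheduler-operator",
    "openshift-service-catalog-apiserver-operator",
    "example-shared-namespace",  # hypothetical placeholder
}


def report_mismatch(cpu, memory):
    """Return the namespaces that appear in only one of the two reports."""
    return {
        "missing_from_cpu": sorted(memory - cpu),
        "missing_from_memory": sorted(cpu - memory),
    }


mismatch = report_mismatch(cpu_report, memory_report)
for side, names in mismatch.items():
    print(side, names)
```

Running the same comparison on reports from different days or clusters would show whether the same namespaces are missing each time.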

The customer can reproduce this, and has done so 3 times on 2 different clusters, each time with a report spanning a 24 hour period.

I cannot, as I don't have access to a test cluster where I can install the Metering Operator.

The reports are the standard reports provided by the Metering Operator (unmodified).

The reports in question are memory and cpu

There is no CRD

----

Is there anything else I can request from the CU?

Comment 8 Emily Moss 2020-07-10 22:08:27 UTC
Can you please provide the reports you noticed this in?
If you do, I can run both reports and check this out. Often this is an easy fix which is related to aliasing.

Comment 9 David 2020-07-14 11:54:58 UTC
Example reports are attached.

The reports were generated from namespace-cpu-utilization and namespace-memory-utilization.

Thanks

Comment 10 David 2020-07-20 01:04:49 UTC
Hi Emily, how did you go with this bug?

Comment 13 David 2020-07-27 04:54:43 UTC
Created attachment 1702470 [details]
Image describing samples operator namespace

Comment 15 David 2020-08-03 04:55:52 UTC
Has there been any movement on this issue?

Comment 19 David 2020-08-18 04:06:07 UTC
Hi Emily
Just a heads up: Narayanan Raghavan followed up by email asking for an update on this case.  It looks like a RH solutions provider is blocked on this particular bug.


>>>  The differences are:
>>>  openshift-cluster-samples-operator is in cpu, not memory
>>>  openshift-kube-scheduler is in memory, not cpu
>>>  openshift-kube-scheduler-operator is in memory, not cpu
>>>  openshift-service-catalog-apiserver-operator is in memory, not cpu

Correct, this is what I documented back in #c7.

>>>  Every time a cpu report is run, does your customer not have openshift-kube-scheduler and openshift-kube-scheduler-operator in the report?
Yes: the operators have not changed between reports.  The memory and CPU reports are always for the same period.

>>>  Are we certain that it does consumer cpu the same as other operators?
I am not sure what you mean....  (assuming consumer = consume) I assume there is no difference; I can't see how a pod/project would be able to alter this.

>>>  If you run another recent report on the cpu usage for openshift-kube-scheduler and it doesn't show up again this may be a useful thing to look into.
There are 3 reports attached that show the same issue (2 are 8 days apart and 1 is for a different cluster).  I am reluctant to ask the CU for another report that shows the same error and indicates we have not spent any time on this bug in the past 30 days:
22 June :: report (1).xlsx (both reports in same doc)
03 Jul  :: np-report-20200630.xlsx
03 Jul  :: report-prod-20200630.xlsx

Comment 22 tflannag 2020-08-19 16:09:53 UTC
Hey David,

I had some time this morning to look into this in more depth and fire up a 4.3 cluster. From my understanding of poking around that cluster for a bit, the cluster-samples-operator does not set a memory request (only a cpu request), which results in a row mismatch when we join between the memory usage and memory request tables. The result is that we don't include the openshift-cluster-samples-operator namespace in the namespace-memory-utilization report, because the cluster-samples-operator Pod, which is the only Pod in that namespace, doesn't set a memory request, despite using memory during a given reporting period. I'll have to touch base with the rest of the team as to whether it makes sense to alter that query to include results for namespaces that haven't specified memory requests, but have consumed memory. I haven't dug into what's happening with the cpu-related utilization mismatches, but I imagine it's a similar case to the cluster-samples-operator.

I have to pause on this work for other things, so I'm just posting my findings here now in case the customer wants an update from engineering, or someone else on the team wants to pick up this work if I can't get back to it in time.
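
The row mismatch described above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module rather than the metering query engine, and the table names, column names and byte values are hypothetical, since the actual report queries are not shown in this bug. It only demonstrates the general effect: an inner join drops a namespace that has usage rows but no request rows, while a left join keeps it with an undefined utilization.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the usage and request tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memory_usage (namespace TEXT, usage_bytes REAL);
CREATE TABLE memory_request (namespace TEXT, request_bytes REAL);
INSERT INTO memory_usage VALUES
  ('openshift-cluster-samples-operator', 5.0e7),
  ('openshift-kube-scheduler', 2.0e8);
-- No row for the samples operator: its only Pod sets no memory request.
INSERT INTO memory_request VALUES
  ('openshift-kube-scheduler', 4.0e8);
""")


def utilization_rows(join_kind):
    """Join usage to requests with the given join kind (INNER or LEFT)."""
    query = f"""
        SELECT u.namespace, u.usage_bytes / r.request_bytes AS utilization
        FROM memory_usage AS u
        {join_kind} JOIN memory_request AS r ON u.namespace = r.namespace
        ORDER BY u.namespace
    """
    return conn.execute(query).fetchall()


inner_rows = utilization_rows("INNER")  # namespace without a request is dropped
left_rows = utilization_rows("LEFT")    # namespace is kept, utilization is NULL

print("inner join:", inner_rows)
print("left join: ", left_rows)
```

Whether a report should list such namespaces with an empty utilization, or treat a missing request as zero, is the design question raised in the comment; the sketch does not answer it.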

Tim

Comment 23 tflannag 2020-08-21 16:21:31 UTC
Adding the UpcomingSprint keyword - we started some initial investigations late in sprint 188 but plan on tackling this BZ in this next upcoming sprint.

Comment 31 Brett Tofel 2020-09-11 19:15:44 UTC
We're still waiting on the 4.4/4.5 PRs to merge so adding the UpcomingSprint keyword now.

Comment 32 tflannag 2020-09-18 20:59:57 UTC
The 4.4 BZ fix has landed and QE has verified those changes. Those fixes should land in the upcoming 4.4 z-stream release. Going to close this BZ as we're not going to backport these changes to 4.3.z, as that release is in the maintenance phase and metering does not have the ability to upgrade between 4.3.z streams.