Bug 1852355
| Summary: | Missing namespaces in cpu report and memory report | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | David <dsquirre> |
| Component: | Metering Operator | Assignee: | Emily Moss <emoss> |
| Status: | CLOSED WONTFIX | QA Contact: | Peter Ruan <pruan> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.3.z | CC: | amdas, btofel, emoss, pweil, sd-operator-metering, tflannag, zhigwang |
| Target Milestone: | --- | | |
| Target Release: | 4.3.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1872425 | Environment: | |
| Last Closed: | 2020-09-18 20:59:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1875042 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Description
David
2020-06-30 08:43:39 UTC
Created attachment 1699258
OpenShift Metering Report
Going to move this to 4.6 to take it off the 4.5 blocker bugs list. I haven't had much time to review this beyond the description, so I'm going to set the severity to 'medium', as it sounds like it may be contingent on further CU information.

Customer has provided the following information:

> When was the Metering Operator installed?

NonProd cluster: sometime in early May. Prod cluster: last week.

> Is this repeatable? Could you try running the report(s) again to see if the same issue occurs? And see if it is the same namespaces or a different set. Don't run it for the current day, but perhaps for the previous day, where there will be a full 24-hour period of metrics to collect.

Yes, refer to the new attachment; there are different rows for CPU and memory.

> Are you using your own CRD?

No, we use the natural one from OSD.

> Are those reports part of the examples installed with cost management?
> Are those reports part of the examples installed with the Metering Operator?

Yes, all reports are part of the existing examples; we have not customized anything in metering.

Created attachment 1700029
report-proc
Created attachment 1700030
ng-report
In Summary
-----------
Why are some namespaces missing from the standard reports?

Details
-------
When running standard (unmodified) Metering reports from the Metering Operator, some namespaces do not appear in some reports even though they exist on the cluster.

The memory report is missing one namespace:
- openshift-cluster-samples-operator

The cpu report is missing a number of namespaces:
- openshift-kube-scheduler
- openshift-kube-scheduler-operator
- openshift-service-catalog-apiserver-operator

The customer can reproduce this, and has done so 3 times each, for reports spanning a 24-hour period, on 2 different clusters. I cannot, as I don't have access to a test cluster where I can install the Metering Operator.

The reports are the standard reports provided by the Metering Operator (unmodified). The reports in question are memory and cpu. There is no CRD.

----
Is there anything else I can request from the CU?

Can you please provide the reports you noticed this in? If you do, I can run both reports and check this out. Often this is an easy fix related to aliasing.

Example reports are attached. The reports were generated from namespace-cpu-utilization and namespace-memory-utilization. Thanks

Hi Emily, how did you go with this bug?

Created attachment 1702470
Image describing samples operator namespace
Has there been any movement on this issue?

Hi Emily,

Just a heads up: Narayanan Raghavan followed up by email asking for an update on this case. It looks like a RH solutions provider is blocked on this particular bug.

>>> The differences are:
>>> openshift-cluster-samples-operator is in cpu, not memory
>>> openshift-kube-scheduler is in memory, not cpu
>>> openshift-kube-scheduler-operator is in memory, not cpu
>>> openshift-service-catalog-apiserver-operator is in memory, not cpu

Correct, this is what I documented back in #c7.

>>> Every time a cpu report is run, does your customer not have openshift-kube-scheduler and openshift-kube-scheduler-operator in the report?

Yes: the operators have not changed between reports. The memory and CPU reports are always for the same period.

>>> Are we certain that it does consumer cpu the same as other operators?

I am not sure what you mean... (assuming consumer = consume) I assume there is no difference; I can't see how a pod/project would be able to alter this.

>>> If you run another recent report on the cpu usage for openshift-kube-scheduler and it doesn't show up again, this may be a useful thing to look into.

There are 3 reports attached that show the same issue (2 are 8 days apart and 1 is for a different cluster). I am reluctant to ask the CU for another report that shows the same error, as that would suggest we have not spent any time on this bug in the past 30 days:
- 22 June :: report (1).xlsx (both reports in same doc)
- 03 Jul :: np-report-20200630.xlsx
- 03 Jul :: report-prod-20200630.xlsx

Hey David,

I had some time this morning to look into this in more depth and fire up a 4.3 cluster. From my understanding of poking around that cluster for a bit, the cluster-samples-operator does not set a memory request (only a cpu request), which results in a row mismatch when we join between the memory usage and memory request tables.
The result is that we don't include the openshift-cluster-samples-operator namespace in the namespace-memory-utilization report: the cluster-samples-operator Pod, which is the only Pod in that namespace, doesn't set a memory request, despite using memory during a given reporting period. I'll have to touch base with the rest of the team on whether it makes sense to alter that query to include results for namespaces that haven't specified memory requests but have consumed memory.

I haven't dug into what's happening with the cpu-related utilization mismatches, but I imagine it's a case similar to the cluster-samples-operator one. I have to pause this work for other things, so I'm posting my findings here now in case the customer wants an update from engineering, or someone else on the team wants to pick up this work if I can't get back to it in time.

Tim

Adding the UpcomingSprint keyword: we started some initial investigations late in sprint 188 but plan on tackling this BZ in the upcoming sprint.

We're still waiting on the 4.4/4.5 PRs to merge, so adding the UpcomingSprint keyword now.

The 4.4 BZ fix has landed and QE has verified those changes. Those fixes should land in the upcoming 4.4 z-stream release.

Going to close this BZ, as we're not going to backport these changes to 4.3.z: that release is in the maintenance phase and metering does not have the ability to upgrade between 4.3.z streams.
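The join mismatch in Tim's analysis can be illustrated with a minimal sketch. It assumes the report joins per-namespace memory usage to per-namespace memory requests with inner-join semantics. The namespace names come from this bug. The MiB values, dictionaries, and helper functions are hypothetical and do not reflect the Metering Operator's actual Presto query or table schema.

```python
# Hypothetical per-namespace aggregates for one reporting period (MiB).
# Illustrative values only; not taken from any customer report.
memory_usage_mib = {
    "openshift-cluster-samples-operator": 50,
    "openshift-kube-scheduler": 100,
}
memory_request_mib = {
    # The samples operator Pod sets no memory request, so it has no row here.
    "openshift-kube-scheduler": 200,
}


def inner_join(usage, requests):
    """Keep only namespaces present on both sides (SQL INNER JOIN semantics)."""
    return {ns: (usage[ns], requests[ns]) for ns in usage if ns in requests}


def left_join(usage, requests):
    """Keep every namespace with usage; a missing request becomes None (LEFT JOIN)."""
    return {ns: (usage[ns], requests.get(ns)) for ns in usage}


if __name__ == "__main__":
    print(sorted(inner_join(memory_usage_mib, memory_request_mib)))
    # ['openshift-kube-scheduler']
    print(sorted(left_join(memory_usage_mib, memory_request_mib)))
    # ['openshift-cluster-samples-operator', 'openshift-kube-scheduler']
```

With inner-join semantics, a namespace whose Pods set no memory request drops out of the result even though it used memory. Left-join semantics keep that namespace with an empty request. The thread leaves open whether the real report query should change in this way, so treat this as an illustration of the symptom, not the shipped 4.4 fix.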