Description of problem:
Memory utilization shown in CloudForms does not match the memory utilization reported by OpenShift.

Version-Release number of selected component (if applicable):
5.8.3 (Also reproduced on 5.10.0.22)

How reproducible:
Always

Steps to Reproduce:
1. Add an OpenShift provider with Hawkular
2. Wait for metrics to populate
3. Hover over the memory block under node utilization

Actual results:
Shows higher than expected memory utilization

Expected results:
Accurate memory utilization

Additional info:
Submitted upstream: https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/305
Merged upstream: https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/305

Note: The values in ManageIQ are day/month averages and should not be compared as-is with the values from the cli `top` command or the `oc adm top` command.
New commit detected on ManageIQ/manageiq-providers-kubernetes/hammer:
https://github.com/ManageIQ/manageiq-providers-kubernetes/commit/746ae7d41cb2f634bd09b548373eea965b07f7be

commit 746ae7d41cb2f634bd09b548373eea965b07f7be
Author:     Adam Grare <agrare>
AuthorDate: Tue Nov 20 06:59:20 2018 -0500
Commit:     Adam Grare <agrare>
CommitDate: Tue Nov 20 06:59:20 2018 -0500

    Merge pull request #305 from yaacov/optional-working-set-as-memory-tag

    Use Hawkular memory tag working-set instead of usage

    (cherry picked from commit aeda8479fe7d8caa9e17e325a14be4022d77b213)

    https://bugzilla.redhat.com/show_bug.cgi?id=1650351

 app/models/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_capture_context.rb | 16 +-
 app/models/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_legacy_capture_context.rb | 6 +-
 spec/models/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_capture_context_spec.rb | 20 +-
 spec/models/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_legacy_capture_context_spec.rb | 16 +-
 spec/models/manageiq/providers/kubernetes/container_manager/metrics_capture_spec.rb | 4 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_capture_context_container_metrics.yml | 52 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_capture_context_container_timespan.yml | 50 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_capture_context_m_endpoint.yml | 36 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_capture_context_node_metrics.yml | 44 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_capture_context_node_timespan.yml | 46 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_capture_context_pod_metrics.yml | 48 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_capture_context_pod_timespan.yml | 50 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_capture_context_refresh.yml | 181 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_capture_context_status.yml | 44 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_legacy_capture_context_container_metrics.yml | 80 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_legacy_capture_context_container_timespan.yml | 84 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_legacy_capture_context_node_metrics.yml | 160 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_legacy_capture_context_node_timespan.yml | 168 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_legacy_capture_context_pod_metrics.yml | 160 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_legacy_capture_context_pod_timespan.yml | 168 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_legacy_capture_context_refresh.yml | 146 +-
 spec/vcr_cassettes/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_legacy_capture_context_status.yml | 44 +-
 22 files changed, 902 insertions(+), 721 deletions(-)
Verified in: 5.10.0.27.20181128170555_43ed8cb

Note: There is no clear way to verify this bug. The steps below are what I believe is sufficient verification.

The provider added for this test has been relatively unused for over two weeks. I added an unused provider for this case because the CloudForms dashboard shows average values over days/months. An unused provider has low memory utilization, which makes the verification more accurate.

Verification steps:
1) Set "hawkular_force_legacy" to "false" under the advanced configuration. This is a new key within the advanced configuration; see the documentation for more information.
2) Added an OCP provider and enabled metrics collection permissions.
3) Compared the real-time metrics on the OpenShift nodes ("oc adm top nodes") with the averages displayed on the CFME dashboard and verified they were within 5%, which I believe is a reasonable margin.
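
The comparison in step 3 could be scripted roughly as follows. This is only a sketch of the check, not part of the fix: it assumes "within 5%" means an absolute difference of at most 5 percentage points, the function names are hypothetical, and the node names and values are made-up sample data to be replaced with readings copied by hand from "oc adm top nodes" and the dashboard.

```python
from typing import Dict


def within_tolerance(realtime_pct: float, dashboard_pct: float,
                     tolerance_points: float = 5.0) -> bool:
    """Return True when the two memory percentages differ by at most
    `tolerance_points` percentage points (assumed meaning of "within 5%")."""
    return abs(realtime_pct - dashboard_pct) <= tolerance_points


def check_nodes(realtime: Dict[str, float], dashboard: Dict[str, float],
                tolerance_points: float = 5.0) -> Dict[str, bool]:
    """Compare per-node memory percentages for nodes present in both inputs."""
    return {
        node: within_tolerance(realtime[node], dashboard[node], tolerance_points)
        for node in realtime
        if node in dashboard
    }


if __name__ == "__main__":
    # Hypothetical sample values, not measurements from this bug.
    realtime = {"node1.example.com": 22.0, "node2.example.com": 31.0}
    dashboard = {"node1.example.com": 24.5, "node2.example.com": 38.0}
    for node, ok in check_nodes(realtime, dashboard).items():
        print(node, "OK" if ok else "outside tolerance")
```

Because the dashboard shows day/month averages, a check like this is only meaningful on a provider whose load has been stable, as with the unused provider described above.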