| Summary: | Platform plugin memory metrics are not representative of available memory | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Other] RHQ Project | Reporter: | John Sanda <jsanda> | ||||||||||
| Component: | Plugins | Assignee: | John Sanda <jsanda> | ||||||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> | ||||||||||
| Severity: | high | Docs Contact: | |||||||||||
| Priority: | medium | ||||||||||||
| Version: | 4.4 | CC: | hrupp | ||||||||||
| Target Milestone: | --- | ||||||||||||
| Target Release: | RHQ 4.4.0 | ||||||||||||
| Hardware: | Unspecified | ||||||||||||
| OS: | Unspecified | ||||||||||||
| Whiteboard: | |||||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||||
| Doc Text: | Story Points: | --- | |||||||||||
| Clone Of: | |||||||||||||
| : | 815979 (view as bug list) | Environment: | |||||||||||
| Last Closed: | 2013-09-01 10:11:15 UTC | Type: | --- | ||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||
| Documentation: | --- | CRM: | |||||||||||
| Verified Versions: | Category: | --- | |||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
| Bug Depends On: | |||||||||||||
| Bug Blocks: | 782579, 815979 | ||||||||||||
| Attachments: |
|
||||||||||||
|
Description
John Sanda
2012-03-22 15:56:31 UTC
Created attachment 572028 [details]
platform memory metrics
Here is a screenshot of the platform memory metrics reported by RHQ. The average free memory reported is about 1.23 GB. I have a total of 16 GB of RAM. That would mean I am at about 92% memory utilization. While these numbers are not in and of themselves wrong, they are not representative of what's really going on. If I were actually at 92% memory utilization, my machine would be near worthless for development, but fortunately it's pretty snappy :)
Created attachment 572029 [details]
RHQ platform utilization report
Here is the platform utilization report which shows my system's overall memory usage at about 93.5%.
Created attachment 572030 [details]
Gnome System Monitor app
Here is a screenshot of System Monitor running on my box. Note that it reports about 47% memory usage which is in stark contrast to the 92% or 93% reported by RHQ.
Created attachment 572031 [details]
memory reported by htop
This screenshot shows memory usage reported by htop. It reports roughly 7 GB in use which works out to about 44% overall memory usage.
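The percentages quoted above can be checked with a little arithmetic. The following is a minimal sketch that uses only the approximate figures from this report (16 GB total, about 1.23 GB free per RHQ, roughly 7 GB in use per htop); the helper name `used_percent` is hypothetical and is not part of RHQ or Sigar.

```python
# Rough check of the utilization figures quoted in this report.
# All inputs are the approximate values from the description above.

def used_percent(total_gb, used_gb):
    """Return used memory as a percentage of total memory."""
    return 100.0 * used_gb / total_gb

TOTAL_GB = 16.0

# RHQ reports about 1.23 GB free, so "used" is simply total minus free.
rhq_free_gb = 1.23
rhq_used_gb = TOTAL_GB - rhq_free_gb

# htop reports roughly 7 GB in use.
htop_used_gb = 7.0

rhq_pct = used_percent(TOTAL_GB, rhq_used_gb)
htop_pct = used_percent(TOTAL_GB, htop_used_gb)

print(f"RHQ (total - free): {rhq_pct:.1f}% used")
print(f"htop:               {htop_pct:.1f}% used")
```

The gap between the two results is what this report is about: both are computed from the same 16 GB, but they disagree on what counts as "used".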
Note:
(12:32:29 PM) ccrouch: so we're reporting MemUsed=MemTotal-MemFree ?
(12:32:38 PM) jsanda: yeah
Interesting analysis, John. I agree that nothing appears broken here, but we could be doing a better job of collecting more representative metrics. My suggestion for a next step would be to raise an RFE on Sigar to add support for Buffers and Cached metrics. There may very well be Windows equivalents that we should be picking up too. I really prefer to keep as many of our platform-specific metrics as possible going through Sigar for right now, rather than doing our own scanning of /proc/meminfo. The next step after that, I think, would be predicated on enhancements to the underlying alerts subsystem, e.g. letting you compare the relative size of two metrics.
Looks like this feature has been in Sigar for some time already. See https://jira.hyperic.com/browse/SIGAR-188. We are collecting metrics for Native.MemoryInfo.free and Native.MemoryInfo.used, but the more representative metrics are Native.MemoryInfo.actualFree and Native.MemoryInfo.actualUsed, both of which are available in the Mem class in the version of Sigar that we currently use. I am not sure that I entirely understand the part about comparing the relative size of the two metrics.
I propose the following. We collect both sets of metrics and provide better, more accurate descriptions for them. The description for the used memory metric is, "The total used system memory". That is simply not accurate. And for the platform utilization report, I propose that we use the actualUsed metric. As it stands right now, I don't see how anyone can reliably use the free and used memory metrics for alerting.
Per BZ triage (crouch, loleary, asantos): if this is a small amount of work, we should try to add those metrics for RHQ 4.4.
This is a small amount of work. I can definitely knock it out for RHQ 4.4.
The actual free and actual used metrics have been added to the platform plugins. The descriptions for the metrics have been updated as well to reflect which metrics do and do not take caches and buffers into account. Lastly, the platform utilization report has been updated to use the new, more representative metrics for memory consumption.
master commit hash: 5420259201d92a13da1c24b752410a1c853ade46
Bulk closing of items that are on_qa and in old RHQ releases that have been out for a long time, where the issue has not been re-opened since. |
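To illustrate the difference between the plain and the "actual" metrics discussed in this thread, here is a minimal sketch in Python. It is not the RHQ or Sigar code. It assumes that the "actual" values adjust for the Buffers and Cached figures mentioned above (actual used = used minus buffers and cache, actual free = free plus buffers and cache). The sample `/proc/meminfo` excerpt contains made-up values chosen to resemble the numbers in this report, and the function names are hypothetical.

```python
# Sketch only: contrasts "used" (MemTotal - MemFree) with an "actual used"
# figure that excludes buffers and page cache. The adjustment formula is an
# assumption based on the discussion in this thread.

# Made-up sample in /proc/meminfo format (values in kB), for illustration.
SAMPLE_MEMINFO = """\
MemTotal:       16384000 kB
MemFree:         1260000 kB
Buffers:          500000 kB
Cached:          7500000 kB
"""

def parse_meminfo(text):
    """Parse 'Name:   value kB' lines into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values

def memory_metrics(info):
    """Compute both the plain and the buffer/cache-adjusted metrics."""
    total = info["MemTotal"]
    free = info["MemFree"]
    reclaimable = info["Buffers"] + info["Cached"]
    used = total - free
    return {
        "total": total,
        "used": used,
        "free": free,
        "actual_used": used - reclaimable,   # excludes buffers and cache
        "actual_free": free + reclaimable,   # includes buffers and cache
        "used_pct": 100.0 * used / total,
        "actual_used_pct": 100.0 * (used - reclaimable) / total,
    }

metrics = memory_metrics(parse_meminfo(SAMPLE_MEMINFO))
print(f"used:        {metrics['used_pct']:.1f}%")
print(f"actual used: {metrics['actual_used_pct']:.1f}%")
```

With these sample values the plain figure lands in the low 90s while the adjusted figure lands in the low 40s, which mirrors the gap between the original RHQ numbers and what System Monitor and htop showed.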