Bug 1250060

Summary: free changed output format and values are different
Product: Red Hat Enterprise Linux 7 Reporter: Dalibor Pospíšil <dapospis>
Component: net-snmp Assignee: Josef Ridky <jridky>
Status: CLOSED ERRATA QA Contact: David Jež <djez>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.1 CC: dapospis, dbodnarc, djez, gnaik, jhansen, jridky, ksrot, mark, nat.guyton, ovasik, pasteur, pierre-yves.goubet, qguo, ravpatil, tcrider
Target Milestone: rc Keywords: Patch
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: net-snmp-5.7.2-43.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1695497 (view as bug list) Environment:
Last Closed: 2019-08-06 13:08:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1699264    
Bug Blocks: 1695497    
Attachments:
Description Flags
Patch with solution for net-snmp in el7 none

Description Dalibor Pospíšil 2015-08-04 12:23:59 UTC
Description of problem:
# rpm -qf `which free`
procps-ng-3.3.9-6.el7.x86_64
# free
             total       used       free     shared    buffers     cached
Mem:       1017216     613996     403220      29672         72     292988
-/+ buffers/cache:     320936     696280
Swap:      1048572       9408    1039164

# rpm -qf `which free`
procps-ng-3.3.10-3.el7.x86_64
# free
              total        used        free      shared  buff/cache   available
Mem:        1017216      134080      403428       29672      479708      636264
Swap:       1048572        9408     1039164

Note the buffers and cached numbers. I cannot find any relation between those two outputs.

Version-Release number of selected component (if applicable):
procps-ng-3.3.10-3.el7

Comment 2 Jaromír Cápík 2015-08-05 18:17:07 UTC
Hi Dalibor.

This deserves a longer discussion. What you see is intended and correct.
The values in the -/+ buffers/cache row were dropped completely, because they were misleading and had been used for years to make undesirable evaluations and wrong assumptions about the memory layout. The whole concept behind those evaluations was obsolete and did not match the latest changes in kernel memory management. One of the values lost its meaning when we introduced a more accurate column called 'available', which shows an estimate of available memory (= memory that is free or can be freed if needed in order to avoid swapping). The second lost its meaning because the 'used' column now means something different (we excluded caches and buffers). Previously it was calculated as 'total' - 'free', but since the 'free' column became useless due to watermarks (meaning that memory reported as free cannot be considered really free), the informative value of the old 'used' column became as low as that of the 'free' column (= very low).
The only thing that makes me unhappy is that we did not manage to redesign this sooner, so the old, wrong design made it into the first RHEL7 releases.
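
For reference, the old arithmetic described above can be checked against the procps-ng-3.3.9 output in the description. This is only a sketch of the relationships between the columns, not the procps source; the function name is made up for illustration.

```python
# Values (kB) copied from the procps-ng-3.3.9 'free' output in the description.
TOTAL, FREE, BUFFERS, CACHED = 1017216, 403220, 72, 292988


def old_free_columns(total, free, buffers, cached):
    """Recompute the old 'used' column and the '-/+ buffers/cache' row.

    Hypothetical helper: it only mirrors the relationships described in
    the comment above ('used' = 'total' - 'free', then buffers and cache
    are moved from 'used' to 'free').
    """
    used = total - free
    used_minus_buffers_cache = used - buffers - cached
    free_plus_buffers_cache = free + buffers + cached
    return used, used_minus_buffers_cache, free_plus_buffers_cache


print(old_free_columns(TOTAL, FREE, BUFFERS, CACHED))
# prints (613996, 320936, 696280), matching the old output
```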

Comment 3 Jaromír Cápík 2015-08-05 18:21:06 UTC
If you have no objections by Friday, I'll close this as NOTABUG. But if you have any additional questions, I'm here for you.

Comment 4 Jaromír Cápík 2015-08-05 19:01:49 UTC
NOTE: After reading all this once more, I see you probably didn't mean the -/+ buffers/cache values. The 'cached' and 'buffers' columns changed a bit too. In the default output (width up to 80 characters per line) we report their sum in a common column called 'buff/cache' (they are reported separately in wide mode, -w/--wide). The old version had an unwanted patch applied that subtracted Shmem from Cached. After a very long discussion with the kernel team we agreed that this was completely wrong, as it reinforced the misconception that memory reported as 'cached' could be considered reclaimable. In fact, Shmem lives in the page cache, and it is not the only unreclaimable part of the cache. That is why the 'available' column was born. In addition, we now report slabs as part of 'cached'. One last note about the 'cached' value: caches grow aggressively nowadays, so the increase in the value could also be caused by the package upgrade.
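
A rough sketch of the new column arithmetic, as described in Comment 2 and Comment 4: 'buff/cache' is the sum of buffers, cache and slabs, and 'used' excludes it. The function name is hypothetical, and the Buffers/Cached/Slab split below is invented for illustration; only their sum (479708) and the MemTotal/MemFree values come from the procps-ng-3.3.10 output in the description.

```python
def new_free_columns(meminfo):
    """Return ('buff/cache', 'used') in kB from /proc/meminfo-style values.

    Sketch only: slabs are counted as part of the cache and 'used'
    excludes buffers and cache, following the description above.
    """
    buff_cache = meminfo["Buffers"] + meminfo["Cached"] + meminfo["Slab"]
    used = meminfo["MemTotal"] - meminfo["MemFree"] - buff_cache
    return buff_cache, used


sample = {
    "MemTotal": 1017216,  # from the description
    "MemFree": 403428,    # from the description
    "Buffers": 72,        # hypothetical split ...
    "Cached": 400000,     # ... of the reported ...
    "Slab": 79636,        # ... buff/cache total
}
print(new_free_columns(sample))
# prints (479708, 134080), the 'buff/cache' and 'used' columns of the new output
```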

Comment 5 Dalibor Pospíšil 2015-08-13 15:59:33 UTC
As these old 'wrong' values are also reported by net-snmp, it is worth discussing whether net-snmp should (or should not) change or extend its reporting accordingly.

Comment 6 Josef Ridky 2017-10-11 11:33:02 UTC
Net-SNMP takes all its information about available/used memory from the /proc/meminfo file.

If the information mentioned here is wrong, please tell me where I should take it from.

Comment 7 Dalibor Pospíšil 2017-10-12 10:17:38 UTC
I have no idea; ovasik or someone from their team could probably give us a better answer.

Comment 8 Ondrej Vasik 2017-10-12 13:15:16 UTC
Well - Josef is the current maintainer of net-snmp ... 
I think you may want to ask the former maintainer - Jan Safranek - he maintained net-snmp for a long time, so he probably still knows a lot about it.

Comment 9 Dalibor Pospíšil 2017-10-13 09:05:08 UTC
I meant that we want to ask the developer of the free tool, and as Jaromír has left Red Hat, I asked you to forward this to the proper person.

Comment 10 Ondrej Vasik 2017-11-06 13:25:18 UTC
Although Jaromir left RH, there is a new maintainer of procps ;) - moving needinfo to Jan Rybar.

Comment 11 Jan Rybar 2017-11-20 15:46:37 UTC
'free' takes its information from the /proc/meminfo file too. I can see in the code that the data is then processed as described in Jaromír's Comment #2 and Comment #4. It seems to be only a matter of how the values are interpreted for the user.
https://gitlab.com/procps-ng/procps/blob/master/proc/sysinfo.c#L696

Jaromir mentioned that there already was a long discussion with the kernel team about the representation of values obtained from /proc/meminfo and I haven't found any reason to doubt the results.

In case of any concern, please do not hesitate to leave a message here or PM me.

Comment 12 Josef Ridky 2019-04-03 09:39:52 UTC
Moving to RHEL-7.8

Comment 13 Josef Ridky 2019-04-12 09:43:59 UTC
Created attachment 1554747
Patch with solution for net-snmp in el7

Comment 18 Josef Ridky 2019-05-31 11:35:06 UTC
*** Bug 1601012 has been marked as a duplicate of this bug. ***

Comment 21 errata-xmlrpc 2019-08-06 13:08:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2239

Comment 22 Jacob 2019-08-21 13:07:30 UTC
I am not sure that the calculation provided with this patch is correct. On my system, the memAvailReal value fetched with snmpwalk is significantly higher than the MemAvailable value that the Linux kernel provides in /proc/meminfo. Would it not make much more sense to just use the value provided by the kernel in /proc/meminfo (if it is available) rather than trying to calculate it in the snmp lib?

More info: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34e431b0ae398fc54ea69ff85ec700722c9da773

Comment 23 Josef Ridky 2019-08-21 13:23:38 UTC
The value in memAvailReal is computed based on [1].

For memAvailReal, we have to sum MemFree + Buffers + Cached + SReclaimable, and this sum is reported as memAvailReal.

[1] http://www.net-snmp.org/docs/mibs/ucdavis.html
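
The sum described in this comment can be sketched as follows. This is an illustration of the formula only, not the net-snmp code; the function name and the sample numbers are hypothetical.

```python
def mem_avail_real_estimate(meminfo):
    """Sum the four /proc/meminfo fields named in the comment above (kB)."""
    fields = ("MemFree", "Buffers", "Cached", "SReclaimable")
    return sum(meminfo[name] for name in fields)


# Hypothetical values in kB, chosen only to make the arithmetic easy to follow.
sample = {"MemFree": 1000, "Buffers": 200, "Cached": 3000, "SReclaimable": 400}
print(mem_avail_real_estimate(sample))  # prints 4600
```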

Comment 24 Jacob 2019-08-21 13:49:07 UTC
According to the Linux kernel commit message linked in my previous reply, the cached memory metric includes memory which is not freeable.

The definition provided in the net-snmp docs is fairly vague, unless I am missing something. I see only: "The amount of real/physical memory currently unused or available.". The kernel has a similar definition for the MemAvailable metric, yet its value seems to disagree with the number provided by snmp, e.g. on a RHEL 7.7 system:

# yum info net-snmp-libs
Name        : net-snmp-libs
Arch        : x86_64
Epoch       : 1
Version     : 5.7.2
Release     : 43.el7
...

# snmpwalk -v3 -l authPriv -u [] -A [] -X [] -x AES -a SHA localhost 1.3.6.1.4.1.2021.4
....
UCD-SNMP-MIB::memAvailReal.0 = INTEGER: 1256256 kB

# cat /proc/meminfo 
MemTotal:        1882560 kB
MemFree:          478664 kB
MemAvailable:    1044716 kB
....

I see no reason not to use the kernel-provided metric rather than trying to estimate these values in net-snmp itself, unless the system runs an older kernel that does not include the MemAvailable metric.
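
A minimal sketch of the approach suggested here, assuming a fallback to the sum from Comment 23 when the kernel does not export MemAvailable. Both function names are hypothetical, and this is neither the shipped patch nor net-snmp code. The first sample reuses the three /proc/meminfo lines quoted above; the second one is invented to represent an older kernel.

```python
def parse_meminfo(text):
    """Parse 'Name:   value kB' lines into a dict of integers (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip lines without a 'Name: number' shape, e.g. elided output.
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def available_kb(meminfo):
    """Prefer the kernel's MemAvailable; otherwise fall back to an estimate."""
    if "MemAvailable" in meminfo:
        return meminfo["MemAvailable"]
    fallback = ("MemFree", "Buffers", "Cached", "SReclaimable")
    return sum(meminfo.get(name, 0) for name in fallback)


quoted = """MemTotal:        1882560 kB
MemFree:          478664 kB
MemAvailable:    1044716 kB
"""
# Hypothetical output of an older kernel that lacks MemAvailable.
older_kernel = """MemFree:  1000 kB
Buffers:   200 kB
Cached:   3000 kB
"""
print(available_kb(parse_meminfo(quoted)))        # prints 1044716
print(available_kb(parse_meminfo(older_kernel)))  # prints 4200
```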

Comment 25 Josef Ridky 2019-08-21 14:02:21 UTC
Understood. The point is that net-snmp uses only the following values from the /proc/meminfo file:
MemTotal, MemFree, MemShared/Shmem, Buffers, Cached, SwapTotal, SReclaimable and SwapFree. All other values are ignored.

So unless net-snmp upstream decides to change the way it parses /proc/meminfo, we can't do much more.

Of course, I'll contact the upstream authors with a PR, but it can take a long time until they accept it or come up with their own solution.
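
To illustrate the field list in this comment: a parser that keeps only those names never sees MemAvailable, even when the kernel provides it. The sketch below is hypothetical Python, not the net-snmp parser; the sample text reuses the /proc/meminfo lines quoted in Comment 24.

```python
# Field names listed in the comment above; everything else is ignored.
USED_FIELDS = {
    "MemTotal", "MemFree", "MemShared", "Shmem", "Buffers",
    "Cached", "SwapTotal", "SReclaimable", "SwapFree",
}


def fields_seen_by_agent(meminfo_text):
    """Return only the /proc/meminfo values (kB) whose names are in USED_FIELDS."""
    kept = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and name.strip() in USED_FIELDS and fields and fields[0].isdigit():
            kept[name.strip()] = int(fields[0])
    return kept


sample = """MemTotal:        1882560 kB
MemFree:          478664 kB
MemAvailable:    1044716 kB
"""
print(fields_seen_by_agent(sample))
# prints {'MemTotal': 1882560, 'MemFree': 478664}; MemAvailable is dropped
```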