Bug 1124722 - free does not report correct 'cached' value or segfault
Summary: free does not report correct 'cached' value or segfault
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Rafael Aquini
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-07-30 07:35 UTC by Damien Gombault
Modified: 2014-12-29 14:18 UTC (History)
1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-12-29 14:18:36 UTC


Attachments


Links
System ID Priority Status Summary Last Updated
CentOS 7436 None None None Never

Description Damien Gombault 2014-07-30 07:35:00 UTC
Hello.

My host is a VirtualBox VM running on a Linux machine (2 GB RAM).

When I run free on my CentOS 7 VM, the values seem correct
(though there is a slight difference between free and /proc/meminfo):

$ free
             total       used       free     shared    buffers     cached
Mem:       1885584     966008     919576      16772       1444     718004
-/+ buffers/cache:     246560    1639024
Swap:      1048572          0    1048572

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1841        943        897         16          1        701
-/+ buffers/cache:        240       1600
Swap:         1023          0       1023

$ free -h
             total       used       free     shared    buffers     cached
Mem:          1.8G       943M       897M        16M       1.4M       701M
-/+ buffers/cache:       240M       1.6G
Swap:         1.0G         0B       1.0G

$ cat /proc/meminfo | grep ^Cached:
Cached: 734776 kB

If I run free in a libvirt LXC container (1 GB RAM) soon after boot, free returns a strange (negative or oversized) value for 'cached'.
It can also segfault:
-bash-4.2# free
             total       used       free     shared    buffers     cached
Mem:       1048576      20824    1027752      16772          0      -8476
-/+ buffers/cache:      29300    1019276
Swap:      1048572          0    1048572

-bash-4.2# free -m
             total       used       free     shared    buffers     cached
Mem:          1024         20       1003         16          0   18014398509481976
-/+ buffers/cache:         28        995
Swap:         1023          0       1023

-bash-4.2# free -h
             total       used       free     shared    buffers     cached
Segmentation fault

-bash-4.2# cat /proc/meminfo | grep ^Cached:
Cached: 8296 KB

The backtrace I get with gdb:

#0  0x0000000000401d51 in scale_size (size=18446744073709543144,
    flags=flags@entry=2, args=...) at free.c:145
#1  0x0000000000401567 in main (argc=<optimized out>, argv=<optimized out>)
    at free.c:306

Comment 2 Jaromír Cápík 2014-08-01 15:13:57 UTC
Hello Damien.

Could you please attach a copy of your /proc/meminfo file here?

Thank you.

Regards,
Jaromir.

Comment 3 Jaromír Cápík 2014-08-01 15:20:41 UTC
... I meant a copy of the /proc/meminfo from your guest taken when the issue is reproducible ...

Comment 4 Damien Gombault 2014-08-01 16:57:04 UTC
Hello.

The segfault happens soon after boot, when the 'cached' value is low:

-bash-4.2# cat /proc/meminfo ; free ; free -h
MemTotal:        1048576 KB
MemFree:         1029552 KB
MemAvailable:    1553204 kB
Buffers:               0 KB
Cached:             8320 KB
SwapCached:            0 kB
Active:            10692 KB
Inactive:           8280 KB
Active(anon):      10676 KB
Inactive(anon):     8216 KB
Active(file):         16 KB
Inactive(file):       64 KB
Unevictable:           0 KB
Mlocked:               0 kB
SwapTotal:       1048572 kB
SwapFree:        1048572 kB
Dirty:                24 kB
Writeback:             0 kB
AnonPages:        104420 kB
Mapped:            42716 kB
Shmem:             16804 kB
Slab:              85036 kB
SReclaimable:      55556 kB
SUnreclaim:        29480 kB
KernelStack:        1128 kB
PageTables:         6348 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1991364 kB
Committed_AS:     530920 kB
VmallocTotal:   34359738367 kB
VmallocUsed:        7488 kB
VmallocChunk:   34359726244 kB
HardwareCorrupted:     0 kB
AnonHugePages:      8192 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       67520 kB
DirectMap2M:     2029568 kB
             total       used       free     shared    buffers     cached
Mem:       1048576      19008    1029568      16804          0      -8484
-/+ buffers/cache:      27492    1021084
Swap:      1048572          0    1048572
             total       used       free     shared    buffers     cached
Segmentation fault

Comment 5 Damien Gombault 2014-08-01 17:00:49 UTC
If I "use" the machine, the cached value grows up and the segfault disapears :

-bash-4.2# cat /proc/meminfo ; free ; free -h
MemTotal:        1048576 KB
MemFree:          805036 KB
MemAvailable:    1488868 kB
Buffers:               0 KB
Cached:           219928 KB
SwapCached:            0 kB
Active:            42128 KB
Inactive:         201316 KB
Active(anon):      23540 KB
Inactive(anon):     8216 KB
Active(file):      18588 KB
Inactive(file):   193100 KB
Unevictable:           0 KB
Mlocked:               0 kB
SwapTotal:       1048572 kB
SwapFree:        1048572 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        121276 kB
Mapped:            43256 kB
Shmem:             16804 kB
Slab:             155400 kB
SReclaimable:     115768 kB
SUnreclaim:        39632 kB
KernelStack:        1144 kB
PageTables:         6612 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1991364 kB
Committed_AS:     548316 kB
VmallocTotal:   34359738367 kB
VmallocUsed:        7488 kB
VmallocChunk:   34359726888 kB
HardwareCorrupted:     0 kB
AnonHugePages:     12288 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       69568 kB
DirectMap2M:     2027520 kB
             total       used       free     shared    buffers     cached
Mem:       1048576     243512     805064      16804          0     203124
-/+ buffers/cache:      40388    1008188
Swap:      1048572          0    1048572
             total       used       free     shared    buffers     cached
Mem:          1.0G       237M       786M        16M         0B       198M
-/+ buffers/cache:        39M       984M
Swap:         1.0G         0B       1.0G

Notice that the values given by free and /proc/meminfo are not equal.
I don't know if this is normal.

Comment 6 Jaromír Cápík 2014-08-01 18:40:10 UTC
Thanks, Damien.

The value returned by 'free' differs from 'free -h' because 'free -h' uses binary (KiB/MiB) units. Note that 1 MiB = 1024 KiB (not 1000 kB).

And the difference between 'free' and 'cat /proc/meminfo' is caused by the fact that you took the two outputs at different times. The values in /proc/meminfo fluctuate very quickly, and you can hardly get the same values in two consecutive samples even when the delay between them is almost zero.

What makes me a bit curious are those capital 'K' letters in the 'kB' units on a few important lines. I've never seen that. Anyway, that doesn't seem to be the root cause.
After looking at the entries, I see that "HardwareCorrupted" is longer than the 16 characters reserved for the entry name. This very probably causes a buffer overrun and unexpected behaviour. We need to increase the buffer size and replace 'strcpy' with the safer 'strncpy' to prevent buffer overruns in the future.

Comment 7 Jaromír Cápík 2014-08-01 18:44:10 UTC
I was wrong with the "HardwareCorrupted". There's a size check that skips too long names.

Comment 8 Jaromír Cápík 2014-08-01 19:01:25 UTC
I was able to reproduce the issue with your /proc/meminfo. The issue is caused by the subtraction of Shmem from Cached, which we already removed upstream.

So, from the procps-ng perspective this is a duplicate of Bug 1070736.

But I'm not going to close this, as it doesn't answer the question of why Cached is lower than Shmem, when Shmem is supposed to live in the page cache and should therefore always be smaller than Cached.

We need to change the component to kernel.

Comment 9 Rafael Aquini 2014-12-19 19:14:33 UTC
Does this behaviour reproduce when running on bare metal, or on KVM guests? We do not support/test RHEL on third-party hosted emulation platforms such as VirtualBox, and the funky behaviour observed here might be due to some additions (modules) these emulators load into guests to accelerate the guest's general response time.

Comment 10 Rafael Aquini 2014-12-19 20:52:55 UTC
Please disregard my last comment. I now realize I overlooked the part where you wrote "If I run free in a libvirt LXC container ... "

Long story short: since most of the Linux tools that provide system resource metrics were created before cgroups even existed, you cannot rely on the information reported by free, vmstat, top & friends from within a container, because they all gather their data samples from procfs (/proc/meminfo, /proc/vmstat, ...), which is not "containerized".

That statement is an oversimplification of the whole picture, so please take a look at this nice blog article for further details:
http://fabiokung.com/2014/03/13/memory-inside-linux-containers/


This is not a kernel bug per se, but a design characteristic that will eventually be addressed as containers become more popular and widely used.

For now, I'd be inclined to close this as NOTABUG, but I'll wait for your feedback first.

Comment 11 Damien Gombault 2014-12-19 23:37:48 UTC
Hi.

Thank you for this very interesting link.
I was not aware that I cannot rely on standard Linux tools in containers.

I think you can close this bug as NOTABUG.
Then I will close the CentOS bug I opened.

