Bug 921676 - free() doesn't honor M_TRIM_THRESHOLD
Summary: free() doesn't honor M_TRIM_THRESHOLD
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: glibc
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: glibc team
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks: 1110700 1191021
 
Reported: 2013-03-14 16:08 UTC by Daniel Vrátil
Modified: 2020-04-28 17:03 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-28 17:03:00 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
testcase (712 bytes, text/x-c++src)
2013-03-14 16:08 UTC, Daniel Vrátil


Links
System ID Private Priority Status Summary Last Updated
Sourceware 14827 0 P2 NEW free() doesn't honor M_TRIM_THRESHOLD 2020-06-09 16:31:35 UTC

Description Daniel Vrátil 2013-03-14 16:08:00 UTC
Created attachment 710111: testcase

Description of problem:
free() isn't calling brk() to give memory back to the kernel when M_TRIM_THRESHOLD is exceeded.

Version-Release number of selected component (if applicable):
glibc-2.16-29.el7.x86_64
kernel-3.7.0-0.36.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
Run the attached test-case.

What it does:
1. Calls malloc() 2800000 times
2. Calls free()   2800000 times
3. pauses, so you can inspect heap size
  
Actual results:
You'll see that the heap size is around 250 MB.

Expected results:
Heap size is 4 K.

Additional info:
Manually calling malloc_trim(), through gdb, decreases the heap size to 4 K.
----------------------------------------------------

How I measured heap size:

  $ cat /proc/12345/maps | grep heap
    01bc6000-0f180000 rw-p 00000000 00:00 0    [heap]

  $ python
    > (0x0f180000-0x01bc6000) / (1024*1024)
    > 213
    213 Megabytes

  $ top -p12345 # tested with top too
    227m 214m for VIRT and RES respectively

  $ gdb -pid 12345            # Let's attach gdb and call malloc_trim()
    > call malloc_trim(0)

  $ top -p12345
    14492 1076 for VIRT and RES respectively

  $ cat /proc/12345/maps | grep heap
    01bc6000-01bc7000 rw-p 00000000 00:00 0 [heap]

  $ python
    > (0x01bc7000-0x01bc6000) / (1024*1024)
    > 0.00390625 // 4KB 
------------------------------------------------------------

This seems to be caused by the "fastbins" feature.

free() doesn't trim fastbin chunks because the size passed to malloc() was below M_MXFAST.

But there really should be a limit to the number of fastbins that we keep
around.

In KDE we've seen 600MB of memory being freed after attaching gdb and calling
malloc_trim(0)

Comment 2 Carlos O'Donell 2013-03-14 18:16:04 UTC
I can reproduce this issue.

Allocating and then freeing a large number of 64-byte chunks (all fastbin allocations, being smaller than 128 bytes on x86_64) will never trigger fastbin consolidation (which is triggered by a free of 65k or larger), the step that eventually leads to full trimming.

The solution is to call `mallopt (M_MXFAST, 0)'. If you need trimming and have lots of small objects you must disable fastbins.

The concept of "fastbins" is completely opposed to trimming. The point is that you want to have a pool of small bins that are easy to use and not consolidated. Unfortunately you can't trim unless you consolidate the fastbins. You don't want to consolidate on each free because that's costly.

However, you are correct in that M_TRIM_THRESHOLD is not honoured because it interacts with the default value for M_MXFAST.

Would updating the documentation help?

That is:

(a) Document M_MXFAST, since it is currently not documented in the glibc manual.

(b) Indicate in M_TRIM_THRESHOLD that if M_MXFAST is not disabled, trimming will not happen unless a block of at least 65k is freed.

Comments?

Comment 3 Sergio Martins 2013-03-14 19:18:29 UTC
IMHO a real solution would be to limit the number of fastbins. There's no benefit in having an infinite sized fastbin pool.

Comment 4 Carlos O'Donell 2013-03-15 14:25:20 UTC
My suggestion is as follows:

(a) Use mallinfo() to look at the number of bytes being used by fastbins e.g. fsmblks.

(b) If this size exceeds your application specific requirements for fastbins then call malloc_trim().

You can do (a) whenever your application is doing normal housekeeping, and you can do (b) whenever required.

We provide a lot of flexibility here, but we want the default case to perform quickly, and use memory to increase that performance.

Limiting fastbins by default is going to require much more discussion and information about exactly what was consuming memory in "KDE" (which you can check by using mallinfo() to examine what is in use). Any new heuristics will need to be carefully balanced against the existing high performance implementation.

Does that make sense?

Comment 5 Sergio Martins 2013-03-17 20:57:30 UTC
The problem is that we're moving the burden of such fine grained memory management to the application developer, instead of it being done by the malloc implementation.

KDE has more than 100 applications, it's not viable to study memory requirements for each application.

How do we handle dependencies that are not part of the KDE project? I've seen savings of 300MB on the virtuoso server.


GNOME probably has the same problem. It's a matter of trying it out.

Comment 6 Carlos O'Donell 2013-03-17 23:24:14 UTC
Please file another BZ where we can discuss enhancements to the glibc allocator and the use of fastbins. Please provide detailed mallinfo statistics for KDE and gnome and use cases where you are seeing problems with fastbins and memory consumption.

This particular issue is about free() not honouring M_TRIM_THRESHOLD, which I plan to fix by enhancing the documentation to explain how fastbins interact with trimming.

Comment 7 Jon Levell 2013-09-17 22:32:18 UTC
Was that additional bug filed? The man page for malloc_trim seems to be entirely inaccurate and the need to call it manually seems remarkably Google-proof, fixing the allocator to make it less needed (as well as improving the docs) seems like a good idea. 

If it was filed, linking to it from this bug seems useful.

Comment 9 Carlos O'Donell 2013-10-07 02:46:35 UTC
(In reply to Jon Levell from comment #7)
> Was that additional bug filed? The man page for malloc_trim seems to be
> entirely inaccurate and the need to call it manually seems remarkably
> Google-proof, fixing the allocator to make it less needed (as well as
> improving the docs) seems like a good idea. 
> 
> If it was filed, linking to it from this bug seems useful.

I don't know if an additional bug was filed.

I've added to my list that the man pages project needs an update for this also.

Comment 14 Carlos O'Donell 2016-11-28 13:45:02 UTC
I'm moving this to rhel-7.5 for later review.

My expectation is that this is unsolvable with existing fastbins by virtue of how trimming works: the trim check is made per free(), not against a cumulative total of the memory freed back.

Instead the solution I would propose is disabling fastbins and moving to lockless per-thread caches which have shown much better performance. These per-thread caches would be limited to a fixed RSS size. We have this work upstream in the dj/malloc branch but it still needs more operational hours of testing before we can deploy this into RHEL. However, that doesn't stop us from planning to move in that direction.

If anyone on this ticket has any objections to this plan, please comment here.

Comment 15 Carlos O'Donell 2018-04-03 04:32:44 UTC
We have thread-local caches upstream now, but we also found fastbins continue to provide a significant performance benefit. The next step is going to be to look at limiting the number of fastbins in use. This work still needs to be done upstream.

Comment 17 Carlos O'Donell 2020-04-28 17:03:00 UTC
We are going to track this issue as part of this upstream bug:
https://sourceware.org/bugzilla/show_bug.cgi?id=14827

I'm going to mark this bug as CLOSED/UPSTREAM. When we have an upstream solution in place we can consider the backport to RHEL7 or RHEL8 as appropriate.

