Bug 1685715

Summary: glibc: [RFE] Reduce brk(2) syscall usage by implementing scaling heuristic.
Product: Red Hat Enterprise Linux 8 Reporter: Paolo Bonzini <pbonzini>
Component: glibcAssignee: glibc team <glibc-bugzilla>
Status: CLOSED UPSTREAM QA Contact: qe-baseos-tools-bugs
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 8.2CC: ashankar, codonell, dj, fweimer, mnewsome, pbonzini, pfrankli
Target Milestone: rcKeywords: FutureFeature, Triaged
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-12-16 14:59:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
small testcase
none
medium testcase
none
large testcase none

Description Paolo Bonzini 2019-03-05 21:28:37 UTC
Created attachment 1541152 [details]
small testcase

The workload triggered by the attached synthetic makefiles causes glibc to trigger a lot of brk() calls and therefore up to 50% execution time is spent in the kernel:

$ time make -f GNUmakefile cleann
make: ***  Nessuna regola per generare l'obiettivo "cleann".  Arresto.

real	0m0,289s
user	0m0,212s
sys	0m0,072s

$ time make -f BIGmakefile cleann
make: ***  Nessuna regola per generare l'obiettivo "cleann".  Arresto.

real	0m3,143s
user	0m1,716s
sys	0m1,401s

$ time make -f HUGEmakefile cleann
make: ***  Nessuna regola per generare l'obiettivo "cleann".  Arresto.

real	0m17,870s
user	0m9,057s
sys	0m8,681s

Comment 1 Paolo Bonzini 2019-03-05 21:29:19 UTC
Created attachment 1541153 [details]
medium testcase

Comment 2 Paolo Bonzini 2019-03-05 21:29:46 UTC
Created attachment 1541154 [details]
large testcase

Comment 3 Florian Weimer 2019-03-06 06:30:20 UTC
This appears to be a workaround:

MALLOC_TRIM_THRESHOLD_=900000 MALLOC_MMAP_THRESHOLD_=900000 make -f HUGEmakefile cleann

It looks like make performs a lot of medium-sized allocations.  The malloc heuristics treat this as a signal to minimize heap size.

The culprit appears to be the f_append case in do_variable_definition:

1237	            /* Paste the old and new values together in VALUE.  */
1238	
1239	            unsigned int oldlen, vallen;
1240	            const char *val;
1241	            char *tp = NULL;
1242	
1243	            val = value;
1244	            if (v->recursive)
1245	              /* The previous definition of the variable was recursive.
1246	                 The new value is the unexpanded old and new values. */
1247	              flavor = f_recursive;
1248	            else
1249	              /* The previous definition of the variable was simple.
1250	                 The new value comes from the old value, which was expanded
1251	                 when it was set; and from the expanded new value.  Allocate
1252	                 memory for the expansion as we may still need the rest of the
1253	                 buffer if we're looking at a target-specific variable.  */
1254	              val = tp = allocated_variable_expand (val);
1255	
1256	            oldlen = strlen (v->value);
1257	            vallen = strlen (val);
1258	            p = alloc_value = xmalloc (oldlen + 1 + vallen + 1);
1259	            memcpy (alloc_value, v->value, oldlen);
1260	            alloc_value[oldlen] = ' ';
1261	            memcpy (&alloc_value[oldlen + 1], val, vallen + 1);
1262	
1263	            free (tp);

And the old value is freed here:

1400	  v = define_variable_in_set (varname, strlen (varname), p,
1401	                              origin, flavor == f_recursive,
1402	                              (target_var
1403	                               ? current_variable_set_list->set : NULL),
1404	                              flocp);

This is always going to have bad (quadratic) performance.

If you are auto-generating the makefile anyway, you could set the .depfiles variable at the end (and perhaps the .ruletargets.* variables as well).

Comment 4 Paolo Bonzini 2019-03-06 07:43:38 UTC
> If you are auto-generating the makefile anyway, you could set the .depfiles variable at the end (and perhaps the .ruletargets.* variables as well).

Yes, that's what I am doing in fact (I was suprised to see that Make is this bad at processing += but it's easy to workaround as you say).  Still it seemed like a useful testcase; maybe the trim threshold could be increased a little bit for every "brk-down/brk-up" sequence that reaches again the original amount of memory, or something like that.

Comment 5 Florian Weimer 2019-03-06 08:01:38 UTC
(In reply to Paolo Bonzini from comment #4)
> Yes, that's what I am doing in fact (I was suprised to see that Make is this
> bad at processing += but it's easy to workaround as you say).  Still it
> seemed like a useful testcase; maybe the trim threshold could be increased a
> little bit for every "brk-down/brk-up" sequence that reaches again the
> original amount of memory, or something like that.

We implement something like this for the mmap threshold, but it doesn't work in this case, which is why both heuristics have to be disabled.  We'd need to scale the thresholds with the allocation size, which is not something that will be beneficial in general.

The challenge here is that a different process with a similar allocation sequence could go to sleep at any point, and then we'd would be stuck with a large, unused allocation.

Maybe we could scale the trim threshold with live heap size (e.g. if trimming would free less than 5% of heap size, don't consider trimming worthwhile), but this is difficult to measure accurately.  getrusage includes shared mappings as well (which we should not count).  Heap statistics we can gather directly might over-count memory blocks which have never been dirtied by the application and are therefore not actually resident.

Comment 6 Carlos O'Donell 2019-12-16 14:59:53 UTC
We are going to track this change upstream here:
https://sourceware.org/bugzilla/show_bug.cgi?id=25286

I'm marking this CLOSED/UPSTREAM.

Paolo, Thank you for the reproducers!