Created attachment 1541152 [details]
small testcase

The workload triggered by the attached synthetic makefiles causes glibc to issue a large number of brk() calls, so up to 50% of the execution time is spent in the kernel:

$ time make -f GNUmakefile cleann
make: *** No rule to make target 'cleann'.  Stop.

real    0m0.289s
user    0m0.212s
sys     0m0.072s

$ time make -f BIGmakefile cleann
make: *** No rule to make target 'cleann'.  Stop.

real    0m3.143s
user    0m1.716s
sys     0m1.401s

$ time make -f HUGEmakefile cleann
make: *** No rule to make target 'cleann'.  Stop.

real    0m17.870s
user    0m9.057s
sys     0m8.681s
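For context, a rough sketch of what such a makefile presumably looks like (this is an assumption, not the attachment contents; the variable and file names are illustrative, and the real testcases are far larger). Each += line goes through the slow append path analyzed below, and asking for the misspelled target "cleann" presumably just forces make to parse the whole file and then stop, so the timings above measure parsing alone:

  # Illustrative sketch only; names (.depfiles, .deps/fileNNNN.d) are assumed.
  .depfiles :=
  .depfiles += .deps/file0001.d
  .depfiles += .deps/file0002.d
  .depfiles += .deps/file0003.d
  # ... one += line per generated file, tens of thousands in the large testcase ...

  all: $(.depfiles)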
Created attachment 1541153 [details]
medium testcase
Created attachment 1541154 [details]
large testcase
This appears to be a workaround:

  MALLOC_TRIM_THRESHOLD_=900000 MALLOC_MMAP_THRESHOLD_=900000 make -f HUGEmakefile cleann

It looks like make performs a lot of medium-sized allocations, and the malloc heuristics treat this as a signal to minimize heap size.

The culprit appears to be the f_append case in do_variable_definition:

1237          /* Paste the old and new values together in VALUE.  */
1238
1239          unsigned int oldlen, vallen;
1240          const char *val;
1241          char *tp = NULL;
1242
1243          val = value;
1244          if (v->recursive)
1245            /* The previous definition of the variable was recursive.
1246               The new value is the unexpanded old and new values.  */
1247            flavor = f_recursive;
1248          else
1249            /* The previous definition of the variable was simple.
1250               The new value comes from the old value, which was expanded
1251               when it was set; and from the expanded new value.  Allocate
1252               memory for the expansion as we may still need the rest of the
1253               buffer if we're looking at a target-specific variable.  */
1254            val = tp = allocated_variable_expand (val);
1255
1256          oldlen = strlen (v->value);
1257          vallen = strlen (val);
1258          p = alloc_value = xmalloc (oldlen + 1 + vallen + 1);
1259          memcpy (alloc_value, v->value, oldlen);
1260          alloc_value[oldlen] = ' ';
1261          memcpy (&alloc_value[oldlen + 1], val, vallen + 1);
1262
1263          free (tp);

And the old value is freed here:

1400      v = define_variable_in_set (varname, strlen (varname), p,
1401                                  origin, flavor == f_recursive,
1402                                  (target_var
1403                                   ? current_variable_set_list->set : NULL),
1404                                  flocp);

This is always going to have bad (quadratic) performance: every += allocates a fresh buffer, copies the entire old value plus the appended text into it, and frees the old buffer, so N appends to the same variable cost O(N^2) work while constantly growing and shrinking the heap.

If you are auto-generating the makefile anyway, you could set the .depfiles variable at the end (and perhaps the .ruletargets.* variables as well).
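A sketch of that restructuring, reusing the illustrative names from above: emitting the whole list as a single definition once all entries are known makes make copy each entry once, instead of recopying the accumulated value on every +=.

  # Quadratic: one += per entry, each append recopies everything so far.
  #   .depfiles += .deps/file0001.d
  #   .depfiles += .deps/file0002.d
  # Linear: the generator emits a single assignment at the end.
  .depfiles := \
    .deps/file0001.d \
    .deps/file0002.d \
    .deps/file0003.d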
> If you are auto-generating the makefile anyway, you could set the .depfiles
> variable at the end (and perhaps the .ruletargets.* variables as well).

Yes, that's what I am doing in fact (I was surprised to see that make is this bad at processing +=, but it's easy to work around as you say). Still, it seemed like a useful testcase; maybe the trim threshold could be increased a little bit for every "brk-down/brk-up" sequence that reaches the original amount of memory again, or something like that.
(In reply to Paolo Bonzini from comment #4)
> Yes, that's what I am doing in fact (I was surprised to see that make is
> this bad at processing +=, but it's easy to work around as you say). Still,
> it seemed like a useful testcase; maybe the trim threshold could be
> increased a little bit for every "brk-down/brk-up" sequence that reaches
> the original amount of memory again, or something like that.

We implement something like this for the mmap threshold, but it doesn't work in this case, which is why both heuristics have to be disabled. We'd need to scale the thresholds with the allocation size, which is not something that will be beneficial in general: a different process with a similar allocation sequence could go to sleep at any point, and then we'd be stuck with a large, unused allocation.

Maybe we could scale the trim threshold with the live heap size (e.g., if trimming would free less than 5% of the heap, don't consider trimming worthwhile), but this is difficult to measure accurately. getrusage includes shared mappings as well (which we should not count), and heap statistics we can gather directly might over-count memory blocks that have never been dirtied by the application and are therefore not actually resident.
We are going to track this change upstream:

  https://sourceware.org/bugzilla/show_bug.cgi?id=25286

I'm marking this CLOSED/UPSTREAM. Paolo, thank you for the reproducers!