Bug 799327
| Summary: | MALLOC_ARENA_MAX=1 does not work in RHEL 6.2 |
|---|---|
| Product: | Red Hat Enterprise Linux 6 |
| Component: | glibc |
| Version: | 6.2 |
| Hardware: | x86_64 |
| OS: | Linux |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | unspecified |
| Reporter: | Peter Klotz <peter.klotz> |
| Assignee: | Jeff Law <law> |
| QA Contact: | qe-baseos-tools-bugs |
| CC: | fweimer, mfranc |
| Target Milestone: | rc |
| Doc Type: | Bug Fix |
| Clones: | 800240 (view as bug list) |
| Bug Blocks: | 800240 |
| Last Closed: | 2012-03-02 16:21:55 UTC |
Description (Peter Klotz, 2012-03-02 13:56:56 UTC)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-0058.html

Thanks for the information about the errata. I was not aware of that. Will there be such an erratum for RHEL 5, too? The bug is also present there.

After using the updated glibc, MALLOC_ARENA_MAX works as expected and can no longer be exceeded. However, I am still observing strange behavior. Performing a well-defined test against one of our servers shows massive RSS memory usage. When the process is run with MALLOC_CHECK_=3, no such behavior occurs.

No environment variable set (4 CPU cores -> 32 arenas):

```
  PID USER PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
11334 pkl  20  0  594m  10m 8420 S  0.3  0.1 0:00.70 InboundServer.c   (after startup)
11334 pkl  20  0 2233m 481m 9100 S  0.3  6.1 0:25.23 InboundServer.c   (after test run)
```

export MALLOC_ARENA_MAX=1 (one arena due to errata RPM):

```
  PID USER PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
11472 pkl  20  0  208m 9.8m 8136 S  0.8  0.1 0:00.07 InboundServer.c   (after startup)
11472 pkl  20  0 3481m 3.2g 9120 S  0.3 41.2 0:57.83 InboundServer.c   (after test run)
```

export MALLOC_CHECK_=3 (always a single arena):

```
  PID USER PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
11714 pkl  20  0  209m 9.9m 8128 S  0.3  0.1 0:00.06 InboundServer.c   (after startup)
11714 pkl  20  0  258m  19m 8872 S  0.0  0.3 0:33.15 InboundServer.c   (after test run)
```

The test run with MALLOC_CHECK_=3 is perfectly reproducible. The other two cases lead to different RSS values in each run. This makes me suspect that there may be another bug hidden somewhere in glibc that is only triggered if MALLOC_CHECK_ is not used.

Until recently there hasn't been any customer interest in this issue for Red Hat Enterprise Linux 5.
I can certainly put it on the list of issues to review/prioritize for the next release of Red Hat Enterprise Linux 5.

MALLOC_CHECK_=3 uses a completely different allocator which avoids most of the optimizations for the sake of providing an environment that makes debugging malloc problems easier.

Setting M_TRIM_THRESHOLD to zero and rerunning your tests would tell us if it's a problem with free memory at the top of the heap not being returned promptly to the system.

Setting M_MMAP_THRESHOLD to a lower value will force more allocations (anything larger than M_MMAP_THRESHOLD) to be made via mmap. This can often help with fragmentation of the heap, since mmap'd areas can always be returned to the system, while only the free memory between the last live object and the top of the heap can be returned when allocating via the heap.

Setting M_MMAP_MAX to a high value can also help if you've got a large number of live mmap'd objects. Basically, M_MMAP_MAX is the maximum number of live objects allocated by mmap at any given time.

You can also use malloc_stats() to get information about where/how memory has been allocated. mallinfo() gives similar information, but in a form more readily usable by a program rather than a human.

Hello Jeff

Thank you very much for your hints regarding this problem and for creating the RHEL 5 bug. The problem I am seeing disappears when setting M_TRIM_THRESHOLD. What is still strange is the following: the value of M_TRIM_THRESHOLD in the mallopt() call does not matter. My problem disappears for values of 0, 128kB and even 30MB. Since 128kB is the default value of M_TRIM_THRESHOLD, I assumed this would make the massive RSS usage reappear.

Digging through the glibc source I found the following: setting M_TRIM_THRESHOLD with mallopt() also sets the internal flag mp_.no_dyn_threshold to 1. Initially this flag is always zero.
This is the only code where this flag is used (malloc/malloc.c):

```c
if (!mp_.no_dyn_threshold
    && p->size > mp_.mmap_threshold
    && p->size <= DEFAULT_MMAP_THRESHOLD_MAX)
  {
    mp_.mmap_threshold = chunksize (p);
    mp_.trim_threshold = 2 * mp_.mmap_threshold;
  }
```

Since mp_.no_dyn_threshold is 1 after the mallopt() call, this code is no longer executed. Could this flag in fact be the trigger, not the value of M_TRIM_THRESHOLD?

Sigh. I forgot that tweaking many of the M_* parameters will turn off the dynamic sbrk/mmap heuristics. Basically those heuristics attempt to identify transient vs. long-lived allocations, with the former landing in the heap and the latter in mmap'd space. It starts with the assumption that allocations smaller than 128kB are transient and larger ones are long-lived. When memory that was allocated via mmap is freed, the dynamic threshold goes up (with a clamp at 64MB).

So what's happening is that the mmap threshold is being held constant after you set M_TRIM_THRESHOLD. If that's keeping your process from growing, that's a good indication that you've got a fragmentation issue. By keeping the mmap threshold from growing, more allocations occur via mmap, and those are trivially released back to the system. If the mmap threshold grows, then those allocations occur via sbrk and are then subject to the restriction that only the free memory beyond the last live allocation can be returned to the system.

It can be worth the effort to spend some time tuning the trim and mmap thresholds, particularly for long-running processes.

Hello Jeff

Sorry for the late response, but it took us a while to update a production server to RHEL 5.8 and get our latest binaries there up and running. Internally I now have two quite different test programs that show memory fragmentation and are "cured" using M_TRIM_THRESHOLD with a value of 128kB. I observe identical behavior on RHEL 5.8 and 6.2.
Both test programs extract certain parts of our production server that I suspected to be part of the problem. To my disappointment, the M_TRIM_THRESHOLD fix (with 128kB and even with a value of 0) does not work in our RHEL 5.8 production server. It still shows growing RSS usage. So there seems to be another aspect of the problem that I have not been able to reproduce in my tests so far.

But what is much worse: I am seeing calls blocking in memcpy() for quite some time when a process using the M_TRIM_THRESHOLD setting runs for a longer time period under high load. Only a few of several thousand client requests end up like this. Even if all clients are stopped, server threads calling memcpy() keep running for hours. Switching back to a process that does not call mallopt() does not help; we must reboot the machine to get sane memory allocation behavior back.

So far this behavior only occurs in our production system (2x Opteron CPU/8 cores, 32GB RAM). I never managed to reproduce anything like it internally, but I have to admit that I do not have similar hardware to test on.
This is a typical backtrace taken from a single thread in a live process:

```
Thread 2 (Thread 0x4cc6c940 (LWP 1459)):
#0  0x0000003b0b87c629 in memcpy () from /lib64/libc.so.6
#1  0x0000003b1049c260 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_M_clone(std::allocator<char> const&, unsigned long) () from /usr/lib64/libstdc++.so.6
#2  0x0000003b1049cb45 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::reserve(unsigned long) () from /usr/lib64/libstdc++.so.6
#3  0x0000003b1049cfaf in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::append(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6
#4  0x0000000000dd82e8 in Poco::XML::CharacterData::appendData(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#5  0x0000000000dd9567 in Poco::XML::DOMBuilder::characters(char const*, int, int) ()
#6  0x0000000000df0445 in doContent(XML_ParserStruct*, int, encoding const*, char const*, char const*, char const**, unsigned char) ()
#7  0x0000000000df1404 in contentProcessor(XML_ParserStruct*, char const*, char const*, char const**) ()
```

Interestingly, all backtraces that end in memcpy() look the same, although we allocate memory via several more code paths. Is this memcpy() blocking issue a known side effect of using mallopt(), or should it be considered a kernel bug? I assume it has something to do with the kernel, since why else would only a reboot finally fix the blocking issue?

I'm not aware of any interaction between memcpy and mallopt, nor anything which would cause memcpy to block. Certainly there's no blocking/locking within memcpy itself. The only thing I could think of would be if the memory was lazily allocated by the kernel, with real memory used to back the allocation only upon receiving a page fault.
I don't know of a way to directly check this, and it's just a theory; I don't know if the kernel actually works this way for mmap'd regions or not. We might get some clues if you could attach to the blocked process via gdb and do something like:

```
x/10i $pc
i all-r
```

Then dump /proc/<pid>/maps. That should allow us to determine what address is locked up, and we ought to be able to determine whether it's a heap, stack or other address.

Hello Jeff

The blocking memcpy() issue was in fact a poorly scaling algorithm in our software that was triggered by client requests. This particular kind of request started arriving exactly when I was investigating the fragmentation issue, so I falsely assumed a connection with the mallopt() call. I hope that by eliminating this algorithm I also got rid of another memory fragmentation candidate: the code performed far too many string copy operations. I will continue to monitor memory usage of our production system.

Hello Jeff

Finally I have good news. I was able to find and fix one memory leak in our production server. Setting M_TRIM_THRESHOLD to 128kB now limits the memory usage of our production server process to approx. 2GB. This value is reached a few hours after starting the server. Currently the server process has been up and running for 4 months. Thanks a lot for your help!

Regards, Peter.

Peter, glad to hear it!