Bug 118574
Summary: | malloc exhausts memory too fast in multithreaded program | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 3 | Reporter: | Sergey Kosenko <skosenko> | ||||
Component: | glibc | Assignee: | Jakub Jelinek <jakub> | ||||
Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 3.0 | CC: | drepper, tao | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | i686 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2004-12-20 18:14:12 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
Description
Sergey Kosenko
2004-03-17 21:18:42 UTC
200 threads with what thread stack size? By default the thread stack is ulimit -s KB big, unless that is unlimited (in which case it is 2MB on IA-32). The default ulimit -s setting is 8MB, so 200 threads occupy ~1.6GB of address space, plus your ~1.4GB, and unless you're using a -bigmem kernel, 3GB is all the virtual address space you have. You can change thread stack sizes via pthread_attr_setstacksize, or ulimit -s.

ulimit -s 2048. I run no more than 20 threads at a time. I am getting the attachment ready to post.

I have "# define __PAGE_OFFSET (0xc0000000)" in the kernel's asm/page.h, so I assume 3 GB of user memory minus VMALLOC_RESERVE.

Created attachment 98632
Reproducer
Comment on attachment 98632
Reproducer
Number of threads should be a power of 10
Correction: the number of threads should be a multiple of 10.

Why do you consider "ranbytes = 100; // big alloc" a big allocation? That sounds like a really small allocation, so your program attempts about 87000 malloc (20) and 87000 malloc (100) calls per thread. If you fix the ranbytes allocation so that it does what you probably meant it to do, the test passes just fine.

The problem with really small allocations from contending threads is that malloc uses a separate arena for each such thread to avoid the locking overhead. For big allocations (>= MMAP_THRESHOLD, which is 128K by default), each allocation is a separate mmap, but for small allocations malloc uses arenas of 1MB, which must be aligned to 1MB (so that malloc can quickly find out which arena a particular object belongs to, etc.). The OS provides no way to ask mmap for an aligned address, so arena.c does this (HEAP_MAX_SIZE == 1MB):

    /* A memory region aligned to a multiple of HEAP_MAX_SIZE is needed.
       No swap space needs to be reserved for the following large
       mapping (on Linux, this is the case for all non-writable mappings
       anyway). */
    p1 = (char *)MMAP(0, HEAP_MAX_SIZE<<1, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE);
    if(p1 != MAP_FAILED) {
      p2 = (char *)(((unsigned long)p1 + (HEAP_MAX_SIZE-1)) & ~(HEAP_MAX_SIZE-1));
      ul = p2 - p1;
      munmap(p1, ul);
      munmap(p2 + HEAP_MAX_SIZE, HEAP_MAX_SIZE - ul);
    } else {
      /* Try to take the chance that an allocation of only HEAP_MAX_SIZE
         is already aligned. */
      p2 = (char *)MMAP(0, HEAP_MAX_SIZE, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE);
      if(p2 == MAP_FAILED)
        return 0;
      if((unsigned long)p2 & (HEAP_MAX_SIZE-1)) {
        munmap(p2, HEAP_MAX_SIZE);
        return 0;
      }
    }

This works well if mmap addresses usually grow upward (if malloc is the only user of mmap in a given time frame, mmap will most probably return addresses growing by 1MB), but in RHEL 3 mmap addresses are allocated from top to bottom, so that the sbrk heap has enough room to grow, etc.
When the mmap addresses handed out by the kernel grow downward, the address space becomes fragmented (always 1MB mapped, 1MB unmapped). If malloc were the only user of mmap, this still wouldn't be a problem: once there are no more 2MB areas available, the p1 = mmap call will fail, but the p2 = mmap call will most probably succeed and return an aligned address. But your testcase dies in pthread_create, which, unless you trim the thread stack size from the default, typically needs an 8MB mmap, and once the whole address space is fragmented, such a region is of course no longer available.

Thanks, Jakub. I will have our sysadmins apply the patch ASAP and then I'll try it.
Sergey Kosenko, Banc of America Securities LLC
212-847-5486

I downloaded glibc-2.3.2-95.6.src.rpm from RHN; our SAs patched it, installed it as glibc-2.3.2-95.6.1, and we tested it. The amount of allocated memory grew to ~2266 MB (from 1433 MB before), but allocation speed dropped significantly (several times over). Are we using the right glibc source? If not, where do I get the right one from? Thanks.

Were you building the i686 glibc? I.e. rpmbuild --target i686 -ba -v glibc.spec?

No, our SA built it for i386. Will build it for i686 now. Thanks.

The problem is solved! Thanks. I was able to allocate 2555 MB with no speed penalty.

Why didn't you guys put the fix into Update 2 of RHEL 3.0? I could still allocate only about 1.7 GB after U2 was applied.

Part of glibc 2.3.3-65 in FC3 now.

The patch is in glibc-2.3.2-95.28, which ought to appear in the U4 beta.

An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-586.html