Bug 1271754
Summary: | tcmalloc blinds valgrind | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Markus Armbruster <armbru> |
Component: | valgrind | Assignee: | Mark Wielaard <mjw> |
Status: | CLOSED ERRATA | QA Contact: | Miloš Prchlík <mprchlik> |
Severity: | unspecified | Docs Contact: | Tomas Capek <tcapek> |
Priority: | unspecified | ||
Version: | 7.0 | CC: | codonell, eblake, famz, fche, fweimer, fziglio, huding, jakub, jsnow, juzhang, knoel, mbenitez, mcermak, mjw, mprchlik, mst, ohudlick, pbonzini, thuth, virt-maint, xfu |
Target Milestone: | rc | ||
Target Release: | 7.3 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | valgrind-3.11.0-20.el7 | Doc Type: | Enhancement |
Doc Text: |
Interception of user-defined allocation functions in *valgrind*
Some applications do not use the *glibc* allocator. Consequently, it was not always convenient to run such applications under *valgrind*. With this update, *valgrind* tries to automatically intercept user-defined memory allocation functions as if the program used the normal *glibc* allocator, making it possible to use memory tracing utilities such as *memcheck* on those programs out of the box.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2016-11-04 02:55:18 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Markus Armbruster
2015-10-14 15:19:48 UTC
The glibc team is still interested in working with the virt team on figuring out exactly where in glibc malloc we're losing performance, and on patching glibc to help provide better performance. We recently found a free-list issue which could cause high arena lock contention if the workload creates and destroys threads on a regular basis (as opposed to using long-running threads). In bug 693262 the average performance gain for tcmalloc was ~4%, so there is a rather small margin for deciding one way or the other, particularly if valgrind isn't going to work well, and if you see regressions in other use cases (bug 1251353). I'm not here to judge what the virt team should do; I am here to help provide options if we want to continue with glibc's allocator and make a few enhancements to get back that ~4%.

We are currently fixing a glibc bug which may be related, particularly if you are switching away from glibc malloc due to high contention. qemu-kvm appears to be affected by this glibc bug when running on top of glibc malloc (not tcmalloc):

<https://sourceware.org/bugzilla/show_bug.cgi?id=19048>
<https://bugzilla.redhat.com/show_bug.cgi?id=1264189>

This bug can result in serious contention because the regular arena selection logic is completely bypassed, and many threads can end up hitting very few arenas (I've seen it go down to a single arena, see below). I have written a short script which allows checking running processes non-destructively:

<https://sourceware.org/bugzilla/attachment.cgi?id=8718>

The bug is sticky: once a process is in this state, it will remain in it until it exits, and the script above will show that the arena free list has turned cyclic.
I don't have many VMs to test, but one harmless one (mostly libvirt defaults, with one attached disk in raw format) has run into this bug:

```
(gdb) print free_list
$1 = (mstate) 0x7ff088000020
(gdb) print free_list->next_free == free_list
$10 = 1
```

And indeed, threads 2 and 3 share the same arena:

```
(gdb) thread apply all print __libc_tsd_MALLOC

Thread 5 (Thread 0x7ff16ed5a700 (LWP 20032)):
$5 = (void *) 0x7ff168000020

Thread 4 (Thread 0x7ff16e559700 (LWP 20033)):
$6 = (void *) 0x7ff160000020

Thread 3 (Thread 0x7ff16d9ff700 (LWP 16821)):
$7 = (void *) 0x7ff088000020

Thread 2 (Thread 0x7ff06affd700 (LWP 16822)):
$8 = (void *) 0x7ff088000020

Thread 1 (Thread 0x7ff18531ab00 (LWP 20030)):
$9 = (void *) 0x7ff17b763760 <main_arena>
```

Threads 2 and 3 are QEMU worker threads. I don't know what these threads are doing, or what the actual impact of increased malloc contention between the two threads is. This was observed with qemu-kvm-1.6.2-13.fc20.x86_64 (which I know is quite old). I can trigger the glibc bug pretty reliably with qemu-kvm-2.3.1-3.fc22.x86_64 while interacting with the VM through virt-manager (note that it's still the qemu-kvm process that runs into this).

It would be interesting to see if any of your performance tests hit this bug. If they do, I can provide fixed glibc scratch builds. If your workload is producer-consumer-style and you hit contention during deallocation, maybe there is something we can do inside glibc malloc to reduce that (while still being conservative enough to qualify for backporting).

I attached a new check script to the upstream glibc bug:

<https://sourceware.org/bugzilla/show_bug.cgi?id=19048>

This version prints an error if no glibc debugging information is available (the old version printed nothing).
You can use the following options to get I/O patterns that match the interesting ones:

```
-drive if=none,id=hd,file=$PATH,aio=native,format=raw
-object iothread,id=io
-device virtio-blk-pci,drive=hd,iothread=io
```

Some worker threads will still be periodically created, but they do not allocate any memory. All memory allocations happen in the thread whose start function is in iothread.c. Our need is basically a very fast (a few hundred clock cycles), entirely thread-local path for reusing objects that have just been freed and are reallocated.

(In reply to Paolo Bonzini from comment #6)
> Some worker threads will still be periodically created, but they do not
> allocate any memory. All memory allocations happen in the thread whose
> start function is in iothread.c.

I get the following backtraces:

```
#0  0x000056042924e4b0 in virtio_blk_alloc_request (s=s@entry=0x56042c90d590)
    at /usr/src/debug/qemu-2.3.1/hw/block/virtio-blk.c:32
#1  0x000056042924f7e4 in handle_notify (e=0x56042cb25108)
    at /usr/src/debug/qemu-2.3.1/hw/block/dataplane/virtio-blk.c:107
#2  0x0000560429454e88 in aio_dispatch (ctx=ctx@entry=0x56042b7bcd30) at aio-posix.c:158
#3  0x0000560429455062 in aio_poll (ctx=0x56042b7bcd30, blocking=<optimized out>) at aio-posix.c:248
#4  0x00005604292ec569 in iothread_run (opaque=0x56042b7ba020) at iothread.c:44

#0  0x000056042924e5a0 in virtio_blk_free_request (req=req@entry=0x7f0b58218770)
    at /usr/src/debug/qemu-2.3.1/hw/block/virtio-blk.c:44
#1  0x000056042924f80a in handle_notify (e=0x56042cb25108)
    at /usr/src/debug/qemu-2.3.1/hw/block/dataplane/virtio-blk.c:111
#2  0x0000560429454e88 in aio_dispatch (ctx=ctx@entry=0x56042b7bcd30) at aio-posix.c:158
#3  0x0000560429455062 in aio_poll (ctx=0x56042b7bcd30, blocking=<optimized out>) at aio-posix.c:248
#4  0x00005604292ec569 in iothread_run (opaque=0x56042b7ba020) at iothread.c:44
```

Does this look right? Allocation and deallocation happen on the same thread. Did I pick the right allocation/deallocation functions?

I maintain valgrind and would like to make it so that, at least on Fedora/RHEL/DTS, usage of tcmalloc/jemalloc as a library or statically linked into the executable is detected by valgrind automatically. But what the best way to do that is, is still a bit of a question. valgrind doesn't really know about ELF symbol interposition; it matches the symbols against a specific library DT_SONAME. There is a simple replacement matching mechanism to tell valgrind to match the "somalloc" functions against a replacement library soname. As used in the description, --soname-synonyms='somalloc=*tcmalloc*' intercepts the somalloc functions in any library whose soname matches the pattern "*tcmalloc*". And to intercept the somalloc functions in the main executable, one would use --soname-synonyms=somalloc=NONE. So if the user knows which library (or the executable) interposes an alternative malloc/free implementation, they can make valgrind aware of it.

So the simplest option might be to make valgrind aware of tcmalloc, jemalloc and NONE by default. I think there is a small risk when using NONE if an executable plays tricks by interposing some symbols without really overriding them (maybe to keep statistics). But there should not be much, if any, risk in always intercepting tcmalloc or jemalloc shared library symbols.

A few questions about this bug:

- Should this bug be assigned to valgrind?
- What other packages should we make sure work with this? I found the following, but except for firefox none are in main RHEL, only in Fedora/EPEL; maybe they are in layered products? I might have missed some, since finding packages that statically link tcmalloc/jemalloc is not easy as far as I can tell.
- qemu-kvm-rhev (this bug)
- firefox (statically links jemalloc, but might need some other tricks to really work under valgrind; upstream tells me that only running debug builds under valgrind is supported)
- varnish (uses jemalloc, EPEL)
- nfs-ganesha (jemalloc, EPEL)
- ceph (tcmalloc, EPEL)
- nginx (tcmalloc, EPEL)
- mongodb (tcmalloc, EPEL)
- Pound (tcmalloc, EPEL)
- redis (jemalloc, EPEL)

And a question on how to actually get qemu-kvm-rhev: I have RHEL 7 installed on my workstation through RHN with the Employee SKU, but I couldn't figure out how to install qemu-kvm-rhev. So I cheated and pulled it out of brew (which does work, and I can replicate the issue and workarounds).

Mark, as far as I know tcmalloc is not used outside virt by RHEL, only by layered products. Ceph is in Red Hat Storage and MongoDB is in Satellite.

The glibc performance aspect is investigated in bug 1275472.

(In reply to Florian Weimer from comment #12)
> The glibc performance aspect is investigated in bug 1275472.

OK, then if people don't mind I'll take this as a valgrind enhancement bug, to make sure valgrind will recognize alternative malloc implementations by default in the future. I am already working on an (upstream) patch.

I posted an upstream patch: https://bugs.kde.org/show_bug.cgi?id=355188

Patch pushed upstream as valgrind svn r15726 and backported to Fedora rawhide valgrind-3.11.0-5.

Already backported and being tested in Fedora, should be good to go.

Perhaps it would be worth trying to compile QEMU with the --disable-tcmalloc flag, which disables tcmalloc usage. Maybe some tools like sanitizers/valgrind are not that comfortable with a non-standard (non-glibc) memory allocator.
Sorry, wrong bug, please ignore this comment.

(In reply to Frediano Ziglio from comment #20)
> Sorry, wrong bug, please ignore this comment.

No worries. This bug is actually about making sure valgrind does automagically detect tcmalloc. The code is already in the valgrind package for Fedora 23. So if your issue was with valgrind, please do try that version to see if it solves your issue without having to rebuild with --disable-tcmalloc. Thanks.

QA note: the test case in #c18 is wrapmalloc*.

Verified for build valgrind-3.11.0-22.el7.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2297.html