Description of problem: To speed up the processing of the large ELF files that are common in HPC, please resolve the data races that prevent parallel parsing of the DIEs of different CUs within an ELF file. We'd like to do something like:

    for (Dwarf_Off cu_off = 0;
         dwarf_next_unit(dbg(), cu_off, &next_cu_header, &cu_header_length,
                         NULL, &abbrev_offset, &addr_size, &offset_size,
                         &type_signaturep, NULL) == 0;
         cu_off = next_cu_header) {
        if (!dwarf_offdie_types(dbg(), cu_off + cu_header_length, &current_cu_die))
            continue;

        // Modified for parallelism: rather than a single DwarfWalker holding
        // state, create a new context and clone before we spawn a thread
        push();
        DwarfWalker mod_walker(*this);
        pop();
        bool ret = cilk_spawn mod_walker.parseModule(false, fixUnknownMod);
        // bool ret = parseModule(false, fixUnknownMod);
        if (!ret) {
            cilk_sync;
            return false;
        }
        compile_offset = next_cu_header;
    }
    cilk_sync;

Code similar to this is being worked on for Dyninst (http://www.dyninst.org/). Dyninst is a heavily used tool at LLNL and is used to build other tools, including SystemTap and HPCToolkit.
It is understood that this may change some APIs and that is acceptable.
Since our initial attempts to parallelize the reading of symbols, the code was refactored:

    std::for_each(module_dies.begin(), module_dies.end(), [&](Dwarf_Die cur) {
    // tbb::parallel_for_each(module_dies, [&](Dwarf_Die cur) {
        int basis = open(symtab()->file().c_str(), O_RDONLY);
        Dwarf* temp_dwarf = dwarf_begin(basis, DWARF_C_READ);
        DwarfWalker w(symtab_, temp_dwarf);
        w.push();
        bool ok = w.parseModule(cur, fixUnknownMod);
        w.pop();
        dwarf_end(temp_dwarf);
        close(basis);
        return ok;
    });

DwarfWalker has no static members, and the body of the lambda starts from a fresh fd, creating a fresh Dwarf handle for each iteration (regardless of which thread it runs on). This code also stashes Dwarf_Dies and retrieves them later, and it crashes when calling dwarf_hasattr on one of these saved DIEs; is that an illegal thing to do that happens to work in single-threaded code? Subsequent investigation showed that if the offsets are stashed instead of the DIEs, and the DIEs are recreated locally from those offsets as needed, no corruption occurs. My guess is that the DIEs hold a pointer to a Dwarf_Abbrev that is stack-allocated somewhere (addresses in the 0x7ffffffff neighborhood) but goes out of scope before a copy of the DIE does. So you can copy DIE objects all you want as long as the original is still around, but a copy cannot safely outlive the original. This further complicates the processing of DWARF for large applications.
Created attachment 1389635 [details] Stand-alone reproducer for parallelism challenges with libdw. I have yet to test this, but they were able to construct a test case more quickly than I could. At the very least this will serve as an initial demonstration of what they would like to do.
Unlikely to happen anytime soon, sorry. Moving to DTS 9.0.
There is some work being done here: https://github.com/blue42u/elfutils
Although some work has been done, it hasn't been integrated upstream yet. Moving to a later product release.
I believe the parallelism improvements merged into elfutils 0.178 cover this request.
(In reply to Frank Ch. Eigler from comment #8)
> I believe the parallelism improvements merged into elfutils 0.178 cover this
> request.

I believe so indeed; from the NEWS file for elfutils 0.178:

    libdw: Abbrevs and DIEs can now be read concurrently by multiple
    threads through the same Dwarf handle.

And we will try to target 0.180 for DTS 10.
Note the following paper that explains the thread safety changes: https://dyninst.github.io/scalable_tools_workshop/petascale2019/assets/slides/ElfutilsInParallel.pdf
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (new packages: devtoolset-10-elfutils), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:5289