Bug 1498605 - [LLNL RHDTS FEAT] elfutils Parallel parsing of CU's DIEs from libdw
Summary: [LLNL RHDTS FEAT] elfutils Parallel parsing of CU's DIEs from libdw
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Developer Toolset
Classification: Red Hat
Component: elfutils
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: alpha
Target Release: 10.0
Assignee: Mark Wielaard
QA Contact: Martin Cermak
URL:
Whiteboard:
Depends On:
Blocks: 1517824
 
Reported: 2017-10-04 18:27 UTC by Ben Woodard
Modified: 2020-12-01 12:13 UTC
CC List: 9 users

Fixed In Version: devtoolset-10-elfutils-0.180-2
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-01 12:13:06 UTC
Target Upstream Version:
Embargoed:


Attachments
Stand alone reproducer for parallelism challenges with libdw (1.62 KB, text/x-csrc)
2018-02-01 16:40 UTC, Ben Woodard


Links
Sourceware 22252 (P2, RESOLVED): Parallel parsing of CU's DIEs from libdw. Last Updated: 2020-12-02 18:14:13 UTC

Description Ben Woodard 2017-10-04 18:27:44 UTC
Description of problem:
To speed up the processing of large ELF files, which are common in HPC, please resolve the data races that prevent parallel parsing of a CU's DIEs within ELF files. We'd like to do something like the following:

    for (Dwarf_Off cu_off = 0;
         dwarf_next_unit(dbg(), cu_off, &next_cu_header, &cu_header_length,
                         NULL, &abbrev_offset, &addr_size, &offset_size,
                         &type_signaturep, NULL) == 0;
         cu_off = next_cu_header)
    {
        if (!dwarf_offdie_types(dbg(), cu_off + cu_header_length, &current_cu_die))
            continue;
        // Modified for parallelism: rather than a single DwarfWalker holding
        // state, create a new context and clone before we spawn a thread
        push();
        DwarfWalker mod_walker(*this);
        pop();
        bool ret = cilk_spawn mod_walker.parseModule(false, fixUnknownMod);
        // bool ret = parseModule(false, fixUnknownMod);
        if (!ret) {
            cilk_sync;
            return false;
        }
        compile_offset = next_cu_header;
    }
    cilk_sync;

Code similar to this is being worked on for Dyninst (http://www.dyninst.org/).

Dyninst is a heavily used tool at LLNL and is used to build other tools, including SystemTap and HPCToolkit.

Comment 2 Ben Woodard 2017-10-04 18:29:56 UTC
It is understood that this may change some APIs, and that is acceptable.

Comment 3 Ben Woodard 2018-01-30 22:03:01 UTC
Since our initial attempts to parallelize the reading of symbols, the code has been refactored:

    std::for_each(module_dies.begin(), module_dies.end(), [&](Dwarf_Die cur) {
//    tbb::parallel_for_each(module_dies, [&](Dwarf_Die cur) {
        int basis = open(symtab()->file().c_str(), O_RDONLY);
        Dwarf* temp_dwarf = dwarf_begin(basis, DWARF_C_READ);
        DwarfWalker w(symtab_, temp_dwarf);
        w.push();
        bool ok = w.parseModule(cur, fixUnknownMod);
        w.pop();
        dwarf_end(temp_dwarf);
        close(basis);
        return ok;
    });
DwarfWalker has no static members, and the body of the lambda starts from a fresh fd to create a fresh Dwarf handle for each iteration (regardless of which thread it runs on).

This code does stash Dwarf_Die objects and retrieve them later, and it crashes when calling dwarf_hasattr on one of these saved DIEs; is this an illegal thing to do that happens to work in single-threaded code?

Subsequent investigation found that if the offsets are stashed instead of the DIEs, and the DIEs are recreated locally from those offsets as needed, no corruption occurs.
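
For illustration, here is a minimal sketch of that offset-stashing pattern; the function names and the DW_AT_name lookup are made up for this example, only the libdw calls themselves are real:

    // Sketch: stash dwarf_dieoffset() results rather than Dwarf_Die objects,
    // and rebuild each DIE from its offset at the point of use.
    #include <elfutils/libdw.h>
    #include <dwarf.h>
    #include <vector>

    // Collect the global offset of every CU's root DIE.
    std::vector<Dwarf_Off> save_die_offsets(Dwarf *dbg)
    {
        std::vector<Dwarf_Off> offsets;
        Dwarf_Off off = 0, next_off;
        size_t header_size;
        while (dwarf_next_unit(dbg, off, &next_off, &header_size,
                               NULL, NULL, NULL, NULL, NULL, NULL) == 0)
        {
            Dwarf_Die cu_die;
            if (dwarf_offdie(dbg, off + header_size, &cu_die) != NULL)
                offsets.push_back(dwarf_dieoffset(&cu_die));
            off = next_off;
        }
        return offsets;
    }

    // Later, possibly from a worker holding its own Dwarf handle on the same
    // file, recreate the DIE from its offset before touching it.
    bool die_has_name(Dwarf *dbg, Dwarf_Off off)
    {
        Dwarf_Die die;
        if (dwarf_offdie(dbg, off, &die) == NULL)
            return false;
        return dwarf_hasattr(&die, DW_AT_name) != 0;
    }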

I would guess that what's happening is that DIEs hold a pointer to a Dwarf_Abbrev that is stack-allocated somewhere (an address in the 0x7ffffffff neighborhood) but goes out of scope before a copy of the DIE does. So you can copy DIE objects all you want as long as the original is still around, but a copy cannot safely outlive the original.

This further complicates the processing of DWARF for large applications.

Comment 4 Ben Woodard 2018-02-01 16:40:39 UTC
Created attachment 1389635 [details]
Stand alone reproducer for parallelism challenges with libdw

I have yet to test this, but they were able to construct a test case more quickly than I was. At the very least, this will be an initial starting point to demonstrate what they would like to do.

Comment 5 Mark Wielaard 2018-07-11 12:43:36 UTC
Unlikely to happen anytime soon, sorry. Moving to DTS 9.0.

Comment 6 Mark Wielaard 2019-05-10 20:27:29 UTC
There is some work being done here: https://github.com/blue42u/elfutils

Comment 7 Mark Wielaard 2019-07-02 14:21:49 UTC
Although some work has been done, it hasn't been integrated upstream yet. Moving to a later product release.

Comment 8 Frank Ch. Eigler 2020-04-02 19:24:37 UTC
I believe the parallelism improvements merged into elfutils 0.178 cover this request.

Comment 9 Mark Wielaard 2020-07-01 12:33:27 UTC
(In reply to Frank Ch. Eigler from comment #8)
> I believe the parallelism improvements merged into elfutils 0.178 cover this
> request.

I believe so indeed, from the NEWS file for elfutils 0.178:

libdw: Abbrevs and DIEs can now be read concurrently by multiple
       threads through the same Dwarf handle.

And we will try to target 0.180 for DTS10.
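
For illustration, a minimal sketch of the usage pattern that change permits, with one shared Dwarf handle read from several threads; the std::thread workers and the input file name are assumptions for this example, not part of the upstream change:

    // Sketch: with elfutils >= 0.178, per-CU workers can read abbrevs and
    // DIEs concurrently through one shared Dwarf handle.
    #include <elfutils/libdw.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <thread>
    #include <vector>

    int main()
    {
        int fd = open("./a.out", O_RDONLY);   // example input
        Dwarf *dbg = dwarf_begin(fd, DWARF_C_READ);
        if (dbg == NULL)
            return 1;

        // Collect the offset of each CU's root DIE single-threaded first.
        std::vector<Dwarf_Off> die_offsets;
        Dwarf_Off off = 0, next_off;
        size_t header_size;
        while (dwarf_next_unit(dbg, off, &next_off, &header_size,
                               NULL, NULL, NULL, NULL, NULL, NULL) == 0)
        {
            die_offsets.push_back(off + header_size);
            off = next_off;
        }

        // One worker per CU, all reading through the same Dwarf handle.
        std::vector<std::thread> workers;
        for (Dwarf_Off die_off : die_offsets)
            workers.emplace_back([dbg, die_off] {
                Dwarf_Die cu_die;
                if (dwarf_offdie(dbg, die_off, &cu_die) != NULL)
                    dwarf_diename(&cu_die);    // read-only access to the DIE
            });
        for (std::thread &t : workers)
            t.join();

        dwarf_end(dbg);
        close(fd);
        return 0;
    }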

Comment 13 Mark Wielaard 2020-09-09 13:40:59 UTC
Note the following presentation, which explains the thread-safety changes: https://dyninst.github.io/scalable_tools_workshop/petascale2019/assets/slides/ElfutilsInParallel.pdf

Comment 19 errata-xmlrpc 2020-12-01 12:13:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: devtoolset-10-elfutils), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:5289

