Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1498605

Summary: [LLNL RHDTS FEAT] elfutils Parallel parsing of CU's DIEs from libdw
Product: Red Hat Developer Toolset
Reporter: Ben Woodard <woodard>
Component: elfutils
Assignee: Mark Wielaard <mjw>
Status: CLOSED ERRATA
QA Contact: Martin Cermak <mcermak>
Severity: unspecified
Priority: unspecified
Docs Contact:
Version: unspecified
CC: fche, jakub, kanderso, mbenitez, mcermak, mjw, mnewsome, ohudlick, tgummels
Target Milestone: alpha
Keywords: FutureFeature
Target Release: 10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: devtoolset-10-elfutils-0.180-2
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-12-01 12:13:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1517824
Attachments:
Stand alone reproducer for parallelism challenges with libdw (flags: none)

Description Ben Woodard 2017-10-04 18:27:44 UTC
Description of problem:
To speed up the processing of the large ELF files that are common in HPC, please resolve the data races that prevent the DIEs of different CUs within an ELF file from being parsed in parallel. We'd like to do something like:

    for (Dwarf_Off cu_off = 0;
         dwarf_next_unit(dbg(), cu_off, &next_cu_header, &cu_header_length,
                         NULL, &abbrev_offset, &addr_size, &offset_size,
                         &type_signaturep, NULL) == 0;
         cu_off = next_cu_header)
    {
        if (!dwarf_offdie_types(dbg(), cu_off + cu_header_length, &current_cu_die))
            continue;
        // Modified for parallelism: rather than a single DwarfWalker holding state,
        // create a new context and clone before we spawn a thread
        push();
        DwarfWalker mod_walker(*this);
        pop();
        bool ret = cilk_spawn mod_walker.parseModule(false, fixUnknownMod);
        // bool ret = parseModule(false, fixUnknownMod);
        if (!ret) {
            cilk_sync;
            return false;
        }
        compile_offset = next_cu_header;
    }
    cilk_sync;

Code similar to this is being worked on for Dyninst (http://www.dyninst.org/).

Dyninst is a heavily used tool at LLNL and is used to build other tools, including systemtap and HPCToolkit.
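
For reference, the per-CU work that each spawned task would perform is essentially a recursive walk of that CU's DIE tree. A minimal sketch with libdw (illustrative only, not the actual Dyninst code; the function name and what it does with each DIE are placeholders):

    #include <elfutils/libdw.h>

    // Recursively visit every DIE in the subtree rooted at 'die'.
    static void walkDie(Dwarf_Die *die)
    {
        // ... inspect *die here, e.g. dwarf_tag(die) or dwarf_diename(die) ...

        Dwarf_Die child;
        if (dwarf_child(die, &child) == 0)
        {
            do
                walkDie(&child);
            while (dwarf_siblingof(&child, &child) == 0);
        }
    }

Each such walk would start from a CU DIE obtained via dwarf_offdie (or dwarf_offdie_types for type units), and ideally the walks for different CUs could proceed on separate threads.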

Comment 2 Ben Woodard 2017-10-04 18:29:56 UTC
It is understood that this may change some APIs and that is acceptable.

Comment 3 Ben Woodard 2018-01-30 22:03:01 UTC
Since our initial attempts to parallelize symbol reading, the code has been refactored:

    std::for_each(module_dies.begin(), module_dies.end(), [&](Dwarf_Die cur) {
//    tbb::parallel_for_each(module_dies, [&](Dwarf_Die cur) {
        int basis = open(symtab()->file().c_str(), O_RDONLY);
        Dwarf* temp_dwarf = dwarf_begin(basis, DWARF_C_READ);
        DwarfWalker w(symtab_, temp_dwarf);
        w.push();
        bool ok = w.parseModule(cur, fixUnknownMod);
        w.pop();
        dwarf_end(temp_dwarf);
        close(basis);
        return ok;
    });
DwarfWalker has no static members, and the body of the lambda opens a fresh fd and creates a fresh Dwarf handle for each iteration (regardless of which thread it runs on).

This code does stash Dwarf_Dies and retrieve them later, and it crashes when calling dwarf_hasattr on one of those saved DIEs; is stashing DIEs an illegal thing to do that merely happens to work in single-threaded code?

Subsequent investigation found that if, instead of stashing the DIEs, we stash their offsets and recreate the DIEs locally from those offsets as needed, no corruption occurs.
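
A minimal sketch of that offset-stashing workaround (illustrative names, not the actual Dyninst code): the stash holds Dwarf_Off values obtained with dwarf_dieoffset, and a DIE is rebuilt on the local stack with dwarf_offdie whenever it is queried.

    #include <dwarf.h>
    #include <elfutils/libdw.h>
    #include <vector>

    std::vector<Dwarf_Off> saved_offsets;

    // Stash the DIE's global offset instead of the Dwarf_Die object itself.
    void stash(Dwarf_Die *die)
    {
        saved_offsets.push_back(dwarf_dieoffset(die));
    }

    // Rebuild the DIE locally from its offset before querying it.
    bool stashed_die_has_name(Dwarf *dbg, Dwarf_Off off)
    {
        Dwarf_Die die;
        if (dwarf_offdie(dbg, off, &die) == NULL)
            return false;
        return dwarf_hasattr(&die, DW_AT_name) != 0;
    }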

I would guess that what's happening is that DIEs hold a pointer to a Dwarf_Abbrev that is stack-allocated somewhere (the address is in the 0x7ffffffff neighborhood) but goes out of scope before a copy of the DIE does. So you can copy DIE objects all you want as long as the original is still around, but a copy cannot safely outlive the original.

This further complicates the processing of DWARF for large applications.

Comment 4 Ben Woodard 2018-02-01 16:40:39 UTC
Created attachment 1389635 [details]
Stand alone reproducer for parallelism challenges with libdw

I have yet to test this, but they were able to construct a test case more quickly than I was. At the very least, this will be an initial starting point to demonstrate what they would like to do.

Comment 5 Mark Wielaard 2018-07-11 12:43:36 UTC
Unlikely to happen anytime soon, sorry. Moving to DTS 9.0.

Comment 6 Mark Wielaard 2019-05-10 20:27:29 UTC
There is some work being done here: https://github.com/blue42u/elfutils

Comment 7 Mark Wielaard 2019-07-02 14:21:49 UTC
Although some work has been done, it hasn't been integrated upstream yet. Moving to a later product release.

Comment 8 Frank Ch. Eigler 2020-04-02 19:24:37 UTC
I believe the parallelism improvements merged into elfutils 0.178 cover this request.

Comment 9 Mark Wielaard 2020-07-01 12:33:27 UTC
(In reply to Frank Ch. Eigler from comment #8)
> I believe the parallelism improvements merged into elfutils 0.178 cover this
> request.

I believe so indeed, from the NEWS file for elfutils 0.178:

libdw: Abbrevs and DIEs can now be read concurrently by multiple
       threads through the same Dwarf handle.

And we will try to target 0.180 for DTS10.
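
For what it's worth, here is a minimal sketch of the pattern the 0.178 change is meant to allow (illustrative only, not code from this bug): CU offsets are collected serially, then each CU's DIEs are read concurrently by worker threads through the same shared Dwarf handle.

    #include <elfutils/libdw.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <thread>
    #include <vector>

    int main(int argc, char **argv)
    {
        if (argc < 2)
            return 1;
        int fd = open(argv[1], O_RDONLY);
        Dwarf *dbg = dwarf_begin(fd, DWARF_C_READ);

        // Collect the offset of each CU DIE serially.
        std::vector<Dwarf_Off> cu_offs;
        Dwarf_Off off = 0, next;
        size_t hdr_size;
        while (dwarf_nextcu(dbg, off, &next, &hdr_size, NULL, NULL, NULL) == 0)
        {
            cu_offs.push_back(off + hdr_size);
            off = next;
        }

        // Read each CU's DIEs concurrently through the same Dwarf handle.
        std::vector<std::thread> workers;
        for (Dwarf_Off cu_off : cu_offs)
            workers.emplace_back([dbg, cu_off]() {
                Dwarf_Die cu_die;
                if (dwarf_offdie(dbg, cu_off, &cu_die) != NULL)
                {
                    Dwarf_Die child;
                    if (dwarf_child(&cu_die, &child) == 0)
                        do
                            (void) dwarf_diename(&child);  // any read-only query
                        while (dwarf_siblingof(&child, &child) == 0);
                }
            });

        for (std::thread &t : workers)
            t.join();

        dwarf_end(dbg);
        close(fd);
        return 0;
    }

Something like "g++ -std=c++11 demo.cc -ldw -lpthread" should build it against elfutils >= 0.178; a real implementation would use a thread pool rather than one thread per CU.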

Comment 13 Mark Wielaard 2020-09-09 13:40:59 UTC
Note the following paper that explains the thread safety changes: https://dyninst.github.io/scalable_tools_workshop/petascale2019/assets/slides/ElfutilsInParallel.pdf

Comment 19 errata-xmlrpc 2020-12-01 12:13:06 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (new packages: devtoolset-10-elfutils), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:5289