Bug 1373197 - dyninst: Repeated execution of Process::temporaryDetach(), Process::reAttach() fails
Summary: dyninst: Repeated execution of Process::temporaryDetach(), Process::reAttach(...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: dyninst
Version: 24
Hardware: Unspecified
OS: Unspecified
urgent
unspecified
Target Milestone: ---
Assignee: Josh Stone
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-09-05 13:12 UTC by Florian Weimer
Modified: 2016-12-07 05:24 UTC (History)
6 users (show)

Fixed In Version: dyninst-9.2.0-5.fc25
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-07 05:24:01 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
dyninst-reattach-loop.cpp (1.45 KB, text/plain)
2016-09-05 13:12 UTC, Florian Weimer
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github https://github.com/dyninst dyninst issues 208 0 None None None 2016-11-22 18:32:42 UTC

Description Florian Weimer 2016-09-05 13:12:38 UTC
Created attachment 1197942 [details]
dyninst-reattach-loop.cpp

Description of problem:

The dyninst process either fails in reAttach() or deadlocks after repeatedly detaching and reattaching to a target process.

Version-Release number of selected component (if applicable):

dyninst-9.0.3-3.fc23.x86_64

How reproducible:

The loop has to execute for a few thousand times, likely depending on the activity in the target process and overall system load.

Steps to Reproduce:
1. Run the reproducer against a target PID and wait.

Actual results:

Process hangs (with the message “error: process appears to be stuck”), or reAttach() fails.

I have seen a reAttach() failure only with a target process which frequently starts and terminates threads.

Expected results:

No hang.

Additional info:

A backtrace from the hang looks like this:

Thread 5 (Thread 0x7ffff3404700 (LWP 31816)):
#0  0x00007ffff74d334d in pause () at ../sysdeps/unix/syscall-template.S:84
#1  0x000000000040142e in <lambda()>::operator() (__closure=0x62fda8) at dyninst-reattach-loop.cpp:43
#2  std::_Bind_simple<main(int, char**)::<lambda()>()>::_M_invoke<> (this=0x62fda8)
    at /usr/include/c++/5.3.1/functional:1531
#3  std::_Bind_simple<main(int, char**)::<lambda()>()>::operator() (this=0x62fda8)
    at /usr/include/c++/5.3.1/functional:1520
#4  std::thread::_Impl<std::_Bind_simple<main(int, char**)::<lambda()>()> >::_M_run(void) (this=0x62fd90)
    at /usr/include/c++/5.3.1/thread:115
#5  0x00007ffff71f9f20 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
    at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
#6  0x00007ffff74ca61a in start_thread (arg=0x7ffff3404700) at pthread_create.c:334
#7  0x00007ffff69695fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 (Thread 0x7ffff40af700 (LWP 31815)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff7a7d5e8 in boost::condition_variable_any::wait<Mutex<false> > (m=..., this=0x62dc60)
    at /usr/include/boost/thread/pthread/condition_variable.hpp:184
#2  CondVar<Mutex<false> >::wait (this=0x62dc60)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/common/src/dthread.h:121
#3  Dyninst::ProcControlAPI::GeneratorMT::processWait (this=0x62fa80, block=true)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/generator.C:438
#4  0x00007ffff7a7e218 in Dyninst::ProcControlAPI::Generator::getAndQueueEventInt (this=this@entry=0x62fa80, 
    block=block@entry=true) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/generator.C:188
#5  0x00007ffff7a6e381 in Dyninst::ProcControlAPI::GeneratorMT::main (this=this@entry=0x62fa80)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/generator.C:418
#6  0x00007ffff7a7ebeb in Dyninst::ProcControlAPI::GeneratorMT::start (this=0x62fa80)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/generator.C:410
#7  0x00007ffff7b525cc in thread_init (d=<optimized out>)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/common/src/dthread-unix.C:56
#8  0x00007ffff74ca61a in start_thread (arg=0x7ffff40af700) at pthread_create.c:334
#9  0x00007ffff69695fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7ffff4ab8700 (LWP 31814)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff7a6f05a in boost::condition_variable_any::wait<Mutex<false> > (m=..., this=0x62ded0)
    at /usr/include/boost/thread/pthread/condition_variable.hpp:184
#2  CondVar<Mutex<false> >::wait (this=0x62ded0)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/common/src/dthread.h:121
#3  LinuxPtrace::main (this=this@entry=0x62de10)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/linux.C:3346
#4  0x00007ffff7b3e869 in start_ptrace (lp=lp@entry=0x62de10)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/linux.C:3328
#5  0x00007ffff7b525cc in thread_init (d=<optimized out>)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/common/src/dthread-unix.C:56
#6  0x00007ffff74ca61a in start_thread (arg=0x7ffff4ab8700) at pthread_create.c:334
#7  0x00007ffff69695fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7ffff52b9700 (LWP 31813)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007ffff74cca46 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x624b88) at ../nptl/pthread_mutex_lock.c:116
#2  0x00007ffff7ac3c85 in boost::recursive_mutex::lock (this=0x624b88)
    at /usr/include/boost/thread/pthread/recursive_mutex.hpp:113
#3  MTManager::startWork (this=0x624b10)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/process.C:8640
#4  MTManager::evhandler_main (this=0x624b10)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/process.C:8583
#5  0x00007ffff7b525cc in thread_init (d=<optimized out>)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/common/src/dthread-unix.C:56
#6  0x00007ffff74ca61a in start_thread (arg=0x7ffff52b9700) at pthread_create.c:334
#7  0x00007ffff69695fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7ffff7fc9740 (LWP 31812)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff7b3e9e4 in boost::condition_variable_any::wait<Mutex<false> > (m=..., this=0x62df38)
    at /usr/include/boost/thread/pthread/condition_variable.hpp:184
#2  CondVar<Mutex<false> >::wait (this=0x62df38)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/common/src/dthread.h:121
#3  LinuxPtrace::waitfor_ret (this=this@entry=0x62de10)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/linux.C:3381
#4  0x00007ffff7b3ef48 in LinuxPtrace::ptrace_write (this=0x62de10, inTrace=139903394877056, size_=1, 
    inSelf=0xa47750, pid_=31390) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/linux.C:3446
#5  0x00007ffff7a9b43a in int_process::writeMem (this=this@entry=0x62e458, local=local@entry=0xa47750, 
    remote=139903394877056, size=1, result=..., thr=0x633b78, thr@entry=0x0, bp_write=int_process::bp_clear)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/process.C:1548
#6  0x00007ffff7a9e322 in sw_breakpoint::restoreBreakpointData (this=this@entry=0xa476c0, proc=proc@entry=0x62e458, 
    res_resp=...) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/process.C:5075
#7  0x00007ffff7ab3127 in sw_breakpoint::suspend (this=0xa476c0, proc=0x62e458, resps=std::set with 0 elements)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/process.C:5119
#8  0x00007ffff7a89141 in HandleDetach::handleEvent (this=<optimized out>, ev=...)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/handler.C:1862
#9  0x00007ffff7a8de5c in HandlerPool::handleEvent (this=this@entry=0x62f880, orig_ev=...)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/handler.C:438
#10 0x00007ffff7ac15db in int_process::waitAndHandleEvents (block=block@entry=false)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/process.C:1106
#11 0x00007ffff7aecba3 in do_detach (procset=0xa3bd20, temporary=temporary@entry=true, 
    leaveStopped=leaveStopped@entry=false)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/procset.C:1560
#12 0x00007ffff7aecca0 in Dyninst::ProcControlAPI::ProcessSet::temporaryDetach (this=<optimized out>)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/procset.C:1591
#13 0x00007ffff7aad02e in Dyninst::ProcControlAPI::Process::temporaryDetach (this=<optimized out>)
    at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/process.C:6470
#14 0x00000000004010f5 in main (argc=<optimized out>, argv=0x7fffffffe058) at dyninst-reattach-loop.cpp:52

Comment 1 Florian Weimer 2016-09-05 14:43:22 UTC
Stuck process also reproduces with dyninst-9.1.0-5.fc24.x86_64.

Comment 2 Florian Weimer 2016-09-05 15:02:04 UTC
I can test rawhide due to bug 1373239.

Comment 3 Josh Stone 2016-09-14 19:51:31 UTC
With dyninst-9.2.0-4.fc26, rawhide now gets stuck too. (yay?)

Comment 4 Josh Stone 2016-09-27 21:50:01 UTC
Aha - the watchdog is rigged!

      while (true) {
        if (!running) {
          fprintf(stderr,
                  "error: process appears to be stuck (%zu iterations)\n",
                  loops.load());
          pause();
        }
        usleep (1000 * 1000);
        running = false;
      }

That's an almost impossible window between "running = false;" and looping around to check it again.  I was baffled why this always seemed to get stuck in "about" one second, when in fact it was exactly one usleep's worth.

Having swapped the usleep line after "running = false", I have not seen the watchdog report a problem anymore.

> Process hangs (with the message “error: process appears to be stuck”), or
> reAttach() fails.
> 
> I have seen a reAttach() failure only with a target process which frequently
> starts and terminates threads.

Now I'll see if I can reproduce this actual reAttach failure.  Do you have a suggested process which behaves this way?  (Or I can mock one up, of course.)

Comment 5 Josh Stone 2016-09-27 23:12:31 UTC
I can reproduce a reAttach failure by targeting Firefox.  It's not consistent in how long that takes to fail.

Comment 6 Josh Stone 2016-09-28 00:29:39 UTC
This fails reliably quickly as a target process:


#include <thread>
#include <unistd.h>

using namespace std;

void
spawn() {
    this_thread::sleep_for(100ms);
    thread(spawn).detach();
}

int
main()
{
    for (int i = 0; i < 10; ++i)
        spawn();
    pause();
    return 0;
}

Comment 7 Fedora Update System 2016-11-22 22:30:12 UTC
dyninst-9.2.0-5.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2016-2401f115af

Comment 8 Fedora Update System 2016-11-25 09:40:52 UTC
dyninst-9.2.0-5.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-2401f115af

Comment 9 Fedora Update System 2016-12-07 05:24:01 UTC
dyninst-9.2.0-5.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.