Created attachment 1197942 [details] dyninst-reattach-loop.cpp Description of problem: The dyninst process either fails in reAttach() or deadlocks after repeatedly detaching and reattaching to a target process. Version-Release number of selected component (if applicable): dyninst-9.0.3-3.fc23.x86_64 How reproducible: The loop has to execute for a few thousand times, likely depending on the activity in the target process and overall system load. Steps to Reproduce: 1. Run the reproducer against a target PID and wait. Actual results: Process hangs (with the message “error: process appears to be stuck”), or reAttach() fails. I have seen a reAttach() failure only with a target process which frequently starts and terminates threads. Expected results: No hang. Additional info: A backtrace from the hang looks like this: Thread 5 (Thread 0x7ffff3404700 (LWP 31816)): #0 0x00007ffff74d334d in pause () at ../sysdeps/unix/syscall-template.S:84 #1 0x000000000040142e in <lambda()>::operator() (__closure=0x62fda8) at dyninst-reattach-loop.cpp:43 #2 std::_Bind_simple<main(int, char**)::<lambda()>()>::_M_invoke<> (this=0x62fda8) at /usr/include/c++/5.3.1/functional:1531 #3 std::_Bind_simple<main(int, char**)::<lambda()>()>::operator() (this=0x62fda8) at /usr/include/c++/5.3.1/functional:1520 #4 std::thread::_Impl<std::_Bind_simple<main(int, char**)::<lambda()>()> >::_M_run(void) (this=0x62fd90) at /usr/include/c++/5.3.1/thread:115 #5 0x00007ffff71f9f20 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>) at ../../../../../libstdc++-v3/src/c++11/thread.cc:84 #6 0x00007ffff74ca61a in start_thread (arg=0x7ffff3404700) at pthread_create.c:334 #7 0x00007ffff69695fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 4 (Thread 0x7ffff40af700 (LWP 31815)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 #1 0x00007ffff7a7d5e8 in boost::condition_variable_any::wait<Mutex<false> > (m=..., this=0x62dc60) at /usr/include/boost/thread/pthread/condition_variable.hpp:184 #2 CondVar<Mutex<false> >::wait (this=0x62dc60) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/common/src/dthread.h:121 #3 Dyninst::ProcControlAPI::GeneratorMT::processWait (this=0x62fa80, block=true) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/generator.C:438 #4 0x00007ffff7a7e218 in Dyninst::ProcControlAPI::Generator::getAndQueueEventInt (this=this@entry=0x62fa80, block=block@entry=true) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/generator.C:188 #5 0x00007ffff7a6e381 in Dyninst::ProcControlAPI::GeneratorMT::main (this=this@entry=0x62fa80) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/generator.C:418 #6 0x00007ffff7a7ebeb in Dyninst::ProcControlAPI::GeneratorMT::start (this=0x62fa80) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/generator.C:410 #7 0x00007ffff7b525cc in thread_init (d=<optimized out>) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/common/src/dthread-unix.C:56 #8 0x00007ffff74ca61a in start_thread (arg=0x7ffff40af700) at pthread_create.c:334 #9 0x00007ffff69695fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 3 (Thread 0x7ffff4ab8700 (LWP 31814)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 #1 0x00007ffff7a6f05a in boost::condition_variable_any::wait<Mutex<false> > (m=..., this=0x62ded0) at /usr/include/boost/thread/pthread/condition_variable.hpp:184 #2 CondVar<Mutex<false> >::wait (this=0x62ded0) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/common/src/dthread.h:121 #3 LinuxPtrace::main (this=this@entry=0x62de10) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/linux.C:3346 #4 0x00007ffff7b3e869 in start_ptrace (lp=lp@entry=0x62de10) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/linux.C:3328 #5 0x00007ffff7b525cc in thread_init (d=<optimized out>) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/common/src/dthread-unix.C:56 #6 0x00007ffff74ca61a in start_thread (arg=0x7ffff4ab8700) at pthread_create.c:334 #7 0x00007ffff69695fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 2 (Thread 0x7ffff52b9700 (LWP 31813)): #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007ffff74cca46 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x624b88) at ../nptl/pthread_mutex_lock.c:116 #2 0x00007ffff7ac3c85 in boost::recursive_mutex::lock (this=0x624b88) at /usr/include/boost/thread/pthread/recursive_mutex.hpp:113 #3 MTManager::startWork (this=0x624b10) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/process.C:8640 #4 MTManager::evhandler_main (this=0x624b10) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/process.C:8583 #5 0x00007ffff7b525cc in thread_init (d=<optimized out>) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/common/src/dthread-unix.C:56 #6 0x00007ffff74ca61a in start_thread (arg=0x7ffff52b9700) at pthread_create.c:334 #7 0x00007ffff69695fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 1 (Thread 0x7ffff7fc9740 (LWP 31812)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 #1 0x00007ffff7b3e9e4 in boost::condition_variable_any::wait<Mutex<false> > (m=..., this=0x62df38) at /usr/include/boost/thread/pthread/condition_variable.hpp:184 #2 CondVar<Mutex<false> >::wait (this=0x62df38) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/common/src/dthread.h:121 #3 LinuxPtrace::waitfor_ret (this=this@entry=0x62de10) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/linux.C:3381 #4 0x00007ffff7b3ef48 in LinuxPtrace::ptrace_write (this=0x62de10, inTrace=139903394877056, size_=1, inSelf=0xa47750, pid_=31390) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/linux.C:3446 #5 0x00007ffff7a9b43a in int_process::writeMem (this=this@entry=0x62e458, local=local@entry=0xa47750, remote=139903394877056, size=1, result=..., thr=0x633b78, thr@entry=0x0, bp_write=int_process::bp_clear) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/process.C:1548 #6 0x00007ffff7a9e322 in sw_breakpoint::restoreBreakpointData (this=this@entry=0xa476c0, proc=proc@entry=0x62e458, res_resp=...) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/process.C:5075 #7 0x00007ffff7ab3127 in sw_breakpoint::suspend (this=0xa476c0, proc=0x62e458, resps=std::set with 0 elements) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/process.C:5119 #8 0x00007ffff7a89141 in HandleDetach::handleEvent (this=<optimized out>, ev=...) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/handler.C:1862 #9 0x00007ffff7a8de5c in HandlerPool::handleEvent (this=this@entry=0x62f880, orig_ev=...) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/handler.C:438 #10 0x00007ffff7ac15db in int_process::waitAndHandleEvents (block=block@entry=false) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/process.C:1106 #11 0x00007ffff7aecba3 in do_detach (procset=0xa3bd20, temporary=temporary@entry=true, leaveStopped=leaveStopped@entry=false) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/procset.C:1560 #12 0x00007ffff7aecca0 in Dyninst::ProcControlAPI::ProcessSet::temporaryDetach (this=<optimized out>) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/procset.C:1591 #13 0x00007ffff7aad02e in Dyninst::ProcControlAPI::Process::temporaryDetach (this=<optimized out>) at /usr/src/debug/dyninst-9.0.3/DyninstAPI-9.0.3/proccontrol/src/process.C:6470 #14 0x00000000004010f5 in main (argc=<optimized out>, argv=0x7fffffffe058) at dyninst-reattach-loop.cpp:52
Stuck process also reproduces with dyninst-9.1.0-5.fc24.x86_64.
I can test rawhide due to bug 1373239.
With dyninst-9.2.0-4.fc26, rawhide now gets stuck too. (yay?)
Aha - the watchdog is rigged! while (true) { if (!running) { fprintf(stderr, "error: process appears to be stuck (%zu iterations)\n", loops.load()); pause(); } usleep (1000 * 1000); running = false; } That's an almost impossible window between "running = false;" and looping around to check it again. I was baffled why this always seemed to get stuck in "about" one second, when in fact it was exactly one usleep's worth. Having swapped the usleep line after "running = false", I have not seen the watchdog report a problem anymore. > Process hangs (with the message “error: process appears to be stuck”), or > reAttach() fails. > > I have seen a reAttach() failure only with a target process which frequently > starts and terminates threads. Now I'll see if I can reproduce this actual reAttach failure. Do you have a suggested process which behaves this way? (Or I can mock one up, of course.)
I can reproduce a reAttach failure by targeting Firefox. It's not consistent in how long that takes to fail.
This fails reliably quickly as a target process: #include <thread> #include <unistd.h> using namespace std; void spawn() { this_thread::sleep_for(100ms); thread(spawn).detach(); } int main() { for (int i = 0; i < 10; ++i) spawn(); pause(); return 0; }
dyninst-9.2.0-5.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2016-2401f115af
dyninst-9.2.0-5.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-2401f115af
dyninst-9.2.0-5.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.