Bug 592377

Summary: Qpid C++ IOThread destructor blocks grid program exit on Linux and Windows
Product: Red Hat Enterprise MRG Reporter: Pete MacKinnon <pmackinn>
Component: qpid-cpp Assignee: Gordon Sim <gsim>
Status: CLOSED CURRENTRELEASE QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: urgent Docs Contact:
Priority: high    
Version: Development CC: gsim, jross, matt, tstclair
Target Milestone: 1.3   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-12-11 19:01:03 UTC Type: ---

Description Pete MacKinnon 2010-05-14 17:24:26 UTC
    ~IOThread() {
        ScopedLock<Mutex> l(threadLock);
        while (connections > 0) {
            noConnections.wait(threadLock);  // <<< exiting process hangs here
        }
        if (poller_)
            poller_->shutdown();
        for (int i=0; i<ioThreads; ++i) {
            t[i].join();
        }
    }

Linux issue:
If a Condor daemon raises an exception (which can happen at various points in the code), it tries to log minimally and exit quickly. Condor process management accounts for this: a failed process is restarted. However, when QMF is enabled, the failed daemon hangs in the dtor above.

Windows issue:
<may need more info from Tim St Clair here>
Condor QMF-enabled daemons on Windows hang at exit at the same line of code in the dtor.

Suggestion:
Change the wait to the overloaded version that takes a time argument (wraps pthread_cond_timedwait on Linux, boost condition.timed_wait on Windows). Use a reasonable, configurable timeout for shutdown (10 sec?).
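
The timed-wait suggestion can be sketched with standard C++11 primitives. This is only an illustration under stated assumptions: ConnectionTracker, add(), remove() and waitForDrain() are hypothetical names, and std::condition_variable stands in for the Qpid Condition/Mutex wrappers, whose exact signatures are not shown in this report.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the connection bookkeeping shown in the
// description; this is NOT the Qpid IOThread class.
class ConnectionTracker {
public:
    explicit ConnectionTracker(std::chrono::milliseconds shutdownTimeout)
        : timeout_(shutdownTimeout) {}

    void add() {
        std::lock_guard<std::mutex> l(lock_);
        ++connections_;
    }

    void remove() {
        std::lock_guard<std::mutex> l(lock_);
        if (connections_ > 0 && --connections_ == 0)
            noConnections_.notify_all();
    }

    // Waits until every connection is gone or the timeout expires.
    // Returns true if the count reached zero, false if the wait gave up.
    bool waitForDrain() {
        std::unique_lock<std::mutex> l(lock_);
        return noConnections_.wait_for(
            l, timeout_, [this] { return connections_ == 0; });
    }

private:
    std::mutex lock_;
    std::condition_variable noConnections_;
    int connections_ = 0;
    std::chrono::milliseconds timeout_;
};
```

A destructor built this way would call waitForDrain() once and proceed with shutdown regardless of the result, so a connection that never closes delays exit by at most the configured timeout instead of blocking it indefinitely.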

Comment 1 Gordon Sim 2010-05-15 15:43:40 UTC
Your program calls exit() rather than returning from main? That means that the destructors will not be called so you will need to delete any QMF/Qpid objects prior to the exit in your exception handling logic.
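
For context, standard C++ treats the two kinds of object differently on exit(): destructors of objects with automatic storage duration are skipped, while destructors of objects with static storage duration still run. The POSIX-only sketch below shows this in a forked child; Marker and destructorsRun() are hypothetical names, and nothing here is taken from Condor or Qpid.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// Writes one tag character to a file descriptor when destroyed, so the
// parent process can see which destructors actually ran in the child.
struct Marker {
    int fd;
    char tag;
    ~Marker() {
        ssize_t r = write(fd, &tag, 1);
        (void)r;
    }
};

// Forks a child that holds one static and one automatic Marker, then
// either calls exit() inside the automatic object's scope or leaves the
// scope first. Returns the tags written by the child's destructors.
std::string destructorsRun(bool callExit) {
    int fds[2];
    if (pipe(fds) != 0) return "pipe failed";
    std::fflush(nullptr);  // avoid duplicating buffered output in the child
    pid_t pid = fork();
    if (pid < 0) return "fork failed";
    if (pid == 0) {
        close(fds[0]);
        static Marker s{fds[1], 'S'};  // static storage: destroyed by exit()
        {
            Marker a{fds[1], 'A'};     // automatic storage
            if (callExit) std::exit(0);  // 'a' is never destroyed here
        }
        std::exit(0);
    }
    close(fds[1]);
    std::string out;
    char c;
    while (read(fds[0], &c, 1) == 1) out += c;
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return out;
}
```

With callExit set, only the static object's destructor reports in; without it, the automatic object's destructor runs first and the static one follows during exit().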

Comment 2 Matthew Farrellee 2010-05-15 22:17:05 UTC
It absolutely calls exit() rather than returning from main. The "exception" is not a true C++ exception but an EXCEPT (exit now) macro. EXCEPT cannot easily be instrumented to clean up QMF/Qpid objects introduced via dlopen, which is how we include QMF functionality.

From the description, it looks like a dtor is being called during exit.

Needing to call the destructors in a particular order raises the bar for entry, especially for existing software. It is also a new requirement since 1.2.

Pete, you might investigate __attribute__((destructor)). Gordon, will you see whether this dtor requirement is necessary?
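
A minimal sketch of the __attribute__((destructor)) idea follows. It assumes GCC or Clang, where a function marked this way is called automatically after main() returns or exit() is called. MessagingSession, g_session, startMessaging() and stopMessaging() are hypothetical placeholders for whatever QMF/Qpid objects the plugin holds; they are not Qpid APIs.

```cpp
// Hypothetical stand-in for a QMF/Qpid object held by the plugin.
struct MessagingSession {
    bool open = true;
    void close() { open = false; }
};

// Hypothetical global handle owned by the dlopen'ed plugin.
static MessagingSession* g_session = nullptr;

void startMessaging() {
    if (!g_session) g_session = new MessagingSession;
}

// GCC/Clang extension: called automatically after main() returns or
// exit() is called. Written to be idempotent so that an explicit call
// followed by the automatic one is harmless.
__attribute__((destructor))
void stopMessaging() {
    if (g_session) {
        g_session->close();
        delete g_session;
        g_session = nullptr;
    }
}
```

This keeps the cleanup inside the plugin, so the EXCEPT macro itself needs no changes. It does not run on _exit() or abort(), and this sketch does not establish whether it runs early enough relative to the library's own static destructors to avoid the hang.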

Comment 3 Gordon Sim 2010-05-16 10:01:58 UTC
I don't think it's the order in which destructors are called in this case; I think it is that some are not called at all. The exit() is the key difference between this case and the normal cases, which I assume work ok.

Can you describe in more detail how the executable is made up (what libs are loaded, what QMF/Qpid variables are held, are they global, static, etc)?

Comment 4 Gordon Sim 2010-06-01 07:55:40 UTC
Will no longer be a problem as of qpid-cpp-client-0.7.946106-2 as the IOThread destructor no longer waits on the condition.