Bug 695886

Summary: Severe but difficult to diagnose lock error in qpid::sys::StateMonitor
Product: Red Hat Enterprise MRG Reporter: Andrew Stitcher <astitcher>
Component: qpid-cppAssignee: Andrew Stitcher <astitcher>
Status: CLOSED CURRENTRELEASE QA Contact: Frantisek Reznicek <freznice>
Severity: high Docs Contact:
Priority: urgent    
Version: betaCC: esammons, freznice, iboverma, jross, mcressma, tross
Target Milestone: 2.0Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
URL: https://issues.apache.org/jira/browse/QPID-3199
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-03-29 20:06:17 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Andrew Stitcher 2011-04-12 21:17:36 UTC
Description of problem:

A potentially serious bug was discovered by automated inspection (by the clang compiler).

See QPID-3199 for details.

https://issues.apache.org/jira/browse/QPID-3199

Comment 1 Andrew Stitcher 2011-04-12 21:24:52 UTC
This is fixed on the qpid trunk - r1091224

It has also been fixed on the qpid 0.10 release branch.

The fix is very low risk, and the potential bug symptoms might be quite hard to diagnose.

Comment 5 Ted Ross 2012-03-29 20:06:17 UTC
Fixed well prior to the 0.14 rebase

Comment 6 Frantisek Reznicek 2012-03-30 09:48:25 UTC
CLOSED/CRELEASE -> ASSIGNED -> ON_QA
The defect has to go through QA process.

It seems impossible to trigger the issue, but c++ compiler or linting tools can highlight whether the code got fixed.

Limited effort in reproducing is planned (where DUT is just qpid::sys::StateMonitor and possible dependencies).
Compiler / linting tool should confirm change made.

Comment 7 Frantisek Reznicek 2012-04-02 11:38:51 UTC
Core review was done and confirms r1091224 commit correctness.

Comment 8 Frantisek Reznicek 2012-04-02 12:49:06 UTC
IMHO issue was not visible due to g++ compiler optimization settings.

I do not see added value in making special test to prove that in certain situations the old code may fail - foreseen as huge effort.

Version:
  qpid-0.8   - codebase w/o fix
  qpid-0.10  - codebase with r1091224 fix
  qpid-0.12+ - codebase with r1091224 + r1101180 fix (qpid-cpp-*0.14-14.el5/6)

Issue verified by code inspection (*).


-> VERIFIED


(*) from C++ spec
Temporary objects are destroyed as the last step in evaluating the full-expression (1.9) that (lexically) contains the point where they were created. [12.2/3]

In light of above definition old code (mark -) means that ~ScopedWait can be called at any time corresponding to lines 3-5, while new code (+) ensures destruction of ScopedLock happens exactly at the end of method (line 5)

Old code is dangerous due to fact that ScopedWait object counter might be decremented sooner than expected which may cause strange hangs.

  1 void waitFor(Enum s) {
  2-   ScopedWait(*this);
  2+   ScopedWait w(*this);
  3   while (s != state)
  4     wait();
  5 }

Comment 9 Frantisek Reznicek 2012-04-02 15:02:24 UTC
Additional info:
  Clang static code analyzer proved fix improved code quality.
  cppcheck static code analyzer did not detected this case.