Bug 705437

Summary: schedd crash on Windows due to bug in timed_queue<>
Product: Red Hat Enterprise MRG
Reporter: Erik Erlandson <eerlands>
Component: condor
Assignee: Erik Erlandson <eerlands>
Status: CLOSED ERRATA
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: high
Docs Contact:
Priority: low
Version: 2.0
CC: jneedle, matt, mkudlej, tstclair
Target Milestone: 2.0.1
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Windows
Whiteboard:
Fixed In Version: condor-7.6.2-0.1
Doc Type: Bug Fix
Doc Text:
The scheduler could have potentially suffered a memory access error due to a missing check for an empty queue. This check has been implemented, thus eliminating the chance of incurring a memory access error.
Last Closed: 2011-09-07 16:44:49 UTC
Bug Depends On:    
Bug Blocks: 723887    

Description Erik Erlandson 2011-05-17 16:52:28 UTC
Description of problem:
A bug in the new timed_queue<> class, introduced with the schedd performance stats, has caused a crash when the schedd runs on Windows (upstream).

This crash is only known to have occurred on Windows, which we do not support for the scheduler; however, it might in principle occur on other operating systems. It is unknown why it has not manifested on RHEL or Fedora.


How reproducible:
100% (?) on Windows.   So far 0% on other OS.

Steps to Reproduce:
1. Start up the schedd on Windows
  
Actual results:
Crash

Expected results:
normal execution


Additional info:
A fix has already been committed to upstream master and can be backported:

Upstream commit diff:

$ git diff l/uw/master~1..l/uw/master
diff --git a/src/condor_utils/timed_queue.h b/src/condor_utils/timed_queue.h
index da2794d..3e88b7d 100644
--- a/src/condor_utils/timed_queue.h
+++ b/src/condor_utils/timed_queue.h
@@ -70,7 +70,7 @@ struct timed_queue : public std::deque<std::pair<time_t, Data> > {
 
     void max_time(size_type t) {
         _max_time = t;
-        if (max_time() > 0) trim_time(base_type::front().first - max_time());
+        if ((!base_type::empty()) && (max_time() > 0)) trim_time(base_type::front().first - max_time());
     }
     size_type max_time() const {
         return _max_time;
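
The effect of the added guard can be sketched in isolation. The class below is a simplified stand-in, not the code from src/condor_utils/timed_queue.h: only the empty() check in max_time() mirrors the diff, while the names timed_queue_sketch and push_stamped, the body of trim_time, and the assumption that the newest entry sits at the front are all hypothetical. The point it illustrates is that front() on an empty std::deque is undefined behavior, so the check has to come first.

```cpp
#include <cassert>
#include <ctime>
#include <deque>
#include <utility>

// Simplified stand-in for timed_queue<>; everything except the empty()
// guard in max_time() is an assumption made for illustration.
template <typename Data>
struct timed_queue_sketch : public std::deque<std::pair<std::time_t, Data> > {
    typedef std::deque<std::pair<std::time_t, Data> > base_type;
    typedef typename base_type::size_type size_type;

    timed_queue_sketch() : _max_time(0) {}

    // Hypothetical insert helper: newest entries are assumed to go to the front.
    void push_stamped(std::time_t stamp, const Data& d) {
        base_type::push_front(std::make_pair(stamp, d));
    }

    void max_time(size_type t) {
        _max_time = t;
        // Guard from the fix: never call front() on an empty deque.
        if (!base_type::empty() && max_time() > 0)
            trim_time(base_type::front().first - static_cast<std::time_t>(max_time()));
    }
    size_type max_time() const { return _max_time; }

    // Hypothetical trim: drop entries from the back that are older than tmin.
    void trim_time(std::time_t tmin) {
        while (!base_type::empty() && base_type::back().first < tmin)
            base_type::pop_back();
    }

    size_type _max_time;
};

// Usage: setting the window on an empty queue is a no-op; on a populated
// queue it drops entries older than (newest stamp - window).
inline void timed_queue_sketch_usage() {
    timed_queue_sketch<int> q;
    q.max_time(10);            // empty queue: guard skips front()
    assert(q.empty());

    q.push_stamped(100, 1);
    q.push_stamped(105, 2);
    q.push_stamped(120, 3);
    q.max_time(10);            // cutoff is 120 - 10 = 110
    assert(q.size() == 1);
}
```

Without the guard, the first max_time(10) call in the usage function would read front() of an empty deque, which is the access the original code performed.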

Comment 1 Erik Erlandson 2011-06-01 00:00:41 UTC
pushed fix to: UPSTREAM-7.7.0-BZ705437-timed-queue-crash

Comment 2 Erik Erlandson 2011-06-01 00:02:32 UTC
Not sure how to repro this.  It didn't show up when I tested with MALLOC_PERTURB_, and upstream reported they couldn't see it with valgrind either.
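
One possible explanation for the difficulty, offered as an assumption rather than a confirmed diagnosis: calling front() on an empty std::deque is undefined behavior, so it is not required to fault on every platform or under every tool. A checked accessor such as the hypothetical helper below (checked_front is not part of Condor) could be used in a test to turn the misuse into a deterministic exception instead of relying on a crash.

```cpp
#include <cassert>
#include <ctime>
#include <deque>
#include <stdexcept>
#include <utility>

// Hypothetical test helper, not part of Condor: like front(), but throws
// instead of invoking undefined behavior when the queue is empty.
template <typename Data>
const std::pair<std::time_t, Data>&
checked_front(const std::deque<std::pair<std::time_t, Data> >& q) {
    if (q.empty())
        throw std::out_of_range("checked_front: queue is empty");
    return q.front();
}

// Usage: an empty queue raises an exception; a populated one behaves like front().
inline void checked_front_usage() {
    std::deque<std::pair<std::time_t, int> > q;
    bool threw = false;
    try {
        checked_front(q);
    } catch (const std::out_of_range&) {
        threw = true;
    }
    assert(threw);

    q.push_front(std::make_pair(std::time_t(100), 7));
    assert(checked_front(q).second == 7);
}
```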

Comment 6 Erik Erlandson 2011-07-25 22:25:38 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
timed_queue<> data structure was missing a check for empty queue.

Consequence:
Left scheduler open to a potential memory access error.

Fix:
Proper check for empty queue was added to the data structure code.

Result:
The potential memory access error is now eliminated.

Comment 7 Douglas Silas 2011-08-08 14:25:34 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,11 +1 @@
-Cause:
+The scheduler could have potentially suffered a memory access error due to a missing check for an empty queue. This check has been implemented, thus eliminating the chance of incurring a memory access error.
-timed_queue<> data structure was missing a check for empty queue.
-
-Consequence:
-Left scheduler open to a potential memory access error.
-
-Fix:
-Proper check for empty queue was added to the data structure code.
-
-Result:
-The potential memory access error is now eliminated.

Comment 9 Martin Kudlej 2011-08-09 12:18:50 UTC
Code inspection made by me and ltoscano. -->VERIFIED

Comment 10 errata-xmlrpc 2011-09-07 16:44:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1249.html