Bug 705437 - schedd crash on Windows due to bug in timed_queue<>
Summary: schedd crash on Windows due to bug in timed_queue<>
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.0
Hardware: Unspecified
OS: Windows
Priority: low
Severity: high
Target Milestone: 2.0.1
Target Release: ---
Assignee: Erik Erlandson
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks: 723887
Reported: 2011-05-17 16:52 UTC by Erik Erlandson
Modified: 2011-09-07 16:44 UTC (History)

Fixed In Version: condor-7.6.2-0.1
Doc Type: Bug Fix
Doc Text:
The scheduler could have potentially suffered a memory access error due to a missing check for an empty queue. This check has been implemented, thus eliminating the chance of incurring a memory access error.
Clone Of:
Environment:
Last Closed: 2011-09-07 16:44:49 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Product Errata
ID: RHSA-2011:1249
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise MRG Grid 2.0 security, bug fix and enhancement update
Last Updated: 2011-09-07 16:40:45 UTC

Description Erik Erlandson 2011-05-17 16:52:28 UTC
Description of problem:
A bug in the new timed_queue<> class, introduced with the schedd performance stats, has caused a crash when the schedd runs on Windows (upstream).

This crash is only known to have occurred on Windows, which we do not support for the scheduler; however, it might in principle occur on other operating systems. It is unknown why it has not manifested on RHEL or Fedora.


How reproducible:
100% (?) on Windows. So far 0% on other operating systems.

Steps to Reproduce:
1. Start up the schedd on Windows
  
Actual results:
Crash

Expected results:
normal execution


Additional info:
A fix has been committed to upstream master and can be backported:

Upstream commit diff:

$ git diff l/uw/master~1..l/uw/master
diff --git a/src/condor_utils/timed_queue.h b/src/condor_utils/timed_queue.h
index da2794d..3e88b7d 100644
--- a/src/condor_utils/timed_queue.h
+++ b/src/condor_utils/timed_queue.h
@@ -70,7 +70,7 @@ struct timed_queue : public std::deque<std::pair<time_t, Data> > {
 
     void max_time(size_type t) {
         _max_time = t;
-        if (max_time() > 0) trim_time(base_type::front().first - max_time());
+        if ((!base_type::empty()) && (max_time() > 0)) trim_time(base_type::front().first - max_time());
     }
     size_type max_time() const {
         return _max_time;
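
For illustration only, here is a minimal, self-contained sketch of the guarded setter. The class name timed_queue_sketch, the constructor, and the body of trim_time() are assumptions made for this example and are not taken from the upstream header; only max_time() mirrors the patched line. The point it shows is that calling front() on an empty std::deque is undefined behavior, so the empty() check has to come first.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <deque>
#include <utility>

// Hypothetical, simplified stand-in for timed_queue<>.
// Only max_time(size_type) mirrors the patched upstream line.
template <typename Data>
struct timed_queue_sketch : public std::deque<std::pair<time_t, Data> > {
    typedef std::deque<std::pair<time_t, Data> > base_type;
    typedef typename base_type::size_type size_type;

    timed_queue_sketch() : _max_time(0) {}

    void max_time(size_type t) {
        _max_time = t;
        // front() on an empty deque is undefined behavior,
        // so test empty() before touching the front element.
        if (!base_type::empty() && max_time() > 0)
            trim_time(base_type::front().first - max_time());
    }
    size_type max_time() const {
        return _max_time;
    }

    // Assumed behavior for this sketch: newest entries sit at the front,
    // and entries older than tmin are dropped from the back.
    void trim_time(time_t tmin) {
        while (!base_type::empty() && base_type::back().first < tmin)
            base_type::pop_back();
    }

private:
    size_type _max_time;
};
```

With this guard, calling max_time(30) on a freshly constructed queue only stores the limit and never dereferences the front element; on a non-empty queue it trims as before.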

Comment 1 Erik Erlandson 2011-06-01 00:00:41 UTC
pushed fix to: UPSTREAM-7.7.0-BZ705437-timed-queue-crash

Comment 2 Erik Erlandson 2011-06-01 00:02:32 UTC
Not sure how to repro this.  It didn't show up when I tested with MALLOC_PERTURB_, and upstream reported they couldn't see it with valgrind either.

Comment 6 Erik Erlandson 2011-07-25 22:25:38 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
timed_queue<> data structure was missing a check for empty queue.

Consequence:
Left scheduler open to a potential memory access error.

Fix:
Proper check for empty queue was added to the data structure code.

Result:
The potential memory access error is now eliminated.

Comment 7 Douglas Silas 2011-08-08 14:25:34 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,11 +1 @@
-Cause:
+The scheduler could have potentially suffered a memory access error due to a missing check for an empty queue. This check has been implemented, thus eliminating the chance of incurring a memory access error.
-timed_queue<> data structure was missing a check for empty queue.
-
-Consequence:
-Left scheduler open to a potential memory access error.
-
-Fix:
-Proper check for empty queue was added to the data structure code.
-
-Result:
-The potential memory access error is now eliminated.

Comment 9 Martin Kudlej 2011-08-09 12:18:50 UTC
Code inspection performed by me and ltoscano. --> VERIFIED

Comment 10 errata-xmlrpc 2011-09-07 16:44:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1249.html

