Description of problem:
A bug in the new timed_queue<> class, introduced with the schedd performance stats, causes a crash when the schedd runs on Windows (upstream). This crash is only known to have occurred on Windows, which we do not support for the scheduler; however, it might in principle occur on other operating systems. It is unknown why it has not manifested on RHEL or Fedora.

How reproducible:
100% (?) on Windows. So far 0% on other OS.

Steps to Reproduce:
1. Start up the schedd on Windows

Actual results:
Crash

Expected results:
Normal execution

Additional info:
A fix has been committed to upstream master, which can be backported.

Upstream commit diff:
$ git diff l/uw/master~1..l/uw/master
diff --git a/src/condor_utils/timed_queue.h b/src/condor_utils/timed_queue.h
index da2794d..3e88b7d 100644
--- a/src/condor_utils/timed_queue.h
+++ b/src/condor_utils/timed_queue.h
@@ -70,7 +70,7 @@ struct timed_queue : public std::deque<std::pair<time_t, Data> > {
     void max_time(size_type t) {
         _max_time = t;
-        if (max_time() > 0) trim_time(base_type::front().first - max_time());
+        if ((!base_type::empty()) && (max_time() > 0)) trim_time(base_type::front().first - max_time());
     }
     size_type max_time() const {
         return _max_time;
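For reference, the effect of the added guard can be sketched in a self-contained form. This is only an illustration, not the upstream class: the name timed_queue_sketch, the helper push_front_at(), the newest-at-front ordering, and the body of trim_time() are all assumptions made for the example. Only max_time() mirrors the committed diff. Calling front() on an empty std::deque is undefined behavior, which would be consistent with the crash showing up on one platform and not on others.

```cpp
#include <cstddef>
#include <ctime>
#include <deque>
#include <utility>

// Simplified stand-in for timed_queue<>. Only max_time() follows the
// upstream diff; everything else is hypothetical scaffolding.
template <typename Data>
struct timed_queue_sketch : public std::deque<std::pair<time_t, Data> > {
    typedef std::deque<std::pair<time_t, Data> > base_type;
    typedef typename base_type::size_type size_type;

    timed_queue_sketch() : _max_time(0) {}

    // Assumed convention: newest entry at the front, oldest at the back.
    void push_front_at(const Data& d, time_t t) {
        base_type::push_front(std::make_pair(t, d));
    }

    void max_time(size_type t) {
        _max_time = t;
        // front() on an empty deque is undefined behavior, so test
        // empty() before touching front().
        if ((!base_type::empty()) && (max_time() > 0))
            trim_time(base_type::front().first - max_time());
    }

    size_type max_time() const {
        return _max_time;
    }

    // Hypothetical trim: drop entries older than tmin from the back.
    void trim_time(time_t tmin) {
        while (!base_type::empty() && base_type::back().first < tmin)
            base_type::pop_back();
    }

    size_type _max_time;
};
```

With the guard in place, setting max_time() on a freshly constructed (empty) queue only stores the value. This appears to be the situation at schedd startup, given that starting the schedd was enough to trigger the crash on Windows.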
pushed fix to: UPSTREAM-7.7.0-BZ705437-timed-queue-crash
Not sure how to repro this. It didn't show up when I tested with MALLOC_PERTURB_, and upstream reported they couldn't see it with valgrind either.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause:
timed_queue<> data structure was missing a check for empty queue.

Consequence:
Left scheduler open to a potential memory access error.

Fix:
Proper check for empty queue was added to the data structure code.

Result:
The potential memory access error is now eliminated.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,11 +1 @@
-Cause:
-timed_queue<> data structure was missing a check for empty queue.
-
-Consequence:
-Left scheduler open to a potential memory access error.
-
-Fix:
-Proper check for empty queue was added to the data structure code.
-
-Result:
-The potential memory access error is now eliminated.
+The scheduler could have potentially suffered a memory access error due to a missing check for an empty queue. This check has been implemented, thus eliminating the chance of incurring a memory access error.
Code inspection made by me and ltoscano. -->VERIFIED
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-1249.html