Bug 867989 - Cumin missing scheduler stats
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-qmf
Version: Development
Hardware: All Linux
Priority: high
Severity: high
Target Milestone: 2.3
Target Release: ---
Assigned To: Pete MacKinnon
QA Contact: Daniel Horák
Depends On:
Blocks: 869414
Reported: 2012-10-18 15:37 EDT by Pete MacKinnon
Modified: 2013-03-06 13:47 EST (History)

See Also:
Fixed In Version: condor-7.8.7-0.1
Doc Type: Bug Fix
Doc Text:
Cause: Upstream changes in HTCondor renamed several statistical attributes in the condor_schedd daemon ClassAd.
Consequence: Statistics for the QMF scheduler object show 0 for attributes that should be non-zero.
Fix: The QMF schedd plug-in now implicitly maps the old attribute names (7.6 series) to the names introduced in the 7.8 series.
Result: Statistics for the QMF scheduler object show correct values for the affected attributes.
Last Closed: 2013-03-06 13:47:16 EST
Type: Bug


Description Pete MacKinnon 2012-10-18 15:37:34 EDT
Scheduler statistical attributes are computed and represented slightly differently in the upstream 7.6 and 7.8 series. This BZ tracks work in the QMF scheduler plugin to ensure that the statistical attributes already exposed to QMF are mapped and represented the same way as in previous versions, particularly as consumed by Cumin. For example:

Job scheduler info:
    Job submission rate
    Job start rate
    Job completion rate
    Mean time to start

should all have non-zero values when the pool/scheduler is at steady state and processing jobs.
Comment 1 Erik Erlandson 2012-10-18 15:48:03 EDT
For historical and reference purposes, here is a table that maps the semantics of the previous statistic attributes to the current ones:


Old Names                       New Names
-------------------------------------------

WINDOWED_STAT_WIDTH             STATISTICS_WINDOW_SECONDS  // quantized to schedd_stats_window_quantum = 200

WindowedStatWidth               RecentStatsLifetime

JobsSubmitted                   RecentJobsSubmitted
JobsSubmittedCumulative         JobsSubmitted

JobsStarted                     RecentJobsStarted
JobsStartedCumulative           JobsStarted

JobsExited                      RecentJobsExited
JobsExitedCumulative            JobsExited

JobsCompleted                   RecentJobsCompleted
JobsCompletedCumulative         JobsCompleted

ShadowExceptions                RecentJobsExitException
ShadowExceptionsCumulative      JobsExitException

<null>                          RecentJobsAccumTimeToStart
SumTimeToStartCumulative        JobsAccumTimeToStart

<null>                          RecentJobsAccumRunningTime
SumRunningTimeCumulative        JobsAccumRunningTime

JobSubmissionRate               RecentJobsSubmitted / RecentStatsLifetime
JobCompletionRate               RecentJobsCompleted / RecentStatsLifetime
JobStartRate                    RecentJobsStarted / RecentStatsLifetime

MeanTimeToStart                 RecentJobsAccumTimeToStart / RecentJobsStarted
MeanTimeToStartCumulative       JobsAccumTimeToStart / JobsStarted

MeanRunningTime                 RecentJobsAccumRunningTime / RecentJobsCompleted
MeanRunningTimeCumulative       JobsAccumRunningTime / JobsCompleted

UpdateTime                      <null> // subtract StatsLastUpdateTime from consecutive ads
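The table above can be read as a set of lookups and ratios. The following Python sketch illustrates that reading only; it is not the QMF schedd plugin code. It assumes the schedd ad is available as a plain dict of 7.8-style attribute names, and the names `legacy_stats`, `update_interval` and `_ratio` are hypothetical helpers. Returning 0.0 when a denominator is zero is also an assumption.

```python
def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (assumed policy)."""
    return float(numerator) / denominator if denominator else 0.0


def legacy_stats(ad):
    """Derive 7.6-style statistics from a dict of 7.8-style schedd attributes."""
    def get(name):
        return ad.get(name, 0)

    window = get("RecentStatsLifetime")
    return {
        # Direct renames: old windowed name <- new Recent* name,
        # old *Cumulative name <- new unprefixed name.
        "WindowedStatWidth": window,
        "JobsSubmitted": get("RecentJobsSubmitted"),
        "JobsSubmittedCumulative": get("JobsSubmitted"),
        "JobsStarted": get("RecentJobsStarted"),
        "JobsStartedCumulative": get("JobsStarted"),
        "JobsExited": get("RecentJobsExited"),
        "JobsExitedCumulative": get("JobsExited"),
        "JobsCompleted": get("RecentJobsCompleted"),
        "JobsCompletedCumulative": get("JobsCompleted"),
        "ShadowExceptions": get("RecentJobsExitException"),
        "ShadowExceptionsCumulative": get("JobsExitException"),
        "SumTimeToStartCumulative": get("JobsAccumTimeToStart"),
        "SumRunningTimeCumulative": get("JobsAccumRunningTime"),
        # Rates: windowed counts divided by the window lifetime.
        "JobSubmissionRate": _ratio(get("RecentJobsSubmitted"), window),
        "JobCompletionRate": _ratio(get("RecentJobsCompleted"), window),
        "JobStartRate": _ratio(get("RecentJobsStarted"), window),
        # Means: accumulated time divided by the matching job count.
        "MeanTimeToStart": _ratio(get("RecentJobsAccumTimeToStart"),
                                  get("RecentJobsStarted")),
        "MeanTimeToStartCumulative": _ratio(get("JobsAccumTimeToStart"),
                                            get("JobsStarted")),
        "MeanRunningTime": _ratio(get("RecentJobsAccumRunningTime"),
                                  get("RecentJobsCompleted")),
        "MeanRunningTimeCumulative": _ratio(get("JobsAccumRunningTime"),
                                            get("JobsCompleted")),
    }


def update_interval(previous_ad, current_ad):
    """Old UpdateTime: difference of StatsLastUpdateTime in consecutive ads."""
    return current_ad["StatsLastUpdateTime"] - previous_ad["StatsLastUpdateTime"]
```

With a steady-state ad, the rates and means come out non-zero, which is the behavior the description asks for.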
Comment 2 Erik Erlandson 2012-10-18 16:18:01 EDT
Upstream confirms that recent/windowed stats make use of a ring buffer whose behavior is essentially equivalent to the previous windowed stat behavior, but with lower resolution in time.   

The main impact is that when a ring-buffer bin falls off the back end of the time window, it can cause a larger step-function drop in the value.  How visible this is to anybody consuming the stats depends on how the timing interacts with the ad publication interval.

Upstream confirmed that they are open to exposing the quantization level as a configuration setting. Such a feature would require relatively little effort to implement and pull back via a tracking branch.
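To make the step-function effect described above concrete, here is a minimal Python sketch of a windowed counter backed by a ring buffer of time bins. It is an illustration only, not HTCondor's implementation: the class `RecentCounter`, its parameters, and the use of timestamps relative to zero are all assumptions. The 600-second window in the usage note is an arbitrary choice; the 200-second quantum matches the value quoted in Comment 1.

```python
from collections import deque


class RecentCounter:
    """Windowed event counter using a ring buffer of fixed-width time bins."""

    def __init__(self, window_seconds, quantum_seconds):
        self.quantum = quantum_seconds
        nbins = max(1, window_seconds // quantum_seconds)
        # A deque with maxlen drops the oldest bin when a new one is appended.
        self.bins = deque([0] * nbins, maxlen=nbins)
        self.current_slot = 0  # index of the quantum covered by the newest bin

    def _advance(self, now):
        """Rotate in empty bins for every quantum that has elapsed."""
        slot = int(now // self.quantum)
        steps = min(slot - self.current_slot, len(self.bins))
        for _ in range(max(steps, 0)):
            self.bins.append(0)
        self.current_slot = max(slot, self.current_slot)

    def add(self, now, count=1):
        """Record `count` events at time `now` (seconds, non-decreasing)."""
        self._advance(now)
        self.bins[-1] += count

    def value(self, now):
        """Return the number of events still inside the window at `now`."""
        self._advance(now)
        return sum(self.bins)
```

For example, with `RecentCounter(600, 200)`, adding 5 events at t=0 and 1 event at t=250 gives `value(599) == 6` but `value(600) == 1`: the whole oldest bin falls out of the window at once. Whether a consumer sees that drop as a sharp step depends on when the ad is published relative to the bin boundary.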
Comment 3 Pete MacKinnon 2012-10-24 12:44:42 EDT
Addressed for Cumin in the QMF schedd plugin, based on the stat mapping provided in comment 1.
Comment 5 Luigi Toscano 2012-10-30 08:39:00 EDT
Are there any visible changes, or is it just an internal change which should lead to "working exactly as before"? 
Or will the "working exactly as before" be ready when both this and 869414 are fixed?
Comment 6 Pete MacKinnon 2012-10-31 11:33:33 EDT
Ideally, "working exactly as before" with the resolution of 869414.
Comment 8 Erik Erlandson 2012-11-16 13:59:18 EST
Note, we are planning to include the old->new mapping table in Comment 1 in the tech note.
Comment 9 Daniel Horák 2013-01-16 12:01:42 EST
Are the changes mentioned in comment 1 visible anywhere? I checked the scheduler with qpid-tool and it shows only the attributes from the left ("Old") column. Is that OK? (Were the changes made only internally?)
Comment 10 Pete MacKinnon 2013-01-16 12:10:58 EST
Internal changes only, so that there is zero impact on the QMF schema.
Comment 11 Daniel Horák 2013-01-17 04:29:37 EST
Tested on RHEL 5.9, 6.4 - i386, x86_64.

Compared between:
# rpm -qa | grep -e condor -e cumin -e qmd | sort
  condor-7.8.8-0.3.el6.x86_64
  condor-aviary-7.8.8-0.3.el6.x86_64
  condor-classads-7.8.8-0.3.el6.x86_64
  condor-qmf-7.8.8-0.3.el6.x86_64
  cumin-0.1.5648-1.el6.noarch
and:
# rpm -qa | grep -e condor -e cumin -e qmd | sort
  condor-7.6.5-0.22.el6.i686
  condor-aviary-7.6.5-0.22.el6.i686
  condor-classads-7.6.5-0.22.el6.i686
  condor-qmf-7.6.5-0.22.el6.i686
  cumin-0.1.5444-3.el6.noarch

And it is "working exactly as before".

>>> VERIFIED
Comment 14 errata-xmlrpc 2013-03-06 13:47:16 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0564.html
