Bug 867989

Summary: Cumin missing scheduler stats
Product: Red Hat Enterprise MRG
Component: condor-qmf
Version: Development
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Pete MacKinnon <pmackinn>
Assignee: Pete MacKinnon <pmackinn>
QA Contact: Daniel Horák <dahorak>
CC: dahorak, eerlands, ltoscano, matt, sgraf, tmckay, tstclair
Target Milestone: 2.3
Fixed In Version: condor-7.8.7-0.1
Doc Type: Bug Fix
Doc Text:
Cause: Upstream changes in HTCondor modified the names of various condor_schedd daemon ClassAd statistical attributes.
Consequence: Statistics for the QMF scheduler object show 0 for attributes that should be non-zero.
Fix: The QMF schedd plug-in now implicitly maps the old attribute names (7.6 series) to those renamed in the 7.8 series.
Result: Statistics for the QMF scheduler object show correct values for the affected attributes.
Last Closed: 2013-03-06 18:47:16 UTC
Type: Bug
Bug Blocks: 869414

Description Pete MacKinnon 2012-10-18 19:37:34 UTC
Scheduler statistical attributes are computed and represented slightly differently in the upstream 7.6 and 7.8 series. This BZ covers work within the QMF scheduler plugin to ensure that the statistical attributes already exposed to QMF are mapped and reported the same way as in previous versions, particularly as consumed by Cumin. For example, the following statistics:

Job scheduler info:
    Job submission rate
    Job start rate
    Job completion rate
    Mean time to start

should all have non-zero values when the pool/scheduler is at steady state and processing jobs.

Comment 1 Erik Erlandson 2012-10-18 19:48:03 UTC
For historical and reference purposes, here's a table that maps the semantics of the previous statistic attributes to the current ones:


Old Names                       New Names
-------------------------------------------

WINDOWED_STAT_WIDTH             STATISTICS_WINDOW_SECONDS  // quantized to schedd_stats_window_quantum = 200

WindowedStatWidth               RecentStatsLifetime

JobsSubmitted                   RecentJobsSubmitted
JobsSubmittedCumulative         JobsSubmitted

JobsStarted                     RecentJobsStarted
JobsStartedCumulative           JobsStarted

JobsExited                      RecentJobsExited
JobsExitedCumulative            JobsExited

JobsCompleted                   RecentJobsCompleted
JobsCompletedCumulative         JobsCompleted

ShadowExceptions                RecentJobsExitException
ShadowExceptionsCumulative      JobsExitException

<null>                          RecentJobsAccumTimeToStart
SumTimeToStartCumulative        JobsAccumTimeToStart

<null>                          RecentJobsAccumRunningTime
SumRunningTimeCumulative        JobsAccumRunningTime

JobSubmissionRate               RecentJobsSubmitted / RecentStatsLifetime
JobCompletionRate               RecentJobsCompleted / RecentStatsLifetime
JobStartRate                    RecentJobsStarted / RecentStatsLifetime

MeanTimeToStart                 RecentJobsAccumTimeToStart / RecentJobsStarted
MeanTimeToStartCumulative       JobsAccumTimeToStart / JobsStarted

MeanRunningTime                 RecentJobsAccumRunningTime / RecentJobsCompleted
MeanRunningTimeCumulative       JobsAccumRunningTime / JobsCompleted

UpdateTime                      <null> // subtract StatsLastUpdateTime from consecutive ads
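To make the table above concrete, here is a minimal Python sketch of the old-name view computed from a 7.8-style ad. It is not the plug-in's actual code: the ad is modeled as a plain dict, the names `RENAMED`, `DERIVED` and `legacy_schedd_stats` are hypothetical, and reporting 0.0 when a denominator is zero is an assumption rather than documented behavior.

```python
# Old (7.6) name -> new (7.8) name, for attributes that were simply renamed.
RENAMED = {
    "WindowedStatWidth": "RecentStatsLifetime",
    "JobsSubmitted": "RecentJobsSubmitted",
    "JobsSubmittedCumulative": "JobsSubmitted",
    "JobsStarted": "RecentJobsStarted",
    "JobsStartedCumulative": "JobsStarted",
    "JobsExited": "RecentJobsExited",
    "JobsExitedCumulative": "JobsExited",
    "JobsCompleted": "RecentJobsCompleted",
    "JobsCompletedCumulative": "JobsCompleted",
    "ShadowExceptions": "RecentJobsExitException",
    "ShadowExceptionsCumulative": "JobsExitException",
    "SumTimeToStartCumulative": "JobsAccumTimeToStart",
    "SumRunningTimeCumulative": "JobsAccumRunningTime",
}

# Old name -> (numerator, denominator) in terms of the new attributes.
DERIVED = {
    "JobSubmissionRate": ("RecentJobsSubmitted", "RecentStatsLifetime"),
    "JobCompletionRate": ("RecentJobsCompleted", "RecentStatsLifetime"),
    "JobStartRate": ("RecentJobsStarted", "RecentStatsLifetime"),
    "MeanTimeToStart": ("RecentJobsAccumTimeToStart", "RecentJobsStarted"),
    "MeanTimeToStartCumulative": ("JobsAccumTimeToStart", "JobsStarted"),
    "MeanRunningTime": ("RecentJobsAccumRunningTime", "RecentJobsCompleted"),
    "MeanRunningTimeCumulative": ("JobsAccumRunningTime", "JobsCompleted"),
}


def legacy_schedd_stats(ad, prev_ad=None):
    """Return the old-style statistics for a new-style schedd ad (a dict)."""
    old = {name: ad.get(new, 0) for name, new in RENAMED.items()}
    for name, (num, den) in DERIVED.items():
        d = ad.get(den, 0)
        # Assumption: a zero or missing denominator is reported as 0.0.
        old[name] = ad.get(num, 0) / d if d else 0.0
    if prev_ad is not None:
        # UpdateTime has no direct equivalent; use the gap between two ads.
        old["UpdateTime"] = ad["StatsLastUpdateTime"] - prev_ad["StatsLastUpdateTime"]
    return old
```

Note that the same key means different things on each side: `JobsSubmitted` is the windowed count in the old scheme but the cumulative count in the new one, which is why the lookup always reads from the new ad and writes to a separate dict.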

Comment 2 Erik Erlandson 2012-10-18 20:18:01 UTC
Upstream confirms that recent/windowed stats make use of a ring buffer whose behavior is essentially equivalent to the previous windowed stat behavior, but with lower resolution in time.   

The main impact is that when a ring-buffer bin falls off the back end of the time window, it can cause a larger step-function drop in the value.  How visible this is to anybody consuming the stats depends on how the timing interacts with the ad publication interval.
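The step-function effect can be illustrated with a small Python sketch of a binned windowed counter. This is not HTCondor's implementation: the class name `RecentCounter` and its methods are hypothetical, the 200-second quantum is taken from the note in Comment 1, and the 600-second window is an arbitrary value chosen for illustration.

```python
class RecentCounter:
    """Windowed counter that stores counts in fixed-width time bins."""

    def __init__(self, window_seconds, quantum_seconds):
        self.quantum = quantum_seconds
        self.bins = [0] * max(1, window_seconds // quantum_seconds)
        self.head_slot = 0  # time slot currently held by the newest bin

    def _advance(self, now):
        # Clear every bin whose time slot has rotated out of the window.
        slot = int(now // self.quantum)
        steps = min(slot - self.head_slot, len(self.bins))
        for i in range(steps):
            self.bins[(self.head_slot + 1 + i) % len(self.bins)] = 0
        if slot > self.head_slot:
            self.head_slot = slot

    def add(self, now, count=1):
        self._advance(now)
        self.bins[self.head_slot % len(self.bins)] += count

    def value(self, now):
        self._advance(now)
        return sum(self.bins)


counter = RecentCounter(window_seconds=600, quantum_seconds=200)
counter.add(0, 10)         # ten jobs in the first bin
counter.add(250, 5)        # five jobs in the second bin
print(counter.value(599))  # 15: both bins are still inside the window
print(counter.value(600))  # 5: the whole first bin drops out at once
```

All ten events from the first bin disappear together at t=600 regardless of when within the bin they occurred, so a coarser quantum produces larger drops. Whether a consumer sees the drop as a sharp step depends on where the ad publication times fall relative to the bin boundaries.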

Upstream confirmed that they are amenable to exposing the quantization level to configuration.   Such a feature would require relatively little effort to implement and pull back via a tracking branch.

Comment 3 Pete MacKinnon 2012-10-24 16:44:42 UTC
Addressed for cumin in the QMF schedd plugin based on provided stat mapping

Comment 5 Luigi Toscano 2012-10-30 12:39:00 UTC
Are there any visible changes, or is it just an internal change which should lead to "working exactly as before"? 
Or will the "working exactly as before" be ready when both this and 869414 are fixed?

Comment 6 Pete MacKinnon 2012-10-31 15:33:33 UTC
Ideally, "working exactly as before" with the resolution of 869414.

Comment 8 Erik Erlandson 2012-11-16 18:59:18 UTC
Note, we are planning to include the old->new mapping table in Comment 1 in the tech note.

Comment 9 Daniel Horák 2013-01-16 17:01:42 UTC
Are the changes mentioned in comment 1 visible anywhere? I checked the scheduler with qpid-tool and it shows only the attributes from the left ("Old") column. Is that OK? (Are the changes purely internal?)

Comment 10 Pete MacKinnon 2013-01-16 17:10:58 UTC
Internal changes only so that there is zero impact to the QMF schema.

Comment 11 Daniel Horák 2013-01-17 09:29:37 UTC
Tested on RHEL 5.9, 6.4 - i386, x86_64.

Compared between:
# rpm -qa | grep -e condor -e cumin -e qmd | sort
  condor-7.8.8-0.3.el6.x86_64
  condor-aviary-7.8.8-0.3.el6.x86_64
  condor-classads-7.8.8-0.3.el6.x86_64
  condor-qmf-7.8.8-0.3.el6.x86_64
  cumin-0.1.5648-1.el6.noarch
and:
# rpm -qa | grep -e condor -e cumin -e qmd | sort
  condor-7.6.5-0.22.el6.i686
  condor-aviary-7.6.5-0.22.el6.i686
  condor-classads-7.6.5-0.22.el6.i686
  condor-qmf-7.6.5-0.22.el6.i686
  cumin-0.1.5444-3.el6.noarch

And it is "working exactly as before".

>>> VERIFIED

Comment 14 errata-xmlrpc 2013-03-06 18:47:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0564.html