Bug 699413 - Cumin interaction with multiple hierarchical collectors
Summary: Cumin interaction with multiple hierarchical collectors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: cumin
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: 2.0.1
Assignee: Trevor McKay
QA Contact: Jan Sarenik
URL:
Whiteboard:
Depends On:
Blocks: 723887
Reported: 2011-04-25 15:05 UTC by Trevor McKay
Modified: 2012-02-08 10:36 UTC

Fixed In Version: cumin-0.1.4840-1
Doc Type: Bug Fix
Doc Text:
Cause: Cumin filtered objects in the UI based on the condor Collector with which they were associated. Objects not associated directly with the Collector discovered by cumin would not be displayed.
Consequence: In hierarchical Collector configurations, some objects published by condor daemons with COLLECTOR_HOST set to a subcollector would not be visible in cumin.
Fix: All Collector-based filtering in cumin was removed.
Result: Cumin will process any objects it receives from the messaging broker. This allows all objects to be visible in a hierarchical collector configuration. However, it is now considered a configuration error with unpredictable results to deploy multiple condor pools on the same messaging broker or to configure cumin to use multiple brokers simultaneously when those brokers serve different pools. In other words, cumin should only be able to discover objects from a single pool.
Clone Of:
Environment:
Last Closed: 2011-09-07 16:41:59 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch to remove pool filtering (16.83 KB, text/plain)
2011-05-04 14:16 UTC, Trevor McKay
Patch to remove pool filtering, updated (20.67 KB, text/plain)
2011-05-04 18:31 UTC, Trevor McKay
Patch to remove pool filtering, updated (20.79 KB, text/plain)
2011-05-04 19:57 UTC, Trevor McKay


Links
System: Red Hat Product Errata
ID: RHSA-2011:1249
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise MRG Grid 2.0 security, bug fix and enhancement update
Last Updated: 2011-09-07 16:40:45 UTC

Comment 1 Trevor McKay 2011-04-28 20:36:38 UTC
Basic config, doc changes only:

Setting up hierarchical collectors and disabling QMF publishing on the subcollectors can be covered in documentation for MRG Grid and MRG Console.  Configuration works in preliminary testing.

Cumin complications:

Code changes are necessary in cumin to remove existing filters on "Pool".  The filters date from a time when cumin handled multiple pools simultaneously, with the host/port of the Collector used internally as the pool identifier.

In a hierarchical collector configuration, agents such as the startds will set "Pool" in QMF objects based on their COLLECTOR_HOST configuration value.  This means for example that the "Pool" value for slot objects published by startds which are connected to a subcollector will not match the "Pool" value defined by the top level collector.  As a result, cumin will not display these objects, even though it knows about them.

The solution is to remove the filters on Pool from queries in cumin, so that all agents publishing on the qpid broker will be visible.  Everything connected to a broker and publishing relevant QMF objects will be considered by Cumin to be part of the current pool.

To run multiple disparate pools, multiple brokers should be used with condor instances and cumin instances configured to point at the appropriate brokers.
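
The effect of the "Pool" filter described above can be illustrated with a minimal sketch. This is not cumin's actual query code; the Slot class, the host names, and both helper functions are hypothetical and exist only to show why objects published through a subcollector were hidden, and why dropping the filter makes them visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """Hypothetical stand-in for a slot object published over QMF."""
    name: str
    pool: str  # "Pool" value, derived from the publisher's COLLECTOR_HOST


# Hypothetical identifier of the top level collector (host:port).
TOP_LEVEL_POOL = "cm.example.com:9618"

# One startd reports to the top level collector, two to subcollectors.
SLOTS = [
    Slot("slot1@node1", "cm.example.com:9618"),
    Slot("slot1@node2", "cm.example.com:10000"),
    Slot("slot1@node3", "cm.example.com:10002"),
]


def visible_with_pool_filter(slots, pool):
    """Old behaviour: only objects whose Pool matches are shown."""
    return [s.name for s in slots if s.pool == pool]


def visible_without_filter(slots):
    """New behaviour: everything received from the broker is shown."""
    return [s.name for s in slots]


if __name__ == "__main__":
    print(visible_with_pool_filter(SLOTS, TOP_LEVEL_POOL))
    print(visible_without_filter(SLOTS))
```

With the filter in place only the slot whose Pool equals the top level collector survives; without it all three slots are returned, which is also why a broker shared by unrelated pools would mix their objects together.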

Comment 4 Trevor McKay 2011-05-04 14:16:33 UTC
Created attachment 496801
Patch to remove pool filtering

Comment 5 Trevor McKay 2011-05-04 18:31:23 UTC
Created attachment 496869
Patch to remove pool filtering, updated

Comment 6 Trevor McKay 2011-05-04 18:37:39 UTC
Comment on attachment 496869
Patch to remove pool filtering, updated

Oops, left test lines in to add grid overview.  Retry.

Comment 7 Trevor McKay 2011-05-04 19:56:30 UTC
Modified TopSubmissionTable which is visible in default persona.  Filtering on collector should be removed, but filtering on the presence of the jobserver should not be removed.

Added a common "find_youngest" method for locating the newest object of a certain class; this code was replicated in multiple places.
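
A helper of this kind might look roughly like the sketch below. The real signature and object model in the patch are not shown in this report, so the dictionary layout and the "class" and "update_time" field names here are assumptions made purely for illustration.

```python
def find_youngest(objects, class_name):
    """Return the most recently updated object of class_name, or None.

    Hypothetical sketch: each object is a dict with a "class" name and a
    numeric "update_time"; the actual cumin data model may differ.
    """
    candidates = [o for o in objects if o["class"] == class_name]
    if not candidates:
        return None
    # The "youngest" object is the one with the latest update time.
    return max(candidates, key=lambda o: o["update_time"])


if __name__ == "__main__":
    objs = [
        {"class": "Collector", "name": "old", "update_time": 100},
        {"class": "Collector", "name": "new", "update_time": 250},
        {"class": "Scheduler", "name": "sched", "update_time": 300},
    ]
    print(find_youngest(objs, "Collector"))
```

Centralising the lookup means every caller treats "no object of that class" the same way (a None result) instead of each copy handling it differently.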

Comment 8 Trevor McKay 2011-05-04 19:57:21 UTC
Created attachment 496906
Patch to remove pool filtering, updated

Comment 11 Trevor McKay 2011-06-09 14:56:31 UTC
Fixed in revision 4805.

As noted above, synthetic agent deletes (BZ702440) become more important as a result of this change.

Comment 12 Jan Sarenik 2011-07-18 12:46:59 UTC
"multiple hierarchical collectors" - do you mean "flocking"
in Condor terminology?

http://www.cs.wisc.edu/condor/manual/v7.6/5_2Connecting_Condor.html

Comment 13 Trevor McKay 2011-07-18 13:38:35 UTC
No, flocking is different.  In this configuration there is one top level collector, and multiple sub-collectors that report back to the main collector.  Startds are assigned (randomly or otherwise) to one of the subcollectors.  (Other daemons could point at the subcollectors as well).

https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigCollectors

Comment 14 Trevor McKay 2011-07-18 13:48:34 UTC
As an example, here is a wallaby feature I've tested with that sets up 3 subcollectors.  Then, I created a group of test machines and gave them the SubCollector000 feature (excuse the name :) )

# condor_configure_store -l -f HierarchicalCollector
Feature "HierarchicalCollector":
Feature ID: 28
Name: HierarchicalCollector
Included Parameters:
  COLLECTOR001 = $(COLLECTOR)
  COLLECTOR000_ENVIRONMENT = _CONDOR_COLLECTOR_LOG=$(LOG)/CollectorLog000
  COLLECTOR.COLLECTOR001.CONDOR_VIEW_HOST = $(COLLECTOR_HOST)
  COLLECTOR.COLLECTOR002.CONDOR_VIEW_HOST = $(COLLECTOR_HOST)
  COLLECTOR.COLLECTOR000.CONDOR_VIEW_HOST = $(COLLECTOR_HOST)
  COLLECTOR001_ENVIRONMENT = _CONDOR_COLLECTOR_LOG=$(LOG)/CollectorLog001
  COLLECTOR000_ARGS = -f -p 10000 -local-name COLLECTOR000
  COLLECTOR.COLLECTOR000.PLUGINS = 
  COLLECTOR.COLLECTOR001.PLUGINS = 
  DAEMON_LIST = >= COLLECTOR000,COLLECTOR001,COLLECTOR002
  COLLECTOR002_ARGS = -f -p 10002 -local-name COLLECTOR002
  COLLECTOR.COLLECTOR001.COLLECTOR_NAME = COLLECTOR001
  COLLECTOR002_ENVIRONMENT = _CONDOR_COLLECTOR_LOG=$(LOG)/CollectorLog002
  COLLECTOR001_ARGS = -f -p 10001 -local-name COLLECTOR001
  COLLECTOR.COLLECTOR000.COLLECTOR_NAME = COLLECTOR000
  COLLECTOR002 = $(COLLECTOR)
  COLLECTOR.COLLECTOR002.PLUGINS = 
  COLLECTOR.COLLECTOR002.COLLECTOR_NAME = COLLECTOR002
  COLLECTOR000 = $(COLLECTOR)
Included Features:
Conflicts:
Dependencies:

# condor_configure_store -l -f SubCollector000
Feature "SubCollector000":
Feature ID: 21
Name: SubCollector000
Included Parameters:
  STARTD.COLLECTOR_HOST = $(COLLECTOR_HOST):$RANDOM_CHOICE(10000,10001,10002)
Included Features:
Conflicts:
Dependencies:
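
To make the SubCollector000 setting above concrete, here is a small sketch that mimics how $RANDOM_CHOICE(10000,10001,10002) spreads startds across the three subcollector ports. It only simulates the selection; it is not condor code, and the function name and the cm.example.com host are hypothetical.

```python
import random

# Ports of COLLECTOR000..COLLECTOR002 from the HierarchicalCollector feature.
SUBCOLLECTOR_PORTS = (10000, 10001, 10002)


def startd_collector_host(collector_host, rng):
    """Mimic STARTD.COLLECTOR_HOST = $(COLLECTOR_HOST):$RANDOM_CHOICE(...).

    collector_host is the host of the top level collector; rng is a
    random.Random instance so that callers control the randomness.
    """
    return f"{collector_host}:{rng.choice(SUBCOLLECTOR_PORTS)}"


if __name__ == "__main__":
    rng = random.Random()
    # Each simulated startd ends up pointing at one of the subcollectors.
    for node in ("node1", "node2", "node3", "node4"):
        print(node, "->", startd_collector_host("cm.example.com", rng))
```

Because each startd derives its "Pool" value from the resulting host:port pair, the values differ from the top level collector's, which is the mismatch the removed filter tripped over.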

Comment 15 Jan Sarenik 2011-07-21 10:46:48 UTC
Verified with cumin-0.1.4878-1.el5.

Comment 16 Trevor McKay 2011-07-25 15:28:56 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause
    Cumin filtered objects in the UI based on the condor Collector with which they were associated.  Objects not associated directly with the Collector discovered by cumin would not be displayed.

Consequence
    In hierarchical Collector configurations, some objects published by condor daemons with COLLECTOR_HOST set to a subcollector would not be visible in cumin.

Fix
    All Collector-based filtering in cumin was removed.

Result
    Cumin will process any objects it receives from the messaging broker.  This allows all objects to be visible in a hierarchical collector configuration.  However, it is now considered a configuration error with unpredictable results to deploy multiple condor pools on the same messaging broker or to configure cumin to use multiple brokers simultaneously when those brokers serve different pools.  In other words, cumin should only be able to discover objects from a single pool.

Comment 17 errata-xmlrpc 2011-09-07 16:41:59 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1249.html

