Basic config, doc changes only: Setting up hierarchical collectors and disabling QMF publishing on the subcollectors can be covered in the documentation for MRG Grid and MRG Console. The configuration works in preliminary testing.

Cumin complications: Code changes are necessary in cumin to remove the existing filters on "Pool". The filters date from a time when cumin handled multiple pools simultaneously, where the pool identifier internally is the host/port of the Collector. In a hierarchical collector configuration, agents such as the startds set "Pool" in QMF objects based on their COLLECTOR_HOST configuration value. This means, for example, that the "Pool" value for slot objects published by startds connected to a subcollector will not match the "Pool" value defined by the top-level collector. As a result, cumin will not display these objects, even though it knows about them. The solution is to remove the filters on Pool from queries in cumin, so that all agents publishing on the qpid broker are visible. Everything connected to a broker and publishing relevant QMF objects will be considered by cumin to be part of the current pool. To run multiple disparate pools, multiple brokers should be used, with condor instances and cumin instances configured to point at the appropriate brokers.
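To make the mismatch concrete, here is a rough Python sketch of the old versus new behavior. This is illustrative only: the object shapes, host names, and ports below are hypothetical, not cumin's actual code or data.

    # Illustrative only: the pool identifier is the host/port of the
    # Collector, so startds pointed at a subcollector publish a
    # different Pool value than startds pointed at the top level.
    TOP_LEVEL_POOL = "central.example.com:9618"  # hypothetical top-level collector

    slots = [
        {"Name": "slot1@node1", "Pool": "central.example.com:9618"},   # direct
        {"Name": "slot1@node2", "Pool": "central.example.com:10000"},  # via subcollector
    ]

    # Old behavior: filtering on Pool hides the subcollector-attached slot.
    visible_old = [s for s in slots if s["Pool"] == TOP_LEVEL_POOL]
    assert len(visible_old) == 1

    # New behavior: every relevant QMF object on the broker is shown.
    visible_new = slots
    assert len(visible_new) == 2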
Created attachment 496801 [details] Patch to remove pool filtering
Created attachment 496869 [details] Patch to remove pool filtering, updated
Comment on attachment 496869 [details] Patch to remove pool filtering, updated

Oops, left in test lines that add a grid overview. Retry.
Modified TopSubmissionTable, which is visible in the default persona. Filtering on the collector should be removed, but filtering on the presence of the jobserver should not be. Added a common "find_youngest" method for locating the newest object of a certain class; this code was previously replicated in multiple places.
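A minimal sketch of the consolidated helper, for reference. The names here (Obj, update_time) are assumptions for illustration, not the actual cumin API; the real method operates on cumin's model classes.

    from collections import namedtuple

    # Hypothetical stand-in for a QMF-backed object with a timestamp.
    Obj = namedtuple("Obj", ["name", "update_time"])

    def find_youngest(objects):
        """Return the most recently updated object, or None if empty."""
        if not objects:
            return None
        return max(objects, key=lambda o: o.update_time)

    # Example: the youngest of three hypothetical objects.
    objs = [Obj("a", 100), Obj("b", 300), Obj("c", 200)]
    assert find_youngest(objs).name == "b"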
Created attachment 496906 [details] Patch to remove pool filtering, updated
Fixed in revision 4805. As noted above, synthetic agent deletes (BZ702440) become more important as a result of this change.
"multiple hierarchical collectors" - do you mean "flocking" in Condor terminology? http://www.cs.wisc.edu/condor/manual/v7.6/5_2Connecting_Condor.html
No, flocking is different. In this configuration there is one top-level collector and multiple subcollectors that report back to the main collector. Startds are assigned (randomly or otherwise) to one of the subcollectors; other daemons could point at the subcollectors as well. https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigCollectors
As an example, here is a wallaby feature I've tested with that sets up 3 subcollectors. Then, I created a group of test machines and gave them the SubCollector000 feature (excuse the name :) )

# condor_configure_store -l -f HierarchicalCollector
Feature "HierarchicalCollector":
  Feature ID: 28
  Name: HierarchicalCollector
  Included Parameters:
    COLLECTOR001 = $(COLLECTOR)
    COLLECTOR000_ENVIRONMENT = _CONDOR_COLLECTOR_LOG=$(LOG)/CollectorLog000
    COLLECTOR.COLLECTOR001.CONDOR_VIEW_HOST = $(COLLECTOR_HOST)
    COLLECTOR.COLLECTOR002.CONDOR_VIEW_HOST = $(COLLECTOR_HOST)
    COLLECTOR.COLLECTOR000.CONDOR_VIEW_HOST = $(COLLECTOR_HOST)
    COLLECTOR001_ENVIRONMENT = _CONDOR_COLLECTOR_LOG=$(LOG)/CollectorLog001
    COLLECTOR000_ARGS = -f -p 10000 -local-name COLLECTOR000
    COLLECTOR.COLLECTOR000.PLUGINS =
    COLLECTOR.COLLECTOR001.PLUGINS =
    DAEMON_LIST = >= COLLECTOR000,COLLECTOR001,COLLECTOR002
    COLLECTOR002_ARGS = -f -p 10002 -local-name COLLECTOR002
    COLLECTOR.COLLECTOR001.COLLECTOR_NAME = COLLECTOR001
    COLLECTOR002_ENVIRONMENT = _CONDOR_COLLECTOR_LOG=$(LOG)/CollectorLog002
    COLLECTOR001_ARGS = -f -p 10001 -local-name COLLECTOR001
    COLLECTOR.COLLECTOR000.COLLECTOR_NAME = COLLECTOR000
    COLLECTOR002 = $(COLLECTOR)
    COLLECTOR.COLLECTOR002.PLUGINS =
    COLLECTOR.COLLECTOR002.COLLECTOR_NAME = COLLECTOR002
    COLLECTOR000 = $(COLLECTOR)
  Included Features:
  Conflicts:
  Dependencies:

# condor_configure_store -l -f SubCollector000
Feature "SubCollector000":
  Feature ID: 21
  Name: SubCollector000
  Included Parameters:
    STARTD.COLLECTOR_HOST = $(COLLECTOR_HOST):$RANDOM_CHOICE(10000,10001,10002)
  Included Features:
  Conflicts:
  Dependencies:
Verified with cumin-0.1.4878-1.el5.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: Cumin filtered objects in the UI based on the condor Collector with which they were associated. Objects not associated directly with the Collector discovered by cumin would not be displayed.
Consequence: In hierarchical Collector configurations, some objects published by condor daemons with COLLECTOR_HOST set to a subcollector would not be visible in cumin.
Fix: All Collector-based filtering in cumin was removed.
Result: Cumin will process any objects it receives from the messaging broker. This allows all objects to be visible in a hierarchical collector configuration. However, it is now considered a configuration error with unpredictable results to deploy multiple condor pools on the same messaging broker, or to configure cumin to use multiple brokers simultaneously when those brokers serve different pools. In other words, cumin should only be able to discover objects from a single pool.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-1249.html