Red Hat Bugzilla – Bug 568502
Collector should advertise itself immediately
Last modified: 2010-10-14 11:58:23 EDT
Description of problem: The collector doesn't advertise itself until it has been running for COLLECTOR_UPDATE_INTERVAL seconds. This is contrary to other daemons, which advertise themselves to the collector as soon as they start. The collector should advertise itself immediately on startup, then every COLLECTOR_UPDATE_INTERVAL seconds.
The issue is that the collector won't advertise itself when it doesn't have any startds in its hashtable. The comment in the code seems to indicate there's an issue with people running collectors on every node. The offending code in collector.cpp:

    // compute machine information
    machinesTotal = 0;
    machinesUnclaimed = 0;
    machinesClaimed = 0;
    machinesOwner = 0;
    ustatsAccum.Reset( );
    if (!collector.walkHashTable (STARTD_AD, reportMiniStartdScanFunc)) {
        dprintf (D_ALWAYS, "Error making collector ad (startd scan) \n");
    }

    // If we don't have any machines, then bail out. You oftentimes
    // see people run a collector on each macnine in their pool. Duh.
    if(machinesTotal == 0) {
        return 1;
    }
Moved the check for machinesTotal to after the collector has registered with local collectors. This allows the collectors to register locally, but not with the UW pool. Fixed in the next build of condor.
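For illustration, the reordering described above can be sketched roughly as follows. This is a simplified stand-in, not the actual patch: CollectorSketch, publishAdBefore, publishAdAfter and the "local"/"world" labels are hypothetical names, and the real collector.cpp code builds and sends ClassAds rather than recording strings.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the collector's state; not Condor's real types.
struct CollectorSketch {
    int machinesTotal = 0;            // startd ads found by the hashtable scan
    std::vector<std::string> sentTo;  // records where the collector ad was sent
};

// Before the fix: the machinesTotal check runs first, so a collector
// with no startds returns without advertising anywhere.
int publishAdBefore(CollectorSketch& c) {
    if (c.machinesTotal == 0) {
        return 1;
    }
    c.sentTo.push_back("local");  // assumed: update to local collectors
    c.sentTo.push_back("world");  // assumed: update to the UW pool
    return 1;
}

// After the fix: register with local collectors first, then apply the
// machinesTotal check so an empty collector still skips the UW pool.
int publishAdAfter(CollectorSketch& c) {
    c.sentTo.push_back("local");
    if (c.machinesTotal == 0) {
        return 1;
    }
    c.sentTo.push_back("world");
    return 1;
}
```

With machinesTotal at 0, publishAdBefore records nothing, while publishAdAfter records only the local update.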
Tested with (version): condor-7.4.4-0.9
Tested on:
RHEL5 i386,x86_64 - passed
RHEL4 i386,x86_64 - passed
>>> VERIFIED
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: The collector did not advertise itself until it had been running for the number of seconds specified in the 'COLLECTOR_UPDATE_INTERVAL' variable. With this update, the collector advertises itself immediately on startup and then every 'COLLECTOR_UPDATE_INTERVAL' seconds.
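As a rough illustration of the timing change in the technical note, the sketch below lists the seconds since startup at which an ad would be sent. The helper advertiseTimes and its parameters are hypothetical and are not part of Condor; the interval value used in the example is arbitrary, not a Condor default.

```cpp
#include <vector>

// Hypothetical helper: returns the times (seconds since startup) at which
// an ad would be sent within `horizon` seconds, one every `interval` seconds.
// immediate == true models the updated behavior (first ad at startup);
// immediate == false models the old behavior (first ad after one interval).
std::vector<int> advertiseTimes(int interval, int horizon, bool immediate) {
    std::vector<int> times;
    if (interval <= 0) {
        return times;  // guard against a non-positive interval
    }
    for (int t = immediate ? 0 : interval; t <= horizon; t += interval) {
        times.push_back(t);
    }
    return times;
}
```

For an interval of 15 and a horizon of 45, the old behavior yields 15, 30, 45, while the updated behavior yields 0, 15, 30, 45.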
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0773.html