Bug 518301

Summary: condor_status -any doesn't show a collector
Product: Red Hat Enterprise MRG
Reporter: Robert Rati <rrati>
Component: grid
Assignee: Robert Rati <rrati>
Status: CLOSED ERRATA
QA Contact: Martin Kudlej <mkudlej>
Severity: medium
Docs Contact:
Priority: medium
Version: 1.1
CC: lbrindle, mkudlej
Target Milestone: 1.2
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Grid bug fix

C: The collector would not include itself in queries for all daemon adtype
C: condor_status -any would not list a collector
F: The collector daemon now registers with itself and is included in queries for collector and any adtypes
R: condor_status -any now lists the collector

The collector would not include itself in queries for all daemon adtype, resulting in the query not returning a collector. The collector daemon now registers with itself and is included in queries for collector and any adtypes.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-12-03 09:16:17 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 527551

Description Robert Rati 2009-08-19 19:07:40 UTC
Description of problem:
Running 'condor_collector -any' won't show any collectors.  The collectors in the pool should appear in the list.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Robert Rati 2009-08-27 15:39:26 UTC
Fixed in:
condor-7.3.2-0.5

Comment 2 Martin Kudlej 2009-09-09 10:26:44 UTC
Do you mean 'condor_status -any'? Which parameters does condor_collector accept besides those listed at http://www.cs.wisc.edu/condor/manual/v7.2/3_9DaemonCore.html#SECTION00492000000000000000 ?
I tried "condor_status -any" and there is no collector in the list of daemons (condor-7.4.0-0.3.el5).

Comment 3 Robert Rati 2009-09-09 15:09:55 UTC
The collector can take up to 15 minutes (the default timeout) to register itself.  This can be shortened by setting COLLECTOR_UPDATE_INTERVAL.

And yes, condor_status -any is the command to test.
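
A minimal sketch of how the check described above could be automated. It polls 'condor_status -any' for up to the 15 minutes mentioned in this comment. The helper names (has_collector_ad, wait_for_collector) are hypothetical, and the parsing assumes that the first column of each output row is the ad type and reads "Collector" for a collector ad; adjust it if the output on your system differs.

```python
import subprocess
import time


def has_collector_ad(status_output: str) -> bool:
    """Return True if any row's first column (assumed to be the ad type) is 'Collector'."""
    for line in status_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Collector":
            return True
    return False


def wait_for_collector(timeout_s: float = 900.0, poll_s: float = 10.0) -> bool:
    """Poll `condor_status -any` until a Collector row appears or the timeout expires.

    The 900 second default mirrors the 15 minute registration delay mentioned
    in this comment; the 10 second poll interval is an arbitrary choice.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = subprocess.run(
            ["condor_status", "-any"], capture_output=True, text=True, check=False
        )
        if has_collector_ad(result.stdout):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Calling wait_for_collector() on a pool member returns True as soon as a collector row shows up, and False if none appears within the timeout.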

Comment 4 Martin Kudlej 2009-09-10 07:12:35 UTC
I set UPDATE_INTERVAL=10 and COLLECTOR_UPDATE_INTERVAL=10, and there is still no collector in the list of daemons shown by "condor_status -any". The list contains Scheduler, Negotiator, three DaemonMaster ads (the pool has 3 machines), and a Machine ad for every slot.
Tested on condor 7.4.0-0.3.el5 and RHEL 5.3.

Comment 5 Martin Kudlej 2009-09-10 08:14:18 UTC
CollectorLog: 
09/10 09:04:00 (Sending 7 ads in response to query)
09/10 09:04:10 DC_AUTHENTICATE: session mrg-qe-04:27794:1252587550:3 NOT FOUND...
09/10 09:04:10 Unable to get daemon information because no subsystem specified
09/10 09:04:11 ERROR: receiving new UDP message but found a long message still waiting to be closed (consumed=0). Closing it now.
09/10 09:04:11 StartdAd     : Inserting ** "< slot2.eng.brq.redhat.com , 10.34.33.58 >"
09/10 09:04:11 stats: Inserting new hashent for 'Start':'slot2.eng.brq.redhat.com':'10.34.33.58'
09/10 09:04:11 StartdPvtAd  : Inserting ** "< slot2.eng.brq.redhat.com , 10.34.33.58 >"
09/10 09:04:11 stats: Inserting new hashent for 'StartdPvt':'slot2.eng.brq.redhat.com':'10.34.33.58'
Stack dump for process 695 at timestamp 1252587852 (9 frames)
condor_collector(dprintf_dump_stack+0xd0)[0x8120663]
condor_collector[0x812081e]
[0x9e8420]
condor_collector(_ZN15CollectorDaemon15sendCollectorAdEv+0x2db)[0x80db747]
condor_collector(_ZN12TimerManager7TimeoutEv+0x28f)[0x811e11f]
condor_collector(_ZN10DaemonCore6DriverEv+0x79e)[0x8101146]
condor_collector(main+0x1814)[0x81181fc]
/lib/libc.so.6(__libc_start_main+0xdc)[0x7bfe8c]
condor_collector[0x80c5741]
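
The mangled C++ names in the stack dump above can be decoded by hand. The following is a rough sketch, not a general demangler: it only handles the narrow pattern seen in these frames (a nested name made of length-prefixed parts, closed by "E", with an empty parameter list "v"). The function name demangle_simple is hypothetical; a tool such as c++filt covers the full mangling scheme.

```python
import re


def demangle_simple(symbol: str) -> str:
    """Decode symbols of the form _ZN<len><name>...Ev into Outer::inner().

    Anything outside that narrow pattern is returned unchanged.
    """
    if not symbol.startswith("_ZN"):
        return symbol
    pos = 3
    parts = []
    # Each component is a decimal length followed by that many characters.
    while pos < len(symbol) and symbol[pos].isdigit():
        digits = re.match(r"\d+", symbol[pos:]).group()
        length = int(digits)
        pos += len(digits)
        parts.append(symbol[pos:pos + length])
        pos += length
    # 'E' closes the nested name; a trailing 'v' means no parameters.
    if not parts or symbol[pos:] != "Ev":
        return symbol
    return "::".join(parts) + "()"


for frame in (
    "_ZN15CollectorDaemon15sendCollectorAdEv",
    "_ZN12TimerManager7TimeoutEv",
    "_ZN10DaemonCore6DriverEv",
):
    print(demangle_simple(frame))
```

Read this way, the innermost named application frame is CollectorDaemon::sendCollectorAd(), reached from TimerManager::Timeout(), which suggests the crash occurred while the collector was sending its own ad from a timer callback.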


MasterLog
09/10 09:02:59 attempt to connect to <10.34.33.57:9618> failed: Connection refused (connect errno = 111).
09/10 09:02:59 ERROR: SECMAN:2004:Failed to create security session to <10.34.33.57:9618> with TCP.|SECMAN:2003:TCP connection to <10.34.33.57:9618> failed.
09/10 09:02:59 Failed to start non-blocking update to <10.34.33.57:9618>.
09/10 09:03:09 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 695
09/10 09:04:12 The COLLECTOR (pid 695) died due to signal 11 (Segmentation fault)
09/10 09:04:12 Sending obituary for "/usr/sbin/condor_collector"
09/10 09:04:12 restarting /usr/sbin/condor_collector in 11 seconds
09/10 09:04:12 attempt to connect to <10.34.33.57:9618> failed: Connection refused (connect errno = 111).
09/10 09:04:12 ERROR: SECMAN:2004:Failed to create security session to <10.34.33.57:9618> with TCP.|SECMAN:2003:TCP connection to <10.34.33.57:9618> failed.
09/10 09:04:12 Failed to start non-blocking update to <10.34.33.57:9618>.
09/10 09:04:23 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 2024
09/10 09:05:26 The COLLECTOR (pid 2024) died due to signal 11 (Segmentation fault)
09/10 09:05:26 Sending obituary for "/usr/sbin/condor_collector"
09/10 09:05:26 restarting /usr/sbin/condor_collector in 13 seconds

I've set CREATE_CORE_FILES=TRUE, but there are no core files in the condor log directory.
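
As an illustration of what the MasterLog excerpt in this comment shows, the sketch below pairs each "Started DaemonCore process" line with the matching "died due to signal" line and computes how long each collector instance lived. The regular expressions are written against the four log lines quoted here only, the year is an assumption (the log omits it), and the helper names are hypothetical.

```python
import re
from datetime import datetime

# Start and death lines copied from the MasterLog excerpt in this comment.
MASTER_LOG = """\
09/10 09:03:09 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 695
09/10 09:04:12 The COLLECTOR (pid 695) died due to signal 11 (Segmentation fault)
09/10 09:04:23 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 2024
09/10 09:05:26 The COLLECTOR (pid 2024) died due to signal 11 (Segmentation fault)
"""

STAMP = r"(\d\d/\d\d \d\d:\d\d:\d\d)"
STARTED = re.compile("^" + STAMP + r" Started DaemonCore process .* pid and pgroup = (\d+)$")
DIED = re.compile("^" + STAMP + r" The \w+ \(pid (\d+)\) died due to signal \d+")


def parse_time(stamp):
    # The log has no year; 2009 is assumed so that timestamps can be subtracted.
    return datetime.strptime("2009/" + stamp, "%Y/%m/%d %H:%M:%S")


def uptimes(log_text):
    """Return a list of (pid, seconds_alive) for each start/death pair."""
    started = {}
    result = []
    for line in log_text.splitlines():
        match = STARTED.match(line)
        if match:
            started[match.group(2)] = parse_time(match.group(1))
            continue
        match = DIED.match(line)
        if match and match.group(2) in started:
            delta = parse_time(match.group(1)) - started.pop(match.group(2))
            result.append((int(match.group(2)), int(delta.total_seconds())))
    return result


print(uptimes(MASTER_LOG))
```

Both collector instances (pid 695 and pid 2024) died 63 seconds after they were started. Identical lifetimes are consistent with a crash on a timer-driven code path, although the logs alone do not confirm which timer is involved.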

Comment 6 Robert Rati 2009-09-11 13:53:31 UTC
The core should be fixed in:
condor-7.4.0-0.4

Comment 7 Martin Kudlej 2009-09-23 14:17:58 UTC
Tested on condor-7.4.0-0.5 on RHEL 5.4 i386/x86_64 and on condor-7.4.0-0.4 on RHEL 4.8 i386/x86_64 and it works --> VERIFIED

Comment 8 Irina Boverman 2009-10-29 14:29:55 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
please see bug summary.

Comment 9 Lana Brindley 2009-11-05 04:21:08 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,8 @@
-please see bug summary.
+Grid bug fix
+
+C: 
+C: Running 'condor_status -any' does not show any collectors
+F:
+R:
+
+MORE INFORMATION REQUIRED FOR RELNOTE

Comment 10 Robert Rati 2009-11-05 17:50:36 UTC
condor_status -any now lists the collector

Comment 11 Lana Brindley 2009-11-08 23:49:26 UTC
Thanks Rob. Still looking for Cause and Fix information.

LKB

Comment 12 Lana Brindley 2009-11-08 23:49:26 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -3,6 +3,6 @@
 C: 
 C: Running 'condor_status -any' does not show any collectors
 F:
-R:
+R: condor_status -any now lists the collector 
 
 MORE INFORMATION REQUIRED FOR RELNOTE

Comment 13 Robert Rati 2009-11-09 21:06:16 UTC
C: The collector would not include itself in queries for all daemon adtype
C: condor_status -any would not list a collector
F: The collector daemon now registers with itself and is included in queries for collector and any adtypes
R: condor_status -any now lists the collector

Comment 14 Lana Brindley 2009-11-11 20:25:45 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,8 +1,10 @@
 Grid bug fix
 
-C: 
-C: Running 'condor_status -any' does not show any collectors
-F:
-R: condor_status -any now lists the collector 
+C: The collector would not include itself in queries for all daemon adtype
+C: condor_status -any would not list a collector
+F: The collector daemon now registers with itself and is included in queries
+for collector and any adtypes
+R: condor_status -any now lists the collector
 
-MORE INFORMATION REQUIRED FOR RELNOTE
+The collector would not include itself in queries for all daemon adtype, resulting in the query not returning a collector. The collector daemon now registers with itself and is included in queries
+for collector and any adtypes.

Comment 15 errata-xmlrpc 2009-12-03 09:16:17 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html