Bug 518301
Summary: | condor_status -any doesn't show a collector | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Robert Rati <rrati> |
Component: | grid | Assignee: | Robert Rati <rrati> |
Status: | CLOSED ERRATA | QA Contact: | Martin Kudlej <mkudlej> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 1.1 | CC: | lbrindle, mkudlej |
Target Milestone: | 1.2 | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Grid bug fix
C: The collector would not include itself in queries for all daemon adtype
C: condor_status -any would not list a collector
F: The collector daemon now registers with itself and is included in queries
for collector and any adtypes
R: condor_status -any now lists the collector
The collector would not include itself in queries for all daemon adtype, resulting in the query not returning a collector. The collector daemon now registers with itself and is included in queries
for collector and any adtypes.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2009-12-03 09:16:17 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 527551 |
Description
Robert Rati
2009-08-19 19:07:40 UTC
Fixed in: condor-7.3.2-0.5

Do you mean 'condor_status -any'? Which parameters does condor_collector take besides those listed at http://www.cs.wisc.edu/condor/manual/v7.2/3_9DaemonCore.html#SECTION00492000000000000000 ? I've tried "condor_status -any" and there is no collector in the list of daemons (condor-7.4.0-0.3.el5).

The collector can take up to 15 minutes (default timeout) to register itself. This can be shortened by setting COLLECTOR_UPDATE_INTERVAL. And yes, condor_status -any is the command to test.

I've set UPDATE_INTERVAL=10 and COLLECTOR_UPDATE_INTERVAL=10, and there is still no collector in the list of daemons in "condor_status -any". The list contains Scheduler, Negotiator, DaemonMaster three times (the pool has 3 machines), and a Machine entry for every slot. Tested on condor 7.4.0-0.3.el5 and RHEL 5.3.

CollectorLog:
09/10 09:04:00 (Sending 7 ads in response to query)
09/10 09:04:10 DC_AUTHENTICATE: session mrg-qe-04:27794:1252587550:3 NOT FOUND...
09/10 09:04:10 Unable to get daemon information because no subsystem specified
09/10 09:04:11 ERROR: receiving new UDP message but found a long message still waiting to be closed (consumed=0). Closing it now.
09/10 09:04:11 StartdAd : Inserting ** "< slot2.eng.brq.redhat.com , 10.34.33.58 >"
09/10 09:04:11 stats: Inserting new hashent for 'Start':'slot2.eng.brq.redhat.com':'10.34.33.58'
09/10 09:04:11 StartdPvtAd : Inserting ** "< slot2.eng.brq.redhat.com , 10.34.33.58 >"
09/10 09:04:11 stats: Inserting new hashent for 'StartdPvt':'slot2.eng.brq.redhat.com':'10.34.33.58'
Stack dump for process 695 at timestamp 1252587852 (9 frames)
condor_collector(dprintf_dump_stack+0xd0)[0x8120663]
condor_collector[0x812081e]
[0x9e8420]
condor_collector(_ZN15CollectorDaemon15sendCollectorAdEv+0x2db)[0x80db747]
condor_collector(_ZN12TimerManager7TimeoutEv+0x28f)[0x811e11f]
condor_collector(_ZN10DaemonCore6DriverEv+0x79e)[0x8101146]
condor_collector(main+0x1814)[0x81181fc]
/lib/libc.so.6(__libc_start_main+0xdc)[0x7bfe8c]
condor_collector[0x80c5741]

MasterLog:
09/10 09:02:59 attempt to connect to <10.34.33.57:9618> failed: Connection refused (connect errno = 111).
09/10 09:02:59 ERROR: SECMAN:2004:Failed to create security session to <10.34.33.57:9618> with TCP.|SECMAN:2003:TCP connection to <10.34.33.57:9618> failed.
09/10 09:02:59 Failed to start non-blocking update to <10.34.33.57:9618>.
09/10 09:03:09 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 695
09/10 09:04:12 The COLLECTOR (pid 695) died due to signal 11 (Segmentation fault)
09/10 09:04:12 Sending obituary for "/usr/sbin/condor_collector"
09/10 09:04:12 restarting /usr/sbin/condor_collector in 11 seconds
09/10 09:04:12 attempt to connect to <10.34.33.57:9618> failed: Connection refused (connect errno = 111).
09/10 09:04:12 ERROR: SECMAN:2004:Failed to create security session to <10.34.33.57:9618> with TCP.|SECMAN:2003:TCP connection to <10.34.33.57:9618> failed.
09/10 09:04:12 Failed to start non-blocking update to <10.34.33.57:9618>.
09/10 09:04:23 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 2024
09/10 09:05:26 The COLLECTOR (pid 2024) died due to signal 11 (Segmentation fault)
09/10 09:05:26 Sending obituary for "/usr/sbin/condor_collector"
09/10 09:05:26 restarting /usr/sbin/condor_collector in 13 seconds

I've set CREATE_CORE_FILES=TRUE, but there aren't any core files in the condor log directory.

The core should be fixed in: condor-7.4.0-0.4

Tested on condor-7.4.0-0.5 on RHEL 5.4 i386/x86_64 and on condor-7.4.0-0.4 on RHEL 4.8 i386/x86_64 and it works --> VERIFIED

Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
please see bug summary.

Release note updated.
Diffed Contents:
@@ -1 +1,8 @@
-please see bug summary.
+Grid bug fix
+
+C:
+C: Running 'condor_status -any' does not show any collectors
+F:
+R:
+
+MORE INFORMATION REQUIRED FOR RELNOTE

condor_status -any now lists the collector

Thanks Rob. Still looking for Cause and Fix information. LKB

Release note updated.
Diffed Contents:
@@ -3,6 +3,6 @@
 C:
 C: Running 'condor_status -any' does not show any collectors
 F:
-R:
+R: condor_status -any now lists the collector
 
 MORE INFORMATION REQUIRED FOR RELNOTE

C: The collector would not include itself in queries for all daemon adtype
C: condor_status -any would not list a collector
F: The collector daemon now registers with itself and is included in queries for collector and any adtypes
R: condor_status -any now lists the collector

Release note updated.
Diffed Contents:
@@ -1,8 +1,10 @@
 Grid bug fix
 
-C:
-C: Running 'condor_status -any' does not show any collectors
-F:
-R: condor_status -any now lists the collector
+C: The collector would not include itself in queries for all daemon adtype
+C: condor_status -any would not list a collector
+F: The collector daemon now registers with itself and is included in queries
+for collector and any adtypes
+R: condor_status -any now lists the collector
 
-MORE INFORMATION REQUIRED FOR RELNOTE
+The collector would not include itself in queries for all daemon adtype, resulting in the query not returning a collector. The collector daemon now registers with itself and is included in queries
+for collector and any adtypes.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHEA-2009-1633.html
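
The verification step discussed in the comments (checking whether "condor_status -any" lists a collector) could be scripted roughly as follows. This is only a sketch, not part of the fix: it assumes the command prints a whitespace-separated table whose first column is the ad type (MyType), and the helper names `count_ad_types` and `has_collector`, as well as the sample output with its host and pool names, are hypothetical and made up for illustration.

```python
from collections import Counter


def count_ad_types(status_output: str) -> Counter:
    """Count the first column (assumed to be MyType) of every data row."""
    counts = Counter()
    for line in status_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "MyType":
            continue
        counts[fields[0]] += 1
    return counts


def has_collector(status_output: str) -> bool:
    """Return True if at least one row has the ad type Collector."""
    return count_ad_types(status_output)["Collector"] > 0


# Hypothetical sample output; the layout is an assumption and the
# host and pool names are invented for this example.
SAMPLE = """\
MyType             TargetType         Name
Collector          None               My Pool - host1.example.com
Scheduler          None               host1.example.com
DaemonMaster       None               host1.example.com
Negotiator         None               host1.example.com
Machine            Job                slot1@host1.example.com
Machine            Job                slot2@host1.example.com
"""

if __name__ == "__main__":
    print(dict(count_ad_types(SAMPLE)))
    print("collector listed:", has_collector(SAMPLE))
```

On a live pool, the text could instead come from `subprocess.run(["condor_status", "-any"], capture_output=True, text=True).stdout`. As noted in the comments, the collector ad may only appear after the update interval has elapsed, so a negative result shortly after startup is not conclusive.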