Bug 602766
Summary: | condor_triggerd: re-enable absent node feature | |||
---|---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Matthew Farrellee <matt> | |
Component: | condor | Assignee: | Robert Rati <rrati> | |
Status: | CLOSED ERRATA | QA Contact: | Tomas Rusnak <trusnak> | |
Severity: | medium | Docs Contact: | ||
Priority: | low | |||
Version: | 1.2 | CC: | iboverma, mhusnain, mkudlej, trusnak | |
Target Milestone: | 2.0 | Keywords: | FutureFeature | |
Target Release: | --- | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | condor-7.5.6-0.2 | Doc Type: | Enhancement | |
Doc Text: |
C: Added the ability to detect nodes that are expected to be in the pool but aren't found (absent nodes)
C: Absent nodes were not detected
C: The condor_triggerd can detect absent nodes if ENABLE_ABSENT_NODES_DETECTION is set to TRUE
R: If absent node detection is enabled, the condor_triggerd will raise an event for each node configured in wallaby for which a master qmf object is not detected
Release Note Entry:
Previously, _triggerd's C++ Console interface in Condor could not detect and report absent nodes because ENABLE_ABSENT_NODES_DETECTION was set to FALSE by default. ENABLE_ABSENT_NODES_DETECTION is now set to TRUE by default in Condor, which allows _triggerd to raise an event for each node in wallaby that does not have a corresponding master qmf object.
|
Story Points: | --- | |
Clone Of: | ||||
: | 705325 | Environment: | |
Last Closed: | 2011-06-23 15:41:24 UTC | Type: | --- | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 705343, 705722 | |||
Bug Blocks: | 693778, 705325 |
Description
Matthew Farrellee
2010-06-10 17:30:36 UTC
Fixed upstream. A configuration store (wallaby) needs to be contactable, so the feature is controlled by setting ENABLE_ABSENT_NODES_DETECTION, which defaults to false in condor.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
C: The MRG Grid 2.0 added the ability to detect node expected to be in the pool but aren't found (absent nodes)
C: Absent nodes were not detected
C: The condor_triggerd can detect absent nodes if ENABLE_ABSENT_NODES_DETECTION is set to TRUE
R: If absent node detection is enabled, the condor_triggerd will raise an event for each node configured in wallaby for which a master qmf object is not detected

Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1,4 +1,4 @@
-C: The MRG Grid 2.0 added the ability to detect node expected to be in the pool but aren't found (absent nodes)
+C: Added the ability to detect node expected to be in the pool but aren't found (absent nodes)
 C: Absent nodes were not detected
 C: The condor_triggerd can detect absent nodes if ENABLE_ABSENT_NODES_DETECTION is set to TRUE
 R: If absent node detection is enabled, the condor_triggerd will raise an event for each node configured in wallaby for which a master qmf object is not detected

New bug created for RHEL6 based on this one, due to a dependent issue with QMF: bz705325

Retested on RHEL5/x86_64,x86:
ruby-wallaby-0.10.5-4.el5
condor-wallaby-tools-4.0-6.el5
qpid-cpp-server-devel-0.10-6.el5
qpid-qmf-devel-0.10-6.el5
wallaby-utils-0.10.5-4.el5
qpid-qmf-0.10-6.el5
qpid-cpp-client-devel-docs-0.10-6.el5
qpid-tools-0.10-4.el5
python-condorutils-1.5-3.el5
ruby-qpid-qmf-0.10-6.el5
qpid-cpp-client-0.10-6.el5
qpid-cpp-server-cluster-0.10-6.el5
qpid-cpp-server-store-0.10-6.el5
qpid-java-common-0.10-4.el5
condor-7.6.1-0.4.el5
python-wallabyclient-4.0-6.el5
condor-wallaby-base-db-1.12-1.el5
python-qpid-qmf-0.10-6.el5
qpid-cpp-client-ssl-0.10-6.el5
qpid-cpp-server-xml-0.10-6.el5
condor-classads-7.6.1-0.4.el5
qpid-java-client-0.10-4.el5
condor-wallaby-client-4.0-6.el5
qpid-cpp-server-ssl-0.10-6.el5
qpid-java-example-0.10-4.el5
wallaby-0.10.5-4.el5
python-qpid-0.10-1.el5
qpid-cpp-server-0.10-6.el5
qpid-cpp-client-devel-0.10-6.el5
condor-qmf-7.6.1-0.4.el5

Config:
CREATE_CORE_FILES=True
ABORT_ON_EXCEPTION=True
QMF_BROKER_HOST=localhost
ALL_DEBUG=D_FULLDEBUG
CONFIGD_ARGS = -d
ALLOW_WRITE = *
ALLOW_READ = *
ALLOW_NEGOTIATOR = *
ALLOW_ADMINISTRATOR_READ = *
STARTD_CRON_NAME = TRIGGER_DATA
STARTD_CRON_AUTOPUBLISH = If_Changed
TRIGGER_DATA_JOBLIST = GetData
TRIGGER_DATA_GETDATA_PREFIX = Triggerd
TRIGGER_DATA_GETDATA_EXECUTABLE = $(BIN)/get_trigger_data
TRIGGER_DATA_GETDATA_PERIOD = 5m
TRIGGER_DATA_GETDATA_RECONFIG = FALSE
DAEMON_LIST = $(DAEMON_LIST), TRIGGERD
ENABLE_ABSENT_NODES_DETECTION=True
DC_DAEMON_LIST = $(DAEMON_LIST)
QMF_BROKER_AUTH_MECH = ANONYMOUS

qpid: list
Summary of Objects by Type:
Package          Class                 Active  Deleted
========================================================
com.redhat.grid  condortriggerservice  1       0
com.redhat.grid  master                1       0
com.redhat.grid  negotiator            1       0
com.redhat.grid  collector             1       0

# condor_trigger_config -i `hostname`
Connecting to broker 'hostname'...
Initializing, adding default triggers...
Adding trigger 'High CPU Usage'...
Adding trigger 'Low Free Mem'...
Adding trigger 'Low Free Disk Space (/)'...
Adding trigger 'Busy and Swapping'...
Adding trigger 'Busy but Idle'...
Adding trigger 'Idle for long time'...
Adding trigger 'Logs with ERROR entries'...
Adding trigger 'Logs with error entries'...
Adding trigger 'Logs with DENIED entries'...
Adding trigger 'Logs with denied entries'...
Adding trigger 'Logs with WARNING entries'...
Adding trigger 'Logs with warning entries'...
Adding trigger 'dprintf Logs'...
Adding trigger 'Logs with stack dumps'...
Adding trigger 'Core Files'...

TriggerLog:
05/17/11 14:49:24 Triggerd::AddTriggerToCollection called
05/17/11 14:49:24 Triggerd::AddTriggerToCollection exited with return value 0
05/17/11 14:49:24 Triggerd::config called
05/17/11 14:49:24 Triggerd::SetInterval called
05/17/11 14:49:24 Triggerd: Registered PerformQueries() to evaluate triggers every 10 seconds
05/17/11 14:49:24 Updating collector every 300 seconds
05/17/11 14:49:24 Will use UDP to update collector rhel5_64.mrg-qe-12.lab.eng.brq.redhat.com <IP:9618>
05/17/11 14:49:24 DaemonCore: in SendAliveToParent()
05/17/11 14:49:24 Initialized the following authorization table:
05/17/11 14:49:24 Authorizations yet to be resolved:
05/17/11 14:49:24 allow ADMINISTRATOR: */IP */IP */IP */hostname */hostname
05/17/11 14:49:24 allow OWNER: */IP */IP */IP */IP */hostname */hostname */hostname
05/17/11 14:49:24 Completed DC_CHILDALIVE to daemon at <IP:55842>
05/17/11 14:49:24 DaemonCore: Leaving SendAliveToParent() - success
05/17/11 14:49:24 Triggerd::UpdateCollector called
05/17/11 14:49:24 Trying to update collector <IP:9618>
05/17/11 14:49:24 Attempting to send update via UDP to collector hostname <IP:9618>
05/17/11 14:49:34 Triggerd: Evaluating 15 triggers
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Parsing trigger text '$(Machine) has $(TriggerdCondorLogCapitalErrorCount) ERROR messages in the following log files: $(TriggerdCondorLogCapitalError)'
05/17/11 14:49:34 Adding text string prior to variable substitution to event text
Stack dump for process 12272 at timestamp 1305636574 (10 frames)
condor_triggerd(dprintf_dump_stack+0x56)[0x529986]
condor_triggerd[0x51f662]
/lib64/libpthread.so.0[0x353020eb10]
condor_triggerd(_ZN3com6redhat4grid8Triggerd8RemoveWSEPKc+0xc)[0x46904c]
condor_triggerd(_ZN3com6redhat4grid8Triggerd14PerformQueriesEv+0x398)[0x46b608]
condor_triggerd(_ZN12TimerManager7TimeoutEv+0x155)[0x49abb5]
condor_triggerd(_ZN10DaemonCore6DriverEv+0x248)[0x4853b8]
condor_triggerd(main+0xe57)[0x4993a7]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x352f61d994]
condor_triggerd[0x464fe9]

No core file was generated.

# ps ax | grep condor
 7433 pts/0 S+ 0:00 grep condor
22730 ? Ssl 0:03 condor_master -pidfile /var/run/condor/condor_master.pid
22734 ? Ssl 0:01 condor_collector -f
22737 ? Ssl 0:00 condor_negotiator -f
22738 ? Ssl 0:00 condor_schedd -f
22739 ? Ssl 0:00 condor_startd -f
22741 ? S 0:00 condor_procd -A /var/run/condor/procd_pipe.SCHEDD -R 10000000 -S 60 -C 64

condor_triggerd is down - MasterLog:
05/17/11 15:20:28 DaemonCore: No more children processes to reap.
05/17/11 15:20:28 The TRIGGERD (pid 20398) died due to signal 11 (Segmentation fault)

New bugzilla created for this error and added as a blocker - bz705343

Retested over current packages on RHEL5/x86,x86_64:
ruby-wallaby-0.10.5-4.el5
condor-wallaby-tools-4.0-6.el5
qpid-java-common-0.10-6.el5
qpid-qmf-devel-0.10-6.el5
wallaby-utils-0.10.5-4.el5
qpid-qmf-0.10-6.el5
qpid-cpp-client-ssl-0.10-7.el5
qpid-cpp-server-cluster-0.10-7.el5
python-condorutils-1.5-3.el5
ruby-qpid-qmf-0.10-6.el5
qpid-cpp-client-0.10-7.el5
condor-7.6.1-0.5.el5
qpid-cpp-server-ssl-0.10-7.el5
qpid-java-client-0.10-6.el5
qpid-java-example-0.10-6.el5
python-wallabyclient-4.0-6.el5
condor-wallaby-base-db-1.12-1.el5
python-qpid-qmf-0.10-6.el5
qpid-cpp-server-0.10-7.el5
qpid-cpp-client-devel-0.10-7.el5
qpid-cpp-server-store-0.10-7.el5
qpid-cpp-server-devel-0.10-7.el5
qpid-tools-0.10-5.el5
condor-wallaby-client-4.0-6.el5
condor-classads-7.6.1-0.5.el5
qpid-cpp-server-xml-0.10-7.el5
wallaby-0.10.5-4.el5
python-qpid-0.10-1.el5
condor-qmf-7.6.1-0.5.el5
# kill -9 `pidof condor_master`
# tail -f /var/log/condor/TriggerLog | grep -i missing
05/18/11 12:38:50 Triggerd: Found 1 missing nodes
05/18/11 12:38:50 Triggerd: Raised event with text 'hostname is missing from the pool'
>>> VERIFIED
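
The grep check above can also be scripted. What follows is only a sketch, assuming the TriggerLog messages keep exactly the wording shown in this comment; the regular expressions and the function name summarize_missing are illustrative and not part of condor. It collects the counts from "Found N missing nodes" lines and the node names from the raised events.

```python
import re

# Patterns modeled on the two TriggerLog lines quoted in this comment.
# Assumption: other condor versions may word these messages differently.
COUNT_RE = re.compile(r"Triggerd: Found (\d+) missing nodes")
EVENT_RE = re.compile(
    r"Triggerd: Raised event with text '(.+) is missing from the pool'"
)


def summarize_missing(lines):
    """Return (counts, nodes) extracted from an iterable of TriggerLog lines.

    counts: one integer per "Found N missing nodes" line, in log order.
    nodes:  node names taken from "is missing from the pool" events.
    """
    counts = []
    nodes = []
    for line in lines:
        match = COUNT_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
            continue
        match = EVENT_RE.search(line)
        if match:
            nodes.append(match.group(1))
    return counts, nodes


if __name__ == "__main__":
    # Sample input copied from the verification output above.
    sample = [
        "05/18/11 12:38:50 Triggerd: Found 1 missing nodes",
        "05/18/11 12:38:50 Triggerd: Raised event with text "
        "'hostname is missing from the pool'",
    ]
    print(summarize_missing(sample))
```

To run it against a live log, pass an open file object for /var/log/condor/TriggerLog (the path used in the tail command above) instead of the sample list.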
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1,4 +1,8 @@
 C: Added the ability to detect node expected to be in the pool but aren't found (absent nodes)
 C: Absent nodes were not detected
 C: The condor_triggerd can detect absent nodes if ENABLE_ABSENT_NODES_DETECTION is set to TRUE
-R: If absent node detection is enabled, the condor_triggerd will raise an event for each node configured in wallaby for which a master qmf object is not detected
+R: If absent node detection is enabled, the condor_triggerd will raise an event for each node configured in wallaby for which a master qmf object is not detected
+
+Release Note Entry:
+
+Previously, _triggerd's C++ Console interface in Condor could not detect and report absent nodes because ENABLE_ABSENT_NODES_DETECTION was set to FALSE as a default. The ENABLE_ABSENT_NODES_DETECTION is now set to TRUE as a default in Condor, which allows _triggerd to raise an event for each node in wallaby that does not have a corresponding master qmf object.

Technical note can be viewed in the release notes for 2.0 at the documentation stage here:
http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/2.0/html-single/MRG_Release_Notes/index.html#tabl-MRG_Release_Notes-GRID_Update_Notes-RHM_Known_Issues

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHEA-2011-0889.html