Bug 698782

Summary: triggerd excepts if bad QMF_BROKER_HOST
Product: Red Hat Enterprise MRG Reporter: Robert Rati <rrati>
Component: condor-qmf Assignee: Robert Rati <rrati>
Status: CLOSED ERRATA QA Contact: Tomas Rusnak <trusnak>
Severity: high Docs Contact:
Priority: high    
Version: Development CC: iboverma, matt, trusnak
Target Milestone: 2.0   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: condor-7.6.1-0.3 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-06-27 15:32:29 UTC Type: ---

Description Robert Rati 2011-04-21 18:38:42 UTC
Description of problem:
If absent nodes detection is enabled (ENABLE_ABSENT_NODES_DETECTION = TRUE), the triggerd will EXCEPT (abort) if QMF_BROKER_HOST isn't set to a reachable broker host.

How reproducible:
100%

Comment 1 Robert Rati 2011-04-21 18:42:00 UTC
Reproduce with the following entries, in addition to the other normal triggerd config:
QMF_BROKER_HOST=host1
ENABLE_ABSENT_NODES_DETECTION = TRUE

Comment 2 Robert Rati 2011-04-21 20:52:05 UTC
The setup of the qpid/qmfv2 connections didn't handle exceptions, which would be thrown if the broker wasn't reachable.

Fixed upstream on V7_6-branch

Comment 3 Tomas Rusnak 2011-05-30 12:00:59 UTC
Reproduced on:

$CondorVersion: 7.6.0 Mar 30 2011 BuildID: RH-7.6.0-0.4.el5 PRE-RELEASE-GRID $
$CondorPlatform: X86_64-Redhat_5.6 $

Config:
QMF_BROKER_HOST=host1
ALL_DEBUG=D_FULLDEBUG

STARTD_CRON_NAME = TRIGGER_DATA
STARTD_CRON_AUTOPUBLISH = If_Changed
TRIGGER_DATA_JOBLIST = GetData
TRIGGER_DATA_GETDATA_PREFIX = Triggerd
TRIGGER_DATA_GETDATA_EXECUTABLE = $(BIN)/get_trigger_data
TRIGGER_DATA_GETDATA_PERIOD = 5m
TRIGGER_DATA_GETDATA_RECONFIG = FALSE

DAEMON_LIST = $(DAEMON_LIST), TRIGGERD
ENABLE_ABSENT_NODES_DETECTION=True
DC_DAEMON_LIST = $(DC_DAEMON_LIST), TRIGGERD

MasterLog:
05/30/11 14:59:16 DaemonCore: No more children processes to reap.
05/30/11 14:59:16 The TRIGGERD (pid 8464) died due to signal 6 (Aborted)
05/30/11 14:59:16 ProcAPI::buildFamily failed: parent 8464 not found on system.
05/30/11 14:59:16 Sending obituary for "/usr/sbin/condor_triggerd"
05/30/11 14:59:16 Forking Mailer process...
05/30/11 14:59:16 restarting /usr/sbin/condor_triggerd in 11 seconds

Comment 4 Tomas Rusnak 2011-05-30 12:15:41 UTC
Retested on all supported platforms (RHEL5, RHEL6; x86, x86_64) with:

condor-7.6.1-0.6

TriggerLog:
05/30/11 15:06:35 main_init() called
05/30/11 15:06:35 Triggerd::Triggerd called
05/30/11 15:06:35 Triggerd::init called
05/30/11 15:06:36 Triggerd Error: Failed to contact AMQP broker on host 'host1'.  Absent nodes detection disabled
05/30/11 15:06:36 Triggerd::config called
05/30/11 15:06:36 Triggerd::SetInterval called
05/30/11 15:06:36 Triggerd: Registered PerformQueries() to evaluate triggers every 10 seconds

MasterLog:
05/30/11 15:06:35 ::RealStart; TRIGGERD on_hold=0
05/30/11 15:06:35 Create_Process: using fast clone() to create child process.
05/30/11 15:06:35 SharedPortEndpoint: Inside destructor.
05/30/11 15:06:35 Started DaemonCore process "/usr/sbin/condor_triggerd -f", pid and pgroup = 17291

No crash appears in the logs; the triggerd reports the unreachable broker and continues with absent nodes detection disabled.

>>> VERIFIED