Bug 625607 - condor_configd (incorrectly) concludes there is no startd running when startd(s) given nonstandard names
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: wallaby-utils
Version: 1.3
Hardware: All
OS: All
low
medium
Target Milestone: 2.1.1
Assignee: Robert Rati
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-08-19 22:22 UTC by Erik Erlandson
Modified: 2011-12-08 19:59 UTC
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-11-22 01:15:41 UTC
Target Upstream Version:
Embargoed:



Description Erik Erlandson 2010-08-19 22:22:20 UTC
Description of problem:

In configurations where startds run under nonstandard DAEMON_LIST names (i.e., not STARTD), condor_configd incorrectly concludes that no startd is running, because it searches DAEMON_LIST for the literal name "STARTD".

For example, if I have a machine configured with:

DAEMON_LIST = MASTER,STARTD_ST1,STARTD_ST2,STARTD_ST3 ...

If I then activate a configuration from the store, the configd log shows output like this:

08/19 17:58:30 INFO: Retrieving configuration version "1282255107129214" from the store
08/19 17:58:42 DEBUG: Retrieved configuration from the store
08/19 17:58:42 DEBUG: Daemons to restart: [u'startd']
08/19 17:58:42 DEBUG: Daemons to reconfig: []
08/19 17:58:42 DEBUG: Not sending "condor_restart" to subsystem "startd" since it is not currently running
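A minimal sketch of why the log above appears, assuming the configd splits DAEMON_LIST on commas/whitespace and then tests for the literal subsystem name. The helper names parse_daemon_list and is_running are hypothetical, not taken from condor_configd:

```python
def parse_daemon_list(value):
    """Split a DAEMON_LIST value such as 'MASTER, STARTD' into upper-case names."""
    # Assumption: entries are separated by commas and/or whitespace.
    return [name.upper() for name in value.replace(",", " ").split()]


def is_running(subsys, daemon_list):
    """Literal-name check: only an entry spelled exactly like the subsystem counts."""
    return subsys.upper() in daemon_list


daemons = parse_daemon_list("MASTER,STARTD_ST1,STARTD_ST2,STARTD_ST3")
subsys = "startd"
if not is_running(subsys, daemons):
    # STARTD_ST1..3 never equal "STARTD", so the restart is skipped.
    print('Not sending "condor_restart" to subsystem "%s" '
          "since it is not currently running" % subsys)
```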



Steps to Reproduce:
1. Configure a condor node with a startd running under a nonstandard name in DAEMON_LIST (e.g., STARTD_ST1).
2. Modify a parameter that requires a restart (or reconfig?) of that node.
3. Activate the configuration (condor_configure_pool --activate) while watching the ConfigLog output on that node.
  
Actual results:

condor_configd reports that no startd is running and does not restart the startds (see the log above).

Expected results:

condor_configd should restart every running startd, whatever its name in DAEMON_LIST.


Additional info:
See line 428 of condor_configd:
(retval, daemons, err) = run_cmd('condor_config_val -master DAEMON_LIST')

See also the method act_upon_subsys_list().
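One possible direction, sketched under assumptions: treat a DAEMON_LIST entry as an instance of a subsystem if it is the subsystem name itself or the subsystem name followed by an underscore suffix, compared case-insensitively. The helper daemons_for_subsys and the STARTD_<suffix> convention are hypothetical; the function takes the DAEMON_LIST string as input rather than calling run_cmd, so it runs without Condor installed.

```python
import re


def daemons_for_subsys(subsys, daemon_list_value):
    """Return DAEMON_LIST entries that look like instances of `subsys`.

    Heuristic (an assumption, not condor_configd behavior): an entry matches if
    it is exactly the subsystem name, or the name followed by '_<suffix>'.
    """
    pattern = re.compile(r"^%s(_\w+)?$" % re.escape(subsys), re.IGNORECASE)
    names = daemon_list_value.replace(",", " ").split()
    return [name for name in names if pattern.match(name)]


daemon_list = "MASTER,STARTD_ST1,STARTD_ST2,STARTD_ST3"
targets = daemons_for_subsys("startd", daemon_list)
print(targets)  # ['STARTD_ST1', 'STARTD_ST2', 'STARTD_ST3']
```

A prefix heuristic is cheap but fragile: nothing forces an extra startd's DAEMON_LIST name to begin with STARTD, so some instances could still be missed. That fragility motivates the explicit mapping proposed in Comment 1.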

Comment 1 Erik Erlandson 2010-08-25 16:04:50 UTC
Proposal for fix:

define a config variable:

<subsys>_WALLABY_EQUIV = <equiv1>, <equiv2> ...

for example:

STARTD_WALLABY_EQUIV = STARTD_ST1, STARTD_ST2, ...  STARTD_ST90

Update the configd script to check for these variables. If one is defined, pass <equiv1>, <equiv2> ... to condor_restart (or reconfig) in place of <subsys>.
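The expansion step of this proposal could look roughly like the following sketch. It assumes a plain dict stands in for condor_config_val lookups; the function name expand_subsys is hypothetical, and since the bug was closed NOTABUG, this variable was never part of condor_configd.

```python
def expand_subsys(subsys, config):
    """Expand a subsystem via the proposed <SUBSYS>_WALLABY_EQUIV variable.

    `config` is a dict standing in for condor_config_val lookups (assumption).
    If the variable is unset or empty, the subsystem itself is returned.
    """
    value = config.get("%s_WALLABY_EQUIV" % subsys.upper(), "")
    equivs = [name.strip() for name in value.split(",") if name.strip()]
    return equivs or [subsys]


config = {"STARTD_WALLABY_EQUIV": "STARTD_ST1, STARTD_ST2, STARTD_ST3"}
print(expand_subsys("startd", config))  # ['STARTD_ST1', 'STARTD_ST2', 'STARTD_ST3']
print(expand_subsys("schedd", config))  # ['schedd']
```

Falling back to the bare subsystem keeps the current behavior for nodes that never define the variable.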

Comment 2 Will Benton 2010-08-26 15:12:48 UTC
Another idea: for parameters of the form X.Y, assume that X is a subsystem and that X.Y has the same restart/reconfigure behavior as the unqualified Y. Subsystems for qualified parameters can then be derived implicitly.
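A rough sketch of this derivation, covering the restart case only (reconfig would use a parallel table). The restart_params table stands in for whatever metadata the store keeps about which parameters force a restart; that table, the NUM_CPUS entry in the example, and the function name affected_subsystems are all illustrative assumptions.

```python
def affected_subsystems(changed_params, restart_params):
    """Derive subsystems to restart from changed parameter names.

    `restart_params` maps an unqualified parameter name to the subsystems that
    must restart when it changes (hypothetical stand-in for store metadata).
    A qualified name X.Y affects subsystem X whenever unqualified Y needs a restart.
    """
    result = set()
    for param in changed_params:
        prefix, dot, base = param.partition(".")
        if dot:
            # Qualified X.Y: inherit Y's restart behavior, but target X.
            if restart_params.get(base.upper()):
                result.add(prefix.lower())
        else:
            result.update(restart_params.get(param.upper(), ()))
    return sorted(result)


restart_params = {"NUM_CPUS": ["startd"]}  # illustrative entry only
print(affected_subsystems(["STARTD_ST1.NUM_CPUS", "STARTD_ST2.NUM_CPUS"], restart_params))
# ['startd_st1', 'startd_st2']
```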

Comment 3 Robert Rati 2010-08-26 15:37:31 UTC
A subsystem in the wallaby store corresponds to a condor daemon to be restarted.  In the case where there are multiple similar daemons, like multiple startds, a subsystem in the store should correspond to a subsystem/daemon condor will be running and monitoring.  To copy a startd subsystem to a new subsystem called startd_1:

condor_configure_store -a -s startd_1

condor_configure_store -e -s startd,startd_1

Then copy all entries from startd to startd_1.
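Under this model, the configd's running check succeeds as long as each store subsystem name matches a DAEMON_LIST entry. The sketch below assumes the match is a case-insensitive exact comparison, as the lowercase "startd" in the log above suggests; the helper running_subsystems is hypothetical.

```python
def running_subsystems(subsystems, daemon_list_value):
    """Split store subsystems into those present in DAEMON_LIST and those absent.

    Assumes each store subsystem (e.g. 'startd_1') names exactly one daemon,
    so a case-insensitive exact match against DAEMON_LIST entries suffices.
    """
    running = {name.upper() for name in daemon_list_value.replace(",", " ").split()}
    present = [s for s in subsystems if s.upper() in running]
    absent = [s for s in subsystems if s.upper() not in running]
    return present, absent


present, absent = running_subsystems(["startd_1", "startd_2"], "MASTER, STARTD_1")
print(present)  # ['startd_1']
print(absent)   # ['startd_2']
```

The trade-off is administrative: no name guessing is needed, but the store's subsystem list must be kept in step with each node's DAEMON_LIST.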

