Bug 625607 - condor_configd (incorrectly) concludes there is no startd running when startd(s) given nonstandard names
Status: CLOSED NOTABUG
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: wallaby-utils
Version: 1.3
Hardware: All
OS: All
Priority: low
Severity: medium
Target Milestone: 2.1.1
Assigned To: Robert Rati
QA Contact: MRG Quality Engineering
Reported: 2010-08-19 18:22 EDT by Erik Erlandson
Modified: 2011-12-08 14:59 EST (History)
Doc Type: Bug Fix
Last Closed: 2011-11-21 20:15:41 EST
Description Erik Erlandson 2010-08-19 18:22:20 EDT
Description of problem:

In case of configurations where startds are run under nonstandard daemon-list names (i.e., not STARTD), condor_configd will incorrectly conclude no startd is running, as it searches for "STARTD" in DAEMON_LIST.

For example, if I have a machine configured with:

DAEMON_LIST = MASTER,STARTD_ST1,STARTD_ST2,STARTD_ST3 ...

If I attempt to activate a configuration from the store, I see output like this in the configd log:

08/19 17:58:30 INFO: Retrieving configuration version "1282255107129214" from the store
08/19 17:58:42 DEBUG: Retrieved configuration from the store
08/19 17:58:42 DEBUG: Daemons to restart: [u'startd']
08/19 17:58:42 DEBUG: Daemons to reconfig: []
08/19 17:58:42 DEBUG: Not sending "condor_restart" to subsystem "startd" since it is not currently running



Steps to Reproduce:
1. configure a condor node where there is a startd running, but named something nonstandard in DAEMON_LIST (e.g.  STARTD_ST1, or some such)
2. make a modification to a parameter requiring a restart (or reconfig?) for that condor node
3. activate the configuration (condor_configure_pool --activate), while watching the log output on ConfigLog for the condor node
  
Actual results:

condor_configd will claim there is no startd running, and not restart the startds (see above).

Expected results:

condor_configd should restart any startds that are running, regardless of the names they are given in DAEMON_LIST.


Additional info:
see line 428 of condor_configd:
(retval, daemons, err) = run_cmd('condor_config_val -master DAEMON_LIST')

And also method act_upon_subsys_list()
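
A minimal sketch of the failure mode described above, not the actual condor_configd source. It assumes the running check is an exact name match against the parsed DAEMON_LIST value; the helper names parse_daemon_list and is_running_exact are hypothetical.

```python
def parse_daemon_list(raw):
    """Split a DAEMON_LIST value into upper-cased daemon names."""
    return [name.strip().upper() for name in raw.split(",") if name.strip()]


def is_running_exact(subsys, daemon_list):
    """Assumed check: the subsystem counts as running only if its exact
    name appears in DAEMON_LIST."""
    return subsys.upper() in daemon_list


daemons = parse_daemon_list("MASTER,STARTD_ST1,STARTD_ST2,STARTD_ST3")

# "startd" is not matched, although three startds are configured.
print(is_running_exact("startd", daemons))
# "master" is matched because its name is the standard one.
print(is_running_exact("master", daemons))
```

Under this assumption the store's "startd" subsystem never matches STARTD_ST1 and friends, which is consistent with the "not currently running" message in the log.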
Comment 1 Erik Erlandson 2010-08-25 12:04:50 EDT
Proposal for fix:

define a config variable:

<subsys>_WALLABY_EQUIV = <equiv1>, <equiv2> ...

for example:

STARTD_WALLABY_EQUIV = STARTD_ST1, STARTD_ST2, ...  STARTD_ST90

Update the configd script to check for these variables -- if one is defined, then replace <subsys> with <equiv1>, <equiv2> ... as the parameter to condor_restart (or reconfig).
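
A rough sketch of how the lookup in this proposal could work, assuming the condor configuration is available as a plain dict of variable names to string values. The function name expand_subsystem and the dict are hypothetical and are not part of condor_configd.

```python
def expand_subsystem(subsys, config):
    """Return the daemon names to act on for a wallaby subsystem.

    If <SUBSYS>_WALLABY_EQUIV is defined and non-empty, its comma-separated
    entries replace the subsystem name; otherwise the name is returned as-is.
    """
    key = "%s_WALLABY_EQUIV" % subsys.upper()
    raw = config.get(key, "")
    equivs = [name.strip() for name in raw.split(",") if name.strip()]
    return equivs if equivs else [subsys]


# Hypothetical configuration with two equivalents for the startd.
config = {"STARTD_WALLABY_EQUIV": "STARTD_ST1, STARTD_ST2"}

print(expand_subsystem("startd", config))
print(expand_subsystem("schedd", config))
```

Each returned name would then be passed to condor_restart (or reconfig) in place of the original subsystem.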
Comment 2 Will Benton 2010-08-26 11:12:48 EDT
Another idea:  for params that are of the form X.Y, assume that X is a subsystem and that X.Y has the same restart/reconfigure behavior as (unqualified) Y. Then derive subsystems for qualified parameters implicitly.
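
One way this idea might be sketched, assuming a lookup table that maps each unqualified parameter name to "restart" or "reconfig". The function names, the table, and the parameter names below are all hypothetical placeholders.

```python
def split_qualified(param):
    """Split 'X.Y' into (X, Y); return (None, param) for unqualified names."""
    prefix, sep, name = param.partition(".")
    if sep and prefix and name:
        return prefix, name
    return None, param


def derive_actions(changed_params, action_for):
    """Map each qualified parameter X.Y to subsystem X, using the action
    recorded for the unqualified parameter Y. A restart takes precedence
    over a reconfig for the same subsystem."""
    actions = {}
    for param in changed_params:
        subsys, name = split_qualified(param)
        if subsys is None:
            continue  # unqualified parameters are handled elsewhere
        action = action_for.get(name)
        if action is None:
            continue  # unknown parameter: nothing to derive
        if action == "restart" or subsys not in actions:
            actions[subsys] = action
    return actions


# Placeholder parameter names, not real condor parameters.
action_for = {"PARAM_A": "restart", "PARAM_B": "reconfig"}
changed = ["STARTD_ST1.PARAM_B", "STARTD_ST1.PARAM_A", "STARTD_ST2.PARAM_B", "PARAM_A"]

print(derive_actions(changed, action_for))
```

With this approach no extra configuration variable is needed; the subsystem names come from the qualified parameters themselves.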
Comment 3 Robert Rati 2010-08-26 11:37:31 EDT
A subsystem in the wallaby store corresponds to a condor daemon to be restarted.  In the case where there are multiple similar daemons, like multiple startds, a subsystem in the store should correspond to a subsystem/daemon condor will be running and monitoring.  To copy a startd subsystem to a new subsystem called startd_1:

condor_configure_store -a -s startd_1

condor_configure_store -e -s startd,startd_1

Then copy all entries from startd to startd_1.
