Description of problem:
In configurations where startds run under nonstandard DAEMON_LIST names (i.e., not STARTD), condor_configd incorrectly concludes that no startd is running, because it searches for "STARTD" in DAEMON_LIST.

For example, if I have a machine configured with:

DAEMON_LIST = MASTER,STARTD_ST1,STARTD_ST2,STARTD_ST3 ...

and I attempt to activate a config store, I see output like this in the configd log:

08/19 17:58:30 INFO: Retrieving configuration version "1282255107129214" from the store
08/19 17:58:42 DEBUG: Retrieved configuration from the store
08/19 17:58:42 DEBUG: Daemons to restart: [u'startd']
08/19 17:58:42 DEBUG: Daemons to reconfig: []
08/19 17:58:42 DEBUG: Not sending "condor_restart" to subsystem "startd" since it is not currently running

Steps to Reproduce:
1. Configure a condor node where a startd is running, but is named something nonstandard in DAEMON_LIST (e.g. STARTD_ST1, or some such).
2. Modify a parameter requiring a restart (or reconfig?) for that condor node.
3. Activate the configuration (condor_configure_pool --activate) while watching the log output in ConfigLog for the condor node.

Actual results:
condor_configd claims there is no startd running and does not restart the startds (see above).

Expected results:
condor_configd should restart any running startds.

Additional info:
See line 428 of condor_configd:

(retval, daemons, err) = run_cmd('condor_config_val -master DAEMON_LIST')

and also the method act_upon_subsys_list().
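To illustrate the failure mode described above, here is a minimal sketch. It assumes the check amounts to an exact-name membership test against the comma-separated DAEMON_LIST value; is_running() is a hypothetical stand-in, not the actual code in condor_configd, which may differ in detail.

```python
def is_running(subsys, daemon_list_value):
    """Hypothetical stand-in for the configd check: exact-name membership.

    daemon_list_value is the raw string that a DAEMON_LIST lookup would
    return, e.g. "MASTER,STARTD_ST1".
    """
    daemons = [d.strip().upper() for d in daemon_list_value.split(",") if d.strip()]
    return subsys.upper() in daemons


daemon_list = "MASTER,STARTD_ST1,STARTD_ST2,STARTD_ST3"

# "STARTD" never appears as an exact entry, so the startds are missed.
startd_found = is_running("startd", daemon_list)
# "MASTER" is listed under its standard name, so it is found.
master_found = is_running("master", daemon_list)
```

Under that assumption, any daemon listed under a nonstandard name is treated as not running, which matches the "not currently running" message in the log.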
Proposal for fix: define a config variable:

<subsys>_WALLABY_EQUIV = <equiv1>, <equiv2> ...

For example:

STARTD_WALLABY_EQUIV = STARTD_ST1, STARTD_ST2, ... STARTD_ST90

Update the configd script to check for these variables. If one is defined, replace <subsys> with <equiv1>, <equiv2> ... as the parameter to condor_restart (or reconfig).
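A rough sketch of how the proposed lookup might work, for discussion only. The expand_subsystem() helper and the plain dict standing in for the node's configuration are assumptions made for illustration; the real script would presumably obtain the value through condor_config_val.

```python
def expand_subsystem(subsys, config):
    """Return the daemon names to act on for a given subsystem.

    config is a hypothetical dict of parameter name -> raw string value.
    If <SUBSYS>_WALLABY_EQUIV is defined and non-empty, its entries
    replace the subsystem name; otherwise the name is used unchanged.
    """
    key = subsys.upper() + "_WALLABY_EQUIV"
    raw = config.get(key, "")
    equivs = [name.strip() for name in raw.split(",") if name.strip()]
    return equivs if equivs else [subsys]


config = {"STARTD_WALLABY_EQUIV": "STARTD_ST1, STARTD_ST2, STARTD_ST3"}

startd_targets = expand_subsystem("startd", config)
schedd_targets = expand_subsystem("schedd", config)
```

Falling back to the unmodified subsystem name when no equivalence variable is defined keeps the current behavior for standard configurations.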
Another idea: for params that are of the form X.Y, assume that X is a subsystem and that X.Y has the same restart/reconfigure behavior as (unqualified) Y. Then derive subsystems for qualified parameters implicitly.
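A minimal sketch of this alternative, assuming qualified names are split on the first "." only. The helper names, the restart_params set, and the choice of NUM_CPUS as a restart-requiring parameter are all hypothetical and for illustration only.

```python
def split_qualified(param):
    """Split 'X.Y' into ('X', 'Y'); return (None, param) if unqualified."""
    prefix, sep, name = param.partition(".")
    if sep and prefix and name:
        return prefix, name
    return None, param


def subsystems_to_restart(changed_params, restart_params):
    """Derive subsystems implicitly from qualified parameter names.

    restart_params is a hypothetical set of unqualified parameter names
    known to require a restart. X.Y inherits the behavior of Y, and X is
    taken to be the subsystem to restart.
    """
    subsystems = set()
    for param in changed_params:
        subsys, name = split_qualified(param)
        if subsys is not None and name in restart_params:
            subsystems.add(subsys)
    return sorted(subsystems)


changed = ["STARTD_ST1.NUM_CPUS", "STARTD_ST2.NUM_CPUS", "NUM_CPUS", "STARTD_ST3.OTHER"]
derived = subsystems_to_restart(changed, {"NUM_CPUS"})
```

Unqualified parameters are left to the existing subsystem mapping in this sketch; only qualified names contribute an implicit subsystem.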
A subsystem in the wallaby store corresponds to a condor daemon to be restarted. In the case where there are multiple similar daemons, like multiple startds, a subsystem in the store should correspond to a subsystem/daemon condor will be running and monitoring.

To copy a startd subsystem to a new subsystem called startd_1:

condor_configure_store -a -s startd_1
condor_configure_store -e -s startd,startd_1

Then copy all entries from startd to startd_1.