Description of problem:

When a new configuration is activated for a node, the new WallabyFeatures and WallabyGroups are not being populated

Version-Release number of selected component (if applicable):
condor-wallaby-tools-3.8-4

How reproducible:
100%

Steps to Reproduce:

# Begin with condor pool in a virgin, "node with empty wallaby config" state:
[eje@rorschach utscale]$ condor_configure_pool --load-snapshot grid_scale_2010/12/15_14:50:37_pretest
Snapshot loaded
[eje@rorschach utscale]$ condor_configure_pool --activate
Activating configuration. This may take a while, please be patient
Configuration activated
[eje@rorschach utscale]$ condor_configure_pool -v -l -n rorschach
Node "rorschach":
Last Check-in Time: Thu Dec 16 12:45:00 2010
Group Memberships:
  Internal Default Group
Features Applied:
Explicitly Set Parameters:
Configuration:
  WALLABY_CONFIG_VERSION = 1292528644056361

# start condor up, with no wallaby_node.config file
# Log output from configd startup:
12/16 12:44:48 INFO: Starting Up
12/16 12:44:49 INFO: Hostname is "rorschach"
12/16 12:44:49 DEBUG: "QMF_BROKER_PORT" is not defined. Using default (5672)
12/16 12:44:49 DEBUG: "QMF_BROKER_AUTH_MECHANISM" is not defined. Using defaults
12/16 12:44:49 DEBUG: Writing configuration file to "/usr/local/condor/local/wallaby_node.config"
12/16 12:44:49 DEBUG: Connected to broker "localhost.localdomain:5672"
12/16 12:44:49 DEBUG: Looking for the store agent
12/16 12:45:00 DEBUG: Found the store agent
12/16 12:45:00 DEBUG: Retrieved node object from store
12/16 12:45:00 DEBUG: Checking version of configuration
12/16 12:45:00 DEBUG: Performing a checkin with the store
12/16 12:45:01 DEBUG: Checked in with the store
12/16 12:45:01 INFO: Retrieving configuration version "1292528644056361" from the store
12/16 12:45:02 ERROR: Store: 'DAEMON_LIST'
12/16 12:45:02 WARNING: Failed to retrieve subsystem list. Configuration could break restart/reconfig functionality
12/16 12:45:02 INFO: Retrieved configuration from the store
12/16 12:45:02 DEBUG: Daemons to restart: []
12/16 12:45:02 DEBUG: Daemons to reconfig: []

[root@rorschach log]$ more /usr/local/condor/local/wallaby_node.config
WALLABY_CONFIG_VERSION = 1292528644056361
WallabyFeatures = ""
WallabyGroups = ""
MASTER_ATTRS = $(MASTER_ATTRS), WallabyFeatures, WallabyGroups
STARTD_ATTRS = $(STARTD_ATTRS), WallabyFeatures, WallabyGroups
QMF_BROKER_HOST = localhost.localdomain
QMF_CONFIGD = /usr/sbin/condor_configd
QMF_CONFIGD_ARGS = -d
QMF_CONFIGD_LOG = $(LOG)/ConfigLog
MAX_QMF_CONFIGD_LOG = 1000000
DAEMON_LIST = $(DAEMON_LIST), QMF_CONFIGD
QMF_CONFIGD_CHECK_INTERVAL = 600
ALLOW_ADMINISTRATOR = $(ALLOW_ADMINISTRATOR), $(FULL_HOSTNAME)
SEC_DEFAULT_AUTHENTICATION_METHODS = $(SEC_DEFAULT_AUTHENTICATION_METHODS), FS, NTLM, CLAIMTOBE
SEC_CLIENT_AUTHENTICATION_METHODS = $(SEC_CLIENT_AUTHENTICATION_METHODS), FS, NTLM, CLAIMTOBE
MASTER.SEC_ADMINISTRATOR_AUTHENTICATION_METHODS = $(MASTER.SEC_ADMINISTRATOR_AUTHENTICATION_METHODS), FS, NTLM, CLAIMTOBE
#WINDOWS_SOFTKILL = $(SBIN)\pidkill.bat
SHUTDOWN_FAST_TIMEOUT = 5
#QMF_CONFIGD_WIN_INTERVAL = 3
LOCAL_CONFIG_FILE = $(LOCAL_DIR)/wallaby_node.config
REQUIRE_LOCAL_CONFIG_FILE = FALSE

# Now load up test configuration and activate it:
[eje@rorschach utscale]$ condor_configure_pool --load-snapshot grid_scale_2010/12/15_15:04:17_micro_test
Snapshot loaded
[eje@rorschach utscale]$ condor_configure_pool --activate
Activating configuration. This may take a while, please be patient
Configuration activated
[eje@rorschach utscale]$ condor_configure_pool -v -l -n rorschach
Node "rorschach":
Last Check-in Time: Thu Dec 16 12:50:55 2010
Group Memberships:
  GridScaleTestMicro
  Internal Default Group
Features Applied:
  GridScaleTestMicro
Explicitly Set Parameters:
Configuration:
  GRID_SCALE_TEST_RESTART_TAG = 1292450676.8
  WALLABY_CONFIG_VERSION = 1292529034714247

# Here is log file from configd activation side:
12/16 12:50:35 DEBUG: Received a NodeUpdatedNotice
12/16 12:50:35 DEBUG: The event is for this node
12/16 12:50:36 DEBUG: Checking version of configuration
12/16 12:50:36 DEBUG: Performing a checkin with the store
12/16 12:50:36 DEBUG: Checked in with the store
12/16 12:50:36 INFO: Retrieving configuration version "1292529034714247" from the store
12/16 12:50:37 ERROR: Store: 'DAEMON_LIST'
12/16 12:50:37 WARNING: Failed to retrieve subsystem list. Configuration could break restart/reconfig functionality
12/16 12:50:37 INFO: Retrieved configuration from the store
12/16 12:50:37 DEBUG: Daemons to restart: [u'master']
12/16 12:50:37 DEBUG: Daemons to reconfig: []
12/16 12:50:38 DEBUG: Sending command "condor_restart" to subsystem "master"
12/16 12:50:38 DEBUG: Shutting down
12/16 12:50:38 DEBUG: Closing QMF connections
12/16 12:50:38 DEBUG: Lost connection to the configuration store
12/16 12:50:38 DEBUG: Closed QMF connections
12/16 12:50:38 DEBUG: Setting stop flag
12/16 12:50:38 DEBUG: Sent command "condor_restart" to subsystem "master"
12/16 12:50:38 INFO: Exiting
12/16 12:50:42 INFO: Starting Up
12/16 12:50:42 INFO: Hostname is "rorschach"
12/16 12:50:42 DEBUG: "QMF_BROKER_PORT" is not defined. Using default (5672)
12/16 12:50:42 DEBUG: "QMF_BROKER_AUTH_MECHANISM" is not defined. Using defaults
12/16 12:50:42 DEBUG: Writing configuration file to "/usr/local/condor/local/wallaby_node.config"
12/16 12:50:42 DEBUG: Connected to broker "localhost.localdomain:5672"
12/16 12:50:42 DEBUG: Looking for the store agent
12/16 12:50:50 DEBUG: Found the store agent
12/16 12:50:50 DEBUG: Retrieved node object from store
12/16 12:50:54 DEBUG: Checking version of configuration
12/16 12:50:55 DEBUG: Performing a checkin with the store
12/16 12:50:55 DEBUG: Checked in with the store
12/16 12:50:55 DEBUG: The system is already running configuration version "1292529034714247"

# Here is wallaby_node.config -- WallabyFeatures and WallabyGroups were not populated:
[root@rorschach log]$ more /usr/local/condor/local/wallaby_node.config
GRID_SCALE_TEST_RESTART_TAG = 1292450676.8
WALLABY_CONFIG_VERSION = 1292529034714247
WallabyFeatures = ""
WallabyGroups = ""
MASTER_ATTRS = $(MASTER_ATTRS), WallabyFeatures, WallabyGroups
STARTD_ATTRS = $(STARTD_ATTRS), WallabyFeatures, WallabyGroups
QMF_BROKER_HOST = localhost.localdomain
QMF_CONFIGD = /usr/sbin/condor_configd
QMF_CONFIGD_ARGS = -d
QMF_CONFIGD_LOG = $(LOG)/ConfigLog
MAX_QMF_CONFIGD_LOG = 1000000
DAEMON_LIST = $(DAEMON_LIST), QMF_CONFIGD
QMF_CONFIGD_CHECK_INTERVAL = 600
ALLOW_ADMINISTRATOR = $(ALLOW_ADMINISTRATOR), $(FULL_HOSTNAME)
SEC_DEFAULT_AUTHENTICATION_METHODS = $(SEC_DEFAULT_AUTHENTICATION_METHODS), FS, NTLM, CLAIMTOBE
SEC_CLIENT_AUTHENTICATION_METHODS = $(SEC_CLIENT_AUTHENTICATION_METHODS), FS, NTLM, CLAIMTOBE
MASTER.SEC_ADMINISTRATOR_AUTHENTICATION_METHODS = $(MASTER.SEC_ADMINISTRATOR_AUTHENTICATION_METHODS), FS, NTLM, CLAIMTOBE
#WINDOWS_SOFTKILL = $(SBIN)\pidkill.bat
SHUTDOWN_FAST_TIMEOUT = 5
#QMF_CONFIGD_WIN_INTERVAL = 3
LOCAL_CONFIG_FILE = $(LOCAL_DIR)/wallaby_node.config
REQUIRE_LOCAL_CONFIG_FILE = FALSE

Actual results:
WallabyFeatures and WallabyGroups are empty

Expected results:
Should see:
WallabyFeatures = "GridScaleTestMicro"
WallabyGroups = "GridScaleTestMicro"

Additional info:
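The actual-versus-expected comparison above can be checked mechanically. The following is only a rough sketch: read_wallaby_attrs is a hypothetical helper (not part of condor-wallaby-tools), and the sample text is a shortened copy of the wallaby_node.config contents from this report.

```python
def read_wallaby_attrs(config_text):
    """Collect WallabyFeatures / WallabyGroups from 'KEY = value' lines.

    Hypothetical helper for illustration only; it is not part of
    condor-wallaby-tools and ignores macro expansion entirely.
    """
    attrs = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in ("WallabyFeatures", "WallabyGroups"):
            # Drop surrounding whitespace and the double quotes.
            attrs[key] = value.strip().strip('"')
    return attrs


# Shortened copy of the file written after the second activation.
sample = (
    'GRID_SCALE_TEST_RESTART_TAG = 1292450676.8\n'
    'WALLABY_CONFIG_VERSION = 1292529034714247\n'
    'WallabyFeatures = ""\n'
    'WallabyGroups = ""\n'
)

expected = {
    "WallabyFeatures": "GridScaleTestMicro",
    "WallabyGroups": "GridScaleTestMicro",
}

observed = read_wallaby_attrs(sample)
mismatches = sorted(k for k in expected if observed.get(k) != expected[k])
print(mismatches)
```

With the file contents shown in this report, both attributes are reported as mismatches.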
Created attachment 469211
Dump of the wallaby store used during the scenario

Created attachment 469212
tarball of my config.d from scenario
I added some log output to WallabyHelpers.get_node_features(), and it looks like node.memberships is from the "current" config, as opposed to the config that is incoming.

# when I activate a config including group GridScaleTestMicro:
12/16 16:23:14 INFO: Retrieving configuration version "1292541793631614" from the store
12/16 16:23:15 INFO: in get_node_features
12/16 16:23:15 INFO: id_name= +++1d1676d34b812e185c99321e43602092
12/16 16:23:15 INFO: group_list= [u'+++1d1676d34b812e185c99321e43602092', '+++DEFAULT']
12/16 16:23:15 INFO: list= []
12/16 16:23:15 INFO: list= []
12/16 16:23:15 INFO: list= []

# next, when I activate a config that does *not* include group GridScaleTestMicro:
12/16 16:24:25 INFO: Retrieving configuration version "1292541863832308" from the store
12/16 16:24:25 INFO: in get_node_features
12/16 16:24:25 INFO: id_name= +++1d1676d34b812e185c99321e43602092
12/16 16:24:25 INFO: group_list= [u'+++1d1676d34b812e185c99321e43602092', u'GridScaleTestMicro', '+++DEFAULT']
12/16 16:24:25 INFO: list= []
12/16 16:24:25 INFO: list= []
12/16 16:24:26 INFO: list= []
12/16 16:24:26 INFO: list= []
The issue is that the groups and features are retrieved from an attribute on the node object, but get_config did not guarantee that the local copy of the node object was up to date. get_config now calls node.update before it accesses any attributes on the node object.
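To illustrate the stale-copy problem described above, here is a minimal, self-contained sketch. It is not the actual condor-wallaby code: FakeStore, NodeProxy, and the two get_groups_* functions are hypothetical stand-ins, and only the names node.update and node.memberships are taken from this report.

```python
class FakeStore(object):
    """Hypothetical stand-in for the remote configuration store."""

    def __init__(self):
        self._memberships = {}

    def set_memberships(self, node_name, groups):
        self._memberships[node_name] = list(groups)

    def fetch_memberships(self, node_name):
        return list(self._memberships.get(node_name, []))


class NodeProxy(object):
    """Hypothetical local copy of a node object.

    Its attributes are only as fresh as the last call to update().
    """

    def __init__(self, store, name):
        self._store = store
        self.name = name
        self.memberships = store.fetch_memberships(name)

    def update(self):
        # Re-read the attributes from the store.
        self.memberships = self._store.fetch_memberships(self.name)


def get_groups_stale(node):
    # Reads the cached attribute as-is (the behaviour before the fix).
    return list(node.memberships)


def get_groups_fresh(node):
    # Refreshes the local copy first (the behaviour after the fix).
    node.update()
    return list(node.memberships)


store = FakeStore()
node = NodeProxy(store, "rorschach")

# A new configuration is activated after the local copy was created.
store.set_memberships("rorschach", ["GridScaleTestMicro"])

stale = get_groups_stale(node)
fresh = get_groups_fresh(node)
print(stale)
print(fresh)
```

In this sketch the stale read still sees the memberships from before activation, which matches the one-config-behind group_list values in the log output of the earlier comment.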
Unable to reproduce on the previous version, because this is a new feature. Also unable to reproduce on devel versions of condor from 22.11. through 22.12., because those builds are not working correctly.

Works fine on version:
condor-wallaby-client-3.8-8

Tested on:
RHEL5 i386, x86_64 - passed

>>> VERIFIED