Bug 464943
| Summary: | condor_master segfaults with incorrect HA Central Manager configuration | | |
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Robert Rati <rrati> |
| Component: | grid | Assignee: | Robert Rati <rrati> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | Kim van der Riet <kim.vdriet> |
| Severity: | high | Priority: | high |
| Version: | 1.0 | CC: | matt |
| Target Milestone: | 1.1 | Doc Type: | Bug Fix |
| Hardware: | All | OS: | Linux |
| Last Closed: | 2008-11-21 22:58:50 UTC | | |
Created attachment 318209
File to search a directory for condor configs
The condor_master log file has the following entries:

    10/1 11:31:03 ******************************************************
    10/1 11:31:03 ** condor_master (CONDOR_MASTER) STARTING UP
    10/1 11:31:03 ** /usr/sbin/condor_master
    10/1 11:31:03 ** $CondorVersion: 7.0.4 Sep 24 2008 BuildID: RH-7.0.4-4.fc8 $
    10/1 11:31:03 ** $CondorPlatform: I386-LINUX_F8 $
    10/1 11:31:03 ** PID = 13704
    10/1 11:31:03 ** Log last touched 10/1 11:28:52
    10/1 11:31:03 ******************************************************
    10/1 11:31:03 Using config source: /etc/condor/condor_config
    10/1 11:31:03 Using local config sources:
    10/1 11:31:03 /usr/libexec/condor/condor_generate_config.sh /var/lib/condor|
    10/1 11:31:03 DaemonCore: Command Socket at <10.16.65.100:49070>
    10/1 11:31:03 ERROR: Unable to find collector info in configuration file!!!

When the following entries are changed in the condor_ha_central_manager config:

    COLLECTOR_HOST = host1,host2,host3
    REPLICATION_LIST = host1:41450,host2:41450,host3:41450
    HAD_LIST = host1:51450,host2:51450,host3:51450

the segfault still occurs with the following errors in MasterLog:

    10/1 11:42:47 ******************************************************
    10/1 11:42:47 ** condor_master (CONDOR_MASTER) STARTING UP
    10/1 11:42:47 ** /usr/sbin/condor_master
    10/1 11:42:47 ** $CondorVersion: 7.0.4 Sep 24 2008 BuildID: RH-7.0.4-4.fc8 $
    10/1 11:42:47 ** $CondorPlatform: I386-LINUX_F8 $
    10/1 11:42:47 ** PID = 13750
    10/1 11:42:47 ** Log last touched 10/1 11:31:03
    10/1 11:42:47 ******************************************************
    10/1 11:42:47 Using config source: /etc/condor/condor_config
    10/1 11:42:47 Using local config sources:
    10/1 11:42:47 /usr/libexec/condor/condor_generate_config.sh /var/lib/condor|
    10/1 11:42:47 DaemonCore: Command Socket at <10.16.65.100:54759>
    10/1 11:42:47 IPVERIFY: unable to resolve IP address of host1
    10/1 11:42:47 IPVERIFY: unable to resolve IP address of host2
    10/1 11:42:47 IPVERIFY: unable to resolve IP address of host3
    10/1 11:42:47 IPVERIFY: unable to resolve IP address of host1
    10/1 11:42:47 IPVERIFY: unable to resolve IP address of host2
    10/1 11:42:47 IPVERIFY: unable to resolve IP address of host3
    10/1 11:42:47 IPVERIFY: unable to resolve IP address of host1
    10/1 11:42:47 IPVERIFY: unable to resolve IP address of host2
    10/1 11:42:47 IPVERIFY: unable to resolve IP address of host3

When doing this testing, there was also an HA schedd configured that pointed to a shared FS that was not writable by condor. This seems to be the cause of the segfault. The following config was used:

    MASTER_HA_LIST = $(MASTER_HA_LIST), SCHEDD
    SPOOL = /mnt/sharedfs
    HA_LOCK_URL = file:/mnt/sharedfs
    VALID_SPOOL_FILES = SCHEDD.lock
    SCHEDD_NAME = ha-schedd@

and /mnt/sharedfs was not writable by the condor user.

Bug seems to be fixed in the 7.1.x series. Starting an HA Schedd pointing to a shared file system that condor doesn't have write access to produces an error message about log file generation permission problems instead of seg faulting.
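Both conditions reported above (host names in the HA lists that do not resolve, and a SPOOL directory that is not writable) can be checked before condor_master is started. The following is a minimal sketch of such a pre-flight check, not part of Condor: the function names `parse_host_list`, `unresolved_hosts`, and `spool_is_writable` are hypothetical, and the write check applies to whichever user runs the script, so it would need to run as the condor user to be meaningful.

```python
import os
import socket


def parse_host_list(value):
    """Split a comma-separated host list and drop any ':port' suffix."""
    hosts = []
    for entry in value.split(","):
        entry = entry.strip()
        if entry:
            hosts.append(entry.split(":", 1)[0])
    return hosts


def unresolved_hosts(hosts, resolver=socket.gethostbyname):
    """Return the hosts that the resolver cannot map to an IP address."""
    missing = []
    for host in hosts:
        try:
            resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(host)
    return missing


def spool_is_writable(path):
    """True if path is an existing directory the current user can write to.

    Assumption: the script runs as the same user as the Condor daemons.
    """
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)
```

For the configuration in this report, `unresolved_hosts(parse_host_list("host1:51450,host2:51450,host3:51450"))` would be expected to list any placeholder names that DNS cannot resolve, and `spool_is_writable("/mnt/sharedfs")` would flag the permission problem when run as the condor user.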
Created attachment 318208
HA Central Manager configuration that causes condor_master to segfault

Description of problem:
The condor_master will segfault with an incorrect High-Availability Central Manager configuration. In my setup, I have LOCAL_CONFIG_FILE set to execute a command that will search a directory for condor_config files and include them in condor's configuration via:

    LOCAL_CONFIG_FILE = $(LOCAL_DIR)/tools/condor_generate_config.sh $(LOCAL_DIR)|

I've attached the script and the config file.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
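The trailing `|` on LOCAL_CONFIG_FILE tells Condor to run the command and read configuration from its standard output. The attached script is not reproduced in this report, so the following is only a rough sketch of the idea in Python rather than the reporter's shell script. The file name pattern `condor_config*` and the function names are assumptions made for illustration.

```python
import fnmatch
import os
import sys


def generate_config(directory, pattern="condor_config*"):
    """Concatenate every matching regular file in directory, in name order."""
    chunks = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not fnmatch.fnmatch(name, pattern) or not os.path.isfile(path):
            continue
        with open(path) as handle:
            # Prefix each fragment with a comment naming its source file.
            chunks.append("# from %s\n%s" % (path, handle.read().rstrip("\n")))
    return "\n".join(chunks) + ("\n" if chunks else "")


def main(argv):
    """Write the merged configuration to stdout; argv[1] is the directory."""
    sys.stdout.write(generate_config(argv[1]))
    return 0
```

Ending the file with `sys.exit(main(sys.argv))` would let it be used as the piped command. Emitting fragments in sorted name order keeps the result deterministic, which matters because later settings override earlier ones.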