Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 464943

Summary: condor_master segfaults with incorrect HA Central Manager configuration
Product: Red Hat Enterprise MRG Reporter: Robert Rati <rrati>
Component: grid Assignee: Robert Rati <rrati>
Status: CLOSED NEXTRELEASE QA Contact: Kim van der Riet <kim.vdriet>
Severity: high Docs Contact:
Priority: high    
Version: 1.0 CC: matt
Target Milestone: 1.1   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-11-21 22:58:50 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                                                             Flags
HA Central Manager configuration that causes condor_master to segfault  none
File to search a directory for condor configs                           none

Description Robert Rati 2008-10-01 15:22:38 UTC
Created attachment 318208
HA Central Manager configuration that causes condor_master to segfault

Description of problem:
condor_master segfaults when given an incorrect High Availability (HA) Central Manager configuration. In my setup, LOCAL_CONFIG_FILE runs a script that searches a directory for condor_config files and includes them in Condor's configuration. The trailing "|" tells Condor to execute the command and read its output as configuration:
LOCAL_CONFIG_FILE = $(LOCAL_DIR)/tools/condor_generate_config.sh $(LOCAL_DIR)|

I've attached the script and the config file.
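The attached script is not reproduced here. To illustrate the mechanism only, the following is a hypothetical Python stand-in, not the attachment. It assumes that config fragments are files named condor_config* anywhere under the given directory and concatenates them to stdout in sorted path order, which Condor then reads because of the trailing "|". The function names are invented for this sketch.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for the attached condor_generate_config.sh.

Assumption: every regular file under the directory whose name starts
with "condor_config" is a config fragment.
"""
import os
import sys


def find_config_files(root):
    """Return sorted paths of all files under root named condor_config*."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("condor_config"):
                matches.append(os.path.join(dirpath, name))
    # Sorting keeps the generated configuration deterministic.
    return sorted(matches)


def generate_config(root):
    """Concatenate all matching fragments into one configuration string."""
    parts = []
    for path in find_config_files(root):
        with open(path) as fh:
            # Record which fragment each block came from, as a config comment.
            parts.append(f"# from {path}\n{fh.read()}")
    return "\n".join(parts)


if __name__ == "__main__" and len(sys.argv) == 2:
    # With "script dir|" in LOCAL_CONFIG_FILE, Condor reads this stdout as config.
    sys.stdout.write(generate_config(sys.argv[1]))
```

Run by hand as `condor_generate_config.py /var/lib/condor`, it should print exactly what Condor would see.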


Comment 1 Robert Rati 2008-10-01 15:23:30 UTC
Created attachment 318209
File to search a directory for condor configs

Comment 2 Robert Rati 2008-10-01 15:51:22 UTC
The condor_master log file has the following entries:
10/1 11:31:03 ******************************************************
10/1 11:31:03 ** condor_master (CONDOR_MASTER) STARTING UP
10/1 11:31:03 ** /usr/sbin/condor_master
10/1 11:31:03 ** $CondorVersion: 7.0.4 Sep 24 2008 BuildID: RH-7.0.4-4.fc8 $
10/1 11:31:03 ** $CondorPlatform: I386-LINUX_F8 $
10/1 11:31:03 ** PID = 13704
10/1 11:31:03 ** Log last touched 10/1 11:28:52
10/1 11:31:03 ******************************************************
10/1 11:31:03 Using config source: /etc/condor/condor_config
10/1 11:31:03 Using local config sources:
10/1 11:31:03    /usr/libexec/condor/condor_generate_config.sh /var/lib/condor|
10/1 11:31:03 DaemonCore: Command Socket at <10.16.65.100:49070>
10/1 11:31:03 ERROR: Unable to find collector info in configuration file!!!


After changing the following entries in the condor_ha_central_manager config:
COLLECTOR_HOST = host1,host2,host3
REPLICATION_LIST = host1:41450,host2:41450,host3:41450
HAD_LIST = host1:51450,host2:51450,host3:51450

the segfault still occurs, and MasterLog shows the following errors:

10/1 11:42:47 ******************************************************
10/1 11:42:47 ** condor_master (CONDOR_MASTER) STARTING UP
10/1 11:42:47 ** /usr/sbin/condor_master
10/1 11:42:47 ** $CondorVersion: 7.0.4 Sep 24 2008 BuildID: RH-7.0.4-4.fc8 $
10/1 11:42:47 ** $CondorPlatform: I386-LINUX_F8 $
10/1 11:42:47 ** PID = 13750
10/1 11:42:47 ** Log last touched 10/1 11:31:03
10/1 11:42:47 ******************************************************
10/1 11:42:47 Using config source: /etc/condor/condor_config
10/1 11:42:47 Using local config sources:
10/1 11:42:47    /usr/libexec/condor/condor_generate_config.sh /var/lib/condor|
10/1 11:42:47 DaemonCore: Command Socket at <10.16.65.100:54759>
10/1 11:42:47 IPVERIFY: unable to resolve IP address of host1
10/1 11:42:47 IPVERIFY: unable to resolve IP address of host2
10/1 11:42:47 IPVERIFY: unable to resolve IP address of host3
10/1 11:42:47 IPVERIFY: unable to resolve IP address of host1
10/1 11:42:47 IPVERIFY: unable to resolve IP address of host2
10/1 11:42:47 IPVERIFY: unable to resolve IP address of host3
10/1 11:42:47 IPVERIFY: unable to resolve IP address of host1
10/1 11:42:47 IPVERIFY: unable to resolve IP address of host2
10/1 11:42:47 IPVERIFY: unable to resolve IP address of host3
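Every IPVERIFY line above names one of the placeholder hosts from the HA lists. A hypothetical pre-flight check, sketched in Python below, could catch both problems seen in this comment before condor_master starts: a missing COLLECTOR_HOST (the first log) and list entries that do not resolve (the second log). It is not part of Condor. It assumes the configuration has already been read into a dict of macro name to value, and the resolver is a parameter so the check can be exercised without DNS.

```python
import socket

# The HA macros discussed in this report.
HA_KEYS = ("COLLECTOR_HOST", "REPLICATION_LIST", "HAD_LIST")


def parse_host_list(value):
    """Split 'host1:41450,host2:41450' or 'host1,host2' into (host, port) pairs.

    The port is None when an entry has no ':port' suffix.
    """
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.partition(":")
        pairs.append((host, int(port) if sep else None))
    return pairs


def check_ha_config(config, resolve=socket.gethostbyname):
    """Return human-readable problems: unset HA macros and unresolvable hosts."""
    problems = []
    seen = set()
    for key in HA_KEYS:
        value = config.get(key, "").strip()
        if not value:
            problems.append(f"{key} is not set")
            continue
        for host, _port in parse_host_list(value):
            if host in seen:
                continue  # report each host only once
            seen.add(host)
            try:
                resolve(host)
            except OSError:  # socket.gaierror is a subclass of OSError
                problems.append(f"cannot resolve {host}")
    return problems
```

Given the edited config above, the check should report the three placeholder hosts once each, rather than the nine repeated IPVERIFY lines.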

Comment 3 Robert Rati 2008-10-01 18:43:19 UTC
During this testing, an HA schedd was also configured to use a shared file system that the condor user could not write to. This appears to be the actual cause of the segfault. The following config was used:

MASTER_HA_LIST = $(MASTER_HA_LIST), SCHEDD
SPOOL = /mnt/sharedfs
HA_LOCK_URL = file:/mnt/sharedfs
VALID_SPOOL_FILES = SCHEDD.lock
SCHEDD_NAME = ha-schedd@

and /mnt/sharedfs was not writable by the condor user.
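Because the root cause turned out to be directory permissions, a check of SPOOL and the HA_LOCK_URL path would have pointed at it directly. The Python sketch below is hypothetical and not part of Condor. It assumes the configuration is available as a dict and handles only the file: scheme used above. It must also be run as the user the daemons run as (condor here), because os.access checks the real user ID.

```python
import os


def lock_url_path(url):
    """Turn a file: URL such as 'file:/mnt/sharedfs' into a local path.

    Assumption: only the 'file:' scheme from this report is supported.
    """
    prefix = "file:"
    if not url.startswith(prefix):
        raise ValueError(f"unsupported lock URL: {url!r}")
    return url[len(prefix):]


def unwritable_dirs(config):
    """Return HA schedd directories the current user cannot create files in."""
    candidates = []
    if "SPOOL" in config:
        candidates.append(config["SPOOL"])
    if "HA_LOCK_URL" in config:
        candidates.append(lock_url_path(config["HA_LOCK_URL"]))
    bad = []
    for path in candidates:
        # Creating a file needs an existing directory with write and search access.
        ok = os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)
        if not ok and path not in bad:
            bad.append(path)
    return bad
```

With the config from this comment, running it as condor should return ["/mnt/sharedfs"].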

Comment 4 Robert Rati 2008-11-06 16:10:23 UTC
The bug appears to be fixed in the 7.1.x series. Starting an HA schedd that points to a shared file system the condor user cannot write to now produces an error message about permission problems creating the log file, instead of segfaulting.