Description of problem:
The condor_triggerd segfaults without any core dump after /usr/sbin/condor_trigger_config -i `hostname` was called.

Version-Release number of selected component (if applicable):
ruby-wallaby-0.10.5-4.el5
condor-wallaby-tools-4.0-6.el5
qpid-cpp-server-devel-0.10-6.el5
qpid-qmf-devel-0.10-6.el5
wallaby-utils-0.10.5-4.el5
qpid-qmf-0.10-6.el5
qpid-cpp-client-devel-docs-0.10-6.el5
qpid-tools-0.10-4.el5
python-condorutils-1.5-3.el5
ruby-qpid-qmf-0.10-6.el5
qpid-cpp-client-0.10-6.el5
qpid-cpp-server-cluster-0.10-6.el5
qpid-cpp-server-store-0.10-6.el5
qpid-java-common-0.10-4.el5
condor-7.6.1-0.4.el5
python-wallabyclient-4.0-6.el5
condor-wallaby-base-db-1.12-1.el5
python-qpid-qmf-0.10-6.el5
qpid-cpp-client-ssl-0.10-6.el5
qpid-cpp-server-xml-0.10-6.el5
condor-classads-7.6.1-0.4.el5
qpid-java-client-0.10-4.el5
condor-wallaby-client-4.0-6.el5
qpid-cpp-server-ssl-0.10-6.el5
qpid-java-example-0.10-4.el5
wallaby-0.10.5-4.el5
python-qpid-0.10-1.el5
qpid-cpp-server-0.10-6.el5
qpid-cpp-client-devel-0.10-6.el5
condor-qmf-7.6.1-0.4.el5

How reproducible:
100%

Steps to Reproduce:
1. set up condor for triggerd
2. run /usr/sbin/condor_trigger_config -i `hostname` to initialize default triggers
3. tail -f /var/log/condor/TriggerLog

Actual results:
condor_triggerd segfault

Expected results:
no segfault

Additional info:
Config:
CREATE_CORE_FILES=True
ABORT_ON_EXCEPTION=True
QMF_BROKER_HOST=localhost
ALL_DEBUG=D_FULLDEBUG
CONFIGD_ARGS = -d
ALLOW_WRITE = *
ALLOW_READ = *
ALLOW_NEGOTIATOR = *
ALLOW_ADMINISTRATOR_READ = *
STARTD_CRON_NAME = TRIGGER_DATA
STARTD_CRON_AUTOPUBLISH = If_Changed
TRIGGER_DATA_JOBLIST = GetData
TRIGGER_DATA_GETDATA_PREFIX = Triggerd
TRIGGER_DATA_GETDATA_EXECUTABLE = $(BIN)/get_trigger_data
TRIGGER_DATA_GETDATA_PERIOD = 5m
TRIGGER_DATA_GETDATA_RECONFIG = FALSE
DAEMON_LIST = $(DAEMON_LIST), TRIGGERD
ENABLE_ABSENT_NODES_DETECTION=True
DC_DAEMON_LIST = $(DAEMON_LIST)
QMF_BROKER_AUTH_MECH = ANONYMOUS

qpid: list
Summary of Objects by Type:
    Package          Class                 Active  Deleted
    ========================================================
    com.redhat.grid  condortriggerservice  1       0
    com.redhat.grid  master                1       0
    com.redhat.grid  negotiator            1       0
    com.redhat.grid  collector             1       0

# condor_trigger_config -i `hostname`
Connecting to broker 'hostname'...
Initializing, adding default triggers...
Adding trigger 'High CPU Usage'...
Adding trigger 'Low Free Mem'...
Adding trigger 'Low Free Disk Space (/)'...
Adding trigger 'Busy and Swapping'...
Adding trigger 'Busy but Idle'...
Adding trigger 'Idle for long time'...
Adding trigger 'Logs with ERROR entries'...
Adding trigger 'Logs with error entries'...
Adding trigger 'Logs with DENIED entries'...
Adding trigger 'Logs with denied entries'...
Adding trigger 'Logs with WARNING entries'...
Adding trigger 'Logs with warning entries'...
Adding trigger 'dprintf Logs'...
Adding trigger 'Logs with stack dumps'...
Adding trigger 'Core Files'...

TriggerLog:
05/17/11 14:49:24 Triggerd::AddTriggerToCollection called
05/17/11 14:49:24 Triggerd::AddTriggerToCollection exited with return value 0
05/17/11 14:49:24 Triggerd::config called
05/17/11 14:49:24 Triggerd::SetInterval called
05/17/11 14:49:24 Triggerd: Registered PerformQueries() to evaluate triggers every 10 seconds
05/17/11 14:49:24 Updating collector every 300 seconds
05/17/11 14:49:24 Will use UDP to update collector rhel5_64.mrg-qe-12.lab.eng.brq.redhat.com <IP:9618>
05/17/11 14:49:24 DaemonCore: in SendAliveToParent()
05/17/11 14:49:24 Initialized the following authorization table:
05/17/11 14:49:24 Authorizations yet to be resolved:
05/17/11 14:49:24 allow ADMINISTRATOR: */IP */IP */IP */hostname */hostname
05/17/11 14:49:24 allow OWNER: */IP */IP */IP */IP */hostname */hostname */hostname
05/17/11 14:49:24 Completed DC_CHILDALIVE to daemon at <IP:55842>
05/17/11 14:49:24 DaemonCore: Leaving SendAliveToParent() - success
05/17/11 14:49:24 Triggerd::UpdateCollector called
05/17/11 14:49:24 Trying to update collector <IP:9618>
05/17/11 14:49:24 Attempting to send update via UDP to collector hostname <IP:9618>
05/17/11 14:49:34 Triggerd: Evaluating 15 triggers
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Parsing trigger text '$(Machine) has $(TriggerdCondorLogCapitalErrorCount) ERROR messages in the following log files: $(TriggerdCondorLogCapitalError)'
05/17/11 14:49:34 Adding text string prior to variable substitution to event text
Stack dump for process 12272 at timestamp 1305636574 (10 frames)
condor_triggerd(dprintf_dump_stack+0x56)[0x529986]
condor_triggerd[0x51f662]
/lib64/libpthread.so.0[0x353020eb10]
condor_triggerd(_ZN3com6redhat4grid8Triggerd8RemoveWSEPKc+0xc)[0x46904c]
condor_triggerd(_ZN3com6redhat4grid8Triggerd14PerformQueriesEv+0x398)[0x46b608]
condor_triggerd(_ZN12TimerManager7TimeoutEv+0x155)[0x49abb5]
condor_triggerd(_ZN10DaemonCore6DriverEv+0x248)[0x4853b8]
condor_triggerd(main+0xe57)[0x4993a7]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x352f61d994]
condor_triggerd[0x464fe9]

No core file was generated.

# ps ax | grep condor
 7433 pts/0 S+  0:00 grep condor
22730 ?     Ssl 0:03 condor_master -pidfile /var/run/condor/condor_master.pid
22734 ?     Ssl 0:01 condor_collector -f
22737 ?     Ssl 0:00 condor_negotiator -f
22738 ?     Ssl 0:00 condor_schedd -f
22739 ?     Ssl 0:00 condor_startd -f
22741 ?     S   0:00 condor_procd -A /var/run/condor/procd_pipe.SCHEDD -R 10000000 -S 60 -C 64

condor_triggerd is down - MasterLog:
05/17/11 15:20:28 DaemonCore: No more children processes to reap.
05/17/11 15:20:28 The TRIGGERD (pid 20398) died due to signal 11 (Segmentation fault)

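The mangled C++ symbols in the stack dump above can be decoded to see which methods were on the stack when the signal arrived. The sketch below assumes a GCC- or Clang-compatible toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`; the `demangle` wrapper is a hypothetical helper for illustration and is not part of Condor.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc'd buffer, so it is released with free().
    std::unique_ptr<char, decltype(&std::free)> out(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), &std::free);
    return (status == 0 && out) ? std::string(out.get()) : std::string(mangled);
}
```

Applied to the two Triggerd frames, this gives `com::redhat::grid::Triggerd::RemoveWS(char const*)` and `com::redhat::grid::Triggerd::PerformQueries()`, so the faulting frame is RemoveWS, called from PerformQueries.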
The triggerd would attempt to access a null pointer when processing whitespace if a trigger evaluation returned classad data. The issue was introduced when the triggerd was modified to handle new classads.

Fixed upstream and on:
UPSTREAM-7.6.1-BZ705343-triggerd-segfault

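To illustrate the class of bug described in this comment, here is a minimal sketch of a whitespace-stripping helper that tolerates a null input. It is not the upstream patch: the name `remove_ws`, its signature, the choice to trim only leading and trailing whitespace, and the decision to return an empty string for a null pointer are all assumptions made for illustration. The only detail taken from the report is that the crash occurred inside the triggerd's RemoveWS method, which takes a `const char*`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not the actual Triggerd::RemoveWS): returns a copy of
// `text` with leading and trailing whitespace removed.
// A null `text` yields an empty string instead of being dereferenced.
std::string remove_ws(const char* text) {
    if (text == nullptr) {
        return std::string();
    }
    std::string s(text);
    std::size_t begin = 0;
    while (begin < s.size() &&
           std::isspace(static_cast<unsigned char>(s[begin]))) {
        ++begin;
    }
    std::size_t end = s.size();
    while (end > begin &&
           std::isspace(static_cast<unsigned char>(s[end - 1]))) {
        --end;
    }
    return s.substr(begin, end - begin);
}

// Small self-check showing the intended behaviour, including the null case.
void check_remove_ws() {
    assert(remove_ws(nullptr).empty());
    assert(remove_ws("   ").empty());
    assert(remove_ws("  MasterLog \t") == "MasterLog");
}
```

The point of the sketch is only the early return: any code path that may hand over a missing attribute value has to be checked before the first character is read.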
Retested over RHEL5/x86,x86_64:
ruby-wallaby-0.10.5-4.el5
condor-wallaby-tools-4.0-6.el5
qpid-java-common-0.10-6.el5
qpid-qmf-devel-0.10-6.el5
wallaby-utils-0.10.5-4.el5
qpid-qmf-0.10-6.el5
qpid-cpp-client-ssl-0.10-7.el5
qpid-cpp-server-cluster-0.10-7.el5
python-condorutils-1.5-3.el5
ruby-qpid-qmf-0.10-6.el5
qpid-cpp-client-0.10-7.el5
condor-7.6.1-0.5.el5
qpid-cpp-server-ssl-0.10-7.el5
qpid-java-client-0.10-6.el5
qpid-java-example-0.10-6.el5
python-wallabyclient-4.0-6.el5
condor-wallaby-base-db-1.12-1.el5
python-qpid-qmf-0.10-6.el5
qpid-cpp-server-0.10-7.el5
qpid-cpp-client-devel-0.10-7.el5
qpid-cpp-server-store-0.10-7.el5
qpid-cpp-server-devel-0.10-7.el5
qpid-tools-0.10-5.el5
condor-wallaby-client-4.0-6.el5
condor-classads-7.6.1-0.5.el5
qpid-cpp-server-xml-0.10-7.el5
wallaby-0.10.5-4.el5
python-qpid-0.10-1.el5
condor-qmf-7.6.1-0.5.el5
qpid-cpp-client-devel-docs-0.10-7.el5

# tail -f /var/log/condor/TriggerLog
05/18/11 12:32:46 Adding classad value to event text
05/18/11 12:32:46 Adding text string prior to variable substitution to event text
05/18/11 12:32:46 token: 'TriggerdCondorLogStackDump'
05/18/11 12:32:46 Adding classad value to event text
05/18/11 12:32:46 Triggerd: Raised event with text '"hostname" has 4507 stack dumps in the following log files: "MasterLog,ShadowLog,ShadowLog.old,TriggerLog"'
05/18/11 12:32:46 Trying to query collector <IP:9618>
05/18/11 12:32:46 Query successful. Parsing results
05/18/11 12:32:46 Triggerd: Found 1 nodes in the pool
05/18/11 12:32:46 Triggerd: 1 nodes expected to be in the pool
05/18/11 12:32:46 Triggerd: Found 0 missing nodes
05/18/11 12:32:56 Triggerd: Evaluating 15 triggers
05/18/11 12:32:56 Trying to query collector <IP:9618>
05/18/11 12:32:56 Query successful. Parsing results
05/18/11 12:32:56 Trying to query collector <IP:9618>
05/18/11 12:32:56 Query successful. Parsing results
05/18/11 12:32:56 Trying to query collector <IP:9618>
05/18/11 12:32:56 Query successful. Parsing results
05/18/11 12:32:56 Trying to query collector <IP:9618>
05/18/11 12:32:56 Query successful. Parsing results
05/18/11 12:32:56 Trying to query collector <IP:9618>
05/18/11 12:32:56 Query successful. Parsing results
05/18/11 12:32:56 Trying to query collector <IP:9618>
05/18/11 12:32:56 Query successful. Parsing results
05/18/11 12:32:56 Trying to query collector <IP:9618>
05/18/11 12:32:56 Query successful. Parsing results
05/18/11 12:32:56 Parsing trigger text '$(Machine) has $(TriggerdCondorLogCapitalErrorCount) ERROR messages in the following log files: $(TriggerdCondorLogCapitalError)'
05/18/11 12:32:56 Adding text string prior to variable substitution to event text
05/18/11 12:32:56 token: 'Machine'
05/18/11 12:32:56 Adding classad value to event text
05/18/11 12:32:56 Adding text string prior to variable substitution to event text
05/18/11 12:32:56 token: 'TriggerdCondorLogCapitalErrorCount'
05/18/11 12:32:56 Adding classad value to event text

 5472 ?     Ssl 0:00 condor_triggerd -f

Daemon is still alive. No crash from condor_triggerd found.

>>> VERIFIED