| Summary: | condor_triggerd segfault after initialization using the condor_trigger_config | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Tomas Rusnak <trusnak> | |
| Component: | condor | Assignee: | Robert Rati <rrati> | |
| Status: | CLOSED ERRATA | QA Contact: | Tomas Rusnak <trusnak> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | Development | CC: | iboverma, jneedle, matt | |
| Target Milestone: | 2.0 | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | condor-7.6.1-0.5 | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| Clones: | 705722 | Environment: | ||
| Last Closed: | 2011-06-27 15:32:43 UTC | Type: | --- | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Bug Depends On: | ||||
| Bug Blocks: | 602766, 693778, 705722 | |||
The triggerd would attempt to access a null pointer when processing white space if a trigger returned ClassAd data from a trigger evaluation. The issue was introduced when the triggerd was modified to handle new ClassAds.
Fixed upstream and on branch UPSTREAM-7.6.1-BZ705343-triggerd-segfault.
Retested over RHEL5/x86,x86_64:
ruby-wallaby-0.10.5-4.el5
condor-wallaby-tools-4.0-6.el5
qpid-java-common-0.10-6.el5
qpid-qmf-devel-0.10-6.el5
wallaby-utils-0.10.5-4.el5
qpid-qmf-0.10-6.el5
qpid-cpp-client-ssl-0.10-7.el5
qpid-cpp-server-cluster-0.10-7.el5
python-condorutils-1.5-3.el5
ruby-qpid-qmf-0.10-6.el5
qpid-cpp-client-0.10-7.el5
condor-7.6.1-0.5.el5
qpid-cpp-server-ssl-0.10-7.el5
qpid-java-client-0.10-6.el5
qpid-java-example-0.10-6.el5
python-wallabyclient-4.0-6.el5
condor-wallaby-base-db-1.12-1.el5
python-qpid-qmf-0.10-6.el5
qpid-cpp-server-0.10-7.el5
qpid-cpp-client-devel-0.10-7.el5
qpid-cpp-server-store-0.10-7.el5
qpid-cpp-server-devel-0.10-7.el5
qpid-tools-0.10-5.el5
condor-wallaby-client-4.0-6.el5
condor-classads-7.6.1-0.5.el5
qpid-cpp-server-xml-0.10-7.el5
wallaby-0.10.5-4.el5
python-qpid-0.10-1.el5
condor-qmf-7.6.1-0.5.el5
qpid-cpp-client-devel-docs-0.10-7.el5
# tail -f /var/log/condor/TriggerLog
05/18/11 12:32:46 Adding classad value to event text
05/18/11 12:32:46 Adding text string prior to variable substitution to event text
05/18/11 12:32:46 token: 'TriggerdCondorLogStackDump'
05/18/11 12:32:46 Adding classad value to event text
05/18/11 12:32:46 Triggerd: Raised event with text '"hostname" has 4507 stack dumps in the following log files: "MasterLog,ShadowLog,ShadowLog.old,TriggerLog"'
05/18/11 12:32:46 Trying to query collector <IP:9618>
05/18/11 12:32:46 Query successful. Parsing results
05/18/11 12:32:46 Triggerd: Found 1 nodes in the pool
05/18/11 12:32:46 Triggerd: 1 nodes expected to be in the pool
05/18/11 12:32:46 Triggerd: Found 0 missing nodes
05/18/11 12:32:56 Triggerd: Evaluating 15 triggers
05/18/11 12:32:56 Trying to query collector <IP:9618>
05/18/11 12:32:56 Query successful. Parsing results
05/18/11 12:32:56 Trying to query collector <IP:9618>
05/18/11 12:32:56 Query successful. Parsing results
05/18/11 12:32:56 Trying to query collector <IP:9618>
05/18/11 12:32:56 Query successful. Parsing results
05/18/11 12:32:56 Trying to query collector <IP:9618>
05/18/11 12:32:56 Query successful. Parsing results
05/18/11 12:32:56 Trying to query collector <IP:9618>
05/18/11 12:32:56 Query successful. Parsing results
05/18/11 12:32:56 Trying to query collector <IP:9618>
05/18/11 12:32:56 Query successful. Parsing results
05/18/11 12:32:56 Trying to query collector <IP:9618>
05/18/11 12:32:56 Query successful. Parsing results
05/18/11 12:32:56 Parsing trigger text '$(Machine) has $(TriggerdCondorLogCapitalErrorCount) ERROR messages in the following log files: $(TriggerdCondorLogCapitalError)'
05/18/11 12:32:56 Adding text string prior to variable substitution to event text
05/18/11 12:32:56 token: 'Machine'
05/18/11 12:32:56 Adding classad value to event text
05/18/11 12:32:56 Adding text string prior to variable substitution to event text
05/18/11 12:32:56 token: 'TriggerdCondorLogCapitalErrorCount'
05/18/11 12:32:56 Adding classad value to event text
5472 ? Ssl 0:00 condor_triggerd -f
Daemon is still alive. No crash from condor_triggerd found.
>>> VERIFIED
Description of problem:
condor_triggerd segfaults, without leaving a core dump, after /usr/sbin/condor_trigger_config -i `hostname` is run.
Version-Release number of selected component (if applicable):
ruby-wallaby-0.10.5-4.el5
condor-wallaby-tools-4.0-6.el5
qpid-cpp-server-devel-0.10-6.el5
qpid-qmf-devel-0.10-6.el5
wallaby-utils-0.10.5-4.el5
qpid-qmf-0.10-6.el5
qpid-cpp-client-devel-docs-0.10-6.el5
qpid-tools-0.10-4.el5
python-condorutils-1.5-3.el5
ruby-qpid-qmf-0.10-6.el5
qpid-cpp-client-0.10-6.el5
qpid-cpp-server-cluster-0.10-6.el5
qpid-cpp-server-store-0.10-6.el5
qpid-java-common-0.10-4.el5
condor-7.6.1-0.4.el5
python-wallabyclient-4.0-6.el5
condor-wallaby-base-db-1.12-1.el5
python-qpid-qmf-0.10-6.el5
qpid-cpp-client-ssl-0.10-6.el5
qpid-cpp-server-xml-0.10-6.el5
condor-classads-7.6.1-0.4.el5
qpid-java-client-0.10-4.el5
condor-wallaby-client-4.0-6.el5
qpid-cpp-server-ssl-0.10-6.el5
qpid-java-example-0.10-4.el5
wallaby-0.10.5-4.el5
python-qpid-0.10-1.el5
qpid-cpp-server-0.10-6.el5
qpid-cpp-client-devel-0.10-6.el5
condor-qmf-7.6.1-0.4.el5
How reproducible:
100%
Steps to Reproduce:
1. Set up Condor for the triggerd.
2. Run /condor_trigger_config -i `hostname` to initialize the default triggers.
3. tail -f /var/log/condor/TriggerLog
Actual results:
condor_triggerd segfaults.
Expected results:
No segfault.
Additional info:
Config:
CREATE_CORE_FILES=True
ABORT_ON_EXCEPTION=True
QMF_BROKER_HOST=localhost
ALL_DEBUG=D_FULLDEBUG
CONFIGD_ARGS = -d
ALLOW_WRITE = *
ALLOW_READ = *
ALLOW_NEGOTIATOR = *
ALLOW_ADMINISTRATOR_READ = *
STARTD_CRON_NAME = TRIGGER_DATA
STARTD_CRON_AUTOPUBLISH = If_Changed
TRIGGER_DATA_JOBLIST = GetData
TRIGGER_DATA_GETDATA_PREFIX = Triggerd
TRIGGER_DATA_GETDATA_EXECUTABLE = $(BIN)/get_trigger_data
TRIGGER_DATA_GETDATA_PERIOD = 5m
TRIGGER_DATA_GETDATA_RECONFIG = FALSE
DAEMON_LIST = $(DAEMON_LIST), TRIGGERD
ENABLE_ABSENT_NODES_DETECTION=True
DC_DAEMON_LIST = $(DAEMON_LIST)
QMF_BROKER_AUTH_MECH = ANONYMOUS
qpid: list
Summary of Objects by Type:
Package          Class                 Active  Deleted
========================================================
com.redhat.grid  condortriggerservice  1       0
com.redhat.grid  master                1       0
com.redhat.grid  negotiator            1       0
com.redhat.grid  collector             1       0
# condor_trigger_config -i `hostname`
Connecting to broker 'hostname'...
Initializing, adding default triggers...
Adding trigger 'High CPU Usage'...
Adding trigger 'Low Free Mem'...
Adding trigger 'Low Free Disk Space (/)'...
Adding trigger 'Busy and Swapping'...
Adding trigger 'Busy but Idle'...
Adding trigger 'Idle for long time'...
Adding trigger 'Logs with ERROR entries'...
Adding trigger 'Logs with error entries'...
Adding trigger 'Logs with DENIED entries'...
Adding trigger 'Logs with denied entries'...
Adding trigger 'Logs with WARNING entries'...
Adding trigger 'Logs with warning entries'...
Adding trigger 'dprintf Logs'...
Adding trigger 'Logs with stack dumps'...
Adding trigger 'Core Files'...
TriggerLog:
05/17/11 14:49:24 Triggerd::AddTriggerToCollection called
05/17/11 14:49:24 Triggerd::AddTriggerToCollection exited with return value 0
05/17/11 14:49:24 Triggerd::config called
05/17/11 14:49:24 Triggerd::SetInterval called
05/17/11 14:49:24 Triggerd: Registered PerformQueries() to evaluate triggers every 10 seconds
05/17/11 14:49:24 Updating collector every 300 seconds
05/17/11 14:49:24 Will use UDP to update collector rhel5_64.mrg-qe-12.lab.eng.brq.redhat.com <IP:9618>
05/17/11 14:49:24 DaemonCore: in SendAliveToParent()
05/17/11 14:49:24 Initialized the following authorization table:
05/17/11 14:49:24 Authorizations yet to be resolved:
05/17/11 14:49:24 allow ADMINISTRATOR: */IP */IP */IP */hostname */hostname
05/17/11 14:49:24 allow OWNER: */IP */IP */IP */IP */hostname */hostname */hostname
05/17/11 14:49:24 Completed DC_CHILDALIVE to daemon at <IP:55842>
05/17/11 14:49:24 DaemonCore: Leaving SendAliveToParent() - success
05/17/11 14:49:24 Triggerd::UpdateCollector called
05/17/11 14:49:24 Trying to update collector <IP:9618>
05/17/11 14:49:24 Attempting to send update via UDP to collector hostname <IP:9618>
05/17/11 14:49:34 Triggerd: Evaluating 15 triggers
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Trying to query collector <IP:9618>
05/17/11 14:49:34 Query successful. Parsing results
05/17/11 14:49:34 Parsing trigger text '$(Machine) has $(TriggerdCondorLogCapitalErrorCount) ERROR messages in the following log files: $(TriggerdCondorLogCapitalError)'
05/17/11 14:49:34 Adding text string prior to variable substitution to event text
Stack dump for process 12272 at timestamp 1305636574 (10 frames)
condor_triggerd(dprintf_dump_stack+0x56)[0x529986]
condor_triggerd[0x51f662]
/lib64/libpthread.so.0[0x353020eb10]
condor_triggerd(_ZN3com6redhat4grid8Triggerd8RemoveWSEPKc+0xc)[0x46904c]
condor_triggerd(_ZN3com6redhat4grid8Triggerd14PerformQueriesEv+0x398)[0x46b608]
condor_triggerd(_ZN12TimerManager7TimeoutEv+0x155)[0x49abb5]
condor_triggerd(_ZN10DaemonCore6DriverEv+0x248)[0x4853b8]
condor_triggerd(main+0xe57)[0x4993a7]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x352f61d994]
condor_triggerd[0x464fe9]
No core file was generated.
# ps ax | grep condor
7433 pts/0 S+ 0:00 grep condor
22730 ? Ssl 0:03 condor_master -pidfile /var/run/condor/condor_master.pid
22734 ? Ssl 0:01 condor_collector -f
22737 ? Ssl 0:00 condor_negotiator -f
22738 ? Ssl 0:00 condor_schedd -f
22739 ? Ssl 0:00 condor_startd -f
22741 ? S 0:00 condor_procd -A /var/run/condor/procd_pipe.SCHEDD -R 10000000 -S 60 -C 64
condor_triggerd is not running. MasterLog:
05/17/11 15:20:28 DaemonCore: No more children processes to reap.
05/17/11 15:20:28 The TRIGGERD (pid 20398) died due to signal 11 (Segmentation fault)