Bug 705016 - Modifying ATTR_SUBMISSION value on job leads to crash
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-aviary
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: 2.1
Target Release: ---
Assignee: Pete MacKinnon
QA Contact: Martin Kudlej
URL:
Whiteboard:
Duplicates: 705052
Depends On:
Blocks: 743350
 
Reported: 2011-05-16 12:27 UTC by Martin Kudlej
Modified: 2012-02-07 08:52 UTC (History)

Fixed In Version: condor-7.6.4-0.4
Doc Type: Bug Fix
Doc Text:
The Aviary query server could have eventually crashed when the ATTR_SUBMISSION_NAME attribute on a job was modified through the Aviary API after it had been added to the job queue, then removed, followed by an invocation of getJobDetails. This update corrects the code so that ATTR_SUBMISSION_NAME is prevented from being modified after the job has been submitted, which prevents the possibility of an eventual Aviary query server crash.
Clone Of:
Environment:
Last Closed: 2012-01-23 17:26:48 UTC
Target Upstream Version:
Embargoed:


Attachments
log files and configuration (8.45 KB, application/x-gzip)
2011-05-16 12:33 UTC, Martin Kudlej


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2012:0045 | 0 | normal | SHIPPED_LIVE | Red Hat Enterprise MRG Grid 2.1 bug fix and enhancement update | 2012-01-23 22:22:58 UTC

Description Martin Kudlej 2011-05-16 12:27:50 UTC
Description of problem:
I ran a simple test against Aviary:
1. Submit a "sleep 1000" job.
2. Hold that job.
3. Wait until the summary across all submissions shows held == 1.
4. Change the job ClassAd attribute "Submission" to "test".
5. Sleep 60.
6. Read the ClassAd attribute "Submission" via the Aviary function getJobDetails.
7. Remove the job.
8. Wait until the summary across all submissions shows 0.
9. Sleep 60.
10. Read the ClassAd attribute "Submission" via the Aviary function getJobDetails.
After I repeated this a few times, the query server dumped core and a stack dump appeared in QueryServerLog.

Version-Release number of selected component (if applicable):
condor-7.6.1-0.4.el6.i686
condor-aviary-7.6.1-0.4.el6.i686
condor-classads-7.6.1-0.4.el6.i686
condor-debuginfo-7.6.1-0.4.el6.i686
condor-qmf-7.6.1-0.4.el6.i686
condor-wallaby-base-db-1.12-1.el6.noarch
condor-wallaby-client-4.0-6.el6.noarch
condor-wallaby-tools-4.0-6.el6.noarch
python-condorutils-1.5-3.el6.noarch
wso2-axis2-2.1.0-3.el6.i686
wso2-rampart-2.1.0-3.el6.i686
wso2-wsf-cpp-2.1.0-3.el6.i686
wso2-wsf-cpp-debuginfo-2.1.0-3.el6.i686

How reproducible:
100%

Actual results:
The Aviary query server crashes with a segmentation fault (see the stack dump under "Additional info").

Expected results:
The Aviary query server does not crash; no stack dump such as the std::_Rb_tree (_Rb_treeIPKN...) insertion failure appears in QueryServerLog.

Additional info:
$ cat QueryServerLog
Stack dump for process 14505 at timestamp 1305548445 (18 frames)
aviary_query_server(dprintf_dump_stack+0x44)[0x8110604]
aviary_query_server[0x81559d7]
[0x293400]
/lib/libc.so.6[0x5e623a]
aviary_query_server(_ZNSt8_Rb_treeIPKN6aviary5query3JobES4_St9_IdentityIS4_ENS1_6cmpjobESaIS4_EE16_M_insert_uniqueERKS4_+0x54)[0x80958a4]
aviary_query_server(_ZN6aviary5query16SubmissionObject9incrementEPKNS0_3JobE+0xb9)[0x8095269]
aviary_query_server(_ZN6aviary5query3Job19incrementSubmissionEv+0x1c)[0x809682c]
aviary_query_server(_ZN6aviary5query11LiveJobImpl3setEPKcS3_+0x268)[0x80994f8]
aviary_query_server(_ZN23JobServerJobLogConsumer12SetAttributeEPKcS1_S1_+0xb1)[0x8095e91]
aviary_query_server(_ZN16ClassAdLogReader15ProcessLogEntryEP15ClassAdLogEntryP16ClassAdLogParser+0xdf)[0x812c9ff]
aviary_query_server(_ZN16ClassAdLogReader15IncrementalLoadEv+0x93)[0x812caa3]
aviary_query_server(_ZN16ClassAdLogReader4PollEv+0xc8)[0x812cc08]
aviary_query_server(_ZN12JobLogMirror26TimerHandler_JobLogPollingEv+0x29)[0x81101e9]
aviary_query_server(_ZN12TimerManager7TimeoutEv+0x127)[0x80bddd7]
aviary_query_server(_ZN10DaemonCore6DriverEv+0x265)[0x80b2675]
aviary_query_server(main+0x13b2)[0x80a3ae2]
/lib/libc.so.6(__libc_start_main+0xe6)[0x585cc6]
aviary_query_server[0x80935a1]


$ gdb /usr/sbin/aviary_query_server core.14505
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/aviary_query_server...Reading symbols from /usr/lib/debug/usr/sbin/aviary_query_server.debug...done.
done. 

warning: core file may not match specified executable file.
[New Thread 14505]
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/23/e5a71140d5c8345a1c915447c466c23f43dc02
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libclassad.so.1.1.0...Reading symbols from /usr/lib/debug/usr/lib/libclassad.so.1.1.0.debug...done.
done. 
Loaded symbols for /usr/lib/libclassad.so.1.1.0
Reading symbols from /lib/libexpat.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libexpat.so.1
Reading symbols from /lib/libpcre.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libpcre.so.0
Reading symbols from /usr/lib/libssl.so.10...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libssl.so.10
Reading symbols from /usr/lib/libcrypto.so.10...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libcrypto.so.10
Reading symbols from /lib/libkrb5.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libkrb5.so.3
Reading symbols from /lib/libcom_err.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libcom_err.so.2
Reading symbols from /lib/libk5crypto.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libk5crypto.so.3
Reading symbols from /lib/libkrb5support.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libkrb5support.so.0
Reading symbols from /usr/lib/libaxis2_engine.so.0.0.0...Reading symbols from /usr/lib/debug/usr/lib/libaxis2_engine.so.0.0.0.debug...done.
done. 
Loaded symbols for /usr/lib/libaxis2_engine.so.0.0.0
Reading symbols from /usr/lib/libaxutil.so.0.4.0...Reading symbols from /usr/lib/debug/usr/lib/libaxutil.so.0.4.0.debug...
warning: "/usr/lib/debug/usr/lib/libaxutil.so.0.4.0.debug": separate debug info file has no debug info
(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libaxutil.so.0.4.0
Reading symbols from /usr/lib/libaxis2_axiom.so.0.4.0...Reading symbols from /usr/lib/debug/usr/lib/libaxis2_axiom.so.0.4.0.debug...
warning: "/usr/lib/debug/usr/lib/libaxis2_axiom.so.0.4.0.debug": separate debug info file has no debug info
(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libaxis2_axiom.so.0.4.0
Reading symbols from /usr/lib/libaxis2_parser.so.0.4.0...Reading symbols from /usr/lib/debug/usr/lib/libaxis2_parser.so.0.4.0.debug...
warning: "/usr/lib/debug/usr/lib/libaxis2_parser.so.0.4.0.debug": separate debug info file has no debug info
(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libaxis2_parser.so.0.4.0
Reading symbols from /usr/lib/libaxis2_http_common.so.0.6.0...Reading symbols from /usr/lib/debug/usr/lib/libaxis2_http_common.so.0.6.0.debug...done.
done. 
Loaded symbols for /usr/lib/libaxis2_http_common.so.0.6.0
Reading symbols from /usr/lib/libaxis2_http_receiver.so.0.6.0...Reading symbols from /usr/lib/debug/usr/lib/libaxis2_http_receiver.so.0.6.0.debug...done.
done. 
Loaded symbols for /usr/lib/libaxis2_http_receiver.so.0.6.0
Reading symbols from /usr/lib/libaxis2_http_sender.so.0.6.0...Reading symbols from /usr/lib/debug/usr/lib/libaxis2_http_sender.so.0.6.0.debug...done.
done. 
Loaded symbols for /usr/lib/libaxis2_http_sender.so.0.6.0
Reading symbols from /usr/lib/libneethi.so.0.1.0...Reading symbols from /usr/lib/debug/usr/lib/libneethi.so.0.1.0.debug...
warning: "/usr/lib/debug/usr/lib/libneethi.so.0.1.0.debug": separate debug info file has no debug info
(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libneethi.so.0.1.0
Reading symbols from /usr/lib/libguththila.so.0.6.0...Reading symbols from /usr/lib/debug/usr/lib/libguththila.so.0.6.0.debug...done.
done. 
Loaded symbols for /usr/lib/libguththila.so.0.6.0
Reading symbols from /usr/lib/libwso2_wsf.so.0.0.0...Reading symbols from /usr/lib/debug/usr/lib/libwso2_wsf.so.0.0.0.debug...done.
done. 
Loaded symbols for /usr/lib/libwso2_wsf.so.0.0.0
Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libgssapi_krb5.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libgssapi_krb5.so.2
Reading symbols from /lib/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libz.so.1
Reading symbols from /lib/libkeyutils.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libkeyutils.so.1
Reading symbols from /lib/libselinux.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libselinux.so.1
Reading symbols from /usr/lib/libxml2.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libxml2.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_dns.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /usr/lib/libwsf_cpp_msg_recv.so.0.0.0...Reading symbols from /usr/lib/debug/usr/lib/libwsf_cpp_msg_recv.so.0.0.0.debug...done.
done. 
Loaded symbols for /usr/lib/libwsf_cpp_msg_recv.so.0.0.0
Reading symbols from /var/lib/condor/aviary/services/query/libaviary_query_axis.so...Reading symbols from /usr/lib/debug/var/lib/condor/aviary/services/query/libaviary_query_axis.so.debug...done.
done. 
Loaded symbols for /var/lib/condor/aviary/services/query/libaviary_query_axis.so
Core was generated by `aviary_query_server -f'.
Program terminated with signal 11, Segmentation fault.
#0  0x00293424 in __kernel_vsyscall ()
Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.i686 glibc-2.12-1.25.el6.i686 keyutils-libs-1.4-1.el6.i686 krb5-libs-1.9-9.el6.i686 libcom_err-1.41.12-7.el6.i686 libgcc-4.4.5-6.el6.i686 libselinux-2.0.94-5.el6.i686 libstdc++-4.4.5-6.el6.i686 libxml2-2.7.6-1.el6.i686 openssl-1.0.0-10.el6.i686 pcre-7.8-3.1.el6.i686 zlib-1.2.3-25.el6.i686
(gdb) info threads
* 1 Thread 0xb7838750 (LWP 14505)  0x00293424 in __kernel_vsyscall ()
(gdb) thread apply all bt
Thread 1 (Thread 0xb7838750 (LWP 14505)):
#0  0x00293424 in __kernel_vsyscall ()
#1  0x007167e0 in raise () from /lib/libpthread.so.0
#2  0x08155a24 in sig_backtrace_handler (signum=11) at /usr/src/debug/condor-7.6.0/src/condor_utils/dprintf_config.cpp:75
#3  <signal handler called>
#4  0x005e623a in __strcmp_ia32 () from /lib/libc.so.6
#5  0x080958a4 in operator() (this=0x9007878, __v=@0xbfd19fd4) at /usr/src/debug/condor-7.6.0/src/condor_contrib/aviary/src/SubmissionObject.h:43
#6  std::_Rb_tree<aviary::query::Job const*, aviary::query::Job const*, std::_Identity<aviary::query::Job const*>, aviary::query::cmpjob, std::allocator<aviary::query::Job const*> >::_M_insert_unique (
    this=0x9007878, __v=@0xbfd19fd4) at /usr/include/c++/4.4.4/bits/stl_tree.h:1170
#7  0x08095269 in insert (this=0x9007818, job=0x900ed78) at /usr/include/c++/4.4.4/bits/stl_set.h:411
#8  aviary::query::SubmissionObject::increment (this=0x9007818, job=0x900ed78) at /usr/src/debug/condor-7.6.0/src/condor_contrib/aviary/src/SubmissionObject.cpp:85
#9  0x0809682c in aviary::query::Job::incrementSubmission (this=0x900ed78) at /usr/src/debug/condor-7.6.0/src/condor_contrib/aviary/src/Job.cpp:555
#10 0x080994f8 in aviary::query::LiveJobImpl::set (this=0x901d580, _name=0x9015eb8 "JobStatus", _value=0x90226a0 "5") at /usr/src/debug/condor-7.6.0/src/condor_contrib/aviary/src/Job.cpp:272
#11 0x08095e91 in JobServerJobLogConsumer::SetAttribute (this=0x8fb5978, _key=<value optimized out>, _name=<value optimized out>, _value=0x90226a0 "5")
    at /usr/src/debug/condor-7.6.0/src/condor_contrib/aviary/src/JobServerJobLogConsumer.cpp:200
#12 0x0812c9ff in ClassAdLogReader::ProcessLogEntry (this=0x8fb598c, log_entry=0x8fb5bd4) at /usr/src/debug/condor-7.6.0/src/condor_utils/ClassAdLogReader.cpp:168
#13 0x0812caa3 in ClassAdLogReader::IncrementalLoad (this=0x8fb598c) at /usr/src/debug/condor-7.6.0/src/condor_utils/ClassAdLogReader.cpp:124
#14 0x0812cc08 in ClassAdLogReader::Poll (this=0x8fb598c) at /usr/src/debug/condor-7.6.0/src/condor_utils/ClassAdLogReader.cpp:86
#15 0x081101e9 in JobLogMirror::TimerHandler_JobLogPolling (this=0x8fb5988) at /usr/src/debug/condor-7.6.0/src/condor_utils/JobLogMirror.cpp:79
#16 0x080bddd7 in TimerManager::Timeout (this=0x81c52e8) at /usr/src/debug/condor-7.6.0/src/condor_daemon_core.V6/timer_manager.cpp:419
#17 0x080b2675 in DaemonCore::Driver (this=0x8fbe878) at /usr/src/debug/condor-7.6.0/src/condor_daemon_core.V6/daemon_core.cpp:3091
#18 0x080a3ae2 in main (argc=1, argv=0xbfd1aa98) at /usr/src/debug/condor-7.6.0/src/condor_daemon_core.V6/daemon_core_main.cpp:2377

Comment 1 Martin Kudlej 2011-05-16 12:33:28 UTC
Created attachment 499126
log files and configuration

Comment 3 Pete MacKinnon 2011-05-17 13:24:28 UTC
Changing a submission name is disruptive since the name is a key into an internal map. This is not a scenario that has been tested in development nor expressed as a requirement for QMF or Aviary. Complicating matters is that a submission is open-ended and could encompass jobs that are "live" in the queue with writeable attributes as well as jobs recorded to the history log whose attributes are "frozen" permanently.

It's trivial to change this particular case to ensure that submission names can't be modified after they enter the system. However, we should generally revisit the notion of how we expose attributes for modification to a remote user. Do we maintain a table of RW vs. RO attributes? Do we special-case sensitive attributes like ATTR_SUBMISSION? Food for thought...
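
To illustrate why renaming a key is disruptive, here is a minimal sketch of an index keyed by submission name. It is not the Aviary code: Job, SubmissionIndex, and renameLeavesStaleEntry are hypothetical names invented for this example, and the real query server keeps a std::set of Job pointers per submission (see the backtrace in the description). The sketch only shows the bookkeeping problem: once a job has been renamed, a later removal looks under the new name and the entry under the old name is never cleaned up.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a job; this is not the Aviary Job class.
struct Job {
    int id;
    std::string submission;  // plays the role of the "Submission" attribute
};

// Hypothetical index from submission name to the ids of its jobs.
class SubmissionIndex {
public:
    void add(const Job& job) { jobs_[job.submission].insert(job.id); }

    // Removes the job using its *current* submission name. Returns false
    // when the index holds no entry for the job under that name.
    bool remove(const Job& job) {
        auto it = jobs_.find(job.submission);
        if (it == jobs_.end() || it->second.erase(job.id) == 0) return false;
        if (it->second.empty()) jobs_.erase(it);
        return true;
    }

    std::size_t count(const std::string& name) const {
        auto it = jobs_.find(name);
        return it == jobs_.end() ? 0 : it->second.size();
    }

private:
    std::map<std::string, std::set<int>> jobs_;
};

// Renames a job after it was indexed, then tries to remove it.
// Returns true if a stale entry is left behind under the old name.
bool renameLeavesStaleEntry() {
    SubmissionIndex index;
    Job job{1, "original"};
    index.add(job);
    assert(index.count("original") == 1);  // sanity check after indexing

    job.submission = "test";               // rename without updating the index
    bool removed = index.remove(job);      // looks under "test", finds nothing
    return !removed && index.count("original") == 1 && index.count("test") == 0;
}
```

In this sketch the stale entry is only an id, so nothing crashes. If the index stored pointers to job objects instead, as the backtrace suggests the real code does, a stale entry could outlive the job it points to.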

Comment 6 Pete MacKinnon 2011-09-07 14:59:51 UTC
Modified to prevent submission name changes on attribute set in QMF & Aviary

Comment 7 Pete MacKinnon 2011-09-07 15:03:29 UTC
*** Bug 705052 has been marked as a duplicate of this bug. ***

Comment 8 Pete MacKinnon 2011-10-04 19:14:24 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: Modification of the ATTR_SUBMISSION_NAME attribute on a job through Aviary API after it has been added to job queue, removed, then getJobDetails invoked.
Consequence: Eventual crash of Aviary query server.
Fix: Implemented guard in setJobAttribute implementation that prevents modification of ATTR_SUBMISSION_NAME post-submission.
Result: Query server does not crash.
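
As a rough illustration of the kind of guard the Fix line describes, the sketch below refuses to change a read-only attribute once it has a value. This is an assumption-laden example, not the shipped patch: JobAd, readOnlyAttributes, setAttribute, and guardRejectsRename are hypothetical names, and the only detail taken from this report is that the "Submission" attribute must not change after submission.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a queued job's attributes; not the Aviary type.
class JobAd {
public:
    // Attributes that may be set once (at submission) but not changed later.
    // Listing "Submission" here follows the report; the table is an assumption.
    static const std::set<std::string>& readOnlyAttributes() {
        static const std::set<std::string> ro = {"Submission"};
        return ro;
    }

    // Returns false and leaves the ad untouched when the caller tries to
    // change a read-only attribute that already has a different value.
    bool setAttribute(const std::string& name, const std::string& value) {
        auto existing = attrs_.find(name);
        bool isReadOnly = readOnlyAttributes().count(name) != 0;
        if (existing != attrs_.end() && isReadOnly && existing->second != value) {
            return false;
        }
        attrs_[name] = value;
        return true;
    }

    // Returns the attribute value, or an empty string if it is not set.
    std::string get(const std::string& name) const {
        auto it = attrs_.find(name);
        return it == attrs_.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> attrs_;
};

// Small usage example; returns true when the guard behaves as described.
bool guardRejectsRename() {
    JobAd ad;
    bool first = ad.setAttribute("Submission", "original");  // initial set succeeds
    assert(first);
    bool second = ad.setAttribute("Submission", "test");     // rename is refused
    return !second && ad.get("Submission") == "original";
}
```

A lookup table like this is one way to answer the RW vs. RO question raised in Comment 3; hard-coding a single check for the submission attribute would be a smaller change with the same effect for this bug.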

Comment 10 Douglas Silas 2011-11-17 13:11:28 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,4 +1 @@
-Cause: Modification of the ATTR_SUBMISSION_NAME attribute on a job through Aviary API after it has been added to job queue, removed, then getJobDetails invoked.
+The Aviary query server could have eventually crashed when the ATTR_SUBMISSION_NAME attribute on a job was modified through the Aviary API after it had been added to the job queue, then removed, followed by an invocation of getJobDetails. This update corrects the code so that ATTR_SUBMISSION_NAME is prevented from being modified after the job has been submitted, which prevents the possibility of an eventual Aviary query server crash.
-Consequence: Eventual crash of Aviary query server.
-Fix: Implemented guard in setJobAttribute implementation that prevents modification of ATTR_SUBMISSION_NAME post-submission.
-Result: Query server does not crash.

Comment 11 Martin Kudlej 2011-11-28 15:37:08 UTC
Tested on RHEL 5.7 and 6.2, on both x86_64 and i386, with
condor-ec2-enhanced-hooks-1.2-4
condor-classads-7.6.5-0.7
condor-debuginfo-7.6.5-0.7
python-condorutils-1.5-4
condor-wallaby-base-db-1.16-2
condor-7.6.5-0.7
condor-qmf-7.6.5-0.7
condor-kbdd-7.6.5-0.7
condor-wallaby-client-4.1.2-1
python-condorec2e-1.2-4
condor-job-hooks-1.5-4
condor-ec2-enhanced-1.2-3
condor-wallaby-tools-4.1.2-1
condor-aviary-7.6.5-0.7

and the fix works. --> VERIFIED

Comment 12 errata-xmlrpc 2012-01-23 17:26:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0045.html

