Bug 595010

Summary: job_server: calling GetJobSummaries on a submission with live jobs causes seg fault
Product: Red Hat Enterprise MRG Reporter: Pete MacKinnon <pmackinn>
Component: condor Assignee: Pete MacKinnon <pmackinn>
Status: CLOSED CURRENTRELEASE QA Contact: Tomas Rusnak <trusnak>
Severity: high Docs Contact:
Priority: high    
Version: Development CC: trusnak
Target Milestone: 1.3   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-07-22 17:08:56 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
Embargoed:

Description Pete MacKinnon 2010-05-22 19:57:38 UTC
Stack dump for process 18818 at timestamp 1274558043 (19 frames)
condor_job_server(dprintf_dump_stack+0xc7)[0x80fc5db]
condor_job_server[0x80fc7b2]
[0x675400]
condor_job_server(_ZN8AttrList8NextExprEv+0x54)[0x8159546]
condor_job_server(_Z15jobToVariantMapPK3JobRSt3mapISsN4qpid5types7VariantESt4lessISsESaISt4pairIKSsS5_EEEPPKc+0xde)[0x80d7403]
condor_job_server(_ZN16SubmissionObject15GetJobSummariesERSt3mapISsN4qpid5types7VariantESt4lessISsESaISt4pairIKSsS3_EEERSs+0x289)[0x80ce1dd]
condor_job_server(_ZN16SubmissionObject16ManagementMethodEjRN4qpid10management4ArgsERSs+0x38)[0x80ce848]
condor_job_server(_ZN3qmf3com6redhat4grid10Submission8doMethodERSsRKSt3mapISsN4qpid5types7VariantESt4lessISsESaISt4pairIKSsS8_EEERSF_+0x5f0)[0x80c69ce]
/usr/lib/libqmf.so.1(_ZN4qpid10management19ManagementAgentImpl19invokeMethodRequestERKSsS3_S3_+0x1057)[0x13fba7]
/usr/lib/libqmf.so.1(_ZN4qpid10management19ManagementAgentImpl13pollCallbacksEj+0xc6)[0x147276]
condor_job_server(_Z16HandleMgmtSocketP7ServiceP6Stream+0x1f)[0x80c8427]
condor_job_server(_ZN10DaemonCore24CallSocketHandler_workerEibP6Stream+0x187)[0x80e26c7]
condor_job_server(_ZN10DaemonCore35CallSocketHandler_worker_demarshallEPv+0x3b)[0x80e252f]
condor_job_server(_ZN13CondorThreads8pool_addEPFvPvES0_PiPKc+0x29)[0x813dd2b]
condor_job_server(_ZN10DaemonCore17CallSocketHandlerERib+0x1bc)[0x80e24f2]
condor_job_server(_ZN10DaemonCore6DriverEv+0x180e)[0x80e2200]
condor_job_server(main+0x1ce0)[0x80f6a87]
/lib/libc.so.6(__libc_start_main+0xe6)[0x7c4a86]
condor_job_server[0x80be061]
Segmentation fault
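
The frames above are Itanium C++ ABI mangled names, which are hard to read as posted. As a minimal sketch, assuming a GCC or Clang toolchain that provides `<cxxabi.h>`, the helper below demangles a symbol copied from the trace. The `demangle` function is a hypothetical helper for illustration only, not part of Condor.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper (not part of Condor): demangle one symbol copied
// from the stack dump. Returns the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result
    // with malloc, so it must be released with free.
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

Applied to the topmost named frame below the signal handler, `demangle("_ZN8AttrList8NextExprEv")` yields `AttrList::NextExpr()`, so the fault occurs while iterating the attributes of an ad, reached from `jobToVariantMap` via `SubmissionObject::GetJobSummaries`.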

Something appears to be wrong with the parent chaining of classads...

Comment 1 Pete MacKinnon 2010-05-24 21:35:35 UTC
Trying to externalize the cluster-to-job classads was causing all sorts of problems. Went to a model where the LiveJob ctor chains the parent classad.
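
The idea of chaining in the constructor can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `Ad`, its `lookup` method, and this `LiveJob` are simplified stand-ins, not the actual Condor classes or the actual fix. The point is only that the job owns its link to the cluster (parent) ad from construction onward, so no external code has to set up or tear down the chain before attributes are read.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a classad: a flat attribute map plus an
// optional parent ad consulted when an attribute is missing locally.
struct Ad {
    std::map<std::string, std::string> attrs;
    std::shared_ptr<const Ad> parent;  // cluster ad shared by its jobs

    // Look in this ad first, then fall back to the chained parent.
    const std::string* lookup(const std::string& name) const {
        auto it = attrs.find(name);
        if (it != attrs.end()) return &it->second;
        return parent ? parent->lookup(name) : nullptr;
    }
};

// Hypothetical LiveJob: the chain to the parent is established once,
// in the constructor, and lives exactly as long as the job does.
class LiveJob {
public:
    LiveJob(std::shared_ptr<const Ad> cluster_ad,
            std::map<std::string, std::string> proc_attrs) {
        ad_.attrs = std::move(proc_attrs);
        ad_.parent = std::move(cluster_ad);
    }

    // Flatten parent and job attributes; job-level values win.
    std::map<std::string, std::string> summary() const {
        std::map<std::string, std::string> out;
        if (ad_.parent) out = ad_.parent->attrs;
        for (const auto& kv : ad_.attrs) out[kv.first] = kv.second;
        return out;
    }

    const Ad& ad() const { return ad_; }

private:
    Ad ad_;
};
```

Because the parent is held through a `shared_ptr` in this sketch, iterating a job's attributes cannot walk into a parent ad that has already been freed, which is the kind of failure the stack trace suggests.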

Comment 2 Pete MacKinnon 2010-07-20 15:09:01 UTC
1) ensure condor is set up for QMF plugins
2) set QMF_PUBLISH_SUBMISSIONS=True
3) submit a job using condor_submit
4) start qpid-tool
5) "list com.redhat.grid:submission"
6) choose the corresponding submission in the list from step 5 (the submission name should have the cluster id at the end of the string)
7) "call some_qmf_object_number_from_step_6 GetJobSummaries"

The call should return a map of job details such as cmd, args, etc.