Hide Forgot
Description of problem: $ qpid-tool qpid: list submission ... 3225 06:34:02 - hostname#2525 qpid: call 3225 GetJobSummaries qpid: OK (0) - {u'Jobs': [{u'ProcId': 0, u'Args': '6', u'Submission': 'mrg-qe-05.lab.eng.brq.redhat.com#2525', u'CurrentTime': 1301570306, u'QDate': 1301553224, u'Cmd': '/bin/sleep', u'ClusterId': 2525, u'JobStatus': 4, u'Owner': 'condor', u'EnteredCurrentStatus': 1301553236, u'GlobalJobId': 'hostname#2525.0#1301553224'}]} qpid: list ... com.redhat.grid jobserver 1 0 qpid: list jobserver 322 14:48:00 - jobs@hostname qpid: call 322 GetJobAd 'hostname#2525' qpid: Unknown Job Id (65536) - {} qpid: call 322 GetJobAd '2525' qpid: Unknown Job Id (65536) - {} qpid: call 322 GetJobAd '2525.0' qpid: Unknown Job Id (65536) - {} qpid: call 322 GetJobAd 'hostname#2525.0' qpid: Unknown Job Id (65536) - {} qpid: call 322 GetJobAd 'hostname#2525.0#1301553224' qpid: Unknown Job Id (65536) - {} I think this is a bug. Or am I doing anything wrong? Version-Release number of selected component (if applicable): condor-qmf-7.6.0-0.4.el5 qpid-qmf-0.10-2.el5 qpid-cpp-server-0.10-1.el5 How reproducible: 100% Steps to Reproduce: 1. install condor 2. install condor remote configuration 3. set up: condor_configure_pool -n `hostname` -l Node "hostname": Last Check-in Time: Thu Mar 31 06:51:42 2011 Group Memberships: Internal Default Group Features Applied: CentralManager Scheduler ConsoleExecuteNode ConsoleMaster ConsoleScheduler ConsoleCollector ConsoleNegotiator QMF JobServer ExecuteNode Master NodeAccess Explicitly Set Parameters: QMF_BROKER_HOST = 127.0.0.1 Also I've tried this configuration with reordered features: condor_configure_pool -n `hostname` -l Node "hostname": Last Check-in Time: Thu Mar 31 07:57:54 2011 Group Memberships: Internal Default Group Features Applied: QMF JobServer CentralManager Scheduler ConsoleExecuteNode ConsoleMaster ConsoleScheduler ConsoleCollector ConsoleNegotiator ExecuteNode Master NodeAccess Explicitly Set Parameters: QMF_BROKER_HOST = 127.0.0.1 $ condor_history 2525.0 2525.0 condor 3/31 02:33 0+00:00:07 C 3/31 02:33 /bin/sleep 6 Actual results: QMF function jobserver.GetJobAd doesn't get right data. Expected results: QMF function jobserver.GetJobAd will get right data. Additional info:
Created attachment 489094 [details] log file
0) There may be an issue in how arguments are serialized from qpid-tool 0.10. 1) JobServer is SEGV in GetJobAd - $ rpm -qa | grep -e condor -e qpid -e qmf -e cumin | sort -u condor-7.6.0-0.4.el6.x86_64 condor-aviary-7.6.0-0.4.el6.x86_64 condor-debuginfo-7.6.0-0.2.el6.x86_64 condor-qmf-7.6.0-0.4.el6.x86_64 cumin-0.1.4675-1.el6.noarch python-qpid-0.10-1.el6.noarch python-qpid-qmf-0.10-4.el6.x86_64 qpid-cpp-client-0.10-2.el6.x86_64 qpid-cpp-client-devel-0.10-2.el6.x86_64 qpid-cpp-server-0.10-2.el6.x86_64 qpid-qmf-0.10-4.el6.x86_64 qpid-qmf-devel-0.10-4.el6.x86_64 qpid-tools-0.10-1.el6.noarch - Without condor_job_server... From qpid-tool: (call 123 GetJobAd '246.0' and "246.0" SchedLog - (with SCHEDD_DEBUG = D_FULLDEBUG) 03/31/11 10:04:22 Calling GetJobAd for ''246.0'' 03/31/11 10:04:22 Fetch: Failed to parse id: '246.0' ... 03/31/11 10:04:35 Calling GetJobAd for '"246.0"' 03/31/11 10:04:35 Fetch: Failed to parse id: "246.0" Navigating to the Submission via Cumin: (details are displayed) ScheddLog - (with QMF_PUBLISH_SUBMISSIONS = TRUE) 03/31/11 10:17:20 GetJobSummaries for submission: Submission == "eeyore.local#246" 03/31/11 10:17:20 Warning: failed to lookup attribute 'Args' from ad 03/31/11 10:17:20 Warning: Args attribute not found for job of Submission == "eeyore.local#246" 03/31/11 10:17:20 Warning: failed to lookup attribute 'ReleaseReason' from ad 03/31/11 10:17:20 Warning: ReleaseReason attribute not found for job of Submission == "eeyore.local#246" 03/31/11 10:17:28 Calling GetJobAd for '246.0' 03/31/11 10:17:28 PROC_ID is '246.0' There may be an issue in how arguments are serialized from qpid-tool 0.10. - With condor_job_server... JOB_SERVER = $(SBIN)/condor_job_server JOB_SERVER_ARGS = -f JOB_SERVER_DEBUG = D_FULLDEBUG JOB_SERVER_LOG = $(LOG)/JobServerLog SCHEDD.QMF_PUBLISH_SUBMISSIONS = FALSE DAEMON_LIST = $(DAEMON_LIST), JOB_SERVER 03/31/11 10:27:50 warning: failed to lookup attribute Args in job '246.0' 03/31/11 10:27:50 warning: failed to lookup attribute ReleaseReason in job '246.0' Stack dump for process 14900 at timestamp 1301581674 (18 frames) condor_job_server(dprintf_dump_stack+0x63)[0x534983] condor_job_server(linux_sig_coredump(int)+0x40)[0x495f00] /lib64/libpthread.so.0[0x376300f4a0] /lib64/libc.so.6[0x3762d2091f] condor_job_server(com::redhat::grid::JobServerObject::GetJobAd(std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, qpid::types::Variant, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, qpid::types::Variant> > >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)+0xd7)[0x46a287] condor_job_server(com::redhat::grid::JobServerObject::ManagementMethod(unsigned int, qpid::management::Args&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)+0x93)[0x46a383] condor_job_server(qmf::com::redhat::grid::JobServer::doMethod(std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, qpid::types::Variant, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, qpid::types::Variant> > > const&, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, qpid::types::Variant, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, qpid::types::Variant> > >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x710)[0x473350] /usr/lib64/libqmf.so.4(qpid::management::ManagementAgentImpl::invokeMethodRequest(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x95f)[0x326761e73f] /usr/lib64/libqmf.so.4(qpid::management::ManagementAgentImpl::pollCallbacks(unsigned int)+0xbc)[0x326762029c] condor_job_server(HandleMgmtSocket(Service*, Stream*)+0x14)[0x4639d4] condor_job_server(DaemonCore::CallSocketHandler_worker(int, bool, Stream*)+0x285)[0x490de5] condor_job_server(DaemonCore::CallSocketHandler_worker_demarshall(void*)+0x1a)[0x4912da] condor_job_server(CondorThreads::pool_add(void (*)(void*), void*, int*, char const*)+0x40)[0x51df70] condor_job_server(DaemonCore::CallSocketHandler(int&, bool)+0x135)[0x486bd5] condor_job_server(DaemonCore::Driver()+0x2013)[0x48b7c3] condor_job_server(main+0x10eb)[0x4978eb] /lib64/libc.so.6(__libc_start_main+0xfd)[0x3762c1ec5d] condor_job_server[0x463509] Navigating to the job details in Cumin results in a SEGV when navigating to job list of eeyore.local#246.
Note: When the condor_job_server SEGV, Cumin will report "Request timed out"
Testing uncovered bug 692741 The JobServer SEGV fix is pushed to V7_6-branch as commit 8450a28be8531e76982b8b6c6063dd4bd8caa1e8 Author: Matthew Farrellee <matt@redhat> Date: Thu Mar 31 16:06:56 2011 -0400 Corrected JOB_AD_ERROR lookup logic, removed SEGV from string = NULL
Please check to see if there is a regression in qpid-tool. You should be able to get details with, qpid: call 322 GetJobAd 2525.0 However, qpid-tool in 0.10 insists on, qpid: call 322 GetJobAd "2525.0" and passes "2525.0", not 2525.0, to the GetJobAd method, which fails as expected.
I think it should work with and without quotes. We've tried another method of another object in broker: qpid: call 246 reroute 5 True "amq.topic" qpid: OK (0) - {} qpid: call 246 reroute 5 True 'amq.topic' qpid: OK (0) - {} qpid: call 246 reroute 5 True amq.topic qpid: OK (0) - {} for condor-qmf I've got: qpid: call 11414 GetJobSummaries qpid: OK (0) - {u'Jobs': [{u'ProcId': 0, u'CurrentTime': 1301909239, u'QDate': 1301908659, u'Cmd': '/tmp/.condor_run.28044', u'ClusterId': 2531, u'JobStatus': 2, u'Arguments': '""', u'EnteredCurrentStatus': 1301908670, u'GlobalJobId': 'mrg-qe-05.lab.eng.brq.redhat.com#2531.0#1301908659'}]} qpid: call 322 GetJobAd '2531.0' qpid: Unknown Job Id (65536) - {} qpid: qpid: call 322 GetJobAd 2531.0 qpid: invalid conversion: Variant is not a string; use asString() if conversion is required. (qpid/types/Variant.cpp:569) (7) - {} qpid: call 322 GetJobAd "2531.0" qpid: Unknown Job Id (65536) - {} qpid: qpid: call 322 GetJobAd '2531.0' qpid: Unknown Job Id (65536) - {} All tests done with: condor-7.6.0-0.4.el5 condor-qmf-7.6.0-0.4.el5 qpid-qmf-0.10-2.el5 qpid-cpp-server-0.10-1.el5 qpid-qmf-devel-0.10-2.el5 qpid-tools-0.10-1.el5 qpid-cpp-client-0.10-1.el5 ruby-qpid-qmf-0.10-2.el5 qpid-cpp-client-devel-0.10-1.el5 python-qpid-0.10-1.el5 python-qpid-qmf-0.10-2.el5
(0) from comment 2 demonstrates a regression in qpid-tool
The behavior from comment 7 is now tracked separately as bug 693847.
Retested over all supported platforms x86, x86_64 with: RHEL5: python-qpid-0.10-1.el5 python-qpid-qmf-0.10-8.el5 qpid-cpp-client-0.10-7.el5 qpid-cpp-client-devel-0.10-7.el5 qpid-cpp-client-devel-docs-0.10-7.el5 qpid-cpp-client-ssl-0.10-7.el5 qpid-cpp-server-0.10-7.el5 qpid-cpp-server-cluster-0.10-7.el5 qpid-cpp-server-devel-0.10-7.el5 qpid-cpp-server-ssl-0.10-7.el5 qpid-cpp-server-store-0.10-7.el5 qpid-cpp-server-xml-0.10-7.el5 qpid-java-client-0.10-6.el5 qpid-java-common-0.10-6.el5 qpid-java-example-0.10-6.el5 qpid-qmf-0.10-8.el5 qpid-qmf-devel-0.10-8.el5 qpid-tools-0.10-5.el5 ruby-qpid-qmf-0.10-8.el5 RHEL6: python-qpid-0.10-1.el6.noarch python-qpid-qmf-0.10-7.el6.x86_64 qpid-cpp-client-0.10-4.el6.x86_64 qpid-cpp-client-devel-0.10-4.el6.x86_64 qpid-cpp-client-devel-docs-0.10-4.el6.noarch qpid-cpp-client-rdma-0.10-4.el6.x86_64 qpid-cpp-client-ssl-0.10-4.el6.x86_64 qpid-cpp-server-0.10-4.el6.x86_64 qpid-cpp-server-cluster-0.10-4.el6.x86_64 qpid-cpp-server-devel-0.10-4.el6.x86_64 qpid-cpp-server-rdma-0.10-4.el6.x86_64 qpid-cpp-server-ssl-0.10-4.el6.x86_64 qpid-cpp-server-store-0.10-4.el6.x86_64 qpid-cpp-server-xml-0.10-4.el6.x86_64 qpid-java-client-0.10-6.el6.noarch qpid-java-common-0.10-6.el6.noarch qpid-java-example-0.10-6.el6.noarch qpid-java-jca-0.10-6.el6.noarch qpid-qmf-0.10-7.el6.x86_64 qpid-tests-0.10-1.el6.noarch qpid-tools-0.10-4.el6.noarch rh-qpid-cpp-tests-0.10-4.el6.x86_64 ruby-qpid-0.7.946106-2.el6.x86_64 ruby-qpid-qmf-0.10-7.el6.x86_64 sesame-debuginfo-0.10-1.el6.x86_64 ===== python-qpid-0.10-1.el6.noarch python-qpid-qmf-0.10-7.el6.i686 qpid-cpp-client-0.10-4.el6.i686 qpid-cpp-client-devel-0.10-4.el6.i686 qpid-cpp-client-devel-docs-0.10-4.el6.noarch qpid-cpp-client-rdma-0.10-4.el6.i686 qpid-cpp-client-ssl-0.10-4.el6.i686 qpid-cpp-server-0.10-4.el6.i686 qpid-cpp-server-cluster-0.10-4.el6.i686 qpid-cpp-server-devel-0.10-4.el6.i686 qpid-cpp-server-rdma-0.10-4.el6.i686 qpid-cpp-server-ssl-0.10-4.el6.i686 qpid-cpp-server-store-0.10-4.el6.i686 qpid-cpp-server-xml-0.10-4.el6.i686 qpid-java-client-0.10-6.el6.noarch qpid-java-common-0.10-6.el6.noarch qpid-java-example-0.10-6.el6.noarch qpid-java-jca-0.10-6.el6.noarch qpid-qmf-0.10-7.el6.i686 qpid-qmf-devel-0.10-7.el6.i686 qpid-tests-0.10-1.el6.noarch qpid-tools-0.10-4.el6.noarch rh-qpid-cpp-tests-0.10-4.el6.i686 ruby-qpid-0.7.946106-2.el6.i686 ruby-qpid-qmf-0.10-7.el6.i686 sesame-0.10-1.el6.i686 sesame-debuginfo-0.10-1.el6.i686 qpid: list Summary of Objects by Type: Package Class Active Deleted ============================================== com.redhat.grid scheduler 1 0 com.redhat.grid slot 1 0 com.redhat.grid master 1 0 com.redhat.grid submitter 1 0 com.redhat.grid negotiator 1 0 com.redhat.grid submission 1 0 com.redhat.grid jobserver 1 0 com.redhat.grid collector 1 0 qpid: list jobserver Object Summary: ID Created Destroyed Index ========================================================================== 106 12:56:06 - jobs.lab.eng.brq.redhat.com qpid: call 106 GetJobAd "246.0" qpid: OK (0) - {...} qpid: call 106 GetJobAd 246.0 qpid: invalid conversion: Variant is not a string; use asString() if conversion is required. (qpid/types/Variant.cpp:569) (7) - {} JobserverLog: 05/24/11 21:09:10 JobServerJobLogConsumer::NewClassAd processing _key='0.0' 05/24/11 21:09:10 JobServerJobLogConsumer::NewClassAd processing _key='246.0' 05/24/11 21:09:10 JobServerJobLogConsumer::NewClassAd processing _key='0246.-1' 05/24/11 21:09:10 Job::Job of '0246.-1' 05/24/11 21:09:10 LiveJobImpl created for '0246.-1' 05/24/11 21:09:10 warning: failed to lookup attribute JobStatus in job '0246.-1' 05/24/11 21:09:10 ProcessHistoryTimer() called 05/24/11 21:09:20 TimerHandler_JobLogPolling() called 05/24/11 21:09:20 === Current Probing Information === 05/24/11 21:09:20 fsize: 5535 mtime: 1305222134 05/24/11 21:09:20 first log entry: 4 CreationTimestamp 1301426975 05/24/11 21:09:30 TimerHandler_JobLogPolling() called 05/24/11 21:09:30 === Current Probing Information === 05/24/11 21:09:30 fsize: 5535 mtime: 1305222134 05/24/11 21:09:30 first log entry: 4 CreationTimestamp 1301426975 05/24/11 21:09:40 TimerHandler_JobLogPolling() called 05/24/11 21:09:40 === Current Probing Information === 05/24/11 21:09:40 fsize: 5535 mtime: 1305222134 05/24/11 21:09:40 first log entry: 4 CreationTimestamp 1301426975 05/24/11 21:09:50 TimerHandler_JobLogPolling() called 05/24/11 21:09:50 === Current Probing Information === 05/24/11 21:09:50 fsize: 5535 mtime: 1305222134 05/24/11 21:09:50 first log entry: 4 CreationTimestamp 1301426975 05/24/11 21:10:00 TimerHandler_JobLogPolling() called 05/24/11 21:10:00 === Current Probing Information === 05/24/11 21:10:00 fsize: 5535 mtime: 1305222134 05/24/11 21:10:00 first log entry: 4 CreationTimestamp 1301426975 Original issue with calling jobserver methods solved. New bugzilla was created for an issue from comment 7. >>> VERIFIED