Bug 602637 - Schedd crashes after submitting job over QMF
Summary: Schedd crashes after submitting job over QMF
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-qmf
Version: Development
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.3
Target Release: ---
Assignee: Matthew Farrellee
QA Contact: Lubos Trilety
URL:
Whiteboard:
Duplicates: 602630
Depends On:
Blocks: 517190
 
Reported: 2010-06-10 11:22 UTC by Martin Kudlej
Modified: 2011-03-17 18:16 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-10-21 18:44:54 UTC
Target Upstream Version:
Embargoed:


Attachments
log files and condor_config.local (15.10 KB, application/x-gzip)
2010-06-10 11:22 UTC, Martin Kudlej
no flags

Description Martin Kudlej 2010-06-10 11:22:47 UTC
Created attachment 422871
log files and condor_config.local

Description of problem:
Scheduler crashes after submitting job over QMF.

Version-Release number of selected component (if applicable):
condor-7.4.3-0.17.el5
condor-qmf-7.4.3-0.17.el5

How reproducible:
100%

Steps to Reproduce:
1. Set up Condor QMF.
2. Submit a job over QMF.
3. Watch the crash in the scheduler log file.
  
Actual results:
Scheduler crashes.

Expected results:
Scheduler does not crash.

Additional info:
Schedd log:
06/10 05:52:46 (pid:18589) ERROR "Assertion ERROR on (!active_transaction)" at line 364 in file classad_log.cpp
Stack dump for process 18589 at timestamp 1276163566 (12 frames)
condor_schedd(dprintf_dump_stack+0x44)[0x817d344]
condor_schedd[0x817f0a4]
[0x740420]
condor_schedd(_EXCEPT_+0x93)[0x817d213]
condor_schedd(_ZN10ClassAdLog16BeginTransactionEv+0x41)[0x81caf21]
condor_schedd(_Z16BeginTransactionv+0x11)[0x8114411]
condor_schedd(_ZN9Scheduler10sendAlivesEv+0x1e)[0x80f86be]
condor_schedd(_ZN12TimerManager7TimeoutEv+0x14b)[0x817c8ab]
condor_schedd(_ZN10DaemonCore6DriverEv+0x244)[0x8163a84]
condor_schedd(main+0xd80)[0x8177290]
/lib/libc.so.6(__libc_start_main+0xdc)[0xb9de9c]
condor_schedd[0x80e4601]
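
The trace above shows Scheduler::sendAlives, run from a timer, calling BeginTransaction while a transaction was already open, which trips the (!active_transaction) assertion. The trace does not show which code path left that transaction open. The toy model below is only a sketch of the invariant: it assumes, for illustration, that a submit path began a transaction and returned without committing. ToyClassAdLog, submit_without_commit and send_alives are hypothetical names, not Condor code.

```python
class ToyClassAdLog(object):
    """Toy stand-in for a transactional log. This is not Condor's ClassAdLog."""

    def __init__(self):
        self.active_transaction = None

    def begin_transaction(self):
        # Same invariant as the failed check: transactions must not nest.
        if self.active_transaction is not None:
            raise RuntimeError("Assertion ERROR on (!active_transaction)")
        self.active_transaction = []

    def commit_transaction(self):
        if self.active_transaction is None:
            raise RuntimeError("no transaction to commit")
        committed = self.active_transaction
        self.active_transaction = None
        return committed


def submit_without_commit(log):
    # Assumed buggy caller: opens a transaction and never closes it.
    log.begin_transaction()
    log.active_transaction.append("new job ad")


def send_alives(log):
    # Periodic caller that expects to own a fresh transaction.
    log.begin_transaction()
    log.active_transaction.append("keepalive update")
    return log.commit_transaction()


def reproduce():
    """Return the error raised when the timer fires after a leaked transaction."""
    log = ToyClassAdLog()
    submit_without_commit(log)
    try:
        send_alives(log)
    except RuntimeError as err:
        return str(err)
    return None
```

In this model the failure appears in send_alives, not in the caller that leaked the transaction, which matches the way the schedd log points at the timer handler.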

Submit example:
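from sys import exit
from qmf.console import Session

# Condor job universe numbers, keyed by name.
UNIVERSE = {"VANILLA": 5, "SCHEDULER": 7, "GRID": 9, "JAVA": 10, "PARALLEL": 11, "LOCAL": 12, "VM": 13}

# Type descriptors sent with the ad: Requirements is an Expression, not a plain string.
__annotations__ = {"Requirements": "com.redhat.mrg.grid.Expression"}
ad = {"Cmd":          "/bin/sleep",
      "Args":         "120",
      "Requirements": "TRUE",
      "JobUniverse":  UNIVERSE["VANILLA"],
      "Iwd":          "/tmp",
      "Owner":        "nobody",
      "!!descriptors": __annotations__
}

# Connect to the broker at the default address and look up the scheduler objects.
session = Session()
session.addBroker()
schedulers = session.getObjects(_class="scheduler", _package="com.redhat.grid")
if not schedulers:
    exit("no scheduler object found over QMF")

# Submitting the ad is the call that leads to the schedd crash.
result = schedulers[0].SubmitJob(ad)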

Comment 1 Matthew Farrellee 2010-06-10 13:54:33 UTC
*** Bug 602630 has been marked as a duplicate of this bug. ***

Comment 2 Matthew Farrellee 2010-06-10 14:05:55 UTC
Please re-test with 0.18.

Comment 3 Martin Kudlej 2010-06-10 14:19:52 UTC
I've retested this bug with 
condor-7.4.3-0.18.el5
condor-qmf-7.4.3-0.18.el5

and the schedd still crashed.
The stack trace is different, but a crash is still present:

Stack dump for process 29067 at timestamp 1276179443 (20 frames)
condor_schedd(dprintf_dump_stack+0x44)[0x817d4a4]
condor_schedd[0x817f204]
[0x568420]
/usr/lib/libqpidcommon.so.2(_ZNK4qpid5types7Variant9isEqualToERKS1_+0x28)[0xfe7408]
/usr/lib/libqpidcommon.so.2(_ZN4qpid5typeseqERKNS0_7VariantES3_+0x24)[0xfe7444]
/usr/lib/condor/plugins/MgmtScheddPlugin-plugin.so(_Z24PopulateAdFromVariantMapRSt3mapISsN4qpid5types7VariantESt4lessISsESaISt4pairIKSsS2_EEER7ClassAd+0x1ef)[0x1a175f]
/usr/lib/condor/plugins/MgmtScheddPlugin-plugin.so(_ZN3com6redhat4grid15SchedulerObject6SubmitERSt3mapISsN4qpid5types7VariantESt4lessISsESaISt4pairIKSsS6_EEERSsSF_+0xb1)[0x1ab9b1]
/usr/lib/condor/plugins/MgmtScheddPlugin-plugin.so(_ZN3com6redhat4grid15SchedulerObject16ManagementMethodEjRN4qpid10management4ArgsERSs+0x255)[0x1ac5b5]
/usr/lib/condor/plugins/MgmtScheddPlugin-plugin.so(_ZN3qmf3com6redhat4grid9Scheduler8doMethodERSsRKSt3mapISsN4qpid5types7VariantESt4lessISsESaISt4pairIKSsS8_EEERSF_+0x480)[0x1a60e0]
/usr/lib/libqmf.so.1(_ZN4qpid10management19ManagementAgentImpl19invokeMethodRequestERKSsS3_S3_+0x1870)[0x2608d0]
/usr/lib/libqmf.so.1(_ZN4qpid10management19ManagementAgentImpl13pollCallbacksEj+0xdb)[0x26140b]
/usr/lib/condor/plugins/MgmtScheddPlugin-plugin.so(_ZN3com6redhat4grid16MgmtScheddPlugin16HandleMgmtSocketEP7ServiceP6Stream+0x23)[0x1bffd3]
condor_schedd(_ZN10DaemonCore24CallSocketHandler_workerEibP6Stream+0x834)[0x816b144]
condor_schedd(_ZN10DaemonCore35CallSocketHandler_worker_demarshallEPv+0x22)[0x816b422]
condor_schedd(_ZN13CondorThreads8pool_addEPFvPvES0_PiPKc+0x40)[0x8204470]
condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x130)[0x8163200]
condor_schedd(_ZN10DaemonCore6DriverEv+0x1f66)[0x81658f6]
condor_schedd(main+0xd80)[0x81773e0]
/lib/libc.so.6(__libc_start_main+0xdc)[0xb9de9c]
condor_schedd[0x80e46a1]
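
To compare the two stack dumps it helps to split each frame into module, symbol, offset and address. The helper below is a minimal sketch that assumes frames have the shape seen in this report, module(symbol+0xOFFSET)[0xADDRESS] with the module and symbol parts optional. parse_frame and symbols_by_module are hypothetical helper names, not part of Condor.

```python
import re

# module(symbol+0xOFFSET)[0xADDRESS]; module and the parenthesised part are optional.
FRAME_RE = re.compile(
    r"^(?P<module>[^\[(]*)"
    r"(?:\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Parse one stack-dump line; return None if it is not a frame."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    offset = match.group("offset")
    return {
        "module": match.group("module") or None,
        "symbol": match.group("symbol"),
        "offset": int(offset, 16) if offset is not None else None,
        "address": int(match.group("address"), 16),
    }


def symbols_by_module(lines):
    """Return (module, symbol) pairs for every frame that carries a symbol."""
    pairs = []
    for line in lines:
        frame = parse_frame(line)
        if frame is not None and frame["symbol"] is not None:
            pairs.append((frame["module"], frame["symbol"]))
    return pairs
```

The extracted symbols are still mangled C++ names; a demangler such as c++filt turns, for example, _ZN10ClassAdLog16BeginTransactionEv into ClassAdLog::BeginTransaction().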

Comment 4 Matthew Farrellee 2010-06-10 14:27:40 UTC
That is a different bug, one which will be resolved in 7.4.3-0.19. You can open a BZ for it. I'm going to close this one as MODIFIED for 7.4.3-0.18.

Comment 5 Lubos Trilety 2010-08-03 11:50:56 UTC
Tested with (version):
condor-7.4.4-0.4

Tested this issue on RHEL 4.8/5.5 x i386/x86_64 and it works in all
cases.

>>> VERIFIED

