Bug 603033

Summary: Schedd crashes when job is submitted over QMF
Product: Red Hat Enterprise MRG Reporter: Martin Kudlej <mkudlej>
Component: grid Assignee: Matthew Farrellee <matt>
Status: CLOSED ERRATA QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: high Docs Contact:
Priority: high    
Version: Development CC: matt
Target Milestone: 1.3   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix

Description Martin Kudlej 2010-06-11 11:18:36 UTC
Description of problem:
Scheduler crashes after submitting job over QMF.

Version-Release number of selected component (if applicable):
condor-7.4.3-0.18.el5
condor-qmf-7.4.3-0.18.el5

How reproducible:
100%

Steps to Reproduce:
1. set up condor qmf
2. submit job over qmf
3. watch crash in scheduler log file

Actual results:
Scheduler crashes.

Expected results:
Scheduler will not crash.

Additional info:

Submit example:
from qmf.console import Session

# Condor job universe numbers
UNIVERSE = {"VANILLA": 5, "SCHEDULER": 7, "GRID": 9, "JAVA": 10,
            "PARALLEL": 11, "LOCAL": 12, "VM": 13}

# Tag Requirements as an expression instead of a plain string
__annotations__ = {"Requirements": "com.redhat.mrg.grid.Expression"}
ad = {"Cmd":          "/bin/sleep",
      "Args":         "120",
      "Requirements": "TRUE",
      "JobUniverse":  UNIVERSE["VANILLA"],
      "Iwd":          "/tmp",
      "Owner":        "nobody",
      "!!descriptors": __annotations__
}

session = Session()
session.addBroker()  # no arguments: connect to the default broker
schedulers = session.getObjects(_class="scheduler", _package="com.redhat.grid")
# Submitting to the first scheduler found triggers the crash
result = schedulers[0].SubmitJob(ad)
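
The ad in the submit example is a plain dictionary with one special key, "!!descriptors", whose value is itself a map that tags Requirements as an expression. A minimal sketch of a helper that builds an ad of the same shape follows; `build_job_ad`, its parameters, and the two constants are hypothetical names for illustration only and are not part of condor-qmf or the QMF console API. The values are copied from the example.

```python
# Hypothetical helper: these names are illustrative, not a condor or QMF API.
EXPRESSION_TYPE = "com.redhat.mrg.grid.Expression"  # copied from the example
VANILLA_UNIVERSE = 5                                # copied from the UNIVERSE table


def build_job_ad(cmd, args, iwd="/tmp", owner="nobody", requirements="TRUE"):
    """Return a job ad dict shaped like the one in the submit example."""
    return {
        "Cmd": cmd,
        "Args": args,
        "Requirements": requirements,
        "JobUniverse": VANILLA_UNIVERSE,
        "Iwd": iwd,
        "Owner": owner,
        # Nested map marking which attributes are expressions
        "!!descriptors": {"Requirements": EXPRESSION_TYPE},
    }


ad = build_job_ad("/bin/sleep", "120")
```

The resulting `ad` can be passed to `SubmitJob` exactly as in the example; the helper only keeps the nested descriptor map in one place when many jobs are submitted in a loop.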

Schedd log:
Stack dump for process 29067 at timestamp 1276179443 (20 frames)
condor_schedd(dprintf_dump_stack+0x44)[0x817d4a4]
condor_schedd[0x817f204]
[0x568420]
/usr/lib/libqpidcommon.so.2(_ZNK4qpid5types7Variant9isEqualToERKS1_+0x28)[0xfe7408]
/usr/lib/libqpidcommon.so.2(_ZN4qpid5typeseqERKNS0_7VariantES3_+0x24)[0xfe7444]
/usr/lib/condor/plugins/MgmtScheddPlugin-plugin.so(_Z24PopulateAdFromVariantMapRSt3mapISsN4qpid5types7VariantESt4lessISsESaISt4pairIKSsS2_EEER7ClassAd+0x1ef)[0x1a175f]
/usr/lib/condor/plugins/MgmtScheddPlugin-plugin.so(_ZN3com6redhat4grid15SchedulerObject6SubmitERSt3mapISsN4qpid5types7VariantESt4lessISsESaISt4pairIKSsS6_EEERSsSF_+0xb1)[0x1ab9b1]
/usr/lib/condor/plugins/MgmtScheddPlugin-plugin.so(_ZN3com6redhat4grid15SchedulerObject16ManagementMethodEjRN4qpid10management4ArgsERSs+0x255)[0x1ac5b5]
/usr/lib/condor/plugins/MgmtScheddPlugin-plugin.so(_ZN3qmf3com6redhat4grid9Scheduler8doMethodERSsRKSt3mapISsN4qpid5types7VariantESt4lessISsESaISt4pairIKSsS8_EEERSF_+0x480)[0x1a60e0]
/usr/lib/libqmf.so.1(_ZN4qpid10management19ManagementAgentImpl19invokeMethodRequestERKSsS3_S3_+0x1870)[0x2608d0]
/usr/lib/libqmf.so.1(_ZN4qpid10management19ManagementAgentImpl13pollCallbacksEj+0xdb)[0x26140b]
/usr/lib/condor/plugins/MgmtScheddPlugin-plugin.so(_ZN3com6redhat4grid16MgmtScheddPlugin16HandleMgmtSocketEP7ServiceP6Stream+0x23)[0x1bffd3]
condor_schedd(_ZN10DaemonCore24CallSocketHandler_workerEibP6Stream+0x834)[0x816b144]
condor_schedd(_ZN10DaemonCore35CallSocketHandler_worker_demarshallEPv+0x22)[0x816b422]
condor_schedd(_ZN13CondorThreads8pool_addEPFvPvES0_PiPKc+0x40)[0x8204470]
condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x130)[0x8163200]
condor_schedd(_ZN10DaemonCore6DriverEv+0x1f66)[0x81658f6]
condor_schedd(main+0xd80)[0x81773e0]
/lib/libc.so.6(__libc_start_main+0xdc)[0xb9de9c]
condor_schedd[0x80e46a1]
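
Each frame in the stack dump above has the form module(symbol+offset)[address], with the symbol part, or both the module and the symbol, missing for some frames. A minimal sketch for splitting frames into fields so the mangled C++ names can be demangled (for example with c++filt) is shown below; `FRAME_RE` and `parse_frame` are hypothetical names, and the pattern assumes only the three frame shapes that appear in this log.

```python
import re

# Matches "module(symbol+0xoff)[0xaddr]", "module[0xaddr]" and "[0xaddr]".
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[]*)"
    r"(?:\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\))?"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Return a dict of frame fields, or None if the line is not a frame."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return {
        "module": match.group("module") or None,  # None for "[0x...]" frames
        "symbol": match.group("symbol"),          # None when no symbol is given
        "offset": match.group("offset"),
        "addr": match.group("addr"),
    }


frame = parse_frame(
    "/usr/lib/libqpidcommon.so.2"
    "(_ZNK4qpid5types7Variant9isEqualToERKS1_+0x28)[0xfe7408]"
)
```

Here `frame["symbol"]` holds the mangled name of the topmost library frame, which demangles to `qpid::types::Variant::isEqualTo(qpid::types::Variant const&) const`; it is called from `PopulateAdFromVariantMap` in the schedd plugin.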

Comment 1 Matthew Farrellee 2010-06-11 11:31:10 UTC
Fixed upstream, built post 7.4.3-0.18.

Comment 2 Martin Kudlej 2010-08-02 12:39:57 UTC
I've tested it on RHEL 4.8/5.5 x x86_64/i386 with more than 1000 jobs submitted over QMF, and it works with no exceptions in the SchedLog. --> VERIFIED