Bug 773680
Summary: | Released job doesn't start | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Stanislav Graf <sgraf> |
Component: | condor-qmf | Assignee: | Pete MacKinnon <pmackinn> |
Status: | CLOSED ERRATA | QA Contact: | MRG Quality Engineering <mrgqe-bugs> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | Development | CC: | iboverma, jneedle, ltoscano, matt, pmackinn, tstclair |
Target Milestone: | 2.1.1 | Keywords: | Regression |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | condor-7.6.5-0.12 | Doc Type: | Bug Fix |
Doc Text: |
Cause: A job is held using the Aviary or QMF job control API.
Consequence: condor_q and the Aviary and QMF API calls that check job status indicate that the job remains marked as IDLE after release, despite being restarted by the scheduler.
Fix: The condor_schedd code that represents an internal API for use by Aviary and QMF implementations was updated to ensure that the held job's state was correctly adjusted.
Result: Once a job is held using the Aviary or QMF API, condor_q and the Aviary and QMF API calls that check job status indicate the correct job transition of HELD->IDLE->RUNNING after release.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2012-02-06 18:19:14 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 739658, 765607 |
Description
Stanislav Graf
2012-01-12 15:08:16 UTC
The released job does actually run, but its job state has not been correctly set back to RUNNING for some reason.

The root of the problem is that there have been some changes that have affected Scheduler::holdJobRaw, resulting in a regression. The corrective action is for the QMF & Aviary layers to call an additional scheduler.enqueueActOnJobMyself(id,JA_HOLD_JOBS,true) after the holdJob transaction.

VERIFY QMF part:

# rpm -q condor-aviary
package condor-aviary is not installed

RHEL5 i386
condor-qmf-7.6.5-0.12.el5.i386
cumin-0.1.5184-1.el5.noarch

RHEL5 x86_64
condor-qmf-7.6.5-0.12.el5.x86_64
cumin-0.1.5184-1.el5.noarch

RHEL6 i386
condor-qmf-7.6.5-0.12.el6.i686
cumin-0.1.5184-1.el6.noarch

RHEL6 x86_64
condor-qmf-7.6.5-0.12.el6.x86_64
cumin-0.1.5184-1.el6.noarch

---CUMIN PART---
-Grid::Submission::Submit job (aaa, /bin/sleep 3600, true, /tmp)
-Grid::Submission::aaa
-Wait until job status is "Running"
-verify status also with condor_q - R
-Select job and click on "Hold"
-Wait until job status is "Held"
-verify status also with condor_q - H
-click on "Release"
-Wait until job status is "Running"
-verify status also with condor_q - R

---CONDOR PART---
-condor_hold
-Wait until job status is "Held" in cumin
-verify status also with condor_q - H
-condor_release
-Wait until job status is "Running" in cumin
-verify status also with condor_q - R

---CUMIN PART---
-click on "Remove"
-You should be now in Grid::Submission
-Wait until job disappears
-verify also with condor_q

VERIFY AVIARY part:

RHEL5 i386
condor-aviary-7.6.5-0.12.el5.i386
condor-qmf-7.6.5-0.12.el5.i386
cumin-0.1.5184-1.el5.noarch

RHEL5 x86_64
condor-aviary-7.6.5-0.12.el5.x86_64
condor-qmf-7.6.5-0.12.el5.x86_64
cumin-0.1.5184-1.el5.noarch

RHEL6 i386
condor-aviary-7.6.5-0.12.el6.i686
condor-qmf-7.6.5-0.12.el6.i686
cumin-0.1.5184-1.el6.noarch

RHEL6 x86_64
condor-aviary-7.6.5-0.12.el6.x86_64
condor-qmf-7.6.5-0.12.el6.x86_64
cumin-0.1.5184-1.el6.noarch

Add to cumin.conf:
aviary-job-servers: http://localhost:9090
aviary-query-servers: http://localhost:9091
aviary-suds-logs: True
log-level: debug

# grep Aviary /var/log/cumin/web.log
DEBUG AviaryOperations: suds logging on
INFO AviaryOperations: no root certificate file specified, using client validation only for ssl connections.
INFO Enabled Aviary interface for job submission and control.
INFO Enabled Aviary interface for query operations.

---CUMIN PART---
-Grid::Submission::Submit job (aaa, /bin/sleep 3600, true, /tmp)
-Grid::Submission::aaa
-Wait until job status is "Running"
-verify status also with condor_q - R
-Select job and click on "Hold"
-Wait until job status is "Held"
-verify status also with condor_q - H
-click on "Release"
-Wait until job status is "Running"
-verify status also with condor_q - R

---CONDOR PART---
-condor_hold
-Wait until job status is "Held" in cumin
-verify status also with condor_q - H
-condor_release
-Wait until job status is "Running" in cumin
-verify status also with condor_q - R

---CUMIN PART---
-click on "Remove"
-Now I hit Bug 783139 (because I used the same name for the job as in the QMF test)
-When I use a different name, the test passes.
-You should be now in Grid::Submission
-Wait until job disappears
-verify also with condor_q

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
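The hold/release checks in the verification steps above could be expressed as a small helper that inspects a series of polled job statuses. This is only a sketch under assumptions: the function names are hypothetical, the samples are assumed to be numeric HTCondor JobStatus values (1 = Idle, 2 = Running, 3 = Removed, 4 = Completed, 5 = Held) collected by the tester from the moment the job is held until it runs again, and accepting HELD->RUNNING without an observed IDLE is a choice made here because a slow poll may miss the intermediate state.

```python
# HTCondor JobStatus codes and the one-letter codes condor_q shows for them.
JOB_STATUS = {1: "IDLE", 2: "RUNNING", 3: "REMOVED", 4: "COMPLETED", 5: "HELD"}
CONDOR_Q_LETTER = {"IDLE": "I", "RUNNING": "R", "REMOVED": "X",
                   "COMPLETED": "C", "HELD": "H"}


def collapse(samples):
    """Drop consecutive repeats so polled samples become a list of transitions."""
    transitions = []
    for status in samples:
        if not transitions or transitions[-1] != status:
            transitions.append(status)
    return transitions


def release_transition_ok(samples):
    """Check samples polled from hold until the job runs again.

    Hypothetical helper: returns True for HELD -> IDLE -> RUNNING, or for
    HELD -> RUNNING when the poll missed the intermediate IDLE state.
    """
    names = [JOB_STATUS[code] for code in samples]
    return collapse(names) in (["HELD", "IDLE", "RUNNING"], ["HELD", "RUNNING"])


if __name__ == "__main__":
    print(release_transition_ok([5, 5, 1, 1, 2, 2]))  # True
    print(release_transition_ok([5, 5, 1, 1, 1]))     # False: stuck in IDLE
```

A sequence that stays at IDLE after release, which is the symptom reported in this bug, is rejected, while the corrected HELD->IDLE->RUNNING sequence is accepted.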
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2012-0100.html