Bug 718265 - low-latency not expiring work
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-low-latency
Version: 1.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: 2.0.1
Target Release: ---
Assigned To: Robert Rati
QA Contact: Lubos Trilety
Docs Contact:
Depends On: 723971
Blocks: 723887
 
Reported: 2011-07-01 11:42 EDT by Robert Rati
Modified: 2011-09-07 12:43 EDT
CC List: 4 users

Fixed In Version: condor-low-latency-1.2-1
Doc Type: Bug Fix
Doc Text:
C: When a job causes the condor_starter to exit quickly (for example, because the starter cannot execute the job's program), the low-latency daemon does not expire the low-latency job.
C: The low-latency daemon does not allow the slot that ran the job to do any more work until the daemon is restarted.
F: Fixed issues with message expiration.
R: Messages are expired as expected.
Last Closed: 2011-09-07 12:43:29 EDT



External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2011:1249
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise MRG Grid 2.0 security, bug fix and enhancement update
Last Updated: 2011-09-07 12:40:45 EDT

Description Robert Rati 2011-07-01 11:42:18 EDT
Description of problem:
Running the file_no_perms.py test causes the starter to exit with an exception without calling the exit hook. Carod's lease-checking thread does not expire a job that is no longer being updated, so carod will not allow that slot to process more work. The only workaround is to restart carod.
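The behavior this report expects from the lease-checking thread can be sketched in Python. This is an illustration under stated assumptions, not carod's implementation: the `LeaseTable` class, its method names, and the 30-second timeout are hypothetical. A slot's lease is renewed only by real job activity, and a periodic check expires any lease older than the timeout, so a job whose starter exits without calling the exit hook should eventually be released.

```python
import time
from typing import Dict, List, Optional


class LeaseTable:
    """Hypothetical per-slot lease bookkeeping (illustration only)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        # slot name -> timestamp of the running job's last update
        self.last_update: Dict[str, float] = {}

    def touch(self, slot: str, now: Optional[float] = None) -> None:
        """Start or renew a lease; only real job activity should call this."""
        self.last_update[slot] = time.time() if now is None else now

    def is_busy(self, slot: str) -> bool:
        """Report whether the slot currently holds a lease (no side effects)."""
        return slot in self.last_update

    def expire_stale(self, now: Optional[float] = None) -> List[str]:
        """Remove and return every slot whose lease is older than the timeout."""
        now = time.time() if now is None else now
        stale = [s for s, t in self.last_update.items() if now - t > self.timeout]
        for slot in stale:
            del self.last_update[slot]
        return stale


# Explicit timestamps: slot2 keeps updating, slot1 goes silent
# (as when the starter exits without calling the exit hook).
leases = LeaseTable(timeout=30.0)
leases.touch("slot1", now=0.0)
leases.touch("slot2", now=0.0)
leases.touch("slot2", now=25.0)
print(leases.expire_stale(now=40.0))  # ['slot1']
```

Passing `now` explicitly keeps the example deterministic; a real checker would call `expire_stale()` periodically and let it read the clock.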

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Run carod with debug logging.
2. Run file_no_perms.py.
3. Watch CaroLog and note that the slot is continually reported as doing work.

  
Actual results:


Expected results:


Additional info:
Comment 1 Robert Rati 2011-07-01 15:26:04 EDT
There were 2 issues with message expiration:
1) Messages were never expired, because the check for whether a slot was in use reset the access time.
2) Once a message was expired, it could not be removed from the work queue, so a slot that was actually empty stayed "busy".

Fixed on BZ718265-no-expiration
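The two defects in Comment 1 can be illustrated with a short Python sketch. It is not the code from the fix: `SlotTracker`, its methods, and the timeout are hypothetical names chosen for illustration. `in_use_buggy` reproduces defect 1 by refreshing the access time on every check, `in_use` is the read-only alternative, and `expire` shows fix 2 by also removing the expired entry from the work queue.

```python
from collections import deque
from typing import Deque, Dict, List


class SlotTracker:
    """Hypothetical tracker contrasting the two defects with their fixes."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.access_time: Dict[str, float] = {}  # slot -> last access time
        self.work_queue: Deque[str] = deque()    # slots currently holding work

    def assign(self, slot: str, now: float) -> None:
        self.access_time[slot] = now
        self.work_queue.append(slot)

    def in_use_buggy(self, slot: str, now: float) -> bool:
        # Defect 1: merely checking the slot refreshes its access time,
        # so a job that never sends updates never looks stale.
        if slot in self.access_time:
            self.access_time[slot] = now
            return True
        return False

    def in_use(self, slot: str) -> bool:
        # Fix 1: the check is read-only and leaves the access time alone.
        return slot in self.access_time

    def expire(self, now: float) -> List[str]:
        expired = [s for s, t in self.access_time.items() if now - t > self.timeout]
        for slot in expired:
            del self.access_time[slot]
            # Fix 2: drop the expired entry from the work queue too;
            # leaving it there keeps an empty slot marked "busy".
            self.work_queue.remove(slot)
        return expired


buggy = SlotTracker(timeout=30.0)
buggy.assign("slot1", now=0.0)
buggy.in_use_buggy("slot1", now=20.0)  # resets the access time to 20
print(buggy.expire(now=40.0))          # [] : never expires

fixed = SlotTracker(timeout=30.0)
fixed.assign("slot1", now=0.0)
fixed.in_use("slot1")
print(fixed.expire(now=40.0), list(fixed.work_queue))  # ['slot1'] []
```

With the buggy check called on every pass of the lease checker, the access time keeps moving forward, which matches the "slot is continually thought to be doing work" symptom in the description.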
Comment 2 Robert Rati 2011-07-01 16:39:31 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: When a job causes the condor_starter to exit quickly (for example, because the starter cannot execute the job's program), the low-latency daemon does not expire the low-latency job.
C: The low-latency daemon does not allow the slot that ran the job to do any more work until the daemon is restarted.
F: Fixed issues with message expiration.
R: Messages are expired as expected.
Comment 3 Martin Kudlej 2011-07-21 06:33:45 EDT
Tested on RHEL5.6/6.1 x x86_64/i386 with condor-low-latency-1.1-3 and it doesn't work.
Comment 5 Lubos Trilety 2011-08-02 07:29:15 EDT
The job expired after some time and the slot was released.
(There is an issue with exit hook, see bug 726761.)

Tested with:
condor-low-latency-1.2-2

Tested on:
RHEL6 x86_64,i386
RHEL5 x86_64,i386

>>> VERIFIED
Comment 6 errata-xmlrpc 2011-09-07 12:43:29 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1249.html
