Bug 732797 - performance issue - n^2 fsync()ing algorithm in dedicated scheduler
Summary: performance issue - n^2 fsync()ing algorithm in dedicated scheduler
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.0
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: 2.1
Assignee: Timothy St. Clair
QA Contact: Tomas Rusnak
URL:
Whiteboard:
Depends On:
Blocks: 743350
Reported: 2011-08-23 16:26 UTC by Timothy St. Clair
Modified: 2012-06-14 10:36 UTC (History)

Fixed In Version: condor-7.6.4-0.2
Doc Type: Bug Fix
Doc Text:
Code analysis revealed a sub-optimal configuration in the dedicated scheduler. As a consequence, the scheduler performed slower than expected. The fsync algorithm for the dedicated scheduler has now been updated, and the performance of the scheduler has increased.
Clone Of:
Environment:
Last Closed: 2012-01-23 17:28:10 UTC


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2012:0045 normal SHIPPED_LIVE Red Hat Enterprise MRG Grid 2.1 bug fix and enhancement update 2012-01-23 22:22:58 UTC
Condor 2367 None None None 2012-06-14 10:36:18 UTC

Description Timothy St. Clair 2011-08-23 16:26:21 UTC
Description of problem:
performance issue - n^2 fsync()ing algorithm in dedicated scheduler

Version-Release number of selected component (if applicable):
2.0

How reproducible:
100%

Details found in Upstream Tracking:
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2367
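
For readers unfamiliar with the pattern, here is a toy model of how an fsync count can grow as n^2. It is not Condor code, and the upstream ticket remains the authority on the actual cause. The function name write_records, the pass structure, and the record format are all hypothetical; the model simply assumes that each of n passes touches all n procs, and compares syncing after every write with syncing once per pass.

```python
import os
import tempfile


def write_records(n_procs, batched):
    """Return how many fsync() calls a toy log writer issues.

    Hypothetical model (not Condor's implementation): on each of
    n_procs passes, one record is appended for every proc.
    """
    fsync_calls = 0
    with tempfile.TemporaryFile() as log:
        for _ in range(n_procs):
            for proc in range(n_procs):
                log.write(f"proc {proc} updated\n".encode())
                if not batched:
                    # Naive variant: force every record to disk on its own.
                    log.flush()
                    os.fsync(log.fileno())
                    fsync_calls += 1
            if batched:
                # Batched variant: one sync covers the whole pass.
                log.flush()
                os.fsync(log.fileno())
                fsync_calls += 1
    return fsync_calls
```

Under these assumptions the naive variant issues n * n syncs and the batched variant issues n, so for n = 25 the model gives 625 versus 25. The model is meant to show the growth rate only, not to reproduce the counts measured in this bug.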

Comment 2 Matthew Farrellee 2011-08-29 11:33:39 UTC
This can be tested with an strace -c on the condor_schedd and a parallel universe job containing many procs (queue 25). Look for the number of fsync calls.
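
A minimal sketch for pulling the fsync call count out of a saved strace -c summary, which may help when comparing runs. The helper name fsync_calls is an assumption made for illustration; it relies only on the usual column layout of the summary table (% time, seconds, usecs/call, calls, optional errors, syscall).

```python
def fsync_calls(strace_summary, syscall="fsync"):
    """Return the 'calls' value for `syscall` in `strace -c` output, or 0."""
    for line in strace_summary.splitlines():
        fields = line.split()
        # Data rows end with the syscall name; the call count is the
        # fourth column whether or not the errors column is filled in.
        if len(fields) >= 5 and fields[-1] == syscall:
            return int(fields[3])
    return 0
```

The summary can be captured to a file with strace -c -o summary.txt -p <pid of condor_schedd>, then read and passed to the helper as a string.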

Comment 3 Timothy St. Clair 2011-09-23 20:41:33 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: Code analysis revealed sub-optimal configuration in the dedicated scheduler  
C: Slower than expected performance
F: Change fsync algo for dedicated scheduler
R: Increased performance

Comment 5 Tomas Rusnak 2011-10-20 10:47:22 UTC
Reproduced on:

$CondorVersion: 7.6.0 Mar 30 2011 BuildID: RH-7.6.0-0.4.el5 PRE-RELEASE-GRID $
$CondorPlatform: X86_64-Redhat_5.6 $

# cat dedicated.job 
universe = parallel
cmd = /bin/sleep
args = 1
should_transfer_files = if_needed
when_to_transfer_output = on_exit
machine_count=1
queue 25

Config:
DedicatedScheduler = "DedicatedScheduler@localhost"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler


# strace -c
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 82.95    1.244577          47     26458           fsync

Comment 6 Tomas Rusnak 2011-10-20 15:09:44 UTC
RHEL6_64:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 10.61    0.013997          46       302           fsync

RHEL5_64:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 11.88    0.002997          42        72           fsync


RHEL6_32:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 32.59    0.012998          72       180           fsync

RHEL5_32:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 32.95    0.003996          78        51           fsync


The performance issue appears to be resolved. The number of fsync calls differs slightly between platforms, but all counts are at an acceptable level with no performance hit.

>>> VERIFIED

Comment 7 Tomas Rusnak 2011-10-20 15:13:23 UTC
Verification done on condor-7.6.4-0.8.

Comment 8 Tomas Capek 2011-11-16 16:14:53 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,4 +1 @@
-C: Code analysis revealed sub-optimal configuration in the dedicated scheduler  
+Code analysis revealed sub-optimal configuration in the dedicated scheduler. Consequence of this was slower than expected performance of the scheduler. Now, the fsync algorithm for the dedicated scheduler has been updated and the performance of the scheduler increased.
-C: Slower than expected performance
-F: Change fsync algo for dedicated scheduler
-R: Increased performance

Comment 9 errata-xmlrpc 2012-01-23 17:28:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0045.html