Bug 732797

Summary: performance issue - n^2 fsync()ing algorithm in dedicated scheduler
Product: Red Hat Enterprise MRG Reporter: Timothy St. Clair <tstclair>
Component: condor Assignee: Timothy St. Clair <tstclair>
Status: CLOSED ERRATA QA Contact: Tomas Rusnak <trusnak>
Severity: medium Docs Contact:
Priority: medium    
Version: 2.0 CC: dahorak, ltoscano, matt, mkudlej, trusnak, tstclair
Target Milestone: 2.1   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: condor-7.6.4-0.2 Doc Type: Bug Fix
Doc Text:
Code analysis revealed a sub-optimal configuration in the dedicated scheduler. As a consequence, the scheduler performed slower than expected. The fsync algorithm for the dedicated scheduler has now been updated, and the performance of the scheduler has increased.
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-01-23 17:28:10 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 743350    

Description Timothy St. Clair 2011-08-23 16:26:21 UTC
Description of problem:
performance issue - n^2 fsync()ing algorithm in dedicated scheduler

Version-Release number of selected component (if applicable):
2.0

How reproducible:
100%

Details found in Upstream Tracking:
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2367
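To illustrate what "n^2 fsync()ing" means in general terms, here is a minimal sketch. It is not the condor_schedd code and makes no claim about where the real loop lives (see the upstream ticket for that); the class and function names are hypothetical, and a counter stands in for the real fsync() call. It does not attempt to reproduce the counts reported later in this bug.

```python
class CountingLog:
    """Hypothetical stand-in for a transaction log; counts would-be fsync() calls."""

    def __init__(self):
        self.fsync_calls = 0

    def commit(self):
        # A real log would call os.fsync(fd) here; this sketch only counts.
        self.fsync_calls += 1


def update_per_pair(log, n_procs):
    """Quadratic pattern: for each proc, revisit every proc and commit each time."""
    for _ in range(n_procs):
        for _ in range(n_procs):
            log.commit()


def update_batched(log, n_procs):
    """Linear pattern: apply a proc's updates in memory, then commit once per proc."""
    for _ in range(n_procs):
        log.commit()


def count(strategy, n_procs):
    """Run a strategy against a fresh log and return the number of commits."""
    log = CountingLog()
    strategy(log, n_procs)
    return log.fsync_calls


if __name__ == "__main__":
    for n in (5, 25, 100):
        print(n, count(update_per_pair, n), count(update_batched, n))
```

The point of the sketch is only the growth rate: the first pattern commits n * n times for n procs, the second n times.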

Comment 2 Matthew Farrellee 2011-08-29 11:33:39 UTC
This can be tested with an strace -c on the condor_schedd and a parallel universe job containing many procs (queue 25). Look for the number of fsync calls.
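A small helper can pull the fsync count out of the strace -c summary so that runs are easy to compare. This is a sketch only: the function name is hypothetical, the sample text is the summary row reported later in this bug, and how strace is attached to condor_schedd is not specified here, so the helper works on already-captured output.

```python
SAMPLE_SUMMARY = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 82.95    1.244577          47     26458           fsync
"""


def syscall_count(strace_summary, syscall):
    """Return the 'calls' value for `syscall` from `strace -c` summary text, or 0."""
    for line in strace_summary.splitlines():
        fields = line.split()
        # Data rows look like: % time, seconds, usecs/call, calls, [errors], syscall.
        # The errors column may be empty, so the syscall name is always the last field.
        if len(fields) >= 5 and fields[-1] == syscall and fields[3].isdigit():
            return int(fields[3])
    return 0


if __name__ == "__main__":
    print("fsync calls:", syscall_count(SAMPLE_SUMMARY, "fsync"))
```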

Comment 3 Timothy St. Clair 2011-09-23 20:41:33 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: Code analysis revealed sub-optimal configuration in the dedicated scheduler  
C: Slower than expected performance
F: Change fsync algo for dedicated scheduler
R: Increased performance

Comment 5 Tomas Rusnak 2011-10-20 10:47:22 UTC
Reproduced on:

$CondorVersion: 7.6.0 Mar 30 2011 BuildID: RH-7.6.0-0.4.el5 PRE-RELEASE-GRID $
$CondorPlatform: X86_64-Redhat_5.6 $

# cat dedicated.job 
universe = parallel
cmd = /bin/sleep
args = 1
should_transfer_files = if_needed
when_to_transfer_output = on_exit
machine_count=1
queue 25

Config:
DedicatedScheduler = "DedicatedScheduler@localhost"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler


# strace -c
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 82.95    1.244577          47     26458           fsync

Comment 6 Tomas Rusnak 2011-10-20 15:09:44 UTC
RHEL6_64:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 10.61    0.013997          46       302           fsync

RHEL5_64:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 11.88    0.002997          42        72           fsync


RHEL6_32:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 32.59    0.012998          72       180           fsync

RHEL5_32:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 32.95    0.003996          78        51           fsync


The performance issue appears to be resolved. The number of fsync calls differs slightly between platforms, but all counts are at an acceptable level with no performance hit.

>>> VERIFIED
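For a rough sense of scale, the counts above can be compared with the 26458 fsync calls reproduced in Comment 5. This is a sketch with hypothetical names; the baseline was measured on a single platform (X86_64-Redhat_5.6), so the ratios for the other platforms are only indicative.

```python
# fsync calls reproduced before the fix (Comment 5, X86_64-Redhat_5.6).
BEFORE = 26458

# fsync calls measured after the fix, as listed in this comment.
AFTER = {
    "RHEL6 64-bit": 302,
    "RHEL5 64-bit": 72,
    "RHEL6 32-bit": 180,
    "RHEL5 32-bit": 51,
}


def reduction_factor(before, after):
    """How many times fewer calls were made after the fix."""
    return before / after


if __name__ == "__main__":
    for platform, calls in AFTER.items():
        factor = reduction_factor(BEFORE, calls)
        print(f"{platform}: {calls} fsync calls, about {factor:.0f}x fewer")
```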

Comment 7 Tomas Rusnak 2011-10-20 15:13:23 UTC
Verification done on condor-7.6.4-0.8.

Comment 8 Tomas Capek 2011-11-16 16:14:53 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,4 +1 @@
-C: Code analysis revealed sub-optimal configuration in the dedicated scheduler  
+Code analysis revealed sub-optimal configuration in the dedicated scheduler. Consequence of this was slower then expected performance of the scheduler. Now, the fsync algorithm for the dedicated scheduler has been updated and the performance of the scheduler increased.
-C: Slower then expected performance
-F: Change fsync algo for dedicated scheduler
-R: Increased performance

Comment 9 errata-xmlrpc 2012-01-23 17:28:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0045.html