Description of problem:
performance issue - n^2 fsync()ing algorithm in dedicated scheduler

Version-Release number of selected component (if applicable):
2.0

How reproducible:
100%

Details found in Upstream Tracking:
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2367
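To make the "n^2 fsync()ing" description concrete, here is a minimal sketch of how syncing after every write inside a nested loop produces a quadratic number of fsync calls, and how syncing once per outer iteration reduces it. This is an illustration only: it is not the condor_schedd code, and the names `CountingLog`, `quadratic_updates`, `batched_updates` and `count_fsyncs` are hypothetical. The upstream ticket has the actual details of the algorithm.

```python
import os
import tempfile


class CountingLog:
    """Append-only log file that counts how often it is fsync()ed (hypothetical helper)."""

    def __init__(self, path):
        self._fh = open(path, "a")
        self.fsync_calls = 0

    def append(self, record):
        self._fh.write(record + "\n")

    def sync(self):
        # Flush Python's buffer, then force the data to disk.
        self._fh.flush()
        os.fsync(self._fh.fileno())
        self.fsync_calls += 1

    def close(self):
        self._fh.close()


def quadratic_updates(log, n):
    """Sync after every single write: n * n fsync calls."""
    for i in range(n):
        for j in range(n):
            log.append(f"proc {i} update {j}")
            log.sync()


def batched_updates(log, n):
    """Write the same records, but sync once per proc: n fsync calls."""
    for i in range(n):
        for j in range(n):
            log.append(f"proc {i} update {j}")
        log.sync()


def count_fsyncs(update_fn, n):
    """Run update_fn against a temporary log and return the fsync count."""
    with tempfile.TemporaryDirectory() as tmp:
        log = CountingLog(os.path.join(tmp, "example.log"))
        try:
            update_fn(log, n)
        finally:
            log.close()
        return log.fsync_calls


if __name__ == "__main__":
    n = 25  # same proc count as "queue 25" in the reproducer
    print("sync per write:", count_fsyncs(quadratic_updates, n))
    print("sync per proc: ", count_fsyncs(batched_updates, n))
```

Both variants write the same records; only the number of sync points differs, which is the general shape of the problem described here.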
This can be tested by running strace -c against the condor_schedd while it handles a parallel universe job containing many procs (queue 25). Look for the number of fsync calls in the summary.
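For anyone who wants to automate the check, the following is a rough sketch of pulling the fsync call count out of a saved strace -c summary. It assumes the summary was captured to a file (for example with something like `strace -c -f -p <schedd pid> -o summary.txt`) and that it uses the usual six-column layout. The function name `parse_strace_summary` is made up for this example.

```python
def parse_strace_summary(text):
    """Return {syscall: calls} parsed from `strace -c` summary text.

    Assumes the usual column order:
    % time, seconds, usecs/call, calls, errors (may be blank), syscall.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row has 5 fields (no errors) or 6 fields (with errors).
        if len(fields) not in (5, 6):
            continue
        try:
            float(fields[0])          # % time
            float(fields[1])          # seconds
            int(fields[2])            # usecs/call
            calls = int(fields[3])    # calls
        except ValueError:
            continue                  # header or separator line
        syscall = fields[-1]
        if syscall == "total":
            continue                  # skip the summary footer row
        counts[syscall] = calls
    return counts


if __name__ == "__main__":
    sample = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 82.95    1.244577          47     26458           fsync
"""
    print(parse_strace_summary(sample).get("fsync", 0))  # prints 26458
```

A count that grows much faster than the number of procs in the job is the symptom to look for.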
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
C: Code analysis revealed sub-optimal configuration in the dedicated scheduler
C: Slower than expected performance
F: Change fsync algo for dedicated scheduler
R: Increased performance
Reproduced on:
$CondorVersion: 7.6.0 Mar 30 2011 BuildID: RH-7.6.0-0.4.el5 PRE-RELEASE-GRID $
$CondorPlatform: X86_64-Redhat_5.6 $

# cat dedicated.job
universe = parallel
cmd = /bin/sleep
args = 1
should_transfer_files = if_needed
when_to_transfer_output = on_exit
machine_count=1
queue 25

Config:
DedicatedScheduler = "DedicatedScheduler@localhost"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler

# strace -c
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 82.95    1.244577          47     26458           fsync
RHEL6_64:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 10.61    0.013997          46       302           fsync

RHEL5_64:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 11.88    0.002997          42        72           fsync

RHEL6_32:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 32.59    0.012998          72       180           fsync

RHEL5_32:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 32.95    0.003996          78        51           fsync

The performance issue appears to be resolved. The number of fsync calls differs slightly between platforms, but all counts are at an acceptable level with no performance hit.

>>> VERIFIED
Verification done on condor-7.6.4-0.8.
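As a rough sanity check on the numbers in this report, the sketch below compares the fsync count from the reproducer (26458) with the counts seen during verification. It assumes the verification runs used the same 25-proc job, which the report does not state explicitly, and the baseline was only measured on X86_64-Redhat_5.6, so the cross-platform ratios are approximate. The names `BEFORE_FIX`, `AFTER_FIX` and `reduction_factors` are made up for this example.

```python
# fsync call counts copied from the strace -c summaries in this report.
BEFORE_FIX = 26458  # condor 7.6.0 on X86_64-Redhat_5.6
AFTER_FIX = {
    "RHEL6_64": 302,
    "RHEL5_64": 72,
    "RHEL6_32": 180,
    "RHEL5_32": 51,  # listed as RHEL5/32 in the original comment
}


def reduction_factors(before, after):
    """Return {platform: before / after_calls} for each verified platform."""
    return {platform: before / calls for platform, calls in after.items()}


if __name__ == "__main__":
    factors = reduction_factors(BEFORE_FIX, AFTER_FIX)
    for platform, factor in sorted(factors.items()):
        print(f"{platform}: roughly {factor:.0f}x fewer fsync calls")
```

Even on the platform with the most calls after the fix (RHEL6_64), the count is lower by well over an order of magnitude.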
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,4 +1 @@
-C: Code analysis revealed sub-optimal configuration in the dedicated scheduler
-C: Slower than expected performance
-F: Change fsync algo for dedicated scheduler
-R: Increased performance
+Code analysis revealed sub-optimal configuration in the dedicated scheduler. Consequence of this was slower than expected performance of the scheduler. Now, the fsync algorithm for the dedicated scheduler has been updated and the performance of the scheduler increased.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2012-0045.html