Bug 732797
| Summary: | performance issue - n^2 fsync()ing algorithm in dedicated scheduler | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Timothy St. Clair <tstclair> |
| Component: | condor | Assignee: | Timothy St. Clair <tstclair> |
| Status: | CLOSED ERRATA | QA Contact: | Tomas Rusnak <trusnak> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 2.0 | CC: | dahorak, ltoscano, matt, mkudlej, trusnak, tstclair |
| Target Milestone: | 2.1 | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | condor-7.6.4-0.2 | Doc Type: | Bug Fix |
| Doc Text: | Code analysis revealed a sub-optimal configuration in the dedicated scheduler. As a consequence, the scheduler performed slower than expected. The fsync algorithm for the dedicated scheduler has been updated, and the performance of the scheduler has increased. | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2012-01-23 17:28:10 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 743350 | ||
Description
Timothy St. Clair
2011-08-23 16:26:21 UTC
This can be tested by running strace -c against the condor_schedd while submitting a parallel universe job containing many procs (for example, queue 25). Look for the number of fsync calls in the summary.
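To avoid reading the fsync count off the summary by eye, the check can be scripted. The following is a minimal sketch that assumes the usual `strace -c` summary layout (% time, seconds, usecs/call, calls, optional errors, syscall). The function name `syscall_calls` and the `sample` variable are illustrative and are not part of Condor or strace.

```python
def syscall_calls(strace_summary: str, syscall: str = "fsync") -> int:
    """Return the 'calls' count for one syscall from `strace -c` summary text.

    Returns 0 if the syscall does not appear in the summary.
    """
    for line in strace_summary.splitlines():
        fields = line.split()
        # Data rows look like: %time seconds usecs/call calls [errors] syscall
        # The errors column may be blank, so 'calls' is always the 4th field
        # and the syscall name is always the last field.
        if len(fields) >= 5 and fields[-1] == syscall and fields[3].isdigit():
            return int(fields[3])
    return 0


# Sample row taken from the reproduction reported in this bug.
sample = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 82.95    1.244577          47     26458           fsync
"""

print(syscall_calls(sample))
```

The summary text could be captured with strace's `-o` option and passed to the function as a string.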
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
C: Code analysis revealed sub-optimal configuration in the dedicated scheduler
C: Slower than expected performance
F: Change fsync algo for dedicated scheduler
R: Increased performance
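The notes above do not show the actual patch. Purely as an illustration of why syncing inside a nested loop scales quadratically, the sketch below counts fsync calls for a hypothetical log writer that either syncs after every update or syncs once per batch. The function `log_updates` and the per-proc update pattern are assumptions made for the example and do not reflect the condor_schedd code.

```python
import os
import tempfile


def log_updates(n_procs: int, batched: bool) -> int:
    """Write n_procs * n_procs hypothetical updates and return the fsync count.

    batched=False syncs after every update (quadratic in n_procs);
    batched=True syncs once after all updates are written.
    """
    syncs = 0
    with tempfile.TemporaryFile() as log:
        for proc in range(n_procs):
            for other in range(n_procs):
                log.write(f"{proc} {other}\n".encode())
                if not batched:
                    # Flush Python's buffer, then force the data to disk.
                    log.flush()
                    os.fsync(log.fileno())
                    syncs += 1
        if batched:
            # One flush and one fsync cover the whole batch.
            log.flush()
            os.fsync(log.fileno())
            syncs += 1
    return syncs


print(log_updates(25, batched=False), log_updates(25, batched=True))
```

With 25 procs the unbatched variant issues 625 fsync calls and the batched variant issues 1, which shows the shape of the problem rather than the exact counts observed in the scheduler.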
Reproduced on:
$CondorVersion: 7.6.0 Mar 30 2011 BuildID: RH-7.6.0-0.4.el5 PRE-RELEASE-GRID $
$CondorPlatform: X86_64-Redhat_5.6 $
# cat dedicated.job
universe = parallel
cmd = /bin/sleep
args = 1
should_transfer_files = if_needed
when_to_transfer_output = on_exit
machine_count=1
queue 25
Config:
DedicatedScheduler = "DedicatedScheduler@localhost"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
# strace -c
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
82.95 1.244577 47 26458 fsync
RHEL6_64:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
10.61 0.013997 46 302 fsync
RHEL5_64:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
11.88 0.002997 42 72 fsync
RHEL6_32:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
32.59 0.012998 72 180 fsync
RHEL5_32:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
32.95 0.003996 78 51 fsync
The performance issue appears to be resolved. The number of fsync calls differs slightly between platforms (from 51 to 302, compared with 26458 in the reproduction), but all counts are at an acceptable level with no performance hit.
>>> VERIFIED
Verification done on condor-7.6.4-0.8.
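For a rough sense of scale, the fsync counts reported in this bug can be compared directly. The sketch below assumes that the verification runs used the same 25-proc job as the reproduction, which the report does not state explicitly. The names `BEFORE_FIX`, `AFTER_FIX`, and `reduction_factors` are illustrative.

```python
# fsync call counts copied from the strace -c summaries in this report.
BEFORE_FIX = 26458  # condor 7.6.0, X86_64-Redhat_5.6
AFTER_FIX = {
    "RHEL6_64": 302,
    "RHEL5_64": 72,
    "RHEL6_32": 180,
    "RHEL5_32": 51,
}


def reduction_factors(before: int, after: dict) -> dict:
    """Return how many times fewer fsync calls each platform made after the fix."""
    return {platform: before / calls for platform, calls in after.items()}


for platform, factor in reduction_factors(BEFORE_FIX, AFTER_FIX).items():
    print(f"{platform}: {AFTER_FIX[platform]} calls, about {factor:.0f}x fewer")
```

Only the RHEL5_64 row shares a platform with the reproduction, so it is the closest like-for-like comparison.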
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1,4 +1 @@
-C: Code analysis revealed sub-optimal configuration in the dedicated scheduler
+Code analysis revealed sub-optimal configuration in the dedicated scheduler. Consequence of this was slower than expected performance of the scheduler. Now, the fsync algorithm for the dedicated scheduler has been updated and the performance of the scheduler increased.
-C: Slower than expected performance
-F: Change fsync algo for dedicated scheduler
-R: Increased performance
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHEA-2012-0045.html