Bug 732821

Summary: negotiator updates user priorities with one transaction per user
Product: Red Hat Enterprise MRG
Reporter: Erik Erlandson <eerlands>
Component: condor
Assignee: Erik Erlandson <eerlands>
Status: CLOSED ERRATA
QA Contact: Tomas Rusnak <trusnak>
Severity: low
Docs Contact:
Priority: high
Version: 1.3
CC: eerlands, ltoscano, matt, mkudlej, trusnak, tstclair
Target Milestone: 2.1
Keywords: Upstream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: condor-7.6.4-0.1
Doc Type: Bug Fix
Doc Text:
Previously, the Negotiator daemon updated the accountant with more fsync() operations than required for maintaining transactions. Consequently, operations involving large numbers of submitters resulted in a performance hit. With this update, the use of fsync() calls has been minimized in the accountant transactions and performance of Negotiator on pools with large numbers of submitter records improved significantly.
Last Closed: 2012-01-23 17:28:13 UTC
Type: ---
Bug Blocks: 743350    

Description Erik Erlandson 2011-08-23 18:19:49 UTC
Description of problem:

From upstream:
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2342

2011-Aug-16 16:41:04 by danb:
We found that there is an additional fsync-related performance problem, this time in Accountant::CheckMatches(). This took over an hour in a case where a large number of machines transitioned from claimed to unclaimed.

I assert that AddMatch() and RemoveMatch() should commit "nondurable" transactions, meaning no fsync. This would solve the problem in CheckMatches() and it would also avoid any similar problem during matchmaking when large numbers of new matches are created. In the current model, the match records do not need to be durable, because in the event of a negotiator crash, any discrepancies between the match records and the advertised state of the startds will be corrected in CheckMatches() itself.
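The durable/nondurable distinction proposed above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Condor code: `TransactionLog`, `record_matches()` and `count_fsyncs()` are hypothetical names, and the log format is invented. A durable commit calls `os.fsync()`; a nondurable commit only flushes to the OS, which survives a daemon crash but not necessarily a host crash, and that is acceptable here because CheckMatches() reconciles the match records with the startds.

```python
import os
import tempfile


class TransactionLog:
    """Hypothetical append-only transaction log; NOT Condor's ClassAdLog API."""

    def __init__(self, path):
        self._file = open(path, "a")
        self._pending = []
        self.fsync_count = 0

    def begin(self):
        self._pending = []

    def set(self, key, value):
        # Buffer the record until the transaction commits.
        self._pending.append(f"SET {key} {value}\n")

    def commit(self, durable=True):
        self._file.writelines(self._pending)
        self._file.write("COMMIT\n")
        # flush() hands the data to the OS, so it survives a daemon crash
        # but not necessarily an OS crash or power loss.
        self._file.flush()
        if durable:
            # fsync() blocks until the data reaches stable storage (the slow part).
            os.fsync(self._file.fileno())
            self.fsync_count += 1
        self._pending = []

    def close(self):
        self._file.close()


def record_matches(log, matches, durable):
    """One transaction per match, as a per-call commit in AddMatch() would do."""
    for slot, user in matches:
        log.begin()
        log.set(slot, user)
        log.commit(durable=durable)


def count_fsyncs(num_matches, durable):
    """Record num_matches hypothetical matches and return how many fsyncs ran."""
    with tempfile.TemporaryDirectory() as tmp:
        log = TransactionLog(os.path.join(tmp, "accountant.log"))
        matches = [(f"slot{i}", f"user{i % 10}") for i in range(num_matches)]
        record_matches(log, matches, durable)
        log.close()
        return log.fsync_count


if __name__ == "__main__":
    print("durable:   ", count_fsyncs(1000, durable=True))
    print("nondurable:", count_fsyncs(1000, durable=False))
```

With one transaction per match, every durable commit costs one fsync, so 1000 matches mean 1000 fsyncs; the nondurable path performs none while still writing every record.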

Additional info:
This was fixed upstream in 7.6.3 and is in our builds as of 7.6.4-0.1.
This bz is primarily for tracking purposes.

Comment 4 Erik Erlandson 2011-10-04 17:13:39 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: 
The negotiator updates accountant with more fsync() operations than required for maintaining transactions.

Consequence:
Operations involving large numbers of submitters incur a performance hit.

Fix:
The use of fsync() was minimized in the accountant transactions.

Result:
Improved negotiator performance on pools with large numbers of submitter records.

Comment 6 Tomas Rusnak 2011-10-05 15:01:26 UTC
Reproduced with 1000 users: 
             | users | fsyncs | condor
------------------------------------------------
RHEL5/x86_64 |  1000 | 2036   | RH-7.6.3-0.3.el5


Verification:
             | users | fsyncs | condor
------------------------------------------------
RHEL6/x86_64 |   100 |  100   | RH-7.6.4-0.6.el6
             |   200 |  204   |
             |   300 |  304   |
             |   400 |  409   |
             |   500 |  512   |
             |   600 |  616   |
             |   700 |  724   |
             |   800 |  832   |
             |   900 |  941   |
             |  1000 | 1017   |

Other platforms (1000 users only, no scaling runs):

RHEL5/x86_64 |  1000 | 1023   |
RHEL6/x86    |  1000 | 1019   |
RHEL5/x86    |  1000 | 1020   |

The fsyncs in excess of 1000 come from the negotiator's collector update cycle.
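The tables above can be summarized with a little arithmetic. This Python snippet only recomputes figures already listed in this comment; the constant and helper names are illustrative. It pairs the RHEL5/x86_64 reproduction run with the RHEL5/x86_64 verification run so that both numbers come from the same platform.

```python
# Verification runs on RHEL6/x86_64 with RH-7.6.4-0.6.el6: users -> fsyncs.
VERIFICATION = {
    100: 100, 200: 204, 300: 304, 400: 409, 500: 512,
    600: 616, 700: 724, 800: 832, 900: 941, 1000: 1017,
}
# RHEL5/x86_64 with 1000 users: reproduction (RH-7.6.3-0.3.el5) vs. verification run.
RHEL5_USERS = 1000
RHEL5_BEFORE = 2036
RHEL5_AFTER = 1023


def excess_fsyncs(table):
    """fsyncs beyond one per user, for each run in the table."""
    return {users: fsyncs - users for users, fsyncs in table.items()}


def reduction(before, after):
    """Fraction of fsyncs eliminated between two runs."""
    return (before - after) / before


if __name__ == "__main__":
    print(excess_fsyncs(VERIFICATION))
    print(f"fsyncs per user: {RHEL5_BEFORE / RHEL5_USERS:.3f} -> "
          f"{RHEL5_AFTER / RHEL5_USERS:.3f}")
    print(f"reduction: {reduction(RHEL5_BEFORE, RHEL5_AFTER):.1%}")
```

On RHEL5/x86_64 this works out to roughly two fsyncs per user before the fix and about one per user after it, with the fixed build's excess over the user count staying at a few dozen or fewer in every run listed.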

>>> VERIFIED

Comment 7 Tomas Capek 2011-11-16 16:11:23 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,11 +1 @@
-Cause: 
+Previously, the Negotiator daemon updated the accountant with more fsync() operations than required for maintaining transactions. Consequently, operations involving large numbers of submitters resulted in a performance hit. With this update, the use of fsync() calls has been minimized in the accountant transactions and performance of Negotiator on pools with large numbers of submitter records improved significantly.
-The negotiator updates accountant with more fsync() operations than required for maintaining transactions.
-
-Consequence:
-Operations involving large numbers of submitters incur a performance hit.
-
-Fix:
-The use of fsync() was minimized in the accountant transactions.
-
-Result:
-Improved negotiator performance on pools with large numbers of submitter records.

Comment 8 errata-xmlrpc 2012-01-23 17:28:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0045.html