Bug 732821 - negotiator updates user priorities with one transaction per user
Summary: negotiator updates user priorities with one transaction per user
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: low
Target Milestone: 2.1
Target Release: ---
Assignee: Erik Erlandson
QA Contact: Tomas Rusnak
URL:
Whiteboard:
Depends On:
Blocks: 743350
Reported: 2011-08-23 18:19 UTC by Erik Erlandson
Modified: 2012-02-08 10:14 UTC
CC List: 6 users

Fixed In Version: condor-7.6.4-0.1
Doc Type: Bug Fix
Doc Text:
Previously, the Negotiator daemon updated the accountant with more fsync() operations than required for maintaining transactions. Consequently, operations involving large numbers of submitters resulted in a performance hit. With this update, the use of fsync() calls has been minimized in the accountant transactions and performance of Negotiator on pools with large numbers of submitter records improved significantly.
Clone Of:
Environment:
Last Closed: 2012-01-23 17:28:13 UTC
Target Upstream Version:
Embargoed:


Links
System                 | ID             | Private | Priority | Status       | Summary                                                        | Last Updated
------------------------------------------------
Red Hat Product Errata | RHEA-2012:0045 | 0       | normal   | SHIPPED_LIVE | Red Hat Enterprise MRG Grid 2.1 bug fix and enhancement update | 2012-01-23 22:22:58 UTC

Description Erik Erlandson 2011-08-23 18:19:49 UTC
Description of problem:

From upstream:
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2342

2011-Aug-16 16:41:04 by danb:
We found that there is an additional fsync-related performance problem, this time in Accountant::CheckMatches(). This took over an hour in a case where a large number of machines transitioned from claimed to unclaimed.

I assert that AddMatch() and RemoveMatch() should commit "nondurable" transactions, meaning no fsync. This would solve the problem in CheckMatches() and it would also avoid any similar problem during matchmaking when large numbers of new matches are created. In the current model, the match records do not need to be durable, because in the event of a negotiator crash, any discrepancies between the match records and the advertised state of the startds will be corrected in CheckMatches() itself.

Additional info:
This was fixed upstream in 7.6.3 and is in our builds as of 7.6.4-0.1.
This bz is primarily for tracking purposes.
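danb's argument rests on the difference between a durable commit (the record is written and fsync()ed before the commit returns) and a nondurable one (the record is handed to the operating system but not forced to disk). The Python sketch below illustrates that difference and why one fsync per transaction adds up when many match records change at once, as in CheckMatches(). It is not the Condor accountant's code: TransactionLog, release_matches(), the log file name and the record format are all hypothetical.

```python
import os
import tempfile


class TransactionLog:
    """Hypothetical append-only transaction log (not Condor's real classes).

    Each committed transaction is appended to a file. A durable commit calls
    os.fsync() so the record survives a crash; a nondurable commit only
    flushes to the OS and skips the fsync.
    """

    def __init__(self, path):
        self.path = path
        self.fsync_count = 0
        self._pending = []

    def begin(self):
        # Start a new transaction with an empty buffer of updates.
        self._pending = []

    def set_attr(self, key, name, value):
        # Buffer one attribute update inside the open transaction.
        self._pending.append(f"SET {key} {name} {value}\n")

    def commit(self, durable=True):
        with open(self.path, "a") as f:
            f.writelines(self._pending)
            f.flush()  # hand the bytes to the operating system
            if durable:
                os.fsync(f.fileno())  # force them to stable storage
                self.fsync_count += 1
        self._pending = []


def release_matches(n_slots, durable):
    """Record n_slots claimed->unclaimed transitions (hypothetical record
    layout), one transaction each, and return how many fsyncs that cost."""
    with tempfile.TemporaryDirectory() as tmp:
        log = TransactionLog(os.path.join(tmp, "accountant.log"))
        for i in range(n_slots):
            log.begin()
            log.set_attr(f"slot{i}", "Matched", "False")
            log.commit(durable=durable)
        return log.fsync_count


if __name__ == "__main__":
    print(release_matches(100, durable=True))   # 100
    print(release_matches(100, durable=False))  # 0
```

With durable=True, releasing N slots costs N fsyncs; with durable=False it costs none. That is the trade-off described above: nondurable match records may be lost in a crash, but CheckMatches() reconciles them against the advertised state of the startds.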

Comment 4 Erik Erlandson 2011-10-04 17:13:39 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: 
The negotiator updates accountant with more fsync() operations than required for maintaining transactions.

Consequence:
Operations involving large numbers of submitters incur a performance hit.

Fix:
The use of fsync() was minimized in the accountant transactions.

Result:
Improved negotiator performance on pools with large numbers of submitter records.

Comment 6 Tomas Rusnak 2011-10-05 15:01:26 UTC
Reproduced with 1000 users: 
platform     | users | fsyncs | condor build
------------------------------------------------
RHEL5/x86_64 |  1000 | 2036   | RH-7.6.3-0.3.el5


Verification:
platform     | users | fsyncs | condor build
------------------------------------------------
RHEL6/x86_64 |   100 |  100   | RH-7.6.4-0.6.el6
             |   200 |  204   |
             |   300 |  304   |
             |   400 |  409   |
             |   500 |  512   |
             |   600 |  616   |
             |   700 |  724   |
             |   800 |  832   |
             |   900 |  941   |
             |  1000 | 1017   |

Other platforms (1000 users only, no scaling runs):

RHEL5/x86_64 |  1000 | 1023   |
RHEL6/x86    |  1000 | 1019   |
RHEL5/x86    |  1000 | 1020   |

The fsyncs in excess of 1000 come from the negotiator's collector update cycle.

>>> VERIFIED
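The tables above report raw fsync counts; dividing by the number of users makes the before/after comparison explicit. The snippet below only restates arithmetic on the numbers in Comment 6; it does not reproduce how the fsyncs were counted, which the comment does not describe.

```python
# fsync counts from Comment 6, keyed by number of users.
before_fix = {1000: 2036}  # RHEL5/x86_64, RH-7.6.3-0.3.el5
after_fix = {              # RHEL6/x86_64, RH-7.6.4-0.6.el6
    100: 100, 200: 204, 300: 304, 400: 409, 500: 512,
    600: 616, 700: 724, 800: 832, 900: 941, 1000: 1017,
}


def fsyncs_per_user(counts):
    """Average number of fsyncs per user, rounded to 3 decimal places."""
    return {users: round(fsyncs / users, 3) for users, fsyncs in counts.items()}


def excess_fsyncs(counts):
    """Number of fsyncs beyond one per user."""
    return {users: fsyncs - users for users, fsyncs in counts.items()}


if __name__ == "__main__":
    print(fsyncs_per_user(before_fix)[1000])  # 2.036
    print(fsyncs_per_user(after_fix)[1000])   # 1.017
    print(excess_fsyncs(after_fix))
```

At 1000 users the rate drops from about 2.04 to about 1.02 fsyncs per user. The excess column does not grow steadily with the user count (41 at 900 users, 17 at 1000).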

Comment 7 Tomas Capek 2011-11-16 16:11:23 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,11 +1 @@
-Cause: 
+Previously, the Negotiator daemon updated the accountant with more fsync() operations than required for maintaining transactions. Consequently, operations involving large numbers of submitters resulted in a performance hit. With this update, the use of fsync() calls has been minimized in the accountant transactions and performance of Negotiator on pools with large numbers of submitter records improved significantly.
-The negotiator updates accountant with more fsync() operations than required for maintaining transactions.
-
-Consequence:
-Operations involving large numbers of submitters incur a performance hit.
-
-Fix:
-The use of fsync() was minimized in the accountant transactions.
-
-Result:
-Improved negotiator performance on pools with large numbers of submitter records.

Comment 8 errata-xmlrpc 2012-01-23 17:28:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0045.html

