Description of problem:

From upstream: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2342

2011-Aug-16 16:41:04 by danb:

We found that there is an additional fsync-related performance problem, this time in Accountant::CheckMatches(). This took over an hour in a case where a large number of machines transitioned from claimed to unclaimed.

I assert that AddMatch() and RemoveMatch() should commit "nondurable" transactions, meaning no fsync. This would solve the problem in CheckMatches() and it would also avoid any similar problem during matchmaking when large numbers of new matches are created.

In the current model, the match records do not need to be durable, because in the event of a negotiator crash, any discrepancies between the match records and the advertised state of the startds will be corrected in CheckMatches() itself.

Additional info:

This was fixed upstream in 7.6.3, and is in our builds as of 7.6.4-0.1. This bz is primarily for tracking purposes.
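To illustrate the durable versus nondurable distinction described above, here is a minimal Python sketch. It is not HTCondor's accountant code: the `TransactionLog` class, its `commit()` method, the `durable` flag, and `simulate_matches()` are all hypothetical names chosen for illustration. It only shows the general idea that a nondurable commit appends the record but skips the fsync() call.

```python
import os
import tempfile


class TransactionLog:
    """Hypothetical append-only log; not HTCondor's actual accountant log."""

    def __init__(self, path):
        self._file = open(path, "ab")
        self.fsync_count = 0  # number of fsync() calls issued so far

    def commit(self, record, durable=True):
        # Append the record and push it from the Python buffer to the OS.
        self._file.write(record.encode("utf-8") + b"\n")
        self._file.flush()
        if durable:
            # Durable commit: force the data to stable storage.
            os.fsync(self._file.fileno())
            self.fsync_count += 1
        # Nondurable commit: the record may be lost in a crash, which is
        # acceptable only if it can be rebuilt later (assumed here).

    def close(self):
        self._file.close()


def simulate_matches(num_matches):
    """Return (durable_fsyncs, nondurable_fsyncs) for num_matches commits."""
    counts = []
    for durable in (True, False):
        fd, path = tempfile.mkstemp()
        os.close(fd)
        log = TransactionLog(path)
        try:
            for i in range(num_matches):
                log.commit(f"AddMatch slot{i}", durable=durable)
        finally:
            log.close()
            os.remove(path)
        counts.append(log.fsync_count)
    return tuple(counts)
```

Calling `simulate_matches(n)` issues n fsync() calls in the durable case and none in the nondurable case, which is the kind of saving the upstream ticket argues for when many match records change at once.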
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:

Cause:
The negotiator updates accountant with more fsync() operations than required for maintaining transactions.

Consequence:
Operations involving large numbers of submitters incur a performance hit.

Fix:
The use of fsync() was minimized in the accountant transactions.

Result:
Improved negotiator performance on pools with large numbers of submitter records.
Reproduced with 1000 users:

             | users | fsyncs | condor
------------------------------------------------
RHEL5/x86_64 |  1000 |   2036 | RH-7.6.3-0.3.el5

Verification:

             | users | fsyncs | condor
------------------------------------------------
RHEL6/x86_64 |   100 |    100 | RH-7.6.4-0.6.el6
             |   200 |    204 |
             |   300 |    304 |
             |   400 |    409 |
             |   500 |    512 |
             |   600 |    616 |
             |   700 |    724 |
             |   800 |    832 |
             |   900 |    941 |
             |  1000 |   1017 |
....
other platforms without scaling info:
RHEL5/x86_64 |  1000 |   1023 |
RHEL6/x86    |  1000 |   1019 |
RHEL5/x86    |  1000 |   1020 |

Fsyncs over 1000 are coming from the negotiator's collector update cycle.

>>> VERIFIED
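The figures above can be read as fsyncs per user. The short Python sketch below does that arithmetic on the numbers copied from the tables; the variable and function names are illustrative only, and no data beyond the reported rows is used.

```python
# (users, fsyncs) pairs copied from the tables above.
before_rhel5_x86_64 = (1000, 2036)  # RH-7.6.3-0.3.el5
after_rhel5_x86_64 = (1000, 1023)   # fixed build, same platform
after_rhel6_x86_64 = [
    (100, 100), (200, 204), (300, 304), (400, 409), (500, 512),
    (600, 616), (700, 724), (800, 832), (900, 941), (1000, 1017),
]


def fsyncs_per_user(users, fsyncs):
    """Average number of fsync() calls per user record."""
    return fsyncs / users


ratio_before = fsyncs_per_user(*before_rhel5_x86_64)
ratio_after = fsyncs_per_user(*after_rhel5_x86_64)
scaling = [fsyncs_per_user(u, f) for u, f in after_rhel6_x86_64]

print(f"before: {ratio_before:.3f} fsyncs per user")
print(f"after:  {ratio_after:.3f} fsyncs per user")
print(f"RHEL6/x86_64 range: {min(scaling):.3f} to {max(scaling):.3f}")
```

On these numbers the unfixed build issues roughly two fsyncs per user, while the fixed build stays close to one per user across the whole 100 to 1000 range.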
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,11 +1 @@
-Cause:
+Previously, the Negotiator daemon updated the accountant with more fsync() operations than required for maintaining transactions. Consequently, operations involving large numbers of submitters resulted in a performance hit. With this update, the use of fsync() calls has been minimized in the accountant transactions and performance of Negotiator on pools with large numbers of submitter records improved significantly.
-The negotiator updates accountant with more fsync() operations than required for maintaining transactions.
-
-Consequence:
-Operations involving large numbers of submitters incur a performance hit.
-
-Fix:
-The use of fsync() was minimized in the accountant transactions.
-
-Result:
-Improved negotiator performance on pools with large numbers of submitter records.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2012-0045.html