Bug 232949
Summary:      Speed up gfs locking
Product:      [Retired] Red Hat Cluster Suite
Component:    gfs
Version:      4
Status:       CLOSED WONTFIX
Severity:     medium
Priority:     medium
Reporter:     Robert Peterson <rpeterso>
Assignee:     Robert Peterson <rpeterso>
QA Contact:   GFS Bugs <gfs-bugs>
CC:           rkenna, swhiteho
Hardware:     All
OS:           Linux
Doc Type:     Bug Fix
Last Closed:  2007-08-14 17:09:03 UTC
Description (Robert Peterson, 2007-03-19 16:20:59 UTC)
Created attachment 150387 [details]
Proposed patch
Devel ACK - am okay with pulling this into cluster 4.5 release, although not
required for beta.

Adding Steve Whitehouse to the cc list, since he originated the patch in gfs2.

Any estimates yet on the amount of performance improvement?

I haven't done any performance testing on it yet, sorry.

Here are the results from some performance tests I did using bonnie++.

Without the patch, single run after reboot:

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
roth-01.lab.msp. 4G 35710  95 87872  45 35820  20 36909  80 110930  24 435.4   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 20  1574  32 +++++ +++  6731  80  1854  36 +++++ +++  5989  79
roth-01.lab.msp.redhat.com,4G,35710,95,87872,45,35820,20,36909,80,110930,24,435.4,1,20,1574,32,+++++,+++,6731,80,1854,36,+++++,+++,5989,79

Without the patch, with -x 10:

roth-01.lab.msp.redhat.com,4G,35160,95,88731,47,35879,20,37236,81,116018,26,417.5,1,20,1585,32,+++++,+++,6308,80,2445,41,+++++,+++,5726,79
roth-01.lab.msp.redhat.com,4G,35128,95,88989,49,35016,21,36651,80,107442,24,430.4,1,20,1985,38,+++++,+++,6865,82,1747,44,+++++,+++,6074,77
roth-01.lab.msp.redhat.com,4G,35073,95,88569,49,34956,21,36706,80,108503,24,430.4,1,20,2066,38,+++++,+++,7244,86,1828,46,+++++,+++,5643,81
roth-01.lab.msp.redhat.com,4G,36016,98,96918,53,34988,21,37038,81,112593,24,436.9,1,20,2053,38,+++++,+++,7081,85,2020,44,+++++,+++,5416,75
roth-01.lab.msp.redhat.com,4G,36033,98,101985,56,34899,21,37082,81,114778,26,430.0,1,20,2032,38,+++++,+++,6858,82,1974,44,+++++,+++,5813,84
roth-01.lab.msp.redhat.com,4G,35898,98,97938,53,35108,21,36810,80,110272,25,413.9,1,20,2065,37,+++++,+++,6905,84,2170,43,+++++,+++,5832,82
roth-01.lab.msp.redhat.com,4G,36205,98,93346,50,34789,20,37253,81,114968,25,430.0,1,20,2049,38,+++++,+++,6865,85,1359,29,+++++,+++,5512,76
roth-01.lab.msp.redhat.com,4G,35972,98,100949,56,34759,21,36741,80,114137,25,429.5,1,20,2059,37,+++++,+++,6666,81,2215,43,+++++,+++,5821,83
roth-01.lab.msp.redhat.com,4G,36202,98,90058,48,35516,21,36138,79,109441,24,431.5,1,20,2080,39,+++++,+++,6883,84,2037,45,+++++,+++,5941,79
roth-01.lab.msp.redhat.com,4G,36056,98,96134,52,35192,21,36116,79,109534,24,425.8,1,20,2069,39,+++++,+++,7087,86,2049,43,+++++,+++,5957,78

With the patch, without -x 10:

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
roth-01.lab.msp. 4G 35349  95 86573  45 36162  20 35778  78 109143  24 425.3   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 20  1597  32 +++++ +++  7084  84  1597  38 +++++ +++  6008  73
roth-01.lab.msp.redhat.com,4G,35349,95,86573,45,36162,20,35778,78,109143,24,425.3,1,20,1597,32,+++++,+++,7084,84,1597,38,+++++,+++,6008,73

With the patch, with -x 10:

roth-01.lab.msp.redhat.com,4G,35356,96,89584,46,35753,20,36014,78,113505,25,422.4,1,20,1849,36,+++++,+++,6710,81,1658,37,+++++,+++,6229,76
roth-01.lab.msp.redhat.com,4G,34856,94,88234,50,35332,21,35787,78,111944,24,422.8,1,20,1579,32,+++++,+++,7021,84,2333,39,+++++,+++,6537,82
roth-01.lab.msp.redhat.com,4G,35041,95,85319,48,34878,20,35796,78,105722,24,434.7,1,20,1582,32,+++++,+++,6938,84,1187,27,+++++,+++,5866,72
roth-01.lab.msp.redhat.com,4G,31706,86,86605,48,34933,21,35770,78,106884,23,431.0,1,20,1735,34,+++++,+++,6411,82,1977,38,+++++,+++,5540,74
roth-01.lab.msp.redhat.com,4G,33728,92,87929,48,35178,21,35617,77,108876,24,423.2,1,20,1603,34,+++++,+++,6538,79,1906,44,+++++,+++,5798,80
roth-01.lab.msp.redhat.com,4G,34681,94,87859,50,35144,21,36090,78,106946,24,430.8,1,20,2065,39,+++++,+++,7993,75,1882,44,+++++,+++,5642,78
roth-01.lab.msp.redhat.com,4G,34729,94,87235,49,35103,21,36372,79,105901,23,414.4,1,20,2095,39,+++++,+++,6951,73,1878,44,+++++,+++,6379,81
roth-01.lab.msp.redhat.com,4G,34689,94,86105,47,35303,21,35838,78,108084,24,425.9,1,20,2073,40,+++++,+++,6795,85,2433,41,+++++,+++,6023,84
roth-01.lab.msp.redhat.com,4G,34695,94,88271,49,34924,20,35728,78,102107,22,435.5,1,20,2054,38,+++++,+++,6738,79,2359,41,+++++,+++,5871,81
roth-01.lab.msp.redhat.com,4G,34881,95,92651,52,34567,20,36073,79,105532,23,419.1,1,20,2070,38,+++++,+++,7070,86,2370,41,+++++,+++,5940,84

Created attachment 151291 [details]
Spreadsheet with results
I stuck the numbers into a spreadsheet to look at them a bit more closely. I
don't see any case where the difference is much greater than 1 stddev on one or
the other set of figures, so I'm not sure that it's greatly significant one way
or the other.
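To make that comparison concrete, here is a minimal sketch (not part of the original bug) that takes the sequential block-output throughput (the third numeric field in each bonnie++ CSV line above) from the two -x 10 series and computes the mean and sample standard deviation of each:

```python
import statistics

# Sequential block-output K/sec from the ten "-x 10" runs above,
# copied from the CSV lines in this bug report.
without_patch = [88731, 88989, 88569, 96918, 101985,
                 97938, 93346, 100949, 90058, 96134]
with_patch = [89584, 88234, 85319, 86605, 87929,
              87859, 87235, 86105, 88271, 92651]

for label, data in (("without", without_patch), ("with", with_patch)):
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation
    print(f"{label:7s} patch: mean {mean:.0f} K/sec, stddev {sd:.0f}")

diff = statistics.mean(without_patch) - statistics.mean(with_patch)
print(f"difference of means: {diff:.0f} K/sec")
```

For this column the means differ by roughly one standard deviation of the noisier (unpatched) series, which matches the observation that the figures are not clearly significant either way.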
Did you see the CPU usage go down between the two runs? You'll notice it most
with a lot of lock operations, so that's one reason I was using postmark for
my testing.
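No CPU-usage breakdown was posted with the results above, but the comparison being asked for can be approximated by sampling /proc/stat around a benchmark run. This is a rough, Linux-only sketch (not part of the original bug); the field layout of the "cpu" line follows proc(5):

```python
import os

def busy_percent(before, after):
    """Percentage of CPU time spent busy between two samples of
    (user, nice, system, idle) jiffy counters."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    idle = deltas[3]  # 4th field of the "cpu" line is idle time
    return 100.0 * (total - idle) / total

def cpu_sample():
    """Read (user, nice, system, idle) from the aggregate "cpu" line."""
    with open("/proc/stat") as f:
        fields = f.readline().split()
    return [int(x) for x in fields[1:5]]

if os.path.exists("/proc/stat"):
    before = cpu_sample()
    # ... run the benchmark (bonnie++ or postmark) here ...
    after = cpu_sample()
    print(f"CPU busy during run: {busy_percent(before, after):.1f}%")
```

Comparing this figure for patched versus unpatched runs of the same workload would show whether the locking change actually reduces CPU overhead, independently of the throughput numbers.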
Based on the previous post, I decided to run postmark to see if I saw
improvements. Note: There are two runs for each scenario because I wanted to
eliminate bias based on reading in the RGs the first time after mounting. Both
runs were done after a fresh reboot of the system. Here are the results:

Without fix:
----------------------------------------------------
pm>run
Creating files...Done  Performing transactions..........Done  Deleting files...Done
Time: 63 seconds total, 1 seconds of transactions (500 per second)
Files: 50244 created (797 per second)
       Creation alone: 50000 files (1020 per second)
       Mixed with transactions: 244 files (244 per second)
       236 read (236 per second), 264 appended (264 per second)
       50244 deleted (797 per second)
       Deletion alone: 49988 files (3845 per second)
       Mixed with transactions: 256 files (256 per second)
Data: 1.22 megabytes read (19.82 kilobytes per second)
      252.35 megabytes written (4.01 megabytes per second)

pm>run
Creating files...Done  Performing transactions..........Done  Deleting files...Done
Time: 50 seconds total, 1 seconds of transactions (500 per second)
Files: 50244 created (1004 per second)
       Creation alone: 50000 files (1470 per second)
       Mixed with transactions: 244 files (244 per second)
       236 read (236 per second), 264 appended (264 per second)
       50244 deleted (1004 per second)
       Deletion alone: 49988 files (3124 per second)
       Mixed with transactions: 256 files (256 per second)
Data: 1.22 megabytes read (24.97 kilobytes per second)
      252.35 megabytes written (5.05 megabytes per second)

With fix:
-------------------------------------------------------
pm>run
Creating files...Done  Performing transactions..........Done  Deleting files...Done
Time: 64 seconds total, 1 seconds of transactions (500 per second)
Files: 50244 created (785 per second)
       Creation alone: 50000 files (1020 per second)
       Mixed with transactions: 244 files (244 per second)
       236 read (236 per second), 264 appended (264 per second)
       50244 deleted (785 per second)
       Deletion alone: 49988 files (3570 per second)
       Mixed with transactions: 256 files (256 per second)
Data: 1.22 megabytes read (19.51 kilobytes per second)
      252.35 megabytes written (3.94 megabytes per second)

pm>run
Creating files...Done  Performing transactions..........Done  Deleting files...Done
Time: 63 seconds total, 1 seconds of transactions (500 per second)
Files: 50244 created (797 per second)
       Creation alone: 50000 files (1063 per second)
       Mixed with transactions: 244 files (244 per second)
       236 read (236 per second), 264 appended (264 per second)
       50244 deleted (797 per second)
       Deletion alone: 49988 files (3332 per second)
       Mixed with transactions: 256 files (256 per second)
Data: 1.22 megabytes read (19.82 kilobytes per second)
      252.35 megabytes written (4.01 megabytes per second)

Again, it looks like performance actually went down. Does this make sense?
I've been fighting with these tests all day, and I still can't rule out that
I'm doing something fundamentally wrong. I'll try some more tests on Monday.

The thing which strikes me about your postmark results is that the read speed
result seems to be really very slow indeed. I've never seen a result that
slow, in fact; 19k/sec is really rather pathetic! You ought to be seeing a
figure in the region of Mb/sec, though it is usually slower than the write
speed. Did you forget to turn off the drop_count thing? Are you using 128M
journals or the default (64M)?

To be comparable with my postmark runs, you need:

set transactions 100000
set number 100000

I tend to find that anything less results in seeing the effect of caches, but
if you have less memory on the machine than the 4G I have, you might not need
to turn it up so much to get sensible results.

Here is a fairly typical run from my test machine. It's running lock_dlm, but
only as a single node.

Time: 376 seconds total, 230 seconds of transactions (434 per second)
Files: 150087 created (399 per second)
       Creation alone: 100000 files (1052 per second)
       Mixed with transactions: 50087 files (217 per second)
       49995 read (217 per second), 49991 appended (217 per second)
       150087 deleted (399 per second)
       Deletion alone: 100174 files (1964 per second)
       Mixed with transactions: 49913 files (217 per second)
Data: 273.42 megabytes read (744.64 kilobytes per second)
      852.13 megabytes written (2.27 megabytes per second)

I didn't do anything special to get this, so I'm not sure exactly what state
the machine was in when I started, and I didn't disable any of the system
daemons etc. Normally the read and write speeds are slightly higher than this,
but then I'm used to lock_nolock figures rather than lock_dlm figures, so
that's probably what's making it look a bit lower than normal.

This is really confusing. There must be some other mitigating factor. I'm not
sure what "as a single node" means in the previous comment. Do you mean that
only one node had the file system mounted for the test?

The drop_count thing definitely helped. What's alarming is that the longer my
system sat idle, the slower the postmark test ran, regardless of whether or
not it had the fix running. When I was running these tests, I was typically
popping over to the system and starting another run in between working on
other problems. So sometimes there would be an hour between runs, and you
could see the run time go up with each run.
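For reference, the settings Steve suggests are entered at the postmark prompt before "run". A sketch of such a session follows; the location is a hypothetical GFS mount point (postmark otherwise writes its files into the current directory), so adjust it for the actual test setup:

```
pm>set number 100000
pm>set transactions 100000
pm>set location /mnt/gfs
pm>run
```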
Here are some more results, both with and without the fix:

Without fix:
----------------------------------------------------
pm>run
Creating files...Done  Performing transactions..........Done  Deleting files...Done
Time: 702 seconds total, 463 seconds of transactions (215 per second)
Files: 150087 created (213 per second)
       Creation alone: 100000 files (854 per second)
       Mixed with transactions: 50087 files (108 per second)
       49995 read (107 per second), 49991 appended (107 per second)
       150087 deleted (213 per second)
       Deletion alone: 100174 files (821 per second)
       Mixed with transactions: 49913 files (107 per second)
Data: 273.42 megabytes read (398.84 kilobytes per second)
      852.13 megabytes written (1.21 megabytes per second)

pm>run
Creating files...Done  Performing transactions..........Done  Deleting files...Done
Time: 720 seconds total, 492 seconds of transactions (203 per second)
Files: 150087 created (208 per second)
       Creation alone: 100000 files (900 per second)
       Mixed with transactions: 50087 files (101 per second)
       49995 read (101 per second), 49991 appended (101 per second)
       150087 deleted (208 per second)
       Deletion alone: 100174 files (856 per second)
       Mixed with transactions: 49913 files (101 per second)
Data: 273.42 megabytes read (388.87 kilobytes per second)
      852.13 megabytes written (1.18 megabytes per second)

With the fix:
-------------------------------------------------------
Creating files...Done  Performing transactions..........Done  Deleting files...Done
Time: 697 seconds total, 465 seconds of transactions (215 per second)
Files: 150087 created (215 per second)
       Creation alone: 100000 files (862 per second)
       Mixed with transactions: 50087 files (107 per second)
       49995 read (107 per second), 49991 appended (107 per second)
       150087 deleted (215 per second)
       Deletion alone: 100174 files (863 per second)
       Mixed with transactions: 49913 files (107 per second)
Data: 273.42 megabytes read (401.70 kilobytes per second)
      852.13 megabytes written (1.22 megabytes per second)

pm>run
Creating files...Done  Performing transactions..........Done  Deleting files...Done
Time: 807 seconds total, 624 seconds of transactions (160 per second)
Files: 150087 created (185 per second)
       Creation alone: 100000 files (1492 per second)
       Mixed with transactions: 50087 files (80 per second)
       49995 read (80 per second), 49991 appended (80 per second)
       150087 deleted (185 per second)
       Deletion alone: 100174 files (863 per second)
       Mixed with transactions: 49913 files (79 per second)
Data: 273.42 megabytes read (346.94 kilobytes per second)
      852.13 megabytes written (1.06 megabytes per second)

pm>run
Creating files...Done  Performing transactions..........Done  Deleting files...Done
Time: 872 seconds total, 692 seconds of transactions (144 per second)
Files: 150087 created (172 per second)
       Creation alone: 100000 files (1587 per second)
       Mixed with transactions: 50087 files (72 per second)
       49995 read (72 per second), 49991 appended (72 per second)
       150087 deleted (172 per second)
       Deletion alone: 100174 files (856 per second)
       Mixed with transactions: 49913 files (72 per second)
Data: 273.42 megabytes read (321.08 kilobytes per second)
      852.13 megabytes written (1000.66 kilobytes per second)

pm>run
Creating files...Done  Performing transactions..........Done  Deleting files...Done
Time: 909 seconds total, 711 seconds of transactions (140 per second)
Files: 150087 created (165 per second)
       Creation alone: 100000 files (1111 per second)
       Mixed with transactions: 50087 files (70 per second)
       49995 read (70 per second), 49991 appended (70 per second)
       150087 deleted (165 per second)
       Deletion alone: 100174 files (927 per second)
       Mixed with transactions: 49913 files (70 per second)
Data: 273.42 megabytes read (308.01 kilobytes per second)
      852.13 megabytes written (959.93 kilobytes per second)

pm>run
Creating files...Done  Performing transactions..........Done  Deleting files...Done
Time: 941 seconds total, 756 seconds of transactions (132 per second)
Files: 150087 created (159 per second)
       Creation alone: 100000 files (1428 per second)
       Mixed with transactions: 50087 files (66 per second)
       49995 read (66 per second), 49991 appended (66 per second)
       150087 deleted (159 per second)
       Deletion alone: 100174 files (871 per second)
       Mixed with transactions: 49913 files (66 per second)
Data: 273.42 megabytes read (297.54 kilobytes per second)
      852.13 megabytes written (927.29 kilobytes per second)

After postmark was ended and restarted:
pm>run
Creating files...Done  Performing transactions..........Done  Deleting files...Done
Time: 1044 seconds total, 877 seconds of transactions (114 per second)
Files: 150087 created (143 per second)
       Creation alone: 100000 files (1449 per second)
       Mixed with transactions: 50087 files (57 per second)
       49995 read (57 per second), 49991 appended (57 per second)
       150087 deleted (143 per second)
       Deletion alone: 100174 files (1022 per second)
       Mixed with transactions: 49913 files (56 per second)
Data: 273.42 megabytes read (268.18 kilobytes per second)
      852.13 megabytes written (835.80 kilobytes per second)

This makes all the tests seem meaningless.

Moving to proposed for cluster 4.6 - too late in the cycle to take the changes.

Management decided this was not high enough priority to fix in RHEL4, since
the performance savings were never proven for GFS1. Closing as WONTFIX, but
people may still use the proposed patch as they see fit, if they compile the
code themselves.