Description of problem:
Enabling ctr on a volume results in a significant reduction in I/O throughput to the volume. Reducing this performance impact is important because it affects the performance of tiered volumes, and the performance gains that can be obtained using tiered volumes. The ctr impact on performance is particularly high for sequential writes to large files, where a drop of 35% is seen in some cases on enabling ctr. Currently, a drop of about 10% is seen on sequential reads to large files.

How reproducible:
Consistently, on writes. Read performance with ctr enabled shows quite a bit of variation, but a drop of about 10% is typical, with some runs showing more.

Steps to Reproduce:
1. Create and mount a gluster volume. In this test, a 2x2 volume was used with SAS-SSD as the underlying brick storage. Options set on the volume: lookup-optimize on, client/server event-threads 4.
2. Run a multi-threaded iozone write/read test and note the throughput recorded.
3. Repeat the test, but set the following option as well on the volume before running it:
   gluster volume set $rhs_volume features.ctr-enabled on
   [record-counters is off]

Actual results:
ctr not enabled:
write throughput = 337242.34 KB/sec
read throughput = 1346288.64 KB/sec
ctr enabled:
write throughput = 224108.58 KB/sec
read throughput = 1214276.92 KB/sec

Expected results:
No hard expectations, but ideally enabling ctr should result in only a few % drop in performance.

Additional info:
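The reproduction steps above can be sketched as a shell script. The volume name, server/brick paths, mount point, and iozone flags are illustrative assumptions, not values taken from this report:

```shell
# Sketch of the reproduction steps. server1/server2, the brick paths,
# /mnt/rhs, and the iozone thread/file-size flags are assumptions.
RHS_VOLUME=testvol

# 1. Create and mount a 2x2 (distribute-replicate) volume on SAS-SSD bricks,
#    with the options used in this test.
gluster volume create $RHS_VOLUME replica 2 \
    server1:/bricks/ssd1 server2:/bricks/ssd1 \
    server1:/bricks/ssd2 server2:/bricks/ssd2
gluster volume set $RHS_VOLUME cluster.lookup-optimize on
gluster volume set $RHS_VOLUME client.event-threads 4
gluster volume set $RHS_VOLUME server.event-threads 4
gluster volume start $RHS_VOLUME
mount -t glusterfs server1:/$RHS_VOLUME /mnt/rhs

# 2. Baseline: multi-threaded iozone sequential write (-i 0) and read (-i 1).
iozone -i 0 -i 1 -t 8 -s 2g -r 64k -F /mnt/rhs/f{0..7}

# 3. Enable ctr (record-counters stays off) and repeat the same test.
gluster volume set $RHS_VOLUME features.ctr-enabled on
iozone -i 0 -i 1 -t 8 -s 2g -r 64k -F /mnt/rhs/f{0..7}
```

This is a cluster CLI fragment, so it is not runnable standalone; only the `features.ctr-enabled`, `lookup-optimize`, and event-thread options come from the report itself.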
The test above was run with sql-cache: 12500 pages and sql-wal size: 25000 pages.

Option: features.ctr-sql-db-cachesize
Default Value: 12500
Description: Defines the cache size of the sqlite database of the changetimerecorder xlator. The input to this option is in pages. Each page is 4096 bytes. The default value is 12500 pages, i.e. ~49 MB. The max value is 262144 pages, i.e. 1 GB, and the min value is 1000 pages, i.e. ~4 MB.

Option: features.ctr-sql-db-wal-autocheckpoint
Default Value: 25000
Description: Defines the autocheckpoint of the sqlite database of changetimerecorder. The input to this option is in pages. Each page is 4096 bytes. The default value is 25000 pages, i.e. ~98 MB. The max value is 262144 pages, i.e. 1 GB, and the min value is 1000 pages, i.e. ~4 MB.
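The page-to-size arithmetic in the option descriptions can be checked with a few lines of Python (4096-byte pages, binary MiB/GiB units):

```python
PAGE_SIZE = 4096  # bytes per sqlite page, per the option descriptions

def pages_to_mib(pages):
    """Convert a sqlite page count to MiB."""
    return pages * PAGE_SIZE / 2**20

print(round(pages_to_mib(12500), 1))   # default cachesize: 48.8 (~49 MB)
print(round(pages_to_mib(25000), 1))   # default wal-autocheckpoint: 97.7 (~98 MB)
print(pages_to_mib(262144) / 1024)     # max: 1.0 (exactly 1 GiB)
print(round(pages_to_mib(1000), 1))    # min: 3.9 (~4 MB)
```

The computed sizes agree with the "~49 MB", "~98 MB", "1 GB", and "~4 MB" figures quoted in the descriptions.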
Recent builds are showing much improved performance with ctr on. E.g., a repeat of the test in comment #0 with glusterfs-3.7.5-15.el7rhgs.x86_64:

ctr not enabled:
write throughput = 343957.13 KB/sec
read throughput = 1385587.59 KB/sec
ctr enabled:
write throughput = 331219.20 KB/sec
read throughput = 1211810.69 KB/sec

However, the old defaults for sql-cache and sql-wal size are giving better performance:

ctr enabled, with sql-cache and sql-wal at 1000 each:
write throughput = 341040.57 KB/sec
read throughput = 1334078.26 KB/sec

More runs are needed to characterize ctr overhead more thoroughly, but there are definitely big improvements in the new builds.
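Reverting the two sqlite options to the old 1000-page (~4 MB) defaults used in the better-performing run above can be sketched as follows; $rhs_volume is a placeholder for the volume name:

```shell
# Set the ctr sqlite cache and WAL autocheckpoint back to 1000 pages each,
# the old defaults that gave better throughput in this run.
gluster volume set $rhs_volume features.ctr-sql-db-cachesize 1000
gluster volume set $rhs_volume features.ctr-sql-db-wal-autocheckpoint 1000
```

This is a CLI fragment; the two option names are the ones documented in the earlier comment.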
Updating bug with numbers for build glusterfs*-3.7.5-17.el7rhgs.x86_64. We're just looking at the overhead seen from enabling ctr on a gluster volume here -- no tier setup. Large-file numbers are with iozone; small-file numbers are with the smallfile benchmark.

2x(8+4) volume:
ctr off:
large-file seq write = 1191140.81 KB/sec
large-file seq read = 1353652.79 KB/sec
large-file random read = 41545.47 KB/sec
large-file random write = 77722.64 KB/sec
ctr on:
large-file seq write = 1112725.20 KB/sec
large-file seq read = 1156032.43 KB/sec
large-file random read = 40674.99 KB/sec
large-file random write = 73536.95 KB/sec

2x2 SAS-SSD volume:
ctr off:
large-file seq write = 334135.34 KB/sec
large-file seq read = 1416166.25 KB/sec
large-file random read = 408672.50 KB/sec
large-file random write = 255235.33 KB/sec
ctr on:
large-file seq write = 275198.68 KB/sec
large-file seq read = 1030546.73 KB/sec
large-file random read = 290876.97 KB/sec
large-file random write = 240935.74 KB/sec

2x(8+4) volume:
ctr off:
small-file create = 114 MB/s
small-file read = 99 MB/s
ctr on:
small-file create = 92 MB/s
small-file read = 82 MB/s

2x2 SAS-SSD volume:
ctr off:
small-file create = 52 MB/s
small-file read = 486 MB/s
ctr on:
small-file create = 45 MB/s
small-file read = 487 MB/s

Some of these tests show a fairly high performance impact from turning on ctr. For example, large-file random read on the 2x2 SAS-SSD volume shows a ~30% drop on turning ctr on, and small-file create shows about 15-20% degradation from ctr. It will be good to reduce ctr overheads going forward, because these overheads cut into the performance benefit you can obtain from tiering.
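The per-workload degradation figures quoted above can be derived from the reported numbers with a short Python check; the throughput values are copied verbatim from this comment:

```python
def drop_pct(off, on):
    """Percentage throughput drop from the ctr-off to the ctr-on run."""
    return 100 * (off - on) / off

# 2x2 SAS-SSD large-file results (KB/sec), ctr off vs ctr on.
sas_ssd = {
    "seq write":    (334135.34, 275198.68),
    "seq read":     (1416166.25, 1030546.73),
    "random read":  (408672.50, 290876.97),
    "random write": (255235.33, 240935.74),
}
for name, (off, on) in sas_ssd.items():
    print(f"{name}: {drop_pct(off, on):.1f}% drop")

# Small-file create (MB/s): 2x(8+4) and 2x2 SAS-SSD volumes.
print(f"small-file create 2x(8+4): {drop_pct(114, 92):.1f}% drop")
print(f"small-file create 2x2:     {drop_pct(52, 45):.1f}% drop")
```

Random read on the 2x2 SAS-SSD volume comes out at about 29%, matching the "~30% drop" above, and small-file create lands at roughly 13-19%, consistent with the "about 15-20%" figure.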
Results with glusterfs*-3.7.5-19.el7.x86_64 with fast PCI SSDs. The test in this case uses iozone to perform seq write, seq read, random read, and random write. The data set is 480 GB: 24 threads in total, each accessing a 20 GB file.

Results with a 2x2 PCI-SSD volume (2 servers, 64 GB RAM each):
Sequential write = 1204927.25 KB/sec
Sequential read = 2396584.24 KB/sec
Random read = 848802.38 KB/sec
Random write = 1202198.43 KB/sec
Note the capacity of the volume is 1.5 TB.

Results with a tiered volume with the 2x2 PCI-SSD volume (above) as the hot tier. Attach the tier and run the same iozone test, so all data accesses should go to the hot tier (480 GB data set, 1.5 TB hot tier capacity):
Seq write = 1204771.33 KB/sec
Seq read = 2283647.04 KB/sec
Random read = 749896.74 KB/sec
Random write = 766392.37 KB/sec

Sequential write and read are doing pretty well, but there is a severe drop in throughput for random read and write.

Repeat with the 2x2 PCI-SSD volume, but this time with ctr enabled:
Seq write = 1204094.61 KB/sec
Seq read = 2335040.09 KB/sec
Random read = 746581.30 KB/sec
Random write = 759357.20 KB/sec

This shows that, for this test, the degraded performance of the tiered volume is solely because of ctr. For random write, the drop when turning ctr on is more than 35% (760/1202).
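The "more than 35%" random write claim, and the closeness of the tiered and ctr-only runs, can be checked directly from the numbers in this comment:

```python
def drop_pct(base, other):
    """Percentage throughput drop relative to the baseline run."""
    return 100 * (base - other) / base

# Random write (KB/sec): plain 2x2 PCI-SSD volume vs the same volume
# with ctr enabled, and vs the tiered configuration.
rand_write_base = 1202198.43
rand_write_ctr = 759357.20
rand_write_tiered = 766392.37

print(f"ctr on:  {drop_pct(rand_write_base, rand_write_ctr):.1f}% drop")
print(f"tiered:  {drop_pct(rand_write_base, rand_write_tiered):.1f}% drop")

# The ctr-only and tiered runs differ by under 1%, supporting the
# conclusion that the tiered volume's degradation is due to ctr.
print(f"tiered vs ctr gap: "
      f"{drop_pct(rand_write_tiered, rand_write_ctr):.1f}%")
```

The ctr-on drop works out to about 36.8%, confirming "more than 35%", and the tiered run is within about 1% of the ctr-only run.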
As tier is not being actively developed, I'm closing this bug. Feel free to reopen it if necessary.