Red Hat Bugzilla – Bug 1291768
[tiering]: large performance degradation seen for some workloads with ctr enabled
Last modified: 2018-05-24 17:32:24 EDT
Description of problem:
Enabling ctr on a volume results in a significant reduction in I/O throughput to the volume. Reducing this impact is important because it affects the performance of tiered volumes and the performance gains that can be obtained from them.
ctr impact on performance is particularly high for sequential writes to large files, where a drop of 35% is seen in some cases on enabling ctr. Currently, a drop of about 10% is seen on sequential reads to large files.
The drop is seen consistently on writes. Read performance with ctr enabled shows quite a bit of variation, but a drop of about 10% is typical, with some runs showing more.
Steps to Reproduce:
create and mount a gluster volume. in this test, a 2x2 volume was used with SAS-SSD as underlying brick storage. options set on the volume: lookup-optimize on, client/server event-threads 4.
run a multi-threaded iozone write/read test and note the throughput recorded.
repeat test, but set the following option as well on the volume before running the test:
gluster volume set $rhs_volume features.ctr-enabled on
[record-counters is off]
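The steps above amount to running the same benchmark twice, with the `gluster volume set` command in between. A minimal sketch of that sequence follows. The helper names and the `benchmark_cmd` argument are hypothetical; the report does not list the iozone arguments, so none are assumed here.

```python
import subprocess


def ctr_enable_command(volume):
    """Build the gluster CLI call from the report that turns ctr on."""
    return ["gluster", "volume", "set", volume, "features.ctr-enabled", "on"]


def run_ab_test(volume, benchmark_cmd, run=subprocess.run):
    """Run the benchmark, enable ctr, then run the same benchmark again.

    benchmark_cmd is a placeholder for whatever iozone invocation is used;
    it is passed through unchanged for both runs.
    """
    baseline = run(benchmark_cmd, check=True)      # ctr not enabled
    run(ctr_enable_command(volume), check=True)    # step from the report
    with_ctr = run(benchmark_cmd, check=True)      # ctr enabled
    return baseline, with_ctr
```

Passing `run` as a parameter makes it possible to dry-run the sequence with a stub before touching a real volume.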
ctr not enabled:
write throughput = 337242.34 KB/sec
read throughput = 1346288.64 KB/sec
ctr enabled:
write throughput = 224108.58 KB/sec
read throughput = 1214276.92 KB/sec
No hard expectations, but ideally enabling ctr should result in only a few percent drop in performance.
The tests above were run with sql-cache at 12500 pages and sql-wal size at 25000 pages.
Default Value: 12500
Description: Defines the cache size of the sqlite database of changetimerecorder xlator. The input to this option is in pages. Each page is 4096 bytes. Default value is 12500 pages, i.e. ~49 MB. The max value is 262144 pages, i.e. 1 GB, and the min value is 1000 pages, i.e. ~4 MB.
Default Value: 25000
Description: Defines the autocheckpoint of the sqlite database of changetimerecorder. The input to this option is in pages. Each page is 4096 bytes. Default value is 25000 pages, i.e. ~98 MB. The max value is 262144 pages, i.e. 1 GB, and the min value is 1000 pages, i.e. ~4 MB.
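As a quick check of the sizes quoted in the two option descriptions, the page counts can be converted using the stated 4096-byte page size. This is a minimal sketch; the function name is illustrative, and the conversion assumes the descriptions use binary units (1 MB = 1024 * 1024 bytes).

```python
PAGE_SIZE_BYTES = 4096  # page size stated in the option descriptions


def pages_to_mib(pages, page_size=PAGE_SIZE_BYTES):
    """Convert a sqlite page count to mebibytes."""
    return pages * page_size / (1024 * 1024)


# Page counts taken from the option descriptions above.
for label, pages in [
    ("min", 1000),
    ("sql-cache default", 12500),
    ("sql-wal autocheckpoint default", 25000),
    ("max", 262144),
]:
    print(f"{label}: {pages} pages = {pages_to_mib(pages):.1f} MiB")
```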
Recent builds are showing much improved performance with ctr on.
E.g. repeat of test in comment #0 with glusterfs-3.7.5-15.el7rhgs.x86_64:
ctr not enabled:
write throughput = 343957.13 KB/sec
read throughput = 1385587.59 KB/sec
ctr enabled:
write throughput = 331219.20 KB/sec
read throughput = 1211810.69 KB/sec
However, the old defaults for sql-cache and sql-wal size are giving better performance:
ctr enabled, with sql-cache and sql-wal at 1000 each:
write throughput = 341040.57 KB/sec
read throughput = 1334078.26 KB/sec
More runs are needed to characterize ctr overhead more thoroughly, but there are definitely big improvements in the new builds.
Updating bug with numbers for build glusterfs*-3.7.5-17.el7rhgs.x86_64:
We're just looking at the overhead seen from enabling ctr on a gluster volume here; there is no tier setup. Large-file numbers are with iozone. Small-file numbers are with the smallfile benchmark.
ctr not enabled:
large-file seq write = 1191140.81 KB/sec
large-file seq read = 1353652.79 KB/sec
large-file random read = 41545.47 KB/sec
large-file random write = 77722.64 KB/sec
ctr enabled:
large-file seq write = 1112725.20 KB/sec
large-file seq read = 1156032.43 KB/sec
large-file random read = 40674.99 KB/sec
large-file random write = 73536.95 KB/sec
2x2 SAS-SSD volume:
ctr not enabled:
large-file seq write = 334135.34 KB/sec
large-file seq read = 1416166.25 KB/sec
large-file random read = 408672.50 KB/sec
large-file random write = 255235.33 KB/sec
ctr enabled:
large-file seq write = 275198.68 KB/sec
large-file seq read = 1030546.73 KB/sec
large-file random read = 290876.97 KB/sec
large-file random write = 240935.74 KB/sec
ctr not enabled:
small-file create = 114 MB/s
small-file read = 99 MB/s
ctr enabled:
small-file create = 92 MB/s
small-file read = 82 MB/s
2x2 SAS-SSD volume:
ctr not enabled:
small-file create = 52 MB/s
small-file read = 486 MB/s
ctr enabled:
small-file create = 45 MB/s
small-file read = 487 MB/s
Some of these tests show a fairly high performance impact from turning on ctr. For example, large-file random read on the 2x2 SAS-SSD volume shows a drop of about 30% with ctr on, and small-file create shows about 15-20% degradation from ctr.
It will be good to reduce ctr overheads going forward, because these overheads cut into the performance benefit you can obtain from tiering.
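The per-workload overhead can be tabulated from the large-file numbers for the 2x2 SAS-SSD volume. The sketch below assumes that, in each pair of result sets, the first set is the run without ctr and the second is the run with ctr enabled; the variable and function names are illustrative.

```python
# (ctr not enabled, ctr enabled) throughput in KB/sec, 2x2 SAS-SSD volume.
# Assumption: the first set of numbers in the comment is the run without ctr.
sas_ssd_large_file = {
    "seq write": (334135.34, 275198.68),
    "seq read": (1416166.25, 1030546.73),
    "random read": (408672.50, 290876.97),
    "random write": (255235.33, 240935.74),
}


def percent_drop(baseline, with_ctr):
    """Throughput lost relative to the baseline, in percent."""
    return (baseline - with_ctr) / baseline * 100.0


drops = {
    workload: percent_drop(off, on)
    for workload, (off, on) in sas_ssd_large_file.items()
}

for workload, drop in drops.items():
    print(f"large-file {workload}: {drop:.1f}% drop with ctr on")
```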
Results with glusterfs*-3.7.5-19.el7.x86_64 on fast PCI SSDs. The test in this case uses iozone to perform seq write, seq read, random read and random write. The data set is 480 GB: 24 threads in total, each accessing a 20GB file.
Results with 2x2 PCI-SSD volume (2 servers, 64GB RAM each):
Sequential write = 1204927.25 KB/sec
Sequential read = 2396584.24 KB/sec
Random read = 848802.38 KB/sec
Random write = 1202198.43 KB/sec
Note capacity of volume is 1.5TB
Results with tiered volume with 2x2 PCI-SSD (above) as hot tier. Attach tier and run the same iozone test, so all data accesses should go to hot tier (480GB data set, 1.5TB hot tier capacity).
Seq write = 1204771.33 KB/sec
Seq read = 2283647.04 KB/sec
Random read = 749896.74 KB/sec
Random write = 766392.37 KB/sec
Sequential write and read are doing pretty well, but there is a severe drop in throughput for random read and random write.
Repeat with 2x2 PCI-SSD volume, but this time with ctr enabled:
Seq write = 1204094.61 KB/sec
Seq read = 2335040.09 KB/sec
Random read = 746581.30 KB/sec
Random write = 759357.20 KB/sec
This shows that, for this test, the degraded performance of the tiered volume is solely because of ctr. For random write, the drop on turning ctr on is more than 35% (throughput ratio of roughly 760/1202).
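The comparison behind this conclusion can be reproduced from the three result sets above: the plain 2x2 PCI-SSD volume, the tiered volume, and the plain volume with ctr enabled. This is a minimal sketch with illustrative names; it only restates the posted numbers as drops relative to the plain volume.

```python
# Throughput in KB/sec, copied from the three iozone result sets above.
baseline = {  # 2x2 PCI-SSD volume, ctr not enabled
    "seq write": 1204927.25, "seq read": 2396584.24,
    "random read": 848802.38, "random write": 1202198.43,
}
tiered = {  # tiered volume with the 2x2 PCI-SSD volume as hot tier
    "seq write": 1204771.33, "seq read": 2283647.04,
    "random read": 749896.74, "random write": 766392.37,
}
ctr_only = {  # 2x2 PCI-SSD volume with ctr enabled, no tier
    "seq write": 1204094.61, "seq read": 2335040.09,
    "random read": 746581.30, "random write": 759357.20,
}


def drop_vs_baseline(results):
    """Percent throughput lost per workload, relative to the plain volume."""
    return {
        workload: (baseline[workload] - value) / baseline[workload] * 100.0
        for workload, value in results.items()
    }


tiered_drop = drop_vs_baseline(tiered)
ctr_drop = drop_vs_baseline(ctr_only)

for workload in baseline:
    print(f"{workload}: tiered {tiered_drop[workload]:.1f}%, "
          f"ctr only {ctr_drop[workload]:.1f}%")
```

For the random workloads, the two columns come out within a percentage point of each other, which is the basis for attributing the tiered-volume drop to ctr.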