Bug 1291768 - [tiering]: large performance degradation seen for some workloads with ctr enabled
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tier
Assigned To: Milind Changire
Manoj Pillai
Keywords: Performance, ZStream
Reported: 2015-12-15 10:07 EST by Manoj Pillai
Modified: 2017-07-14 05:48 EDT (History)
Doc Type: Bug Fix
Type: Bug
Description Manoj Pillai 2015-12-15 10:07:07 EST
Description of problem:

Enabling ctr on a volume results in a significant reduction in I/O throughput to the volume. Reducing this impact is important because it affects the performance of tiered volumes and the performance gains that can be obtained from tiering.

The ctr impact on performance is particularly high for sequential writes to large files, where a drop of 35% is seen in some cases on enabling ctr. Currently, a drop of about 10% is seen on sequential reads to large files.

How reproducible:
Consistently, on writes. Read performance with ctr enabled shows quite a bit of variation, but a drop of about 10% is typical, with some runs showing more.

Steps to Reproduce:
Create and mount a gluster volume. In this test, a 2x2 volume was used with SAS-SSD as the underlying brick storage. Options set on the volume: lookup-optimize on, client/server event-threads 4.

Run a multi-threaded iozone write/read test and note the throughput recorded.

Repeat the test, but set the following option as well on the volume before running the test:
gluster volume set $rhs_volume features.ctr-enabled on
[record-counters is off]
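
The volume configuration in the steps above could be scripted roughly as follows. This is only a sketch: the volume name testvol is a placeholder, and the fully qualified option names cluster.lookup-optimize, client.event-threads and server.event-threads are an assumed expansion of the abbreviated names given in the steps. The script only prints the commands; it does not run them.

```python
import shlex


def volume_set_commands(volume, options):
    """Return one `gluster volume set` command line per option."""
    return [
        " ".join(
            ["gluster", "volume", "set", shlex.quote(volume), key, shlex.quote(str(value))]
        )
        for key, value in options.items()
    ]


# Baseline options from the steps above. The fully qualified names are an
# assumption; the report only gives them in abbreviated form.
BASELINE = {
    "cluster.lookup-optimize": "on",
    "client.event-threads": 4,
    "server.event-threads": 4,
}

# The option toggled for the second run (taken verbatim from the report).
CTR = {"features.ctr-enabled": "on"}

if __name__ == "__main__":
    # "testvol" is a placeholder volume name.
    for cmd in volume_set_commands("testvol", {**BASELINE, **CTR}):
        print(cmd)
```

Printing the commands first makes it easy to review the exact settings before applying them to a test volume.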

Actual results:

ctr not enabled:
write throughput  = 337242.34 KB/sec
read throughput   = 1346288.64 KB/sec

ctr enabled:
write throughput  = 224108.58 KB/sec
read throughput   = 1214276.92 KB/sec
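
For reference, the relative drop implied by the figures above can be worked out with a short script. This is just arithmetic on the numbers already reported; the function name percent_drop is made up for illustration.

```python
def percent_drop(baseline, measured):
    """Percentage by which `measured` falls below `baseline`."""
    return 100.0 * (baseline - measured) / baseline


# Throughput in KB/sec, copied from the results above.
write_drop = percent_drop(337242.34, 224108.58)
read_drop = percent_drop(1346288.64, 1214276.92)

print(f"write drop with ctr enabled: {write_drop:.1f}%")
print(f"read drop with ctr enabled:  {read_drop:.1f}%")
```

For this particular run, the write drop comes out at roughly a third and the read drop at a little under 10%.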

Expected results:

No hard expectations, but ideally ctr enabled should result in only a few % drop in performance.

Additional info:
Comment 2 Joseph Elwin Fernandes 2016-01-11 03:41:02 EST
The tests above were run with sql-cache at 12500 pages and sql-wal size at 25000 pages.

Option: features.ctr-sql-db-cachesize
Default Value: 12500
Description: Defines the cache size of the sqlite database of the changetimerecorder xlator. The input to this option is in pages. Each page is 4096 bytes. Default value is 12500 pages, i.e. ~49 MB. The max value is 262144 pages, i.e. 1 GB, and the min value is 1000 pages, i.e. ~4 MB.

Option: features.ctr-sql-db-wal-autocheckpoint
Default Value: 25000
Description: Defines the autocheckpoint of the sqlite database of changetimerecorder. The input to this option is in pages. Each page is 4096 bytes. Default value is 25000 pages, i.e. ~98 MB. The max value is 262144 pages, i.e. 1 GB, and the min value is 1000 pages, i.e. ~4 MB.
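
The page counts in the two option descriptions can be converted to sizes with a small helper. This is a sketch that assumes only the 4096-byte page size stated above; the helper name pages_to_mib is illustrative.

```python
PAGE_SIZE_BYTES = 4096  # page size stated in the option descriptions


def pages_to_mib(pages):
    """Convert a count of 4096-byte pages to MiB."""
    return pages * PAGE_SIZE_BYTES / (1024 * 1024)


SETTINGS = [
    ("ctr-sql-db-cachesize default", 12500),
    ("ctr-sql-db-wal-autocheckpoint default", 25000),
    ("minimum for both options", 1000),
    ("maximum for both options", 262144),
]

for label, pages in SETTINGS:
    print(f"{label:40s} {pages:7d} pages = {pages_to_mib(pages):7.1f} MiB")
```

The results line up with the approximate sizes quoted in the descriptions (about 49 MB, 98 MB, 4 MB and 1 GB).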
Comment 4 Manoj Pillai 2016-01-12 07:26:47 EST
Recent builds are showing much improved performance with ctr on.

E.g. repeat of test in comment #0 with glusterfs-3.7.5-15.el7rhgs.x86_64:

ctr not enabled:
write throughput  = 343957.13 KB/sec
read throughput   = 1385587.59 KB/sec

ctr enabled:
write throughput  = 331219.20 KB/sec
read throughput   = 1211810.69 KB/sec

However, the old defaults for sql-cache and sql-wal size are giving better performance:

ctr enabled, with sql-cache and sql-wal at 1000 each:
write throughput  = 341040.57 KB/sec
read throughput   = 1334078.26 KB/sec

More runs are needed to characterize ctr overhead more thoroughly, but there are definitely big improvements in the new builds.
Comment 6 Manoj Pillai 2016-01-27 04:10:02 EST
Updating bug with numbers for build glusterfs*-3.7.5-17.el7rhgs.x86_64:

We're just looking at the overhead seen from enabling ctr on a gluster volume here; there is no tier setup. Large-file numbers are with iozone. Small-file numbers are with the smallfile benchmark.

2x(8+4) volume:
ctr off:
        large-file seq write       = 1191140.81 KB/sec
        large-file seq read        = 1353652.79 KB/sec
        large-file random read     =   41545.47 KB/sec
        large-file random write    =   77722.64 KB/sec

ctr on:
        large-file seq write       = 1112725.20 KB/sec
        large-file seq read        = 1156032.43 KB/sec
        large-file random read     =   40674.99 KB/sec
        large-file random write    =   73536.95 KB/sec

2x2 SAS-SSD volume:
ctr off:
        large-file seq write       =  334135.34 KB/sec
        large-file seq read        = 1416166.25 KB/sec
        large-file random read     =  408672.50 KB/sec
        large-file random write    =  255235.33 KB/sec

ctr on:
        large-file seq write       =  275198.68 KB/sec
        large-file seq read        = 1030546.73 KB/sec
        large-file random read     =  290876.97 KB/sec
        large-file random write    =  240935.74 KB/sec

2x(8+4) volume:
ctr off:
        small-file create  =  114 MB/s
        small-file read    =   99 MB/s

ctr on:
        small-file create  =   92 MB/s
        small-file read    =   82 MB/s

2x2 SAS-SSD volume:
ctr off:
        small-file create  =   52 MB/s
        small-file read    =  486 MB/s

ctr on:
        small-file create  =   45 MB/s
        small-file read    =  487 MB/s

Some of these tests show a fairly high performance impact from turning on ctr. For example, large-file random read on the 2x2 SAS-SSD volume shows a drop of about 30% on turning ctr on. Small-file create shows about 15-20% degradation from ctr.
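
To see the overhead across all of the workloads above at a glance, the ctr-off and ctr-on figures can be tabulated and ranked. The sketch below only reuses the numbers from this comment; the RESULTS layout and the function name are illustrative choices, not part of any tool.

```python
# (volume, workload) -> (ctr off, ctr on). Large-file rows are in KB/sec and
# small-file rows in MB/s; the percentage drop is unit-independent.
RESULTS = {
    ("2x(8+4)", "large-file seq write"): (1191140.81, 1112725.20),
    ("2x(8+4)", "large-file seq read"): (1353652.79, 1156032.43),
    ("2x(8+4)", "large-file random read"): (41545.47, 40674.99),
    ("2x(8+4)", "large-file random write"): (77722.64, 73536.95),
    ("2x2 SAS-SSD", "large-file seq write"): (334135.34, 275198.68),
    ("2x2 SAS-SSD", "large-file seq read"): (1416166.25, 1030546.73),
    ("2x2 SAS-SSD", "large-file random read"): (408672.50, 290876.97),
    ("2x2 SAS-SSD", "large-file random write"): (255235.33, 240935.74),
    ("2x(8+4)", "small-file create"): (114.0, 92.0),
    ("2x(8+4)", "small-file read"): (99.0, 82.0),
    ("2x2 SAS-SSD", "small-file create"): (52.0, 45.0),
    ("2x2 SAS-SSD", "small-file read"): (486.0, 487.0),
}


def ctr_drops(results):
    """Map each (volume, workload) to the percentage drop with ctr on."""
    return {key: 100.0 * (off - on) / off for key, (off, on) in results.items()}


drops = ctr_drops(RESULTS)

# Print the workloads from largest to smallest drop.
for (volume, workload), drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{volume:12s} {workload:24s} {drop:6.1f}%")
```

Ranked this way, large-file random read on the 2x2 SAS-SSD volume comes out as the largest drop in this set of runs.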

It will be good to reduce ctr overheads going forward, because these overheads cut into the performance benefit you can obtain from tiering.
Comment 7 Manoj Pillai 2016-02-24 05:00:48 EST
Results with glusterfs*-3.7.5-19.el7.x86_64 with fast PCI SSDs. The test in this case uses iozone to perform seq write, seq read, rand read and rand write. The data set is 480 GB: 24 threads in total, each accessing a 20 GB file.

Results with 2x2 PCI-SSD volume (2 servers, 64GB RAM each):

        Sequential write  = 1204927.25 KB/sec
        Sequential read   = 2396584.24 KB/sec
        Random read       =  848802.38 KB/sec
        Random write      = 1202198.43 KB/sec

Note: the capacity of the volume is 1.5 TB.

Results with tiered volume with 2x2 PCI-SSD (above) as hot tier. Attach tier and run the same iozone test, so all data accesses should go to hot tier (480GB data set, 1.5TB hot tier capacity).
        Seq write        = 1204771.33 KB/sec
        Seq read         = 2283647.04 KB/sec
        Random read      =  749896.74 KB/sec
        Random write     =  766392.37 KB/sec

Sequential write and read are doing pretty well, but there is a severe drop in throughput for random read and write (about 12% and 36% respectively, relative to the plain 2x2 PCI-SSD volume).

Repeat with 2x2 PCI-SSD volume, but this time with ctr enabled:
        Seq write        = 1204094.61 KB/sec
        Seq read         = 2335040.09 KB/sec
        Random read      =  746581.30 KB/sec
        Random write     =  759357.20 KB/sec

This shows that, for this test, the degraded performance of the tiered volume is solely because of ctr. For random write, the drop when turning ctr on is more than 35% (760/1202, i.e. throughput falls to about 63% of the ctr-off figure).
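
The comparison behind this conclusion can be laid out side by side. The sketch below computes the drop of the tiered volume and of the ctr-only volume relative to the plain 2x2 PCI-SSD volume, using only the figures in this comment; the dictionary layout and names are illustrative.

```python
# Throughput in KB/sec for the three configurations in this comment.
RUNS = {
    "plain": {"seq write": 1204927.25, "seq read": 2396584.24,
              "random read": 848802.38, "random write": 1202198.43},
    "tiered": {"seq write": 1204771.33, "seq read": 2283647.04,
               "random read": 749896.74, "random write": 766392.37},
    "ctr only": {"seq write": 1204094.61, "seq read": 2335040.09,
                 "random read": 746581.30, "random write": 759357.20},
}


def drop_vs_plain(config, workload):
    """Percentage drop of `config` relative to the plain volume."""
    base = RUNS["plain"][workload]
    return 100.0 * (base - RUNS[config][workload]) / base


print(f"{'workload':14s} {'tiered':>8s} {'ctr only':>9s}")
for workload in RUNS["plain"]:
    tiered = drop_vs_plain("tiered", workload)
    ctr_only = drop_vs_plain("ctr only", workload)
    print(f"{workload:14s} {tiered:7.1f}% {ctr_only:8.1f}%")
```

For the random workloads, the tiered and ctr-only drops land within about one percentage point of each other, which is what supports attributing the tiered-volume degradation to ctr.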
