Description of problem:

PostgreSQL running on a Gluster Block volume in a CNS environment reports poor performance compared to PostgreSQL running on a Gluster File volume: a 67% drop. Gluster File reports 129 tps whereas Gluster Block reports 42 tps.

The pgbench tool was used to capture the performance numbers. First the database was initialized and scaled:

pgbench -i -s 10000 sampledb

and then performance was captured using:

pgbench -c 10 -j 2 -t 25000 sampledb

The following gluster tunables were set on the Gluster File volume:

Volume Name: vol_f6b9200b3a6912a8c35b7aaa9c9c391c
Type: Replicate
Volume ID: 64bad30d-00c2-4729-a00e-718bf6cfe8a4
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.28.40.47:/var/lib/heketi/mounts/vg_2038c643c494c481b99c62727c86e93e/brick_3061767c75198cc221e019ef29f077bd/brick
Brick2: 172.28.40.40:/var/lib/heketi/mounts/vg_dcc0051caa0aac34d488a8048043b963/brick_c0c84a320b9c608187b7fe79a43e6099/brick
Brick3: 172.28.40.45:/var/lib/heketi/mounts/vg_d9c3f0ba91e1f68d0177ebabfb0eea99/brick_510de37045e9b1695211f2e6ad821254/brick
Options Reconfigured:
performance.client-io-threads: off
performance.open-behind: off
performance.quick-read: off
performance.stat-prefetch: off
performance.write-behind: off
performance.strict-o-direct: on
performance.read-ahead: off
performance.io-cache: off
performance.readdir-ahead: off
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on

The Gluster Block volume, by contrast, had the default block volume profile set on it.

Version-Release number of selected component (if applicable):

glusterfs-libs-3.8.4-44.el7rhgs.x86_64
glusterfs-3.8.4-44.el7rhgs.x86_64
glusterfs-api-3.8.4-44.el7rhgs.x86_64
glusterfs-cli-3.8.4-44.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-44.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-44.el7rhgs.x86_64
glusterfs-fuse-3.8.4-44.el7rhgs.x86_64
glusterfs-server-3.8.4-44.el7rhgs.x86_64
gluster-block-0.2.1-14.el7rhgs.x86_64

oc version
oc v3.7.11
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://gprfs020.sbu.lab.eng.bos.redhat.com:8443
openshift v3.7.11
kubernetes v1.7.6+a08f5eeb62

How reproducible:
Always

Additional info:

Here is the extracted volume profile output for both the gluster block volume and the gluster file volume:

Block_Volume_Profile
--------------------
Interval 1 Stats:
   Block Size:               512b+                1024b+                2048b+
 No. of Reads:                   0                     0                     0
No. of Writes:                 672                   686                   789

   Block Size:              4096b+                8192b+               16384b+
 No. of Reads:                   0                     0                     0
No. of Writes:               61212               1063318                 59789

   Block Size:             32768b+               65536b+              131072b+
 No. of Reads:                   0                     0                     0
No. of Writes:               74223                168138               2339146

   Block Size:            262144b+
 No. of Reads:                   0
No. of Writes:                   1

 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            263     RELEASE
      0.00       0.00 us       0.00 us       0.00 us           2175  RELEASEDIR
      0.00       3.67 us       3.00 us       4.00 us              3     OPENDIR
      0.00     145.50 us     143.00 us     148.00 us              2        OPEN
      0.00      86.40 us      29.00 us     164.00 us              5    GETXATTR
      0.00      87.20 us      57.00 us     129.00 us             10     INODELK
      0.00     171.86 us      52.00 us     283.00 us              7      LOOKUP
      0.00     654.17 us     319.00 us    1201.00 us              6     READDIR
      0.10      87.00 us      41.00 us     205.00 us           5780    FINODELK
      0.41     293.86 us     152.00 us   18025.00 us           7280    FXATTROP
      0.44     789.18 us     173.00 us   40891.00 us           2874       FSYNC
     99.06   45108.70 us     171.00 us  918580.00 us          11431       WRITE

File_Volume_Profile
-------------------
Interval 1 Stats:
   Block Size:                32b+                  512b+               1024b+
 No. of Reads:                   0                      6                    4
No. of Writes:                  12                     11                   18

   Block Size:              8192b+                16384b+             32768b+
 No. of Reads:                7298                     26                    2
No. of Writes:                 301                    896                 1385

   Block Size:             65536b+               131072b+
 No. of Reads:                   7                      0
No. of Writes:                 863                     42

 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us             29      FORGET
      0.00       0.00 us       0.00 us       0.00 us           1898     RELEASE
      0.00     112.50 us      85.00 us     131.00 us              4      STATFS
      0.01     106.62 us      58.00 us     193.00 us             71        STAT
      0.01      70.28 us      34.00 us     142.00 us            116     INODELK
      0.01     294.79 us     223.00 us     416.00 us             29      RENAME
      0.01     360.17 us     283.00 us     545.00 us             29      CREATE
      0.01      64.88 us      32.00 us     161.00 us            174     ENTRYLK
      0.04     156.24 us     108.00 us     332.00 us            278        OPEN
      0.07     233.62 us     101.00 us     404.00 us            371      LOOKUP
      0.09      52.67 us      22.00 us     173.00 us           1900       FLUSH
      0.45      74.59 us      28.00 us     225.00 us           7054       FSTAT
      0.52     171.13 us      90.00 us     312.00 us           3528       WRITE
      0.67      79.40 us      29.00 us     297.00 us           9861    FINODELK
      1.59     263.67 us     174.00 us   10567.00 us           7047    FXATTROP
     26.70    4789.79 us     231.00 us  396493.00 us           6511       FSYNC
     69.84   11108.87 us      45.00 us  756177.00 us           7343        READ

As seen in the block volume profile above, the WRITE fop count on the brick is roughly 3 times that of the gluster file volume brick (11431 calls vs 3528). This is what is causing the performance drop for gluster block.
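For reference, a minimal sketch of how such tunables are applied and how the per-brick profile output above is typically gathered with the standard gluster CLI (run on a gluster node; the volume name is the one shown above, and only one tunable is shown as an example):

# apply one of the tunables listed above to the file volume
gluster volume set vol_f6b9200b3a6912a8c35b7aaa9c9c391c performance.strict-o-direct on

# start profiling, run the pgbench workload, then dump and stop the counters
gluster volume profile vol_f6b9200b3a6912a8c35b7aaa9c9c391c start
gluster volume profile vol_f6b9200b3a6912a8c35b7aaa9c9c391c info
gluster volume profile vol_f6b9200b3a6912a8c35b7aaa9c9c391c stop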
One possible limitation that just crossed my mind: the ring buffer in the kernel is currently fixed at 8M (if using the 3.10 kernel without the dynamic growth feature). This will also slow down performance in the high-IOPS case.
Pranith, Shekhar,

The following are my test outputs.

Before changing the ring buffer size (1M by default):

file:
[postgres@gprfs025 root]$ pgbench -c 10 -j 2 -t 4000 test
tps = 81.652434 (including connections establishing)
tps = 81.659641 (excluding connections establishing)

block:
bash-4.2$ pgbench -c 10 -j 2 -t 4000 test
tps = 105.612957 (including connections establishing)
tps = 105.620169 (excluding connections establishing)

I'm not sure why the block tps was higher than the file tps when I first got the machines; I have tested this many times and the results were the same.

=======================

After changing the ring buffer size to 64M:

1. Replaced the kernel with 3.10.0-693.5.2.3.el7.bclinux.x86_64, one of my old test kernels with the LIO patches backported on top of 3.10.0-693.5.2, because the current kernel does not include the ring buffer resize patches.
2. Replaced gluster-block and configshell to make the ring buffer size scalable.

file:
[postgres@gprfs025 ~]$ pgbench -c 10 -j 2 -t 100 test
tps = 96.613083 (including connections establishing)
tps = 97.094510 (excluding connections establishing)

block:
-bash-4.2$ pgbench -c 10 -j 2 -t 4000 test
tps = 583.529217 (including connections establishing)
tps = 583.750357 (excluding connections establishing)

Could this work?
(In reply to Xiubo Li from comment #11)
> Pranith, Shekhar,
>
> The following are my test outputs.
>
> Before changing the ring buffer size (1M by default):
>
> file:
> [postgres@gprfs025 root]$ pgbench -c 10 -j 2 -t 4000 test
> tps = 81.652434 (including connections establishing)
> tps = 81.659641 (excluding connections establishing)
>
> block:
> bash-4.2$ pgbench -c 10 -j 2 -t 4000 test
> tps = 105.612957 (including connections establishing)
> tps = 105.620169 (excluding connections establishing)
>
> I'm not sure why the block tps was higher than the file tps when I first got
> the machines; I have tested this many times and the results were the same.

Hi Xiubo,

That was due to a node getting disconnected because of a network failure. I have retested with a new volume after fixing the node issue, and the tps is back to 150 for file.
(In reply to Shekhar Berry from comment #12)
> Hi Xiubo,
>
> That was due to a node getting disconnected because of a network failure. I
> have retested with a new volume after fixing the node issue, and the tps is
> back to 150 for file.

Hi Shekhar,

Okay, got it. BTW, have you tested block with my changes applied yet?

I have tested both again today:

file:
[postgres@gprfs025 ~]$ pgbench -c 10 -j 2 -t 4000 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 3000
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 4000
number of transactions actually processed: 40000/40000
tps = 65.080532 (including connections establishing)
tps = 65.085644 (excluding connections establishing)

block:
-bash-4.2$ pgbench -c 10 -j 2 -t 4000 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 4000
number of transactions actually processed: 40000/40000
tps = 494.927941 (including connections establishing)
tps = 495.094597 (excluding connections establishing)
-bash-4.2$ lsblk

The default kernel, 3.10.0-799.el7.x86_64, does not include the ring buffer resize patches. I have cloned the RHEL git repository and the patches have already been merged there, so we could update the kernel to a version newer than -799.
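As a quick sanity check around the kernel swap (standard commands, nothing specific to this bug), the running and installed kernel versions can be confirmed before and after the update:

uname -r            # kernel currently booted
rpm -q kernel       # kernel package builds installed
yum update kernel   # pull in the newer build once it is available, then reboot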
Related Patch: https://github.com/gluster/gluster-block/pull/60
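A minimal sketch of how a block device with a larger ring buffer could be created once that change lands (the volume name, block name, hosts and size below are placeholders, and the exact ring-buffer option syntax is an assumption based on the pull request, not confirmed against a released build):

# create a 3-way replicated block device requesting a 64 MB ring buffer (illustrative values)
gluster-block create blockvol/sampleblock ha 3 ring-buffer 64 172.28.40.47,172.28.40.40,172.28.40.45 500GiB

# inspect the device afterwards
gluster-block info blockvol/sampleblock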
With newer builds we see that Gluster Block performs well for PostgreSQL compared to Gluster File, so this BZ can be closed.
Moving the bug to verified based on comment 29.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2691