Bug 1535780
| Summary: | Poor Gluster Block Performance for PostgreSQL in CNS environment | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Shekhar Berry <shberry> |
| Component: | gluster-block | Assignee: | Prasanna Kumar Kalever <prasanna.kalever> |
| Status: | CLOSED ERRATA | QA Contact: | krishnaram Karthick <kramdoss> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | cns-3.7 | CC: | ekuric, kramdoss, mpillai, pkarampu, prasanna.kalever, psuriset, rhs-bugs, rsussman, sanandpa, shberry, storage-qa-internal, vinug, xiubli |
| Target Milestone: | --- | Keywords: | Performance |
| Target Release: | CNS 3.10 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | gluster-block-0.2.1-18.el7rhgs | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-09-12 09:25:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1559746, 1563298, 1564082, 1565063 | | |
| Bug Blocks: | 1568861 | | |
Description
Shekhar Berry
2018-01-18 05:04:26 UTC
One possible limitation just crossed my mind: the ring buffer in the kernel is currently fixed at 8M (if using the 3.10 kernel without the dynamic growth feature). This will also slow performance down in high-IOPS cases.

Pranith, Shekhar,

The following is my test output.

Before changing the ring buffer size (1M as default):

file:

[postgres@gprfs025 root]$ pgbench -c 10 -j 2 -t 4000 test
tps = 81.652434 (including connections establishing)
tps = 81.659641 (excluding connections establishing)

block:

bash-4.2$ pgbench -c 10 -j 2 -t 4000 test
tps = 105.612957 (including connections establishing)
tps = 105.620169 (excluding connections establishing)

I'm not sure why the block tps is higher than the file tps; it was already like this when I first got the machines, and I have tested it many times with the same result.

=======================

After changing the ring buffer size to 64M:

1. Replaced the kernel with 3.10.0-693.5.2.3.el7.bclinux.x86_64, one of my old test kernels with the LIO patches backported on top of 3.10.0-693.5.2, because the current kernel does not include the ring buffer resize patches.
2. Replaced gluster-block and configshell to make the ring buffer size scalable.

file:

[postgres@gprfs025 ~]$ pgbench -c 10 -j 2 -t 100 test
tps = 96.613083 (including connections establishing)
tps = 97.094510 (excluding connections establishing)

block:

-bash-4.2$ pgbench -c 10 -j 2 -t 4000 test
tps = 583.529217 (including connections establishing)
tps = 583.750357 (excluding connections establishing)

May it work?
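For anyone trying to reproduce the numbers in this thread, here is a minimal sketch of the benchmark flow implied by the output. The database name `test` and the pgbench options are taken from the comments; the initialization step and the assumption that the PostgreSQL data directory sits on the storage under test are mine, not a record of the exact setup used here.

```bash
# Hedged reproduction sketch, not the exact commands used in this bug.
# Run as the postgres user; assumes the data directory already lives on the
# gluster file (FUSE) mount or the gluster-block (iSCSI/TCMU) device under test.

createdb test                     # database name used throughout the thread
pgbench -i -s 3000 test           # initialize; scale factor 3000 appears in the later "file" run
pgbench -c 10 -j 2 -t 4000 test   # 10 clients, 2 job threads, 4000 transactions per client
```

The tps figures pgbench prints (including/excluding connection establishment) are the numbers quoted throughout the comments.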
(In reply to Xiubo Li from comment #11)
> I'm not sure why the block tps is higher than the file tps; it was already
> like this when I first got the machines, and I have tested it many times
> with the same result.

Hi Xiubo,

That was due to a node getting disconnected because of a network failure. I have tested again with a new volume after fixing the node issue, and the tps for file is back to 150.

(In reply to Shekhar Berry from comment #12)
> That was due to a node getting disconnected because of a network failure. I
> have tested again with a new volume after fixing the node issue, and the tps
> for file is back to 150.

Hi Shekhar,

Okay, got it. BTW, have you tested the block device with my changes yet? I have tested both again today:

file:

[postgres@gprfs025 ~]$ pgbench -c 10 -j 2 -t 4000 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 3000
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 4000
number of transactions actually processed: 40000/40000
tps = 65.080532 (including connections establishing)
tps = 65.085644 (excluding connections establishing)

block:

-bash-4.2$ pgbench -c 10 -j 2 -t 4000 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 4000
number of transactions actually processed: 40000/40000
tps = 494.927941 (including connections establishing)
tps = 495.094597 (excluding connections establishing)
-bash-4.2$ lsblk

The default kernel, 3.10.0-799.el7.x86_64, does not include the ring buffer resize patches. I have cloned the RHEL git repository and the patches are already merged there, so we could update to a kernel newer than the 799 build.

Related patch: https://github.com/gluster/gluster-block/pull/60

With the newer builds we see that gluster-block performs well for PostgreSQL compared to gluster file, so this BZ can be closed.

Moving the bug to verified based on comment 29.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2691
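For completeness, below is a small hedged shell sketch for checking a node against the fix levels mentioned on this page: the gluster-block build comes from the Fixed In Version field and the kernel cut-off from the comment about 3.10.0-799.el7. The comparison is only an illustration, not an official support check.

```bash
#!/bin/bash
# Illustrative check only: the thresholds below are taken from this bug report
# (Fixed In Version field and the kernel discussed in the comments).

fixed_gb="gluster-block-0.2.1-18.el7rhgs"
min_kernel="3.10.0-799.el7"

installed_gb=$(rpm -q gluster-block)
running_kernel=$(uname -r | sed 's/\.x86_64$//')

echo "gluster-block: ${installed_gb}  (fix level: ${fixed_gb})"
echo "kernel:        ${running_kernel}  (needs to be newer than ${min_kernel})"

# sort -V gives version ordering; the running kernel must sort strictly after
# the 3.10.0-799.el7 base to carry the ring-buffer resize patches.
newest=$(printf '%s\n' "$min_kernel" "$running_kernel" | sort -V | tail -n1)
if [ "$newest" = "$running_kernel" ] && [ "$running_kernel" != "$min_kernel" ]; then
    echo "kernel looks new enough"
else
    echo "kernel may lack the ring-buffer resize patches"
fi
```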