Bug 1535780 - Poor Gluster Block Performance for PostgreSQL in CNS environment
Summary: Poor Gluster Block Performance for PostgreSQL in CNS environment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-block
Version: cns-3.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: CNS 3.10
Assignee: Prasanna Kumar Kalever
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On: 1559746 1563298 1564082 1565063
Blocks: 1568861
 
Reported: 2018-01-18 05:04 UTC by Shekhar Berry
Modified: 2019-02-13 09:03 UTC (History)
CC List: 13 users

Fixed In Version: gluster-block-0.2.1-18.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-12 09:25:26 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:2691 0 None None None 2018-09-12 09:26:55 UTC

Description Shekhar Berry 2018-01-18 05:04:26 UTC
Description of problem:

PostgreSQL running on a Gluster Block volume in a CNS environment shows poor performance compared to PostgreSQL running on a Gluster File volume: a 67% drop. GlusterFS (file) reports 129 tps whereas Gluster Block reports 42 tps. The pgbench tool was used to capture the performance numbers.

First, the database was initialized and scaled:

pgbench -i -s 10000 sampledb

and then performance was captured using the command:

pgbench -c 10 -j 2 -t 25000 sampledb
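
For clarity, the same two commands with their flags spelled out; a minimal sketch, assuming the pgbench client runs inside the PostgreSQL pod (pod name below is a placeholder):

# Hypothetical pod name; run from any host that has pgbench and can reach the database.
oc rsh <postgresql-pod>
pgbench -i -s 10000 sampledb            # initialize: scale factor 10000 = 1,000,000,000 pgbench_accounts rows
pgbench -c 10 -j 2 -t 25000 sampledb    # 10 clients, 2 worker threads, 25000 transactions per client (250,000 total)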

The following gluster tunables were set on Gluster File Volume:

Volume Name: vol_f6b9200b3a6912a8c35b7aaa9c9c391c
Type: Replicate
Volume ID: 64bad30d-00c2-4729-a00e-718bf6cfe8a4
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.28.40.47:/var/lib/heketi/mounts/vg_2038c643c494c481b99c62727c86e93e/brick_3061767c75198cc221e019ef29f077bd/brick
Brick2: 172.28.40.40:/var/lib/heketi/mounts/vg_dcc0051caa0aac34d488a8048043b963/brick_c0c84a320b9c608187b7fe79a43e6099/brick
Brick3: 172.28.40.45:/var/lib/heketi/mounts/vg_d9c3f0ba91e1f68d0177ebabfb0eea99/brick_510de37045e9b1695211f2e6ad821254/brick
Options Reconfigured:
performance.client-io-threads: off
performance.open-behind: off
performance.quick-read: off
performance.stat-prefetch: off
performance.write-behind: off
performance.strict-o-direct: on
performance.read-ahead: off
performance.io-cache: off
performance.readdir-ahead: off
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on

The Gluster Block volume, by contrast, had the default block volume profile set on it.
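
For reference, options like those listed above are applied one at a time with gluster volume set; a minimal sketch using the volume name from the output above (in a heketi-managed CNS setup these are normally configured through heketi, so treat this purely as an illustration):

# Run from any node in the trusted storage pool.
gluster volume set vol_f6b9200b3a6912a8c35b7aaa9c9c391c performance.write-behind off
gluster volume set vol_f6b9200b3a6912a8c35b7aaa9c9c391c performance.strict-o-direct on
gluster volume set vol_f6b9200b3a6912a8c35b7aaa9c9c391c performance.client-io-threads off
# ...and so on for the remaining performance.* and cluster.* options shown above.
gluster volume info vol_f6b9200b3a6912a8c35b7aaa9c9c391c   # verify "Options Reconfigured"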

Version-Release number of selected component (if applicable):

glusterfs-libs-3.8.4-44.el7rhgs.x86_64
glusterfs-3.8.4-44.el7rhgs.x86_64
glusterfs-api-3.8.4-44.el7rhgs.x86_64
glusterfs-cli-3.8.4-44.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-44.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-44.el7rhgs.x86_64
glusterfs-fuse-3.8.4-44.el7rhgs.x86_64
glusterfs-server-3.8.4-44.el7rhgs.x86_64
gluster-block-0.2.1-14.el7rhgs.x86_64

oc version
oc v3.7.11
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://gprfs020.sbu.lab.eng.bos.redhat.com:8443
openshift v3.7.11
kubernetes v1.7.6+a08f5eeb62


How reproducible:

Always

Additional info:

Here is the extracted volume profile output for both the gluster block volume and the gluster file volume:

Block_Volume_Profile
--------------------
Interval 1 Stats:
 
   Block Size:                512b+                1024b+                2048b+ 
 No. of Reads:                    0                     0                     0 
No. of Writes:                  672                   686                   789 
 
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:                    0                     0                     0 
No. of Writes:                61212               1063318                 59789 
 
   Block Size:              32768b+               65536b+              131072b+ 
 No. of Reads:                    0                     0                     0 
No. of Writes:                74223                168138               2339146 
 
   Block Size:             262144b+ 
 No. of Reads:                    0 
No. of Writes:                    1 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            263     RELEASE
      0.00       0.00 us       0.00 us       0.00 us           2175  RELEASEDIR
      0.00       3.67 us       3.00 us       4.00 us              3     OPENDIR
      0.00     145.50 us     143.00 us     148.00 us              2        OPEN
      0.00      86.40 us      29.00 us     164.00 us              5    GETXATTR
      0.00      87.20 us      57.00 us     129.00 us             10     INODELK
      0.00     171.86 us      52.00 us     283.00 us              7      LOOKUP
      0.00     654.17 us     319.00 us    1201.00 us              6     READDIR
      0.10      87.00 us      41.00 us     205.00 us           5780    FINODELK
      0.41     293.86 us     152.00 us   18025.00 us           7280    FXATTROP
      0.44     789.18 us     173.00 us   40891.00 us           2874       FSYNC
     99.06   45108.70 us     171.00 us  918580.00 us          11431       WRITE

File_Volume_Profile
-------------------

Interval 1 Stats:
   Block Size:                 32b+                 512b+                1024b+ 
 No. of Reads:                    0                     6                     4 
No. of Writes:                   12                    11                    18 
 
   Block Size:               8192b+               16384b+               32768b+ 
 No. of Reads:                 7298                    26                     2 
No. of Writes:                  301                   896                  1385 
 
   Block Size:              65536b+              131072b+ 
 No. of Reads:                    7                     0 
No. of Writes:                  863                    42 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us             29      FORGET
      0.00       0.00 us       0.00 us       0.00 us           1898     RELEASE
      0.00     112.50 us      85.00 us     131.00 us              4      STATFS
      0.01     106.62 us      58.00 us     193.00 us             71        STAT
      0.01      70.28 us      34.00 us     142.00 us            116     INODELK
      0.01     294.79 us     223.00 us     416.00 us             29      RENAME
      0.01     360.17 us     283.00 us     545.00 us             29      CREATE
      0.01      64.88 us      32.00 us     161.00 us            174     ENTRYLK
      0.04     156.24 us     108.00 us     332.00 us            278        OPEN
      0.07     233.62 us     101.00 us     404.00 us            371      LOOKUP
      0.09      52.67 us      22.00 us     173.00 us           1900       FLUSH
      0.45      74.59 us      28.00 us     225.00 us           7054       FSTAT
      0.52     171.13 us      90.00 us     312.00 us           3528       WRITE
      0.67      79.40 us      29.00 us     297.00 us           9861    FINODELK
      1.59     263.67 us     174.00 us   10567.00 us           7047    FXATTROP
     26.70    4789.79 us     231.00 us  396493.00 us           6511       FSYNC
     69.84   11108.87 us      45.00 us  756177.00 us           7343        READ

As seen in the block volume profile above, the number of WRITE calls on the brick is almost 3 times that of the gluster file volume brick, and those writes dominate the brick latency. This is what is causing the performance drop for gluster block.
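
To make the comparison concrete, a quick check using only the numbers already shown in the two profiles above (a sketch, not additional measurement data):

# WRITE fop counts from the profiles above: block = 11431, file = 3528.
awk 'BEGIN { printf "WRITE call ratio (block/file): %.1fx\n", 11431 / 3528 }'
# Output: WRITE call ratio (block/file): 3.2x
# On the block volume those WRITEs also account for 99.06% of brick latency
# (avg 45108.70 us per call), versus 0.52% (avg 171.13 us) on the file volume.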

Comment 10 Xiubo Li 2018-02-12 09:46:38 UTC
One possible limitation just crossed my mind: the ring buffer in the kernel is currently fixed at 8M (if using the 3.10 kernel without the dynamic growth feature). This will also slow down performance in high-IOPS cases.

Comment 11 Xiubo Li 2018-02-23 08:05:42 UTC
Pranith, Shekhar,

The following are my test output:

Before changing the Ring Buffer size (1M as default):

file:
[postgres@gprfs025 root]$ pgbench -c 10 -j 2 -t 4000 test
tps = 81.652434 (including connections establishing)
tps = 81.659641 (excluding connections establishing)

block:
bash-4.2$ pgbench -c 10 -j 2 -t 4000 test
tps = 105.612957 (including connections establishing)
tps = 105.620169 (excluding connections establishing)

I'm not sure why the block's tps is higher than the file's; when I first got the machines I tested this many times and the results were still the same.


=======================
After changing the Ring Buffer size to 64M:

1. Replaced the kernel with 3.10.0-693.5.2.3.el7.bclinux.x86_64, one of my old test kernels with the LIO patches backported on top of 3.10.0-693.5.2, because the current kernel doesn't include the Ring Buffer resize patches.
2. Replaced gluster-block and configshell to make the ring buffer size scalable (see the sketch below).
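
For anyone reproducing this, a rough sketch of how the resulting TCMU ring buffer (data area) size can be checked on a kernel that carries the resize patches. The max_data_area_mb attribute name and configfs path are assumptions based on the upstream TCMU work, so verify them against your kernel and targetcli versions:

# Assumption: the TCMU data-area size is exposed as max_data_area_mb in configfs
# on kernels with the resize patches; attribute name/path may differ on your build.
cat /sys/kernel/config/target/core/user_*/*/attrib/max_data_area_mb
# Expected to read 64 once the block device is recreated with a 64M ring buffer.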

file:
[postgres@gprfs025 ~]$ pgbench -c 10 -j 2 -t 100 test
tps = 96.613083 (including connections establishing)
tps = 97.094510 (excluding connections establishing)


block:
-bash-4.2$ pgbench -c 10 -j 2 -t 4000 test
tps = 583.529217 (including connections establishing)
tps = 583.750357 (excluding connections establishing)


Would this work for you?

Comment 12 Shekhar Berry 2018-02-23 17:13:51 UTC
(In reply to Xiubo Li from comment #11)
> Pranith, Shekhar,
> 
> The following are my test output:
> 
> Before changing the Ring Buffer size (1M as default):
> 
> file:
> [postgres@gprfs025 root]$ pgbench -c 10 -j 2 -t 4000 test
> tps = 81.652434 (including connections establishing)
> tps = 81.659641 (excluding connections establishing)
> 
> block:
> bash-4.2$ pgbench -c 10 -j 2 -t 4000 test
> tps = 105.612957 (including connections establishing)
> tps = 105.620169 (excluding connections establishing)
> 
> I'm not sure why the block's tps is higher than the file's; when I first got
> the machines I tested this many times and the results were still the same.

Hi Xiubo,

That was due to a node getting disconnected because of a network failure. I have tested with a new volume after fixing the node issue, and the tps is back to 150 for file.

Comment 13 Xiubo Li 2018-02-24 01:50:24 UTC
(In reply to Shekhar Berry from comment #12)
> (In reply to Xiubo Li from comment #11)
> > Pranith, Shekhar,
> > 
> > The following are my test output:
> > 
> > Before changing the Ring Buffer size (1M as default):
> > 
> > file:
> > [postgres@gprfs025 root]$ pgbench -c 10 -j 2 -t 4000 test
> > tps = 81.652434 (including connections establishing)
> > tps = 81.659641 (excluding connections establishing)
> > 
> > block:
> > bash-4.2$ pgbench -c 10 -j 2 -t 4000 test
> > tps = 105.612957 (including connections establishing)
> > tps = 105.620169 (excluding connections establishing)
> > 
> > I'm not sure why the block's tps is higher than the file's; when I first got
> > the machines I tested this many times and the results were still the same.
> 
> Hi Xiubo,
> 
> That was due to a node getting disconnected due to network failure. I have
> tested it with new volume after fixing the node issue. The tps is back to
> 150 tps for file.

Hi Shekhar,

Okay, got it.

BTW, have you tested the block with my changes yet?

I have tested both again today:

file:
[postgres@gprfs025 ~]$  pgbench -c 10 -j 2 -t 4000 test 
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 3000
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 4000
number of transactions actually processed: 40000/40000
tps = 65.080532 (including connections establishing)
tps = 65.085644 (excluding connections establishing)


block:
-bash-4.2$ pgbench -c 10 -j 2 -t 4000 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 4000
number of transactions actually processed: 40000/40000
tps = 494.927941 (including connections establishing)
tps = 495.094597 (excluding connections establishing)
-bash-4.2$ lsblk

For the kernel: the default 3.10.0-799.el7.x86_64 doesn't include the RingBuffer resize patches.

I have cloned the RHEL git repository and the patches have already been merged there, so we could update the kernel to a build newer than 799.

Comment 21 Prasanna Kumar Kalever 2018-03-21 17:00:03 UTC
Related Patch:
https://github.com/gluster/gluster-block/pull/60
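
Assuming this patch exposes a ring-buffer option on gluster-block create (option name and units as described in the pull request; verify with gluster-block help on a fixed build such as gluster-block-0.2.1-18.el7rhgs), usage would look roughly like the hypothetical example below:

# Hypothetical example; block volume name, hosts, and sizes are placeholders.
gluster-block create blockvol/pgdata ha 3 ring-buffer 64 \
    172.28.40.47,172.28.40.40,172.28.40.45 500GiB
# 'ring-buffer 64' requests a 64 MB ring buffer for the TCMU device backing the block LUN.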

Comment 29 Shekhar Berry 2018-07-10 09:26:30 UTC
With newer builds we see that Gluster Block performs well for PostgreSQL compared to Gluster file, so this BZ can be closed.

Comment 30 krishnaram Karthick 2018-07-11 14:25:04 UTC
Moving the bug to verified based on comment 29.

Comment 32 errata-xmlrpc 2018-09-12 09:25:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2691

