Bug 1483849 - [GSS]Slow performance for small file work-load on arbiter volume
Summary: [GSS]Slow performance for small file work-load on arbiter volume
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: arbiter
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ravishankar N
QA Contact: Karan Sandha
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-08-22 06:54 UTC by Abhishek Kumar
Modified: 2020-12-14 09:38 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-15 10:20:59 UTC
Embargoed:



Description Abhishek Kumar 2017-08-22 06:54:13 UTC
Description of problem:

Slow performance for small file work-load on arbiter volume

Version-Release number of selected component (if applicable):

RHGS 3.2

How reproducible:

Only reproducible in the customer environment.


Actual results:

Copying a 60 MB folder takes around 5-6 minutes. The directory consists of around 3K files and 600 directories.


Expected results:

The copy should complete in around 1-2 minutes, as verified on my test machine.

Additional info:

- Gluster nodes are VMs deployed on Hyper-V infrastructure.
- Back-end storage for the volume comes from NetApp, exported through SMB to Hyper-V.
- Volume type tested is arbiter only (1 x (2 + 1)); an example create command for this geometry is sketched below.
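
For reference, an arbiter volume of this geometry would typically be created along the following lines (the hostnames and brick paths here are placeholders, not taken from the customer setup):

# gluster volume create <VOLNAME> replica 3 arbiter 1 \
      server1:/bricks/brick1 server2:/bricks/brick1 server3:/bricks/brick1
# gluster volume start <VOLNAME>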

Comment 7 Poornima G 2017-08-24 05:37:03 UTC
There are a few forgets in the profile; can you please execute the following command and see if it helps:

# gluster volume set <VOLNAME> network.inode-lru-limit 200000
Restart the volume or force start the volume.
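
A minimal sketch of that restart step, either a full stop/start or a force start (VOLNAME is a placeholder):

# gluster volume stop <VOLNAME>
# gluster volume start <VOLNAME>

or, without stopping the volume:

# gluster volume start <VOLNAME> force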

Also, run the workload after this command and collect the profile info:

# gluster vol profile <VOLNAME> start
# gluster vol profile <VOLNAME> info clear
..Workload..
# gluster vol profile <VOLNAME> info
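
Once the numbers are collected, profiling can be stopped again with:

# gluster vol profile <VOLNAME> stop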

Comment 13 Poornima G 2017-09-11 06:05:40 UTC
The upcalls have reduced. As per the server profile, the amount of time spent in inodelk and lookup is the highest. I'm not sure if we can reduce the inodelks. WRT lookup, one reason it could be happening is that AFR can return the stat from a different subvol for each fop on the same file, thus invalidating the cache in md-cache.

Ravi, if this is reproducible on a local setup, can we try making changes in AFR to choose the poststat from the read subvol / any particular subvol?

Comment 14 Ravishankar N 2017-09-11 08:58:11 UTC
(In reply to Poornima G from comment #13)
> The upcalls have reduced. As per the server profile, the amount of time
> spent in inodelk and lookup is the highest. I'm not sure if we can reduce
> the inodelks. WRT lookup, one reason it could be happening is that AFR can
> return the stat from a different subvol for each fop on the same file,
> thus invalidating the cache in md-cache.
> 


Since the profile info is from the bricks, lookup having high latency means it is not related to AFR, right (and it is possibly the syscall itself that is taking more time)?

> Ravi, if this is reproducible on a local setup, can we try making changes
> in AFR to choose the poststat from the read subvol / any particular subvol?

No, on my VMs I also get much lower times, around 1.5 minutes (tried on 3.8.4-18.6):
[root@vm4 fuse_mnt]# time cp -pr /usr/share/doc/ .

real	1m25.356s
user	0m0.108s
sys	0m1.572s

Comment 15 Poornima G 2017-09-11 09:02:06 UTC
(In reply to Ravishankar N from comment #14)
> (In reply to Poornima G from comment #13)
> > The upcalls have reduced. As per the server profile, the amount of time
> > spent in inodelk and lookup is the highest. I'm not sure if we can reduce
> > the inodelks. WRT lookup, one reason it could be happening is that AFR can
> > return the stat from a different subvol for each fop on the same file,
> > thus invalidating the cache in md-cache.
> > 
> 
> 
> Since the profile info is from the bricks, lookup having high latency means
> it is not related to AFR, right (and it is possibly the syscall itself that
> is taking more time)?
I meant it based on the number of lookups per create: the lookups are high in number and hence consume a significant amount of time.

> 
> > Ravi, if this is reproducible on a local setup, can we try making changes
> > in AFR to choose the poststat from the read subvol / any particular subvol?
> 
> No, on my VMs I also get much lower times, around 1.5 minutes (tried on 3.8.4-18.6):
> [root@vm4 fuse_mnt]# time cp -pr /usr/share/doc/ .
> 
> real	1m25.356s
> user	0m0.108s
> sys	0m1.572s
Is this roughly the same time that you get with dht (1x1) and afr (1x2 and 1x3) as well?

Comment 16 Ravishankar N 2017-09-11 11:46:24 UTC
(In reply to Poornima G from comment #15)

My volume options:

Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
network.ping-timeout: 10
server.event-threads: 4
client.event-threads: 4
cluster.server-quorum-type: server
performance.read-ahead: on
performance.open-behind: on
performance.io-cache: on
diagnostics.brick-log-level: INFO
performance.write-behind: on
performance.strict-o-direct: off
network.remote-dio: disable
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.md-cache-timeout: 600
performance.cache-samba-metadata: on
performance.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
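
For reference, these are standard "gluster volume set" options; the md-cache / cache-invalidation related ones above, for example, would have been applied roughly as follows (VOLNAME is a placeholder):

# gluster volume set <VOLNAME> features.cache-invalidation on
# gluster volume set <VOLNAME> features.cache-invalidation-timeout 600
# gluster volume set <VOLNAME> performance.stat-prefetch on
# gluster volume set <VOLNAME> performance.md-cache-timeout 600
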
==============================
Time taken for cp -pr /usr/share/doc/ :
1) Arbiter
real	1m22.787s
user	0m0.120s
sys	0m0.731s

2) Replica 3:
real	1m24.424s
user	0m0.142s
sys	0m0.723s

3) Replica 2:
real	0m56.885s
user	0m0.136s
sys	0m0.721s

4) Plain distribute (1 brick):
real	0m25.744s
user	0m0.109s
sys	0m1.504s

==============================
In summary, there is virtually no difference between the arbiter and replica 3 numbers, but compared with plain distribute they are around 3 times slower.

FWIW, even in the 1-brick distribute volume, we see a lot of LOOKUPs (9148) compared to the number of CREATEs/MKDIRs (3675/567); a quick way to pull these counts out of the profile output is sketched after the table below.

 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us           3675     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      0.00     102.00 us     102.00 us     102.00 us              1    GETXATTR
      0.00     152.00 us     136.00 us     176.00 us              3     SYMLINK
      0.12     495.00 us      31.00 us    1648.00 us             82      STATFS
      0.40     241.40 us      57.00 us     715.00 us            567    SETXATTR
      0.69     417.58 us     131.00 us    3425.00 us            567       MKDIR
      3.24     302.22 us      18.00 us   13118.00 us           3675       FLUSH
      3.60     322.32 us      46.00 us   18089.00 us           3823       WRITE
      4.81     194.13 us      17.00 us    3397.00 us           8490     ENTRYLK
      5.97     556.13 us     111.00 us  279574.00 us           3675      CREATE
      9.02     414.04 us      17.00 us   11138.00 us           7462    FINODELK
      9.83     349.81 us      36.00 us    8216.00 us           9626     SETATTR
     15.34     692.51 us      74.00 us   26309.00 us           7590    FXATTROP
     15.74     589.44 us      48.00 us   14496.00 us           9148      LOOKUP
     31.24     403.47 us      15.00 us   14432.00 us          26522     INODELK
      0.00       0.00 us       0.00 us       0.00 us              1      UPCALL
      0.00       0.00 us       0.00 us       0.00 us              1     CI_IATT
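
For reference, a rough way to pull the per-fop call counts (e.g. LOOKUP vs CREATE/MKDIR) out of a profile table like the one above, assuming the standard "gluster vol profile <VOLNAME> info" column layout:

# gluster vol profile <VOLNAME> info | \
      awk '$NF ~ /^(LOOKUP|CREATE|MKDIR)$/ {print $NF": "$(NF-1)" calls"}'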

