Description of problem:
Slow performance for a small-file workload on an arbiter volume.

Version-Release number of selected component (if applicable):
RHGS 3.2

How reproducible:
Only reproducible in the customer (Cu) environment.

Actual results:
Copying a 60 MB folder takes around 5-6 mins. The directory consists of around 3K files & 600 directories.

Expected results:
The copy should complete in around 1-2 mins, as checked on my test machine.

Additional info:
- Gluster nodes are VMs deployed on Hyper-V infrastructure.
- Back-end storage for the volume is from NetApp, exported through SMB to Hyper-V.
- Volume type tested with arbiter only (1 x (2 + 1)).
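(Not part of the original report, but for context: a 1 x (2 + 1) arbiter volume of the kind described above is typically created and mounted along these lines. The hostnames, brick paths and mount point below are placeholders, not the customer's actual layout.)

# gluster volume create <VOLNAME> replica 3 arbiter 1 \
      server1:/bricks/brick1 server2:/bricks/brick1 server3:/bricks/arbiter1
# gluster volume start <VOLNAME>
# mount -t glusterfs server1:/<VOLNAME> /mnt/fuse_mnt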
There are a few forgets in the profile; can you please execute the following command and see if that helps:

# gluster volume set <VOLNAME> network.inode-lru-limit 200000

Restart the volume or force start the volume. Also, run the workload after this command and collect the profile info:

# gluster vol profile <VOLNAME> start
# gluster vol profile <VOLNAME> info clear
..Workload..
# gluster vol profile <VOLNAME> info
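(A minimal sketch of the restart step mentioned above, assuming the same <VOLNAME>; either a stop/start cycle or a force start should make the new inode-lru-limit take effect:)

# gluster volume stop <VOLNAME> && gluster volume start <VOLNAME>
  ...or...
# gluster volume start <VOLNAME> force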
The upcalls have reduced; as per the server profile, the amount of time spent in inodelk and lookup is the highest. I'm not sure if we can reduce inodelks. WRT lookup, one reason why it could be happening is that AFR can return stat from a different subvol for each fop on the same file, thus invalidating the cache in md-cache.

Ravi, if this is reproducible on a local setup, can we try making changes in AFR to choose the poststat from the read subvol / any particular subvol?
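(One way to sanity-check the md-cache invalidation theory, sketched here only as a suggestion: temporarily disable md-cache via performance.stat-prefetch and re-run the workload. If the LOOKUP count in the brick profile barely changes with md-cache off, the cache was not absorbing those lookups anyway, which would be consistent with it being invalidated on almost every fop.)

# gluster volume set <VOLNAME> performance.stat-prefetch off
..re-run workload, collect profile info..
# gluster volume set <VOLNAME> performance.stat-prefetch on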
(In reply to Poornima G from comment #13)
> The upcalls have reduced; as per the server profile, the amount of time
> spent in inodelk and lookup is the highest. I'm not sure if we can reduce
> inodelks. WRT lookup, one reason why it could be happening is that AFR can
> return stat from a different subvol for each fop on the same file, thus
> invalidating the cache in md-cache.

Since the profile info is from the bricks, lookup having high latency means it is not related to AFR, right (and possibly the syscall itself is taking more time)?

> Ravi, if this is reproducible on a local setup, can we try making changes
> in AFR to choose the poststat from the read subvol / any particular subvol?

No, on my VMs I also get under 1 minute (tried on 3.8.4-18.6):

[root@vm4 fuse_mnt]# time cp -pr /usr/share/doc/ .

real    1m25.356s
user    0m0.108s
sys     0m1.572s
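(To separate client-side latency (AFR, md-cache) from the brick-side numbers discussed above, one option, assuming the io-stats dump xattr is available on this build, is to dump per-fop stats directly from the FUSE mount; the output path below is just an example:)

# setfattr -n trusted.io-stats-dump -v /tmp/client_profile.dump /mnt/fuse_mnt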
(In reply to Ravishankar N from comment #14)
> (In reply to Poornima G from comment #13)
> > The upcalls have reduced; as per the server profile, the amount of time
> > spent in inodelk and lookup is the highest. I'm not sure if we can reduce
> > inodelks. WRT lookup, one reason why it could be happening is that AFR can
> > return stat from a different subvol for each fop on the same file, thus
> > invalidating the cache in md-cache.
>
> Since the profile info is from the bricks, lookup having high latency means
> it is not related to AFR, right (and possibly the syscall itself is taking
> more time)?

I meant that based on the number of lookups per create: the lookups are high in number and hence consume a significant amount of time.

> > Ravi, if this is reproducible on a local setup, can we try making changes
> > in AFR to choose the poststat from the read subvol / any particular subvol?
>
> No, on my VMs I also get under 1 minute (tried on 3.8.4-18.6):
> [root@vm4 fuse_mnt]# time cp -pr /usr/share/doc/ .
>
> real    1m25.356s
> user    0m0.108s
> sys     0m1.572s

Is this the same time that you get with dht (1*1) and afr (1*2 and 1*3) as well?
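(For the comparison being asked about here, a rough sketch of the three additional volume layouts; the volume names, hosts and brick paths are placeholders:)

# gluster volume create dist1 server1:/bricks/b1
# gluster volume create rep2 replica 2 \
      server1:/bricks/b1 server2:/bricks/b1
# gluster volume create rep3 replica 3 \
      server1:/bricks/b1 server2:/bricks/b1 server3:/bricks/b1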
(In reply to Poornima G from comment #15)

My volume options:

Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
network.ping-timeout: 10
server.event-threads: 4
client.event-threads: 4
cluster.server-quorum-type: server
performance.read-ahead: on
performance.open-behind: on
performance.io-cache: on
diagnostics.brick-log-level: INFO
performance.write-behind: on
performance.strict-o-direct: off
network.remote-dio: disable
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.md-cache-timeout: 600
performance.cache-samba-metadata: on
performance.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

==============================
Time taken for cp -pr /usr/share/doc/ :

1) Arbiter:
real    1m22.787s
user    0m0.120s
sys     0m0.731s

2) Replica 3:
real    1m24.424s
user    0m0.142s
sys     0m0.723s

3) Replica 2:
real    0m56.885s
user    0m0.136s
sys     0m0.721s

4) Plain distribute (1 brick):
real    0m25.744s
user    0m0.109s
sys     0m1.504s
==============================

In summary, there is virtually no difference between the arbiter and replica 3 numbers, but compared with plain distribute, both take around 3 times longer. FWIW, even on the 1-brick distribute volume we see a lot of lookups (9148) compared to the number of creates/mkdirs.

 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us           3675     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      0.00     102.00 us     102.00 us     102.00 us              1    GETXATTR
      0.00     152.00 us     136.00 us     176.00 us              3     SYMLINK
      0.12     495.00 us      31.00 us    1648.00 us             82      STATFS
      0.40     241.40 us      57.00 us     715.00 us            567    SETXATTR
      0.69     417.58 us     131.00 us    3425.00 us            567       MKDIR
      3.24     302.22 us      18.00 us   13118.00 us           3675       FLUSH
      3.60     322.32 us      46.00 us   18089.00 us           3823       WRITE
      4.81     194.13 us      17.00 us    3397.00 us           8490     ENTRYLK
      5.97     556.13 us     111.00 us  279574.00 us           3675      CREATE
      9.02     414.04 us      17.00 us   11138.00 us           7462    FINODELK
      9.83     349.81 us      36.00 us    8216.00 us           9626     SETATTR
     15.34     692.51 us      74.00 us   26309.00 us           7590    FXATTROP
     15.74     589.44 us      48.00 us   14496.00 us           9148      LOOKUP
     31.24     403.47 us      15.00 us   14432.00 us          26522     INODELK
      0.00       0.00 us       0.00 us       0.00 us              1      UPCALL
      0.00       0.00 us       0.00 us       0.00 us              1     CI_IATT
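(A quick way to quantify the lookup amplification from any of these profiles: the awk sketch below pulls the call counts for LOOKUP and CREATE/MKDIR out of the "gluster vol profile <VOLNAME> info" output and prints the ratio, aggregated over whatever cumulative/interval sections are printed. For the table above that works out to 9148 / (3675 + 567), roughly 2.2 lookups per entry created.)

# gluster vol profile <VOLNAME> info | \
    awk '$NF=="LOOKUP"{l+=$(NF-1)} $NF=="CREATE"||$NF=="MKDIR"{c+=$(NF-1)} \
         END{if (c) printf "%.1f lookups per create/mkdir\n", l/c}'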