Description of problem:
Customer is facing a performance issue using NFS.

Version-Release number of selected component (if applicable):
upgraded to 3.8.4
glusterfs-3.8.4-18.4.el7rhgs.x86_64                  Fri Jun 30 22:22:48 2017
glusterfs-api-3.8.4-18.4.el7rhgs.x86_64              Fri Jun 30 22:22:52 2017
glusterfs-cli-3.8.4-18.4.el7rhgs.x86_64              Fri Jun 30 22:23:44 2017
glusterfs-client-xlators-3.8.4-18.4.el7rhgs.x86_64   Fri Jun 30 22:22:52 2017
glusterfs-fuse-3.8.4-18.4.el7rhgs.x86_64             Fri Jun 30 22:22:53 2017
glusterfs-geo-replication-3.8.4-18.4.el7rhgs.x86_64  Fri Jun 30 22:23:49 2017
glusterfs-libs-3.8.4-18.4.el7rhgs.x86_64             Fri Jun 30 22:22:44 2017
glusterfs-rdma-3.8.4-18.4.el7rhgs.x86_64             Fri Jun 30 22:23:49 2017
glusterfs-server-3.8.4-18.4.el7rhgs.x86_64           Fri Jun 30 22:23:44 2017
nfs-ganesha-gluster-2.4.1-9.el7rhgs.x86_64           Fri Jun 30 22:23:49 2017
python-gluster-3.8.4-18.4.el7rhgs.noarch             Fri Jun 30 22:23:49 2017
samba-vfs-glusterfs-4.4.6-5.el7rhgs.x86_64           Fri Jun 30 22:23:48 2017
vdsm-gluster-4.16.20-1.3.el7rhgs.noarch              Sat Mar 26 23:29:33 2016

How reproducible:
48-72 hours after a restart of the gluster service.

Steps to Reproduce:
1. After about 3 days, run the dd command below with different bs sizes:
# dd if=/dev/zero of=/VOLUME_MNT/test-file bs={4-512}K count=1000 oflag=direct

Actual results:
Performance is unstable and low.

Expected results:
Performance should be better.

Additional info:
1. Volumes are being accessed over a pcs cluster through VIPs.
2. No performance issue when using a FUSE mount for another volume.
3. The packet capture of the slow NFS traffic did not show any network issues, and network tests with iperf are good too.
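The block-size sweep from the reproducer can be sketched as a small loop. The mountpoint below defaults to a temporary directory so the sketch is self-contained; on the real NFS mount, point MNT at /VOLUME_MNT and add oflag=direct (as in the original command) so the client page cache is bypassed:

```shell
#!/bin/sh
# Sketch of the reproducer: sweep dd block sizes and keep the
# throughput summary line for each run.
# MNT is a placeholder -- set it to the NFS mountpoint for a real test.
MNT="${MNT:-$(mktemp -d)}"
for bs in 4K 64K 128K 512K; do
    # conv=fsync flushes before dd exits so the timing covers the full
    # write; on the real mount, use oflag=direct instead.
    dd if=/dev/zero of="$MNT/test-file" bs="$bs" count=100 conv=fsync 2>&1 | tail -n 1
done
rm -f "$MNT/test-file"
```

Comparing the per-bs throughput lines shortly after a gluster restart and again after ~3 days should show whether the degradation tracks specific block sizes.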
The small writes are sync, the large writes are unstable. That explains the latency difference there. It is the client that determines this part, not the server, so we can't change that. Are they on different mountpoints, or using different test software settings? As for readdir latency, what is the setting for Dir_Max in the CACHEINODE section of the Ganesha config?
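For reference, the setting being asked about sits in the CACHEINODE block of the Ganesha configuration; the value below is a hypothetical example, not a recommendation:

```
CACHEINODE {
    # Hypothetical value -- tune per deployment. A larger Dir_Max keeps
    # more directory entries cached, which can help readdir latency.
    Dir_Max = 65536;
}
```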
It seems that the two busiest clients have been identified, and it is quite possible that these clients have such high resource usage (network, local disks, ...) that the other clients are affected (the "noisy neighbour" effect). However, these clients are the Gluster/NFS servers themselves, so there is another hop to the actual NFS-clients that has not been identified yet. It is important to know which NFS-clients connect to these servers. The "noisy neighbour" problem may not be an issue here if a single NFS-server is doing all the work.

It was mentioned that pacemaker is used for HA. Depending on the configuration, there may be only one VIP that the NFS-clients use for mounting. In that case, a single Gluster/NFS server receives all the NFS-requests from the NFS-clients and then distributes the I/O to the bricks (like a proxy/gateway).

Adding more VIPs to the pacemaker configuration (making sure the VIPs are assigned to different Gluster servers in the "everything is up" case), and using round-robin DNS on the NFS-clients to connect to the NFS-servers, should reduce the number of NFS-clients connected to a single NFS-server.

My assumption is that the two busy clients are Gluster/NFS servers, and that the single(?) VIP has possibly been moved from one Gluster/NFS server to another. Almost all current I/O would then come from a single IP, the one that holds the VIP the NFS-clients use.

Please compare the above theory with the pacemaker configuration and the relocation of the VIP. Check how the NFS-clients mount the storage, and see if setting up a DNS entry with multiple VIPs is possible for the customer. Each of those VIPs should then point to a different Gluster/NFS server. Note that upon mounting, an NFS-client resolves the hostname (RR-DNS) to a single IP-address, which it keeps using until it unmounts.
So when an RR-DNS entry with multiple VIPs is available, unmount as many NFS-clients as possible and remount them using the "rr-dns.host.name:/volume" notation.
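As a sketch of that setup (all resource names, node names, IP addresses, and scores below are hypothetical placeholders; only the "rr-dns.host.name:/volume" notation comes from the comment above):

```shell
# On the pacemaker cluster: add extra VIPs so NFS traffic can spread
# across the Gluster/NFS servers.
pcs resource create nfs_vip2 ocf:heartbeat:IPaddr2 ip=192.0.2.12 cidr_netmask=24
pcs resource create nfs_vip3 ocf:heartbeat:IPaddr2 ip=192.0.2.13 cidr_netmask=24
# Prefer a different node for each VIP in the "everything is up" case.
pcs constraint location nfs_vip2 prefers server2=100
pcs constraint location nfs_vip3 prefers server3=100

# On each NFS-client: remount via the round-robin DNS name, which
# publishes all VIPs. The client resolves one IP at mount time and
# sticks to it until it unmounts.
umount /VOLUME_MNT
mount -t nfs rr-dns.host.name:/volume /VOLUME_MNT
```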
Per comment above, setting NEEDINFO on jthottan