Created attachment 1035053 [details]
packet trace

Description of problem:
=======================
The mount does not work on the client; the packet trace shows it hung at the ACCESS call. The volume can be mounted from another server, but the server in question cannot mount it. This happened while creating files with dd on a disperse volume and listing them with 'ls -l' from another terminal.

Version-Release number of selected component (if applicable):
==============================================================

How reproducible:
=================
Often

Steps to Reproduce:
===================
Mount a distribute-disperse volume and create files in the thousands. List them from another terminal.

Actual results:
===============
Hang

Expected results:

Additional info:
================
The packet trace will be attached. The sosreport will be copied to the rhsqe sosreports folder.
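The reproduction steps above can be sketched as a short shell script (a minimal sketch: the mount point path is hypothetical, and when no GlusterFS mount is available any writable directory stands in for it, so the hang itself will not reproduce outside a real volume):

```shell
#!/bin/sh
# Hypothetical mount point; on a real setup this would be a FUSE mount of
# the distribute-disperse volume, e.g.:
#   mount -t glusterfs <server>:/<volname> /tmp/testvol
MNT=${MNT:-/tmp/testvol}
mkdir -p "$MNT"

# Terminal 1: create files in the thousands with dd
i=0
while [ "$i" -lt 1000 ]; do
    dd if=/dev/zero of="$MNT/file$i" bs=4k count=1 2>/dev/null
    i=$((i + 1))
done

# Terminal 2 (run concurrently on a real volume): list the files
ls -l "$MNT" > /dev/null
echo "created $(ls "$MNT" | wc -l) files"
```

On a real volume the 'ls -l' in the second terminal is what was reported to hang while the dd loop was still running.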
Version-Release number of selected component (if applicable):
==============================================================
[root@interstellar ~]# gluster --version
glusterfs 3.7.0 built on Jun  1 2015 07:14:51
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@interstellar ~]#
(In reply to Bhaskarakiran from comment #0)
> Steps to Reproduce:
> ===================
> Mount a distribute disperse volume and create files in 1000's. Do the
> listing of them from another terminal.

Does it only happen with "distribute disperse", or does it also happen with a single-brick volume?

> Actual results:
> ===============
> Hang

Does the "hang" resolve itself after a while? Is there any progress in the "ls" (visible with strace or similar), or any messages in nfs.log? What options do you (or a shell alias) pass to "ls"?

For next time, please do not zip packet captures; use plain gzip instead. Wireshark can read gzip-compressed files, but not .zip ones.

The packet capture contains very few NFS packets. The NFS calls that access the volume seem to trigger *many* PORTBYBRICK calls (to GlusterD), and those return an error, for example:

$ tshark -r opt/nfs-hang.pcap -V 'frame.number == 9004'
Frame 9004: 156 bytes on wire (1248 bits), 156 bytes captured (1248 bits)
Remote Procedure Call, Type:Call XID:0x002208fa
Gluster Portmap
    [Program Version: 1]
    [Gluster Portmap: PORTBYBRICK (1)]
    Brick: /rhs/brick2/b8

$ tshark -r opt/nfs-hang.pcap -V 'frame.number == 9005'
Frame 9005: 112 bytes on wire (896 bits), 112 bytes captured (896 bits)
Remote Procedure Call, Type:Reply XID:0x002208fa
Gluster Portmap
    [Program Version: 1]
    [Gluster Portmap: PORTBYBRICK (1)]
    Return value: -1
    Errno: 0 (Success)
    Status: 0
    Port: 0

Please check whether your bricks are available and responding to other clients. A look at the logs should give you an understanding of what could be wrong.
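The suggested brick check can be sketched with the GlusterFS CLI (a sketch, not a definitive procedure: the volume name "testvol" is hypothetical, the host name is taken from this report, and the port number is a placeholder that must be read off the status output):

```shell
#!/bin/sh
# Assumes the glusterfs CLI is installed on a server node.
# "testvol" is a hypothetical volume name for illustration.
VOL=${VOL:-testvol}

# Show whether each brick process is online and which TCP port it listens on.
# A brick reported as offline or with port "N/A" would explain PORTBYBRICK
# replies that return -1 with Port: 0, as seen in the capture above.
gluster volume status "$VOL"

# Cross-check that a brick process actually accepts connections; substitute
# the port printed by the status command (49152 here is only a placeholder).
nc -z -w 2 interstellar 49152 && echo "brick port reachable"
```

If the status output shows the bricks online but the client still loops on PORTBYBRICK, the problem is more likely on the client or GlusterD side than on the bricks themselves.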
The bricks are available, and the same volume mounts fine from another client; only the client where the hang is seen fails.
This could be related to the throttling issue that we regularly run into with larger workloads.
A similar issue was debugged and triaged recently as part of bug 1306930.