Bug 1228553 - NFS hung at the access call
Summary: NFS hung at the access call
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-nfs
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: Soumya Koduri
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On: 1306930
Blocks: 1223636
 
Reported: 2015-06-05 07:42 UTC by Bhaskarakiran
Modified: 2019-07-23 04:56 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-23 04:56:48 UTC
Embargoed:


Attachments (Terms of Use)
packet trace (4.43 MB, application/zip)
2015-06-05 07:42 UTC, Bhaskarakiran

Description Bhaskarakiran 2015-06-05 07:42:24 UTC
Created attachment 1035053 [details]
packet trace

Description of problem:
=======================
The mount does not work on the client; the packet trace shows it hung at the ACCESS call.
While the volume can be mounted from another server, the server in question does not work. I was creating files with dd on a disperse volume and listing them with 'ls -l'.

Version-Release number of selected component (if applicable):
==============================================================



How reproducible:
=================
Often

Steps to Reproduce:
===================
Mount a distribute-disperse volume and create files in the thousands. List them from another terminal.
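The steps above can be sketched roughly as follows. The mount point and file count are placeholders (the report does not give exact values; the real setup would use an NFS mount of the distribute-disperse volume and a count in the thousands):

```shell
# Rough sketch of the reproduction steps; MOUNT and COUNT are hypothetical.
# In the real setup MOUNT is an NFS mount of the distribute-disperse volume.
MOUNT=${MOUNT:-$(mktemp -d)}
COUNT=${COUNT:-100}    # use a few thousand to actually reproduce the hang

i=1
while [ "$i" -le "$COUNT" ]; do
    # Create many small files with dd, as the reporter did
    dd if=/dev/zero of="$MOUNT/file.$i" bs=4k count=1 2>/dev/null
    i=$((i + 1))
done

# From another terminal, list the files while creation is still running:
# ls -l "$MOUNT"
```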

Actual results:
===============
Hang

Expected results:


Additional info:
================
The packet trace will be attached. The sosreport will be copied to the rhsqe sosreports folder.

Comment 2 Bhaskarakiran 2015-06-05 07:49:19 UTC
Version-Release number of selected component (if applicable):
==============================================================

[root@interstellar ~]# gluster --version
glusterfs 3.7.0 built on Jun  1 2015 07:14:51
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@interstellar ~]#

Comment 3 Niels de Vos 2015-06-05 08:21:49 UTC
(In reply to Bhaskarakiran from comment #0)
> Steps to Reproduce:
> ===================
> Mount a distribute disperse volume and create files in 1000's. Do the
> listing of them from another terminal. 

Does it only happen with "distribute disperse", or does it also happen with a single-brick volume?

> Actual results:
> ===============
> Hang

Does the "hang" resolve itself after a while? Is there any progress in the "ls" (with strace or similar) or any messages in the nfs.log? What options to "ls" do you (or a shell alias) pass?

For next time, please do not zip packet captures, but use plain gzip instead. Wireshark can read gzip-compressed files, not .zip ones.
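To illustrate the gzip suggestion: a gzip-compressed capture can be handed to Wireshark or tshark as-is, whereas a .zip archive must be extracted first. A minimal round-trip sketch (the capture filename here is just a stand-in, not the actual attachment):

```shell
# Stand-in for a real packet capture; the filename is illustrative only.
workdir=$(mktemp -d)
cd "$workdir"
printf 'capture-data' > nfs-hang.pcap

gzip -k nfs-hang.pcap      # -k keeps the original; produces nfs-hang.pcap.gz
gzip -t nfs-hang.pcap.gz   # verify the compressed file's integrity
zcat nfs-hang.pcap.gz > roundtrip.pcap   # decompress to confirm round-trip
```

Wireshark and tshark open the .gz file directly, e.g. `tshark -r nfs-hang.pcap.gz`.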

The packet capture contains very few NFS packets. The NFS calls that involve accessing the volume seem to trigger *many* PORTBYBRICK calls (to GlusterD), and those return an error, for example:

$ tshark -r opt/nfs-hang.pcap -V 'frame.number == 9004'
Frame 9004: 156 bytes on wire (1248 bits), 156 bytes captured (1248 bits)
Remote Procedure Call, Type:Call XID:0x002208fa
Gluster Portmap
    [Program Version: 1]
    [Gluster Portmap: PORTBYBRICK (1)]
    Brick: /rhs/brick2/b8


$ tshark -r opt/nfs-hang.pcap -V 'frame.number == 9005'
Frame 9005: 112 bytes on wire (896 bits), 112 bytes captured (896 bits)
Remote Procedure Call, Type:Reply XID:0x002208fa
Gluster Portmap
    [Program Version: 1]
    [Gluster Portmap: PORTBYBRICK (1)]
    Return value: -1
    Errno: 0 (Success)
    Status: 0
    Port: 0


Please check if your bricks are available and responding to other clients. If you have a look at the logs, I guess you will get an understanding what could be wrong.

Comment 4 Bhaskarakiran 2015-06-11 07:19:41 UTC
The bricks are available, and the same volume works when mounted from another client; only the client where the hang is seen fails.

Comment 10 Soumya Koduri 2016-01-29 12:01:21 UTC
This could be related to a throttling issue which we regularly run into with larger workloads.

Comment 11 Soumya Koduri 2016-03-07 10:46:31 UTC
A similar issue has recently been debugged and triaged as part of bug 1306930.

