Bug 825197 - ping_pong hangs on nfs mount
Status: CLOSED NOTABUG
Product: GlusterFS
Classification: Community
Component: nfs
Version: 3.3-beta
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Assigned To: Vinayaga Raman
Reported: 2012-05-25 06:48 EDT by Shwetha Panduranga
Modified: 2014-03-30 21:29 EDT

Doc Type: Bug Fix
Last Closed: 2012-07-30 03:09:00 EDT
Type: Bug


Description Shwetha Panduranga 2012-05-25 06:48:42 EDT
Version-Release number of selected component (if applicable):
------------------------------------------------------------
3.3.0qa43

How reproducible:
-----------------
often

Steps to Reproduce:
-------------------
1. Create a replicate volume with 3 bricks.
2. Create 6 NFS mounts of the volume.
3. Run "ping_pong file1 7" on each NFS mount.
  
Actual results:
---------------
ping_pong hangs on every mount as soon as it is started.

Expected results:
-----------------
ping_pong should run successfully.
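
For readers unfamiliar with the test, the following is a minimal Python sketch of the kind of blocking byte-range locking loop that a ping_pong-style test exercises. It is not the tool's source: the function names `ping_pong_rounds` and `demo`, the fixed round count, and the use of `fcntl.lockf` are assumptions made for illustration. On an NFSv3 mount, a blocking lock request like the one in the loop is served through NLM, so that call is where a hang like the one reported here would be expected to show up.

```python
import fcntl
import os
import tempfile


def ping_pong_rounds(path, num_locks, rounds):
    """Walk an exclusive one-byte lock around offsets 0..num_locks-1.

    Hypothetical helper: each round takes a blocking lock on the next
    byte and only then releases the current one. Returns the number of
    rounds completed.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Start by holding byte 0. LOCK_EX without LOCK_NB blocks until
        # the lock is granted.
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
        current = 0
        completed = 0
        for _ in range(rounds):
            nxt = (current + 1) % num_locks
            # Blocking request for the next byte; with several processes
            # this is the point where one waits for another.
            fcntl.lockf(fd, fcntl.LOCK_EX, 1, nxt, os.SEEK_SET)
            # Release the byte we were holding.
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, current, os.SEEK_SET)
            current = nxt
            completed += 1
        return completed
    finally:
        os.close(fd)  # closing the descriptor drops any remaining locks


def demo(rounds=50):
    """Run the loop once against a scratch file on the local filesystem."""
    with tempfile.TemporaryDirectory() as scratch:
        return ping_pong_rounds(os.path.join(scratch, "file1"), 7, rounds)
```

Run from a single process on a local filesystem, the loop never contends with anyone and simply completes. To approximate the report, several processes would have to run it against the same file on the NFS mounts.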
Comment 1 Krishna Srinivas 2012-05-26 10:35:57 EDT
There seems to be a memory leak in NLM; the nfs process got killed after a while. Did you check whether the nfs process was still alive in your setup? Is the hang reproducible in your setup without replicate?
Comment 2 Shwetha Panduranga 2012-05-28 01:50:53 EDT
ping_pong on a file also hangs on a plain distribute volume.

Valgrind logs:
--------------
==7014==    Use --log-fd=<number> to select an alternative log fd.
==7014== Warning: invalid file descriptor 1017 in syscall close()
==7014== Warning: invalid file descriptor 1018 in syscall close()
==7006== Warning: invalid file descriptor -1 in syscall close()
==7006== Warning: invalid file descriptor -1 in syscall close()
==7006== Warning: invalid file descriptor -1 in syscall close()
==7006== Thread 7:
==7006== Syscall param write(buf) points to uninitialised byte(s)
==7006==    at 0x36386D846D: ??? (in /lib64/libc-2.12.so)
==7006==    by 0x363870EF0A: writetcp (in /lib64/libc-2.12.so)
==7006==    by 0x363871592D: xdrrec_endofrecord (in /lib64/libc-2.12.so)
==7006==    by 0x363870ECF3: clnttcp_call (in /lib64/libc-2.12.so)
==7006==    by 0x981DF2D: nsm_monitor (nlm4.c:551)
==7006==    by 0x3638A077F0: start_thread (in /lib64/libpthread-2.12.so)
==7006==    by 0xCA266FF: ???
==7006==  Address 0x671acd8 is 88 bytes inside a block of size 8,004 alloc'd
==7006==    at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==7006==    by 0x36387151CD: xdrrec_create (in /lib64/libc-2.12.so)
==7006==    by 0x363870EA42: clnttcp_create (in /lib64/libc-2.12.so)
==7006==    by 0x363870D953: clnt_create (in /lib64/libc-2.12.so)
==7006==    by 0x981DE6F: nsm_monitor (nlm4.c:543)
==7006==    by 0x3638A077F0: start_thread (in /lib64/libpthread-2.12.so)
==7006==    by 0xCA266FF: ???
==7006==
Comment 3 Krishna Srinivas 2012-05-28 04:00:00 EDT
In your setup, was the nfs process still alive when ping_pong hung?
Comment 4 Rajesh 2012-05-28 06:33:08 EDT
Yes, the nfs process and the brick(s) are alive and listening (gdb bt showed them in epoll_wait). Wireshark on one of the clients showed NLM_BLOCKED as the last reply from the server.

I tried the same with 6 mounts from a personal VM, with my local machine as the server, and it worked fine. I suspect a network issue, but the ping_pong results on fuse mounts contradict that. Needs further investigation.
Comment 5 Krishna Srinivas 2012-07-30 03:09:00 EDT
ping_pong was being run on a client machine that was behind NAT. For locking to work, the client machine's NLM service must be reachable from the server machine's NLM service.
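
Some background on why reachability matters: in NLM, a server that answers a blocking lock request with NLM_BLOCKED later grants the lock by calling back to the client, so the server must be able to open a connection towards the client. A rough way to check that direction is to run a TCP probe on the server against the client's address. The sketch below is only that: `tcp_reachable` is a hypothetical helper, and it probes a single TCP port. Port 111 is the well-known rpcbind/portmapper port; the NLM port itself is usually assigned dynamically and registered with rpcbind (`rpcinfo -p <client>` lists it as nlockmgr), so a successful probe of port 111 is necessary but not sufficient.

```python
import socket

RPCBIND_PORT = 111  # well-known rpcbind/portmapper port


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time.

    Hypothetical helper for illustration; it only tests TCP connect and
    does not speak the RPC or NLM protocols.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable addresses.
        return False
```

Called on the server as `tcp_reachable(client_address, RPCBIND_PORT)`, where `client_address` stands for whatever address the server sees for the client, a `False` result would be consistent with the NAT explanation above.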
