Bug 815242 - locktests hangs on fuse mount
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified OS: Unspecified
Priority: high Severity: high
Assigned To: Anand Avati
Blocks: 817967
Reported: 2012-04-23 04:00 EDT by Shwetha Panduranga
Modified: 2015-12-01 11:45 EST
CC List: 3 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 13:33:22 EDT
Type: Bug


Attachments
Fuse mount log file (363.38 KB, text/x-log)
2012-04-23 04:00 EDT, Shwetha Panduranga

Description Shwetha Panduranga 2012-04-23 04:00:19 EDT
Created attachment 579439
Fuse mount log file

Description of problem:
The following are backtraces from the brick process; threads 1 and 8 are both blocked in __lll_lock_wait():

(gdb) info threads
  8 Thread 0x7f01171a6700 (LWP 27681)  0x0000003638a0dff4 in __lll_lock_wait () from /lib64/libpthread.so.0
  7 Thread 0x7f01167a5700 (LWP 27682)  0x0000003638a0b3dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6 Thread 0x7f0115da4700 (LWP 27683)  0x0000003638a0b3dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x7f0114f75700 (LWP 27686)  0x0000003638a0eccd in nanosleep () from /lib64/libpthread.so.0
  4 Thread 0x7f0114139700 (LWP 27705)  0x0000003638a0b3dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x7f010ec1a700 (LWP 27706)  0x0000003638a0b75b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x7f010eb19700 (LWP 27709)  0x0000003638a0b75b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x7f01184ab700 (LWP 27680)  0x0000003638a0dff4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x0000003638a0dff4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003638a09328 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x0000003638a091f7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f011892411e in inode_unref (inode=0x7f010dec80e0) at inode.c:448
#4  0x00007f011893abb6 in fd_destroy (fd=0x14a4aa8) at fd.c:525
#5  0x00007f011893acad in fd_unref (fd=0x14a4aa8) at fd.c:552
#6  0x00007f011893a4d6 in gf_fd_put (fdtable=0x146cc70, fd=417) at fd.c:354
#7  0x00007f010f3633e3 in server_release (req=0x7f010ee2d800) at server3_1-fops.c:3444
#8  0x00007f01186dd102 in rpcsvc_handle_rpc_call (svc=0x147c550, trans=0x14b6130, msg=0x17542b0) at rpcsvc.c:520
#9  0x00007f01186dd4a5 in rpcsvc_notify (trans=0x14b6130, mydata=0x147c550, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x17542b0) at rpcsvc.c:616
#10 0x00007f01186e2e84 in rpc_transport_notify (this=0x14b6130, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x17542b0) at rpc-transport.c:498
#11 0x00007f0115198280 in socket_event_poll_in (this=0x14b6130) at socket.c:1686
#12 0x00007f0115198804 in socket_event_handler (fd=13, idx=9, data=0x14b6130, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
#13 0x00007f011893e664 in event_dispatch_epoll_handler (event_pool=0x14523a0, events=0x146b7f0, i=0) at event.c:794
#14 0x00007f011893e887 in event_dispatch_epoll (event_pool=0x14523a0) at event.c:856
#15 0x00007f011893ec12 in event_dispatch (event_pool=0x14523a0) at event.c:956
#16 0x0000000000408067 in main (argc=19, argv=0x7fff886ba2d8) at glusterfsd.c:1651
(gdb) t 8
[Switching to thread 8 (Thread 0x7f01171a6700 (LWP 27681))]#0  0x0000003638a0dff4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x0000003638a0dff4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003638a09328 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x0000003638a091f7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f011892411e in inode_unref (inode=0x7f010dec80e0) at inode.c:448
#4  0x00007f011893abb6 in fd_destroy (fd=0x149e450) at fd.c:525
#5  0x00007f011893acad in fd_unref (fd=0x149e450) at fd.c:552
#6  0x00007f0118926a24 in inode_dump (inode=0x7f010dec80e0, prefix=0x7f01171a28d0 "conn.1.bound_xl./export1/dstore1.active.1") at inode.c:1601
#7  0x00007f0118926d28 in inode_table_dump (itable=0x1498860, prefix=0x7f01171a3940 "conn.1.bound_xl./export1/dstore1") at inode.c:1644
#8  0x00007f010f3480cd in server_inode (this=0x1479700) at server.c:482
#9  0x00007f0118945c61 in gf_proc_dump_xlator_info (top=0x1479700) at statedump.c:422
#10 0x00007f0118946644 in gf_proc_dump_info (signum=10) at statedump.c:668
#11 0x0000000000407736 in glusterfs_sigwaiter (arg=0x7fff886ba0d0) at glusterfsd.c:1389
#12 0x0000003638a077f1 in start_thread () from /lib64/libpthread.so.0
#13 0x00000036386e570d in clone () from /lib64/libc.so.6

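The two backtraces are consistent with a self-deadlock in the statedump path: thread 8 (the SIGUSR1 handler path through glusterfs_sigwaiter) appears to take the inode table lock in inode_table_dump() and, while still holding it, drops an fd reference whose teardown (fd_unref -> fd_destroy -> inode_unref) tries to take the same non-recursive lock again; thread 1 (server_release) then blocks behind it on the same lock. The sketch below is a minimal, self-contained illustration of that pattern; the names (table_lock, dump_table, release_ref) are invented for the example and this is not GlusterFS source.

deadlock-sketch.c:
------------------
/*
 * Minimal illustration of the pattern seen in the traces above:
 * a default (non-recursive) pthread mutex guards a table; the dump
 * routine holds it and then calls a teardown path that tries to
 * lock it again, so the dumping thread blocks on its own lock.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for inode_unref(): needs the table lock. */
static void release_ref(void)
{
        pthread_mutex_lock(&table_lock);   /* blocks forever: the same
                                              thread already holds it */
        pthread_mutex_unlock(&table_lock);
}

/* Stand-in for inode_table_dump()/inode_dump(): holds the lock and
 * then calls into the fd teardown path (fd_unref -> fd_destroy ->
 * inode_unref in the real traces). */
static void dump_table(void)
{
        pthread_mutex_lock(&table_lock);
        release_ref();
        pthread_mutex_unlock(&table_lock); /* never reached */
}

int main(void)
{
        dump_table();                      /* hangs, like thread 8 */
        printf("not reached\n");
        return 0;
}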
Version-Release number of selected component (if applicable):
3.3.0qa37

How reproducible:
Often

locktest.sh:-
------------
#!/bin/bash

while true; do
	locktests -f ./lock_test_file -n 500
done
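
For context, locktests exercises POSIX record locks over the mount, so the client-side hang corresponds to a blocking lock request that never gets a reply once the brick process is stuck. The sketch below only illustrates the kind of fcntl() call involved; it is not the locktests source, and the file name is simply reused from the script above.

fcntl-lock-sketch.c:
--------------------
/*
 * Illustration of a blocking POSIX record lock of the kind the
 * locktests tool issues over the fuse mount.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int fd = open("./lock_test_file", O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        struct flock fl = {
                .l_type   = F_WRLCK,  /* exclusive write lock */
                .l_whence = SEEK_SET,
                .l_start  = 0,
                .l_len    = 0,        /* 0 = lock to end of file */
        };

        /* F_SETLKW waits until the lock is granted; if the brick is
         * deadlocked, the reply never arrives and the caller hangs
         * here, which matches the symptom on the fuse mount. */
        if (fcntl(fd, F_SETLKW, &fl) < 0) {
                perror("fcntl(F_SETLKW)");
                close(fd);
                return 1;
        }

        fl.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &fl);
        close(fd);
        return 0;
}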

Steps to Reproduce:
---------------------
1. Create a replicate volume (1x4).
2. Create one fuse and one nfs mount (each from a different machine).
3. Execute "locktest.sh" on both the fuse and nfs mounts.
4. Execute "kill -USR1 <pid_of_glusterfs_process>" to trigger a statedump.
  
Actual results:
locktests hung on the fuse mount.

Additional info:
--------------------
[04/23/12 - 18:19:41 root@APP-SERVER2 ~]# gluster volume info
 
Volume Name: dstore
Type: Replicate
Volume ID: 884c1aa1-a0b0-4b41-9c93-92620b15fda8
Status: Started
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: 192.168.2.35:/export1/dstore1
Brick2: 192.168.2.36:/export1/dstore1
Brick3: 192.168.2.35:/export2/dstore1
Brick4: 192.168.2.36:/export2/dstore1
Comment 1 Amar Tumballi 2012-04-23 07:24:00 EDT
Avati already sent a patch to fix this @ http://review.gluster.com/3210
Comment 2 Anand Avati 2012-04-24 12:44:46 EDT
CHANGE: http://review.gluster.com/3210 (statedump: fix deadlock during state dump of fds) merged in master by Vijay Bellur (vijay@gluster.com)
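As a general illustration only (hypothetical names, and not a claim about what change 3210 actually does): one common way to avoid this class of deadlock is to record, while holding the table lock, which references need to be dropped, and to drop them only after the lock is released, so the teardown path never re-acquires a lock the dumping thread still holds.

deferred-unref-sketch.c:
------------------------
/*
 * Generic, hypothetical sketch of deferring reference drops until
 * after the table lock is released. Not the contents of
 * http://review.gluster.com/3210.
 */
#include <pthread.h>
#include <stdio.h>

#define NUM_FDS 4

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int refcount[NUM_FDS] = {1, 1, 1, 1};

/* Stand-in for fd_unref()/inode_unref(): needs the table lock. */
static void drop_ref(int i)
{
        pthread_mutex_lock(&table_lock);
        refcount[i]--;
        pthread_mutex_unlock(&table_lock);
}

/* Stand-in for the statedump path: dump under the lock, but remember
 * which references to drop and drop them only after unlocking. */
static void dump_table_safely(void)
{
        int deferred[NUM_FDS];
        int n = 0;

        pthread_mutex_lock(&table_lock);
        for (int i = 0; i < NUM_FDS; i++) {
                printf("fd %d refcount %d\n", i, refcount[i]);
                deferred[n++] = i;     /* do NOT drop the ref here */
        }
        pthread_mutex_unlock(&table_lock);

        for (int i = 0; i < n; i++)
                drop_ref(deferred[i]); /* safe: lock is not held */
}

int main(void)
{
        dump_table_safely();
        return 0;
}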
Comment 3 Shwetha Panduranga 2012-05-14 02:02:08 EDT
Bug is fixed. Verified on 3.3.0qa41.
