Bug 784187 - [54b8d503dd23e72ed3076988c52e689f3554ebc8]: glusterfs server crashed in pl_inode_get since fd was NULL
Summary: [54b8d503dd23e72ed3076988c52e689f3554ebc8]: glusterfs server crashed in pl_inode_get since fd was NULL
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: locks
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Raghavendra Bhat
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 817967
 
Reported: 2012-01-24 07:29 UTC by Raghavendra Bhat
Modified: 2013-07-24 17:37 UTC
CC: 2 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:37:49 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: glusterfs-3.3.0qa40
Embargoed:



Description Raghavendra Bhat 2012-01-24 07:29:26 UTC
Description of problem:
Pure replicate volume (replica 2) with a single fuse client. Started running ping_pong on the client, then enabled quota, set a limit on /, and disabled write-behind. While the tests were running, enabled client-side io-threads. Then brought down a brick and started it again, and the brick crashed in __mq_add_new_contribution_node (there loc->parent was NULL, since the parent no longer needs to be resolved because of the gfid-based backend changes; a separate bug has been filed for that crash). So disabled quota, brought a brick down again and did a volume start force. The glusterfs server crashed with the following backtrace.

Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id mirror.hyperspace.mnt-sda7'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f061fbdf1d3 in pl_lk (frame=0x7f0626682658, this=0x67b1d0, fd=0x0, cmd=7, flock=0x7f0626355320)
    at ../../../../../xlators/features/locks/src/posix.c:988
988	        pl_inode = pl_inode_get (this, fd->inode);
(gdb) bt
#0  0x00007f061fbdf1d3 in pl_lk (frame=0x7f0626682658, this=0x67b1d0, fd=0x0, cmd=7, flock=0x7f0626355320)
    at ../../../../../xlators/features/locks/src/posix.c:988
#1  0x00007f061f9c4e5b in iot_lk_wrapper (frame=0x7f06266825ac, this=0x67c370, fd=0x0, cmd=7, flock=0x7f0626355320)
    at ../../../../../xlators/performance/io-threads/src/io-threads.c:1114
#2  0x00007f06280ba2cc in call_resume_wind (stub=0x7f06263552d8) at ../../../libglusterfs/src/call-stub.c:2350
#3  0x00007f06280c1278 in call_resume (stub=0x7f06263552d8) at ../../../libglusterfs/src/call-stub.c:3853
#4  0x00007f061f9be7c0 in iot_worker (data=0x686ed0) at ../../../../../xlators/performance/io-threads/src/io-threads.c:138
#5  0x00007f062782ed8c in start_thread (arg=0x7f0624146700) at pthread_create.c:304
#6  0x00007f062757a04d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()
(gdb) info thr
  6 Thread 29621  do_sigwait (set=<value optimized out>, sig=0x7f0626353eb8)
    at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:65
  5 Thread 29620  __lll_unlock_wake () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:373
  4 Thread 29622  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  3 Thread 29635  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  2 Thread 29624  0x00007f06278374bd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 29634  0x00007f061fbdf1d3 in pl_lk (frame=0x7f0626682658, this=0x67b1d0, fd=0x0, cmd=7, flock=0x7f0626355320)
    at ../../../../../xlators/features/locks/src/posix.c:988
(gdb) f 0
#0  0x00007f061fbdf1d3 in pl_lk (frame=0x7f0626682658, this=0x67b1d0, fd=0x0, cmd=7, flock=0x7f0626355320)
    at ../../../../../xlators/features/locks/src/posix.c:988
988	        pl_inode = pl_inode_get (this, fd->inode);
(gdb) p fd
$1 = (fd_t *) 0x0
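
The faulting line dereferences fd without any NULL check. Below is a minimal, standalone C sketch (not the actual GlusterFS source; the struct layouts, function names and the error code are simplified stand-ins) showing why posix.c:988 faults when fd is NULL, and what a defensive guard at that point would look like:

/* sketch.c -- illustrative only, modeled loosely on the backtrace above */
#include <stdio.h>
#include <errno.h>

typedef struct inode { int dummy; } inode_t;
typedef struct fd    { inode_t *inode; } fd_t;

/* Unguarded access, like "pl_inode_get (this, fd->inode)": with a NULL fd
 * the fd->inode load faults with SIGSEGV. */
static inode_t *inode_unguarded (fd_t *fd)
{
        return fd->inode;
}

/* Guarded access: validate fd first and return an error, so the fop can
 * be failed (unwound) instead of crashing the brick process. */
static int inode_guarded (fd_t *fd, inode_t **inode)
{
        if (fd == NULL || fd->inode == NULL)
                return -EBADF;
        *inode = fd->inode;
        return 0;
}

int main (void)
{
        fd_t    *fd    = NULL;   /* what pl_lk received in this crash */
        inode_t *inode = NULL;

        if (inode_guarded (fd, &inode) < 0)
                printf ("lk fop failed cleanly: bad fd, no crash\n");

        /* Calling inode_unguarded (fd) with fd == NULL would segfault,
         * exactly as the brick did at posix.c:988. */
        if (fd != NULL)
                inode = inode_unguarded (fd);

        return 0;
}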


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a 2-brick pure replicate volume, mount it from a fuse client and run ping_pong on the mount.
2. While ping_pong is running, bring one brick down.
3. Bring the brick back up (volume start force).

Actual results:
The brick process crashes with SIGSEGV in pl_lk () at posix.c:988, because the fd passed to the lk fop is NULL.

Expected results:
The lk fop fails gracefully and the brick process does not crash.

Additional info:

Comment 1 Raghavendra Bhat 2012-01-24 09:32:40 UTC
Running ping_pong on the fuse client, bringing a brick down and bringing it back up will crash the brick.

Comment 2 Anand Avati 2012-01-25 16:08:44 UTC
CHANGE: http://review.gluster.com/2684 (protocol/client: if the remote_fd is -1, then unwind instead of sending the call to server) merged in master by Anand Avati (avati)
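
For context, here is a minimal standalone C sketch of the idea stated in that change subject (the type, field and function names and the error code below are invented for illustration and are not the actual protocol/client code): if the client holds no valid remote fd for the brick (remote_fd == -1, e.g. the fd was not re-opened after the brick restart), the lk call is unwound locally with an error instead of being wound to the server.

/* sketch of the "unwind when remote_fd == -1" decision -- illustrative only */
#include <stdio.h>
#include <errno.h>

typedef struct client_fd_ctx {
        long remote_fd;          /* -1 means the fd is not open on this brick */
} client_fd_ctx_t;

/* Returns 0 if the lk request may be wound to the server,
 * -EBADF if it should be unwound to the caller right away. */
static int client_lk_precheck (const client_fd_ctx_t *ctx)
{
        if (ctx == NULL || ctx->remote_fd == -1)
                return -EBADF;
        return 0;
}

int main (void)
{
        client_fd_ctx_t stale = { .remote_fd = -1 };  /* lost across brick restart */
        client_fd_ctx_t live  = { .remote_fd = 7 };

        printf ("stale fd: %s\n",
                client_lk_precheck (&stale) ? "unwind with error" : "wind to server");
        printf ("live fd : %s\n",
                client_lk_precheck (&live)  ? "unwind with error" : "wind to server");
        return 0;
}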

Comment 3 Raghavendra Bhat 2012-05-08 13:08:39 UTC
Ran ping_pong, brought a brick down and then brought it up. No brick crashed. Checked with glusterfs-3.3.0qa40.

