Bug 800412 - [glusterfs-3.3.0qa25]: glusterfs server crashed while clearing the locks
Summary: [glusterfs-3.3.0qa25]: glusterfs server crashed while clearing the locks
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: locks
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: krishnan parthasarathi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 817967
 
Reported: 2012-03-06 13:04 UTC by Raghavendra Bhat
Modified: 2015-12-01 16:45 UTC (History)
2 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:54:53 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Raghavendra Bhat 2012-03-06 13:04:17 UTC
Description of problem:
2x2 distributed-replicate volume with 1 FUSE and 1 NFS client.
There were blocked locks on the server, so tried to clear those locks using the clear-locks command. The glusterfs server crashed with the backtrace below.

Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id mirror.10.1.11.130.export-'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f8c1cd0482c in is_same_lkowner (l1=0x490, l2=0x7f8be00c2810) at ../../../../../libglusterfs/src/lkowner.h:89
89              return ((l1->len == l2->len) && !memcmp(l1->data, l2->data, l1->len));
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64
(gdb) bt
#0  0x00007f8c1cd0482c in is_same_lkowner (l1=0x490, l2=0x7f8be00c2810) at ../../../../../libglusterfs/src/lkowner.h:89
#1  0x00007f8c1cd04a6f in same_inodelk_owner (l1=0xfffffffffffffff0, l2=0x7f8be00c2370)
    at ../../../../../xlators/features/locks/src/inodelk.c:126
#2  0x00007f8c1cd04ca5 in __owner_has_lock (dom=0x7f8c00010ff0, newlock=0x7f8be00c2370)
    at ../../../../../xlators/features/locks/src/inodelk.c:188
#3  0x00007f8c1cd04e5c in __lock_inodelk (this=0x2189a40, pl_inode=0x7f8c0000e260, lock=0x7f8be00c2370, can_block=0, dom=0x7f8c00010ff0)
    at ../../../../../xlators/features/locks/src/inodelk.c:227
#4  0x00007f8c1cd05ca2 in pl_inode_setlk (this=0x2189a40, pl_inode=0x7f8c0000e260, lock=0x7f8be00c2370, can_block=0, dom=0x7f8c00010ff0)
    at ../../../../../xlators/features/locks/src/inodelk.c:472
#5  0x00007f8c1cd0644a in pl_common_inodelk (frame=0x7f8c202e8c78, this=0x2189a40, volume=0x2399190 "mirror-replicate-0", 
    inode=0x7f8c16c8d3f4, cmd=6, flock=0x7f8c1ffbb138, loc=0x0, fd=0x21b4cf4) at ../../../../../xlators/features/locks/src/inodelk.c:621
#6  0x00007f8c1cd067fc in pl_finodelk (frame=0x7f8c202e8c78, this=0x2189a40, volume=0x2399190 "mirror-replicate-0", fd=0x21b4cf4, cmd=6, 
    flock=0x7f8c1ffbb138) at ../../../../../xlators/features/locks/src/inodelk.c:674
#7  0x00007f8c1cae93d9 in iot_finodelk_wrapper (frame=0x7f8c2034620c, this=0x218ac70, volume=0x2399190 "mirror-replicate-0", fd=0x21b4cf4, 
    cmd=6, lock=0x7f8c1ffbb138) at ../../../../../xlators/performance/io-threads/src/io-threads.c:2033
#8  0x00007f8c214ab002 in call_resume_wind (stub=0x7f8c1ffbb0e8) at ../../../libglusterfs/src/call-stub.c:2430
#9  0x00007f8c214b22d0 in call_resume (stub=0x7f8c1ffbb0e8) at ../../../libglusterfs/src/call-stub.c:3938
#10 0x00007f8c1cadb8cd in iot_worker (data=0x21a24d0) at ../../../../../xlators/performance/io-threads/src/io-threads.c:138
#11 0x00000034c2a077e1 in start_thread () from /lib64/libpthread.so.0
#12 0x00000034c22e577d in clone () from /lib64/libc.so.6
(gdb)  f 0
#0  0x00007f8c1cd0482c in is_same_lkowner (l1=0x490, l2=0x7f8be00c2810) at ../../../../../libglusterfs/src/lkowner.h:89
89              return ((l1->len == l2->len) && !memcmp(l1->data, l2->data, l1->len));
(gdb) p l1
$1 = (gf_lkowner_t *) 0x490
(gdb) p l2
$2 = (gf_lkowner_t *) 0x7f8be00c2810
(gdb) f 1
#1  0x00007f8c1cd04a6f in same_inodelk_owner (l1=0xfffffffffffffff0, l2=0x7f8be00c2370)
    at ../../../../../xlators/features/locks/src/inodelk.c:126
126             return (is_same_lkowner (&l1->owner, &l2->owner) &&
(gdb) p l1
$3 = (pl_inode_lock_t *) 0xfffffffffffffff0
(gdb) p l2
$4 = (pl_inode_lock_t *) 0x7f8be00c2370
(gdb) f 2
#2  0x00007f8c1cd04ca5 in __owner_has_lock (dom=0x7f8c00010ff0, newlock=0x7f8be00c2370)
    at ../../../../../xlators/features/locks/src/inodelk.c:188
188                     if (same_inodelk_owner (lock, newlock))
(gdb) p lock
$5 = (pl_inode_lock_t *) 0xfffffffffffffff0
(gdb) p newlock
$6 = (pl_inode_lock_t *) 0x7f8be00c2370
(gdb) l
183                     if (same_inodelk_owner (lock, newlock))
184                             return 1;
185             }
186
187             list_for_each_entry (lock, &dom->blocked_inodelks, blocked_locks) {
188                     if (same_inodelk_owner (lock, newlock))
189                             return 1;
190             }
191
192             return 0;
(gdb) 
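
Analysis of the above: in frame 2 the list iterator in __owner_has_lock() yields lock = 0xfffffffffffffff0, and &lock->owner then resolves to the bogus l1 = 0x490 that is_same_lkowner() dereferences. A value of 0xfffffffffffffff0 is consistent with list_entry()/container_of() being applied to a NULL or already-unlinked list pointer (a list-head member at offset 0x10 would give exactly -0x10), which suggests the blocked-locks lists were corrupted while clear-locks was removing entries. That matches the fix referenced in comment 2 ("locks: Fixed incorrect list ptr manipulation in clearing entrylks").

Below is a minimal, self-contained C sketch of that general bug class; it is not GlusterFS source and all names are invented for illustration. Freeing/unlinking entries while walking a list without first saving the next pointer (the list_for_each_entry vs list_for_each_entry_safe distinction) leaves the iterator reading freed or garbage memory, the same kind of dangling pointer seen in the backtrace.

/* Hypothetical sketch, not GlusterFS code: all names here are invented.
 * It shows the generic hazard behind "incorrect list ptr manipulation":
 * freeing entries while iterating without saving the next pointer first. */
#include <stdio.h>
#include <stdlib.h>

struct fake_lock {
        int               owner;
        struct fake_lock *next;
};

/* BROKEN: reads cur->next after cur has already been freed (use-after-free). */
static void clear_locks_broken (struct fake_lock **head)
{
        struct fake_lock *cur;

        for (cur = *head; cur != NULL; cur = cur->next)  /* cur is freed here */
                free (cur);
        *head = NULL;
}

/* FIXED: remember the successor before freeing the current entry, which is
 * the same idea as switching to a "safe" list-iteration macro. */
static void clear_locks_safe (struct fake_lock **head)
{
        struct fake_lock *cur = *head, *next = NULL;

        while (cur != NULL) {
                next = cur->next;
                free (cur);
                cur  = next;
        }
        *head = NULL;
}

int main (void)
{
        struct fake_lock *head = NULL;
        int               i;

        for (i = 0; i < 4; i++) {
                struct fake_lock *l = calloc (1, sizeof (*l));
                l->owner = i;
                l->next  = head;
                head     = l;
        }

        clear_locks_safe (&head);  /* swap in clear_locks_broken() to see the hazard */
        printf ("blocked locks cleared\n");
        return 0;
}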



Version-Release number of selected component (if applicable):


How reproducible:

Steps to Reproduce:
1. Run clear-locks on a volume where inodelks/entrylks are blocked (see the example invocation below).
2.
3.
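
For reference, the clear-locks invocation has the following shape (this is the same command used during verification in comment 4; <volname> and <path-on-volume> are placeholders, and "blocked" / "inode" select which locks to clear):

gluster volume clear-locks <volname> <path-on-volume> kind blocked inode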
  
Actual results:
glusterfs server crashed.

Expected results:

glusterfs server should not crash.

Additional info:

statedump which shows the active and blocked locks


[xlator.features.locks.mirror-locks.inode]
path=/run6686/Bonnie.6806
mandatory=0
inodelk-count=47
lock-dump.domain.domain=mirror-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=154664960, len=131072, pid = 18446744073709551615, owner=48ffff1cff7f0000, transport=0x2251120, , granted at Thu Mar  1 15:14:48 2012

inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551615, owner=6cffff1cff7f0000, transport=0x2251120, , blocked at Tue Mar  6 06:43:06 2012

inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551615, owner=ffffff1cff7f0000, transport=0x2251120, , blocked at Tue Mar  6 06:43:06 2012

inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551615, owner=ffffff1cff7f0000, transport=0x2251120, , blocked at Tue Mar  6 06:43:06 2012
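
(The dump above is from the brick process's statedump file; as far as I recall it was generated with the volume statedump CLI, e.g.:

gluster volume statedump <volname>

with the output written to the brick's statedump directory, typically /var/run/gluster, configurable via server.statedump-path.)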


crashed process log:

fs/3.3.0qa25/xlator/protocol/server.so(server_finodelk_cbk+0x25e) [0x7f8c1c269f4a]))) 0-: Reply submission failed
[2012-03-06 06:45:45.469157] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of raghu.144machine.com-20659-2012/03/06-06:24:39:543571-mirror-client-0-0
[2012-03-06 06:45:45.469186] I [socket.c:2377:socket_submit_reply] 0-tcp.mirror-server: not connected (priv->connected = 255)
[2012-03-06 06:45:45.469204] E [rpcsvc.c:1078:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x20x, Program: GlusterFS 3.3.0qa25, ProgVers: 330, Proc: 30) to rpc-transport (tcp.mirror-server)
[2012-03-06 06:45:45.469247] E [server.c:166:server_submit_reply] (-->/usr/local/lib/libglusterfs.so.0(default_finodelk_cbk+0x14d) [0x7f8c2148a7de] (-->/usr/local/lib/glusterfs/3.3.0qa25/xlator/debug/io-stats.so(io_stats_finodelk_cbk+0x23a) [0x7f8c1c498f2b] (-->/usr/local/lib/glusterfs/3.3.0qa25/xlator/protocol/server.so(server_finodelk_cbk+0x25e) [0x7f8c1c269f4a]))) 0-: Reply submission failed
[2012-03-06 06:45:45.469280] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of raghu.144machine.com-20673-2012/03/06-06:30:14:772869-mirror-client-0-0
[2012-03-06 06:45:45.469315] I [socket.c:2377:socket_submit_reply] 0-tcp.mirror-server: not connected (priv->connected = 255)
[2012-03-06 06:45:45.469332] E [rpcsvc.c:1078:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x19x, Program: GlusterFS 3.3.0qa25, ProgVers: 330, Proc: 30) to rpc-transport (tcp.mirror-server)
[2012-03-06 06:45:45.469375] E [server.c:166:server_submit_reply] (-->/usr/local/lib/libglusterfs.so.0(default_finodelk_cbk+0x14d) [0x7f8c2148a7de] (-->/usr/local/lib/glusterfs/3.3.0qa25/xlator/debug/io-stats.so(io_stats_finodelk_cbk+0x23a) [0x7f8c1c498f2b] (-->/usr/local/lib/glusterfs/3.3.0qa25/xlator/protocol/server.so(server_finodelk_cbk+0x25e) [0x7f8c1c269f4a]))) 0-: Reply submission failed
[2012-03-06 06:45:45.469404] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of raghu.144machine.com-20684-2012/03/06-06:35:49:964516-mirror-client-0-0
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-03-06 06:45:45
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0qa25
/lib64/libc.so.6[0x34c2232980]
/usr/local/lib/glusterfs/3.3.0qa25/xlator/features/locks.so(+0x1282c)[0x7f8c1cd0482c]
/usr/local/lib/glusterfs/3.3.0qa25/xlator/features/locks.so(+0x12a6f)[0x7f8c1cd04a6f]
/usr/local/lib/glusterfs/3.3.0qa25/xlator/features/locks.so(+0x12ca5)[0x7f8c1cd04ca5]
/usr/local/lib/glusterfs/3.3.0qa25/xlator/features/locks.so(+0x12e5c)[0x7f8c1cd04e5c]
/usr/local/lib/glusterfs/3.3.0qa25/xlator/features/locks.so(+0x13ca2)[0x7f8c1cd05ca2]
/usr/local/lib/glusterfs/3.3.0qa25/xlator/features/locks.so(pl_common_inodelk+0x413)[0x7f8c1cd0644a]

Comment 1 Amar Tumballi 2012-03-12 09:46:37 UTC
Please update these bugs w.r.t. 3.3.0qa27; they need to be worked on as per the target milestone set.

Comment 2 Anand Avati 2012-03-14 12:56:18 UTC
CHANGE: http://review.gluster.com/2878 (locks: Fixed incorrect list ptr manipulation in clearing entrylks) merged in master by Vijay Bellur (vijay)

Comment 3 Anand Avati 2012-03-14 17:58:41 UTC
CHANGE: http://review.gluster.com/2922 (afr: Corrected getxattr 'key' matching in case of clrlk cmd) merged in master by Vijay Bellur (vijay)

Comment 4 Raghavendra Bhat 2012-04-05 10:28:44 UTC
Tested with glusterfs-3.3.0qa33. clear-locks did not lead to a crash and it cleared the blocked locks.

 gluster volume clear-locks mirror /playground/thread_file kind blocked inode
Volume clear-locks successful
mirror-locks: inode blocked locks=2 granted locks=0
mirror-locks: inode blocked locks=4 granted locks=0
mirror-locks: inode blocked locks=5 granted locks=0

