Bug 1184417

Summary: Segmentation fault in locks while disconnecting client
Product: [Community] GlusterFS
Component: locks
Version: mainline
Status: CLOSED WONTFIX
Severity: high
Priority: high
Reporter: Xavi Hernandez <jahernan>
Assignee: Pranith Kumar K <pkarampu>
CC: bugs, joe, pkarampu
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2018-08-29 03:53:50 UTC

Description Xavi Hernandez 2015-01-21 10:42:16 UTC
Description of problem:

When a client that holds a lock disconnects without releasing it, and a clear-locks operation has been executed in the meantime (after the lock was acquired), the glusterfsd brick process dies with a segmentation fault.

Version-Release number of selected component (if applicable): mainline


How reproducible:

Always in the described situation, but it's hard to reproduce under normal circumstances. It can be triggered as a side effect of another bug.

Steps to Reproduce:
1. Check out revision 63dc6e1942dffcddd99c5048a498ca00eead8baa (this revision is just before the patch that solves the bug on the ec side)
2. Compile and install
3. glusterd
4. gluster volume create test disperse 3 redundancy 1 server:/bricks/test{1..3} force
5. gluster volume start test
6. mount -t glusterfs server:/test /gluster/test
7. gluster volume clear-locks test / kind all inode

The last step causes the crash on the bricks.

Actual results:

Segmentation fault on bricks.

Expected results:

Nothing special or visible should happen.

Additional info:

This is a backtrace of a crash:

Program received signal SIGSEGV, Segmentation fault.
0x00007fc901daf145 in lkowner_unparse (buf_len=2176, 
    buf=0x1f7aff0 "00000000471a2536-320000005fa3a797-ff7f000018000000-3000000020a4a797-ff7f000060a3a797-ff7f", '0' <repeats 12 times>, "-", '0' <repeats 16 times>, "-0000000050a1fe01-0000000070d4fc01-0000000088b2f301-0000000049652e36-32", '0' <repeats 11 times>..., lkowner=0x7fff97a7a330) at lkowner.h:43
43                      sprintf (&buf[j], "%02hhx", lkowner->data[i]);
(gdb) bt
#0  0x00007fc901daf145 in lkowner_unparse (buf_len=2176, 
    buf=0x1f7aff0 "00000000471a2536-320000005fa3a797-ff7f000018000000-3000000020a4a797-ff7f000060a3a797-ff7f", '0' <repeats 12 times>, "-", '0' <repeats 16 times>, "-0000000050a1fe01-0000000070d4fc01-0000000088b2f301-0000000049652e36-32", '0' <repeats 11 times>..., lkowner=0x7fff97a7a330) at lkowner.h:43
#1  lkowner_utoa (lkowner=lkowner@entry=0x7fff97a7a330) at common-utils.c:2177
#2  0x00007fc8f6e151c5 in pl_inodelk_log_cleanup (lock=0x7fff97a79e90) at inodelk.c:401
#3  pl_inodelk_client_cleanup (this=this@entry=0x1f85b30, ctx=ctx@entry=0x7fc8e8000e20) at inodelk.c:429
#4  0x00007fc8f6e12b82 in pl_client_disconnect_cbk (this=0x1f85b30, client=<optimized out>) at posix.c:2563
#5  0x00007fc901dea23d in gf_client_disconnect (client=client@entry=0x1feae10) at client_t.c:393
#6  0x00007fc8f5f60c28 in server_connection_cleanup (this=this@entry=0x1f8e4e0, client=client@entry=0x1feae10, flags=flags@entry=3) at server-helpers.c:353
#7  0x00007fc8f5f5babe in server_rpc_notify (rpc=<optimized out>, xl=0x1f8e4e0, event=<optimized out>, data=0x1fe8970) at server.c:531
#8  0x00007fc901b65d9f in rpcsvc_handle_disconnect (svc=0x1f9db20, trans=trans@entry=0x1fe8970) at rpcsvc.c:741
#9  0x00007fc901b65ec8 in rpcsvc_notify (trans=0x1fe8970, mydata=<optimized out>, event=<optimized out>, data=0x1fe8970) at rpcsvc.c:779
#10 0x00007fc901b68dc3 in rpc_transport_notify (this=this@entry=0x1fe8970, event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x1fe8970) at rpc-transport.c:518
#11 0x00007fc8f7cb3962 in socket_event_poll_err (this=0x1fe8970) at socket.c:1161
#12 socket_event_handler (fd=<optimized out>, idx=6, data=data@entry=0x1fe8970, poll_in=1, poll_out=0, poll_err=<optimized out>) at socket.c:2354
#13 0x00007fc901dec232 in event_dispatch_epoll_handler (i=<optimized out>, events=0x1f7a0f0, event_pool=0x1f59ac0) at event-epoll.c:384
#14 event_dispatch_epoll (event_pool=0x1f59ac0) at event-epoll.c:445
#15 0x0000000000404ec9 in main (argc=19, argv=0x7fff97a7bb58) at glusterfsd.c:2052
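
For reference, the immediate fault is in the lock-owner hex formatter called from the cleanup log message (frames #0/#1). The following is a simplified, standalone sketch of that loop; the gf_lkowner_t layout and the stop condition are approximated from the backtrace and header, not copied verbatim. The key point is that the iteration count comes from lkowner->len, which here lives inside the already-destroyed lock, so it is garbage and the loop walks far past any valid owner data:

#include <stdio.h>

#define GF_MAX_LOCK_OWNER_LEN 1024

typedef struct {
        int  len;                        /* valid owners are at most 1024 bytes */
        char data[GF_MAX_LOCK_OWNER_LEN];
} gf_lkowner_t;

/* Approximation of lkowner_unparse() (lkowner.h:43 in the backtrace). */
static void
lkowner_unparse_sketch (gf_lkowner_t *lkowner, char *buf, int buf_len)
{
        int i = 0;
        int j = 0;

        /* lkowner->len is read from freed memory, so this bound is garbage. */
        for (i = 0; i < lkowner->len; i++) {
                if (i && !(i % 8))
                        buf[j++] = '-';
                sprintf (&buf[j], "%02hhx", lkowner->data[i]);  /* faulting line */
                j += 2;
                if (j >= buf_len - 3)   /* simplified stop condition */
                        break;
        }
        buf[j] = '\0';
}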

The cause is that pl_inodelk_client_cleanup() (frame #3) traverses the client's inodelk_lockers list, but the clear-locks command did not remove the lock from this list before destroying it. The cleanup therefore reads the freed lock's garbage contents, which eventually leads to the segmentation fault.
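
To make the list interaction concrete, here is a minimal standalone model. The list helpers and the struct names below are simplified stand-ins, not the real pl_inode_lock_t or the gluster list macros: the lock is linked on both the inode's lock list and the client's inodelk_lockers list, clear-locks unlinks and frees it through the inode side only, and the disconnect cleanup then walks the client side into freed memory. A fix along these lines would also have to unlink the lock from the client's list before destroying it.

#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };

static void list_init (struct list_head *h) { h->next = h->prev = h; }
static void list_add (struct list_head *n, struct list_head *h)
{
        n->next = h->next; n->prev = h;
        h->next->prev = n; h->next = n;
}
static void list_del (struct list_head *n)
{
        n->prev->next = n->next; n->next->prev = n->prev;
        n->next = n->prev = n;
}

struct lock {                         /* simplified stand-in for pl_inode_lock_t  */
        struct list_head inode_list;  /* membership in the inode's lock list      */
        struct list_head client_list; /* membership in client's inodelk_lockers   */
};

int main (void)
{
        struct list_head inode_locks, client_lockers;
        struct lock *l = calloc (1, sizeof (*l));

        list_init (&inode_locks);
        list_init (&client_lockers);
        list_add (&l->inode_list, &inode_locks);     /* lock granted              */
        list_add (&l->client_list, &client_lockers); /* also tracked per client   */

        /* clear-locks path (the bug): drop the lock via the inode list only.     */
        list_del (&l->inode_list);
        free (l);            /* client_lockers still points into the freed lock   */

        /* disconnect path: a pl_inodelk_client_cleanup()-style traversal of
         * client_lockers now dereferences freed memory -> the SIGSEGV above.
         * The correct teardown would be:
         *     list_del (&l->inode_list);
         *     list_del (&l->client_list);
         *     free (l);
         */
        return 0;
}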

Comment 1 Pranith Kumar K 2016-02-22 07:50:43 UTC
*** Bug 1307146 has been marked as a duplicate of this bug. ***

Comment 2 Amar Tumballi 2018-08-29 03:53:50 UTC
There has been no activity on this bug for a long time. We have either fixed it already, or it is most likely no longer critical.

Please re-open the bug if the issue is still pressing for you, or if you want to take it to closure with fixes.