Bug 1184417 - Segmentation fault in locks while disconnecting client
Summary: Segmentation fault in locks while disconnecting client
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: locks
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Duplicates: 1307146
Depends On:
Blocks:
 
Reported: 2015-01-21 10:42 UTC by Xavi Hernandez
Modified: 2018-08-29 03:53 UTC
CC List: 3 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-08-29 03:53:50 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Xavi Hernandez 2015-01-21 10:42:16 UTC
Description of problem:

When a client that holds a lock disconnects without releasing it, and a clear-locks operation has been executed after the lock was acquired, the glusterfsd process dies with a segmentation fault.

Version-Release number of selected component (if applicable): mainline


How reproducible:

Always in the described situation, but it is hard to reproduce under normal circumstances; it can occur as a side effect of another bug.

Steps to Reproduce:
1. Check out revision 63dc6e1942dffcddd99c5048a498ca00eead8baa (this revision is just before the patch that fixes the bug on the ec side)
2. compile and install
3. glusterd
4. gluster volume create test disperse 3 redundancy 1 server:/bricks/test{1..3} force
5. gluster volume start test
6. mount -t glusterfs server:/test /gluster/test
7. gluster volume clear-locks test / kind all inode

The last step causes the crash on the bricks.

Actual results:

Segmentation fault on bricks.

Expected results:

Nothing special or visible should happen.

Additional info:

This is a backtrace of a crash:

Program received signal SIGSEGV, Segmentation fault.
0x00007fc901daf145 in lkowner_unparse (buf_len=2176, 
    buf=0x1f7aff0 "00000000471a2536-320000005fa3a797-ff7f000018000000-3000000020a4a797-ff7f000060a3a797-ff7f", '0' <repeats 12 times>, "-", '0' <repeats 16 times>, "-0000000050a1fe01-0000000070d4fc01-0000000088b2f301-0000000049652e36-32", '0' <repeats 11 times>..., lkowner=0x7fff97a7a330) at lkowner.h:43
43                      sprintf (&buf[j], "%02hhx", lkowner->data[i]);
(gdb) bt
#0  0x00007fc901daf145 in lkowner_unparse (buf_len=2176, 
    buf=0x1f7aff0 "00000000471a2536-320000005fa3a797-ff7f000018000000-3000000020a4a797-ff7f000060a3a797-ff7f", '0' <repeats 12 times>, "-", '0' <repeats 16 times>, "-0000000050a1fe01-0000000070d4fc01-0000000088b2f301-0000000049652e36-32", '0' <repeats 11 times>..., lkowner=0x7fff97a7a330) at lkowner.h:43
#1  lkowner_utoa (lkowner=lkowner@entry=0x7fff97a7a330) at common-utils.c:2177
#2  0x00007fc8f6e151c5 in pl_inodelk_log_cleanup (lock=0x7fff97a79e90) at inodelk.c:401
#3  pl_inodelk_client_cleanup (this=this@entry=0x1f85b30, ctx=ctx@entry=0x7fc8e8000e20) at inodelk.c:429
#4  0x00007fc8f6e12b82 in pl_client_disconnect_cbk (this=0x1f85b30, client=<optimized out>) at posix.c:2563
#5  0x00007fc901dea23d in gf_client_disconnect (client=client@entry=0x1feae10) at client_t.c:393
#6  0x00007fc8f5f60c28 in server_connection_cleanup (this=this@entry=0x1f8e4e0, client=client@entry=0x1feae10, flags=flags@entry=3) at server-helpers.c:353
#7  0x00007fc8f5f5babe in server_rpc_notify (rpc=<optimized out>, xl=0x1f8e4e0, event=<optimized out>, data=0x1fe8970) at server.c:531
#8  0x00007fc901b65d9f in rpcsvc_handle_disconnect (svc=0x1f9db20, trans=trans@entry=0x1fe8970) at rpcsvc.c:741
#9  0x00007fc901b65ec8 in rpcsvc_notify (trans=0x1fe8970, mydata=<optimized out>, event=<optimized out>, data=0x1fe8970) at rpcsvc.c:779
#10 0x00007fc901b68dc3 in rpc_transport_notify (this=this@entry=0x1fe8970, event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x1fe8970) at rpc-transport.c:518
#11 0x00007fc8f7cb3962 in socket_event_poll_err (this=0x1fe8970) at socket.c:1161
#12 socket_event_handler (fd=<optimized out>, idx=6, data=data@entry=0x1fe8970, poll_in=1, poll_out=0, poll_err=<optimized out>) at socket.c:2354
#13 0x00007fc901dec232 in event_dispatch_epoll_handler (i=<optimized out>, events=0x1f7a0f0, event_pool=0x1f59ac0) at event-epoll.c:384
#14 event_dispatch_epoll (event_pool=0x1f59ac0) at event-epoll.c:445
#15 0x0000000000404ec9 in main (argc=19, argv=0x7fff97a7bb58) at glusterfsd.c:2052

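For context, frame #0 dies inside the hex-formatting loop of lkowner_unparse(). The snippet below is a simplified, self-contained paraphrase of what that loop does; the struct and names (owner_sketch_t, owner_unparse_sketch) are illustrative stand-ins, not the upstream GlusterFS definitions. The loop walks lkowner->len bytes of the owner and sprintf()s each one into buf; when the lock pointer itself is garbage (note the stack-range address lock=0x7fff97a79e90 in frame #2), len and data are presumably garbage too, so the real loop ends up reading and writing memory it does not own.

/* Illustrative paraphrase of the crash site at lkowner.h:43.  The struct and
 * loop are simplified sketches, not the upstream GlusterFS code. */
#include <stdio.h>
#include <string.h>

typedef struct {
        int  len;          /* number of valid bytes in data[] */
        char data[1024];   /* lock-owner identifier bytes     */
} owner_sketch_t;

/* Hex-formats the owner one byte at a time, inserting a '-' every 8 bytes.
 * With a garbage owner, len and data are meaningless, and the unbounded
 * walk over them is what produces the SIGSEGV seen in frame #0. */
static void
owner_unparse_sketch (const owner_sketch_t *owner, char *buf, int buf_len)
{
        int i, j = 0;

        for (i = 0; i < owner->len && j + 3 < buf_len; i++) {
                if (i && !(i % 8))
                        buf[j++] = '-';
                sprintf (&buf[j], "%02hhx", (unsigned char) owner->data[i]);
                j += 2;
        }
        buf[j] = '\0';
}

int
main (void)
{
        owner_sketch_t owner = { .len = 4 };
        char           buf[64];

        memcpy (owner.data, "\x12\x34\x56\x78", 4);
        owner_unparse_sketch (&owner, buf, sizeof (buf));
        printf ("%s\n", buf);   /* prints "12345678" */
        return 0;
}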
The cause is that pl_inodelk_client_cleanup() (frame #3) traverses the client's inodelk_lockers list, but the clear-locks command did not remove the lock from this list before destroying it. The cleanup therefore follows a dangling link and dereferences freed (garbage) memory, which ultimately produces the segmentation fault.
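The following is a minimal, self-contained sketch of this bug class and of the fix direction (unlink the lock from every list it sits on before destroying it). All names here (lock_t, client_ctx_t, inodelk_lockers, client_list, and the list helpers) are illustrative stand-ins, not the actual GlusterFS structures or APIs.

/* Sketch of the use-after-free pattern: an intrusive list entry is freed
 * without being unlinked, and a later traversal walks the dangling link. */
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

/* Intrusive doubly-linked list, similar in spirit to GlusterFS's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init (struct list_head *h) { h->next = h->prev = h; }

static void list_add (struct list_head *n, struct list_head *h)
{
        n->next = h->next; n->prev = h;
        h->next->prev = n; h->next = n;
}

static void list_del (struct list_head *n)
{
        n->prev->next = n->next; n->next->prev = n->prev;
        n->next = n->prev = NULL;
}

/* A lock linked into a per-client list that the disconnect cleanup walks. */
typedef struct {
        struct list_head client_list;   /* linked into client_ctx_t.inodelk_lockers */
        int              id;
} lock_t;

typedef struct {
        struct list_head inodelk_lockers;
} client_ctx_t;

int
main (void)
{
        client_ctx_t ctx;
        list_init (&ctx.inodelk_lockers);

        lock_t *lk = calloc (1, sizeof (*lk));
        lk->id = 1;
        list_add (&lk->client_list, &ctx.inodelk_lockers);

        /* clear-locks path: destroying the lock.  The bug is freeing it while
         * it is still linked into ctx.inodelk_lockers; the fix direction is
         * to unlink it from every list it sits on before freeing. */
        list_del (&lk->client_list);   /* without this line -> use-after-free below */
        free (lk);

        /* disconnect path: walk the client's lock list and log/release each
         * entry.  With the dangling link still present this would dereference
         * freed memory, which is what the backtrace above shows. */
        for (struct list_head *p = ctx.inodelk_lockers.next;
             p != &ctx.inodelk_lockers; p = p->next) {
                lock_t *l = (lock_t *) ((char *) p - offsetof (lock_t, client_list));
                printf ("cleaning up lock %d\n", l->id);
        }

        return 0;
}

Removing the list_del() call reproduces the pattern seen in the backtrace: the later traversal follows a stale pointer into freed memory instead of finding an empty list.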

Comment 1 Pranith Kumar K 2016-02-22 07:50:43 UTC
*** Bug 1307146 has been marked as a duplicate of this bug. ***

Comment 2 Amar Tumballi 2018-08-29 03:53:50 UTC
A long time has passed with no activity on this bug; it has either been fixed already or is no longer critical.

Please re-open the bug if the issue is still affecting you, or if you want to take it to closure with a fix.

