Bug 766603 - [289c2902d6a81f7a5b6da04c24cc955bd5427178] client crashed with segfault at afr_changelog_post_op_cbk
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Duplicates: GLUSTER-3703
Depends On:
Blocks: 817967
Reported: 2011-12-12 12:27 UTC by Rahul C S
Modified: 2013-07-24 17:39 UTC

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:39:48 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 1f3a0dd4742a2fcd3215aee4a5e22125d7ea4f4d
Embargoed:


Attachments
logs of all bricks and client. (5.30 MB, application/x-compressed-tar)
2011-12-12 12:36 UTC, Rahul C S

Description Rahul C S 2011-12-12 12:27:20 UTC
Description of problem:

The client saw disconnects from the bricks, and after some saved frames were unwound, it crashed with the segfault below.

I was running dbench with some extra options from a single client.

Core was generated by `/usr/local/sbin/glusterfs --volfile-id=vol --volfile-server=dagobah mount/'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f28ee9d6f50 in afr_changelog_post_op_cbk (frame=0x7f28f0fd0264, cookie=0x7f28f124b778, this=0x7f28dc78aeb0, op_ret=0, op_errno=22, 
    xattr=0x7f28dc3f58b0) at ../../../../../xlators/cluster/afr/src/afr-transaction.c:347
347	                call_count = --local->call_count;
(gdb) bt
#0  0x00007f28ee9d6f50 in afr_changelog_post_op_cbk (frame=0x7f28f0fd0264, cookie=0x7f28f124b778, this=0x7f28dc78aeb0, op_ret=0, op_errno=22, 
    xattr=0x7f28dc3f58b0) at ../../../../../xlators/cluster/afr/src/afr-transaction.c:347
#1  0x00007f28eec425e0 in client3_1_xattrop_cbk (req=0x7f28bbb599cc, iov=0x7f28bbb59a0c, count=1, myframe=0x7f28f124b778)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1425
#2  0x00007f28f2a329c6 in rpc_clnt_handle_reply (clnt=0x7f28df162de0, pollin=0x7f28dc0e0f70) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:789
#3  0x00007f28f2a32d28 in rpc_clnt_notify (trans=0x7f28dd07e8c0, mydata=0x7f28df162e10, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f28dc0e0f70)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:908
#4  0x00007f28f2a2ee3d in rpc_transport_notify (this=0x7f28dd07e8c0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f28dc0e0f70)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#5  0x00007f28ef8a4359 in socket_event_poll_in (this=0x7f28dd07e8c0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1675
#6  0x00007f28ef8a48cd in socket_event_handler (fd=168, idx=18, data=0x7f28dd07e8c0, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1790
#7  0x00007f28f2c834b9 in event_dispatch_epoll_handler (event_pool=0x231d2d0, events=0x2322a60, i=0) at ../../../libglusterfs/src/event.c:794
#8  0x00007f28f2c836d3 in event_dispatch_epoll (event_pool=0x231d2d0) at ../../../libglusterfs/src/event.c:856
#9  0x00007f28f2c83a45 in event_dispatch (event_pool=0x231d2d0) at ../../../libglusterfs/src/event.c:956
#10 0x0000000000407d83 in main (argc=4, argv=0x7ffffe95ca58) at ../../../glusterfsd/src/glusterfsd.c:1601
(gdb) f 1
#1  0x00007f28eec425e0 in client3_1_xattrop_cbk (req=0x7f28bbb599cc, iov=0x7f28bbb59a0c, count=1, myframe=0x7f28f124b778)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1425
1425	        STACK_UNWIND_STRICT (xattrop, frame, op_ret,
(gdb) p *this
$3 = {name = 0x7f28dc515770 "vol-client-4", type = 0x7f28dc14a980 "protocol/client", next = 0x7f28dc78b820, prev = 0x7f28dc78cb00, 
  parents = 0x7f28dea2b4e0, children = 0x0, options = 0x7f28dc5b1e80, dlhandle = 0x2329420, fops = 0x7f28eee61460, cbks = 0x7f28eee61440, 
  dumpops = 0x7f28eee61700, volume_options = {next = 0x7f28dc14aa30, prev = 0x7f28de928e50}, fini = 0x7f28eec3a78e <fini>, init = 0x7f28eec3a5c5 <init>, 
  reconfigure = 0x7f28eec3a418 <reconfigure>, mem_acct_init = 0x7f28eec39fd3 <mem_acct_init>, notify = 0x7f28eec39c68 <notify>, loglevel = GF_LOG_NONE, 
  latencies = {{min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 45 times>}, ctx = 0x2305010, graph = 0x7f28de816d70, itable = 0x0, 
  init_succeeded = 1 '\001', private = 0x7f28df10c880, mem_acct = {num_types = 91, rec = 0x7f28de3cfea0}}

Logs:
[2011-12-12 15:51:38.051586] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x124) [0x7f28f2a32bad] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x110) [0x7f28f2a3211f] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f28f2a31be9]))) 17-vol-client-5: forced unwinding frame type(GlusterFS 3.1) op(RENAME(8)) called at 2011-12-12 15:51:37.657225
[2011-12-12 15:51:38.051605] W [client3_1-fops.c:2015:client3_1_rename_cbk] 17-vol-client-5: remote operation failed: Transport endpoint is not connected
[2011-12-12 15:51:38.051799] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x124) [0x7f28f2a32bad] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x110) [0x7f28f2a3211f] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f28f2a31be9]))) 17-vol-client-5: forced unwinding frame type(GlusterFS 3.1) op(RELEASEDIR(42)) called at 2011-12-12 15:51:37.694882
[2011-12-12 15:51:38.051838] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x124) [0x7f28f2a32bad] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x110) [0x7f28f2a3211f] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f28f2a31be9]))) 17-vol-client-5: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2011-12-12 15:51:37.695624
[2011-12-12 15:51:38.051865] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 17-vol-client-5: remote operation failed: Transport endpoint is not connected. Path: /clients/client2/~dmtmp/COREL/GRAPH2.BAK
[2011-12-12 15:51:38.051959] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x124) [0x7f28f2a32bad] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x110) [0x7f28f2a3211f] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f28f2a31be9]))) 17-vol-client-5: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2011-12-12 15:51:37.697138
[2011-12-12 15:51:38.051977] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 17-vol-client-5: remote operation failed: Transport endpoint is not connected. Path: /clients/client8

Version-Release number of selected component (if applicable):


How reproducible:
Hit it only once so far. Trying to reproduce.

Steps to Reproduce:
These are not the exact steps, but this is roughly how I hit the crash:
1. Created a 3x2 distributed-replicate volume
2. Enabled geo-rep and quota
3. Made numerous graph changes by toggling stat-prefetch on and off
4. stat-prefetch was off when the crash happened
5. Ran dbench -s -F -S --stat-check 10
  
Actual results:
The client crashed after a brick had crashed. I am raising a separate bug for the brick crash.

Expected results:
dbench should have completed successfully.

Additional info:

Comment 1 Rahul C S 2011-12-12 12:36:41 UTC
Created attachment 545693
logs of all bricks and client.

Comment 2 Anand Avati 2011-12-13 11:40:56 UTC
CHANGE: http://review.gluster.com/783 (cluster/afr: Double the call count if transaction is for rename) merged in master by Vijay Bellur (vijay)
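
The patch title suggests the crash came from an undercounted call count: if more replies arrive than the counter expects, the transaction state is torn down while callbacks are still pending, and a later callback touches it at the `--local->call_count` line shown in frame #0. The following is a minimal sketch of that pattern, not the actual AFR code. `struct txn_local`, `expected_replies`, `post_op_reply`, and `count_late_replies` are hypothetical names, and the idea that a rename expects two replies per brick is an assumption inferred from the patch title.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-transaction state; NOT the real afr_local_t. */
struct txn_local {
    int  call_count;  /* replies still expected */
    bool destroyed;   /* set once the last expected reply has been handled */
};

/* Number of replies to wait for. Assumption (from the patch title only):
 * a rename transaction produces two replies per brick, so the count doubles. */
static int expected_replies(int up_children, bool is_rename)
{
    return is_rename ? up_children * 2 : up_children;
}

/* Reply callback. Returns false when the reply arrives after the state was
 * torn down; in real code that access would hit freed memory. */
static bool post_op_reply(struct txn_local *local)
{
    if (local->destroyed)
        return false;
    assert(local->call_count > 0);
    if (--local->call_count == 0)
        local->destroyed = true;  /* last expected reply: state is released */
    return true;
}

/* Deliver `replies` callbacks against a counter initialised to
 * `initial_count` and report how many arrived too late. */
static int count_late_replies(int initial_count, int replies)
{
    struct txn_local local = { .call_count = initial_count, .destroyed = false };
    int late = 0;

    for (int i = 0; i < replies; i++) {
        if (!post_op_reply(&local))
            late++;
    }
    return late;
}
```

With two bricks up and four replies in flight, a counter initialised to 2 leaves two replies arriving after teardown, while a counter initialised through `expected_replies(2, true)` leaves none.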

Comment 3 Pranith Kumar K 2011-12-16 09:25:14 UTC
*** Bug 765435 has been marked as a duplicate of this bug. ***

Comment 4 Rahul C S 2012-04-05 11:22:38 UTC
The crash is no longer seen with the same steps, including bringing bricks up and down. Tested against the latest git head.

