Bug 765435 (GLUSTER-3703) - [glusterfs-3.3.0qa14] crash in afr_changelog_post_op_cbk
Summary: [glusterfs-3.3.0qa14] crash in afr_changelog_post_op_cbk
Keywords:
Status: CLOSED DUPLICATE of bug 766603
Alias: GLUSTER-3703
Product: GlusterFS
Classification: Community
Component: replicate
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-10-05 12:28 UTC by M S Vishwanath Bhat
Modified: 2016-06-01 01:55 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-16 09:25:14 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
gluster nfs server logs (20.76 KB, text/x-log)
2011-10-05 09:28 UTC, M S Vishwanath Bhat

Description M S Vishwanath Bhat 2011-10-05 12:28:06 UTC
Created a volume with transport type tcp,rdma, mounted it via an NFS client on another machine, and started running dbench. The NFS server then crashed with the following backtrace:

Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1

warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff57c75000
Core was generated by `/usr/local/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glusterd/'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002aaaaad0f12b in afr_changelog_post_op_cbk (frame=0x2acb959d7990, cookie=0x2acb95733028, this=0x10e2e2b0, op_ret=0, op_errno=22, xattr=0x136cd670) at ../../../../../xlators/cluster/afr/src/afr-transaction.c:347
347                     call_count = --local->call_count;
(gdb) bt
#0  0x00002aaaaad0f12b in afr_changelog_post_op_cbk (frame=0x2acb959d7990, cookie=0x2acb95733028, this=0x10e2e2b0, op_ret=0, op_errno=22, xattr=0x136cd670) at ../../../../../xlators/cluster/afr/src/afr-transaction.c:347
#1  0x00002aaaaaac501a in client3_1_xattrop_cbk (req=0x2aaab1750730, iov=0x2aaab1750770, count=1, myframe=0x2acb95733028) at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1425
#2  0x00002acb94a8e25e in rpc_clnt_handle_reply (clnt=0x124a9cd0, pollin=0x143b95e0) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:789
#3  0x00002acb94a8e586 in rpc_clnt_notify (trans=0x124a57f0, mydata=0x124a9d00, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x143b95e0) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:902
#4  0x00002acb94a8a9f3 in rpc_transport_notify (this=0x124a57f0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x143b95e0) at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#5  0x00002aaaade75044 in rdma_pollin_notify (peer=0x124a5c18, post=0x12011dc0) at ../../../../../rpc/rpc-transport/rdma/src/rdma.c:3085
#6  0x00002aaaade7538e in rdma_recv_reply (peer=0x124a5c18, post=0x12011dc0) at ../../../../../rpc/rpc-transport/rdma/src/rdma.c:3172
#7  0x00002aaaade756ab in rdma_process_recv (peer=0x124a5c18, wc=0x4421e0d0) at ../../../../../rpc/rpc-transport/rdma/src/rdma.c:3262
#8  0x00002aaaade7593e in rdma_recv_completion_proc (data=0x10e345e0) at ../../../../../rpc/rpc-transport/rdma/src/rdma.c:3347
#9  0x000000328420673d in start_thread () from /lib64/libpthread.so.0
#10 0x0000003283ad40cd in clone () from /lib64/libc.so.6
(gdb) f 1
#1  0x00002aaaaaac501a in client3_1_xattrop_cbk (req=0x2aaab1750730, iov=0x2aaab1750770, count=1, myframe=0x2acb95733028) at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1425
1425            STACK_UNWIND_STRICT (xattrop, frame, op_ret,
(gdb) f 2
#2  0x00002acb94a8e25e in rpc_clnt_handle_reply (clnt=0x124a9cd0, pollin=0x143b95e0) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:789
789             req->cbkfn (req, req->rsp, req->rspcnt, saved_frame->frame);


I have attached the client log. The core file is too big to attach, so I have archived it.

Comment 1 Pranith Kumar K 2011-10-07 08:25:22 UTC
Which version of the QA release did the crash occur on?

Comment 2 M S Vishwanath Bhat 2011-10-07 08:32:34 UTC
(In reply to comment #1)
> What is the version of the QA release for the crash?.

It's 3.3.0qa14

Comment 3 Pranith Kumar K 2011-10-19 10:30:03 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > What is the version of the QA release for the crash?.
> 
> It's 3.3.0qa14

Vishwa,
      Did you get a chance to verify if it is due to the rdma corruption bug?

Pranith

Comment 4 Harshavardhana 2011-11-15 21:48:37 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > What is the version of the QA release for the crash?.
> > 
> > It's 3.3.0qa14
> 
> Vishwa,
>
Why is op_ret set to '-1' here for an EINVAL? Doesn't that mean this could be caused by an incorrectly sent return value?

Comment 5 Pranith Kumar K 2011-11-23 13:06:36 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > (In reply to comment #1)
> > > > What is the version of the QA release for the crash?.
> > > 
> > > It's 3.3.0qa14
> > 
> > Vishwa,
> >
> why is here op_ret set to '-1' for an EINVAL? doesn't it means this can be
> happening from an incorrectly sent return value?.

Where do you see op_ret set to -1? We should not look at op_errno unless op_ret is set to -1.

Pranith
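
The convention stated in comment 5 (op_errno is only meaningful when op_ret is -1) can be sketched in C as follows. This is a minimal illustration, not GlusterFS code: `describe_reply` is a hypothetical helper introduced here, and only the `op_ret`/`op_errno` names and the values 0 and 22 come from frame #0 of the backtrace.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of GlusterFS): describes the outcome of a
 * callback that receives an (op_ret, op_errno) pair. */
static const char *
describe_reply (int op_ret, int op_errno, char *buf, size_t len)
{
        if (op_ret == -1) {
                /* Failure: only now is op_errno consulted. */
                snprintf (buf, len, "failed: %s", strerror (op_errno));
        } else {
                /* Success: op_errno is ignored, whatever value it holds. */
                snprintf (buf, len, "succeeded (op_ret=%d)", op_ret);
        }
        return buf;
}
```

Called with the values from the backtrace (op_ret=0, op_errno=22, which is EINVAL on Linux), the helper reports success and never reads op_errno. Under this convention the 22 seen in frame #0 is a stale value, not evidence of a failed xattrop.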

Comment 6 Pranith Kumar K 2011-11-23 13:09:38 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > What is the version of the QA release for the crash?.
> > 
> > It's 3.3.0qa14
> 
> Vishwa,
>       Did you get a chance to verify if it is due to the rdma corruption bug?
> 
> Pranith

Vishwa,
         I am moving the bug to INFO-REQUESTED

Pranith

Comment 7 Pranith Kumar K 2011-12-16 09:25:14 UTC

*** This bug has been marked as a duplicate of bug 766603 ***

