Bug 761893 (GLUSTER-161)

Summary: unfs3 crashes on link system call by fileop
Product: [Community] GlusterFS Reporter: Shehjar Tikoo <shehjart>
Component: libglusterfsclient Assignee: Shehjar Tikoo <shehjart>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: low    
Version: 2.0.4 CC: gluster-bugs, lakshmipathi
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: RTNR Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---

Description Shehjar Tikoo 2009-07-23 08:17:52 UTC
In this configuration, replicate sits beneath distribute. When libglusterfsclient sends the link fop down the xlator tree, dht_link in distribute calls dht_linkfile_create, which in turn calls mknod for the new link file. Because the link file does not exist yet, no inode is associated with it, and this NULL inode is passed down the stack in the local structures of all xlators.

When the callback for this mknod returns, the replicate translator sees a NULL inode in its mknod callback and crashes dereferencing the NULL pointer.
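
The failure path described above can be modelled in a few lines of C. This is only a sketch under stated assumptions: `sketch_inode`, `sketch_local`, `sketch_mknod_wind` and `sketch_mknod_cbk` are hypothetical stand-ins, not the GlusterFS types or functions, and the NULL check in the callback is added purely for illustration so that the sketch reports the problem instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real GlusterFS types. */
struct sketch_inode {
    int read_child;
};

struct sketch_local {
    struct sketch_inode *inode;   /* copied from the caller's loc */
};

/* "Wind" step: remember whatever inode the caller put in its loc. */
void sketch_mknod_wind(struct sketch_local *local,
                       struct sketch_inode *loc_inode)
{
    assert(local != NULL);
    local->inode = loc_inode;     /* NULL if the caller never filled it in */
}

/*
 * Callback step: the stored inode comes back up the stack.
 * Returns 0 on success, or -1 if the inode is NULL, which is the
 * case where an unchecked dereference would crash.
 */
int sketch_mknod_cbk(struct sketch_local *local, int read_child)
{
    assert(local != NULL);
    if (local->inode == NULL)
        return -1;
    local->inode->read_child = read_child;
    return 0;
}
```

In this model, winding with a NULL inode makes `sketch_mknod_cbk` return -1, while winding with a valid inode lets the callback record the read child. A guard like this would only turn the crash into an error; it does not supply the missing inode.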

Comment 1 Shehjar Tikoo 2009-07-23 08:31:14 UTC
This is a bug in libglusterfsclient, in that we need to ref the inode of the link destination before calling the link fop.

Comment 2 Shehjar Tikoo 2009-07-23 09:43:04 UTC
(In reply to comment #1)
> This is a bug in libglusterfsclient, in that we need to ref the inode of the
> link destination before calling the link fop.

That is an overly simplified description. The primary problem is that, for a file that does not exist, any creation fop on that file must be performed with its loc already containing an inode: either a new one from inode_new or, in the case of link, one obtained by an inode_ref on the old/existing/target inode. This is the part that is missing.
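
The rule stated above can be sketched with a small reference-counted model. Everything here is an assumption made for illustration: `model_inode`, `model_loc` and the `model_*` functions are hypothetical simplifications, and the real inode_new and inode_ref in GlusterFS have their own signatures and semantics that this sketch does not reproduce.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified model; the real inode_t and loc_t differ. */
struct model_inode {
    int refcount;
};

struct model_loc {
    const char *path;
    struct model_inode *inode;
};

/* Stand-in for allocating a fresh inode for a file that does not exist yet. */
struct model_inode *model_inode_new(void)
{
    struct model_inode *inode = calloc(1, sizeof(*inode));
    if (inode != NULL)
        inode->refcount = 1;
    return inode;
}

/* Stand-in for taking an extra reference on an existing inode. */
struct model_inode *model_inode_ref(struct model_inode *inode)
{
    assert(inode != NULL);
    inode->refcount++;
    return inode;
}

/* Drop one reference and free the inode when the last one goes away. */
void model_inode_unref(struct model_inode *inode)
{
    assert(inode != NULL && inode->refcount > 0);
    if (--inode->refcount == 0)
        free(inode);
}

/* mknod/create-style fops: the new loc gets a fresh inode before the call. */
int model_prepare_create(struct model_loc *newloc, const char *path)
{
    newloc->path = path;
    newloc->inode = model_inode_new();
    return newloc->inode != NULL ? 0 : -1;
}

/* link: the new loc shares the inode of the existing target. */
int model_prepare_link(const struct model_loc *oldloc,
                       struct model_loc *newloc, const char *newpath)
{
    if (oldloc->inode == NULL)
        return -1;                /* target must already have an inode */
    newloc->path = newpath;
    newloc->inode = model_inode_ref(oldloc->inode);
    return 0;
}
```

With this model, a caller would run `model_prepare_link` on the existing loc before issuing the link fop, so that no translator further down ever sees a loc whose inode is NULL.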

A patch is on the way.

Comment 3 Shehjar Tikoo 2009-07-23 10:21:41 UTC
iozone ships with a tool called fileop that is used to measure system call latency over a filesystem.

On an NFS mount served by unfs3booster (release 0.5 of unfs3booster, 2.0.4 of GlusterFS), the link syscall fails or hangs the fileop tool because no reply is received to the NFSv3 Link request.

The cause of the hang has been traced to the following crash that occurs in GlusterFS.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x426ff940 (LWP 25773)]
0x00002ac70512b0d2 in pthread_spin_lock () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00002ac70512b0d2 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00002ac705dcc6a6 in afr_set_read_child (this=0x3871fa0, inode=0x0, read_child=1) at afr.c:135
#2  0x00002ac705dd7d73 in afr_mknod_wind_cbk (frame=0x2aaaac0d8190, cookie=0x0, this=0x3871fa0, op_ret=0,
    op_errno=0, inode=0x0, buf=0x426feca0) at afr-dir-write.c:380
#3  0x00002ac705bb8c4c in client_mknod_cbk (frame=0x3912630, hdr=0x38bd290, hdrlen=108, iobuf=0x0)
    at client-protocol.c:4102
#4  0x00002ac705bbd129 in protocol_client_interpret (this=0x3870690, trans=0x3877430, hdr_p=0x38bd290 "",
    hdrlen=108, iobuf=0x0) at client-protocol.c:5878
#5  0x00002ac705bbdda1 in protocol_client_pollin (this=0x3870690, trans=0x3877430) at client-protocol.c:6169
#6  0x00002ac705bbdf35 in notify (this=0x3870690, event=2, data=0x3877430) at client-protocol.c:6213
#7  0x00002ac706acaba4 in socket_event_poll_in (this=0x3877430) at socket.c:714
#8  0x00002ac706acaea2 in socket_event_handler (fd=9, idx=1, data=0x3877430, poll_in=1, poll_out=0, poll_err=0)
    at socket.c:814
#9  0x00002ac704ad80e9 in event_dispatch_epoll_handler (event_pool=0x386b200, events=0x387cb30, i=0)
    at event.c:804
#10 0x00002ac704ad82be in event_dispatch_epoll (event_pool=0x386b200) at event.c:867
#11 0x00002ac704ad85d4 in event_dispatch (event_pool=0x386b200) at event.c:975
#12 0x00002ac704d01c91 in poll_proc (ptr=0x386b040) at libglusterfsclient.c:633
#13 0x00002ac705126367 in start_thread () from /lib64/libpthread.so.0
#14 0x00002ac704822f7d in clone () from /lib64/libc.so.6
(gdb) ret
Make selected stack frame return now? (y or n) y
#0  afr_set_read_child (this=0x3871fa0, inode=0x0, read_child=1)
    at afr.c:137
137                     ret = __inode_ctx_get (inode, this, &ctx);
(gdb) p inode
$1 = (inode_t *) 0x0



=======================

The bug is classified as booster because there is no clarity yet on where the problem is and the use case was unfs3booster.

Comment 4 Anand Avati 2009-07-23 19:20:15 UTC
PATCH: http://patches.gluster.com/patch/814 in release-2.0 (libglusterfsclient: Fill new loc with target's ino on link)

Comment 5 Anand Avati 2009-07-23 19:20:19 UTC
PATCH: http://patches.gluster.com/patch/815 in release-2.0 (libglusterfsclient: Avoid overwrite of inode found through ino number)

Comment 6 Anand Avati 2009-07-27 13:32:30 UTC
PATCH: http://patches.gluster.com/patch/814 in master (libglusterfsclient: Fill new loc with target's ino on link)

Comment 7 Anand Avati 2009-07-27 13:32:34 UTC
PATCH: http://patches.gluster.com/patch/815 in master (libglusterfsclient: Avoid overwrite of inode found through ino number)