Bug 761894 (GLUSTER-162) - Replication segfaults with many nodes
Summary: Replication segfaults with many nodes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-162
Product: GlusterFS
Classification: Community
Component: protocol
Version: mainline
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Vijay Bellur
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-07-24 04:53 UTC by Ville Tuulos
Modified: 2009-11-12 09:28 UTC
4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: RTP
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
Volume file with 4 nodes (3.69 KB, text/plain), 2009-07-24 01:53 UTC, Ville Tuulos
Volume file with 8 nodes (segfaults after chdir) (6.12 KB, text/plain), 2009-07-24 01:54 UTC, Ville Tuulos

Description Ville Tuulos 2009-07-24 01:54:19 UTC
Created attachment 42
Proposed patch for sigprocmask(2) defect

Comment 1 Ville Tuulos 2009-07-24 04:53:35 UTC
My distribute+replicate volfile (attached) works correctly with a small number of nodes (a volfile with 4 nodes and 8 volumes is attached). However, when I increase the number of nodes above 6, I can mount GlusterFS fine, but as soon as I chdir into the GlusterFS mountpoint, the glusterfs process segfaults on the node where I access the directory, as well as on the other nodes in the same replication group.
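The attached volfiles are not reproduced here, but a distribute-over-replicate layout of the shape described above looks roughly like the following sketch (volume and subvolume names are illustrative, not taken from the attachments):

```
# Hypothetical excerpt of a distribute+replicate server-side layout:
# two replicate pairs aggregated by a distribute translator.
volume repl1
  type cluster/replicate
  subvolumes node1-vol node2-vol
end-volume

volume repl2
  type cluster/replicate
  subvolumes node3-vol node4-vol
end-volume

volume dist
  type cluster/distribute
  subvolumes repl1 repl2
end-volume
```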

I get two kinds of stack traces:

-- stack trace 1 --

(no errors before the trace)

pending frames:
frame : type(1) op(INODELK)
>> message repeats many times
frame : type(1) op(STAT)
>> message repeats many times
patchset: git://git.sv.gnu.org/gluster.git
signal received: 11
time of crash: 2009-07-23 19:44:46
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2.1.0git
/lib/libc.so.6[0x7f45de6d6f60]
/usr/local/lib/libglusterfs.so.0(inode_ref+0xe)[0x7f45dee3872e]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(server_inodelk_resume+0x1b1)[0x7f45dd456261]
/usr/local/lib/libglusterfs.so.0(call_resume+0x2c0)[0x7f45dee39f50]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(server_inodelk+0x15a)[0x7f45dd458a6a]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(protocol_server_pollin+0x9a)[0x7f45dd453e0a]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(notify+0x8b)[0x7f45dd453e9b]
/usr/local/lib/libglusterfs.so.0(transport_peerproc+0x8a)[0x7f45dee35f9a]
/lib/libpthread.so.0[0x7f45de9fefc7]
/lib/libc.so.6(clone+0x6d)[0x7f45de7745ad]
---------

-- stack trace 2 --

[2009-07-23 21:33:00] E [afr.c:2246:notify] repl1-vol2: All subvolumes are down. Going offline until atleast one of them comes back up.
pending frames:
frame : type(1) op(INODELK)
>> message repeats many times
patchset: git://git.sv.gnu.org/gluster.git
signal received: 11
time of crash: 2009-07-23 21:33:00
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2.1.0git
/lib/libc.so.6[0x7f636e25cf60]
/usr/local/lib/libglusterfs.so.0(inode_ref+0xe)[0x7f636e9be72e]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(server_inodelk_resume+0x1b1)[0x7f636cfdc261]
/usr/local/lib/libglusterfs.so.0(call_resume+0x2c0)[0x7f636e9bff50]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(server_inodelk+0x15a)[0x7f636cfdea6a]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(protocol_server_pollin+0x9a)[0x7f636cfd9e0a]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(notify+0x8b)[0x7f636cfd9e9b]
/usr/local/lib/libglusterfs.so.0(xlator_notify+0x43)[0x7f636e9b18e3]
/usr/local/lib/glusterfs/2.1.0git/transport/socket.so(socket_event_handler+0xd0)[0x7f636c11bfa0]
/usr/local/lib/libglusterfs.so.0[0x7f636e9cac77]
glusterfs(main+0x8ad)[0x403ffd]
/lib/libc.so.6(__libc_start_main+0xe6)[0x7f636e2491a6]
glusterfs[0x402859]
---------

Comment 2 Basavanagowda Kanur 2009-07-24 08:44:11 UTC
The problem is in protocol/server and has nothing to do with the increased number of nodes.

Thanks for reporting the bug. A fix will be available soon in the git repository.

Comment 3 Anand Avati 2009-07-27 15:33:59 UTC
PATCH: http://patches.gluster.com/patch/818 in release-2.0 (protocol/server: add checks for updatation of loc->parent in entrylk() or inodelk().)

