Bug 761894 (GLUSTER-162)

Summary: Replication segfaults with many nodes
Product: [Community] GlusterFS
Reporter: Ville Tuulos <tuulos>
Component: protocol
Assignee: Vijay Bellur <vbellur>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: low
Version: mainline
CC: gluster-bugs, gowda, tuulos, vijay
Hardware: All
OS: Linux
Attachments:
  Volume file with 4 nodes
  Volume file with 8 nodes (segfaults after chdir)

Description Ville Tuulos 2009-07-24 01:54:19 UTC
Created attachment 42
Volume file with 4 nodes

Comment 1 Ville Tuulos 2009-07-24 04:53:35 UTC
My distribute+replicate volfile (attached) seems to work correctly with a small number of nodes (the attached volfile has 4 nodes and 8 volumes). However, when I increase the number of nodes to more than 6, I can mount GlusterFS fine, but as soon as I chdir into the GlusterFS mountpoint, the glusterfs process segfaults on the node where I access the directory, and also on the nodes in the same replication group.
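
For reference, this is the general shape of a distribute-over-replicate client volfile in GlusterFS 2.x syntax. It is only an illustrative sketch, not the attached file: the hostnames (node1, node2), the volume names (brick1, posix1, repl1, dist) and the pairing are assumptions made for the example.

-- illustrative volfile sketch (not the attached file) --

# One protocol/client volume per exported brick; hostnames are assumed.
volume brick1
  type protocol/client
  option transport-type tcp
  option remote-host node1
  option remote-subvolume posix1
end-volume

volume brick2
  type protocol/client
  option transport-type tcp
  option remote-host node2
  option remote-subvolume posix1
end-volume

# Each replication group mirrors one pair of bricks.
volume repl1
  type cluster/replicate
  subvolumes brick1 brick2
end-volume

# ... repl2, repl3, ... are defined the same way over the remaining bricks ...

# Distribute hashes files across the replication groups.
volume dist
  type cluster/distribute
  subvolumes repl1 repl2
end-volume
---------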

I get two kinds of stack traces:

-- stack trace 1 --

(no errors before the trace)

pending frames:
frame : type(1) op(INODELK)
>> message repeats many times
frame : type(1) op(STAT)
>> message repeats many times
patchset: git://git.sv.gnu.org/gluster.git
signal received: 11
time of crash: 2009-07-23 19:44:46
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2.1.0git
/lib/libc.so.6[0x7f45de6d6f60]
/usr/local/lib/libglusterfs.so.0(inode_ref+0xe)[0x7f45dee3872e]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(server_inodelk_resume+0x1b1)[0x7f45dd456261]
/usr/local/lib/libglusterfs.so.0(call_resume+0x2c0)[0x7f45dee39f50]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(server_inodelk+0x15a)[0x7f45dd458a6a]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(protocol_server_pollin+0x9a)[0x7f45dd453e0a]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(notify+0x8b)[0x7f45dd453e9b]
/usr/local/lib/libglusterfs.so.0(transport_peerproc+0x8a)[0x7f45dee35f9a]
/lib/libpthread.so.0[0x7f45de9fefc7]
/lib/libc.so.6(clone+0x6d)[0x7f45de7745ad]
---------

-- stack trace 2 --

[2009-07-23 21:33:00] E [afr.c:2246:notify] repl1-vol2: All subvolumes are down. Going offline until atleast one of them comes back up.
pending frames:
frame : type(1) op(INODELK)
>> message repeats many times
patchset: git://git.sv.gnu.org/gluster.git
signal received: 11
time of crash: 2009-07-23 21:33:00
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2.1.0git
/lib/libc.so.6[0x7f636e25cf60]
/usr/local/lib/libglusterfs.so.0(inode_ref+0xe)[0x7f636e9be72e]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(server_inodelk_resume+0x1b1)[0x7f636cfdc261]
/usr/local/lib/libglusterfs.so.0(call_resume+0x2c0)[0x7f636e9bff50]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(server_inodelk+0x15a)[0x7f636cfdea6a]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(protocol_server_pollin+0x9a)[0x7f636cfd9e0a]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(notify+0x8b)[0x7f636cfd9e9b]
/usr/local/lib/libglusterfs.so.0(xlator_notify+0x43)[0x7f636e9b18e3]
/usr/local/lib/glusterfs/2.1.0git/transport/socket.so(socket_event_handler+0xd0)[0x7f636c11bfa0]
/usr/local/lib/libglusterfs.so.0[0x7f636e9cac77]
glusterfs(main+0x8ad)[0x403ffd]
/lib/libc.so.6(__libc_start_main+0xe6)[0x7f636e2491a6]
glusterfs[0x402859]
---------

Comment 2 Basavanagowda Kanur 2009-07-24 08:44:11 UTC
The problem is in protocol/server and has nothing to do with the increased number of nodes.

Thanks for reporting the bug. A fix will be available soon in the git repository.

Comment 3 Anand Avati 2009-07-27 15:33:59 UTC
PATCH: http://patches.gluster.com/patch/818 in release-2.0 (protocol/server: add checks for updatation of loc->parent in entrylk() or inodelk().)
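
For context, both traces above die inside inode_ref() called from server_inodelk_resume(), which is consistent with an inodelk request arriving with a loc whose inode/parent was never resolved. The snippet below is a minimal, self-contained sketch of that failure mode and of the kind of NULL guard the patch subject describes; the struct definitions are simplified stand-ins for the real libglusterfs types, and the actual change is the one at the URL above.

-- illustrative C sketch (simplified stand-ins, not the actual patch) --

#include <stdio.h>

/* Simplified stand-ins for GlusterFS's inode_t and loc_t. */
typedef struct inode { int ref; } inode_t;
typedef struct loc   { inode_t *inode; inode_t *parent; } loc_t;

/* Stand-in for inode_ref(): it dereferences its argument, so a NULL inode
   from an unresolved loc crashes here -- the inode_ref frame in the traces. */
static inode_t *inode_ref_sketch (inode_t *inode)
{
        inode->ref++;
        return inode;
}

/* Guard in the spirit of the fix: fail the inodelk/entrylk call instead of
   crashing when resolution left loc->inode or loc->parent unset. */
static int inodelk_resume_sketch (loc_t *loc)
{
        if (!loc || !loc->inode || !loc->parent) {
                fprintf (stderr, "inodelk: loc not fully resolved, failing\n");
                return -1;
        }
        inode_ref_sketch (loc->inode);
        /* ... proceed with the lock operation ... */
        return 0;
}

int main (void)
{
        loc_t unresolved = { NULL, NULL };   /* what the server saw */
        return (inodelk_resume_sketch (&unresolved) == -1) ? 0 : 1;
}
---------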