| Summary: | Replication segfaults with many nodes |
|---|---|
| Product: | [Community] GlusterFS |
| Component: | protocol |
| Version: | mainline |
| Hardware: | All |
| OS: | Linux |
| Status: | CLOSED CURRENTRELEASE |
| Severity: | high |
| Priority: | low |
| Reporter: | Ville Tuulos <tuulos> |
| Assignee: | Vijay Bellur <vbellur> |
| CC: | gluster-bugs, gowda, tuulos, vijay |
| Doc Type: | Bug Fix |
| Regression: | RTP |
Description (reported by Ville Tuulos):

My distribute+replicate volfile (attached) seems to work correctly with a small number of nodes (a volfile with 4 nodes and 8 volumes). However, when I increase the number of nodes to more than 6, I can still mount glusterfs, but as soon as I chdir into the glusterfs mountpoint, the glusterfs process segfaults on the node where I access the directory and also on the nodes in the same replication group. I get two kinds of stack traces.

Stack trace 1 (no errors before the trace):

```
pending frames:
frame : type(1) op(INODELK)
>> message repeats many times
frame : type(1) op(STAT)
>> message repeats many times
patchset: git://git.sv.gnu.org/gluster.git
signal received: 11
time of crash: 2009-07-23 19:44:46
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2.1.0git
/lib/libc.so.6[0x7f45de6d6f60]
/usr/local/lib/libglusterfs.so.0(inode_ref+0xe)[0x7f45dee3872e]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(server_inodelk_resume+0x1b1)[0x7f45dd456261]
/usr/local/lib/libglusterfs.so.0(call_resume+0x2c0)[0x7f45dee39f50]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(server_inodelk+0x15a)[0x7f45dd458a6a]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(protocol_server_pollin+0x9a)[0x7f45dd453e0a]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(notify+0x8b)[0x7f45dd453e9b]
/usr/local/lib/libglusterfs.so.0(transport_peerproc+0x8a)[0x7f45dee35f9a]
/lib/libpthread.so.0[0x7f45de9fefc7]
/lib/libc.so.6(clone+0x6d)[0x7f45de7745ad]
```

Stack trace 2:

```
[2009-07-23 21:33:00] E [afr.c:2246:notify] repl1-vol2: All subvolumes are down. Going offline until atleast one of them comes back up.
pending frames:
frame : type(1) op(INODELK)
>> message repeats many times
patchset: git://git.sv.gnu.org/gluster.git
signal received: 11
time of crash: 2009-07-23 21:33:00
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2.1.0git
/lib/libc.so.6[0x7f636e25cf60]
/usr/local/lib/libglusterfs.so.0(inode_ref+0xe)[0x7f636e9be72e]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(server_inodelk_resume+0x1b1)[0x7f636cfdc261]
/usr/local/lib/libglusterfs.so.0(call_resume+0x2c0)[0x7f636e9bff50]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(server_inodelk+0x15a)[0x7f636cfdea6a]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(protocol_server_pollin+0x9a)[0x7f636cfd9e0a]
/usr/local/lib/glusterfs/2.1.0git/xlator/protocol/server.so(notify+0x8b)[0x7f636cfd9e9b]
/usr/local/lib/libglusterfs.so.0(xlator_notify+0x43)[0x7f636e9b18e3]
/usr/local/lib/glusterfs/2.1.0git/transport/socket.so(socket_event_handler+0xd0)[0x7f636c11bfa0]
/usr/local/lib/libglusterfs.so.0[0x7f636e9cac77]
glusterfs(main+0x8ad)[0x403ffd]
/lib/libc.so.6(__libc_start_main+0xe6)[0x7f636e2491a6]
glusterfs[0x402859]
```

Developer response:

The problem is in protocol/server and has nothing to do with the increased number of nodes. Thanks for reporting the bug; the fix will be available soon in the git repository.

PATCH: http://patches.gluster.com/patch/818 in release-2.0 (protocol/server: add checks for update of loc->parent in entrylk() or inodelk().)
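The patch itself is not quoted in this report, only its title. Purely as an illustration of the class of defect that title and the `inode_ref+0xe` frame in both traces point at (a lock request resumed against a `loc` whose parent inode was never populated, so taking a reference dereferences NULL), here is a minimal, self-contained C sketch. The types and function names below are simplified stand-ins that only mirror GlusterFS naming; they are not the project's real definitions, and the real fix is the patch linked above.

```c
/* Simplified stand-ins for the GlusterFS structures involved; the real
 * definitions live in libglusterfs and protocol/server.  This sketch only
 * illustrates the missing-NULL-check pattern named in the patch title. */
#include <errno.h>
#include <stdio.h>

typedef struct inode {
    int refcount;
} inode_t;

typedef struct loc {
    inode_t *inode;   /* inode being locked                              */
    inode_t *parent;  /* parent directory; may be NULL if resolution did */
                      /* not populate it                                 */
} loc_t;

/* Dereferences unconditionally -- this is where a SIGSEGV like the
 * reported inode_ref+0xe frame would occur if 'inode' is NULL. */
static inode_t *inode_ref(inode_t *inode)
{
    inode->refcount++;
    return inode;
}

/* Buggy shape: assumes loc->parent was always filled in. */
static int inodelk_resume_unchecked(loc_t *loc)
{
    inode_ref(loc->parent);          /* crashes when parent == NULL */
    return 0;
}

/* Shape of the check the commit title describes: verify the location was
 * fully resolved before taking references, and fail the request cleanly
 * otherwise instead of crashing the server process. */
static int inodelk_resume_checked(loc_t *loc)
{
    if (loc == NULL || loc->inode == NULL || loc->parent == NULL) {
        fprintf(stderr, "inodelk: loc not fully resolved, failing with ENOENT\n");
        return -ENOENT;
    }
    inode_ref(loc->parent);
    inode_ref(loc->inode);
    return 0;
}

int main(void)
{
    /* A location whose parent was never resolved, as in the crash. */
    loc_t loc = { .inode = &(inode_t){0}, .parent = NULL };

    int ret = inodelk_resume_checked(&loc);
    printf("checked resume returned %d\n", ret);

    /* inodelk_resume_unchecked(&loc) would dereference NULL here. */
    return 0;
}
```

Compiled on its own, the checked variant rejects the unresolved location with -ENOENT instead of segfaulting, which is, in spirit, the kind of defensive check the commit title describes for the server translator.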
Created attachment 42: Proposed patch for sigprocmask(2) defect
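The reporter's volfile attachment is not reproduced in this report. For readers unfamiliar with the setup being described, the sketch below shows the general shape of a GlusterFS 2.x distribute-over-replicate client volfile; it is not the actual attachment, and every host name, volume name, and subvolume name in it is hypothetical.

```
# Hypothetical client volfile: two replica pairs (4 bricks) aggregated
# by distribute.  The reporter's actual volfile used more nodes.
volume brick1
  type protocol/client
  option transport-type tcp
  option remote-host node1.example.com     # hypothetical host
  option remote-subvolume posix-brick      # hypothetical server-side volume
end-volume

volume brick2
  type protocol/client
  option transport-type tcp
  option remote-host node2.example.com
  option remote-subvolume posix-brick
end-volume

volume brick3
  type protocol/client
  option transport-type tcp
  option remote-host node3.example.com
  option remote-subvolume posix-brick
end-volume

volume brick4
  type protocol/client
  option transport-type tcp
  option remote-host node4.example.com
  option remote-subvolume posix-brick
end-volume

# Each replicate set mirrors two bricks on different nodes.
volume repl1
  type cluster/replicate
  subvolumes brick1 brick2
end-volume

volume repl2
  type cluster/replicate
  subvolumes brick3 brick4
end-volume

# Distribute spreads files across the replica sets.
volume dist
  type cluster/distribute
  subvolumes repl1 repl2
end-volume
```

Scaling the cluster up, as the reporter did, just means adding more protocol/client and cluster/replicate sections; per the developer response above, the crash turned out to be in protocol/server rather than in this client-side graph.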