Ok Amar, I have a HUGE problem with 2.0.8. I have a client up and everything is fine UNTIL I try to write a specific file (a backup of a client vol). When I try to write it, both replicate servers that host that file immediately crash.

Here is the client log:

[2009-11-13 19:32:56] N [glusterfsd.c:1306:main] glusterfs: Successfully started
[2009-11-13 19:32:56] N [client-protocol.c:5733:client_setvolume_cbk] sto1-g2: Connected to 10.251.42.47:6996, attached to remote volume 'brick'.
[2009-11-13 19:32:56] N [afr.c:2194:notify] replicate1: Subvolume 'sto1-g2' came back up; going online.
[2009-11-13 19:32:56] N [client-protocol.c:5733:client_setvolume_cbk] sto2-g2: Connected to 10.251.126.52:6996, attached to remote volume 'brick'.
[2009-11-13 19:32:56] N [client-protocol.c:5733:client_setvolume_cbk] sto4-g2: Connected to 10.251.67.145:6996, attached to remote volume 'brick'.
[2009-11-13 19:32:56] N [afr.c:2194:notify] replicate2: Subvolume 'sto4-g2' came back up; going online.
[2009-11-13 19:32:56] N [client-protocol.c:5733:client_setvolume_cbk] sto3-g2: Connected to 10.251.27.208:6996, attached to remote volume 'brick'.
[2009-11-13 19:32:56] N [client-protocol.c:5733:client_setvolume_cbk] sto2-g2: Connected to 10.251.126.52:6996, attached to remote volume 'brick'.
[2009-11-13 19:32:56] N [client-protocol.c:5733:client_setvolume_cbk] sto3-g2: Connected to 10.251.27.208:6996, attached to remote volume 'brick'.
[2009-11-13 19:32:56] N [client-protocol.c:5733:client_setvolume_cbk] sto4-g2: Connected to 10.251.67.145:6996, attached to remote volume 'brick'.
[2009-11-13 19:32:56] N [client-protocol.c:5733:client_setvolume_cbk] sto1-g2: Connected to 10.251.42.47:6996, attached to remote volume 'brick'.
[2009-11-13 19:33:26] E [saved-frames.c:165:saved_frames_unwind] sto1-g2: forced unwinding frame type(1) op(WRITE)
[2009-11-13 19:33:26] N [client-protocol.c:6438:notify] sto1-g2: disconnected
[2009-11-13 19:33:26] E [socket.c:745:socket_connect_finish] sto1-g2: connection to 10.251.42.47:6996 failed (Connection refused)
[2009-11-13 19:33:26] E [saved-frames.c:165:saved_frames_unwind] sto2-g2: forced unwinding frame type(1) op(WRITE)
[2009-11-13 19:33:26] W [fuse-bridge.c:1534:fuse_writev_cbk] glusterfs-fuse: 554: WRITE => -1 (Transport endpoint is not connected)
[2009-11-13 19:33:26] N [client-protocol.c:6438:notify] sto2-g2: disconnected
[2009-11-13 19:33:26] E [afr.c:2218:notify] replicate1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2009-11-13 19:33:26] W [fuse-bridge.c:882:fuse_err_cbk] glusterfs-fuse: 555: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2009-11-13 19:33:26] E [socket.c:745:socket_connect_finish] sto2-g2: connection to 10.251.126.52:6996 failed (Connection refused)
[2009-11-13 19:33:26] E [socket.c:745:socket_connect_finish] sto2-g2: connection to 10.251.126.52:6996 failed (Connection refused)
[2009-11-13 19:33:29] E [socket.c:745:socket_connect_finish] sto1-g2: connection to 10.251.42.47:6996 failed (Connection refused)

Here is the output from both servers, captured with the very handy trace command you gave me:

[2009-11-13 19:33:26] T [server-protocol.c:3419:server_lookup_resume] brick: 401: LOOKUP '33580634/glusterfs-client.vol'
[2009-11-13 19:33:26] T [server-protocol.c:3821:server_open_resume] brick: 402: OPEN '/conf/glusterfs/glusterfs-client.vol (33580655)'
[2009-11-13 19:33:26] T [server-protocol.c:5141:server_readdir] brick: 66531047: READDIR 'fd=0 (184549504); offset=-9007199254740991998; size=131072
[2009-11-13 19:33:26] T [server-protocol.c:3920:server_readv] brick: 144: READV 'fd=0 (33580655); offset=0; size=65536
[2009-11-13 19:33:26] T [server-protocol.c:5141:server_readdir] brick: 66531048: READDIR 'fd=0 (184549504); offset=-9007199254740991998; size=131072
[2009-11-13 19:33:26] T [server-protocol.c:4634:server_xattrop_resume] brick: 145: XATTROP '/conf/glusterfs/glusterfs-client.vol (33580655)'
[2009-11-13 19:33:26] T [server-protocol.c:5141:server_readdir] brick: 66531049: READDIR 'fd=0 (184549504); offset=-9007199254740991998; size=131072
[2009-11-13 19:33:26] T [server-protocol.c:5639:server_inodelk_resume] brick: 146: INODELK '/conf/glusterfs/glusterfs-client.vol (33580655)'
[2009-11-13 19:33:26] T [common.c:563:pl_setlk] locks: Unlock (pid=11646) 0 - 0 => OK
[2009-11-13 19:33:26] T [server-protocol.c:4123:server_flush] brick: 147: FLUSH 'fd=0 (33580655)'
[2009-11-13 19:33:26] T [server-protocol.c:5564:server_utimens_resume] brick: 403: UTIMENS '/conf/glusterfs/glusterfs-client.vol (33580655)'
[2009-11-13 19:33:26] T [server-protocol.c:5141:server_readdir] brick: 66531050: READDIR 'fd=0 (184549504); offset=-9007199254740991998; size=131072
[2009-11-13 19:33:26] T [server-protocol.c:4036:server_release] brick: 148: RELEASE 'fd=0'
[2009-11-13 19:33:26] T [server-protocol.c:3821:server_open_resume] brick: 404: OPEN '/conf/glusterfs/glusterfs-client.vol (33580655)'
[2009-11-13 19:33:26] T [server-protocol.c:5141:server_readdir] brick: 66531051: READDIR 'fd=0 (184549504); offset=-9007199254740991998; size=131072
[2009-11-13 19:33:26] T [server-protocol.c:5141:server_readdir] brick: 66531052: READDIR 'fd=0 (184549504); offset=-9007199254740991998; size=131072
[2009-11-13 19:33:26] T [server-protocol.c:5783:server_finodelk] brick: 149: FINODELK 'fd=0 (33580655)'
[2009-11-13 19:33:26] T [common.c:563:pl_setlk] locks: Lock (pid=134882368) 0 - 1505 => OK
[2009-11-13 19:33:26] T [server-protocol.c:4605:server_fxattrop] brick: 150: FXATTROP 'fd=0 (33580655)'
pending frames:
patchset: v2.0.8
signal received: 6
time of crash: 2009-11-13 19:33:26
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2.0.8
[0xbfffe420]
/lib/tls/i686/nosegneg/libc.so.6(abort+0x101)[0xb7e2cd71]
/lib/tls/i686/nosegneg/libc.so.6[0xb7e644fc]
/lib/tls/i686/nosegneg/libc.so.6(__fortify_fail+0x48)[0xb7eef758]
/lib/tls/i686/nosegneg/libc.so.6(__fortify_fail+0x0)[0xb7eef710]
/lib/glusterfs/2.0.8/xlator/storage/posix.so[0xb75b86f4]
/lib/glusterfs/2.0.8/xlator/storage/posix.so[0xb75acbb5]
/lib/glusterfs/2.0.8/xlator/storage/posix.so(posix_fxattrop+0x41)[0xb75acc01]
/lib/libglusterfs.so.0(default_fxattrop+0xb4)[0xb7f843d4]
/lib/libglusterfs.so.0(default_fxattrop+0xb4)[0xb7f843d4]
/lib/glusterfs/2.0.8/xlator/protocol/server.so(server_fxattrop+0x16c)[0xb757e88c]
/lib/glusterfs/2.0.8/xlator/protocol/server.so(protocol_server_interpret+0xc5)[0xb75755a5]
/lib/glusterfs/2.0.8/xlator/protocol/server.so(protocol_server_pollin+0x97)[0xb7575847]
/lib/glusterfs/2.0.8/xlator/protocol/server.so(notify+0x7f)[0xb75758cf]
/lib/glusterfs/2.0.8/transport/socket.so(socket_event_poll_in+0x3b)[0xb656298b]
/lib/glusterfs/2.0.8/transport/socket.so(socket_event_handler+0xa3)[0xb6562da3]
/lib/libglusterfs.so.0[0xb7f97689]
/lib/libglusterfs.so.0(event_dispatch+0x21)[0xb7f96551]
/sbin/glusterfs(main+0xcd2)[0x804bbf2]
/lib/tls/i686/nosegneg/libc.so.6(__libc_start_main+0xe0)[0xb7e16450]
/sbin/glusterfs[0x8049de1]
PATCH: http://patches.gluster.com/patch/2232 in master (fixing a crash in posix (on 32bit))
I had this same problem on 3.0.0pre1 - only in 32bit userspace. 64bit userspace was fine. Applying this patch fixed it for me.
PATCH: http://patches.gluster.com/patch/2231 in release-2.0 (segfault fix in posix)