Bug 762115 (GLUSTER-383)

Summary: glusterfs server crash on 2.0.8
Product: [Community] GlusterFS
Reporter: Amar Tumballi <amarts>
Component: posix
Assignee: Amar Tumballi <amarts>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: low
Version: mainline
CC: gluster-bugs, john, vraman
Hardware: All
OS: Linux
Doc Type: Bug Fix

Description Amar Tumballi 2009-11-16 02:56:20 UTC
Ok Amar, I have a HUGE problem with 2.0.8

I have a client up and everything is fine UNTIL I try to write a specific file (a backup of a client vol). The moment I write it, both replicate servers that host that file crash immediately.
Here is the client log:

[2009-11-13 19:32:56] N [glusterfsd.c:1306:main] glusterfs: Successfully started
[2009-11-13 19:32:56] N [client-protocol.c:5733:client_setvolume_cbk] sto1-g2: Connected to 10.251.42.47:6996, attached to remote volume 'brick'.
[2009-11-13 19:32:56] N [afr.c:2194:notify] replicate1: Subvolume 'sto1-g2' came back up; going online.
[2009-11-13 19:32:56] N [client-protocol.c:5733:client_setvolume_cbk] sto2-g2: Connected to 10.251.126.52:6996, attached to remote volume 'brick'.
[2009-11-13 19:32:56] N [client-protocol.c:5733:client_setvolume_cbk] sto4-g2: Connected to 10.251.67.145:6996, attached to remote volume 'brick'.
[2009-11-13 19:32:56] N [afr.c:2194:notify] replicate2: Subvolume 'sto4-g2' came back up; going online.
[2009-11-13 19:32:56] N [client-protocol.c:5733:client_setvolume_cbk] sto3-g2: Connected to 10.251.27.208:6996, attached to remote volume 'brick'.
[2009-11-13 19:32:56] N [client-protocol.c:5733:client_setvolume_cbk] sto2-g2: Connected to 10.251.126.52:6996, attached to remote volume 'brick'.
[2009-11-13 19:32:56] N [client-protocol.c:5733:client_setvolume_cbk] sto3-g2: Connected to 10.251.27.208:6996, attached to remote volume 'brick'.
[2009-11-13 19:32:56] N [client-protocol.c:5733:client_setvolume_cbk] sto4-g2: Connected to 10.251.67.145:6996, attached to remote volume 'brick'.
[2009-11-13 19:32:56] N [client-protocol.c:5733:client_setvolume_cbk] sto1-g2: Connected to 10.251.42.47:6996, attached to remote volume 'brick'.
[2009-11-13 19:33:26] E [saved-frames.c:165:saved_frames_unwind] sto1-g2: forced unwinding frame type(1) op(WRITE)
[2009-11-13 19:33:26] N [client-protocol.c:6438:notify] sto1-g2: disconnected
[2009-11-13 19:33:26] E [socket.c:745:socket_connect_finish] sto1-g2: connection to 10.251.42.47:6996 failed (Connection refused)
[2009-11-13 19:33:26] E [saved-frames.c:165:saved_frames_unwind] sto2-g2: forced unwinding frame type(1) op(WRITE)
[2009-11-13 19:33:26] W [fuse-bridge.c:1534:fuse_writev_cbk] glusterfs-fuse: 554: WRITE => -1 (Transport endpoint is not connected)
[2009-11-13 19:33:26] N [client-protocol.c:6438:notify] sto2-g2: disconnected
[2009-11-13 19:33:26] E [afr.c:2218:notify] replicate1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2009-11-13 19:33:26] W [fuse-bridge.c:882:fuse_err_cbk] glusterfs-fuse: 555: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2009-11-13 19:33:26] E [socket.c:745:socket_connect_finish] sto2-g2: connection to 10.251.126.52:6996 failed (Connection refused)
[2009-11-13 19:33:26] E [socket.c:745:socket_connect_finish] sto2-g2: connection to 10.251.126.52:6996 failed (Connection refused)
[2009-11-13 19:33:29] E [socket.c:745:socket_connect_finish] sto1-g2: connection to 10.251.42.47:6996 failed (Connection refused)

Here is both servers' output from the very handy trace command you gave me:

[2009-11-13 19:33:26] T [server-protocol.c:3419:server_lookup_resume] brick: 401: LOOKUP '33580634/glusterfs-client.vol'
[2009-11-13 19:33:26] T [server-protocol.c:3821:server_open_resume] brick: 402: OPEN '/conf/glusterfs/glusterfs-client.vol (33580655)'
[2009-11-13 19:33:26] T [server-protocol.c:5141:server_readdir] brick: 66531047: READDIR 'fd=0 (184549504); offset=-9007199254740991998; size=131072
[2009-11-13 19:33:26] T [server-protocol.c:3920:server_readv] brick: 144: READV 'fd=0 (33580655); offset=0; size=65536
[2009-11-13 19:33:26] T [server-protocol.c:5141:server_readdir] brick: 66531048: READDIR 'fd=0 (184549504); offset=-9007199254740991998; size=131072
[2009-11-13 19:33:26] T [server-protocol.c:4634:server_xattrop_resume] brick: 145: XATTROP '/conf/glusterfs/glusterfs-client.vol (33580655)'
[2009-11-13 19:33:26] T [server-protocol.c:5141:server_readdir] brick: 66531049: READDIR 'fd=0 (184549504); offset=-9007199254740991998; size=131072
[2009-11-13 19:33:26] T [server-protocol.c:5639:server_inodelk_resume] brick: 146: INODELK '/conf/glusterfs/glusterfs-client.vol (33580655)'
[2009-11-13 19:33:26] T [common.c:563:pl_setlk] locks: Unlock (pid=11646) 0 - 0 => OK
[2009-11-13 19:33:26] T [server-protocol.c:4123:server_flush] brick: 147: FLUSH 'fd=0 (33580655)'
[2009-11-13 19:33:26] T [server-protocol.c:5564:server_utimens_resume] brick: 403: UTIMENS '/conf/glusterfs/glusterfs-client.vol (33580655)'
[2009-11-13 19:33:26] T [server-protocol.c:5141:server_readdir] brick: 66531050: READDIR 'fd=0 (184549504); offset=-9007199254740991998; size=131072
[2009-11-13 19:33:26] T [server-protocol.c:4036:server_release] brick: 148: RELEASE 'fd=0'
[2009-11-13 19:33:26] T [server-protocol.c:3821:server_open_resume] brick: 404: OPEN '/conf/glusterfs/glusterfs-client.vol (33580655)'
[2009-11-13 19:33:26] T [server-protocol.c:5141:server_readdir] brick: 66531051: READDIR 'fd=0 (184549504); offset=-9007199254740991998; size=131072
[2009-11-13 19:33:26] T [server-protocol.c:5141:server_readdir] brick: 66531052: READDIR 'fd=0 (184549504); offset=-9007199254740991998; size=131072
[2009-11-13 19:33:26] T [server-protocol.c:5783:server_finodelk] brick: 149: FINODELK 'fd=0 (33580655)'
[2009-11-13 19:33:26] T [common.c:563:pl_setlk] locks: Lock (pid=134882368) 0 - 1505 => OK
[2009-11-13 19:33:26] T [server-protocol.c:4605:server_fxattrop] brick: 150: FXATTROP 'fd=0 (33580655)'
pending frames:

patchset: v2.0.8
signal received: 6
time of crash: 2009-11-13 19:33:26
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2.0.8
[0xbfffe420]
/lib/tls/i686/nosegneg/libc.so.6(abort+0x101)[0xb7e2cd71]
/lib/tls/i686/nosegneg/libc.so.6[0xb7e644fc]
/lib/tls/i686/nosegneg/libc.so.6(__fortify_fail+0x48)[0xb7eef758]
/lib/tls/i686/nosegneg/libc.so.6(__fortify_fail+0x0)[0xb7eef710]
/lib/glusterfs/2.0.8/xlator/storage/posix.so[0xb75b86f4]
/lib/glusterfs/2.0.8/xlator/storage/posix.so[0xb75acbb5]
/lib/glusterfs/2.0.8/xlator/storage/posix.so(posix_fxattrop+0x41)[0xb75acc01]
/lib/libglusterfs.so.0(default_fxattrop+0xb4)[0xb7f843d4]
/lib/libglusterfs.so.0(default_fxattrop+0xb4)[0xb7f843d4]
/lib/glusterfs/2.0.8/xlator/protocol/server.so(server_fxattrop+0x16c)[0xb757e88c]
/lib/glusterfs/2.0.8/xlator/protocol/server.so(protocol_server_interpret+0xc5)[0xb75755a5]
/lib/glusterfs/2.0.8/xlator/protocol/server.so(protocol_server_pollin+0x97)[0xb7575847]
/lib/glusterfs/2.0.8/xlator/protocol/server.so(notify+0x7f)[0xb75758cf]
/lib/glusterfs/2.0.8/transport/socket.so(socket_event_poll_in+0x3b)[0xb656298b]
/lib/glusterfs/2.0.8/transport/socket.so(socket_event_handler+0xa3)[0xb6562da3]
/lib/libglusterfs.so.0[0xb7f97689]
/lib/libglusterfs.so.0(event_dispatch+0x21)[0xb7f96551]
/sbin/glusterfs(main+0xcd2)[0x804bbf2]
/lib/tls/i686/nosegneg/libc.so.6(__libc_start_main+0xe0)[0xb7e16450]
/sbin/glusterfs[0x8049de1]

Comment 1 Anand Avati 2009-11-16 05:41:31 UTC
PATCH: http://patches.gluster.com/patch/2232 in master (fixing a crash in posix (on 32bit))

Comment 2 John Leach 2009-11-17 17:37:31 UTC
I had this same problem on 3.0.0pre1, but only with a 32-bit userspace; 64-bit userspace was fine.

Applying this patch fixed it for me.

Comment 3 Anand Avati 2009-11-19 07:45:39 UTC
PATCH: http://patches.gluster.com/patch/2231 in release-2.0 (segfault fix in posix)