Hi again,

With some more experimentation, I found a segfault with a chaining setup, where I have a simple 3-machine cluster like: server -> chain -> client

-------------------------------------------
server.vol:

volume posix
  type storage/posix
  option directory /tmp/gluster
end-volume

volume posix-locks
  type features/locks
  subvolumes posix
end-volume

volume server
  type protocol/server
  subvolumes posix-locks
  option transport-type tcp
  option auth.addr.posix-locks.allow *
end-volume
-------------------------------------------
chain.vol:

volume chain
  type protocol/client
  option transport-type tcp
  option remote-host server
  option remote-subvolume posix-locks
end-volume

volume server
  type protocol/server
  subvolumes chain
  option transport-type tcp
  option auth.addr.chain.allow *
end-volume
-------------------------------------------
client.vol:

volume client
  type protocol/client
  option transport-type tcp
  option remote-host chain
  option remote-subvolume chain
end-volume
-------------------------------------------

The server and chain start up fine, but when the client tries to connect to the chain, the chain segfaults with this:

#0  0x00007ffff69fae2d in client_lookup (frame=0x617ee0, this=0x609c80, loc=0x617b60, xattr_req=0x615dc0) at client-protocol.c:3141
#1  0x00007ffff67deeec in server_lookup_resume (frame=<value optimized out>, bound_xl=0x609c80) at server-protocol.c:2353
#2  0x00007ffff67defb2 in server_resolve_done (frame=0x0) at server-resolve.c:515
#3  0x00007ffff67df735 in server_resolve_all (frame=0x0) at server-resolve.c:550
#4  0x00007ffff67df62a in server_resolve (frame=0x0) at server-resolve.c:499
#5  0x00007ffff67df726 in server_resolve_all (frame=0x0) at server-resolve.c:546
#6  0x00007ffff67df5a5 in server_resolve_inode (frame=0x614a40) at server-resolve.c:436
#7  0x00007ffff67df65d in server_resolve (frame=0x0) at server-resolve.c:492
#8  0x00007ffff67df703 in server_resolve_all (frame=0x0) at server-resolve.c:539
#9  0x00007ffff67df843 in resolve_and_resume (frame=0x614a40, fn=0x609c80) at server-resolve.c:574
#10 0x00007ffff67decf1 in server_lookup (frame=0x614a40, bound_xl=0x609c80, hdr=0x614b90, hdrlen=<value optimized out>, iobuf=<value optimized out>) at server-protocol.c:2425
#11 0x00007ffff67cf8ba in protocol_server_pollin (this=0x60ea00, trans=0x614650) at server-protocol.c:6361
#12 0x00007ffff67cf943 in notify (this=0x60ea00, event=<value optimized out>, data=0x7fffffffd6a8) at server-protocol.c:6417
#13 0x00007ffff7baf263 in xlator_notify (xl=0x60ea00, event=2, data=0x614650) at xlator.c:923
#14 0x00007ffff5bc2108 in socket_event_handler (fd=<value optimized out>, idx=4, data=0x614650, poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:829
#15 0x00007ffff7bc91cd in event_dispatch_epoll_handler (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:804
#16 event_dispatch_epoll (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:867
#17 0x000000000040409a in main (argc=<value optimized out>, argv=<value optimized out>) at glusterfsd.c:1379
PATCH: http://patches.gluster.com/patch/2371 in master (add an extra check for loc->parent when loc->ino is not 1)
Hi again!

I just tested the latest head of 3.0, and I've run into another crash. It's using the same specs as before, but now the chain is crashing in a new location when the client tries to connect to it:

#0  0x00007ffff69f81cd in client_lookup (frame=0x615cf0, this=0x609c80, loc=0x615970, xattr_req=0x613d50) at client-protocol.c:3274
#1  0x00007ffff67dceec in server_lookup_resume (frame=<value optimized out>, bound_xl=0x609c80) at server-protocol.c:2353
#2  0x00007ffff67dcfb2 in server_resolve_done (frame=0x0) at server-resolve.c:527
#3  0x00007ffff67dd815 in server_resolve_all (frame=0x0) at server-resolve.c:562
#4  0x00007ffff67dd70a in server_resolve (frame=0x0) at server-resolve.c:511
#5  0x00007ffff67dd806 in server_resolve_all (frame=0x0) at server-resolve.c:558
#6  0x00007ffff67dd685 in server_resolve_inode (frame=0x613f90) at server-resolve.c:448
#7  0x00007ffff67dd73d in server_resolve (frame=0x0) at server-resolve.c:504
#8  0x00007ffff67dd7e3 in server_resolve_all (frame=0x0) at server-resolve.c:551
#9  0x00007ffff67dd923 in resolve_and_resume (frame=0x613f90, fn=0x609c80) at server-resolve.c:586
#10 0x00007ffff67dccf1 in server_lookup (frame=0x613f90, bound_xl=0x609c80, hdr=0x6141d0, hdrlen=<value optimized out>, iobuf=<value optimized out>) at server-protocol.c:2425
#11 0x00007ffff67cd8ba in protocol_server_pollin (this=0x60ea00, trans=0x6142d0) at server-protocol.c:6361
#12 0x00007ffff67cd943 in notify (this=0x60ea00, event=<value optimized out>, data=0x7fffffffd6a8) at server-protocol.c:6417
#13 0x00007ffff7baf263 in xlator_notify (xl=0x60ea00, event=2, data=0x6142d0) at xlator.c:923
#14 0x00007ffff5bc0108 in socket_event_handler (fd=<value optimized out>, idx=3, data=0x6142d0, poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:829
#15 0x00007ffff7bc91ed in event_dispatch_epoll_handler (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:804
#16 event_dispatch_epoll (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:867
#17 0x0000000000404096 in main (argc=<value optimized out>, argv=<value optimized out>) at glusterfsd.c:1377

server log:

[2009-11-30 10:31:50] D [transport.c:145:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.1.0git/transport/socket.so
[2009-11-30 10:31:50] N [glusterfsd.c:1360:main] glusterfs: Successfully started
[2009-11-30 10:32:31] D [addr.c:190:gf_auth] posix-locks: allowed = "*", received addr = "..."
[2009-11-30 10:32:31] N [server-protocol.c:5522:mop_setvolume] server: accepted client from ...:1023
[2009-11-30 10:32:31] D [addr.c:190:gf_auth] posix-locks: allowed = "*", received addr = "..."
[2009-11-30 10:32:31] N [server-protocol.c:5522:mop_setvolume] server: accepted client from ...:1022

chain log:

[2009-11-30 10:32:31] D [transport.c:145:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.1.0git/transport/socket.so
[2009-11-30 10:32:31] D [client-protocol.c:6558:init] chain: defaulting frame-timeout to 30mins
[2009-11-30 10:32:31] D [client-protocol.c:6569:init] chain: defaulting ping-timeout to 42
[2009-11-30 10:32:31] D [transport.c:145:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.1.0git/transport/socket.so
[2009-11-30 10:32:31] D [transport.c:145:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.1.0git/transport/socket.so
[2009-11-30 10:32:31] D [client-protocol.c:6982:notify] chain: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-11-30 10:32:31] D [client-protocol.c:6982:notify] chain: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-11-30 10:32:31] D [client-protocol.c:6982:notify] chain: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-11-30 10:32:31] D [client-protocol.c:6982:notify] chain: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-11-30 10:32:31] N [glusterfsd.c:1360:main] glusterfs: Successfully started
[2009-11-30 10:32:31] D [client-protocol.c:6996:notify] chain: got GF_EVENT_CHILD_UP
[2009-11-30 10:32:31] D [client-protocol.c:6996:notify] chain: got GF_EVENT_CHILD_UP
[2009-11-30 10:32:31] N [client-protocol.c:6202:client_setvolume_cbk] chain: Connected to ...:6996, attached to remote volume 'posix-locks'.
[2009-11-30 10:32:31] N [client-protocol.c:6202:client_setvolume_cbk] chain: Connected to ...:6996, attached to remote volume 'posix-locks'.
[2009-11-30 10:32:39] D [addr.c:190:gf_auth] chain: allowed = "*", received addr = "..."
[2009-11-30 10:32:39] N [server-protocol.c:5522:mop_setvolume] server: accepted client from ...:1023
[2009-11-30 10:32:39] D [addr.c:190:gf_auth] chain: allowed = "*", received addr = "..."
[2009-11-30 10:32:39] N [server-protocol.c:5522:mop_setvolume] server: accepted client from ...:1022
[2009-11-30 10:32:39] D [server-resolve.c:584:resolve_and_resume] chain: RESOLVE LOOKUP() on / (null)

client log:

[2009-11-30 10:32:39] D [glusterfsd.c:1334:main] glusterfs: running in pid 14695
[2009-11-30 10:32:39] D [client-protocol.c:6558:init] client: defaulting frame-timeout to 30mins
[2009-11-30 10:32:39] D [client-protocol.c:6569:init] client: defaulting ping-timeout to 42
[2009-11-30 10:32:39] D [transport.c:145:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.1.0git/transport/socket.so
[2009-11-30 10:32:39] D [transport.c:145:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.1.0git/transport/socket.so
[2009-11-30 10:32:39] D [client-protocol.c:6982:notify] client: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-11-30 10:32:39] D [client-protocol.c:6982:notify] client: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-11-30 10:32:39] D [client-protocol.c:6982:notify] client: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-11-30 10:32:39] D [client-protocol.c:6982:notify] client: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-11-30 10:32:39] N [glusterfsd.c:1360:main] glusterfs: Successfully started
[2009-11-30 10:32:39] D [client-protocol.c:6996:notify] client: got GF_EVENT_CHILD_UP
[2009-11-30 10:32:39] D [client-protocol.c:6996:notify] client: got GF_EVENT_CHILD_UP
[2009-11-30 10:32:39] N [client-protocol.c:6202:client_setvolume_cbk] client: Connected to ...:6996, attached to remote volume 'chain'.
[2009-11-30 10:32:39] D [fuse-bridge.c:3080:fuse_thread_proc] fuse: pthread_cond_timedout returned non zero value ret: 0 errno: 0
[2009-11-30 10:32:39] N [fuse-bridge.c:2932:fuse_init] glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.11
[2009-11-30 10:32:39] N [client-protocol.c:6202:client_setvolume_cbk] client: Connected to ...:6996, attached to remote volume 'chain'.
[2009-11-30 10:33:22] E [client-protocol.c:415:client_ping_timer_expired] client: Server ...:6996 has not responded in the last 42 seconds, disconnecting.
[2009-11-30 10:33:22] E [saved-frames.c:165:saved_frames_unwind] client: forced unwinding frame type(1) op(LOOKUP)
[2009-11-30 10:33:22] D [fuse-bridge.c:2977:fuse_first_lookup_cbk] fuse: first lookup on root failed.
[2009-11-30 10:33:22] E [saved-frames.c:165:saved_frames_unwind] client: forced unwinding frame type(2) op(PING)
[2009-11-30 10:33:22] D [client-protocol.c:516:client_ping_cbk] client: timer must have expired
[2009-11-30 10:33:22] D [client-protocol.c:1594:client_statfs] client: STATFS 1 (/): failed to get remote inode number
[2009-11-30 10:33:22] D [socket.c:1316:socket_submit] client: not connected (priv->connected = 255)
[2009-11-30 10:33:22] W [fuse-bridge.c:2357:fuse_statfs_cbk] glusterfs-fuse: 2: ERR => -1 (Transport endpoint is not connected)
[2009-11-30 10:33:22] N [client-protocol.c:6949:notify] client: disconnected
[2009-11-30 10:33:22] D [client-protocol.c:6996:notify] client: got GF_EVENT_CHILD_UP
[2009-11-30 10:33:22] D [client-protocol.c:6996:notify] client: got GF_EVENT_CHILD_UP
[2009-11-30 10:33:22] D [client-protocol.c:1594:client_statfs] client: STATFS 1 (/): failed to get remote inode number
[2009-11-30 10:33:22] W [fuse-bridge.c:2357:fuse_statfs_cbk] glusterfs-fuse: 3: ERR => -1 (Transport endpoint is not connected)
[2009-11-30 10:38:12] D [client-protocol.c:1594:client_statfs] client: STATFS 1 (/): failed to get remote inode number
[2009-11-30 10:38:12] W [fuse-bridge.c:2357:fuse_statfs_cbk] glusterfs-fuse: 4: ERR => -1 (Transport endpoint is not connected)
[2009-11-30 10:38:12] D [client-protocol.c:1594:client_statfs] client: STATFS 1 (/): failed to get remote inode number
[2009-11-30 10:38:12] W [fuse-bridge.c:2357:fuse_statfs_cbk] glusterfs-fuse: 5: ERR => -1 (Transport endpoint is not connected)
[2009-11-30 10:44:57] D [client-protocol.c:1594:client_statfs] client: STATFS 1 (/): failed to get remote inode number
[2009-11-30 10:44:57] W [fuse-bridge.c:2357:fuse_statfs_cbk] glusterfs-fuse: 6: ERR => -1 (Transport endpoint is not connected)
[2009-11-30 10:44:57] D [client-protocol.c:1594:client_statfs] client: STATFS 1 (/): failed to get remote inode number
[2009-11-30 10:44:57] W [fuse-bridge.c:2357:fuse_statfs_cbk] glusterfs-fuse: 7: ERR => -1 (Transport endpoint is not connected)
Hi Erick,

I tried to run this config with the latest git head, and could run it without any issues. I will try running it over a different network and see. But currently, this is 'works for me' status.

Regards,
Amar
You're right. I just checked, and I had missed a compile error and ended up reusing an old release of gluster. It's working now :)