Bug 762109 (GLUSTER-377) - segfault in protocol/client with chaining
Summary: segfault in protocol/client with chaining
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-377
Product: GlusterFS
Classification: Community
Component: protocol
Version: mainline
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Vijay Bellur
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-11-12 23:19 UTC by idadesub
Modified: 2009-12-02 22:04 UTC
CC: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description idadesub 2009-11-12 23:19:13 UTC
Hi again,

With some more experimentation, I found a segfault in a chained configuration, where I have a simple 3-machine cluster like:

server -> chain -> client

-------------------------------------------
server.vol:

volume posix
  type storage/posix
  option directory /tmp/gluster
end-volume

volume posix-locks
  type features/locks
  subvolumes posix
end-volume

volume server
  type protocol/server
  subvolumes posix-locks
  option transport-type tcp
  option auth.addr.posix-locks.allow *
end-volume
-------------------------------------------

chain.vol:

volume chain
  type protocol/client
  option transport-type tcp
  option remote-host server
  option remote-subvolume posix-locks
end-volume

volume server
  type protocol/server
  subvolumes chain
  option transport-type tcp
  option auth.addr.chain.allow *
end-volume
-------------------------------------------

client.vol:

volume client
  type protocol/client
  option transport-type tcp
  option remote-host chain
  option remote-subvolume chain
end-volume
-------------------------------------------

The server and chain start up fine, but when the client tries to connect to the chain, the chain segfaults with this backtrace:


#0  0x00007ffff69fae2d in client_lookup (frame=0x617ee0, this=0x609c80, loc=0x617b60, xattr_req=0x615dc0) at client-protocol.c:3141
#1  0x00007ffff67deeec in server_lookup_resume (frame=<value optimized out>, bound_xl=0x609c80) at server-protocol.c:2353
#2  0x00007ffff67defb2 in server_resolve_done (frame=0x0) at server-resolve.c:515
#3  0x00007ffff67df735 in server_resolve_all (frame=0x0) at server-resolve.c:550
#4  0x00007ffff67df62a in server_resolve (frame=0x0) at server-resolve.c:499
#5  0x00007ffff67df726 in server_resolve_all (frame=0x0) at server-resolve.c:546
#6  0x00007ffff67df5a5 in server_resolve_inode (frame=0x614a40) at server-resolve.c:436
#7  0x00007ffff67df65d in server_resolve (frame=0x0) at server-resolve.c:492
#8  0x00007ffff67df703 in server_resolve_all (frame=0x0) at server-resolve.c:539
#9  0x00007ffff67df843 in resolve_and_resume (frame=0x614a40, fn=0x609c80) at server-resolve.c:574
#10 0x00007ffff67decf1 in server_lookup (frame=0x614a40, bound_xl=0x609c80, hdr=0x614b90, hdrlen=<value optimized out>, iobuf=<value optimized out>) at server-protocol.c:2425
#11 0x00007ffff67cf8ba in protocol_server_pollin (this=0x60ea00, trans=0x614650) at server-protocol.c:6361
#12 0x00007ffff67cf943 in notify (this=0x60ea00, event=<value optimized out>, data=0x7fffffffd6a8) at server-protocol.c:6417
#13 0x00007ffff7baf263 in xlator_notify (xl=0x60ea00, event=2, data=0x614650) at xlator.c:923
#14 0x00007ffff5bc2108 in socket_event_handler (fd=<value optimized out>, idx=4, data=0x614650, poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:829
#15 0x00007ffff7bc91cd in event_dispatch_epoll_handler (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:804
#16 event_dispatch_epoll (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:867
#17 0x000000000040409a in main (argc=<value optimized out>, argv=<value optimized out>) at glusterfsd.c:1379

Comment 1 Anand Avati 2009-11-26 07:49:10 UTC
PATCH: http://patches.gluster.com/patch/2371 in master (add an extra check for loc->parent when loc->ino is not 1)
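For illustration only (the authoritative change is the patch linked above), the guard amounts to this: a lookup on the root inode (ino 1) legitimately has no parent, but for any other inode a NULL loc->parent must make the call fail cleanly instead of being dereferenced. The self-contained C sketch below uses simplified stand-in types; loc_t, inode_t and the function name here are not the real GlusterFS structures or code.

/* Sketch of the NULL-parent guard; types are simplified stand-ins. */
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

typedef struct inode { unsigned long ino; } inode_t;
typedef struct loc   { unsigned long ino; inode_t *parent; const char *path; } loc_t;

/* Returns 0 if the lookup may proceed, -EINVAL if the parent is unresolved. */
static int lookup_parent_guard (const loc_t *loc)
{
        if (loc->ino != 1 && loc->parent == NULL) {
                fprintf (stderr, "lookup on %s: parent not resolved, failing instead of crashing\n",
                         loc->path ? loc->path : "(null)");
                return -EINVAL;
        }
        return 0;
}

int main (void)
{
        loc_t root    = { .ino = 1, .parent = NULL, .path = "/" };     /* root: no parent is fine   */
        loc_t chained = { .ino = 0, .parent = NULL, .path = "/file" }; /* non-root: parent required */

        printf ("root lookup:    %d\n", lookup_parent_guard (&root));    /* prints 0   */
        printf ("chained lookup: %d\n", lookup_parent_guard (&chained)); /* prints -22 */
        return 0;
}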

Comment 2 idadesub 2009-11-30 15:51:54 UTC
Hi again! I just tested the latest head of 3.0, and I've run into another crash. It's using the same volume spec files as before, but now the chain is crashing in a new location when the client tries to connect to it:

#0  0x00007ffff69f81cd in client_lookup (frame=0x615cf0, this=0x609c80, loc=0x615970, xattr_req=0x613d50) at client-protocol.c:3274
#1  0x00007ffff67dceec in server_lookup_resume (frame=<value optimized out>, bound_xl=0x609c80) at server-protocol.c:2353
#2  0x00007ffff67dcfb2 in server_resolve_done (frame=0x0) at server-resolve.c:527
#3  0x00007ffff67dd815 in server_resolve_all (frame=0x0) at server-resolve.c:562
#4  0x00007ffff67dd70a in server_resolve (frame=0x0) at server-resolve.c:511
#5  0x00007ffff67dd806 in server_resolve_all (frame=0x0) at server-resolve.c:558
#6  0x00007ffff67dd685 in server_resolve_inode (frame=0x613f90) at server-resolve.c:448
#7  0x00007ffff67dd73d in server_resolve (frame=0x0) at server-resolve.c:504
#8  0x00007ffff67dd7e3 in server_resolve_all (frame=0x0) at server-resolve.c:551
#9  0x00007ffff67dd923 in resolve_and_resume (frame=0x613f90, fn=0x609c80) at server-resolve.c:586
#10 0x00007ffff67dccf1 in server_lookup (frame=0x613f90, bound_xl=0x609c80, hdr=0x6141d0, hdrlen=<value optimized out>, iobuf=<value optimized out>)
    at server-protocol.c:2425
#11 0x00007ffff67cd8ba in protocol_server_pollin (this=0x60ea00, trans=0x6142d0) at server-protocol.c:6361
#12 0x00007ffff67cd943 in notify (this=0x60ea00, event=<value optimized out>, data=0x7fffffffd6a8) at server-protocol.c:6417
#13 0x00007ffff7baf263 in xlator_notify (xl=0x60ea00, event=2, data=0x6142d0) at xlator.c:923
#14 0x00007ffff5bc0108 in socket_event_handler (fd=<value optimized out>, idx=3, data=0x6142d0, poll_in=1, poll_out=0, poll_err=<value optimized out>)
    at socket.c:829
#15 0x00007ffff7bc91ed in event_dispatch_epoll_handler (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:804
#16 event_dispatch_epoll (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:867
#17 0x0000000000404096 in main (argc=<value optimized out>, argv=<value optimized out>) at glusterfsd.c:1377


server log:

[2009-11-30 10:31:50] D [transport.c:145:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.1.0git/transport/socket.so
[2009-11-30 10:31:50] N [glusterfsd.c:1360:main] glusterfs: Successfully started
[2009-11-30 10:32:31] D [addr.c:190:gf_auth] posix-locks: allowed = "*", received addr = "..."
[2009-11-30 10:32:31] N [server-protocol.c:5522:mop_setvolume] server: accepted client from ...:1023
[2009-11-30 10:32:31] D [addr.c:190:gf_auth] posix-locks: allowed = "*", received addr = "..."
[2009-11-30 10:32:31] N [server-protocol.c:5522:mop_setvolume] server: accepted client from ...:1022


chain log:

[2009-11-30 10:32:31] D [transport.c:145:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.1.0git/transport/socket.so
[2009-11-30 10:32:31] D [client-protocol.c:6558:init] chain: defaulting frame-timeout to 30mins
[2009-11-30 10:32:31] D [client-protocol.c:6569:init] chain: defaulting ping-timeout to 42
[2009-11-30 10:32:31] D [transport.c:145:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.1.0git/transport/socket.so
[2009-11-30 10:32:31] D [transport.c:145:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.1.0git/transport/socket.so
[2009-11-30 10:32:31] D [client-protocol.c:6982:notify] chain: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-11-30 10:32:31] D [client-protocol.c:6982:notify] chain: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-11-30 10:32:31] D [client-protocol.c:6982:notify] chain: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-11-30 10:32:31] D [client-protocol.c:6982:notify] chain: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-11-30 10:32:31] N [glusterfsd.c:1360:main] glusterfs: Successfully started
[2009-11-30 10:32:31] D [client-protocol.c:6996:notify] chain: got GF_EVENT_CHILD_UP
[2009-11-30 10:32:31] D [client-protocol.c:6996:notify] chain: got GF_EVENT_CHILD_UP
[2009-11-30 10:32:31] N [client-protocol.c:6202:client_setvolume_cbk] chain: Connected to ...:6996, attached to remote volume 'posix-locks'.
[2009-11-30 10:32:31] N [client-protocol.c:6202:client_setvolume_cbk] chain: Connected to ...:6996, attached to remote volume 'posix-locks'.
[2009-11-30 10:32:39] D [addr.c:190:gf_auth] chain: allowed = "*", received addr = "..."
[2009-11-30 10:32:39] N [server-protocol.c:5522:mop_setvolume] server: accepted client from ...:1023
[2009-11-30 10:32:39] D [addr.c:190:gf_auth] chain: allowed = "*", received addr = "..."
[2009-11-30 10:32:39] N [server-protocol.c:5522:mop_setvolume] server: accepted client from ...:1022
[2009-11-30 10:32:39] D [server-resolve.c:584:resolve_and_resume] chain: RESOLVE LOOKUP() on / (null)


client log:

[2009-11-30 10:32:39] D [glusterfsd.c:1334:main] glusterfs: running in pid 14695
[2009-11-30 10:32:39] D [client-protocol.c:6558:init] client: defaulting frame-timeout to 30mins
[2009-11-30 10:32:39] D [client-protocol.c:6569:init] client: defaulting ping-timeout to 42
[2009-11-30 10:32:39] D [transport.c:145:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.1.0git/transport/socket.so
[2009-11-30 10:32:39] D [transport.c:145:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.1.0git/transport/socket.so
[2009-11-30 10:32:39] D [client-protocol.c:6982:notify] client: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-11-30 10:32:39] D [client-protocol.c:6982:notify] client: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-11-30 10:32:39] D [client-protocol.c:6982:notify] client: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-11-30 10:32:39] D [client-protocol.c:6982:notify] client: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-11-30 10:32:39] N [glusterfsd.c:1360:main] glusterfs: Successfully started
[2009-11-30 10:32:39] D [client-protocol.c:6996:notify] client: got GF_EVENT_CHILD_UP
[2009-11-30 10:32:39] D [client-protocol.c:6996:notify] client: got GF_EVENT_CHILD_UP
[2009-11-30 10:32:39] N [client-protocol.c:6202:client_setvolume_cbk] client: Connected to ...:6996, attached to remote volume 'chain'.
[2009-11-30 10:32:39] D [fuse-bridge.c:3080:fuse_thread_proc] fuse:  pthread_cond_timedout returned non zero value ret: 0 errno: 0
[2009-11-30 10:32:39] N [fuse-bridge.c:2932:fuse_init] glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.11
[2009-11-30 10:32:39] N [client-protocol.c:6202:client_setvolume_cbk] client: Connected to ...:6996, attached to remote volume 'chain'.
[2009-11-30 10:33:22] E [client-protocol.c:415:client_ping_timer_expired] client: Server ...:6996 has not responded in the last 42 seconds, disconnecting.
[2009-11-30 10:33:22] E [saved-frames.c:165:saved_frames_unwind] client: forced unwinding frame type(1) op(LOOKUP)
[2009-11-30 10:33:22] D [fuse-bridge.c:2977:fuse_first_lookup_cbk] fuse: first lookup on root failed.
[2009-11-30 10:33:22] E [saved-frames.c:165:saved_frames_unwind] client: forced unwinding frame type(2) op(PING)
[2009-11-30 10:33:22] D [client-protocol.c:516:client_ping_cbk] client: timer must have expired
[2009-11-30 10:33:22] D [client-protocol.c:1594:client_statfs] client: STATFS 1 (/): failed to get remote inode number
[2009-11-30 10:33:22] D [socket.c:1316:socket_submit] client: not connected (priv->connected = 255)
[2009-11-30 10:33:22] W [fuse-bridge.c:2357:fuse_statfs_cbk] glusterfs-fuse: 2: ERR => -1 (Transport endpoint is not connected)
[2009-11-30 10:33:22] N [client-protocol.c:6949:notify] client: disconnected
[2009-11-30 10:33:22] D [client-protocol.c:6996:notify] client: got GF_EVENT_CHILD_UP
[2009-11-30 10:33:22] D [client-protocol.c:6996:notify] client: got GF_EVENT_CHILD_UP
[2009-11-30 10:33:22] D [client-protocol.c:1594:client_statfs] client: STATFS 1 (/): failed to get remote inode number
[2009-11-30 10:33:22] W [fuse-bridge.c:2357:fuse_statfs_cbk] glusterfs-fuse: 3: ERR => -1 (Transport endpoint is not connected)
[2009-11-30 10:38:12] D [client-protocol.c:1594:client_statfs] client: STATFS 1 (/): failed to get remote inode number
[2009-11-30 10:38:12] W [fuse-bridge.c:2357:fuse_statfs_cbk] glusterfs-fuse: 4: ERR => -1 (Transport endpoint is not connected)
[2009-11-30 10:38:12] D [client-protocol.c:1594:client_statfs] client: STATFS 1 (/): failed to get remote inode number
[2009-11-30 10:38:12] W [fuse-bridge.c:2357:fuse_statfs_cbk] glusterfs-fuse: 5: ERR => -1 (Transport endpoint is not connected)
[2009-11-30 10:44:57] D [client-protocol.c:1594:client_statfs] client: STATFS 1 (/): failed to get remote inode number
[2009-11-30 10:44:57] W [fuse-bridge.c:2357:fuse_statfs_cbk] glusterfs-fuse: 6: ERR => -1 (Transport endpoint is not connected)
[2009-11-30 10:44:57] D [client-protocol.c:1594:client_statfs] client: STATFS 1 (/): failed to get remote inode number
[2009-11-30 10:44:57] W [fuse-bridge.c:2357:fuse_statfs_cbk] glusterfs-fuse: 7: ERR => -1 (Transport endpoint is not connected)

Comment 3 Amar Tumballi 2009-12-02 17:51:17 UTC
Hi Erick,
I tried to run this config with the latest git head and could run it without any issues. I will try to run it over a different network and see. But currently, the status is 'works for me'.

Regards,
Amar

Comment 4 idadesub 2009-12-02 18:20:31 UTC
You're right. I just checked, and I had missed a compile error and ended up reusing an old release of gluster. It's working now :)

