Bug 761765 (GLUSTER-33) - iozone can consistently cause a "Transport endpoint is not connected"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-33
Product: GlusterFS
Classification: Community
Component: core
Version: pre-2.0
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Anand Avati
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-06-23 00:30 UTC by idadesub
Modified: 2015-09-01 23:04 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: RTNR
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Description idadesub 2009-06-23 00:30:16 UTC
I've been doing some load testing with iozone, from http://www.iozone.org/, and I can consistently get it to make a glusterfs client temporarily lose its connection to the glusterfsd server. It's very repeatable, and I've reproduced it with both 2.0.2 and the tip of the release-2.0 branch. Any idea what could be going on?


I'm using the simplest possible GlusterFS configuration:

server:
volume posix
  type storage/posix
  option directory /tmp/gluster
end-volume

volume posix-locks
    type features/locks
    subvolumes posix
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.posix-locks.allow *
  subvolumes posix-locks
end-volume


client:
volume client
  type protocol/client
  option transport-type tcp
  option remote-host server
  option remote-subvolume posix-locks
end-volume


When I run "iozone -a -n 512M -g 4G" on the client mount, I get this:

        Iozone: Performance Test of File I/O
                Version $Revision: 3.326 $
                Compiled for 32 bit mode.
                Build: linux

        Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                     Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                     Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                     Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                     Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
                     Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root.

        Run began: Mon Jun 22 17:12:20 2009

        Auto Mode
        Using minimum file size of 524288 kilobytes.
        Using maximum file size of 4194304 kilobytes.
        Command line used: /opt/iozone/bin/iozone -a -n 512M -g 4G
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
          524288      64   73917   82350   115678   115527fsync: Transport endpoint is not connected

iozone: interrupted

exiting iozone



server log:

[2009-06-22 17:12:08] N [server-helpers.c:782:server_connection_destroy] server: destroyed connection of ...
[2009-06-22 17:12:09] N [server-protocol.c:7035:mop_setvolume] server: accepted client from ...:1022
[2009-06-22 17:12:09] N [server-protocol.c:7035:mop_setvolume] server: accepted client from ...:1021
[2009-06-22 17:13:34] N [server-protocol.c:7796:notify] server: ...:1021 disconnected
[2009-06-22 17:13:34] N [server-protocol.c:7035:mop_setvolume] server: accepted client from ...:1023
[2009-06-22 17:13:34] N [server-protocol.c:7796:notify] server: ...:1022 disconnected
[2009-06-22 17:13:34] N [server-protocol.c:7035:mop_setvolume] server: accepted client from ...:1020


client log:

[2009-06-22 17:12:09] D [glusterfsd.c:1203:main] glusterfs: running in pid 30646
[2009-06-22 17:12:09] D [client-protocol.c:5948:init] client: defaulting frame-timeout to 30mins
[2009-06-22 17:12:09] D [client-protocol.c:5959:init] client: defaulting ping-timeout to 10
[2009-06-22 17:12:09] D [transport.c:141:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.0.2git/transport/socket.so
[2009-06-22 17:12:09] D [transport.c:141:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.0.2git/transport/socket.so
[2009-06-22 17:12:09] D [client-protocol.c:6276:notify] client: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-06-22 17:12:09] D [client-protocol.c:6276:notify] client: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-06-22 17:12:09] D [client-protocol.c:6276:notify] client: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-06-22 17:12:09] D [client-protocol.c:6276:notify] client: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-06-22 17:12:09] N [glusterfsd.c:1222:main] glusterfs: Successfully started
[2009-06-22 17:12:09] D [client-protocol.c:6290:notify] client: got GF_EVENT_CHILD_UP
[2009-06-22 17:12:09] D [client-protocol.c:6290:notify] client: got GF_EVENT_CHILD_UP
[2009-06-22 17:12:09] N [client-protocol.c:5551:client_setvolume_cbk] client: Connected to ...:6996, attached to remote volume 'posix-locks'.
[2009-06-22 17:12:09] N [client-protocol.c:5551:client_setvolume_cbk] client: Connected to ...:6996, attached to remote volume 'posix-locks'.
[2009-06-22 17:13:32] E [client-protocol.c:440:client_ping_timer_expired] client: Server ...:6996 has not responded in the last 10 seconds, disconnecting.
[2009-06-22 17:13:32] E [saved-frames.c:165:saved_frames_unwind] client: forced unwinding frame type(1) op(FSYNC)
[2009-06-22 17:13:32] W [fuse-bridge.c:882:fuse_err_cbk] glusterfs-fuse: 41014: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2009-06-22 17:13:32] E [saved-frames.c:165:saved_frames_unwind] client: forced unwinding frame type(2) op(PING)
[2009-06-22 17:13:32] D [client-protocol.c:541:client_ping_cbk] client: timer must have expired
[2009-06-22 17:13:32] D [socket.c:1294:socket_submit] client: not connected (priv->connected = 255)
[2009-06-22 17:13:32] D [fuse-bridge.c:296:need_fresh_lookup] fuse-bridge: revalidate of /iozone.tmp failed (Transport endpoint is not connected)
[2009-06-22 17:13:32] W [fuse-bridge.c:395:fuse_entry_cbk] glusterfs-fuse: 41015: LOOKUP() /iozone.tmp => -1 (Transport endpoint is not connected)
[2009-06-22 17:13:32] D [fuse-bridge.c:296:need_fresh_lookup] fuse-bridge: revalidate of /iozone.tmp failed (Transport endpoint is not connected)
[2009-06-22 17:13:32] W [fuse-bridge.c:395:fuse_entry_cbk] glusterfs-fuse: 41016: LOOKUP() /iozone.tmp => -1 (Transport endpoint is not connected)
[2009-06-22 17:13:32] W [fuse-bridge.c:395:fuse_entry_cbk] glusterfs-fuse: 41017: LOOKUP() /iozone.tmp.DUMMY => -1 (Transport endpoint is not connected)
[2009-06-22 17:13:32] W [fuse-bridge.c:882:fuse_err_cbk] glusterfs-fuse: 41018: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2009-06-22 17:13:32] N [client-protocol.c:6242:notify] client: disconnected
[2009-06-22 17:13:32] D [client-protocol.c:6290:notify] client: got GF_EVENT_CHILD_UP
[2009-06-22 17:13:32] D [client-protocol.c:6290:notify] client: got GF_EVENT_CHILD_UP
[2009-06-22 17:13:34] N [client-protocol.c:5551:client_setvolume_cbk] client: Connected to ...:6996, attached to remote volume 'posix-locks'.
[2009-06-22 17:13:34] N [client-protocol.c:5551:client_setvolume_cbk] client: Connected to ...:6996, attached to remote volume 'posix-locks'.

Comment 1 Amar Tumballi 2009-06-23 04:04:41 UTC
Hi Erick,
This behavior occurs when 'performance/io-threads' is not loaded on the server side. Please add it to the server volume file, and you should no longer see this behavior.
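
For illustration, here is a minimal sketch of the server volume file from the description with io-threads added. The volume name "iothreads" and the thread-count value are assumptions chosen for the example, not values taken from this report. Because the exported volume changes, the client's "option remote-subvolume" would also need to name it.

```
# Server volfile sketch: same volumes as in the description, with
# performance/io-threads loaded between features/locks and protocol/server.
volume posix
  type storage/posix
  option directory /tmp/gluster
end-volume

volume posix-locks
  type features/locks
  subvolumes posix
end-volume

# Hypothetical volume name; thread-count is an illustrative value.
volume iothreads
  type performance/io-threads
  option thread-count 8
  subvolumes posix-locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  # The auth rule names the exported volume, so it changes with the export.
  option auth.addr.iothreads.allow *
  subvolumes iothreads
end-volume
```

The likely reason this helps, judging from the client log above, is that a long FSYNC on the server no longer keeps it from answering the client's pings within the 10-second ping-timeout.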

Comment 2 idadesub 2009-06-23 17:40:02 UTC
Thanks again Amar, that did seem to work. Is there any chance of adding something like "you should add io-threads to avoid this error" to the error message or the wiki?

Comment 3 Anand Avati 2009-07-28 16:39:46 UTC
io-threads had to be loaded on the server side.

