I've been doing some load testing with iozone, from http://www.iozone.org/, and I can consistently get it to cause a glusterfs client to temporarily lose connection with the glusterfsd server. It's very repeatable, and I've recreated this with 2.0.2 and the tip of the release-2.0 branch. Any idea what could be going on?

I'm using the simplest gluster:

server:

volume posix
  type storage/posix
  option directory /tmp/gluster
end-volume

volume posix-locks
  type features/locks
  subvolumes posix
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.posix-locks.allow *
  subvolumes posix-locks
end-volume

client:

volume client
  type protocol/client
  option transport-type tcp
  option remote-host server
  option remote-subvolume posix-locks
end-volume

And when run with "iozone -a -n 512M -g 4G", I get this:

Iozone: Performance Test of File I/O
Version $Revision: 3.326 $
Compiled for 32 bit mode.
Build: linux

Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
Al Slater, Scott Rhine, Mike Wisner, Ken Goss
Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root.

Run began: Mon Jun 22 17:12:20 2009

Auto Mode
Using minimum file size of 524288 kilobytes.
Using maximum file size of 4194304 kilobytes.
Command line used: /opt/iozone/bin/iozone -a -n 512M -g 4G
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.

                                                         random   random     bkwd   record   stride
       KB   reclen    write  rewrite     read   reread     read    write     read  rewrite     read   fwrite frewrite    fread  freread
   524288       64    73917    82350   115678   115527
fsync: Transport endpoint is not connected

iozone: interrupted

exiting iozone

server log:

[2009-06-22 17:12:08] N [server-helpers.c:782:server_connection_destroy] server: destroyed connection of ...
[2009-06-22 17:12:09] N [server-protocol.c:7035:mop_setvolume] server: accepted client from ...:1022
[2009-06-22 17:12:09] N [server-protocol.c:7035:mop_setvolume] server: accepted client from ...:1021
[2009-06-22 17:13:34] N [server-protocol.c:7796:notify] server: ...:1021 disconnected
[2009-06-22 17:13:34] N [server-protocol.c:7035:mop_setvolume] server: accepted client from ...:1023
[2009-06-22 17:13:34] N [server-protocol.c:7796:notify] server: ...:1022 disconnected
[2009-06-22 17:13:34] N [server-protocol.c:7035:mop_setvolume] server: accepted client from ...:1020

client log:

[2009-06-22 17:12:09] D [glusterfsd.c:1203:main] glusterfs: running in pid 30646
[2009-06-22 17:12:09] D [client-protocol.c:5948:init] client: defaulting frame-timeout to 30mins
[2009-06-22 17:12:09] D [client-protocol.c:5959:init] client: defaulting ping-timeout to 10
[2009-06-22 17:12:09] D [transport.c:141:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.0.2git/transport/socket.so
[2009-06-22 17:12:09] D [transport.c:141:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.0.2git/transport/socket.so
[2009-06-22 17:12:09] D [client-protocol.c:6276:notify] client: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-06-22 17:12:09] D [client-protocol.c:6276:notify] client: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-06-22 17:12:09] D [client-protocol.c:6276:notify] client: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-06-22 17:12:09] D [client-protocol.c:6276:notify] client: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-06-22 17:12:09] N [glusterfsd.c:1222:main] glusterfs: Successfully started
[2009-06-22 17:12:09] D [client-protocol.c:6290:notify] client: got GF_EVENT_CHILD_UP
[2009-06-22 17:12:09] D [client-protocol.c:6290:notify] client: got GF_EVENT_CHILD_UP
[2009-06-22 17:12:09] N [client-protocol.c:5551:client_setvolume_cbk] client: Connected to ...:6996, attached to remote volume 'posix-locks'.
[2009-06-22 17:12:09] N [client-protocol.c:5551:client_setvolume_cbk] client: Connected to ...:6996, attached to remote volume 'posix-locks'.
[2009-06-22 17:13:32] E [client-protocol.c:440:client_ping_timer_expired] client: Server ...:6996 has not responded in the last 10 seconds, disconnecting.
[2009-06-22 17:13:32] E [saved-frames.c:165:saved_frames_unwind] client: forced unwinding frame type(1) op(FSYNC)
[2009-06-22 17:13:32] W [fuse-bridge.c:882:fuse_err_cbk] glusterfs-fuse: 41014: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2009-06-22 17:13:32] E [saved-frames.c:165:saved_frames_unwind] client: forced unwinding frame type(2) op(PING)
[2009-06-22 17:13:32] D [client-protocol.c:541:client_ping_cbk] client: timer must have expired
[2009-06-22 17:13:32] D [socket.c:1294:socket_submit] client: not connected (priv->connected = 255)
[2009-06-22 17:13:32] D [fuse-bridge.c:296:need_fresh_lookup] fuse-bridge: revalidate of /iozone.tmp failed (Transport endpoint is not connected)
[2009-06-22 17:13:32] W [fuse-bridge.c:395:fuse_entry_cbk] glusterfs-fuse: 41015: LOOKUP() /iozone.tmp => -1 (Transport endpoint is not connected)
[2009-06-22 17:13:32] D [fuse-bridge.c:296:need_fresh_lookup] fuse-bridge: revalidate of /iozone.tmp failed (Transport endpoint is not connected)
[2009-06-22 17:13:32] W [fuse-bridge.c:395:fuse_entry_cbk] glusterfs-fuse: 41016: LOOKUP() /iozone.tmp => -1 (Transport endpoint is not connected)
[2009-06-22 17:13:32] W [fuse-bridge.c:395:fuse_entry_cbk] glusterfs-fuse: 41017: LOOKUP() /iozone.tmp.DUMMY => -1 (Transport endpoint is not connected)
[2009-06-22 17:13:32] W [fuse-bridge.c:882:fuse_err_cbk] glusterfs-fuse: 41018: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2009-06-22 17:13:32] N [client-protocol.c:6242:notify] client: disconnected
[2009-06-22 17:13:32] D [client-protocol.c:6290:notify] client: got GF_EVENT_CHILD_UP
[2009-06-22 17:13:32] D [client-protocol.c:6290:notify] client: got GF_EVENT_CHILD_UP
[2009-06-22 17:13:34] N [client-protocol.c:5551:client_setvolume_cbk] client: Connected to ...:6996, attached to remote volume 'posix-locks'.
[2009-06-22 17:13:34] N [client-protocol.c:5551:client_setvolume_cbk] client: Connected to ...:6996, attached to remote volume 'posix-locks'.
Hi Erick, this behavior is seen when 'performance/io-threads' is not loaded on the server side. Please add it to the server volume file and you should no longer see this behavior.
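For reference, here is a minimal sketch of how the server volume file from the report might look with io-threads loaded. It assumes the translator is stacked directly below protocol/server; the volume name "iothreads" is an illustrative choice and does not appear in the original report. The client log above shows the 10-second ping timer expiring while an FSYNC was outstanding, which is consistent with the server being unable to answer pings while it was busy with that call, so moving file operations onto worker threads is the likely reason this helps.

```
volume posix
  type storage/posix
  option directory /tmp/gluster
end-volume

volume posix-locks
  type features/locks
  subvolumes posix
end-volume

# Assumed addition: io-threads stacked on top of the locks translator.
# The name "iothreads" is hypothetical; any unique volume name would do.
volume iothreads
  type performance/io-threads
  subvolumes posix-locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  # The auth rule and the exported subvolume now refer to the top-most volume.
  option auth.addr.iothreads.allow *
  subvolumes iothreads
end-volume
```

With this layout the client volume file would also need "option remote-subvolume iothreads" in place of "posix-locks", since the exported volume name has changed.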
Thanks again Amar, that did seem to work. Any chance of adding "you should add io-threads to avoid this error" to the error message or the wiki?
io-threads had to be loaded