Bug 828509 - client unable to communicate with server until forcibly remounted
Summary: client unable to communicate with server until forcibly remounted
Keywords:
Status: CLOSED DUPLICATE of bug 767359
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: fuse
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Csaba Henk
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-06-04 20:06 UTC by b.candler
Modified: 2012-06-05 09:53 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-05 09:53:22 UTC
Embargoed:



Description b.candler 2012-06-04 20:06:15 UTC
Storage mounted on /gluster/scratch. The client somehow got into a state whereby
    ls /gluster/scratch
always returned "Transport endpoint is not connected". None of the files under /var/log/gluster/ were appended to on any of these attempts.

However, after
    umount /gluster/scratch
    mount /gluster/scratch
everything was fine.

Relevant /etc/fstab entry on client:

storage1:/scratch3 /gluster/scratch3 glusterfs defaults,_netdev 0 0
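A minimal POSIX sh sketch of the manual umount/mount workaround, assuming the stale mount shows up as stat(2) failing with ENOTCONN ("Transport endpoint is not connected") and that the mount point has an /etc/fstab entry like the one above, so a bare `mount DIR` works. The function names `is_stale_mount` and `remount_if_stale` are hypothetical and not part of GlusterFS; this only automates the recovery step and does nothing about the underlying client crash.

```shell
#!/bin/sh
# Hypothetical helpers (not part of GlusterFS).

# Return 0 if stat(2) on the given path fails with ENOTCONN.
# LC_ALL=C keeps the error text in English so the match is locale-independent.
is_stale_mount() {
    # "2>&1 >/dev/null": capture stderr in $err, discard stdout.
    err=$(LC_ALL=C stat -- "$1" 2>&1 >/dev/null)
    case "$err" in
        *"Transport endpoint is not connected"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Apply the workaround from this report only when the mount is stale.
# Assumption: a bare "mount DIR" resolves DIR through /etc/fstab.
remount_if_stale() {
    if is_stale_mount "$1"; then
        umount "$1" && mount "$1"
    fi
}

# Example: remount_if_stale /gluster/scratch
```

It has to run as root, since it calls umount and mount; a healthy or nonexistent path is left alone because its stat error (if any) does not match ENOTCONN.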

Client info:
Ubuntu 10.04 x86_64
glusterfs 3.2.5-1

Server info:
Ubuntu 12.04 x86_64
glusterfs 3.2.5-1
(this system was upgraded online from 10.04 to 12.04)

How reproducible:

Not reliably; I have seen this once in a while.

Additional info:

Here is the end of /var/log/glusterfs/gluster-scratch.log, which suggests the client crashed (signal 11) and, presumably, was unable to recover automatically. The entries dated 2012-06-04 were logged when I did the unmount/remount.

...
frame : type(1) op(WRITE)
frame : type(1) op(WRITE)
frame : type(1) op(WRITE)
time of crash: frame : type(1) op(WRITE)
2012-04-25 12:35:57
frame : type(1) op(WRITE)
configuration details:
frame : type(1) op(WRITE)
argp 1
frame : type(1) op(WRITE)
backtrace 1
frame : type(1) op(WRITE)
dlfcn 1
frame : type(1) op(WRITE)
fdatasync 1
frame : type(1) op(WRITE)
libpthread 1
frame : type(1) op(WRITE)
llistxattr 1
frame : type(1) op(WRITE)
setfsid 1
frame : type(1) op(WRITE)
spinlock 1
frame : type(1) op(WRITE)
epoll.h 1
frame : type(1) op(WRITE)
xattr.h 1
frame : type(1) op(WRITE)
st_atim.tv_nsec 1
frame : type(1) op(WRITE)
package-string: glusterfs 3.2.5
frame : type(1) op(WRITE)
frame : type(1) op(WRITE)
... more of same
frame : type(1) op(WRITE)
frame : type(1) op(WRITE)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-04-25 12:35:57
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.2.5
/lib/libc.so.6(+0x33af0)[0x7f282cdddaf0]
/lib/libc.so.6(+0x33af0)[0x7f282cdddaf0]
/usr/lib/glusterfs/3.2.5/xlator/performance/write-behind.so(wb_sync_cbk+0x30)[0x7f2829eaf060]
/usr/lib/glusterfs/3.2.5/xlator/performance/write-behind.so(wb_sync_cbk+0x30)[0x7f2829eaf060]
/usr/lib/glusterfs/3.2.5/xlator/cluster/distribute.so(dht_writev_cbk+0xd3)[0x7f282a0c6a43]
/usr/lib/glusterfs/3.2.5/xlator/protocol/client.so(client3_1_writev+0x13a)[0x7f282a30e89a]
/usr/lib/glusterfs/3.2.5/xlator/protocol/client.so(client_writev+0xa4)[0x7f282a2f2aa4]
/usr/lib/glusterfs/3.2.5/xlator/cluster/distribute.so(dht_writev+0x162)[0x7f282a0cb982]
/usr/lib/glusterfs/3.2.5/xlator/cluster/distribute.so(dht_writev_cbk+0xd3)[0x7f282a0c6a43]
/usr/lib/glusterfs/3.2.5/xlator/performance/write-behind.so(wb_sync+0x4fa)[0x7f2829ea838a]
/usr/lib/glusterfs/3.2.5/xlator/protocol/client.so(client3_1_writev+0x13a)[0x7f282a30e89a]
/usr/lib/glusterfs/3.2.5/xlator/performance/write-behind.so(wb_do_ops+0x53)[0x7f2829eac443]
/usr/lib/glusterfs/3.2.5/xlator/protocol/client.so(client_writev+0xa4)[0x7f282a2f2aa4]
/usr/lib/glusterfs/3.2.5/xlator/cluster/distribute.so(dht_writev+0x162)[0x7f282a0cb982]
/usr/lib/glusterfs/3.2.5/xlator/performance/write-behind.so(wb_process_queue+0xe8)[0x7f2829ea97c8]
/usr/lib/glusterfs/3.2.5/xlator/performance/write-behind.so(wb_sync+0x4fa)[0x7f2829ea838a]
/usr/lib/glusterfs/3.2.5/xlator/performance/write-behind.so(wb_writev+0x887)[0x7f2829eabf87]
/usr/lib/glusterfs/3.2.5/xlator/performance/write-behind.so(wb_do_ops+0x53)[0x7f2829eac443]
/usr/lib/glusterfs/3.2.5/xlator/performance/read-ahead.so(ra_writev+0x18f)[0x7f2829c9ddaf]
/usr/lib/glusterfs/3.2.5/xlator/performance/write-behind.so(wb_process_queue+0xe8)[0x7f2829ea97c8]
/usr/lib/glusterfs/3.2.5/xlator/performance/io-cache.so(ioc_writev+0x175)[0x7f2829a8d3f5]
/usr/lib/glusterfs/3.2.5/xlator/performance/write-behind.so(wb_sync_cbk+0xfa)[0x7f2829eaf12a]
/usr/lib/glusterfs/3.2.5/xlator/performance/quick-read.so(qr_writev+0x224)[0x7f2829880e84]
/usr/lib/glusterfs/3.2.5/xlator/cluster/distribute.so(dht_writev_cbk+0xd3)[0x7f282a0c6a43]
/usr/lib/glusterfs/3.2.5/xlator/protocol/client.so(client3_1_writev_cbk+0x515)[0x7f282a30acb5]
/usr/lib/glusterfs/3.2.5/xlator/performance/stat-prefetch.so(sp_writev+0x178)[0x7f28296683c8]
/usr/lib/glusterfs/3.2.5/xlator/debug/io-stats.so(io_stats_writev+0x1f6)[0x7f2829448766]
/usr/lib/glusterfs/3.2.5/xlator/mount/fuse.so(fuse_write_resume+0x181)[0x7f282bb97331]
/usr/lib/glusterfs/3.2.5/xlator/mount/fuse.so(fuse_resolve_and_resume+0x52)[0x7f282bb8d692]
/usr/lib/glusterfs/3.2.5/xlator/mount/fuse.so(+0x17e5d)[0x7f282bb9ee5d]
/lib/libpthread.so.0(+0x69ca)[0x7f282d1339ca]
/lib/libc.so.6(clone+0x6d)[0x7f282ce9070d]
---------
/usr/lib/libgfrpc.so.0(saved_frames_unwind+0x1c9)[0x7f282d56e3d9]
[2012-06-04 20:47:11.40519] I [glusterfsd.c:1493:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.2.5
[2012-06-04 20:47:11.468286] W [write-behind.c:3023:init] 0-scratch-write-behind: disabling write-behind for first 0 bytes
[2012-06-04 20:47:11.475928] I [client.c:1935:notify] 0-scratch-client-0: parent translators are ready, attempting connect on transport
[2012-06-04 20:47:11.476879] I [client.c:1935:notify] 0-scratch-client-1: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume scratch-client-0
  2:     type protocol/client
  3:     option remote-host storage2
  4:     option remote-subvolume /disk/scratch/scratch
  5:     option transport-type tcp
  6: end-volume
  7: 
  8: volume scratch-client-1
  9:     type protocol/client
 10:     option remote-host storage3
 11:     option remote-subvolume /disk/scratch/scratch
 12:     option transport-type tcp
 13: end-volume
 14: 
 15: volume scratch-dht
 16:     type cluster/distribute
 17:     subvolumes scratch-client-0 scratch-client-1
 18: end-volume
 19: 
 20: volume scratch-write-behind
 21:     type performance/write-behind
 22:     subvolumes scratch-dht
 23: end-volume
 24: 
 25: volume scratch-read-ahead
 26:     type performance/read-ahead
 27:     subvolumes scratch-write-behind
 28: end-volume
 29: 
 30: volume scratch-io-cache
 31:     type performance/io-cache
 32:     subvolumes scratch-read-ahead
 33: end-volume
 34: 
 35: volume scratch-quick-read
 36:     type performance/quick-read
 37:     subvolumes scratch-io-cache
 38: end-volume
 39: 
 40: volume scratch-stat-prefetch
 41:     type performance/stat-prefetch
 42:     subvolumes scratch-quick-read
 43: end-volume
 44: 
 45: volume scratch
 46:     type debug/io-stats
 47:     option latency-measurement off
 48:     option count-fop-hits off
 49:     subvolumes scratch-stat-prefetch
 50: end-volume

+------------------------------------------------------------------------------+
[2012-06-04 20:47:11.477867] I [rpc-clnt.c:1536:rpc_clnt_reconfig] 0-scratch-client-0: changing port to 24009 (from 0)
[2012-06-04 20:47:11.478028] I [rpc-clnt.c:1536:rpc_clnt_reconfig] 0-scratch-client-1: changing port to 24010 (from 0)
[2012-06-04 20:47:15.164693] I [client-handshake.c:1090:select_server_supported_programs] 0-scratch-client-0: Using Program GlusterFS 3.2.5, Num (1298437), Version (310)
[2012-06-04 20:47:15.165256] I [client-handshake.c:1090:select_server_supported_programs] 0-scratch-client-1: Using Program GlusterFS 3.2.5, Num (1298437), Version (310)
[2012-06-04 20:47:15.165504] I [client-handshake.c:913:client_setvolume_cbk] 0-scratch-client-0: Connected to 192.168.6.71:24009, attached to remote volume '/disk/scratch/scratch'.
[2012-06-04 20:47:15.165713] I [client-handshake.c:913:client_setvolume_cbk] 0-scratch-client-1: Connected to 192.168.6.72:24010, attached to remote volume '/disk/scratch/scratch'.
[2012-06-04 20:47:15.177925] I [fuse-bridge.c:3339:fuse_graph_setup] 0-fuse: switched to graph 0
[2012-06-04 20:47:15.178193] I [fuse-bridge.c:2927:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13

Comment 2 Eduardo 2012-06-04 20:44:21 UTC
I have the same problem with 3.2.6: from time to time, at random, some servers give me "Transport endpoint is not connected".

I have to reboot the server to make it connect again.

I run Fedora 16 with Gluster 3.2.6-2.

Comment 3 csb sysadmin 2012-06-04 20:52:01 UTC
are you guys using rdma?

Comment 4 csb sysadmin 2012-06-04 20:52:43 UTC
(In reply to comment #3)
> are you guys using rdma?

nm dumb question, didn't see the client config at first.

Comment 5 b.candler 2012-06-04 21:04:12 UTC
In my case it's 10G Ethernet (Intel X520-DA2 cards, SFP+ cables, Netgear XSM7224S switch).

Comment 6 Eduardo 2012-06-04 21:34:12 UTC
I'm using Gigabit Ethernet cards, some bonded, on both clients and servers.

Comment 7 Amar Tumballi 2012-06-05 09:53:22 UTC
This bug is fixed in the 3.2.6 release and does not occur in the 3.3.0 release. Please upgrade to one of these releases.

*** This bug has been marked as a duplicate of bug 767359 ***

