Bug 1314202 - An operation will make "Transport endpoint is not connected" error.
Summary: An operation will make "Transport endpoint is not connected" error.
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: stripe
Version: 3.7.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-03-03 08:25 UTC by vori003
Modified: 2017-03-08 10:55 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-03-08 10:55:48 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments:
/var/log/gluster/mnt-vol.log (262.42 KB, text/plain)
2016-03-09 01:52 UTC, vori003

Description vori003 2016-03-03 08:25:30 UTC
Description of problem:

A series of operations always produces the error: Transport endpoint is not connected

Version-Release number of selected component (if applicable):

How reproducible:

always

Steps to Reproduce:

[misc. our server settings]
node: node0, node1, node2, node4
vol: stripe 2 replica 2

1. Mount the same volume on two nodes, e.g., at /mnt/glsvol.
2. cd into the mounted volume on both nodes, e.g., cd /mnt/glsvol.
3. Run "while [ 1 ]; do echo "a" > b; mv b a; done" on one node.
4. Run "while [ 1 ]; do cat a; done" on the other node.
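Steps 3 and 4 above can also be run as a bounded test instead of endless loops. The following is only a sketch: the function names writer_loop and reader_loop are hypothetical, and the directory and iteration count are passed as arguments.

```shell
#!/bin/sh
# Hypothetical helpers for steps 3 and 4; not part of GlusterFS.

# Step 3 (first node): write "a" into b, then rename b over a.
# Usage: writer_loop <dir> <rounds>
writer_loop() (
    cd "$1" || exit 1
    i=0
    while [ "$i" -lt "$2" ]; do
        echo "a" > b && mv b a
        i=$((i + 1))
    done
)

# Step 4 (second node): read a repeatedly and count failed reads.
# Usage: reader_loop <dir> <rounds>
reader_loop() (
    cd "$1" || exit 1
    i=0
    failures=0
    while [ "$i" -lt "$2" ]; do
        cat a > /dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "failed reads: $failures of $2"
)
```

For example, run "writer_loop /mnt/glsvol 1000" on one node and "reader_loop /mnt/glsvol 1000" on the other. A non-zero failure count on the reader side would correspond to the errors shown under "Actual results".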

Actual results:
cat: a:  "Transport endpoint is not connected"
cat: a:  "Transport endpoint is not connected"
....

After that, the mounted volume becomes unusable.
It recovers after a remount:
1. umount /mnt/glsvol
2. mount /mnt/glsvol

Expected results:
a
a
... 

Additional info:

Reproducibility: 100%

Comment 1 Niels de Vos 2016-03-08 12:24:07 UTC
Please attach the log for the mountpoints from both systems. They would be named /var/log/glusterfs/mnt-glsvol.log.

Could you try on different test-volumes with different properties and report if you can reproduce the problem?
- a single brick
- two bricks in replicated mode
- two bricks in stripe mode
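The three test volumes listed above could be created roughly as follows. This is a sketch that only prints the commands so they can be reviewed first. The node names are taken from the report; the volume names, the brick root /bricks, and the helper name print_test_volume_cmds are assumptions.

```shell
#!/bin/sh
# Prints (does not run) the commands for three small test volumes.
# Volume names and brick paths are hypothetical; adjust before use.
print_test_volume_cmds() {
    brick_root="${1:-/bricks}"
    # a single brick
    echo "gluster volume create test-single node0:$brick_root/test-single"
    # two bricks in replicated mode
    echo "gluster volume create test-replica replica 2 node0:$brick_root/test-replica node1:$brick_root/test-replica"
    # two bricks in stripe mode
    echo "gluster volume create test-stripe stripe 2 node0:$brick_root/test-stripe node1:$brick_root/test-stripe"
    for vol in test-single test-replica test-stripe; do
        echo "gluster volume start $vol"
    done
}

print_test_volume_cmds
```

Each volume would then be mounted on two nodes and the reproduction steps repeated against it, to see which layer is needed to trigger the error.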

Note that we do not recommend using stripe at all. An improved version of striping has been introduced with glusterfs-3.7.0 called "sharding".

Comment 2 vori003 2016-03-09 01:52:13 UTC
Created attachment 1134368
/var/log/gluster/mnt-vol.log

In the original file, some messages were written in Japanese.
These messages have been translated into English.

The following three messages were translated:
1. no such file or directory
2. Invalid arguments 
3. Operation not permitted

Comment 3 vori003 2016-03-09 02:12:19 UTC
I have attached the log file.

Unfortunately, we have already left the test phase
and do not have much manpower,
so we cannot run further tests in our environment.
We are sorry about that.

Comment 4 Niels de Vos 2016-03-09 05:30:12 UTC
The log contains a segmentation fault; this would have caused the "Transport endpoint is not connected" error on the client.

Filtered stack:

libglusterfs.so.0(_gf_msg_backtrace_nomem)
libglusterfs.so.0(gf_print_trace)
libc.so.6()
glusterfs/3.7.8/xlator/cluster/stripe.so(stripe_readv_fstat_cbk)
glusterfs/3.7.8/xlator/cluster/replicate.so(afr_fstat_wind)
glusterfs/3.7.8/xlator/cluster/replicate.so(afr_read_txn_refresh_done)
glusterfs/3.7.8/xlator/cluster/replicate.so(afr_inode_refresh_done)
glusterfs/3.7.8/xlator/cluster/replicate.so(afr_inode_refresh_subvol_cbk)
glusterfs/3.7.8/xlator/cluster/replicate.so(afr_inode_refresh_subvol_with_fstat_cbk)
glusterfs/3.7.8/xlator/protocol/client.so(client3_3_fstat_cbk)
libgfrpc.so.0(rpc_clnt_handle_reply)
libgfrpc.so.0(rpc_clnt_notify)
libgfrpc.so.0(rpc_transport_notify)
glusterfs/3.7.8/rpc-transport/socket.so()
glusterfs/3.7.8/rpc-transport/socket.so()
libglusterfs.so.0()
libpthread.so.0()
libc.so.6(clone)
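A crash like the one above can be located in a client log with a small helper. This is a sketch, assuming the log contains the usual "signal received:" line that precedes the backtrace; the function name show_crash and the default of 30 context lines are hypothetical choices.

```shell
#!/bin/sh
# Print each "signal received:" line of a log with the lines that follow it.
# Usage: show_crash <logfile> [lines-after]
# Uses grep -A, which is available in GNU and BSD grep.
show_crash() {
    grep -n -A "${2:-30}" 'signal received:' "$1"
}
```

For example, "show_crash /var/log/glusterfs/mnt-glsvol.log" would print the signal number and the start of the backtrace with line numbers, if the mount log has that name.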


My strong recommendation is to re-create the volume and remove the stripe layer. Stripe is surely *not* what you want in any case.

  http://joejulian.name/blog/should-i-use-stripe-on-glusterfs/

If you have big files that would benefit from being split into smaller pieces to get distributed, you should enable sharding instead. Sharding is much better tested than stripe, and is actively maintained. Stripe will most likely be removed in an upcoming release; we are not spending much time on fixing its bugs. More information on sharding can be found on the blog of the main developer:

  http://blog.gluster.org/2015/12/introducing-shard-translator/
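Enabling sharding on a volume without a stripe layer could look roughly like this. It is a sketch that only prints the commands: the volume name glsvol, the 64MB block size, and the helper name print_shard_cmds are assumptions, while features.shard and features.shard-block-size are the volume options used by the shard translator.

```shell
#!/bin/sh
# Prints (does not run) the volume options that turn on sharding.
# Usage: print_shard_cmds <volume> [block-size]
print_shard_cmds() {
    vol="$1"
    block_size="${2:-64MB}"   # example value only, not a recommendation
    echo "gluster volume set $vol features.shard on"
    echo "gluster volume set $vol features.shard-block-size $block_size"
}

print_shard_cmds glsvol
```

Only files created after the option is enabled are sharded, so this fits best on a freshly re-created volume as recommended above.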

Comment 5 vori003 2016-03-09 06:09:55 UTC
Thank you for the interesting suggestion.

Of course, I am looking forward to sharding.
However, to my knowledge, sharding is still "experimental".

I do not have enough information about stable striping vs. experimental sharding
to decide whether to use sharding.

But OK, I will try it, because you, the developers, strongly recommend sharding.
Our project is still at an early stage, so we can still go back to the test phase.

Comment 6 Niels de Vos 2016-03-09 07:43:26 UTC
Sharding is not experimental anymore. It was when glusterfs-3.7.0 was released, but in the meantime many bug fixes have been included. You are already on 3.7.8 and sharding should be very stable with that.

Comment 7 vori003 2016-03-09 07:54:06 UTC
I got it.
We are now starting to restructure our glusterfs setup with sharding.
Thank you so much for your kind follow-up.

Comment 8 Kaushal 2017-03-08 10:55:48 UTC
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.

