Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1314202

Summary: An operation will make "Transport endpoint is not connected" error.
Product: [Community] GlusterFS
Reporter: vori003
Component: stripe
Assignee: bugs <bugs>
Status: CLOSED EOL
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.7.8
CC: bugs, kdhananj, ndevos, vori003
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-08 10:55:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Attachments:

Description	Flags
/var/log/gluster/mnt-vol.log	none

Description vori003 2016-03-03 08:25:30 UTC

Description of problem:

A series of operations always produces the error: Transport endpoint is not connected

Version-Release number of selected component (if applicable):

How reproducible:

always

Steps to Reproduce:

[Our server settings]
nodes: node0, node1, node2, node4
volume: stripe 2 replica 2

1. Mount the same volume on two nodes, e.g., at /mnt/glsvol
2. cd into the mounted volume on both nodes, e.g., cd /mnt/glsvol
3. Run 'while [ 1 ]; do echo "a" > b; mv b a; done' on one node.
4. Run 'while [ 1 ]; do cat a; done' on the other node.
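
The two loops above can also be written as bounded helper functions, so that a test run terminates and reports how many reads failed. This is only a sketch: the function names and the iteration count are hypothetical and not part of the original report, and the directory argument is assumed to be the mounted volume.

```shell
#!/bin/sh
# Hypothetical helpers; names and iteration bounds are assumptions.

# Writer (step 3): repeatedly create "b" and rename it over "a".
writer_loop() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "a" > "$dir/b"
        mv "$dir/b" "$dir/a"
        i=$((i + 1))
    done
}

# Reader (step 4): repeatedly read "a" and print the number of failed reads.
reader_loop() {
    dir=$1
    count=$2
    i=0
    failures=0
    while [ "$i" -lt "$count" ]; do
        cat "$dir/a" > /dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}
```

For example, run `writer_loop /mnt/glsvol 1000` on one node and `reader_loop /mnt/glsvol 1000` on the other at the same time; a non-zero count from the reader means at least one `cat` failed.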

Actual results:
cat: a:  "Transport endpoint is not connected"
cat: a:  "Transport endpoint is not connected"
....

After that, the mounted volume becomes unusable.
It recovers after a remount:
1. umount /mnt/glsvol
2. mount /mnt/glsvol

Expected results:
a
a
... 

Additional info:

Reproducibility: 100%

Comment 1 Niels de Vos 2016-03-08 12:24:07 UTC

Please attach the log for the mountpoints from both systems. They would be named /var/log/glusterfs/mnt-glsvol.log.

Could you try on different test-volumes with different properties and report if you can reproduce the problem?
- a single brick
- two bricks in replicated mode
- two bricks in stripe mode
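
As a rough sketch, the three test volumes listed above could be created with commands along these lines. The volume names and brick paths are hypothetical, and the helper only prints the commands so they can be reviewed and adjusted before being run; depending on where the bricks live, gluster may also ask for extra confirmation.

```shell
#!/bin/sh
# Print (not run) the gluster commands for the three test volumes.
# Volume names and brick paths are hypothetical; the two arguments are hosts.
print_test_volume_cmds() {
    host_a=$1
    host_b=$2
    echo "gluster volume create test-single ${host_a}:/bricks/test-single"
    echo "gluster volume create test-replica replica 2 ${host_a}:/bricks/test-replica ${host_b}:/bricks/test-replica"
    echo "gluster volume create test-stripe stripe 2 ${host_a}:/bricks/test-stripe ${host_b}:/bricks/test-stripe"
    for vol in test-single test-replica test-stripe; do
        echo "gluster volume start ${vol}"
    done
}
```

Calling `print_test_volume_cmds node0 node1` lists the commands for the hosts used in this report; each volume can then be mounted on two nodes and the reproduction steps repeated.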

Note that we do not recommend using stripe at all. An improved version of striping has been introduced with glusterfs-3.7.0 called "sharding".

Comment 2 vori003 2016-03-09 01:52:13 UTC

Created attachment 1134368 [details]
/var/log/gluster/mnt-vol.log

In the original file, some messages were written in Japanese.
These messages have been translated to English.

The following three messages were translated:
1. no such file or directory
2. Invalid arguments 
3. Operation not permitted

Comment 3 vori003 2016-03-09 02:12:19 UTC

I have attached the log file.

Unfortunately, we have already left the test phase
and do not have much manpower,
so we cannot test further in our environment.
We are sorry about that.

Comment 4 Niels de Vos 2016-03-09 05:30:12 UTC

The log contains a segmentation fault; this would have caused the "Transport endpoint is not connected" error on the client.

Filtered stack:

libglusterfs.so.0(_gf_msg_backtrace_nomem)
libglusterfs.so.0(gf_print_trace)
libc.so.6()
glusterfs/3.7.8/xlator/cluster/stripe.so(stripe_readv_fstat_cbk)
glusterfs/3.7.8/xlator/cluster/replicate.so(afr_fstat_wind)
glusterfs/3.7.8/xlator/cluster/replicate.so(afr_read_txn_refresh_done)
glusterfs/3.7.8/xlator/cluster/replicate.so(afr_inode_refresh_done)
glusterfs/3.7.8/xlator/cluster/replicate.so(afr_inode_refresh_subvol_cbk)
glusterfs/3.7.8/xlator/cluster/replicate.so(afr_inode_refresh_subvol_with_fstat_cbk)
glusterfs/3.7.8/xlator/protocol/client.so(client3_3_fstat_cbk)
libgfrpc.so.0(rpc_clnt_handle_reply)
libgfrpc.so.0(rpc_clnt_notify)
libgfrpc.so.0(rpc_transport_notify)
glusterfs/3.7.8/rpc-transport/socket.so()
glusterfs/3.7.8/rpc-transport/socket.so()
libglusterfs.so.0()
libpthread.so.0()
libc.so.6(clone)
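
A quick way to check whether a client log contains such a crash is to search it for the signal line. This is a sketch that assumes the crash handler writes a "signal received: 11" line (SIGSEGV) before the backtrace, which is the usual form in GlusterFS logs; the function name is hypothetical.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the given log records a segmentation fault.
# Assumes the crash is logged as "signal received: 11".
log_has_crash() {
    grep -q 'signal received: 11' "$1"
}
```

For example: `log_has_crash /var/log/glusterfs/mnt-glsvol.log && echo "client crashed"`.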


My strong recommendation is to re-create the volume and remove the stripe layer. Stripe is surely *not* what you want in any case.

  http://joejulian.name/blog/should-i-use-stripe-on-glusterfs/

If you have big files that would benefit from being split into smaller pieces to get distributed, you should enable sharding instead. Sharding is much more tested than stripe, and is actively maintained. Stripe will most likely be removed in an upcoming release, so we are not spending much time on fixing its bugs. More information on sharding can be found on the blog of the main developer:

  http://blog.gluster.org/2015/12/introducing-shard-translator/
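
As a minimal sketch, sharding is switched on per volume through volume options. The helper below only prints the commands for review; its name is hypothetical, and the volume name and block size are placeholders to be chosen for the actual setup. Only files created after the option is enabled are expected to be sharded.

```shell
#!/bin/sh
# Print (not run) the commands that enable sharding on a volume.
# $1: volume name, $2: shard block size (for example 64MB); both placeholders.
print_shard_cmds() {
    volume=$1
    block_size=$2
    echo "gluster volume set ${volume} features.shard on"
    echo "gluster volume set ${volume} features.shard-block-size ${block_size}"
}
```

For example, `print_shard_cmds glsvol 64MB` prints the two `gluster volume set` commands for a volume named glsvol.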

Comment 5 vori003 2016-03-09 06:09:55 UTC

Thank you for the interesting suggestion.

Of course I am looking forward to sharding.
However, to my knowledge, sharding is still "experimental".

I do not have enough information about stable striping vs. experimental sharding
to decide whether to use sharding.

But OK, I will try it, because you, the developers, strongly recommend using sharding.
Our project is still at an early stage, so we can still go back to the test phase.

Comment 6 Niels de Vos 2016-03-09 07:43:26 UTC

Sharding is not experimental anymore. It was when glusterfs-3.7.0 was released, but in the meantime many bug fixes have been included. You are already on 3.7.8 and sharding should be very stable with that.

Comment 7 vori003 2016-03-09 07:54:06 UTC

I got it.
We are now starting to restructure our glusterfs setup with sharding.
Thank you so much for your kind follow-up.

Comment 8 Kaushal 2017-03-08 10:55:48 UTC

This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.