Bug 761803 (GLUSTER-71) - dns failure causing "Transport endpoint is not connected"
Summary: dns failure causing "Transport endpoint is not connected"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-71
Product: GlusterFS
Classification: Community
Component: transport
Version: mainline
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-06-25 07:09 UTC by Basavanagowda Kanur
Modified: 2009-08-04 05:10 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments

Description Basavanagowda Kanur 2009-06-25 07:09:34 UTC
[Migrated from savannah BTS] - bug 26462 [https://savannah.nongnu.org/bugs/index.php?26462]

Wed 06 May 2009 12:00:39 AM GMT, original submission by Erick Tryzelaar <erickt>:

We ran into a problem where one of the machines we intended to use was not in our DNS. When we touch a file, we get this error:

> touch /mnt/glusterfs/foo

touch: setting times of `/mnt/glusterfs/foo': Transport endpoint is not connected

This only happens when we have this specific client .vol:

volume server1
type protocol/client
option transport-type tcp
option remote-host server1
option remote-subvolume write-behind
end-volume

volume server2
type protocol/client
option transport-type tcp
option remote-host server2-does-not-exist
option remote-subvolume write-behind
end-volume

volume stripe1
type cluster/stripe
subvolumes server1 server2
end-volume

volume server3
type protocol/client
option transport-type tcp
option remote-host server3
option remote-subvolume write-behind
end-volume

volume server4
type protocol/client
option transport-type tcp
option remote-host server4
option remote-subvolume write-behind
end-volume

volume stripe2
type cluster/stripe
subvolumes server3 server4
end-volume

volume mirror
type cluster/replicate
subvolumes stripe1 stripe2
end-volume

volume distribute
type cluster/distribute
subvolumes mirror
end-volume
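
For this class of failure, a quick pre-mount check can catch an unresolvable host before GlusterFS ever tries to connect. The sketch below is illustrative only (the awk pattern and use of getent are not part of GlusterFS): it extracts each remote-host from a client volfile and tests whether the name resolves. The volfile content is inlined here; in practice you would feed awk the real client.vol path.

```shell
# Sketch: scan a client .vol for "option remote-host" entries and
# flag any hostname that does not resolve.
volfile='volume server2
  type protocol/client
  option transport-type tcp
  option remote-host server2-does-not-exist
  option remote-subvolume write-behind
end-volume'

echo "$volfile" | awk '/option remote-host/ {print $3}' |
while read -r host; do
    if getent hosts "$host" > /dev/null; then
        echo "ok: $host"
    else
        echo "UNRESOLVED: $host"
    fi
done
```

Run against the client .vol above, this would typically flag server2-does-not-exist (barring wildcard DNS in the environment).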

For completeness, here's the server.vol:

volume posix
type storage/posix
option directory /tmp/gluster
end-volume

volume locks
type features/locks
subvolumes posix
end-volume

volume io-threads
type performance/io-threads
option thread-count 8
subvolumes locks
end-volume

volume read-ahead
type performance/read-ahead
option page-size 1MB
option page-count 4
subvolumes io-threads
end-volume

volume write-behind
type performance/write-behind
option aggregate-size 128KB
option window-size 1MB
subvolumes read-ahead
end-volume

volume server
type protocol/server
option transport-type tcp
option auth.addr.write-behind.allow *
subvolumes write-behind
end-volume

Anyway, if I remove any of the cluster translators, this error doesn't happen. I've attached the glusterfs.log if that helps.

Comment 1 Raghavendra G 2009-07-30 10:24:38 UTC
The bug was traced to replicate.

need_unwind was initialised to 1 in afr_utimens_wind_cbk, causing replicate
to unwind just after the first reply, irrespective of whether it was a success
or a failure.

A patch has been submitted for review at
http://patches.gluster.com/patch/835/

Comment 2 Anand Avati 2009-07-30 16:01:09 UTC
PATCH: http://patches.gluster.com/patch/835 in release-2.0 (afr: fix afr_utimens to wait for success of utimens on at least priv->wait_count children.)

