Bug 761803 (GLUSTER-71)

Summary: dns failure causing "Transport endpoint is not connected"
Product: [Community] GlusterFS
Reporter: Basavanagowda Kanur <gowda>
Component: transport
Assignee: Raghavendra G <raghavendra>
Status: CLOSED CURRENTRELEASE
Severity: low
Priority: low
Version: mainline
CC: gluster-bugs, gowda
Hardware: All
OS: Linux
Doc Type: Bug Fix

Description Basavanagowda Kanur 2009-06-25 07:09:34 UTC
[Migrated from Savannah BTS] - bug 26462 [https://savannah.nongnu.org/bugs/index.php?26462]

Wed 06 May 2009 12:00:39 AM GMT, original submission by Erick Tryzelaar <erickt>:

We ran into a problem where one of the machines we intended to use was not in our DNS. When we touch a file, we get this error:

> touch /mnt/glusterfs/foo

touch: setting times of `/mnt/glusterfs/foo': Transport endpoint is not connected
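
For reference, "Transport endpoint is not connected" is the strerror() text for ENOTCONN, the errno that GlusterFS's protocol/client translator typically returns for operations when it has no connection to its server. A minimal standalone C check (illustrative only, not from the GlusterFS tree):

#include <errno.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
        /* ENOTCONN is the errno behind "Transport endpoint is not connected". */
        printf ("errno %d: %s\n", ENOTCONN, strerror (ENOTCONN));
        return 0;
}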

This only happens when we have this specific client .vol:

volume server1
  type protocol/client
  option transport-type tcp
  option remote-host server1
  option remote-subvolume write-behind
end-volume

volume server2
  type protocol/client
  option transport-type tcp
  option remote-host server2-does-not-exist
  option remote-subvolume write-behind
end-volume

volume stripe1
  type cluster/stripe
  subvolumes server1 server2
end-volume

volume server3
  type protocol/client
  option transport-type tcp
  option remote-host server3
  option remote-subvolume write-behind
end-volume

volume server4
  type protocol/client
  option transport-type tcp
  option remote-host server4
  option remote-subvolume write-behind
end-volume

volume stripe2
  type cluster/stripe
  subvolumes server3 server4
end-volume

volume mirror
  type cluster/replicate
  subvolumes stripe1 stripe2
end-volume

volume distribute
  type cluster/distribute
  subvolumes mirror
end-volume

For completion, here's the server.vol:

volume posix
  type storage/posix
  option directory /tmp/gluster
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume io-threads
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume read-ahead
  type performance/read-ahead
  option page-size 1MB
  option page-count 4
  subvolumes io-threads
end-volume

volume write-behind
  type performance/write-behind
  option aggregate-size 128KB
  option window-size 1MB
  subvolumes read-ahead
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.write-behind.allow *
  subvolumes write-behind
end-volume

Anyway, if I remove any of the cluster translators, this error doesn't happen. I've attached the glusterfs.log if that helps.

Comment 1 Raghavendra G 2009-07-30 10:24:38 UTC
The bug was traced to replicate.

need_unwind was initialised to 1 in afr_utimens_wind_cbk, causing replicate
to unwind just after the first reply, irrespective of whether it was a success
or failure.
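
To make the control flow concrete, here is a minimal, self-contained C sketch of the reply-counting pattern involved. It is not the actual afr_utimens_wind_cbk() source; fake_local_t, wait_count, reply() and the main() driver are illustrative stand-ins:

#include <stdio.h>

typedef struct {
        int call_count;    /* replies still outstanding from children    */
        int success_count; /* children that replied with op_ret == 0     */
        int wait_count;    /* successes required before unwinding early  */
        int unwound;       /* has the original caller been answered yet? */
} fake_local_t;

/* Stand-in for STACK_UNWIND: answer the original caller exactly once. */
static void
reply (fake_local_t *local, int op_ret)
{
        if (local->unwound)
                return;
        local->unwound = 1;
        printf ("unwinding with op_ret = %d\n", op_ret);
}

/* Invoked once per child; op_ret is that child's result for the fop. */
static void
utimens_cbk (fake_local_t *local, int op_ret)
{
        int need_unwind = 0;   /* the bug: this used to start out as 1 */

        if (op_ret == 0)
                local->success_count++;

        local->call_count--;

        /* Unwind early only once enough children have succeeded ...   */
        if (local->success_count >= local->wait_count)
                need_unwind = 1;

        /* ... or when every child has replied (possibly all failures). */
        if (local->call_count == 0)
                need_unwind = 1;

        if (need_unwind)
                reply (local, local->success_count ? 0 : -1);
}

int
main (void)
{
        /* Two replicas; the first reply is the failure from the
           unresolvable host, the second is the success from server1. */
        fake_local_t local = { .call_count = 2, .wait_count = 1 };

        utimens_cbk (&local, -1);  /* server2-does-not-exist: fails      */
        utimens_cbk (&local, 0);   /* server1: succeeds -> fop succeeds  */
        return 0;
}

With the old initialisation (need_unwind starting at 1), the first, failed reply would already have been propagated to the caller, which matches the ENOTCONN the reporter saw from touch.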

A patch has been submitted for review at
http://patches.gluster.com/patch/835/

Comment 2 Anand Avati 2009-07-30 16:01:09 UTC
PATCH: http://patches.gluster.com/patch/835 in release-2.0 (afr: fix afr_utimens to wait for success of utimens on atleast priv->wait_count children.)