[Migrated from savannah BTS] - bug 26462 [https://savannah.nongnu.org/bugs/index.php?26462]

Wed 06 May 2009 12:00:39 AM GMT, original submission by Erick Tryzelaar <erickt>:

We ran into a problem where one of the machines we intended to use happened to not be in our DNS. When we touch a file, we get this error:

> touch /mnt/glusterfs/foo
touch: setting times of `/mnt/glusterfs/foo': Transport endpoint is not connected

This only happens when we have this specific client .vol:

volume server1
  type protocol/client
  option transport-type tcp
  option remote-host server1
  option remote-subvolume write-behind
end-volume

volume server2
  type protocol/client
  option transport-type tcp
  option remote-host server2-does-not-exist
  option remote-subvolume write-behind
end-volume

volume stripe1
  type cluster/stripe
  subvolumes server1 server2
end-volume

volume server3
  type protocol/client
  option transport-type tcp
  option remote-host server3
  option remote-subvolume write-behind
end-volume

volume server4
  type protocol/client
  option transport-type tcp
  option remote-host server4
  option remote-subvolume write-behind
end-volume

volume stripe2
  type cluster/stripe
  subvolumes server3 server4
end-volume

volume mirror
  type cluster/replicate
  subvolumes stripe1 stripe2
end-volume

volume distribute
  type cluster/distribute
  subvolumes mirror
end-volume

For completeness, here's the server.vol:

volume posix
  type storage/posix
  option directory /tmp/gluster
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume io-threads
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume read-ahead
  type performance/read-ahead
  option page-size 1MB
  option page-count 4
  subvolumes io-threads
end-volume

volume write-behind
  type performance/write-behind
  option aggregate-size 128KB
  option window-size 1MB
  subvolumes read-ahead
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.write-behind.allow *
  subvolumes write-behind
end-volume

Anyway, if I remove any of the cluster translators, this error doesn't happen. I've attached the glusterfs.log if that helps.
The bug was traced to replicate (AFR): need_unwind was initialised to 1 in afr_utimens_wind_cbk, causing replicate to unwind right after the first reply, regardless of whether that reply was a success or a failure. A patch has been submitted for review at http://patches.gluster.com/patch/835/
PATCH: http://patches.gluster.com/patch/835 in release-2.0 (afr: fix afr_utimens to wait for success of utimens on atleast priv->wait_count children.)
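The change the patch describes can be illustrated with a simplified sketch (this is not the actual AFR source; the struct, field, and function names below are stand-ins): instead of unwinding on the first reply, as the buggy need_unwind = 1 initialisation effectively did, the callback counts successful replies and only unwinds once at least wait_count children have succeeded, or once every child has replied and the call can be failed for real.

/* Simplified illustration of the fix, not the real afr_utimens_wind_cbk:
 * the types and names here are stand-ins for the actual AFR structures. */

struct demo_local {
        int call_count;     /* replies still outstanding                   */
        int success_count;  /* children that returned success              */
        int wait_count;     /* stand-in for priv->wait_count               */
        int unwound;        /* guard so we unwind exactly once             */
        int op_ret;         /* initialised to -1 before winding            */
        int op_errno;
};

static void
demo_unwind (struct demo_local *local)
{
        /* the real code would STACK_UNWIND to the caller here */
        local->unwound = 1;
}

static int
demo_utimens_wind_cbk (struct demo_local *local, int op_ret, int op_errno)
{
        int need_unwind = 0;   /* the bug: this used to start out as 1 */

        if (op_ret == 0) {
                local->success_count++;
                local->op_ret = 0;
        } else {
                local->op_errno = op_errno;
        }

        local->call_count--;

        /* unwind once enough children have succeeded, or once the last
         * reply is in (so an overall failure is still reported) */
        if (local->success_count >= local->wait_count)
                need_unwind = 1;
        if (local->call_count == 0)
                need_unwind = 1;

        if (need_unwind && !local->unwound)
                demo_unwind (local);

        return 0;
}

With the old initialisation, a failed first reply from the unreachable server2 leg would unwind the whole call with an error even though the remaining children could still succeed, which is consistent with the "Transport endpoint is not connected" behaviour reported above.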