[Migrated from savannah BTS] - bug 26462 [https://savannah.nongnu.org/bugs/index.php?26462]

Wed 06 May 2009 12:00:39 AM GMT, original submission by Erick Tryzelaar <erickt>:

We ran into a problem where one of the machines we intended to use happened to not be in our DNS. When we touch a file, we get this error:

> touch /mnt/glusterfs/foo
touch: setting times of `/mnt/glusterfs/foo': Transport endpoint is not connected

This only happens when we have this specific client .vol:

volume server1
  type protocol/client
  option transport-type tcp
  option remote-host server1
  option remote-subvolume write-behind
end-volume

volume server2
  type protocol/client
  option transport-type tcp
  option remote-host server2-does-not-exist
  option remote-subvolume write-behind
end-volume

volume stripe1
  type cluster/stripe
  subvolumes server1 server2
end-volume

volume server3
  type protocol/client
  option transport-type tcp
  option remote-host server3
  option remote-subvolume write-behind
end-volume

volume server4
  type protocol/client
  option transport-type tcp
  option remote-host server4
  option remote-subvolume write-behind
end-volume

volume stripe2
  type cluster/stripe
  subvolumes server3 server4
end-volume

volume mirror
  type cluster/replicate
  subvolumes stripe1 stripe2
end-volume

volume distribute
  type cluster/distribute
  subvolumes mirror
end-volume

For completeness, here's the server.vol:

volume posix
  type storage/posix
  option directory /tmp/gluster
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume io-threads
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume read-ahead
  type performance/read-ahead
  option page-size 1MB
  option page-count 4
  subvolumes io-threads
end-volume

volume write-behind
  type performance/write-behind
  option aggregate-size 128KB
  option window-size 1MB
  subvolumes read-ahead
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.write-behind.allow *
  subvolumes write-behind
end-volume

Anyway, if I remove any of the cluster translators, this error doesn't happen. I've attached the glusterfs.log if that helps.
The bug was traced to replicate (AFR): need_unwind was initialised to 1 in afr_utimens_wind_cbk, causing replicate to unwind right after the first reply, regardless of whether that reply was a success or a failure. A patch has been submitted for review at http://patches.gluster.com/patch/835/
PATCH: http://patches.gluster.com/patch/835 in release-2.0 (afr: fix afr_utimens to wait for success of utimens on atleast priv->wait_count children.)
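The change the patch describes can be illustrated with a simplified sketch (this is not the actual AFR source; the struct, field, and function names below are stand-ins): instead of unwinding on the first reply, as the buggy need_unwind = 1 initialisation effectively did, the callback counts successful replies and only unwinds once at least wait_count children have succeeded, or once every child has replied and the call can be failed for real.

/* Simplified illustration of the fix, not the real afr_utimens_wind_cbk:
 * the types and names here are stand-ins for the actual AFR structures. */

struct demo_local {
        int call_count;     /* replies still outstanding                   */
        int success_count;  /* children that returned success              */
        int wait_count;     /* stand-in for priv->wait_count               */
        int unwound;        /* guard so we unwind exactly once             */
        int op_ret;         /* initialised to -1 before winding            */
        int op_errno;
};

static void
demo_unwind (struct demo_local *local)
{
        /* the real code would STACK_UNWIND to the caller here */
        local->unwound = 1;
}

static int
demo_utimens_wind_cbk (struct demo_local *local, int op_ret, int op_errno)
{
        int need_unwind = 0;   /* the bug: this used to start out as 1 */

        if (op_ret == 0) {
                local->success_count++;
                local->op_ret = 0;
        } else {
                local->op_errno = op_errno;
        }

        local->call_count--;

        /* unwind once enough children have succeeded, or once the last
         * reply is in (so an overall failure is still reported) */
        if (local->success_count >= local->wait_count)
                need_unwind = 1;
        if (local->call_count == 0)
                need_unwind = 1;

        if (need_unwind && !local->unwound)
                demo_unwind (local);

        return 0;
}

With the old initialisation, a failed first reply from the unreachable server2 leg would unwind the whole call with an error even though the remaining children could still succeed, which is consistent with the "Transport endpoint is not connected" behaviour reported above.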