Bug 764511 (GLUSTER-2779) - If server is down and distribute volume used, writes should not fail
Summary: If server is down and distribute volume used, writes should not fail
Keywords:
Status: CLOSED WONTFIX
Alias: GLUSTER-2779
Product: GlusterFS
Classification: Community
Component: core
Version: pre-2.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Anand Avati
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-04-15 18:26 UTC by Jacob Shucart
Modified: 2015-09-01 23:05 UTC
CC: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Jeff Darcy 2011-04-15 16:03:26 UTC
In a slightly different but related world, this feature is called "hinted handoff" based on an Amazon paper about their Dynamo system.  The key questions are:

* How do clients find the "hinted" location?

* How do the writes propagate to the "correct" server when it comes back up?

* What happens if that server *never* comes back up?

These are significant challenges even within the Dynamo world of atomic key/value pairs and weak consistency.  They're likely to be even more challenging in the context of a POSIX filesystem like GlusterFS, and the solution to the write-propagation issue in particular is likely to require a kind of server-to-server communication that GlusterFS has mostly eschewed until now.  Lastly, I don't think the current elastic hashing implementation in DHT lends itself very well to this.  For CloudFS I have written a translator that takes a more Dynamo-like approach, in which additional or alternative bricks are automatically found and used if the primary brick for a file is down.  That might support hinted handoff a little better, in addition to addressing issues of directory-operation scalability, replication performance, etc.  I'd be glad to collaborate with others on moving this forward, as it's currently stalled behind other CloudFS priorities.
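To make the hinted-handoff idea concrete, here is a minimal sketch of Dynamo-style brick selection. All names (pick_brick, ring_position, the brick list) are illustrative, not GlusterFS or CloudFS APIs; the real DHT uses per-directory hash ranges rather than a bare modulo.

```python
# Sketch of Dynamo-style "hinted handoff" brick selection.
# Hypothetical helper names; not actual GlusterFS/CloudFS code.
import hashlib

def ring_position(name, n):
    """Map a file name to its primary brick index via a stable hash."""
    h = int(hashlib.md5(name.encode()).hexdigest(), 16)
    return h % n

def pick_brick(name, bricks, up):
    """Return (brick, hint).  If the primary brick is down, walk to the
    next live brick and record a hint naming the primary, so the write
    can be handed back ("handoff") when the primary recovers."""
    n = len(bricks)
    primary = ring_position(name, n)
    for step in range(n):
        idx = (primary + step) % n
        if up[bricks[idx]]:
            hint = bricks[primary] if idx != primary else None
            return bricks[idx], hint
    raise IOError("no bricks available")
```

The hint is exactly what raises the three questions above: some directory of hints must exist for clients to find the data, some process must replay hinted writes to the primary, and some policy must decide when to give up on a primary that never returns.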

For now and for the immediate future, I think the solution to this problem will be to run distribute over replicate, so that single-node failures are never visible to DHT.
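A concrete way to run distribute over replicate today is a distributed-replicated volume; server names and brick paths below are placeholders:

```shell
# Four hypothetical servers: "replica 2" pairs consecutive bricks into
# mirrors, and distribute then hashes files across the two mirror pairs.
gluster volume create dist-rep replica 2 transport tcp \
    server1:/exp1 server2:/exp1 \
    server3:/exp2 server4:/exp2
gluster volume start dist-rep
```

With this layout a single down server leaves every file reachable through its replica partner, so DHT never sees the failure.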

Comment 1 Jacob Shucart 2011-04-15 18:26:17 UTC
If you write files to a distributed volume and one of the servers is down, Gluster still attempts to write to that server and gets a "Transport endpoint is not connected" error.  Can we set things up so that files get written instead to a server that is up?

This was with mounting as glusterfs.

Comment 2 Amar Tumballi 2011-07-15 05:48:55 UTC
Jeff's answer is very apt. With the current design of Distribute (hash-based file location), migrating an open fd when a server goes down is not feasible.

The recommended configuration, if the user wants high availability, is to use replicate with distribute at the moment. I will be closing the bug as WONTFIX.
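The "not feasible" point can be illustrated with a toy model of hash-based placement (a bare modulo for illustration; the real DHT uses per-directory layout ranges):

```python
# Toy model of distribute's placement: the brick is a pure function of
# the file name, so there is no per-file table in which to record a
# fallback location.  Illustrative only, not the actual DHT code.
import hashlib

def brick_for(name, bricks):
    h = int(hashlib.sha1(name.encode()).hexdigest(), 16)
    return bricks[h % len(bricks)]
```

If the chosen brick is down, silently writing the file elsewhere would leave it unfindable by the same hash later; and simply dropping the dead brick from the list changes `len(bricks)`, which remaps the placement of most *other* files as well. Either way, lookups break, which is why the fix has to come from a layer below distribute (replicate) rather than from distribute skipping the down server.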

