Bug 764511 (GLUSTER-2779) - If server is down and distribute volume used, writes should not fail
Summary: If server is down and distribute volume used, writes should not fail
Keywords:
Status: CLOSED WONTFIX
Alias: GLUSTER-2779
Product: GlusterFS
Classification: Community
Component: core
Version: pre-2.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Anand Avati
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-04-15 18:26 UTC by Jacob Shucart
Modified: 2015-09-01 23:05 UTC
CC: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Jeff Darcy 2011-04-15 16:03:26 UTC
In a slightly different but related world, this feature is called "hinted handoff" based on an Amazon paper about their Dynamo system.  The key questions are:

* How do clients find the "hinted" location?

* How do the writes propagate to the "correct" server when it comes back up?

* What happens if that server *never* comes back up?

These are significant challenges even within the Dynamo world of atomic key/value pairs and weak consistency.  They're likely to be even more challenging in the context of a POSIX filesystem like GlusterFS, and the solution to the write-propagation issue in particular is likely to require a kind of server-to-server communication that GlusterFS has mostly eschewed until now.  Lastly, I don't think the current elastic hashing implementation in DHT lends itself very well to this.  For CloudFS I have written a translator that takes a more Dynamo-like approach, in which additional or alternative bricks are automatically found and used if the primary brick for a file is down.  That might support hinted handoff a little better, in addition to addressing issues of directory-operation scalability, replication performance, etc.  I'd be glad to collaborate with others on moving this forward, as it's currently stalled behind other CloudFS priorities.
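To make the hinted-handoff idea concrete, here is a minimal sketch of Dynamo-style brick selection. All names (pick_brick, ring_position, the brick list) are illustrative, not GlusterFS or CloudFS APIs; the real DHT uses per-directory hash ranges rather than a bare modulo.

```python
# Sketch of Dynamo-style "hinted handoff" brick selection.
# Hypothetical helper names; not actual GlusterFS/CloudFS code.
import hashlib

def ring_position(name, n):
    """Map a file name to its primary brick index via a stable hash."""
    h = int(hashlib.md5(name.encode()).hexdigest(), 16)
    return h % n

def pick_brick(name, bricks, up):
    """Return (brick, hint).  If the primary brick is down, walk to the
    next live brick and record a hint naming the primary, so the write
    can be handed back ("handoff") when the primary recovers."""
    n = len(bricks)
    primary = ring_position(name, n)
    for step in range(n):
        idx = (primary + step) % n
        if up[bricks[idx]]:
            hint = bricks[primary] if idx != primary else None
            return bricks[idx], hint
    raise IOError("no bricks available")
```

The hint is exactly what raises the three questions above: some directory of hints must exist for clients to find the data, some process must replay hinted writes to the primary, and some policy must decide when to give up on a primary that never returns.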

For now and for the immediate future, I think the solution to this problem will be to run distribute over replicate, so that single-node failures are never visible to DHT.
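A concrete way to run distribute over replicate today is a distributed-replicated volume; server names and brick paths below are placeholders:

```shell
# Four hypothetical servers: "replica 2" pairs consecutive bricks into
# mirrors, and distribute then hashes files across the two mirror pairs.
gluster volume create dist-rep replica 2 transport tcp \
    server1:/exp1 server2:/exp1 \
    server3:/exp2 server4:/exp2
gluster volume start dist-rep
```

With this layout a single down server leaves every file reachable through its replica partner, so DHT never sees the failure.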

Comment 1 Jacob Shucart 2011-04-15 18:26:17 UTC
If you write files to a distributed volume and one of the servers is down, Gluster still attempts to write to that server and gets a "Transport endpoint is not connected" error.  Can we set things up so that files get written instead to a server that is up?

This was with mounting as glusterfs.

Comment 2 Amar Tumballi 2011-07-15 05:48:55 UTC
Jeff's answer is very apt. With the current design of Distribute (hash-based file location), migrating an open fd when a server goes down is not feasible.

The recommended configuration, if the user wants high availability, is to use replicate with distribute at the moment. I will be closing the bug as WONTFIX.
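The "not feasible" point can be illustrated with a toy model of hash-based placement (a bare modulo for illustration; the real DHT uses per-directory layout ranges):

```python
# Toy model of distribute's placement: the brick is a pure function of
# the file name, so there is no per-file table in which to record a
# fallback location.  Illustrative only, not the actual DHT code.
import hashlib

def brick_for(name, bricks):
    h = int(hashlib.sha1(name.encode()).hexdigest(), 16)
    return bricks[h % len(bricks)]
```

If the chosen brick is down, silently writing the file elsewhere would leave it unfindable by the same hash later; and simply dropping the dead brick from the list changes `len(bricks)`, which remaps the placement of most *other* files as well. Either way, lookups break, which is why the fix has to come from a layer below distribute (replicate) rather than from distribute skipping the down server.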

