Bug 764620 (GLUSTER-2888) - "Key Node" problem in DHT
Summary: "Key Node" problem in DHT
Keywords:
Status: CLOSED DUPLICATE of bug 764264
Alias: GLUSTER-2888
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.1.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: shishir gowda
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-05-09 02:18 UTC by hz02ruc
Modified: 2013-12-09 01:24 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: fuse
Documentation: ---
CRM:
Verified Versions:



Description hz02ruc 2011-05-09 02:18:36 UTC
Background:
We use GlusterFS in our system configured with DHT (data loss is OK for us because
we have other copies outside the system). When a server is down, we don't want to
stop our service, and failing to read from or write to a file hashed to the node
that is down is acceptable. In other words, we want to continue our service on the
remaining servers until the failed node is recovered.

Sometimes when a node is down, access from the client (the ls command) is OK, and read/write
also succeeds unless the target file is hashed to the failed node. This is just what
we want.

But there are also cases where, when a node is down, 'ls MOUNTPOINT' outputs
"transport endpoint is not connected", and all reads/writes on the mount fail.

So I wonder whether there is a "key node" in a DHT-only system, and if it exists,
is it a bug? It affects all access to the system.


Below is the procedure by which I found the "key node".

Test environment:
Server: 3 nodes, each with one brick.
Client: configured with DHT.
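
For reference, an equivalent setup can be created roughly as follows; the hostnames
node1/node2/node3, the volume name test-vol and the brick path /export/brick are
assumptions for illustration, not values taken from this report:

  # create a pure distribute (DHT) volume with one brick per node
  gluster volume create test-vol node1:/export/brick node2:/export/brick node3:/export/brick
  gluster volume start test-vol

  # mount it on the client over FUSE
  mount -t glusterfs node1:/test-vol /mnt/glusterfs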

In my environment:
(1)
Kill the glusterfsd process of node2:
'ls MOUNTPOINT' outputs "transport endpoint is not connected", and read/write
also returns errors.

(2)
Kill the glusterfsd process of node1, or node3, or both of them, and leave node2 running:
'ls MOUNTPOINT' runs OK and outputs the filenames stored on node2. File read/write returns OK
if the file is not hashed to a failed node (read/write from/to node1 and node3 failing is
acceptable, because those nodes are down).

In order to find which node is the "key node", I did some debugging and read some of the DHT sources.
I found that after the glusterfsd process of node2 is killed, 'ls MOUNTPOINT' triggers an
invocation of dht_lookup().

In dht_lookup(), the cached subvol of MOUNTPOINT is found first, and the lookup is done on that
cached subvol. In my environment the cached subvol corresponds to node2; because node2 is down,
the lookup returns -1 with errno 107 (transport endpoint is not connected).
dht_revalidate_cbk() then simply UNWINDs the stack and sends the error up to FUSE, and the error
is printed on the command line.
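
To make that code path concrete, below is a minimal, self-contained sketch of the flow
described above. It is not the actual GlusterFS source; struct subvol, subvol_lookup() and
dht_revalidate_root() are simplified stand-ins invented for illustration:

/* Sketch only -- NOT the real GlusterFS DHT code. */
#include <errno.h>
#include <stdio.h>
#include <string.h>

struct subvol {
        const char *name;
        int         up;         /* 0 => glusterfsd on that node was killed */
};

/* Stand-in for the lookup fop sent to a single subvolume. */
static int subvol_lookup(struct subvol *sv)
{
        return sv->up ? 0 : -ENOTCONN;  /* errno 107 when the brick is down */
}

/* Revalidate of '/' as described above: the lookup goes only to the
 * cached subvol, and its failure is unwound straight back to FUSE. */
static int dht_revalidate_root(struct subvol *cached_subvol)
{
        int ret = subvol_lookup(cached_subvol);

        if (ret < 0) {
                /* dht_revalidate_cbk(): no retry on the other subvols,
                 * the error goes back to the application (ls). */
                fprintf(stderr, "lookup of / on %s failed: %s\n",
                        cached_subvol->name, strerror(-ret));
        }
        return ret;
}

int main(void)
{
        struct subvol node2 = { "node2", 0 };   /* scenario (1): node2 is down */

        return dht_revalidate_root(&node2) ? 1 : 0;
}

Run with node2 marked down, the sketch reports the same "transport endpoint is not connected"
error that ls shows on the mountpoint.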

But in situation (2), because the cached subvol (node2) is up, dht_lookup() returns OK,
so the ls command runs normally.


Conclusion:
If dht_lookup() runs on MOUNTPOINT while its cached subvol is down, the operation returns an
error: ls outputs an error, and read/write also fails.



At last:
When the cached subvol is down, if DHT tried the lookup on all the other subvols instead,
we might avoid the "key node" problem.

Removing the brick on the failed node and restarting the client works, but most of the time
we cannot stop our own service that reads from and writes to the client, so that is not acceptable.

I don't know whether my understanding is right; please correct me if there is any error.
If you have any advice on how to solve the problem mentioned at the beginning, please tell me.
Thanks in advance.

Comment 1 shishir gowda 2011-05-09 03:44:29 UTC
Hi,

This bug was fixed as part of bug 764264.
The fix is part of the 3.2 release, and a 3.1 release with this patch will be out soon.

Patches are available here:
release.3-1 http://patches.gluster.com/patch/6729/
release.3-2 http://patches.gluster.com/patch/6728/

With this patch, the lookup will go to all subvols, and not just the cached subvol.

*** This bug has been marked as a duplicate of bug 2532 ***

