Bug 761950 (GLUSTER-218) - NUFA fails if a single volume is missing
Summary: NUFA fails if a single volume is missing
Keywords:
Status: CLOSED DUPLICATE of bug 761846
Alias: GLUSTER-218
Product: GlusterFS
Classification: Community
Component: nufa
Version: mainline
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Amar Tumballi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-08-16 23:11 UTC by Ville Tuulos
Modified: 2015-12-01 16:45 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
NUFA volume file (7.93 KB, text/plain)
2009-08-16 20:12 UTC, Ville Tuulos

Description Ville Tuulos 2009-08-16 20:12:10 UTC
Created attachment 60 [details]
NUFA volume file

Comment 1 Ville Tuulos 2009-08-16 20:13:27 UTC
Fix typo:

"I managed to create other directories on dx26"

should be 

"I managed to create other directories on dx29"

Comment 2 Ville Tuulos 2009-08-16 23:11:16 UTC
One of our nodes (dx26) was forced to shut down due to a hardware failure. The node was part of a GlusterFS/NUFA cluster. I assumed that taking a single node down wouldn't affect the rest of the cluster, since it didn't contain any data and NUFA should use local nodes for new files.
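For reference, a rough sketch of the placement rule being assumed here (a simplification, not the actual NUFA source; the function and its parameters are illustrative):

/* xlator_t is GlusterFS's translator/subvolume handle; declared here only
 * so the sketch is self-contained. */
typedef struct xlator xlator_t;

/* NUFA's placement idea for new files: prefer the configured local
 * subvolume, fall back to the ordinary DHT hashed subvolume otherwise. */
static xlator_t *
nufa_pick_subvol_sketch (xlator_t *local_subvol, xlator_t *hashed_subvol,
                         int local_is_up, int local_has_space)
{
        if (local_is_up && local_has_space)
                return local_subvol;   /* e.g. dx29-vol1 when writing on dx29 */

        return hashed_subvol;          /* may belong to a down node such as dx26 */
}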

However, when I tried to create a directory 'ocsid-29' on another node, dx29, it failed with "Invalid argument" and the following messages in the client log:

[2009-08-16 15:44:08] D [dht-layout.c:101:dht_layout_search] nufa: no subvolume for hash (value) = 2926383828
[2009-08-16 15:44:08] D [dht-helper.c:228:dht_subvol_get_hashed] nufa: could not find subvolume for path=/ocsid-29
[2009-08-16 15:44:08] D [nufa.c:136:nufa_local_lookup_cbk] nufa: no subvolume in layout for path=/ocsid-29
[2009-08-16 15:44:08] W [fuse-bridge.c:432:fuse_entry_cbk] glusterfs-fuse: 157: LOOKUP() /ocsid-29 => -1 (Invalid argument)
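For context, the first two messages come from the DHT layout search that NUFA reuses: the hash of the new name is matched against the per-subvolume ranges recorded in the parent directory's layout, and when the range owned by the missing volume covers that hash, nothing matches. A rough sketch of that idea (a simplification with illustrative names, not the actual dht_layout_search()):

#include <stdint.h>
#include <stddef.h>

typedef struct xlator xlator_t;

/* One entry per subvolume: the contiguous hash range it owns. */
struct layout_entry_sketch {
        uint32_t  start;
        uint32_t  stop;
        xlator_t *subvol;
};

/* If the subvolume that owned the range around 'hash' is down and its
 * range is missing from the parent's layout, nothing matches; the caller
 * logs "no subvolume for hash" and the LOOKUP fails with EINVAL. */
static xlator_t *
layout_search_sketch (struct layout_entry_sketch *entries, int count,
                      uint32_t hash)
{
        int i;

        for (i = 0; i < count; i++) {
                if (entries[i].start <= hash && hash <= entries[i].stop)
                        return entries[i].subvol;
        }

        return NULL;
}

With dx26-vol1 effectively gone from the layout, the range that would have covered hash 2926383828 is simply absent, which matches the "no subvolume in layout for path=/ocsid-29" message above.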

I managed to create other directories on dx26, e.g. "test", but deleting the directory failed with "Transport endpoint is not connected" and the following messages:

[2009-08-16 15:48:43] D [client-protocol.c:2293:client_opendir] dx26-vol1: OPENDIR 70884283247 (/test): failed to get remote inode number
[2009-08-16 15:48:43] D [dht-common.c:3188:dht_rmdir_opendir_cbk] nufa: opendir on dx26-vol1 for /test failed (Transport endpoint is not connected)
[2009-08-16 15:48:43] D [client-protocol.c:1061:client_rmdir] dx26-vol1: RMDIR 1/test (/test): failed to get remote inode number for parent
[2009-08-16 15:48:43] D [dht-common.c:3083:dht_rmdir_cbk] nufa: rmdir on dx26-vol1 for /test failed (Transport endpoint is not connected)
[2009-08-16 15:48:43] D [client-protocol.c:952:client_mkdir] dx26-vol1: MKDIR 1/test (/test): failed to get remote inode number for parent
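These messages fit the way directory operations fan out in DHT/NUFA: directories exist on every subvolume, so rmdir has to succeed on all of them, and a single disconnected subvolume fails the whole call. A rough sketch of that behaviour (remove_dir_on_subvol() is a hypothetical stand-in for the per-subvolume opendir/rmdir sequence seen above):

#include <errno.h>

typedef struct xlator xlator_t;

/* Hypothetical helper standing in for the per-subvolume opendir + rmdir. */
extern int remove_dir_on_subvol (xlator_t *subvol, const char *path);

/* rmdir must succeed on every subvolume; one dead brick (dx26-vol1 here)
 * turns the whole operation into ENOTCONN. */
static int
rmdir_fanout_sketch (xlator_t **subvols, int count, const char *path)
{
        int i, ret;

        for (i = 0; i < count; i++) {
                ret = remove_dir_on_subvol (subvols[i], path);
                if (ret == -ENOTCONN)
                        return -ENOTCONN;
        }

        return 0;
}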

I would assume that NUFA would survive failures of individual nodes, as long as I don't try to read data that has been stored on the failed node. It is rather inconvenient if our whole 30-node NUFA cluster becomes unusable due to the failure of a single volume.

Why does NUFA try to connect to dx26 when I'm creating a directory on dx29 anyway - isn't it supposed to use the volume specified by 'local-volume-name'?

Please find my volume file attached.
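Since the attachment isn't inlined here, a minimal sketch of the relevant part of such a NUFA volume file on dx29 (volume and brick names are illustrative, not taken from the attached file):

volume dx26-vol1
  type protocol/client
  option remote-host dx26
  option remote-subvolume brick
end-volume

volume dx29-vol1
  type protocol/client
  option remote-host dx29
  option remote-subvolume brick
end-volume

# cluster/nufa on dx29: local-volume-name points at the subvolume local to
# this node, but all subvolumes of the cluster are still listed.
volume nufa
  type cluster/nufa
  option local-volume-name dx29-vol1
  subvolumes dx26-vol1 dx29-vol1
end-volume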

Comment 3 Amar Tumballi 2010-03-09 10:29:50 UTC

*** This bug has been marked as a duplicate of bug 114 (now bug 761846) ***

