Created attachment 60 [details] Output of startx when using the installer-generated XF86Config
One of our nodes (dx26) was forced to shut down due to a hardware failure. The node was part of a glusterfs / NUFA cluster. I assumed that disabling a single node would not affect the rest of the cluster, since it didn't contain any data and NUFA should use local nodes for new files. However, when I tried to create a directory 'ocsid-29' on another node, dx29, it failed with "Invalid argument" and the following messages in the client log:

[2009-08-16 15:44:08] D [dht-layout.c:101:dht_layout_search] nufa: no subvolume for hash (value) = 2926383828
[2009-08-16 15:44:08] D [dht-helper.c:228:dht_subvol_get_hashed] nufa: could not find subvolume for path=/ocsid-29
[2009-08-16 15:44:08] D [nufa.c:136:nufa_local_lookup_cbk] nufa: no subvolume in layout for path=/ocsid-29
[2009-08-16 15:44:08] W [fuse-bridge.c:432:fuse_entry_cbk] glusterfs-fuse: 157: LOOKUP() /ocsid-29 => -1 (Invalid argument)

I managed to create other directories on dx29, e.g. "test", but deleting the directory failed with "Transport endpoint is not connected" and the following messages:

[2009-08-16 15:48:43] D [client-protocol.c:2293:client_opendir] dx26-vol1: OPENDIR 70884283247 (/test): failed to get remote inode number
[2009-08-16 15:48:43] D [dht-common.c:3188:dht_rmdir_opendir_cbk] nufa: opendir on dx26-vol1 for /test failed (Transport endpoint is not connected)
[2009-08-16 15:48:43] D [client-protocol.c:1061:client_rmdir] dx26-vol1: RMDIR 1/test (/test): failed to get remote inode number for parent
[2009-08-16 15:48:43] D [dht-common.c:3083:dht_rmdir_cbk] nufa: rmdir on dx26-vol1 for /test failed (Transport endpoint is not connected)
[2009-08-16 15:48:43] D [client-protocol.c:952:client_mkdir] dx26-vol1: MKDIR 1/test (/test): failed to get remote inode number for parent

I would assume that NUFA would survive failures of individual nodes, as long as I don't try to read data that has been stored on the failed node.
It is rather inconvenient if our whole 30-node NUFA cluster becomes unusable due to the failure of a single volume. Why does NUFA try to connect to dx26 when I'm creating a directory on dx29 anyway? Isn't it supposed to use the volume specified by 'local-volume-name'? Please find my volume file attached.
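
To illustrate what the "no subvolume for hash" message might mean, here is a minimal sketch of a hash-range layout lookup. It is only a guess at the mechanism, not the actual dht/nufa code: the function names, the equal-sized ranges, the subvolume names other than dx26-vol1, and the ordering are all made up, and the ordering was chosen so that the hash from the log lands in the range of the offline subvolume.

```python
from typing import Optional

HASH_SPACE = 2**32  # assumption: names hash into a 32-bit space

# A layout entry is (start, stop, subvolume), with an inclusive hash range.
Layout = list[tuple[int, int, str]]


def build_layout(subvolumes: list[str]) -> Layout:
    """Split the hash space into equal contiguous ranges, one per subvolume."""
    span = HASH_SPACE // len(subvolumes)
    layout: Layout = []
    for i, name in enumerate(subvolumes):
        start = i * span
        # The last range absorbs any remainder so the whole space is covered.
        stop = HASH_SPACE - 1 if i == len(subvolumes) - 1 else start + span - 1
        layout.append((start, stop, name))
    return layout


def drop_subvolume(layout: Layout, name: str) -> Layout:
    """Remove a subvolume without reassigning its range, leaving a hole."""
    return [entry for entry in layout if entry[2] != name]


def layout_search(layout: Layout, hash_value: int) -> Optional[str]:
    """Return the subvolume owning hash_value, or None if it falls in a hole."""
    for start, stop, name in layout:
        if start <= hash_value <= stop:
            return name
    return None


# Hypothetical four-node cluster; the order is invented for this illustration.
full = build_layout(["dx27-vol1", "dx28-vol1", "dx26-vol1", "dx29-vol1"])
degraded = drop_subvolume(full, "dx26-vol1")

name_hash = 2926383828  # hash value reported in the client log
print(layout_search(full, name_hash))      # owner while all nodes are up
print(layout_search(degraded, name_hash))  # None: the range has no owner
```

If the layout works roughly like this, a name whose hash falls in the range of the missing subvolume cannot be placed anywhere, regardless of which node is local, while names that hash into other ranges (such as "test") can still be created.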
*** This bug has been marked as a duplicate of bug 114 ***