Bug 1062287 - On breaking the connection between replicated volumes certain files return -ENOTCONN
Summary: On breaking the connection between replicated volumes certain files return -ENOTCONN
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.4.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-02-06 15:15 UTC by Anirban Ghoshal
Modified: 2015-10-07 13:50 UTC (History)
CC List: 2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-10-07 13:49:43 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Anirban Ghoshal 2014-02-06 15:15:41 UTC
Description of problem:

Suppose you have two servers, server1 and server2, in a replica 2 setup. You have:
a) testvol - a replica 2 volume
b) a directory /testvol on each of server1 and server2, where testvol is mounted.

Version-Release number of selected component (if applicable):

3.4.2, 3.4.0 

How reproducible:
Intermittent (more likely with a very high number of soft links and deep directory nesting)

Steps to Reproduce:

1. You have a replica 2 volume (testvol) on server1 and server2. Assume that on each server it is also locally mounted via mount.glusterfs at /testvol.
2. You have a large number of soft-linked files within the volume.
3. You check heal info (all its facets) to ensure not a single file is out of sync.
4. You abruptly take down the ethernet device over which the servers are connected (ip link set <eth-dev> down).

Actual results:

1. On one of the servers (say, server1 for definiteness), if you do an 'ls -l', readlink returns 'Transport endpoint is not connected' for certain soft links.
2. The error resolves all by itself once you bring the eth link back up.
3. The error is intermittent, and not all soft-linked files have the issue.
4. If you take a directory containing soft-linked files and do an 'ls -l' on the directory itself, like so,

server1$ ls -l /testvol/somedir/bin/

ls: cannot read symbolic link /testvol/somedir/bin/reset: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/bzless: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/i386: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/kill: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/linux32: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/linux64: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/logger: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/x86_64: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/python2: Transport endpoint is not connected


5. If, however, you take a faulty soft-link and do an ls -l on it directly, then it rights itself immediately.

server1$ ls -l /testvol/somedir/bin/x86_64
lrwxrwxrwx 1 root root 7 May  7 23:11 /testvol/somedir/bin/x86_64 -> setarch

6. The error also goes away if the volume is remounted on server1.

Expected results:

Steps (5) and (6) in the 'Actual results' section indicate that all the data needed to complete the readlink is actually present on server1. Hence, readlink should succeed.

Additional info:

Comment 1 Anirban Ghoshal 2014-02-06 15:17:33 UTC
I tried raising the client log level to 'trace'. Here's what I saw:

Upon READLINK failures, (ls -l /testvol/somedir/bin/):

[2010-05-09 01:13:28.140265] T [fuse-bridge.c:2453:fuse_readdir_cbk] 0-glusterfs-fuse: 2783484: READDIR => 23/4096,1380
[2010-05-09 01:13:28.140444] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 45
[2010-05-09 01:13:28.140477] T [fuse-bridge.c:708:fuse_getattr_resume] 0-glusterfs-fuse: 2783485: GETATTR 140299577689176 (/testvol/somedir/bin)
[2010-05-09 01:13:28.140618] T [fuse-bridge.c:641:fuse_attr_cbk] 0-glusterfs-fuse: 2783485: STAT() /testvol/somedir/bin => -5626802993936595428
[2010-05-09 01:13:28.140722] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 52
[2010-05-09 01:13:28.140737] T [fuse-bridge.c:506:fuse_lookup_resume] 0-glusterfs-fuse: 2783486: LOOKUP /testvol/somedir/bin/x86_64(025d1c57-865f-4f1f-bc95-96ddcef3dc03)
[2010-05-09 01:13:28.140851] T [fuse-bridge.c:376:fuse_entry_cbk] 0-glusterfs-fuse: 2783486: LOOKUP() /testvol/somedir/bin/x86_64 => -4857810743645185021
[2010-05-09 01:13:28.140954] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 52
[2010-05-09 01:13:28.140975] T [fuse-bridge.c:1296:fuse_readlink_resume] 0-glusterfs-fuse: 2783487 READLINK /testvol/somedir/bin/x86_64/025d1c57-865f-4f1f-bc95-96ddcef3dc03 
[2010-05-09 01:13:28.141090] D [afr-common.c:760:afr_get_call_child] 0-testvol-replicate-0: Returning -107, call_child: -1, last_index: -1
[2010-05-09 01:13:28.141120] W [fuse-bridge.c:1271:fuse_readlink_cbk] 0-glusterfs-fuse: 2783487: /testvol/somedir/bin/x86_64 => -1 (Transport endpoint is not connected)

Upon successful readlink (ls -l /testvol/somedir/bin/x86_64):

[2010-05-09 01:13:37.717904] T [fuse-bridge.c:376:fuse_entry_cbk] 0-glusterfs-fuse: 2790073: LOOKUP() /testvol/somedir/bin => -5626802993936595428
[2010-05-09 01:13:37.718070] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 52
[2010-05-09 01:13:37.718127] T [fuse-bridge.c:506:fuse_lookup_resume] 0-glusterfs-fuse: 2790074: LOOKUP /testvol/somedir/bin/x86_64(025d1c57-865f-4f1f-bc95-96ddcef3dc03)
[2010-05-09 01:13:37.718306] D [afr-common.c:131:afr_lookup_xattr_req_prepare] 0-testvol-replicate-0: /testvol/somedir/bin/x86_64: failed to get the gfid from dict
[2010-05-09 01:13:37.718355] T [rpc-clnt.c:1301:rpc_clnt_record] 0-testvol-client-1: Auth Info: pid: 3343, uid: 0, gid: 0, owner: 0000000000000000
[2010-05-09 01:13:37.718383] T [rpc-clnt.c:1181:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 464, payload: 368, rpc hdr: 96
[2010-05-09 01:13:37.718459] T [rpc-clnt.c:1498:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x1527955x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (testvol-client-1)
[2010-05-09 01:13:37.718775] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-testvol-client-1: received rpc message (RPC XID: 0x1527955x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (testvol-client-1)
[2010-05-09 01:13:37.718848] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-testvol-replicate-0: pending_matrix: [ 0 0 ]
[2010-05-09 01:13:37.718884] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-testvol-replicate-0: pending_matrix: [ 1 0 ]
[2010-05-09 01:13:37.718893] D [afr-self-heal-common.c:887:afr_mark_sources] 0-testvol-replicate-0: Number of sources: 1
[2010-05-09 01:13:37.718903] D [afr-self-heal-data.c:814:afr_lookup_select_read_child_by_txn_type] 0-testvol-replicate-0: returning read_child: 1
[2010-05-09 01:13:37.718916] D [afr-common.c:1397:afr_lookup_select_read_child] 0-testvol-replicate-0: Source selected as 1 for /testvol/somedir/bin/x86_64
[2010-05-09 01:13:37.718927] D [afr-common.c:1134:afr_lookup_build_response_params] 0-testvol-replicate-0: Building lookup response from 1
[2010-05-09 01:13:37.718954] D [afr-common.c:1749:afr_lookup_perform_self_heal] 0-testvol-replicate-0: Only 1 child up - do not attempt to detect self heal
[2010-05-09 01:13:37.718970] T [io-cache.c:224:ioc_lookup_cbk] 0-testvol-io-cache: locked inode(0x7f9a057d3fe0)
[2010-05-09 01:13:37.718981] T [io-cache.c:233:ioc_lookup_cbk] 0-testvol-io-cache: unlocked inode(0x7f9a057d3fe0)
[2010-05-09 01:13:37.718991] T [io-cache.c:128:ioc_inode_flush] 0-testvol-io-cache: locked inode(0x7f9a057d3fe0)
[2010-05-09 01:13:37.719001] T [io-cache.c:132:ioc_inode_flush] 0-testvol-io-cache: unlocked inode(0x7f9a057d3fe0)
[2010-05-09 01:13:37.719010] T [io-cache.c:242:ioc_lookup_cbk] 0-testvol-io-cache: locked table(0x668ec0)
[2010-05-09 01:13:37.719020] T [io-cache.c:247:ioc_lookup_cbk] 0-testvol-io-cache: unlocked table(0x668ec0)
[2010-05-09 01:13:37.719050] T [fuse-bridge.c:376:fuse_entry_cbk] 0-glusterfs-fuse: 2790074: LOOKUP() /testvol/somedir/bin/x86_64 => -4857810743645185021
[2010-05-09 01:13:37.719239] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 52
[2010-05-09 01:13:37.719291] T [fuse-bridge.c:1296:fuse_readlink_resume] 0-glusterfs-fuse: 2790075 READLINK /testvol/somedir/bin/x86_64/025d1c57-865f-4f1f-bc95-96ddcef3dc03
[2010-05-09 01:13:37.719426] D [afr-common.c:760:afr_get_call_child] 0-testvol-replicate-0: Returning 0, call_child: 1, last_index: -1
[2010-05-09 01:13:37.719468] T [rpc-clnt.c:1301:rpc_clnt_record] 0-testvol-client-1: Auth Info: pid: 3343, uid: 0, gid: 0, owner: 0000000000000000
[2010-05-09 01:13:37.719485] T [rpc-clnt.c:1181:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 120, payload: 24, rpc hdr: 96
[2010-05-09 01:13:37.719561] T [rpc-clnt.c:1498:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x1527956x Program: GlusterFS 3.3, ProgVers: 330, Proc: 2) to rpc-transport (testvol-client-1)
[2010-05-09 01:13:37.719728] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-testvol-client-1: received rpc message (RPC XID: 0x1527956x Program: GlusterFS 3.3, ProgVers: 330, Proc: 2) from rpc-transport (testvol-client-1)
[2010-05-09 01:13:37.719780] T [fuse-bridge.c:1265:fuse_readlink_cbk] 0-glusterfs-fuse: 2790075: /testvol/somedir/bin/x86_64 => setarch



I cannot quite figure out how the whole thing works, but it looks like in the failure case we end up with the wrong set of read_child(ren). If we first take a look at fuse_lookup_cbk(), which I reproduce here,


static int
fuse_lookup_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                 int32_t op_ret, int32_t op_errno,
                 inode_t *inode, struct iatt *stat, dict_t *dict,
                 struct iatt *postparent)
{
        fuse_state_t            *state = NULL;
        call_frame_t            *prev = NULL;
        inode_table_t           *itable = NULL;

        state = frame->root->state;
        prev  = cookie;

        if (op_ret == -1 && state->is_revalidate == 1) {
                itable = state->itable;
                inode_unref (state->loc.inode);
                state->loc.inode = inode_new (itable);
                state->is_revalidate = 2;
                if (uuid_is_null (state->gfid))
                        uuid_generate (state->gfid);
                fuse_gfid_set (state);

                STACK_WIND (frame, fuse_lookup_cbk,
                            prev->this, prev->this->fops->lookup,
                            &state->loc, state->xdata);
                return 0;
        }

        fuse_entry_cbk (frame, cookie, this, op_ret, op_errno, inode, stat,
                        dict);
        return 0;
}


then one can see that a resolve failure causing op_ret to be -1 (note: from the logs, is_revalidate == 1 in both cases) would cause the mount program to refresh the inode context. In fact, this is where the flow diverges between the success and the failure scenarios. I am inclined to think that at this point a soft-link on which readlink fails (on the mount on server1) has its read-child set to the afr-client for server2. Thus, in theory, a refresh of the context would be necessary.

Now, if we take a look at the actual failure log:

[2010-05-09 01:13:28.141090] D [afr-common.c:760:afr_get_call_child] 0-testvol-replicate-0: Returning -107, call_child: -1, last_index: -1
[2010-05-09 01:13:28.141120] W [fuse-bridge.c:1271:fuse_readlink_cbk] 0-glusterfs-fuse: 2783487: /testvol/somedir/bin/x86_64 => -1 (Transport endpoint is not connected)

Here, if we look at the code snippet below from afr_get_call_child(), it looks like there might be a problem (or maybe I am just not reading it right):

        if (child_up[read_child]) {
                *call_child = read_child;
        } else {
                for (i = 0; i < priv->child_count; i++) {
                        if (fresh_children[i] == -1)
                                break;
                        if (child_up[fresh_children[i]]) {
                                *call_child = fresh_children[i];
                                ret = 0;
                                break;
                        }
                }

                if (*call_child == -1) {
                        ret = -ENOTCONN;
                        goto out;
                }

                *last_index = i;
        }
out:
        gf_log (this->name, GF_LOG_DEBUG, "Returning %d, call_child: %d, "
                "last_index: %d", ret, *call_child, *last_index);
        return ret;


I would suppose 'child_up[read_child]' should equal 0, in which case we would loop over all children. This is where I am confused. If we read a link on server1, and server1 allows me to read so many other files correctly, then surely all subvolumes on server1 MUST be active? But then, we return -ENOTCONN only when it is discovered that not one of the subvolumes is active!
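
To make the suspected state concrete, here is a small stand-alone illustration of the selection logic above. This is NOT GlusterFS code; the array contents are my assumption about the failing case, namely that the inode's cached read_child and fresh_children were populated from the server2 brick before the link went down and were never refreshed:

#include <errno.h>
#include <stdio.h>

/* Stand-alone illustration of the afr_get_call_child() selection logic.
 * Assumed state: brick 0 (server1) is up, brick 1 (server2) is down, and
 * the inode's cached read_child/fresh_children still name only brick 1. */
int
main (void)
{
        int child_up[2]       = { 1, 0 };   /* server1 up, server2 down */
        int fresh_children[2] = { 1, -1 };  /* only the (now down) brick 1 */
        int read_child        = 1;          /* stale: still points at brick 1 */
        int child_count       = 2;
        int call_child        = -1;
        int ret               = 0;
        int i                 = 0;

        if (child_up[read_child]) {
                call_child = read_child;
        } else {
                for (i = 0; i < child_count; i++) {
                        if (fresh_children[i] == -1)
                                break;
                        if (child_up[fresh_children[i]]) {
                                call_child = fresh_children[i];
                                break;
                        }
                }
                if (call_child == -1)
                        ret = -ENOTCONN;
        }

        /* Prints "ret = -107, call_child = -1", matching the trace above,
         * even though brick 0 on server1 is perfectly reachable. */
        printf ("ret = %d, call_child = %d\n", ret, call_child);
        return 0;
}

If this assumption holds, the -ENOTCONN does not mean that no subvolume is reachable; it means that the stale read-child information for this particular inode never lists the surviving brick, which would explain why a fresh lookup (step 5) or a remount fixes it.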

Comment 2 Anirban Ghoshal 2014-02-06 15:19:36 UTC
(In reply to Anirban Ghoshal from comment #1)
> [comment #1 quoted in full]

Just to avoid confusion: these trace logs are from 3.4.0 (alpha), where I first discovered the issue. However, the behavior is identical on 3.4.2.

Comment 3 Anirban Ghoshal 2014-02-11 09:39:26 UTC
Not sure if this matters to you in any way, but our Linux kernel and GNU are a bit old. The Linux kernel version is 2.6.34, while the GNU version is 1.17.

Comment 4 Anirban Ghoshal 2014-02-20 23:54:30 UTC
OK, here's the deal on this one - I managed to debug this. The whole deal is with the fact that I am not using a Red Hat distro, and I do not have Anand Avati's patch(es) for READDIRPLUS in FUSE. I ran the tests on a set of new RHEL-equipped boxes, and I found that the issue is not present on those platforms.

So, here is my conclusion:

In fuse_readdirp_cbk(), we have code that looks like it's derived from fuse_entry_cbk(), but with a slight difference:

static int
fuse_readdirp_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                   int32_t op_ret, int32_t op_errno, gf_dirent_t *entries,
                   dict_t *xdata)
{
.
.
.
                if (!linked_inode)
                        continue;

                inode_lookup (linked_inode);

                feo->nodeid = inode_to_fuse_nodeid (linked_inode);

                fuse_inode_set_need_lookup (linked_inode, this);

                inode_unref (linked_inode);

                feo->entry_valid =
                        calc_timeout_sec (priv->entry_timeout);
                feo->entry_valid_nsec =
                        calc_timeout_nsec (priv->entry_timeout);
                feo->attr_valid =
                        calc_timeout_sec (priv->attribute_timeout);
                feo->attr_valid_nsec =
                        calc_timeout_nsec (priv->attribute_timeout);

.
.
.

The same set of calls is present in the legacy procedure, fuse_entry_cbk(), except for fuse_inode_set_need_lookup (linked_inode, this). This, it seems, is what causes the inode context (and read_child info) to be refreshed by the time we reach READLINK. I tested by adding a call to fuse_inode_set_need_lookup() in fuse_entry_cbk(), and it works perfectly fine with 3.4.2 on legacy platforms where FUSE does not support READDIRPLUS yet.

It looks like fuse_inode_set_need_lookup() was created with the intention of refreshing the inode context during READLINK, and somehow we missed porting it back to fuse_entry_cbk()? If everyone agrees with this approach, it seems a simple enough change to patch. I could even do it myself!
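
For reference, the change I tested looks roughly like the following. This is only an abridged sketch of the success path of fuse_entry_cbk() in fuse-bridge.c (the elided parts are unchanged), and the exact placement of the new call is my assumption, chosen to mirror fuse_readdirp_cbk() above:

static int
fuse_entry_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                int32_t op_ret, int32_t op_errno,
                inode_t *inode, struct iatt *buf, dict_t *xdata)
{
.
.
.
                inode_lookup (linked_inode);

                /* proposed addition: mark the inode so that the next fop on
                 * it is preceded by a fresh LOOKUP, exactly as
                 * fuse_readdirp_cbk() does; this lets AFR re-select a live
                 * read child instead of reusing the stale one */
                fuse_inode_set_need_lookup (linked_inode, this);

                inode_unref (linked_inode);
.
.
.
}

With this in place on my setup, the inode that plain READDIR populated gets re-looked-up before the READLINK, so afr_get_call_child() sees fresh read-child information and picks the brick that is still up.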

Comment 5 Niels de Vos 2015-05-17 21:59:58 UTC
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained, at the moment these are 3.6 and 3.5.

This bug has been filed against the 3.4 release, and will not get fixed in a 3.4 version any more. Please verify whether newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. In case updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" field below the comment box to "bugs".

If there is no response by the end of the month, this bug will get automatically closed.

Comment 6 Kaleb KEITHLEY 2015-10-07 13:49:43 UTC
GlusterFS 3.4.x has reached end-of-life.

If this bug still exists in a later release please reopen this and change the version or open a new bug.

Comment 7 Kaleb KEITHLEY 2015-10-07 13:50:53 UTC
GlusterFS 3.4.x has reached end-of-life.

If this bug still exists in a later release please reopen this and change the version or open a new bug.

