Bug 763373 (GLUSTER-1641) - Distribute must return error when dir selfheal has no fix
Summary: Distribute must return error when dir selfheal has no fix
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-1641
Product: GlusterFS
Classification: Community
Component: distribute
Version: nfs-alpha
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Shehjar Tikoo
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: GLUSTER-1643
Reported: 2010-09-18 11:29 UTC by Shehjar Tikoo
Modified: 2015-12-01 16:45 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Regression: RTP
Mount Type: nfs
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Shehjar Tikoo 2010-09-18 08:31:33 UTC
Complete log at: dev:/share/tickets/1641

Comment 1 Shehjar Tikoo 2010-09-18 08:33:44 UTC
Adding Amar for more advice.

Comment 2 Shehjar Tikoo 2010-09-18 11:29:55 UTC
General remarks first:
o Setting to Blocker because Harsha is facing problems with customer setup w/ a deadline.
o Seen on nfs beta rc11.


The first problem is that when distribute subvolumes are down and a top-level translator performs a LOOKUP on root, dht self-heal does not return an error to the root xlator even when dht realizes there is a problem. For example, see the log lines below:


[2010-09-18 04:29:05] D [dht-layout.c:593:dht_layout_normalize] distribute: path=/ err=Transport endpoint is not connected on subvol=10.1.100.204-10
[2010-09-18 04:29:05] D [dht-layout.c:593:dht_layout_normalize] distribute: path=/ err=Transport endpoint is not connected on subvol=10.1.100.204-11
[2010-09-18 04:29:05] D [dht-layout.c:593:dht_layout_normalize] distribute: path=/ err=Transport endpoint is not connected on subvol=10.1.100.204-12
[2010-09-18 04:29:05] D [dht-common.c:168:dht_lookup_dir_cbk] distribute: fixing assignment on /
[2010-09-18 04:29:05] D [dht-selfheal.c:487:dht_selfheal_directory] distribute: 36 subvolumes down -- not fixing
#########################################################
The layout cannot be fixed, and dht says so, but it does not return an error to nfs, which goes on assuming that the lookup succeeded and exports the subvolume as normal.
##########################################################
[2010-09-18 04:29:05] T [nfs.c:234:nfs_start_subvol_lookup_cbk] nfs: Started distribute

The code block to blame is in dht-selfheal.c:dht_selfheal_directory:487:
        if (down) {
                gf_log (this->name, GF_LOG_DEBUG,
                        "%d subvolumes down -- not fixing", down);
                ret = 0; /* Must change to error? */
                goto sorry_no_fix;
        }
.......
.......
.......
sorry_no_fix:
        /* TODO: need to put appropriate local->op_errno */
        dht_selfheal_dir_finish (frame, this, ret);

Comment 3 Harshavardhana 2010-09-18 22:14:03 UTC
http://dev.gluster.com/~shehjart/nfs-export-on-root-lookup-success.mbox - patch fixes the self-heal issue which was seen before.

Comment 4 Vijay Bellur 2010-09-22 08:14:22 UTC
PATCH: http://patches.gluster.com/patch/4919 in master (distribute: Return ESTALE when dir selfheal finds no fix)
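
For readers without access to the patch, here is a minimal sketch of the behaviour its title describes: when any subvolume is down, the no-fix path reports failure with ESTALE instead of success. This is not the actual patch. The struct selfheal_result and the function selfheal_directory_sketch are hypothetical stand-ins for the real dht frame/local state, and the sketch only models the "subvolumes down" branch quoted in Comment 2.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the result dht hands back to its parent
 * translator; the real code carries this in its frame-local state. */
struct selfheal_result {
        int op_ret;   /* 0 on success, -1 on failure */
        int op_errno; /* errno value reported alongside a failure */
};

/* Simplified model of the "subvolumes down" branch of
 * dht_selfheal_directory. 'down' is the number of unreachable subvolumes. */
static struct selfheal_result
selfheal_directory_sketch (int down)
{
        struct selfheal_result res = { 0, 0 };

        assert (down >= 0);

        if (down) {
                fprintf (stderr, "%d subvolumes down -- not fixing\n", down);
                /* Assumed fix: report the failure instead of ret = 0,
                 * so the caller does not treat the lookup as successful. */
                res.op_ret = -1;
                res.op_errno = ESTALE;
        }

        return res;
}
```

With this shape, a caller such as the nfs translator can check op_ret before treating the root lookup as successful, rather than starting the export unconditionally.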

