| Summary: | Distribute must return error when dir selfheal has no fix | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Shehjar Tikoo <shehjart> |
| Component: | distribute | Assignee: | Shehjar Tikoo <shehjart> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | low | ||
| Version: | nfs-alpha | CC: | amarts, fharshav, gluster-bugs |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | --- | |
| Regression: | RTP | Mount Type: | nfs |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | |||
| Bug Blocks: | 763375 | ||
|
Description
Shehjar Tikoo
2010-09-18 08:31:33 UTC
Adding Amar for more advice. General remarks first:
o Setting to Blocker because Harsha is facing problems with customer setup w/ a deadline.
o Seen on nfs beta rc11.
The first problem is that when distribute subvolumes are down and a top-level translator performs a LOOKUP on root, dht self-heal does not return an error to the root xlator even when dht realizes there is a problem. For example, see the log lines below:
[2010-09-18 04:29:05] D [dht-layout.c:593:dht_layout_normalize] distribute: path=/ err=Transport endpoint is not connected on subvol=10.1.100.204-10
[2010-09-18 04:29:05] D [dht-layout.c:593:dht_layout_normalize] distribute: path=/ err=Transport endpoint is not connected on subvol=10.1.100.204-11
[2010-09-18 04:29:05] D [dht-layout.c:593:dht_layout_normalize] distribute: path=/ err=Transport endpoint is not connected on subvol=10.1.100.204-12
[2010-09-18 04:29:05] D [dht-common.c:168:dht_lookup_dir_cbk] distribute: fixing assignment on /
[2010-09-18 04:29:05] D [dht-selfheal.c:487:dht_selfheal_directory] distribute: 36 subvolumes down -- not fixing
#########################################################
The layout cannot be fixed, and dht says so, but it does not return an error to nfs, which continues as if the lookup succeeded and exports the subvolume as normal.
##########################################################
[2010-09-18 04:29:05] T [nfs.c:234:nfs_start_subvol_lookup_cbk] nfs: Started distribute
The code block to blame is in dht-selfheal.c:dht_selfheal_directory:487:

        if (down) {
                gf_log (this->name, GF_LOG_DEBUG,
                        "%d subvolumes down -- not fixing", down);
                ret = 0;   /* Must change to error? */
                goto sorry_no_fix;
        }
        .......
        .......
        .......
sorry_no_fix:
        /* TODO: need to put appropriate local->op_errno */
        dht_selfheal_dir_finish (frame, this, ret);
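
A minimal, self-contained sketch of the behaviour being asked for, not the actual patch: when any subvolume is down, the self-heal path reports failure with an errno instead of returning 0. The struct and both function names here are hypothetical stand-ins and do not exist in GlusterFS; ESTALE is taken from the patch title in this report.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the op_ret/op_errno pair a translator reports. */
struct heal_status {
        int op_ret;    /* 0 on success, -1 on failure */
        int op_errno;  /* meaningful only when op_ret is -1 */
};

/* Models only the "subvolumes down" branch of dht_selfheal_directory. */
static struct heal_status
selfheal_directory_sketch (int down)
{
        struct heal_status st = { 0, 0 };

        if (down) {
                fprintf (stderr, "%d subvolumes down -- not fixing\n", down);
                st.op_ret   = -1;      /* previously: ret = 0 */
                st.op_errno = ESTALE;  /* errno named in the patch title */
        }
        return st;
}

/* Hypothetical caller check: export only if the root lookup succeeded. */
static int
should_export_sketch (struct heal_status st)
{
        return st.op_ret == 0;
}
```

With 36 subvolumes down, as in the log above, the sketch yields op_ret = -1 and op_errno = ESTALE, so a caller that checks op_ret would not go on to export the subvolume.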
http://dev.gluster.com/~shehjart/nfs-export-on-root-lookup-success.mbox - patch fixes the self-heal issue which was seen before.
PATCH: http://patches.gluster.com/patch/4919 in master (distribute: Return ESTALE when dir selfheal finds no fix) |