Bug 1236032

Summary: Tiering: unlink failed with error "Invalid argument"
Product: [Community] GlusterFS
Reporter: Mohammed Rafi KC <rkavunga>
Component: tiering
Assignee: Mohammed Rafi KC <rkavunga>
Status: CLOSED CURRENTRELEASE
QA Contact: bugs <bugs>
Severity: urgent
Priority: urgent
Version: mainline
CC: bugs, dlambrig, josferna, nbalacha, rgowdapp, sankarshan
Target Milestone: ---
Keywords: Reopened, Triaged
Target Release: ---
Hardware: All
OS: All
Fixed In Version: glusterfs-3.8rc2
Doc Type: Known Issue
Doc Text:
Cause:
Consequence:
Workaround (if any): for a tiered volume, mount using the option -o use-readdirp=no
Result:
Clone Of:
: 1266880 1271732
Last Closed: 2016-06-16 13:17:09 UTC
Type: Bug
Bug Depends On:
Bug Blocks: 1266880, 1271731, 1271732

Description Mohammed Rafi KC 2015-06-26 11:57:15 UTC
Description of problem:

Unlink fails after attaching a tier to a volume that already contains files or directories.


Version-Release number of selected component (if applicable):

master


How reproducible:

100%

Steps to Reproduce:
1. Create a distributed volume
2. Start it and create some files
3. Attach a tier (enable ctr, etc.)
4. Remove some files from the mount point

Actual results:

Unlink fails with "Invalid argument"

Expected results:

Unlink should succeed

Additional info:

Unlink fails because the dht translator has no cached subvolume for that inode in its inode ctx.

Comment 1 Mohammed Rafi KC 2015-06-30 09:39:51 UTC
Changing the steps to reproduce:


Steps to Reproduce:
1. Create a distributed volume
2. Start it and create some files
3. Attach a tier (enable ctr, etc.)
4. Do ls on the mount point
5. Remove some files from the mount point

Comment 2 Joseph Elwin Fernandes 2015-07-06 07:35:47 UTC
This issue is caused by a NULL cached subvolume (cached_subvol) in the hot-dht xlator, which sits beneath the tiering translator. I discussed this with Dan; he said he has a fix, since he has already dealt with the same issue for other FOPs. The issue also occurs for getxattr of "trusted.distribute.linkinfo".


Backtrace with a breakpoint at dht_unlink_linkfile_cbk:

(gdb) bt
#0  dht_unlink_linkfile_cbk (frame=0x7fddb40086dc, cookie=0x7fddb400796c, this=0x7fddbc0162d0, op_ret=-1, 
    op_errno=22, preparent=0x0, postparent=0x0, xdata=0x0) at dht-common.c:2403
#1  0x00007fddc9598b5a in dht_unlink (frame=0x7fddb400796c, this=0x7fddbc015510, loc=0x7fddb400625c, xflag=0, 
    xdata=0x0) at dht-common.c:5208
#2  0x00007fddc9598798 in dht_unlink (frame=0x7fddb40086dc, this=0x7fddbc0162d0, loc=0x7fddb400625c, xflag=0, 
    xdata=0x0) at dht-common.c:5196
#3  0x00007fddd6af96c9 in default_unlink (frame=0x7fddb40086dc, this=0x7fddbc017b70, loc=0x7fddb400625c, xflag=0, 
    xdata=0x0) at defaults.c:1910
#4  0x00007fddd6af96c9 in default_unlink (frame=0x7fddb40086dc, this=0x7fddbc0189b0, loc=0x7fddb400625c, xflag=0, 
    xdata=0x0) at defaults.c:1910
#5  0x00007fddd6af96c9 in default_unlink (frame=0x7fddb40086dc, this=0x7fddbc019720, loc=0x7fddb400625c, xflag=0, 
    xdata=0x0) at defaults.c:1910
#6  0x00007fddd6af5364 in default_unlink_resume (frame=0x7fddb40058ec, this=0x7fddbc01a550, loc=0x7fddb400625c, 
    xflag=0, xdata=0x0) at defaults.c:1469
#7  0x00007fddd6b16817 in call_resume_wind (stub=0x7fddb400621c) at call-stub.c:2083
#8  0x00007fddd6b1ef1e in call_resume (stub=0x7fddb400621c) at call-stub.c:2571
#9  0x00007fddc8d21a58 in open_and_resume (this=0x7fddbc01a550, fd=0x0, stub=0x7fddb400621c) at open-behind.c:242
#10 0x00007fddc8d2468f in ob_unlink (frame=0x7fddb40058ec, this=0x7fddbc01a550, loc=0x7fddb40016e0, xflags=0, 
    xdata=0x0) at open-behind.c:768
#11 0x00007fddc8b12c6b in mdc_unlink (frame=0x7fddb40056fc, this=0x7fddbc01b310, loc=0x7fddb40016e0, xflag=0, 
    xdata=0x0) at md-cache.c:1205
#12 0x00007fddc88fda67 in io_stats_unlink (frame=0x7fddb40055fc, this=0x7fddbc01c0d0, loc=0x7fddb40016e0, xflag=0, 
    xdata=0x0) at io-stats.c:2002
#13 0x00007fddd6af96c9 in default_unlink (frame=0x7fddb40055fc, this=0x7fddbc01d170, loc=0x7fddb40016e0, xflag=0, 
    xdata=0x0) at defaults.c:1910
#14 0x00007fddcdef230b in fuse_unlink_resume (state=0x7fddb40016c0) at fuse-bridge.c:1568
#15 0x00007fddcdeebe47 in fuse_fop_resume (state=0x7fddb40016c0) at fuse-bridge.c:536
#16 0x00007fddcdee9b49 in fuse_resolve_done (state=0x7fddb40016c0) at fuse-resolve.c:637
#17 0x00007fddcdee9c1f in fuse_resolve_all (state=0x7fddb40016c0) at fuse-resolve.c:664
#18 0x00007fddcdee9b2a in fuse_resolve (state=0x7fddb40016c0) at fuse-resolve.c:628
#19 0x00007fddcdee9bf6 in fuse_resolve_all (state=0x7fddb40016c0) at fuse-resolve.c:660
#20 0x00007fddcdee9c7d in fuse_resolve_continue (state=0x7fddb40016c0) at fuse-resolve.c:680
#21 0x00007fddcdee9041 in fuse_resolve_parent (state=0x7fddb40016c0) at fuse-resolve.c:290
#22 0x00007fddcdee9afa in fuse_resolve (state=0x7fddb40016c0) at fuse-resolve.c:621
#23 0x00007fddcdee9ba1 in fuse_resolve_all (state=0x7fddb40016c0) at fuse-resolve.c:653
#24 0x00007fddcdee9cbb in fuse_resolve_and_resume (state=0x7fddb40016c0, fn=0x7fddcdef1e94 <fuse_unlink_resume>)
    at fuse-resolve.c:692
#25 0x00007fddcdef240c in fuse_unlink (this=0x20a0be0, finh=0x7fddb4008d30, msg=0x7fddb4008d58)
    at fuse-bridge.c:1582
#26 0x00007fddcdf02087 in fuse_thread_proc (data=0x20a0be0) at fuse-bridge.c:4879
#27 0x00007fddd593652a in start_thread () from /lib64/libpthread.so.0
#28 0x00007fddd528522d in clone () from /lib64/libc.so.6


The point to note here is that cached_subvol is NULL, and hence op_errno is set to EINVAL (22):

#1  0x00007fddc9598b5a in dht_unlink (frame=0x7fddb400796c, this=0x7fddbc015510, loc=0x7fddb400625c, xflag=0, 
    xdata=0x0) at dht-common.c:5208
5208	        DHT_STACK_UNWIND (unlink, frame, -1, op_errno, NULL, NULL, NULL);
(gdb) p hashed_subvol
$8 = (xlator_t *) 0x7fddbc012f70

(gdb) p cached_subvol
$3 = (xlator_t *) 0x0

(gdb) p op_errno
$4 = 22

(gdb) p local
$6 = (dht_local_t *) 0x7fddb4008e5c

(gdb) p local->cached_subvol
$7 = (xlator_t *) 0x0
(gdb)

(gdb) p this->name
$9 = 0x7fddbc00c6a0 "test-hot-dht"
(gdb) 



dht_local_init (frame, loc, NULL, GF_FOP_UNLINK), called at line 5170 of dht-common.c, fails to populate cached_subvol. dht_subvol_get_cached() appears to be broken for the hot-dht xlator.

Stepping into dht_subvol_get_cached:
dht_subvol_get_cached (this=0x7fddbc015510, inode=0x7fddbc03b70c) at dht-helper.c:626
626	        dht_layout_t *layout = NULL;
(gdb) bt
#0  dht_subvol_get_cached (this=0x7fddbc015510, inode=0x7fddbc03b70c) at dht-helper.c:626
#1  0x00007fddc955ae7d in dht_local_init (frame=0x7fddb400751c, loc=0x7fddb400654c, fd=0x0, fop=GF_FOP_UNLINK)
    at dht-helper.c:498
#2  0x00007fddc95984b3 in dht_unlink (frame=0x7fddb400751c, this=0x7fddbc015510, loc=0x7fddb400654c, xflag=0, 
    xdata=0x0) at dht-common.c:5170
#3  0x00007fddc9598798 in dht_unlink (frame=0x7fddb400741c, this=0x7fddbc0162d0, loc=0x7fddb400654c, xflag=0, 
    xdata=0x0) at dht-common.c:5196
#4  0x00007fddd6af96c9 in default_unlink (frame=0x7fddb400741c, this=0x7fddbc017b70, loc=0x7fddb400654c, xflag=0, 
    xdata=0x0) at defaults.c:1910
#5  0x00007fddd6af96c9 in default_unlink (frame=0x7fddb400741c, this=0x7fddbc0189b0, loc=0x7fddb400654c, xflag=0, 
    xdata=0x0) at defaults.c:1910
#6  0x00007fddd6af96c9 in default_unlink (frame=0x7fddb400741c, this=0x7fddbc019720, loc=0x7fddb400654c, xflag=0, 
    xdata=0x0) at defaults.c:1910
#7  0x00007fddd6af5364 in default_unlink_resume (frame=0x7fddb400640c, this=0x7fddbc01a550, loc=0x7fddb400654c

627	        xlator_t     *subvol = NULL;
(gdb) 
629	        GF_VALIDATE_OR_GOTO (this->name, this, out);
(gdb) 
630	        GF_VALIDATE_OR_GOTO (this->name, inode, out);
(gdb) 
632	        layout = dht_layout_get (this, inode);
(gdb) n
634	        if (!layout) {
(gdb) p layout
$18 = (dht_layout_t *) 0x0
(gdb) p this->name
$19 = 0x7fddbc00c6a0 "test-hot-dht"
(gdb)

dht_layout_get returns NULL. As a result, dht_subvol_get_cached also returns NULL.

Looking deeper, the dht_inode_ctx_t for the inode is NULL:

Breakpoint 1, dht_subvol_get_cached (this=0x7fddbc015510, inode=0x7fddbc03b70c) at dht-helper.c:626
626	        dht_layout_t *layout = NULL;
(gdb) n
627	        xlator_t     *subvol = NULL;
(gdb) n
629	        GF_VALIDATE_OR_GOTO (this->name, this, out);
(gdb) n
630	        GF_VALIDATE_OR_GOTO (this->name, inode, out);
(gdb) n
632	        layout = dht_layout_get (this, inode);
(gdb) s
dht_layout_get (this=0x7fddbc015510, inode=0x7fddbc03b70c) at dht-layout.c:65
65	        dht_conf_t   *conf = NULL;
(gdb) n
66	        dht_layout_t *layout = NULL;
(gdb) n
67	        int           ret = 0;
(gdb) n
69	        conf = this->private;
(gdb) n
70	        if (!conf)
(gdb) n
73	        LOCK (&conf->layout_lock);
(gdb) n
75	                ret = dht_inode_ctx_layout_get (inode, this, &layout);
(gdb) s
dht_inode_ctx_layout_get (inode=0x7fddbc03b70c, this=0x7fddbc015510, layout=0x7fddc37fd678) at dht-common.c:6981
6981	        dht_inode_ctx_t         *ctx            = NULL;
(gdb) n
6982	        int                      ret            = -1;
(gdb) n
6984	        ret = dht_inode_ctx_get (inode, this, &ctx);
(gdb) n
6986	        if (!ret && ctx) {
(gdb) p ctx
$10 = (dht_inode_ctx_t *) 0x0
(gdb)
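
The chain traced in the gdb session above can be condensed into a small model. This is a hedged Python sketch, not GlusterFS code: the `Xlator` class and the dictionary used as the inode ctx are illustrative assumptions, and the layout is simplified to a list whose first entry is the cached subvolume. Only the function names mirror the ones seen in the debugger.

```python
import errno


class Xlator:
    """Toy stand-in for a dht xlator (assumption, not the real xlator_t)."""

    def __init__(self, name):
        self.name = name
        self.inode_ctx = {}  # inode id -> {"layout": [subvolume names]}


def dht_inode_ctx_get(this, inode):
    # None when this xlator never stored a ctx for the inode.
    return this.inode_ctx.get(inode)


def dht_layout_get(this, inode):
    ctx = dht_inode_ctx_get(this, inode)
    return ctx["layout"] if ctx else None


def dht_subvol_get_cached(this, inode):
    layout = dht_layout_get(this, inode)
    if not layout:
        return None  # no layout, hence no cached subvolume
    return layout[0]  # simplification: first layout entry is the cached subvol


def dht_unlink(this, inode):
    """Return (op_ret, op_errno), like the unwind seen in the backtrace."""
    cached_subvol = dht_subvol_get_cached(this, inode)
    if cached_subvol is None:
        return -1, errno.EINVAL
    return 0, 0


# hot-dht never saw a lookup for this inode, so its ctx table is empty.
hot_dht = Xlator("test-hot-dht")
result = dht_unlink(hot_dht, inode="0x7fddbc03b70c")
print(result)
```

In this model the missing ctx alone is enough to produce the failure; nothing else about the inode has to be wrong.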

Comment 3 Joseph Elwin Fernandes 2015-07-06 07:42:27 UTC
This issue happens only in the pure distribute case, not on Dis-rep or Dis-EC volumes.

Comment 4 Mohammed Rafi KC 2015-07-10 09:48:16 UTC
RCA.

After attach-tier, all fops hash to the hot tier (unless the "rule" option is set explicitly). A lookup on a directory eventually lists the directory using readdirp, and each dht xlator populates inode_ctx only for the entries that came back through it. Files that were already present in the volume before the tier was attached are returned by the cold tier only, so readdirp populates their inode_ctx in cold-dht only.

When an unlink arrives for such an inode, the lookup associated with the unlink is sent as a revalidate request to the cold tier only, because a lookup has already been performed on the inode, and that lookup succeeds. In the tier-level dht unlink, the file hashes to the hot tier while cached_subvol is the cold tier. Because the hashed and cached subvolumes differ, it picks the hashed subvolume and sends the fop to hot-dht. The fop fails there with EINVAL, because hot-dht has no inode_ctx stored for that inode (no lookup was ever performed through hot-dht).
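
The routing described in this RCA can be illustrated with a toy two-tier model. This is a hedged sketch under stated assumptions: the classes `DhtXlator` and `TierDht` and their methods are hypothetical names invented for illustration, and the rule "everything hashes to hot" is hard-coded rather than computed from a layout.

```python
import errno


class DhtXlator:
    """Hypothetical per-tier dht xlator with its own inode ctx table."""

    def __init__(self, name):
        self.name = name
        self.inode_ctx = set()

    def readdirp(self, entries):
        # Only entries returned through this xlator get a ctx here.
        self.inode_ctx.update(entries)
        return entries

    def unlink(self, inode):
        if inode not in self.inode_ctx:
            return -1, errno.EINVAL
        self.inode_ctx.discard(inode)
        return 0, 0


class TierDht:
    """Hypothetical tier-level dht sitting above cold-dht and hot-dht."""

    def __init__(self, cold, hot):
        self.cold, self.hot = cold, hot
        self.cached = {}  # inode -> subvolume that returned the entry

    def readdirp(self, cold_entries, hot_entries):
        for sub, entries in ((self.cold, cold_entries), (self.hot, hot_entries)):
            for inode in sub.readdirp(entries):
                self.cached[inode] = sub

    def unlink(self, inode):
        hashed = self.hot  # assumption: after attach-tier everything hashes to hot
        cached = self.cached[inode]
        if hashed is not cached:
            # Mismatch: the fop is sent to the hashed subvolume first.
            ret = hashed.unlink(inode)
            if ret[0] != 0:
                return hashed.name, ret
        return cached.name, cached.unlink(inode)


cold, hot = DhtXlator("cold-dht"), DhtXlator("hot-dht")
tier = TierDht(cold, hot)
tier.readdirp(cold_entries=["file1"], hot_entries=[])  # file predates the tier
failed_on, (op_ret, op_errno) = tier.unlink("file1")
print(failed_on, op_ret, errno.errorcode[op_errno])
```

The file still exists on the cold tier in this model; the error comes purely from sending the fop to an xlator that never built a ctx for the inode.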

Comment 5 Mohammed Rafi KC 2015-07-10 11:22:43 UTC
The same problem may also affect the following FOPs:

1) dht_link,
2) getxattr "trusted.distribute.linkinfo"
3) f/setxattr
4) f/removexattr
5) unlink of a link file

Comment 6 Anand Avati 2015-07-21 12:53:46 UTC
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in a directory) posted (#2) for review on master by mohammed rafi  kc (rkavunga)

Comment 7 Anand Avati 2015-08-06 06:48:17 UTC
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in a directory) posted (#3) for review on master by mohammed rafi  kc (rkavunga)

Comment 8 Anand Avati 2015-08-13 11:33:06 UTC
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in a directory) posted (#5) for review on master by mohammed rafi  kc (rkavunga)

Comment 9 Anand Avati 2015-08-13 20:04:40 UTC
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in a directory) posted (#6) for review on master by Dan Lambright (dlambrig)

Comment 10 Anand Avati 2015-08-13 20:44:45 UTC
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in a directory) posted (#7) for review on master by Dan Lambright (dlambrig)

Comment 11 Anand Avati 2015-08-14 06:35:40 UTC
REVIEW: http://review.gluster.org/11892 (fuse:sent at least one lookup before actual fop) posted (#2) for review on master by mohammed rafi  kc (rkavunga)

Comment 12 Anand Avati 2015-08-14 14:55:00 UTC
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in a directory) posted (#8) for review on master by Dan Lambright (dlambrig)

Comment 13 Anand Avati 2015-08-19 05:43:03 UTC
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in a directory) posted (#9) for review on master by Dan Lambright (dlambrig)

Comment 14 Anand Avati 2015-08-21 10:55:57 UTC
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in a directory) posted (#10) for review on master by Dan Lambright (dlambrig)

Comment 15 Anand Avati 2015-08-21 15:51:12 UTC
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in a directory) posted (#11) for review on master by Joseph Fernandes

Comment 16 Anand Avati 2015-08-27 17:14:18 UTC
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in a directory) posted (#12) for review on master by Dan Lambright (dlambrig)

Comment 17 Vijay Bellur 2015-09-03 09:37:12 UTC
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in a directory) posted (#13) for review on master by Joseph Fernandes

Comment 18 Vijay Bellur 2015-09-04 09:09:25 UTC
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in a directory) posted (#14) for review on master by mohammed rafi  kc (rkavunga)

Comment 19 Vijay Bellur 2015-09-09 09:13:55 UTC
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in a directory) posted (#16) for review on master by mohammed rafi  kc (rkavunga)

Comment 20 Vijay Bellur 2015-09-12 14:23:15 UTC
REVIEW: http://review.gluster.org/11892 (fuse:sent at least one lookup before actual fop) posted (#3) for review on master by Dan Lambright (dlambrig)

Comment 21 Vijay Bellur 2015-09-14 08:53:43 UTC
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in a directory) posted (#17) for review on master by mohammed rafi  kc (rkavunga)

Comment 22 Vijay Bellur 2015-09-15 05:39:21 UTC
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in a directory) posted (#18) for review on master by mohammed rafi  kc (rkavunga)

Comment 23 Vijay Bellur 2015-10-29 08:52:25 UTC
REVIEW: http://review.gluster.org/11892 (fuse:sent at least one lookup before actual fop) posted (#5) for review on master by mohammed rafi  kc (rkavunga)

Comment 24 Mohammed Rafi KC 2015-11-04 12:06:15 UTC
The patch was set to be reverted (http://review.gluster.org/#/c/12449/), so I am reopening this bug.

Comment 25 Vijay Bellur 2015-11-12 06:49:59 UTC
REVIEW: http://review.gluster.org/12519 (tier/dht: unlink fails after lookup in a directory) posted (#3) for review on master by mohammed rafi  kc (rkavunga)

Comment 26 Vijay Bellur 2015-11-16 11:25:40 UTC
REVIEW: http://review.gluster.org/12519 (tier/dht: unlink fails after lookup in a directory) posted (#4) for review on master by mohammed rafi  kc (rkavunga)

Comment 27 Vijay Bellur 2015-11-17 16:37:17 UTC
REVIEW: http://review.gluster.org/12519 (tier/dht: unlink fails after lookup in a directory) posted (#5) for review on master by Dan Lambright (dlambrig)

Comment 28 Vijay Bellur 2015-11-18 13:29:11 UTC
REVIEW: http://review.gluster.org/12519 (tier/dht: unlink fails after lookup in a directory) posted (#6) for review on master by mohammed rafi  kc (rkavunga)

Comment 29 Vijay Bellur 2015-11-18 13:29:14 UTC
REVIEW: http://review.gluster.org/11892 (fuse:sent at least one lookup before actual fop) posted (#10) for review on master by mohammed rafi  kc (rkavunga)

Comment 30 Vijay Bellur 2015-12-10 02:08:13 UTC
REVIEW: http://review.gluster.org/11892 (fuse:sent at least one lookup before actual fop) posted (#11) for review on master by Dan Lambright (dlambrig)

Comment 31 Vijay Bellur 2016-01-12 07:37:05 UTC
REVIEW: http://review.gluster.org/11892 (fuse:sent at least one lookup before actual fop) posted (#12) for review on master by mohammed rafi  kc (rkavunga)

Comment 32 Vijay Bellur 2016-01-13 06:52:27 UTC
REVIEW: http://review.gluster.org/11892 (fuse:sent at least one lookup before actual fop) posted (#13) for review on master by mohammed rafi  kc (rkavunga)

Comment 33 Vijay Bellur 2016-01-13 09:03:28 UTC
REVIEW: http://review.gluster.org/11892 (fuse:sent at least one lookup before actual fop) posted (#14) for review on master by mohammed rafi  kc (rkavunga)

Comment 34 Vijay Bellur 2016-01-13 12:39:19 UTC
REVIEW: http://review.gluster.org/11892 (fuse:sent at least one lookup before actual fop) posted (#15) for review on master by mohammed rafi  kc (rkavunga)

Comment 35 Vijay Bellur 2016-01-13 13:14:36 UTC
REVIEW: http://review.gluster.org/11892 (fuse:sent at least one lookup before actual fop) posted (#16) for review on master by mohammed rafi  kc (rkavunga)

Comment 36 Vijay Bellur 2016-01-14 03:19:37 UTC
COMMIT: http://review.gluster.org/11892 committed in master by Dan Lambright (dlambrig) 
------
commit ca515db012718f2d4998edf682c70ccba29924c6
Author: Mohammed Rafi KC <rkavunga>
Date:   Wed Aug 12 14:30:27 2015 +0530

    fuse:sent at least one lookup before actual fop
    
    Fuse shoud sent atleast one lookup for an inode/gfid
    populated via readdirp before actual fop to populate
    inode ctx for xlators
    
    Change-Id: I5c02ed73f892924c9e404d91cbe0633a275accbd
    BUG: 1236032
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/11892
    Reviewed-by: Raghavendra G <rgowdapp>
    Tested-by: Raghavendra G <rgowdapp>
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Dan Lambright <dlambrig>
    Tested-by: Dan Lambright <dlambrig>
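
The idea stated in the commit message can be sketched as follows. This is a hedged illustration only: the thread does not describe how the patch is implemented, so the `FuseBridge` class, the `needs_lookup` set, and the assumption that a lookup reaches every xlator in `graph` are all hypothetical.

```python
import errno


class Xlator:
    """Hypothetical xlator that only accepts fops on inodes it has looked up."""

    def __init__(self, name):
        self.name = name
        self.inode_ctx = set()

    def lookup(self, inode):
        self.inode_ctx.add(inode)

    def unlink(self, inode):
        if inode not in self.inode_ctx:
            return -1, errno.EINVAL
        self.inode_ctx.discard(inode)
        return 0, 0


class FuseBridge:
    """Hypothetical bridge that remembers which inodes came from readdirp."""

    def __init__(self, graph):
        self.graph = graph          # xlators a lookup travels through (assumption)
        self.needs_lookup = set()   # inodes known only via readdirp

    def readdirp(self, entries):
        self.needs_lookup.update(entries)

    def unlink(self, inode, target):
        if inode in self.needs_lookup:
            # Send one explicit lookup before the actual fop.
            for xl in self.graph:
                xl.lookup(inode)
            self.needs_lookup.discard(inode)
        return target.unlink(inode)


cold, hot = Xlator("cold-dht"), Xlator("hot-dht")
fuse = FuseBridge([cold, hot])
fuse.readdirp(["file1"])
op_ret, op_errno = fuse.unlink("file1", target=hot)
print(op_ret, op_errno)
```

With the extra lookup in place, the xlator that receives the unlink has a ctx for the inode and the fop no longer fails with EINVAL in this model.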

Comment 37 Niels de Vos 2016-06-16 13:17:09 UTC
This bug is being closed because a release that should address the reported issue is now available. If the problem is still not fixed in glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user