Description of problem: After attach tier, seeing lots of error messages in bricks logs E [changetimerecorder.c:337:ctr_lookup] (-->/usr/lib64/glusterfs/3.7.5/xlator/features/bitrot-stub.so(br_stub_lookup+0x2d0) [0x7f6c42ab5bb0] -->/lib64/libglusterfs.so.0(default_lookup+0x5b) [0x7f6c5588174b] -->/usr/lib64/glusterfs/3.7.5/xlator/features/changetimerecorder.so(ctr_lookup+0x54f) [0x7f6c435c116f] ) 0-ctr: invalid argument: loc->name [Invalid argument] Version-Release number of selected component (if applicable): [root@localhost test]# rpm -qa | grep glusterfs glusterfs-3.7.5-6.el7rhgs.x86_64 glusterfs-api-3.7.5-6.el7rhgs.x86_64 glusterfs-cli-3.7.5-6.el7rhgs.x86_64 glusterfs-geo-replication-3.7.5-6.el7rhgs.x86_64 glusterfs-libs-3.7.5-6.el7rhgs.x86_64 glusterfs-client-xlators-3.7.5-6.el7rhgs.x86_64 glusterfs-fuse-3.7.5-6.el7rhgs.x86_64 glusterfs-server-3.7.5-6.el7rhgs.x86_64 glusterfs-debuginfo-3.7.5-6.el7rhgs.x86_64 How reproducible: 100% Steps to Reproduce: 1. Create 1 x (4 + 2) disperse volume 2. Enable quota and set limit-usage 3. Do fuse mount, and create some files and directories such that quota limit is reached 4. Now attach 2*2 distribute replicate hot tier Actual results: Seeing below error logs: =========================================== [2015-11-24 15:26:00.236272] E [changetimerecorder.c:337:ctr_lookup] (-->/usr/lib64/glusterfs/3.7.5/xlator/features/bitrot-stub.so(br_stub_lookup+0x2d0) [0x7f7d872debb0] -->/lib64/libglusterfs.so.0(default_lookup+0x5b) [0x7f7d9a1a874b] -->/usr/lib64/glusterfs/3.7.5/xlator/features/changetimerecorder.so(ctr_lookup+0x54f) [0x7f7d87dea16f] ) 0-ctr: invalid argument: loc->name [Invalid argument] [2015-11-24 15:26:00.562317] E [changetimerecorder.c:337:ctr_lookup] (-->/usr/lib64/glusterfs/3.7.5/xlator/features/bitrot-stub.so(br_stub_lookup+0x2d0) [0x7f7d872debb0] -->/lib64/libglusterfs.so.0(default_lookup+0x5b) [0x7f7d9a1a874b] -->/usr/lib64/glusterfs/3.7.5/xlator/features/changetimerecorder.so(ctr_lookup+0x54f) [0x7f7d87dea16f] ) 0-ctr: invalid argument: loc->name [Invalid argument] Expected results: There should not be any error message after attach tier Additional info: [root@localhost test]# gluster v info Volume Name: vol0 Type: Tier Volume ID: 6369820b-2344-4812-be7c-190ae58b6d7d Status: Started Number of Bricks: 10 Transport-type: tcp Hot Tier : Hot Tier Type : Distributed-Replicate Number of Bricks: 2 x 2 = 4 Brick1: 10.70.47.3:/rhs/brick3/b4 Brick2: 10.70.47.2:/rhs/brick3/b3 Brick3: 10.70.47.145:/rhs/brick3/b2 Brick4: 10.70.47.143:/rhs/brick3/b1 Cold Tier: Cold Tier Type : Disperse Number of Bricks: 1 x (4 + 2) = 6 Brick5: 10.70.47.143:/rhs/brick1/ec1 Brick6: 10.70.47.145:/rhs/brick1/ec2 Brick7: 10.70.47.2:/rhs/brick1/ec3 Brick8: 10.70.47.3:/rhs/brick1/ec4 Brick9: 10.70.47.143:/rhs/brick2/ec5 Brick10: 10.70.47.145:/rhs/brick2/ec6 Options Reconfigured: features.ctr-enabled: on features.hard-timeout: 0 features.soft-timeout: 0 features.quota-deem-statfs: on features.inode-quota: on features.quota: on performance.readdir-ahead: on
Introduced a new function called ctr_junkie to catch this special case were !loc->name && loc->parent (gdb) bt #0 ctr_junkie (this=0x7f988000bb50, loc=0x7f9858001cbc) at changetimerecorder.c:315 #1 0x00007f98870e541b in ctr_lookup (frame=0x7f9870094eac, this=0x7f988000bb50, loc=0x7f9858001cbc, xdata=0x7f985800099c) at changetimerecorder.c:334 #2 0x00007f9894c8d45e in default_lookup (frame=0x7f9870094eac, this=0x7f988000f360, loc=0x7f9858001cbc, xdata=0x7f985800099c) at defaults.c:2377 #3 0x00007f98867d6efe in br_stub_lookup (frame=0x7f987009676c, this=0x7f9880011010, loc=0x7f9858001cbc, xdata=0x7f985800099c) at bit-rot-stub.c:2709 #4 0x00007f98865b99b0 in posix_acl_lookup (frame=0x7f987009602c, this=0x7f98800126a0, loc=0x7f9858001cbc, xattr=0x7f985800099c) at posix-acl.c:909 #5 0x00007f98863a6ea9 in pl_lookup (frame=0x7f9870000bac, this=0x7f9880013b30, loc=0x7f9858001cbc, xdata=0x7f985800099c) at posix.c:2171 #6 0x00007f988618cba2 in up_lookup (frame=0x7f987001444c, this=0x7f9880014f60, loc=0x7f9858001cbc, xattr_req=0x7f985800099c) at upcall.c:775 #7 0x00007f9894c892b7 in default_lookup_resume (frame=0x7f9858000f4c, this=0x7f9880016570, loc=0x7f9858001cbc, xdata=0x7f985800099c) at defaults.c:1731 #8 0x00007f9894c1d18d in call_resume_wind (stub=0x7f9858001c7c) at call-stub.c:2223 #9 0x00007f9894c2504e in call_resume (stub=0x7f9858001c7c) at call-stub.c:2571 #10 0x00007f9885f7f880 in iot_worker (data=0x7f98800445c0) at io-threads.c:210 #11 0x00007f9893a5452a in start_thread () from /lib64/libpthread.so.0 #12 0x00007f98933a322d in clone () from /lib64/libc.so.6
there was another function inserted in marker_lookup called marker_junkie. This was done to check if this lookup came from server xlator or marker translator.
This is how the code looks in marker
This is how the test code looks in ctr void ctr_junkie (xlator_t *this, loc_t *loc) { } int32_t ctr_lookup (call_frame_t *frame, xlator_t *this, loc_t *loc, dict_t *xdata) { gf_ctr_inode_context_t ctr_inode_cx; gf_ctr_inode_context_t *_inode_cx = &ctr_inode_cx; gf_ctr_link_context_t ctr_link_cx; gf_ctr_link_context_t *_link_cx = &ctr_link_cx; int ret = -1; CTR_IS_DISABLED_THEN_GOTO(this, out); CTR_IF_INTERNAL_FOP_THEN_GOTO (frame, xdata, out); GF_ASSERT(frame); GF_ASSERT(frame->root); if (loc->parent && !loc->name) { ctr_junkie (this, loc); ------------> catch the special case and call ctr_junkie } And this is how the test code looks in marker void marker_junkie (xlator_t *this, loc_t *loc) { } int32_t marker_lookup (call_frame_t *frame, xlator_t *this, loc_t *loc, dict_t *xattr_req) { int32_t ret = 0; marker_local_t *local = NULL; marker_conf_t *priv = NULL; priv = this->private; if (loc->parent && !loc->name) { marker_junkie (this, loc);------------> catch the special case and call marker_junkie }
The effect of this is the ctr lookup heal that is triggered for legacy files is not done. Functionality break. The databases are empty.
Plus looks like vanila quota is broken i,.e with tiering on a normal volume [root@fedora1 test]# gluster vol quota test list Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded? ------------------------------------------------------------------------------------------------------------------------------- / 20Bytes 80%(16Bytes) 1.5MB 0Bytes Yes Yes I still can create files
(In reply to Joseph Elwin Fernandes from comment #7) > Plus looks like vanila quota is broken i,.e with tiering on a normal volume > > > [root@fedora1 test]# gluster vol quota test list > Path Hard-limit Soft-limit Used > Available Soft-limit exceeded? Hard-limit exceeded? > ----------------------------------------------------------------------------- > -------------------------------------------------- > / 20Bytes 80%(16Bytes) 1.5MB > 0Bytes Yes Yes > > > I still can create files typo without tiering on a normal volume
The logs are getting generated due to a nameless lookup. But couldn't see any kind of loss in data even after attach-tier. So it's better to avoid nameless lookup in ctr, thereby avoiding the log.
https://code.engineering.redhat.com/gerrit/#/c/63434/
After attach tier , not seeing any error logs on latest build. Bug verified on build glusterfs-3.7.5-11.el7rhgs.x86_64.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html