Bug 1284834

Summary: tiering: Seeing error messages E "/usr/lib64/glusterfs/3.7.5/xlator/features/changetimerecorder.so(ctr_lookup+0x54f) [0x7f6c435c116f] ) 0-ctr: invalid argument: loc->name [Invalid argument] after attach tier
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Anil Shah <ashah>
Component: tierAssignee: Joseph Elwin Fernandes <josferna>
Status: CLOSED ERRATA QA Contact: Anil Shah <ashah>
Severity: medium Docs Contact:
Priority: unspecified    
Version: rhgs-3.1CC: asrivast, dlambrig, josferna, mselvaga, rhs-bugs, sankarshan, storage-qa-internal, vmallika
Target Milestone: ---Keywords: ZStream
Target Release: RHGS 3.1.2   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.7.5-11 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1285663 (view as bug list) Environment:
Last Closed: 2016-03-01 05:57:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1260783, 1285663, 1290295    

Description Anil Shah 2015-11-24 10:16:30 UTC
Description of problem:

After attach tier, seeing lots of error messages  in bricks logs

E [changetimerecorder.c:337:ctr_lookup] (-->/usr/lib64/glusterfs/3.7.5/xlator/features/bitrot-stub.so(br_stub_lookup+0x2d0) [0x7f6c42ab5bb0] -->/lib64/libglusterfs.so.0(default_lookup+0x5b) [0x7f6c5588174b] -->/usr/lib64/glusterfs/3.7.5/xlator/features/changetimerecorder.so(ctr_lookup+0x54f) [0x7f6c435c116f] ) 0-ctr: invalid argument: loc->name [Invalid argument]

Version-Release number of selected component (if applicable):

[root@localhost test]# rpm -qa | grep glusterfs
glusterfs-3.7.5-6.el7rhgs.x86_64
glusterfs-api-3.7.5-6.el7rhgs.x86_64
glusterfs-cli-3.7.5-6.el7rhgs.x86_64
glusterfs-geo-replication-3.7.5-6.el7rhgs.x86_64
glusterfs-libs-3.7.5-6.el7rhgs.x86_64
glusterfs-client-xlators-3.7.5-6.el7rhgs.x86_64
glusterfs-fuse-3.7.5-6.el7rhgs.x86_64
glusterfs-server-3.7.5-6.el7rhgs.x86_64
glusterfs-debuginfo-3.7.5-6.el7rhgs.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create 1 x (4 + 2) disperse volume
2. Enable quota and set limit-usage 
3. Do fuse mount, and create some files and directories such that quota limit is reached
4. Now attach 2*2 distribute replicate hot tier

Actual results:

Seeing below error logs:
===========================================
[2015-11-24 15:26:00.236272] E [changetimerecorder.c:337:ctr_lookup] (-->/usr/lib64/glusterfs/3.7.5/xlator/features/bitrot-stub.so(br_stub_lookup+0x2d0) [0x7f7d872debb0] -->/lib64/libglusterfs.so.0(default_lookup+0x5b) [0x7f7d9a1a874b] -->/usr/lib64/glusterfs/3.7.5/xlator/features/changetimerecorder.so(ctr_lookup+0x54f) [0x7f7d87dea16f] ) 0-ctr: invalid argument: loc->name [Invalid argument]
[2015-11-24 15:26:00.562317] E [changetimerecorder.c:337:ctr_lookup] (-->/usr/lib64/glusterfs/3.7.5/xlator/features/bitrot-stub.so(br_stub_lookup+0x2d0) [0x7f7d872debb0] -->/lib64/libglusterfs.so.0(default_lookup+0x5b) [0x7f7d9a1a874b] -->/usr/lib64/glusterfs/3.7.5/xlator/features/changetimerecorder.so(ctr_lookup+0x54f) [0x7f7d87dea16f] ) 0-ctr: invalid argument: loc->name [Invalid argument]

Expected results:

There should not be any error message after attach tier

Additional info:
[root@localhost test]# gluster v info
 
Volume Name: vol0
Type: Tier
Volume ID: 6369820b-2344-4812-be7c-190ae58b6d7d
Status: Started
Number of Bricks: 10
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.47.3:/rhs/brick3/b4
Brick2: 10.70.47.2:/rhs/brick3/b3
Brick3: 10.70.47.145:/rhs/brick3/b2
Brick4: 10.70.47.143:/rhs/brick3/b1
Cold Tier:
Cold Tier Type : Disperse
Number of Bricks: 1 x (4 + 2) = 6
Brick5: 10.70.47.143:/rhs/brick1/ec1
Brick6: 10.70.47.145:/rhs/brick1/ec2
Brick7: 10.70.47.2:/rhs/brick1/ec3
Brick8: 10.70.47.3:/rhs/brick1/ec4
Brick9: 10.70.47.143:/rhs/brick2/ec5
Brick10: 10.70.47.145:/rhs/brick2/ec6
Options Reconfigured:
features.ctr-enabled: on
features.hard-timeout: 0
features.soft-timeout: 0
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on

Comment 2 Joseph Elwin Fernandes 2015-12-05 06:26:33 UTC
Introduced a new function called ctr_junkie to catch this special case were

!loc->name && loc->parent

(gdb) bt
#0  ctr_junkie (this=0x7f988000bb50, loc=0x7f9858001cbc) at changetimerecorder.c:315
#1  0x00007f98870e541b in ctr_lookup (frame=0x7f9870094eac, this=0x7f988000bb50, loc=0x7f9858001cbc, 
    xdata=0x7f985800099c) at changetimerecorder.c:334
#2  0x00007f9894c8d45e in default_lookup (frame=0x7f9870094eac, this=0x7f988000f360, loc=0x7f9858001cbc, 
    xdata=0x7f985800099c) at defaults.c:2377
#3  0x00007f98867d6efe in br_stub_lookup (frame=0x7f987009676c, this=0x7f9880011010, loc=0x7f9858001cbc, 
    xdata=0x7f985800099c) at bit-rot-stub.c:2709
#4  0x00007f98865b99b0 in posix_acl_lookup (frame=0x7f987009602c, this=0x7f98800126a0, loc=0x7f9858001cbc, 
    xattr=0x7f985800099c) at posix-acl.c:909
#5  0x00007f98863a6ea9 in pl_lookup (frame=0x7f9870000bac, this=0x7f9880013b30, loc=0x7f9858001cbc, 
    xdata=0x7f985800099c) at posix.c:2171
#6  0x00007f988618cba2 in up_lookup (frame=0x7f987001444c, this=0x7f9880014f60, loc=0x7f9858001cbc, 
    xattr_req=0x7f985800099c) at upcall.c:775
#7  0x00007f9894c892b7 in default_lookup_resume (frame=0x7f9858000f4c, this=0x7f9880016570, loc=0x7f9858001cbc, 
    xdata=0x7f985800099c) at defaults.c:1731
#8  0x00007f9894c1d18d in call_resume_wind (stub=0x7f9858001c7c) at call-stub.c:2223
#9  0x00007f9894c2504e in call_resume (stub=0x7f9858001c7c) at call-stub.c:2571
#10 0x00007f9885f7f880 in iot_worker (data=0x7f98800445c0) at io-threads.c:210
#11 0x00007f9893a5452a in start_thread () from /lib64/libpthread.so.0
#12 0x00007f98933a322d in clone () from /lib64/libc.so.6

Comment 3 Joseph Elwin Fernandes 2015-12-05 06:28:00 UTC
there was another function inserted in marker_lookup called marker_junkie.
This was done to check if this lookup came from server xlator or marker translator.

Comment 4 Joseph Elwin Fernandes 2015-12-05 06:29:18 UTC
This is how the code looks in marker

Comment 5 Joseph Elwin Fernandes 2015-12-05 06:32:39 UTC
This is how the test code looks in ctr

void ctr_junkie (xlator_t *this, loc_t *loc) {
}

int32_t
ctr_lookup (call_frame_t *frame, xlator_t *this,
              loc_t *loc, dict_t *xdata)
{
        gf_ctr_inode_context_t ctr_inode_cx;
        gf_ctr_inode_context_t *_inode_cx       = &ctr_inode_cx;
        gf_ctr_link_context_t  ctr_link_cx;
        gf_ctr_link_context_t  *_link_cx        = &ctr_link_cx;
        int ret                                 = -1;

        CTR_IS_DISABLED_THEN_GOTO(this, out);
        CTR_IF_INTERNAL_FOP_THEN_GOTO (frame, xdata, out);

        GF_ASSERT(frame);
        GF_ASSERT(frame->root);

        if (loc->parent && !loc->name) {
                ctr_junkie (this, loc); ------------> catch the special case and call ctr_junkie
        }



And this is how the test code looks in marker 
void
marker_junkie (xlator_t *this, loc_t *loc) {
}

int32_t
marker_lookup (call_frame_t *frame, xlator_t *this,
               loc_t *loc, dict_t *xattr_req)
{
        int32_t         ret                     = 0;
        marker_local_t *local                   = NULL;
        marker_conf_t  *priv                    = NULL;

        priv = this->private;

        if (loc->parent && !loc->name) {
                marker_junkie (this, loc);------------> catch the special case and call marker_junkie
        }

Comment 6 Joseph Elwin Fernandes 2015-12-05 07:05:57 UTC
The effect of this is the ctr lookup heal that is triggered for legacy files is not done. Functionality break. The databases are empty.

Comment 7 Joseph Elwin Fernandes 2015-12-05 07:12:43 UTC
Plus looks like vanila quota is broken i,.e with tiering on a normal volume


[root@fedora1 test]# gluster vol quota test list
                  Path                   Hard-limit  Soft-limit      Used  Available  Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/                                        20Bytes     80%(16Bytes)    1.5MB  0Bytes             Yes                  Yes


I still can create files

Comment 8 Joseph Elwin Fernandes 2015-12-05 07:34:07 UTC
(In reply to Joseph Elwin Fernandes from comment #7)
> Plus looks like vanila quota is broken i,.e with tiering on a normal volume
> 
> 
> [root@fedora1 test]# gluster vol quota test list
>                   Path                   Hard-limit  Soft-limit      Used 
> Available  Soft-limit exceeded? Hard-limit exceeded?
> -----------------------------------------------------------------------------
> --------------------------------------------------
> /                                        20Bytes     80%(16Bytes)    1.5MB 
> 0Bytes             Yes                  Yes
> 
> 
> I still can create files

typo without tiering on a normal volume

Comment 9 Manikandan 2015-12-09 14:16:05 UTC
The logs are getting generated due to a nameless lookup. But couldn't see any kind of loss in data even after attach-tier. So it's better to avoid nameless lookup in ctr, thereby avoiding the log.

Comment 10 Joseph Elwin Fernandes 2015-12-10 05:53:03 UTC
https://code.engineering.redhat.com/gerrit/#/c/63434/

Comment 11 Anil Shah 2015-12-11 09:25:45 UTC
After attach tier , not seeing any error logs on latest build. 

Bug verified on build glusterfs-3.7.5-11.el7rhgs.x86_64.

Comment 13 errata-xmlrpc 2016-03-01 05:57:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html