Bug 1209113 - Disperse volume: Invalid index errors in readdirp requests
Summary: Disperse volume: Invalid index errors in readdirp requests
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: qe_tracker_everglades 1214678
Reported: 2015-04-06 08:37 UTC by Bhaskarakiran
Modified: 2016-11-23 23:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1214678 (view as bug list)
Environment:
Last Closed: 2015-05-09 18:04:05 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Bhaskarakiran 2015-04-06 08:37:19 UTC
Description of problem:
=======================

'ls' on the NFS mount lists only 21 entries although the directory contains thousands of subdirectories. The NFS log file shows the messages below while the entries are being deleted from the client.

[2015-04-06 06:56:41.451361] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-1: Invalid index 26 in readdirp request
[2015-04-06 06:56:41.483035] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-0: Invalid index 18 in readdirp request
[2015-04-06 06:56:41.483921] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-1: Invalid index 32 in readdirp request
[2015-04-06 06:56:41.509809] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-0: Invalid index 22 in readdirp request
[2015-04-06 06:56:41.510357] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-1: Invalid index 26 in readdirp request
[2015-04-06 06:56:41.537172] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-0: Invalid index 14 in readdirp request
[2015-04-06 06:56:41.537957] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-1: Invalid index 32 in readdirp request
[2015-04-06 06:56:41.569386] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-0: Invalid index 18 in readdirp request
[2015-04-06 06:56:41.570263] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-1: Invalid index 26 in readdirp request
[2015-04-06 06:56:41.597343] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-0: Invalid index 22 in readdirp request
[2015-04-06 06:56:41.597986] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-1: Invalid index 32 in readdirp request
[2015-04-06 06:56:41.623217] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-0: Invalid index 14 in readdirp request
[2015-04-06 06:56:41.623743] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-1: Invalid index 26 in readdirp request
[2015-04-06 06:56:41.654130] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-0: Invalid index 18 in readdirp request
[2015-04-06 06:56:41.654713] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-1: Invalid index 32 in readdirp request
[2015-04-06 06:56:41.683289] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-0: Invalid index 21 in readdirp request
[2015-04-06 06:56:41.683905] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-1: Invalid index 26 in readdirp request
[2015-04-06 06:56:41.712598] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-0: Invalid index 13 in readdirp request
[2015-04-06 06:56:41.713078] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-1: Invalid index 32 in readdirp request
[2015-04-06 06:56:41.745242] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-0: Invalid index 18 in readdirp request
[2015-04-06 06:56:41.746037] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-1: Invalid index 26 in readdirp request
[2015-04-06 06:56:41.775139] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-0: Invalid index 22 in readdirp request
[2015-04-06 06:56:41.775963] E [ec-dir-read.c:422:ec_manager_readdir] 0-testvol-disperse-1: Invalid index 32 in readdirp request
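
To see at a glance which index values each disperse subvolume rejects, the log lines above can be tallied with a short script. This is only a sketch: the regular expression is written against the exact message format quoted in this report, and the function name summarize_invalid_indexes is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the "Invalid index" error lines in the format quoted in this report.
LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] E \[ec-dir-read\.c:\d+:ec_manager_readdir\] "
    r"\d+-(?P<subvol>\S+): Invalid index (?P<index>\d+) in readdirp request"
)


def summarize_invalid_indexes(lines):
    """Return {subvolume name: sorted list of distinct invalid indexes}."""
    seen = defaultdict(set)
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # silently skip lines in any other format
            seen[match.group("subvol")].add(int(match.group("index")))
    return {subvol: sorted(indexes) for subvol, indexes in seen.items()}
```

The function accepts any iterable of lines, so an open log file object can be passed to it directly.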

Output of 'ls' command on nfs mount :
=====================================

[root@dhcp37-61 nfs]# ls -ld dirs
drwxr-xr-x. 8942 root root 901120 Apr  6 12:26 dirs
[root@dhcp37-61 nfs]# cd dirs
[root@dhcp37-61 dirs]# ls | wc -l
21
[root@dhcp37-61 dirs]# rm -rf *
[root@dhcp37-61 dirs]# ls | wc -l
21
[root@dhcp37-61 dirs]# 

Version-Release number of selected component (if applicable):
==============================================================
[root@dhcp37-61 dirs]# gluster --version
glusterfs 3.7dev built on Apr  5 2015 01:10:28
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@dhcp37-61 dirs]# 

How reproducible:
=================
100%

Steps to reproduce :
1. NFS mount the volume on the client.
2. Create hundreds of directories.
3. Delete them with 'rm -rf *'.
4. List the entries from the mount and check the NFS log file on the server side.
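
The create, list and delete steps above can be scripted roughly as follows. This is a sketch under assumptions: the helper name reproduce and the directory names are invented here, and the parent path is expected to be a new or empty directory on the NFS mount. On a healthy file system the function returns (count, 0); on an affected mount the first value would be expected to fall short of count.

```python
import os
import shutil


def reproduce(parent, count=1000):
    """Create `count` directories under `parent`, then delete whatever a
    listing returns. Returns (entries listed before delete, entries after)."""
    os.makedirs(parent, exist_ok=True)
    for i in range(count):
        os.mkdir(os.path.join(parent, f"dir{i}"))
    # Like 'ls | wc -l': count what the directory listing actually returns.
    before = len(os.listdir(parent))
    # Like 'rm -rf *': only entries that show up in the listing get removed.
    for name in os.listdir(parent):
        shutil.rmtree(os.path.join(parent, name))
    after = len(os.listdir(parent))
    return before, after
```

The NFS log on the server still has to be checked by hand after the run, as in step 4.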


Gluster volume status and info :
================================

[root@vertigo ~]# gluster v status testvol
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick vertigo:/rhs/brick1/b1                49152     0          Y       8995 
Brick ninja:/rhs/brick1/b2                  49152     0          Y       12400
Brick vertigo:/rhs/brick2/b3                49153     0          Y       9014 
Brick ninja:/rhs/brick2/b4                  49153     0          Y       12419
Brick vertigo:/rhs/brick3/b5                49154     0          Y       6143 
Brick ninja:/rhs/brick3/b6                  49154     0          Y       5874 
Brick vertigo:/rhs/brick4/b7                49155     0          Y       6160 
Brick ninja:/rhs/brick4/b8                  49155     0          Y       5891 
Brick vertigo:/rhs/brick1/b9                49156     0          Y       6177 
Brick ninja:/rhs/brick1/b10                 49156     0          Y       5908 
Brick vertigo:/rhs/brick2/b11               49157     0          Y       6194 
Brick ninja:/rhs/brick2/b12                 49157     0          Y       5925 
Brick vertigo:/rhs/brick1/b1-1              49159     0          Y       9149 
Brick ninja:/rhs/brick1/b2-1                49159     0          Y       13401
Brick vertigo:/rhs/brick2/b3-1              49160     0          Y       9168 
Brick ninja:/rhs/brick2/b4-1                49160     0          Y       13420
Brick vertigo:/rhs/brick3/b5-1              49161     0          Y       9187 
Brick ninja:/rhs/brick3/b6-1                49161     0          Y       13439
Brick vertigo:/rhs/brick4/b7-1              49162     0          Y       9206 
Brick ninja:/rhs/brick4/b8-1                49162     0          Y       13458
Brick vertigo:/rhs/brick1/b9-1              49163     0          Y       9225 
Brick ninja:/rhs/brick1/b10-1               49163     0          Y       13477
Brick vertigo:/rhs/brick2/b11-1             49164     0          Y       9244 
Brick ninja:/rhs/brick2/b12-1               49164     0          Y       13496
Snapshot Daemon on localhost                49158     0          Y       6336 
NFS Server on localhost                     2049      0          Y       2546 
Quota Daemon on localhost                   N/A       N/A        Y       2584 
Snapshot Daemon on ninja                    49158     0          Y       6110 
NFS Server on ninja                         2049      0          Y       6657 
Quota Daemon on ninja                       N/A       N/A        Y       6682 
 
Task Status of Volume testvol
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : f768cf44-3b79-487c-99a6-7b301c213f46
Status               : in progress         
 
[root@vertigo ~]# gluster v info testvol 
 
Volume Name: testvol
Type: Distributed-Disperse
Volume ID: b9957725-69f5-496a-8b24-20a1c102ff1a
Status: Started
Number of Bricks: 2 x (8 + 4) = 24
Transport-type: tcp
Bricks:
Brick1: vertigo:/rhs/brick1/b1
Brick2: ninja:/rhs/brick1/b2
Brick3: vertigo:/rhs/brick2/b3
Brick4: ninja:/rhs/brick2/b4
Brick5: vertigo:/rhs/brick3/b5
Brick6: ninja:/rhs/brick3/b6
Brick7: vertigo:/rhs/brick4/b7
Brick8: ninja:/rhs/brick4/b8
Brick9: vertigo:/rhs/brick1/b9
Brick10: ninja:/rhs/brick1/b10
Brick11: vertigo:/rhs/brick2/b11
Brick12: ninja:/rhs/brick2/b12
Brick13: vertigo:/rhs/brick1/b1-1
Brick14: ninja:/rhs/brick1/b2-1
Brick15: vertigo:/rhs/brick2/b3-1
Brick16: ninja:/rhs/brick2/b4-1
Brick17: vertigo:/rhs/brick3/b5-1
Brick18: ninja:/rhs/brick3/b6-1
Brick19: vertigo:/rhs/brick4/b7-1
Brick20: ninja:/rhs/brick4/b8-1
Brick21: vertigo:/rhs/brick1/b9-1
Brick22: ninja:/rhs/brick1/b10-1
Brick23: vertigo:/rhs/brick2/b11-1
Brick24: ninja:/rhs/brick2/b12-1
Options Reconfigured:
features.quota: on
features.uss: on
server.event-threads: 3
client.event-threads: 4
cluster.disperse-self-heal-daemon: enable
[root@vertigo ~]# 


sosreports of the node will be attached.

Comment 2 Anand Avati 2015-04-08 16:46:56 UTC
REVIEW: http://review.gluster.org/10165 (cluster/ec: Fix readdir de-itransform) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Anand Avati 2015-04-09 07:24:25 UTC
REVIEW: http://review.gluster.org/10165 (cluster/ec: Fix readdir de-itransform) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 4 Anand Avati 2015-04-09 10:39:42 UTC
REVIEW: http://review.gluster.org/10165 (cluster/ec: Fix readdir de-itransform) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 5 Anand Avati 2015-04-11 09:13:09 UTC
COMMIT: http://review.gluster.org/10165 committed in master by Vijay Bellur (vbellur) 
------
commit 4797cb1c9dbf3910952f9d28d8272ff83cd25e7b
Author: Pranith Kumar K <pkarampu>
Date:   Wed Apr 8 21:42:49 2015 +0530

    cluster/ec: Fix readdir de-itransform
    
    Problem:
    gf_deitransform returns the global client-id in the complete graph. So except
    for the first disperse subvolume under dht, all the other disperse subvolumes
    will return a client-id greater than ec->nodes, so readdir will always error
    out in those subvolumes.
    
    Fix:
    Get the client subvolume whose client-id matches the client-id returned by
    gf_deitransform of offset.
    
    Change-Id: I26aa17504352d48d7ff14b390b62f49d7ab2d699
    BUG: 1209113
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/10165
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Xavier Hernandez <xhernandez>
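
The problem and fix in the commit message above can be modelled with a small sketch, written in Python rather than the translator's C. The offset encoding used here (brick offset * total clients + global client id) is a deliberately simplified stand-in, not the actual gf_itransform/gf_deitransform implementation, and all function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def toy_itransform(brick_offset: int, global_client_id: int, total_clients: int) -> int:
    """Fold a graph-wide client id into a directory offset (toy encoding)."""
    return brick_offset * total_clients + global_client_id


def toy_deitransform(offset: int, total_clients: int) -> Tuple[int, int]:
    """Inverse of toy_itransform: returns (brick_offset, global_client_id)."""
    return divmod(offset, total_clients)


def local_index_buggy(global_client_id: int, nodes: int) -> Optional[int]:
    """Old behaviour as described: the global id is used as a local index,
    so any id >= nodes is rejected as invalid."""
    return global_client_id if global_client_id < nodes else None


def local_index_fixed(global_client_id: int, subvol_client_ids: Sequence[int]) -> Optional[int]:
    """Fix as described: look up the child whose client id matches."""
    for index, client_id in enumerate(subvol_client_ids):
        if client_id == global_client_id:
            return index
    return None
```

With two 12-brick disperse subvolumes and client ids assumed to be numbered 0 to 23 across the graph, an offset that decodes to global id 14 belongs to the second subvolume. The old check rejects it because 14 >= 12, while the lookup maps it to local index 2.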

Comment 6 Pranith Kumar K 2015-04-16 13:27:50 UTC
Dan needs to send out the fix for the second issue.

Pranith

Comment 7 Anand Avati 2015-04-16 14:31:41 UTC
REVIEW: http://review.gluster.org/10274 (This fix corrects the subvolume id in the presence of graphs composed of multiple volumes. Examples of such graphs are those created by the self heal daemon and snap uss. Previously, the number of bricks was computed regardless of the total number of volumes combined within the graph. With this fix, the brick count only includes those owned by the particular volume in question.) posted (#1) for review on master by Dan Lambright (dlambrig)

Comment 8 Vijay Bellur 2015-04-23 11:35:27 UTC
Please perform POST -> MODIFIED transitions after all patches needed are merged. Thanks!

Comment 9 Vivek Agarwal 2015-04-23 13:23:02 UTC
Forked another bug for this, which is assigned to the tiering team, and hence this bug is moved to MODIFIED. Done per discussion with the ec team.

Comment 10 Pranith Kumar K 2015-05-09 18:04:05 UTC
Not observing this in the recent builds. For now closing the bug. Please feel free to re-open as soon as we observe it again.

