Bug 1415038 - [Ganesha + EC] : Input/Output Error while creating LOTS of smallfiles
Summary: [Ganesha + EC] : Input/Output Error while creating LOTS of smallfiles
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Pranith Kumar K
QA Contact: Ambarish
URL:
Whiteboard:
Depends On:
Blocks: 1417147 1438411 1438423 1438424
 
Reported: 2017-01-20 03:42 UTC by Ambarish
Modified: 2017-09-21 04:56 UTC
CC: 12 users

Fixed In Version: glusterfs-3.8.4-23
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1438411 (view as bug list)
Environment:
Last Closed: 2017-09-21 04:30:55 UTC




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2774 0 normal SHIPPED_LIVE glusterfs bug fix and enhancement update 2017-09-21 08:16:29 UTC

Description Ambarish 2017-01-20 03:42:30 UTC
Description of problem:
------------------------

4-node cluster, 3 volumes: one 2*(4+2), one 2*3, and one 2*2.

All volumes are exported via Ganesha and mounted on 4 clients, each having more than one mount via NFSv3/v4.

Almost an hour into my workload, I see that I/O has errored out on my clients:

***********
On gqac019
***********

Mount : 192.168.97.152:/butcher
[root@gqac019 gluster-mount]# for i in {1..25};do mkdir dir$i;cp linux-4.9.4.tar.xz dir$i;cd dir$i;tar xvfJ linux-4.9.4.tar.xz >> /tar$i.log;cd ..;done
cp: error reading ‘linux-4.9.4.tar.xz’: Input/output error
cp: failed to extend ‘dir1/linux-4.9.4.tar.xz’: Input/output error
tar: linux-4.9.4/drivers/bus/arm-cci.c: Cannot change ownership to uid 0, gid 0: Remote I/O error


**********
On gqac011
**********

Mount :192.168.97.150:/butcher
[root@gqac011 W]#  for i in {1..25};do mkdir dir$i;cp linux-4.9.4.tar.xz dir$i;cd dir$i;tar xvfJ linux-4.9.4.tar.xz >> /tar$i.log;cd ..;done
cp: error reading ‘linux-4.9.4.tar.xz’: Input/output error
cp: failed to extend ‘dir1/linux-4.9.4.tar.xz’: Input/output error

**********
On gqac019
**********

Mount : 192.168.97.152:/butcher
[root@gqac019 ~]# /opt/qa/tools/system_light/run.sh -w /gluster-mount -t bonnie -l /var/tmp/bonnie.log
/opt/qa/tools/system_light/scripts
/opt/qa/tools/system_light
/root
/gluster-mount
/
----- /gluster-mount
/gluster-mount/run2179/
Tests available:
arequal
bonnie
compile_kernel
coverage
dbench
dd
ffsb
fileop
fs_mark
fsx
fuse
glusterfs
glusterfs_build
iozone
locks
ltp
multiple_files
openssl
posix_compliance
postmark
read_large
rpc
syscallbench
tiobench
===========================TESTS RUNNING===========================
Changing to the specified mountpoint
/gluster-mount/run2179
executing bonnie
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...Bonnie: drastic I/O error (re-write read): Stale file handle

The brick processes are up and running.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

nfs-ganesha-2.4.1-6.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.4.1-6.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-12.el7rhgs.x86_64


How reproducible:
-----------------

1/1


Actual results:
----------------

EIO.

Expected results:
------------------

Zero error status on clients

Additional info:
----------------

[root@gqas013 tmp]# gluster v status
Status of volume: butcher
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gqas013.sbu.lab.eng.bos.redhat.com:/b
ricks1/brick                                49153     0          Y       2486 
Brick gqas008.sbu.lab.eng.bos.redhat.com:/b
ricks1/brick                                49152     0          Y       9116 
Brick gqas005.sbu.lab.eng.bos.redhat.com:/b
ricks1/brick                                49153     0          Y       5724 
Brick gqas006.sbu.lab.eng.bos.redhat.com:/b
ricks1/brick                                49153     0          Y       6853 
Brick gqas013.sbu.lab.eng.bos.redhat.com:/b
ricks3/brick                                49154     0          Y       2513 
Brick gqas008.sbu.lab.eng.bos.redhat.com:/b
ricks3/brick                                49153     0          Y       9135 
Brick gqas013.sbu.lab.eng.bos.redhat.com:/b
ricks2/brick                                49155     0          Y       2532 
Brick gqas008.sbu.lab.eng.bos.redhat.com:/b
ricks2/brick                                49154     0          Y       9154 
Brick gqas005.sbu.lab.eng.bos.redhat.com:/b
ricks2/brick                                49154     0          Y       5743 
Brick gqas006.sbu.lab.eng.bos.redhat.com:/b
ricks2/brick                                49154     0          Y       6872 
Brick gqas005.sbu.lab.eng.bos.redhat.com:/b
ricks3/brick                                49155     0          Y       5762 
Brick gqas006.sbu.lab.eng.bos.redhat.com:/b
ricks3/brick                                49155     0          Y       6891 
Self-heal Daemon on localhost               N/A       N/A        Y       7066 
Self-heal Daemon on gqas008.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       2621 
Self-heal Daemon on gqas005.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       26374
Self-heal Daemon on gqas006.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       27934
 
Task Status of Volume butcher
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gqas013.sbu.lab.eng.bos.redhat.com:/v
ar/lib/glusterd/ss_brick                    49152     0          Y       30998
Brick gqas005.sbu.lab.eng.bos.redhat.com:/v
ar/lib/glusterd/ss_brick                    49152     0          Y       30630
Brick gqas006.sbu.lab.eng.bos.redhat.com:/v
ar/lib/glusterd/ss_brick                    49152     0          Y       30713
Self-heal Daemon on localhost               N/A       N/A        Y       7066 
Self-heal Daemon on gqas008.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       2621 
Self-heal Daemon on gqas005.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       26374
Self-heal Daemon on gqas006.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       27934
 
Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: rep2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gqas013.sbu.lab.eng.bos.redhat.com:/b
ricks11/brick                               49157     0          Y       3022 
Brick gqas008.sbu.lab.eng.bos.redhat.com:/b
ricks11/brick                               49156     0          Y       13329
Brick gqas005.sbu.lab.eng.bos.redhat.com:/b
ricks11/brick                               49158     0          Y       8299 
Brick gqas006.sbu.lab.eng.bos.redhat.com:/b
ricks11/brick                               49158     0          Y       27834
Self-heal Daemon on localhost               N/A       N/A        Y       7066 
Self-heal Daemon on gqas006.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       27934
Self-heal Daemon on gqas005.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       26374
Self-heal Daemon on gqas008.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       2621 
 
Task Status of Volume rep2
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: rep3
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gqas013.sbu.lab.eng.bos.redhat.com:/b
ricks9/brick                                49156     0          Y       2823 
Brick gqas005.sbu.lab.eng.bos.redhat.com:/b
ricks9/brick                                49156     0          Y       7499 
Brick gqas006.sbu.lab.eng.bos.redhat.com:/b
ricks9/brick                                49156     0          Y       13758
Brick gqas008.sbu.lab.eng.bos.redhat.com:/b
ricks9/brick                                49155     0          Y       12041
Brick gqas005.sbu.lab.eng.bos.redhat.com:/b
ricks10/brick                               49157     0          Y       7518 
Brick gqas006.sbu.lab.eng.bos.redhat.com:/b
ricks10/brick                               49157     0          Y       13848
Self-heal Daemon on localhost               N/A       N/A        Y       7066 
Self-heal Daemon on gqas008.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       2621 
Self-heal Daemon on gqas006.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       27934
Self-heal Daemon on gqas005.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       26374
 
Task Status of Volume rep3
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@gqas013 tmp]#

Comment 3 Soumya Koduri 2017-01-20 06:59:08 UTC
On the nodes those clients are connected to (i.e., with VIPs 192.168.97.152 and 192.168.97.150), I see ENOTCONN errors:


=== gqas005 ===

19/01/2017 14:34:52 : epoch 48930000 : gqas005.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-28019[work-41] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error Transport endpoint is not connected


**********
On gqac011
**********

Mount :192.168.97.150:/butcher
[root@gqac011 W]#  for i in {1..25};do mkdir dir$i;cp linux-4.9.4.tar.xz dir$i;cd dir$i;tar xvfJ linux-4.9.4.tar.xz >> /tar$i.log;cd ..;done
cp: error reading ‘linux-4.9.4.tar.xz’: Input/output error
cp: failed to extend ‘dir1/linux-4.9.4.tar.xz’: Input/output error


From ganesha-gfapi.log -

[2017-01-19 19:34:52.984372] I [MSGID: 122002] [ec-heal.c:2368:ec_heal_do] 0-butcher-disperse-0: /W/linux-4.9.4.tar.xz: name heal failed [Transport endpoint is not connected]



There are many ENOTCONN errors on gqas006 as well.

Comment 4 Atin Mukherjee 2017-01-20 07:12:23 UTC
Pranith - can you have a look at comment 3?

Comment 5 Ambarish 2017-01-22 09:42:18 UTC
This is not necessarily a multi-volume issue.

I could reproduce it with a single EC volume as well, with small-file creates and kernel untars.

Adjusting the summary accordingly.

Comment 6 Ambarish 2017-01-23 10:25:00 UTC
[root@gqas009 gluster]# grep -i mismatch /var/log/ganesha-gfapi.log
[2017-01-22 05:34:47.319714] W [MSGID: 122019] [ec-helpers.c:354:ec_loc_gfid_check] 0-testvol-disperse-7: Mismatching GFID's in loc
[2017-01-22 11:18:11.939976] W [MSGID: 122019] [ec-helpers.c:354:ec_loc_gfid_check] 10-testvol-disperse-7: Mismatching GFID's in loc
[2017-01-22 12:35:54.265400] W [MSGID: 122019] [ec-helpers.c:354:ec_loc_gfid_check] 10-testvol-disperse-7: Mismatching GFID's in loc
[2017-01-22 13:03:13.471747] W [MSGID: 122019] [ec-helpers.c:354:ec_loc_gfid_check] 10-testvol-disperse-4: Mismatching GFID's in loc
[2017-01-22 13:04:04.590037] W [MSGID: 122019] [ec-helpers.c:354:ec_loc_gfid_check] 10-testvol-disperse-4: Mismatching GFID's in loc
[2017-01-22 13:10:33.799298] W [MSGID: 122019] [ec-helpers.c:354:ec_loc_gfid_check] 10-testvol-disperse-4: Mismatching GFID's in loc
[2017-01-22 13:11:06.763801] W [MSGID: 122019] [ec-helpers.c:354:ec_loc_gfid_check] 10-testvol-disperse-5: Mismatching GFID's in loc
[2017-01-22 13:43:24.219911] W [MSGID: 122019] [ec-helpers.c:354:ec_loc_gfid_check] 10-testvol-disperse-7: Mismatching GFID's in loc
[2017-01-22 14:35:53.688066] W [MSGID: 122019] [ec-helpers.c:354:ec_loc_gfid_check] 10-testvol-disperse-6: Mismatching GFID's in loc
[2017-01-23 03:09:18.811044] W [MSGID: 122019] [ec-helpers.c:354:ec_loc_gfid_check] 10-testvol-disperse-7: Mismatching GFID's in loc
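
For reference, the warning above is EC's consistency check on the GFID carried in a loc. Below is a minimal, self-contained C sketch of what such a check conceptually does (the type, struct, and function names are illustrative only, not the actual ec_loc_gfid_check source): if the loc carries a non-null GFID that disagrees with the GFID already known for the inode, the loc is treated as inconsistent and the operation fails rather than proceeding on a stale identity.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Illustrative 16-byte GFID type; GlusterFS uses uuid_t and gf_uuid_*
 * helpers internally. */
typedef unsigned char gfid_t[16];

struct loc_sketch {
    gfid_t gfid;        /* GFID the caller placed in the loc        */
    gfid_t inode_gfid;  /* GFID already recorded against the inode  */
};

static bool gfid_is_null(const gfid_t g)
{
    static const gfid_t zero = { 0 };
    return memcmp(g, zero, sizeof(gfid_t)) == 0;
}

/* Conceptual equivalent of a "Mismatching GFID's in loc" check: if both
 * sides carry a non-null GFID and they differ, the loc is inconsistent
 * and the operation is failed instead of trusted. */
static bool gfid_check(const struct loc_sketch *loc)
{
    if (gfid_is_null(loc->gfid) || gfid_is_null(loc->inode_gfid))
        return true;                      /* nothing to compare yet */
    return memcmp(loc->gfid, loc->inode_gfid, sizeof(gfid_t)) == 0;
}

int main(void)
{
    /* Deliberately inconsistent pair, similar in spirit to the gdb
     * output in comment 12 (one side partially zeroed). */
    struct loc_sketch loc = {
        .gfid       = { 0x3b, 0x82, 0x10, 0x5e, 0x40, 0x65, 0x43, 0x14 },
        .inode_gfid = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x43, 0x14 },
    };

    printf("gfid check: %s\n", gfid_check(&loc) ? "ok" : "MISMATCH");
    return 0;
}

Compiled and run as-is, this prints MISMATCH for the inconsistent pair above.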

Comment 12 Soumya Koduri 2017-03-30 09:23:53 UTC
Thanks to Pranith & Ashish. 

Below is a snippet of the debugging done on the core:


>>>>
(gdb) fr 7
#7  0x00007fde6c184df2 in ec_manager_lookup (fop=0x7fde575a5a80, state=<optimized out>) at ec-generic.c:837
837	                ec_lookup_rebuild(fop->xl->private, fop, cbk);
(gdb) p *fop->req_frame->parent->parent->parent->parent
$17 = {root = 0x7fde75452350, parent = 0x7fde755344d4, frames = {next = 0x7fde755344e4, prev = 0x7fde7559c148}, local = 0x7fdc240cc420, this = 0x7fde6007ae20, ret = 0x7fde7a7de090 <default_lookup_cbk>, ref_count = 1, lock = {
    spinlock = 1, mutex = {__data = {__lock = 1, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001", '\000' <repeats 38 times>, __align = 1}}, 
  cookie = 0x7fde7555340c, complete = _gf_false, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, wind_from = 0x7fde7a804d90 <__FUNCTION__.16554> "default_lookup_resume", 
  wind_to = 0x7fde7a803d30 "FIRST_CHILD(this)->fops->lookup", unwind_from = 0x0, unwind_to = 0x7fde7a8043cd "default_lookup_cbk"}
(gdb) p *fop->req_frame->parent->parent->parent->parent->parent
$18 = {root = 0x7fde75452350, parent = 0x7fde755929ac, frames = {next = 0x7fde755929bc, prev = 0x7fde7555341c}, local = 0x0, this = 0x7fde6007c300, ret = 0x7fde5eafbb30 <io_stats_lookup_cbk>, ref_count = 1, lock = {spinlock = 1, 
    mutex = {__data = {__lock = 1, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001", '\000' <repeats 38 times>, __align = 1}}, cookie = 0x7fde755344d4, 
  complete = _gf_false, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, wind_from = 0x7fde5eb04ec0 <__FUNCTION__.16584> "io_stats_lookup", wind_to = 0x7fde5eb037b8 "FIRST_CHILD(this)->fops->lookup", 
  unwind_from = 0x0, unwind_to = 0x7fde5eb02997 "io_stats_lookup_cbk"}
(gdb) p *fop->req_frame->parent->parent->parent->parent
$19 = {root = 0x7fde75452350, parent = 0x7fde755344d4, frames = {next = 0x7fde755344e4, prev = 0x7fde7559c148}, local = 0x7fdc240cc420, this = 0x7fde6007ae20, ret = 0x7fde7a7de090 <default_lookup_cbk>, ref_count = 1, lock = {
    spinlock = 1, mutex = {__data = {__lock = 1, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001", '\000' <repeats 38 times>, __align = 1}}, 
  cookie = 0x7fde7555340c, complete = _gf_false, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, wind_from = 0x7fde7a804d90 <__FUNCTION__.16554> "default_lookup_resume", 
  wind_to = 0x7fde7a803d30 "FIRST_CHILD(this)->fops->lookup", unwind_from = 0x0, unwind_to = 0x7fde7a8043cd "default_lookup_cbk"}
(gdb) p *fop->req_frame->parent->parent->parent->parent->this->name
$20 = 98 'b'
(gdb) p fop->req_frame->parent->parent->parent->parent->this->name
$21 = 0x7fde6007b9c0 "butcher-md-cache"
(gdb) p (mdc_local_t*)fop->req_frame->parent->parent->parent->parent->local
$22 = (mdc_local_t *) 0x7fdc240cc420
(gdb) p *$22->loc
Structure has no component named operator*.
(gdb) p $22->loc
$23 = {path = 0x7fdc24192880 "<gfid:9d7aeb82-3c80-4316-8128-bba593c65f5c>/..", name = 0x7fdc241928ac "..", inode = 0x7fde4d02a9ec, parent = 0x7fde4d355240, gfid = '\000' <repeats 15 times>, 
  pargfid = "\235z\353\202<\200C\026\201(\273\245\223\306_\\"}
(gdb) p *$22->loc.inode
$24 = {table = 0x7fde607fb710, gfid = '\000' <repeats 15 times>, lock = {spinlock = 1, mutex = {__data = {__lock = 1, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
      __size = "\001", '\000' <repeats 38 times>, __align = 1}}, nlookup = 0, fd_count = 0, ref = 28, ia_type = IA_INVAL, fd_list = {next = 0x7fde4d02aa44, prev = 0x7fde4d02aa44}, dentry_list = {next = 0x7fde4d02aa54, 
    prev = 0x7fde4d02aa54}, hash = {next = 0x7fde4d02aa64, prev = 0x7fde4d02aa64}, list = {next = 0x7fde4d28d9b8, prev = 0x7fde4d27befc}, _ctx = 0x7fdcac0191e0}
(gdb) p (mdc_local_t*)fop->req_frame->parent->parent->parent
$25 = (mdc_local_t *) 0x7fde7559c138
(gdb) p fop->req_frame->parent->parent->parent
$26 = (call_frame_t *) 0x7fde7559c138
(gdb) p *fop->req_frame->parent->parent->parent
$27 = {root = 0x7fde75452350, parent = 0x7fde7555340c, frames = {next = 0x7fde7555341c, prev = 0x7fde7558a6e4}, local = 0x7fde4d02a9ec, this = 0x7fde600784a0, ret = 0x7fde5ef22600 <mdc_lookup_cbk>, ref_count = 1, lock = {spinlock = 1, 
    mutex = {__data = {__lock = 1, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001", '\000' <repeats 38 times>, __align = 1}}, cookie = 0x7fde7559c138, 
  complete = _gf_false, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, wind_from = 0x7fde5ef2856f <__FUNCTION__.15558> "mdc_lookup", wind_to = 0x7fde7a803d30 "FIRST_CHILD(this)->fops->lookup", 
  unwind_from = 0x0, unwind_to = 0x7fde5ef27709 "mdc_lookup_cbk"}
(gdb) p *fop->req_frame->parent->parent->parent->this->name
$28 = 98 'b'
(gdb) p fop->req_frame->parent->parent->parent->this->name
$29 = 0x7fde60079040 "butcher-quick-read"
(gdb) p fop->req_frame->parent->parent
$30 = (call_frame_t *) 0x7fde7558a6d4
(gdb) p *fop->req_frame->parent->parent
$31 = {root = 0x7fde75452350, parent = 0x7fde7559c138, frames = {next = 0x7fde7559c148, prev = 0x7fde754d2f30}, local = 0x7fde6009ad54, this = 0x7fde60076d80, ret = 0x7fde5f33ab30 <qr_lookup_cbk>, ref_count = 1, lock = {spinlock = 1, 
    mutex = {__data = {__lock = 1, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001", '\000' <repeats 38 times>, __align = 1}}, cookie = 0x7fde7558a6d4, 
  complete = _gf_false, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, wind_from = 0x7fde5f33cae0 <__FUNCTION__.16084> "qr_lookup", wind_to = 0x7fde5f33c4c8 "FIRST_CHILD(this)->fops->lookup", 
  unwind_from = 0x0, unwind_to = 0x7fde5f33c7a4 "qr_lookup_cbk"}
(gdb) p *fop->req_frame->parent->parent->this->name
$32 = 98 'b'
(gdb) p fop->req_frame->parent->parent->this->name
$33 = 0x7fde60076c00 "butcher-io-cache"
(gdb) p ((ioc_local_t*)fop->req_frame->parent->parent->local)->file_loc
$34 = {path = 0x7fdc240798a0 "<gfid:9d7aeb82-3c80-4316-8128-bba593c65f5c>/..", name = 0x7fdc240798cc "..", inode = 0x7fde4d02a9ec, parent = 0x7fde4d355240, gfid = '\000' <repeats 15 times>, 
  pargfid = "\235z\353\202<\200C\026\201(\273\245\223\306_\\"}
(gdb) p *fop->req_frame->parent
$35 = {root = 0x7fde75452350, parent = 0x7fde7558a6d4, frames = {next = 0x7fde7558a6e4, prev = 0x7fde754ce3d4}, local = 0x7fde5de5b3cc, this = 0x7fde60070d00, ret = 0x7fde5f546c60 <ioc_lookup_cbk>, ref_count = 3, lock = {spinlock = 1, 
    mutex = {__data = {__lock = 1, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001", '\000' <repeats 38 times>, __align = 1}}, cookie = 0x7fde754d2f20, 
  complete = _gf_false, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, wind_from = 0x7fde5f54ecd7 <__FUNCTION__.16334> "ioc_lookup", wind_to = 0x7fde7a803d30 "FIRST_CHILD(this)->fops->lookup", 
  unwind_from = 0x0, unwind_to = 0x7fde5f54df48 "ioc_lookup_cbk"}
(gdb) p fop->req_frame->parent->this->name
$36 = 0x7fde60062720 "butcher-dht"
(gdb) p (dht_local_t*)fop->req_frame->parent->local
$37 = (dht_local_t *) 0x7fde5de5b3cc
(gdb) p $37->loc
$38 = {path = 0x7fdc240420d0 "<gfid:9d7aeb82-3c80-4316-8128-bba593c65f5c>/..", name = 0x7fdc240420fc "..", inode = 0x7fde4d02a9ec, parent = 0x7fde4d355240, gfid = ";\202\020^@eC\024\240\306\b\365l,\270V", 
  pargfid = "\235z\353\202<\200C\026\201(\273\245\223\306_\\"}
(gdb) p /x $37->loc.gfid
$39 = {0x3b, 0x82, 0x10, 0x5e, 0x40, 0x65, 0x43, 0x14, 0xa0, 0xc6, 0x8, 0xf5, 0x6c, 0x2c, 0xb8, 0x56}
(gdb) fr 7
#7  0x00007fde6c184df2 in ec_manager_lookup (fop=0x7fde575a5a80, state=<optimized out>) at ec-generic.c:837
837	                ec_lookup_rebuild(fop->xl->private, fop, cbk);
(gdb) p /x fop->loc[0].gfid
$40 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x43, 0x14, 0xa0, 0xc6, 0x8, 0xf5, 0x6c, 0x2c, 0xb8, 0x56}
(gdb) fr 6
#6  0x00007fde6c184b85 in ec_lookup_rebuild (ec=<optimized out>, fop=fop@entry=0x7fde575a5a80, cbk=cbk@entry=0x7fde56dd8aa8) at ec-generic.c:644
644	    err = ec_loc_update(fop->xl, &fop->loc[0], cbk->inode, &cbk->iatt[0]);
(gdb) p /x cbk->iatt[0].ia_gfid
$41 = {0x3b, 0x82, 0x10, 0x5e, 0x40, 0x65, 0x43, 0x14, 0xa0, 0xc6, 0x8, 0xf5, 0x6c, 0x2c, 0xb8, 0x56}
(gdb) p /x $37->gfid
$42 = {0x3b, 0x82, 0x10, 0x5e, 0x40, 0x65, 0x43, 0x14, 0xa0, 0xc6, 0x8, 0xf5, 0x6c, 0x2c, 0xb8, 0x56}
(gdb) p *fop->req_frame
$43 = {root = 0x7fde75452350, parent = 0x7fde754d2f20, frames = {next = 0x7fde75526d94, prev = 0x7fde75546864}, local = 0x0, this = 0x7fde6006c320, ret = 0x7fde5fdafd90 <dht_lookup_dir_cbk>, ref_count = 0, lock = {spinlock = 1, mutex = {
      __data = {__lock = 1, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001", '\000' <repeats 38 times>, __align = 1}}, cookie = 0x7fde75503c0c, 
  complete = _gf_false, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, wind_from = 0x7fde5fdee590 <__FUNCTION__.18890> "dht_lookup_directory", 
  wind_to = 0x7fde5fde9528 "conf->subvolumes[i]->fops->lookup", unwind_from = 0x0, unwind_to = 0x7fde5fdeb3cd "dht_lookup_dir_cbk"}
(gdb) shell 
[root@gqas013 tmp]# gluster v info butcher
 
Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 5be704df-e1e1-43a2-8f67-5bf720efe9e5
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick2: gqas014.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick3: gqas015.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick4: gqas005.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick5: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick6: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick7: gqas013.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick8: gqas014.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick9: gqas015.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick10: gqas005.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick11: gqas006.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick12: gqas008.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick13: gqas013.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick14: gqas014.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick15: gqas015.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick16: gqas005.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick17: gqas006.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick18: gqas008.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick19: gqas013.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick20: gqas014.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick21: gqas015.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick22: gqas005.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick23: gqas006.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick24: gqas008.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick25: gqas013.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick26: gqas014.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick27: gqas015.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick28: gqas005.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick29: gqas006.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick30: gqas008.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick31: gqas013.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick32: gqas014.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick33: gqas015.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick34: gqas005.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick35: gqas006.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick36: gqas008.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick37: gqas013.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick38: gqas014.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick39: gqas015.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick40: gqas005.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick41: gqas006.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick42: gqas008.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick43: gqas013.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick44: gqas014.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick45: gqas015.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick46: gqas005.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick47: gqas006.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick48: gqas008.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick49: gqas013.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick50: gqas014.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick51: gqas015.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick52: gqas005.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick53: gqas006.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick54: gqas008.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick55: gqas013.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick56: gqas014.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick57: gqas015.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick58: gqas005.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick59: gqas006.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick60: gqas008.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick61: gqas013.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick62: gqas014.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick63: gqas015.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick64: gqas005.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick65: gqas006.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick66: gqas008.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick67: gqas013.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick68: gqas014.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick69: gqas015.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick70: gqas005.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick71: gqas006.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick72: gqas008.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
nfs.disable: on
performance.readdir-ahead: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@gqas013 tmp]# exit
(gdb) p *fop->req_frame
$44 = {root = 0x7fde75452350, parent = 0x7fde754d2f20, frames = {next = 0x7fde75526d94, prev = 0x7fde75546864}, local = 0x0, this = 0x7fde6006c320, ret = 0x7fde5fdafd90 <dht_lookup_dir_cbk>, ref_count = 0, lock = {spinlock = 1, mutex = {
      __data = {__lock = 1, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001", '\000' <repeats 38 times>, __align = 1}}, cookie = 0x7fde75503c0c, 
  complete = _gf_false, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, wind_from = 0x7fde5fdee590 <__FUNCTION__.18890> "dht_lookup_directory", 
  wind_to = 0x7fde5fde9528 "conf->subvolumes[i]->fops->lookup", unwind_from = 0x0, unwind_to = 0x7fde5fdeb3cd "dht_lookup_dir_cbk"}

<<<<

Pranith observed that loc->gfid was set in the dht layer but had been modified by the time ec validated it in ec_lookup_cbk. He should be able to provide more details on the code flow that leads to this issue.
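
To illustrate that observation (generic C with pthreads, not GlusterFS code; the two concurrent actors here are an assumption made only to demonstrate the failure mode): a 16-byte GFID that is wiped and refilled without synchronization can be seen by an unsynchronized reader in a half-updated state, with the leading bytes already zeroed and the tail still holding the old value - the same shape as fop->loc[0].gfid in $40 above versus the full GFID in $41.

/* Sketch of a torn read on a 16-byte GFID that is wiped and refilled
 * while another path copies it, with no synchronization in between.
 * Illustration of the failure mode only, not GlusterFS code. */
#include <pthread.h>
#include <stdio.h>

static unsigned char gfid[16] = {
    0x3b, 0x82, 0x10, 0x5e, 0x40, 0x65, 0x43, 0x14,
    0xa0, 0xc6, 0x08, 0xf5, 0x6c, 0x2c, 0xb8, 0x56,
};

static const unsigned char full[16] = {
    0x3b, 0x82, 0x10, 0x5e, 0x40, 0x65, 0x43, 0x14,
    0xa0, 0xc6, 0x08, 0xf5, 0x6c, 0x2c, 0xb8, 0x56,
};

static void *writer(void *arg)
{
    volatile unsigned char *p = gfid;
    (void)arg;
    for (;;) {
        for (int b = 0; b < 16; b++)   /* wipe, e.g. a loc re-init   */
            p[b] = 0x00;
        for (int b = 0; b < 16; b++)   /* refill with the real GFID  */
            p[b] = full[b];
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, writer, NULL);

    for (long i = 0; i < 50000000; i++) {
        volatile unsigned char *p = gfid;
        unsigned char snap[16];
        for (int b = 0; b < 16; b++)   /* unsynchronized read        */
            snap[b] = p[b];
        /* Head already zeroed, tail still holding the old value:
         * a half-updated GFID, like $40 in the gdb session above. */
        if (snap[0] == 0x00 && snap[15] == 0x56) {
            printf("half-updated gfid observed after %ld reads\n", i);
            return 0;
        }
    }
    printf("no torn read seen in this run (timing dependent)\n");
    return 0;
}

Compile with cc -pthread; whether a torn read is actually caught in a given run depends on timing.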

Comment 18 Pranith Kumar K 2017-04-03 12:09:21 UTC
https://review.gluster.org/16986 - Patch upstream

Comment 24 Ambarish 2017-05-24 09:23:56 UTC
I could not reproduce it in two attempts on 3.8.4-25.

Since it's a stress test, I'll be revisiting it during 3.3 and will reopen if I hit it again.

Moving this to Verified.

Comment 26 errata-xmlrpc 2017-09-21 04:30:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

Comment 27 errata-xmlrpc 2017-09-21 04:56:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

