Description of problem:
While running Bonnie and Crefi along with parallel lookups, Bonnie failed with an Input/output error.

Version-Release number of selected component (if applicable):
# rpm -qa | grep ganesha
glusterfs-ganesha-3.12.2-7.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-4.el7rhgs.x86_64
nfs-ganesha-2.5.5-4.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Create an 8-node ganesha cluster.
2. Create a 5 x (4 + 2) Distributed-Disperse volume using 6 of the 8 nodes and export the volume via ganesha.
3. Mount the volume on 4 different clients using 4 different VIPs.
4. Run the following workload:
   Client 1 - Using crefi, create deep directories with the following operations in sequence: create, chmod, hardlink, chgrp, symlink, hardlink, truncate, hardlink
   Client 2 - Run bonnie and lookups: while true;do find;done
   Client 3 - Lookups: while true;do ls -laRt;done
   Client 4 - Lookups: while true;do du -sh;done

Actual results:
Bonnie failed on the client with an I/O error:

Writing a byte at a time...done
Writing intelligently...done
Rewriting...Can't write block.: Input/output error
Bonnie: drastic I/O error (re write(2)): Input/output error

real 66m34.755s
user 0m4.833s
sys 1m31.620s
bonnie failed
0
Total 0 tests were successful
Switching over to the previous working directory
Removing /mnt/ganesha//run5495/

Expected results:
Bonnie should not fail.

Additional info:
ganesha-gfapi.logs:

[2018-04-12 21:02:13.657683] E [MSGID: 122064] [ec-common.c:1156:ec_prepare_update_cbk] 0-Ganeshavol1-disperse-0: Unable to get version xattr [No such file or directory]
[2018-04-12 21:02:21.206056] E [MSGID: 122034] [ec-common.c:651:ec_child_select] 0-Ganeshavol1-disperse-0: Insufficient available children for this request (have 0, need 4)
[2018-04-12 21:02:21.206185] W [MSGID: 122040] [ec-common.c:1144:ec_prepare_update_cbk] 0-Ganeshavol1-disperse-0: Failed to get size and version [Input/output error]

Many GFID mismatch messages are also seen in ganesha-gfapi.logs:
=================
cc07bb89304>/level10/level20/level30/level40/level50/level60/level70 (gfid = bf3ddc68-ad9c-45a6-b9e6-5ccccea3e45e) returned -1 [Invalid argument]
[2018-04-13 06:33:32.602353] I [MSGID: 109094] [dht-common.c:1561:dht_revalidate_cbk] 0-Ganeshavol1-dht: Revalidate: subvolume Ganeshavol1-disperse-2 for <gfid:90ecde24-1e83-4f1f-aef6-8cc07bb89304>/level10/level20/level30/level40/level50/level60/level70 (gfid = bf3ddc68-ad9c-45a6-b9e6-5ccccea3e45e) returned -1 [Invalid argument]
[2018-04-13 06:33:32.608159] W [MSGID: 122019] [ec-helpers.c:412:ec_loc_gfid_check] 0-Ganeshavol1-disperse-3: Mismatching GFID's in loc
[2018-04-13 06:33:32.608250] I [MSGID: 109094] [dht-common.c:1561:dht_revalidate_cbk] 0-Ganeshavol1-dht: Revalidate: subvolume Ganeshavol1-disperse-3 for <gfid:90ecde24-1e83-4f1f-aef6-8cc07bb89304>/level10/level20/level30/level40/level50/level60/level70 (gfid = bf3ddc68-ad9c-45a6-b9e6-5ccccea3e45e) returned -1 [Invalid argument]
[2018-04-13 06:33:32.608550] W [MSGID: 122019] [ec-helpers.c:412:ec_loc_gfid_check] 0-Ganeshavol1-disperse-4: Mismatching GFID's in loc
[2018-04-13 06:33:32.608604] I [MSGID: 109094] [dht-common.c:1561:dht_revalidate_cbk] 0-Ganeshavol1-dht: Revalidate: subvolume Ganeshavol1-disperse-4 for <gfid:90ecde24-1e83-4f1f-aef6-8cc07bb89304>/level10/level20/level30/level40/level50/level60/level70 (gfid = bf3ddc68-ad9c-45a6-b9e6-5ccccea3e45e) returned -1 [Invalid argument]
[2018-04-13 06:33:32.608627] E [MSGID: 101046] [dht-common.c:1857:dht_revalidate_cbk] 0-Ganeshavol1-dht: dict is null
The message "W [MSGID: 122019] [ec-helpers.c:412:ec_loc_gfid_check] 0-Ganeshavol1-disperse-2: Mismatching GFID's in loc" repeated 2 times between [2018-04-13 06:33:32.586679] and [2018-04-13 06:33:32.611995]
[2018-04-13 06:33:32.627866] W [MSGID: 122019] [ec-helpers.c:412:ec_loc_gfid_check] 0-Ganeshavol1-disperse-0: Mismatching GFID's in loc
[2018-04-13 06:33:32.627940] I [MSGID: 109094] [dht-common.c:1561:dht_revalidate_cbk] 0-Ganeshavol1-dht: Revalidate: subvolume Ganeshavol1-disperse-0 for <gfid:90ecde24-1e83-4f1f-aef6-8cc07bb89304>/level10/level20/level30/level40/level50/level60/level70 (gfid = bf3ddc68-ad9c-45a6-b9e6-5ccccea3e45e) returned -1 [Invalid argument]
[2018-04-13 06:33:32.628578] W [MSGID: 122019] [ec-helpers.c:412:ec_loc_gfid_check] 0-Ganeshavol1-disperse-1: Mismatching GFID's in loc
[2018-04-13 06:33:32.628640] I [MSGID: 109094] [dht-common.c:1561:dht_revalidate_cbk] 0-Ganeshavol1-dht: Revalidate: subvolume Ganeshavol1-disperse-1 for <gfid:90ecde24-1e83-4f1f-aef6-8cc07bb89304>/level10/level20/level30/level40/level50/level60/level70 (gfid = bf3ddc68-ad9c-45a6-b9e6-5ccccea3e45e) returned -1 [Invalid argument]
[2018-04-13 06:33:32.628919] W [MSGID: 122019] [ec-helpers.c:412:ec_loc_gfid_check] 0-Ganeshavol1-disperse-4: Mismatching GFID's in loc
[2018-04-13 06:33:32.628970] I [MSGID: 109094] [dht-common.c:1561:dht_revalidate_cbk] 0-Ganeshavol1-dht: Revalidate: subvolume Ganeshavol1-disperse-4 for <gfid:90ecde24-1e83-4f1f-aef6-8cc07bb89304>/level10/level20/level30/level40/level50/level60/level70 (gfid = bf3ddc68-ad9c-45a6-b9e6-5ccccea3e45e) returned -1 [Invalid argument]
[2018-04-13 06:33:32.630219] W [MSGID: 122019] [ec-helpers.c:412:ec_loc_gfid_check] 0-Ganeshavol1-disperse-2: Mismatching GFID's in loc
[2018-04-13 06:33:32.630274] I [MSGID: 109094] [dht-common.c:1561:dht_revalidate_cbk] 0-Ganeshavol1-dht: Revalidate: subvolume Ganeshavol1-disperse-2 for <gfid:90ecde24-1e83-4f1f-aef6-8cc07bb89304>/level10/level20/level30/level40/level50/level60/level70 (gfid = bf3ddc68-ad9c-45a6-b9e6-5ccccea3e45e) returned -1 [Invalid argument]
[2018-04-13 06:33:32.630395] W [MSGID: 122019] [ec-helpers.c:412:ec_loc_gfid_check] 0-Ganeshavol1-disperse-3: Mismatching GFID's in loc
[2018-04-13 06:33:32.630405] I [MSGID: 109094] [dht-common.c:1561:dht_revalidate_cbk] 0-Ganeshavol1-dht: Revalidate: subvolume Ganeshavol1-disperse-3 for <gfid:90ecde24-1e83-4f1f-aef6-8cc07bb89304>/level10/level20/level30/level40/level50/level60/level70 (gfid = bf3ddc68-ad9c-45a6-b9e6-5ccccea3e45e) returned -1 [Invalid argument]
[2018-04-13 06:33:32.630458] E [MSGID: 101046] [dht-common.c:1857:dht_revalidate_cbk] 0-Ganeshavol1-dht: dict is null
===================

Detailed logs will be attached shortly.
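
To put a number on how many of these messages there are, the gfapi log can be tallied by severity and MSGID. The following is only a rough sketch: `tally_msgids` is a made-up helper name, the log path is whatever applies on the node being checked, and it assumes entries follow the `[timestamp] <severity> [MSGID: <id>] ...` layout seen in the excerpts above. The "repeated N times" summary lines are not counted.

```shell
# Hypothetical helper: count log entries per severity letter and MSGID.
# Usage: tally_msgids <path to ganesha-gfapi log>
tally_msgids() {
    # Pull out the "] <severity> [MSGID: <id>]" part of each entry,
    # reduce it to "<severity> <id>", then count and sort by frequency.
    grep -oE '\] [A-Z] \[MSGID: [0-9]+\]' "$1" \
        | sed -E 's/^\] ([A-Z]) \[MSGID: ([0-9]+)\]$/\1 \2/' \
        | sort | uniq -c | sort -rn
}
```

Running `tally_msgids /path/to/ganesha-gfapi.log` prints one line per severity/MSGID pair with its count, most frequent first, which makes it easier to see whether the 122019 "Mismatching GFID's in loc" warnings dominate the log.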
Are there any ganesha logs in these SOS reports? I can't seem to find any.
Verified this BZ with:

# rpm -qa | grep ganesha
nfs-ganesha-debuginfo-2.5.5-8.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-8.el7rhgs.x86_64
nfs-ganesha-2.5.5-8.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-14.el7rhgs.x86_64

Created a 2 x (4 + 2) Distributed-Disperse volume and mounted it on 4 different clients using 4 different VIPs. Ran the following workload:

Client 1 - Using crefi, create deep directories with the following operations in sequence: create, chmod, hardlink, chgrp, symlink, hardlink, truncate, hardlink
Client 2 - Run bonnie
Client 3 - Lookups: while true;do ls -laRt;done
Client 4 - Lookups: while true;do du -sh;done

Performed failover/failback while I/O was running. No I/O errors were observed and Bonnie completed successfully.

Moving this BZ to verified state.
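
The lookup loops above run until interrupted. For a repeatable run, a bounded variant could look like the sketch below. `lookup_loop` and its arguments are hypothetical and not part of any tool used in this BZ; the mount point is passed in because it differs per client.

```shell
# Hypothetical helper: run one lookup workload a fixed number of times.
# Arguments: <find|ls|du> <mount point> <iterations>
# The body runs in a subshell so the caller's working directory is unchanged.
lookup_loop() (
    kind=$1 mnt=$2 count=$3
    cd "$mnt" || exit 1
    i=0
    while [ "$i" -lt "$count" ]; do
        # Output is discarded; only the lookups themselves matter here.
        case $kind in
            find) find . >/dev/null ;;
            ls)   ls -laRt >/dev/null ;;
            du)   du -sh >/dev/null ;;
            *)    echo "unknown workload: $kind" >&2; exit 2 ;;
        esac
        i=$((i + 1))
    done
)
```

For example, `lookup_loop ls /mnt/ganesha 100` would repeat the Client 3 lookup 100 times (the mount point shown is only an example). As with the original `while true` loops, a failing lookup does not stop the run.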
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607