Bug 1477581 - Distributed_Disperse+CIFS: Seeing "Invalid argument" when doing IOs (untar or copy of directories) [NEEDINFO]
Status: NEW
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: samba
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Assigned To: Michael Adam
QA Contact: Vivek Das
Reported: 2017-08-02 08:18 EDT by nchilaka
Modified: 2018-07-17 08:33 EDT
Type: Bug
nchilaka: needinfo? (pgurusid)


Description nchilaka 2017-08-02 08:18:40 EDT
Description of problem:
========================
On a CIFS mount of a distributed-disperse volume, when I copy a directory to a new location where both the source and the destination reside on the mount (i.e. on the same volume), some of the files fail to get copied with "Invalid argument".

NOTE: I tried the same on a FUSE mount and did not see this problem.


[root@dhcp37-132 cifs]# cp -Rf dir1 loki
cp: cannot create regular file ‘loki/thread1/level01/level11/level21/level31/level41/59819dd5%%ALTXZ345G8’: Invalid argument



Following is the snippet from samba log:
=======================================

[2017-08-02 10:16:31.003370] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-0: Mismatching GFID's in loc
[2017-08-02 10:16:31.003404] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-0 for /loki/thread1/level01/level11/level21/level31/level41 (gfid = 9883cd8c-0ec5-42ad-92d3-87692080027b) returned -1 [Invalid argument]
[2017-08-02 10:16:31.003875] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-1: Mismatching GFID's in loc
[2017-08-02 10:16:31.003911] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-1 for /loki/thread1/level01/level11/level21/level31/level41 (gfid = 9883cd8c-0ec5-42ad-92d3-87692080027b) returned -1 [Invalid argument]
[2017-08-02 10:16:31.003932] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
[2017-08-02 10:16:31.005369] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
[2017-08-02 10:16:36.007778] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-0 for /dir1/thread1/level00 (gfid = 86d9b770-b8ac-4132-b0ec-40f5257ffe80) returned -1 [Invalid argument]
[2017-08-02 10:16:36.008100] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-1: Mismatching GFID's in loc
[2017-08-02 10:16:36.008108] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-1 for /dir1/thread1/level00 (gfid = 86d9b770-b8ac-4132-b0ec-40f5257ffe80) returned -1 [Invalid argument]
[2017-08-02 10:16:36.008144] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
[2017-08-02 10:16:36.009928] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-1: Mismatching GFID's in loc
[2017-08-02 10:16:36.009974] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
The message "W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-0: Mismatching GFID's in loc" repeated 3 times between [2017-08-02 10:16:31.003370] and [2017-08-02 10:16:36.014284]
[2017-08-02 10:16:36.014294] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-0 for /dir1/thread1 (gfid = 227f6ddc-40e5-4b91-8233-05f7ca5a0a1a) returned -1 [Invalid argument]
[2017-08-02 10:16:36.014395] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-1: Mismatching GFID's in loc
[2017-08-02 10:16:36.014401] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-1 for /dir1/thread1 (gfid = 227f6ddc-40e5-4b91-8233-05f7ca5a0a1a) returned -1 [Invalid argument]
[2017-08-02 10:16:36.014429] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
[2017-08-02 10:16:36.015851] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-0: Mismatching GFID's in loc
[2017-08-02 10:16:36.015883] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
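
To make a long log like the one above easier to read, something along these lines can tally the EC "Mismatching GFID's in loc" warnings per subvolume and list the paths whose DHT revalidate returned "Invalid argument". This is only a sketch: the function name is made up for illustration, the log path must be passed in, and the patterns are taken from the message formats in the snippet above.

```shell
#!/bin/sh
# Sketch: summarize the GFID-mismatch symptom in a gluster client log.
# summarize_gfid_errors is a hypothetical helper name, not a gluster tool.

summarize_gfid_errors() {
    local log="$1"

    echo "== EC GFID-mismatch warnings per subvolume =="
    # Lines look like: "... ] 0-<subvol>: Mismatching GFID's in loc".
    # The 'The message ... repeated N times' summary lines are skipped,
    # so suppressed repeats are not added to the counts.
    grep -F "Mismatching GFID's in loc" "$log" \
        | grep -v '^The message' \
        | sed -E 's/.*\] 0-([^:]+): Mismatching.*/\1/' \
        | sort | uniq -c

    echo "== Paths whose DHT revalidate returned Invalid argument =="
    grep -F 'dht_revalidate_cbk' "$log" \
        | grep -F '[Invalid argument]' \
        | sed -E 's/.* for (.+) \(gfid = .*/\1/' \
        | sort -u
}

# Run only when a log file is given on the command line.
if [ -n "${1:-}" ]; then
    summarize_gfid_errors "$1"
fi
```

The unique path list shows quickly whether the failures cluster under one directory or are spread across the tree.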



Version-Release number of selected component (if applicable):
===================
[root@dhcp43-157 samba]# rpm -qa|grep gluster
glusterfs-geo-replication-3.8.4-33.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
samba-vfs-glusterfs-4.6.3-5.el7rhgs.x86_64
glusterfs-fuse-3.8.4-33.el7rhgs.x86_64
glusterfs-api-3.8.4-33.el7rhgs.x86_64
glusterfs-3.8.4-33.el7rhgs.x86_64
glusterfs-cli-3.8.4-33.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-33.el7rhgs.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
glusterfs-server-3.8.4-33.el7rhgs.x86_64
glusterfs-rdma-3.8.4-33.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-4.el7.x86_64
python-gluster-3.8.4-33.el7rhgs.noarch
glusterfs-libs-3.8.4-33.el7rhgs.x86_64
[root@dhcp43-157 samba]# rpm -qa|grep samba
samba-vfs-glusterfs-4.6.3-5.el7rhgs.x86_64
samba-winbind-krb5-locator-4.6.3-5.el7rhgs.x86_64
samba-libs-4.6.3-5.el7rhgs.x86_64
samba-client-libs-4.6.3-5.el7rhgs.x86_64
samba-winbind-clients-4.6.3-5.el7rhgs.x86_64
samba-common-tools-4.6.3-5.el7rhgs.x86_64
samba-common-libs-4.6.3-5.el7rhgs.x86_64
samba-winbind-modules-4.6.3-5.el7rhgs.x86_64
samba-client-4.6.3-5.el7rhgs.x86_64
samba-common-4.6.3-5.el7rhgs.noarch
samba-winbind-4.6.3-5.el7rhgs.x86_64
samba-4.6.3-5.el7rhgs.x86_64
samba-debuginfo-4.6.3-5.el7rhgs.x86_64



How reproducible:
=================
always


Steps to Reproduce:
==================
1. Create a 2 x (4+2) EC (disperse) volume and mount it using CIFS.
2. Create some files and directories in breadth and depth, or untar a Linux kernel tarball to create this working set.
3. Take the source directory as "kernel.dir".
4. Run: cp -Rf kernel.dir <newdir>
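
The steps above can be scripted roughly as follows. This is a sketch under assumptions: MNT (the CIFS mount point), the helper names, and the tree dimensions are all placeholders, and a small generated tree stands in for the kernel untar. Nothing runs unless MNT is set, so the helpers can be tried on a scratch directory first.

```shell
#!/bin/sh
# Sketch of the reproduction. make_tree and copy_and_check are hypothetical
# helper names; MNT must point at the CIFS mount of the disperse volume.

make_tree() {
    # Create <breadth> top-level dirs, each <depth> levels deep,
    # with <files> small files at every level.
    local root="$1" breadth="$2" depth="$3" files="$4"
    local b=0 d f dir
    while [ "$b" -lt "$breadth" ]; do
        dir="$root/thread$b"
        d=0
        while [ "$d" -lt "$depth" ]; do
            dir="$dir/level$d"
            mkdir -p "$dir"
            f=0
            while [ "$f" -lt "$files" ]; do
                echo "payload $b $d $f" > "$dir/file$f"
                f=$((f + 1))
            done
            d=$((d + 1))
        done
        b=$((b + 1))
    done
}

copy_and_check() {
    # Copy src to dst (dst must not exist yet), keep cp's stderr,
    # then report "Invalid argument" failures and compare the trees.
    local src="$1" dst="$2" errlog="$3" failed
    cp -Rf "$src" "$dst" 2> "$errlog"
    failed=$(grep -c 'Invalid argument' "$errlog")
    echo "cp errors with 'Invalid argument': $failed"
    diff -r "$src" "$dst" > /dev/null && echo "trees match" || echo "trees differ"
}

if [ -n "${MNT:-}" ]; then
    make_tree "$MNT/kernel.dir" 2 5 20
    copy_and_check "$MNT/kernel.dir" "$MNT/newdir" "${TMPDIR:-/tmp}/cp-errors.log"
fi
```

Running the same script against a FUSE mount of the same volume gives a direct comparison with the CIFS result.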


Actual results:
===================
Some files fail to be created with "Invalid argument".

Expected results:
=================
Should not see any such issues

Additional info:
================
NOTE: I tried the same on a FUSE mount and did not see this problem.
Comment 2 nchilaka 2017-08-02 09:21:19 EDT
I tried all of the combinations below and see the issue with each of them:
1) USS enabled + CIFS
2) USS disabled + CIFS
3) mount with the default CIFS mount command
4) mount using vers=3

[root@dhcp43-157 samba]# gluster v info bangalore
 
Volume Name: bangalore
Type: Distributed-Disperse
Volume ID: 080d1528-91bb-4bdc-b7bb-a201c5370ff8
Status: Started
Snapshot Count: 1
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick2: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick3: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick4: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick5: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick6: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick7: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick8: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick9: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick10: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick11: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick12: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick2/table
Options Reconfigured:
features.show-snapshot-directory: enable
features.uss: enable
features.barrier: disable
performance.stat-prefetch: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@dhcp43-157 samba]# gluster v set bangalore features.show-snapshot-directory off
volume set: success
[root@dhcp43-157 samba]# gluster v set bangalore features.uss off
volume set: success
[root@dhcp43-157 samba]# gluster v info bangalore
 
Volume Name: bangalore
Type: Distributed-Disperse
Volume ID: 080d1528-91bb-4bdc-b7bb-a201c5370ff8
Status: Started
Snapshot Count: 1
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick2: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick3: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick4: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick5: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick6: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick7: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick8: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick9: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick10: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick11: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick12: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick2/table
Options Reconfigured:
features.show-snapshot-directory: off
features.uss: off
features.barrier: disable
performance.stat-prefetch: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@dhcp43-157 samba]# gluster v status
Status of volume: bangalore
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp43-157.lab.eng.blr.redhat.com:/br
icks/brick1/table                           49165     0          Y       24596
Brick dhcp41-157.lab.eng.blr.redhat.com:/br
icks/brick1/table                           49165     0          Y       5059 
Brick dhcp43-164.lab.eng.blr.redhat.com:/br
icks/brick1/table                           49165     0          Y       15729
Brick dhcp43-162.lab.eng.blr.redhat.com:/br
icks/brick1/table                           49164     0          Y       20980
Brick dhcp43-78.lab.eng.blr.redhat.com:/bri
cks/brick1/table                            49157     0          Y       32629
Brick dhcp41-241.lab.eng.blr.redhat.com:/br
icks/brick1/table                           49156     0          Y       6946 
Brick dhcp43-157.lab.eng.blr.redhat.com:/br
icks/brick2/table                           49166     0          Y       24615
Brick dhcp41-157.lab.eng.blr.redhat.com:/br
icks/brick2/table                           49166     0          Y       5078 
Brick dhcp43-164.lab.eng.blr.redhat.com:/br
icks/brick2/table                           49166     0          Y       15748
Brick dhcp43-162.lab.eng.blr.redhat.com:/br
icks/brick2/table                           49165     0          Y       20999
Brick dhcp43-78.lab.eng.blr.redhat.com:/bri
cks/brick2/table                            49158     0          Y       32649
Brick dhcp41-241.lab.eng.blr.redhat.com:/br
icks/brick2/table                           49157     0          Y       6965 
Self-heal Daemon on localhost               N/A       N/A        Y       24635
Self-heal Daemon on dhcp43-78.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       32670
Self-heal Daemon on dhcp43-164.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       15768
Self-heal Daemon on dhcp43-162.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       21019
Self-heal Daemon on dhcp41-157.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       5106 
Self-heal Daemon on dhcp41-241.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       6986 
 
Task Status of Volume bangalore
------------------------------------------------------------------------------
There are no active volume tasks
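
When comparing runs, it can help to diff only the "Options Reconfigured" part of two saved `gluster v info` outputs, such as the two shown above. A minimal sketch, assuming each output was saved to a file; both function names are hypothetical:

```shell
#!/bin/sh
# Sketch: show which reconfigured options differ between two saved
# `gluster v info` outputs. Helper names are made up for illustration.

reconfigured_options() {
    # Print the "key: value" lines that follow "Options Reconfigured:",
    # stopping at the first line that does not look like an option.
    awk '/^Options Reconfigured:/ { on = 1; next }
         on && /^[a-z][a-z.-]*: / { print; next }
         on { on = 0 }' "$1" | sort
}

diff_options() {
    local a b
    a=$(mktemp) || return 1
    b=$(mktemp) || return 1
    reconfigured_options "$1" > "$a"
    reconfigured_options "$2" > "$b"
    diff "$a" "$b"
    rm -f "$a" "$b"
}

# Run only when two files are given on the command line.
if [ -n "${1:-}" ] && [ -n "${2:-}" ]; then
    diff_options "$1" "$2"
fi
```

Lines prefixed with "<" come from the first file and lines prefixed with ">" from the second.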
Comment 3 nchilaka 2017-08-02 09:21:36 EDT
Client version:

[root@dhcp37-132 ~]# rpm -qa|egrep "cifs|samb"
cifs-utils-6.2-10.el7.x86_64
samba-common-tools-4.6.3-4.el7rhgs.x86_64
samba-common-libs-4.6.3-4.el7rhgs.x86_64
samba-common-4.6.3-4.el7rhgs.noarch
samba-libs-4.6.3-4.el7rhgs.x86_64
samba-client-libs-4.6.3-4.el7rhgs.x86_64
Comment 4 nchilaka 2017-08-02 09:23:08 EDT
Changing the title, as I hit this problem with a Linux untar as well.
We see the "Invalid argument" error for some of the files, seemingly at random.
Comment 5 nchilaka 2017-08-02 10:16:39 EDT
With the volume settings below, I am NOT seeing the errors reported above:

Volume Name: bangalore
Type: Distributed-Disperse
Volume ID: 080d1528-91bb-4bdc-b7bb-a201c5370ff8
Status: Started
Snapshot Count: 1
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick2: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick3: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick4: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick5: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick6: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick7: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick8: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick9: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick10: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick11: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick12: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick2/table
Options Reconfigured:
performance.cache-samba-metadata: on
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
features.show-snapshot-directory: off
features.uss: off
features.barrier: disable
performance.stat-prefetch: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
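
For anyone retesting, the six options that are new in the listing above (compared with the earlier `gluster v info` output) could be applied with something like the sketch below. The keys and values are copied from that listing; the function names and the APPLY switch are assumptions made for illustration. By default it only prints the commands. Note that this comment reports only that the errors were not seen with these settings, not that they are a confirmed fix.

```shell
#!/bin/sh
# Sketch: apply the md-cache / cache-invalidation options from the listing.
# VOLUME defaults to the volume name used in this bug.
VOLUME="${VOLUME:-bangalore}"

md_cache_settings() {
    # "key value" pairs copied from "Options Reconfigured" above.
    cat <<'EOF'
features.cache-invalidation on
features.cache-invalidation-timeout 600
performance.cache-invalidation on
performance.md-cache-timeout 600
network.inode-lru-limit 50000
performance.cache-samba-metadata on
EOF
}

apply_md_cache_settings() {
    # Print each command; execute it only when APPLY=1 is set.
    local vol="$1" key value
    md_cache_settings | while read -r key value; do
        echo "gluster volume set $vol $key $value"
        if [ "${APPLY:-0}" = 1 ]; then
            gluster volume set "$vol" "$key" "$value"
        fi
    done
}

apply_md_cache_settings "$VOLUME"
```

Run it once as-is to review the commands, then with APPLY=1 on a server node to actually set them.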
Comment 6 nchilaka 2017-08-02 10:20:11 EDT
Logs and sosreports are at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1477581/
Comment 8 Poornima G 2017-11-28 06:50:51 EST
Can this be retested with the 3.4 build? I think there was a similar bug with the gfapi+EC combination (mismatching GFID) that was fixed. Hence putting it on retest.
