Bug 1477581 - Distributed_Disperse+CIFS: Seeing "Invalid argument" when doing IOs (untar or copy of directories)
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: samba
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
: ---
Assignee: Guenther Deschner
QA Contact: Vivek Das
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-08-02 12:18 UTC by Nag Pavan Chilakam
Modified: 2019-12-11 04:54 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-11 04:54:08 UTC
Embargoed:



Description Nag Pavan Chilakam 2017-08-02 12:18:40 UTC
Description of problem:
========================
On a CIFS mount of a distributed-disperse volume, when I copy a directory to a new location where both the source and the destination reside on the mount (i.e. on the volume), some of the files fail to be copied with "Invalid argument".

NOTE: Tried the same over a FUSE mount and did not see this problem.


[root@dhcp37-132 cifs]# cp -Rf dir1 loki
cp: cannot create regular file ‘loki/thread1/level01/level11/level21/level31/level41/59819dd5%%ALTXZ345G8’: Invalid argument



The following is a snippet from the Samba log:
=======================================

[2017-08-02 10:16:31.003370] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-0: Mismatching GFID's in loc
[2017-08-02 10:16:31.003404] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-0 for /loki/thread1/level01/level11/level21/level31/level41 (gfid = 9883cd8c-0ec5-42ad-92d3-87692080027b) returned -1 [Invalid argument]
[2017-08-02 10:16:31.003875] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-1: Mismatching GFID's in loc
[2017-08-02 10:16:31.003911] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-1 for /loki/thread1/level01/level11/level21/level31/level41 (gfid = 9883cd8c-0ec5-42ad-92d3-87692080027b) returned -1 [Invalid argument]
[2017-08-02 10:16:31.003932] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
[2017-08-02 10:16:31.005369] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
[2017-08-02 10:16:36.007778] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-0 for /dir1/thread1/level00 (gfid = 86d9b770-b8ac-4132-b0ec-40f5257ffe80) returned -1 [Invalid argument]
[2017-08-02 10:16:36.008100] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-1: Mismatching GFID's in loc
[2017-08-02 10:16:36.008108] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-1 for /dir1/thread1/level00 (gfid = 86d9b770-b8ac-4132-b0ec-40f5257ffe80) returned -1 [Invalid argument]
[2017-08-02 10:16:36.008144] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
[2017-08-02 10:16:36.009928] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-1: Mismatching GFID's in loc
[2017-08-02 10:16:36.009974] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
The message "W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-0: Mismatching GFID's in loc" repeated 3 times between [2017-08-02 10:16:31.003370] and [2017-08-02 10:16:36.014284]
[2017-08-02 10:16:36.014294] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-0 for /dir1/thread1 (gfid = 227f6ddc-40e5-4b91-8233-05f7ca5a0a1a) returned -1 [Invalid argument]
[2017-08-02 10:16:36.014395] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-1: Mismatching GFID's in loc
[2017-08-02 10:16:36.014401] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-1 for /dir1/thread1 (gfid = 227f6ddc-40e5-4b91-8233-05f7ca5a0a1a) returned -1 [Invalid argument]
[2017-08-02 10:16:36.014429] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
[2017-08-02 10:16:36.015851] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-0: Mismatching GFID's in loc
[2017-08-02 10:16:36.015883] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
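In a longer log it helps to see which directories are affected. The sketch below summarizes the DHT revalidate failures per GFID and path. It assumes the log line format shown above; the function name failed_revalidates is hypothetical and it only uses standard grep/sed/sort/uniq.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a Samba/gfapi client log on stdin and print
# "<count> <gfid> <path>" for every directory whose DHT revalidate
# returned -1 [Invalid argument], as in the dht_revalidate_cbk lines above.
failed_revalidates() {
    grep 'dht_revalidate_cbk' |
        grep 'returned -1 \[Invalid argument\]' |
        # Extract the path after " for " and the gfid inside "(gfid = ...)".
        sed -n 's/.* for \(.*\) (gfid = \([0-9a-f-]*\)).*/\2 \1/p' |
        sort | uniq -c
}
```

For example, `failed_revalidates < /var/log/samba/glusterfs-bangalore.log` (the log file name here is an assumption; use whichever gfapi log the Samba VFS module writes to) would list each failing directory once, with the number of subvolumes that rejected it.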



Version-Release number of selected component (if applicable):
===================
[root@dhcp43-157 samba]# rpm -qa|grep gluster
glusterfs-geo-replication-3.8.4-33.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
samba-vfs-glusterfs-4.6.3-5.el7rhgs.x86_64
glusterfs-fuse-3.8.4-33.el7rhgs.x86_64
glusterfs-api-3.8.4-33.el7rhgs.x86_64
glusterfs-3.8.4-33.el7rhgs.x86_64
glusterfs-cli-3.8.4-33.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-33.el7rhgs.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
glusterfs-server-3.8.4-33.el7rhgs.x86_64
glusterfs-rdma-3.8.4-33.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-4.el7.x86_64
python-gluster-3.8.4-33.el7rhgs.noarch
glusterfs-libs-3.8.4-33.el7rhgs.x86_64
[root@dhcp43-157 samba]# rpm -qa|grep samba
samba-vfs-glusterfs-4.6.3-5.el7rhgs.x86_64
samba-winbind-krb5-locator-4.6.3-5.el7rhgs.x86_64
samba-libs-4.6.3-5.el7rhgs.x86_64
samba-client-libs-4.6.3-5.el7rhgs.x86_64
samba-winbind-clients-4.6.3-5.el7rhgs.x86_64
samba-common-tools-4.6.3-5.el7rhgs.x86_64
samba-common-libs-4.6.3-5.el7rhgs.x86_64
samba-winbind-modules-4.6.3-5.el7rhgs.x86_64
samba-client-4.6.3-5.el7rhgs.x86_64
samba-common-4.6.3-5.el7rhgs.noarch
samba-winbind-4.6.3-5.el7rhgs.x86_64
samba-4.6.3-5.el7rhgs.x86_64
samba-debuginfo-4.6.3-5.el7rhgs.x86_64



How reproducible:
=================
Always


Steps to Reproduce:
==================
1. Create a 2 x (4 + 2) EC (distributed-disperse) volume and mount it over CIFS.
2. Create files and directories in both breadth and depth, or use a Linux untar to create this working set.
3. Call the source directory "kernel.dir".
4. Run: cp -Rf kernel.dir <newdir>
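Steps 2 to 4 can be scripted roughly as follows. This is a sketch, assuming the volume is already mounted over CIFS; the function names, the tree shape, and the example mount line in the comment are hypothetical, and a synthetic tree may trigger the error less reliably than the untar used in the report.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-4: build a deep tree on the mount, copy it with
# cp -Rf, and count the copy errors that mention "Invalid argument".
# Assumption: the volume is already mounted over CIFS, e.g. (hypothetical
# server, share and mount point):
#   mount -t cifs //server/gluster-bangalore /mnt/cifs -o username=user
set -u

# Create DEPTH nested directories under ROOT, each holding FILES small files.
make_tree() {
    local dir=$1 depth=$2 files=$3 d f
    for ((d = 0; d < depth; d++)); do
        dir="$dir/level$d"
        mkdir -p "$dir"
        for ((f = 0; f < files; f++)); do
            printf 'data %d %d\n' "$d" "$f" > "$dir/file$f"
        done
    done
}

# Copy SRC to DST and print how many cp error lines say "Invalid argument".
copy_and_report() {
    local src=$1 dst=$2 log
    log=$(cp -Rf "$src" "$dst" 2>&1)
    # grep -c prints 0 and exits 1 when nothing matches; ignore that status.
    printf '%s\n' "$log" | grep -c 'Invalid argument' || true
}
```

Usage on the CIFS mount: `make_tree /mnt/cifs/kernel.dir 6 50` followed by `copy_and_report /mnt/cifs/kernel.dir /mnt/cifs/newdir`; a non-zero count reproduces the symptom, while the same run on a FUSE mount would be expected to print 0 according to the note above.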


Actual results:
===================
5. Some files fail to be created with "Invalid argument".

Expected results:
=================
No such errors should occur; all files should be copied.

Additional info:
================
NOTE: Tried the same over a FUSE mount and did not see this problem.

Comment 2 Nag Pavan Chilakam 2017-08-02 13:21:19 UTC
Tried all of the following combinations and still see the issue with each of them:
1) USS enabled + CIFS
2) USS disabled + CIFS
3) mounted with the default CIFS mount command
4) used vers=3

[root@dhcp43-157 samba]# gluster v info bangalore
 
Volume Name: bangalore
Type: Distributed-Disperse
Volume ID: 080d1528-91bb-4bdc-b7bb-a201c5370ff8
Status: Started
Snapshot Count: 1
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick2: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick3: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick4: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick5: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick6: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick7: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick8: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick9: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick10: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick11: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick12: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick2/table
Options Reconfigured:
features.show-snapshot-directory: enable
features.uss: enable
features.barrier: disable
performance.stat-prefetch: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@dhcp43-157 samba]# gluster v set bangalore features.show-snapshot-directory off
volume set: success
[root@dhcp43-157 samba]# gluster v set bangalore features.uss off
volume set: success
[root@dhcp43-157 samba]# gluster v info bangalore
 
Volume Name: bangalore
Type: Distributed-Disperse
Volume ID: 080d1528-91bb-4bdc-b7bb-a201c5370ff8
Status: Started
Snapshot Count: 1
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick2: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick3: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick4: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick5: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick6: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick7: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick8: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick9: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick10: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick11: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick12: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick2/table
Options Reconfigured:
features.show-snapshot-directory: off
features.uss: off
features.barrier: disable
performance.stat-prefetch: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@dhcp43-157 samba]# gluster v status
Status of volume: bangalore
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp43-157.lab.eng.blr.redhat.com:/br
icks/brick1/table                           49165     0          Y       24596
Brick dhcp41-157.lab.eng.blr.redhat.com:/br
icks/brick1/table                           49165     0          Y       5059 
Brick dhcp43-164.lab.eng.blr.redhat.com:/br
icks/brick1/table                           49165     0          Y       15729
Brick dhcp43-162.lab.eng.blr.redhat.com:/br
icks/brick1/table                           49164     0          Y       20980
Brick dhcp43-78.lab.eng.blr.redhat.com:/bri
cks/brick1/table                            49157     0          Y       32629
Brick dhcp41-241.lab.eng.blr.redhat.com:/br
icks/brick1/table                           49156     0          Y       6946 
Brick dhcp43-157.lab.eng.blr.redhat.com:/br
icks/brick2/table                           49166     0          Y       24615
Brick dhcp41-157.lab.eng.blr.redhat.com:/br
icks/brick2/table                           49166     0          Y       5078 
Brick dhcp43-164.lab.eng.blr.redhat.com:/br
icks/brick2/table                           49166     0          Y       15748
Brick dhcp43-162.lab.eng.blr.redhat.com:/br
icks/brick2/table                           49165     0          Y       20999
Brick dhcp43-78.lab.eng.blr.redhat.com:/bri
cks/brick2/table                            49158     0          Y       32649
Brick dhcp41-241.lab.eng.blr.redhat.com:/br
icks/brick2/table                           49157     0          Y       6965 
Self-heal Daemon on localhost               N/A       N/A        Y       24635
Self-heal Daemon on dhcp43-78.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       32670
Self-heal Daemon on dhcp43-164.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       15768
Self-heal Daemon on dhcp43-162.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       21019
Self-heal Daemon on dhcp41-157.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       5106 
Self-heal Daemon on dhcp41-241.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       6986 
 
Task Status of Volume bangalore
------------------------------------------------------------------------------
There are no active volume tasks

Comment 3 Nag Pavan Chilakam 2017-08-02 13:21:36 UTC
Client version:

[root@dhcp37-132 ~]# rpm -qa|egrep "cifs|samb"
cifs-utils-6.2-10.el7.x86_64
samba-common-tools-4.6.3-4.el7rhgs.x86_64
samba-common-libs-4.6.3-4.el7rhgs.x86_64
samba-common-4.6.3-4.el7rhgs.noarch
samba-libs-4.6.3-4.el7rhgs.x86_64
samba-client-libs-4.6.3-4.el7rhgs.x86_64

Comment 4 Nag Pavan Chilakam 2017-08-02 13:23:08 UTC
Changing the title, as I hit this problem even during a Linux untar.
The "Invalid argument" error appears for a random subset of the files.

Comment 5 Nag Pavan Chilakam 2017-08-02 14:16:39 UTC
With the volume settings below, I do NOT see the errors reported above:

Volume Name: bangalore
Type: Distributed-Disperse
Volume ID: 080d1528-91bb-4bdc-b7bb-a201c5370ff8
Status: Started
Snapshot Count: 1
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick2: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick3: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick4: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick5: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick6: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick7: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick8: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick9: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick10: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick11: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick12: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick2/table
Options Reconfigured:
performance.cache-samba-metadata: on
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
features.show-snapshot-directory: off
features.uss: off
features.barrier: disable
performance.stat-prefetch: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
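Compared with the settings in Comment 2, the difference is the md-cache and cache-invalidation group of options enabled alongside stat-prefetch. The sketch below prints (or, with "apply", runs) the corresponding `gluster volume set` commands. The option names and values are copied from the listing above; the function name and the dry-run switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit or run the "gluster volume set" commands for
# the options that made the errors disappear in this comment.
# Default is a dry run that only prints; pass "apply" as $2 to execute.
apply_md_cache_opts() {
    local vol=$1 mode=${2:-dry-run} opt
    for opt in \
        "performance.cache-samba-metadata on" \
        "network.inode-lru-limit 50000" \
        "performance.md-cache-timeout 600" \
        "performance.cache-invalidation on" \
        "features.cache-invalidation-timeout 600" \
        "features.cache-invalidation on"; do
        # $opt is left unquoted on purpose so it splits into key and value.
        if [ "$mode" = apply ]; then
            gluster volume set "$vol" $opt
        else
            echo gluster volume set "$vol" $opt
        fi
    done
}
```

Running `apply_md_cache_opts bangalore` first shows the six commands; `apply_md_cache_opts bangalore apply` then sets them on the volume.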

Comment 6 Nag Pavan Chilakam 2017-08-02 14:20:11 UTC
Logs and sosreports: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1477581/

Comment 8 Poornima G 2017-11-28 11:50:51 UTC
Can this be retested with the 3.4 build? I think a similar bug in the gfapi + EC combination (mismatching GFID) was fixed, hence putting it on retest.

Comment 13 Poornima G 2018-11-19 04:25:20 UTC
Having stat-prefetch on while the related options such as cache-invalidation are not enabled is not a supported configuration. Also, this happens with the EC + gfapi combination over CIFS, and CIFS mounts are not something we prioritize downstream. Hence I would prefer closing this as WONTFIX/deferred.

One caveat: this could actually be a GFID mismatch bug in the gfapi + EC combination, so I would suggest changing the component to EC/gfapi and root-causing it from that perspective.
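When root-causing this as a GFID issue, one quick check is to rule out on-disk divergence by comparing the trusted.gfid xattr of an affected directory across bricks. The "Mismatching GFID's in loc" warnings come from the Samba (gfapi) client log, i.e. from the client-side disperse translator, so consistent on-disk values would point back at gfapi/EC as suggested above. A hedged sketch: the helper names are hypothetical, it must run as root on each brick host, and the brick paths are the ones from the volume info in Comment 2.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing the on-disk GFID of one directory
# across brick paths (e.g. /bricks/brick1/table /bricks/brick2/table).

# Print "<brick> <hex gfid>" for REL, a path relative to the volume root.
collect_gfids() {
    local rel=$1 brick
    shift
    for brick in "$@"; do
        printf '%s %s\n' "$brick" \
            "$(getfattr --absolute-names -n trusted.gfid -e hex "$brick/$rel" 2>/dev/null |
               sed -n 's/^trusted\.gfid=//p')"
    done
}

# Read "<brick> <gfid>" lines on stdin; succeed only if there is at least
# one line and every gfid is non-empty and identical to the first one.
gfid_consistent() {
    awk '$2 == "" { bad = 1 }
         NR == 1 { first = $2 }
         $2 != first { bad = 1 }
         END { exit (bad || NR == 0) ? 1 : 0 }'
}
```

Example on one host: `collect_gfids loki/thread1/level01/level11/level21/level31/level41 /bricks/brick1/table /bricks/brick2/table | gfid_consistent && echo consistent`. Repeating this on every brick host (or concatenating the collect_gfids output from all hosts before piping it to gfid_consistent) covers all 12 bricks.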

