Description of problem:
========================
On a CIFS mount of a distributed-disperse volume, when I try to copy a directory to a new location, where both the source and destination reside on the mount (i.e. the volume), some of the files fail to get copied with "Invalid argument".

NOTE: Tried the same with FUSE, but not seeing this problem.

[root@dhcp37-132 cifs]# cp -Rf dir1 loki
cp: cannot create regular file ‘loki/thread1/level01/level11/level21/level31/level41/59819dd5%%ALTXZ345G8’: Invalid argument

Following is the snippet from samba log:
=======================================
[2017-08-02 10:16:31.003370] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-0: Mismatching GFID's in loc
[2017-08-02 10:16:31.003404] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-0 for /loki/thread1/level01/level11/level21/level31/level41 (gfid = 9883cd8c-0ec5-42ad-92d3-87692080027b) returned -1 [Invalid argument]
[2017-08-02 10:16:31.003875] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-1: Mismatching GFID's in loc
[2017-08-02 10:16:31.003911] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-1 for /loki/thread1/level01/level11/level21/level31/level41 (gfid = 9883cd8c-0ec5-42ad-92d3-87692080027b) returned -1 [Invalid argument]
[2017-08-02 10:16:31.003932] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
[2017-08-02 10:16:31.005369] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
[2017-08-02 10:16:36.007778] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-0 for /dir1/thread1/level00 (gfid = 86d9b770-b8ac-4132-b0ec-40f5257ffe80) returned -1 [Invalid argument]
[2017-08-02 10:16:36.008100] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-1: Mismatching GFID's in loc
[2017-08-02 10:16:36.008108] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-1 for /dir1/thread1/level00 (gfid = 86d9b770-b8ac-4132-b0ec-40f5257ffe80) returned -1 [Invalid argument]
[2017-08-02 10:16:36.008144] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
[2017-08-02 10:16:36.009928] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-1: Mismatching GFID's in loc
[2017-08-02 10:16:36.009974] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
The message "W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-0: Mismatching GFID's in loc" repeated 3 times between [2017-08-02 10:16:31.003370] and [2017-08-02 10:16:36.014284]
[2017-08-02 10:16:36.014294] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-0 for /dir1/thread1 (gfid = 227f6ddc-40e5-4b91-8233-05f7ca5a0a1a) returned -1 [Invalid argument]
[2017-08-02 10:16:36.014395] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-1: Mismatching GFID's in loc
[2017-08-02 10:16:36.014401] I [MSGID: 109094] [dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: subvolume bangalore-disperse-1 for /dir1/thread1 (gfid = 227f6ddc-40e5-4b91-8233-05f7ca5a0a1a) returned -1 [Invalid argument]
[2017-08-02 10:16:36.014429] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument
[2017-08-02 10:16:36.015851] W [MSGID: 122019] [ec-helpers.c:364:ec_loc_gfid_check] 0-bangalore-disperse-0: Mismatching GFID's in loc
[2017-08-02 10:16:36.015883] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-bangalore-snapview-client: Lookup failed on normal graph with error Invalid argument

Version-Release number of selected component (if applicable):
===================
[root@dhcp43-157 samba]# rpm -qa|grep gluster
glusterfs-geo-replication-3.8.4-33.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
samba-vfs-glusterfs-4.6.3-5.el7rhgs.x86_64
glusterfs-fuse-3.8.4-33.el7rhgs.x86_64
glusterfs-api-3.8.4-33.el7rhgs.x86_64
glusterfs-3.8.4-33.el7rhgs.x86_64
glusterfs-cli-3.8.4-33.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-33.el7rhgs.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
glusterfs-server-3.8.4-33.el7rhgs.x86_64
glusterfs-rdma-3.8.4-33.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-4.el7.x86_64
python-gluster-3.8.4-33.el7rhgs.noarch
glusterfs-libs-3.8.4-33.el7rhgs.x86_64
[root@dhcp43-157 samba]# rpm -qa|grep samba
samba-vfs-glusterfs-4.6.3-5.el7rhgs.x86_64
samba-winbind-krb5-locator-4.6.3-5.el7rhgs.x86_64
samba-libs-4.6.3-5.el7rhgs.x86_64
samba-client-libs-4.6.3-5.el7rhgs.x86_64
samba-winbind-clients-4.6.3-5.el7rhgs.x86_64
samba-common-tools-4.6.3-5.el7rhgs.x86_64
samba-common-libs-4.6.3-5.el7rhgs.x86_64
samba-winbind-modules-4.6.3-5.el7rhgs.x86_64
samba-client-4.6.3-5.el7rhgs.x86_64
samba-common-4.6.3-5.el7rhgs.noarch
samba-winbind-4.6.3-5.el7rhgs.x86_64
samba-4.6.3-5.el7rhgs.x86_64
samba-debuginfo-4.6.3-5.el7rhgs.x86_64

How reproducible:
=================
Always

Steps to Reproduce:
==================
1. Have a 2x(4+2) EC volume and mount it using CIFS.
2. Create some files and directories in breadth and depth, or use a Linux untar to create this working set.
3. Take the source directory as "kernel.dir".
4. Run cp -Rf kernel.dir <newdir>

Actual results:
===================
5. Some files fail to be created with "Invalid argument".

Expected results:
=================
Should not see any such issues.

Additional info:
================
NOTE: Tried the same with FUSE, but not seeing this problem.
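The reproduction steps above could be scripted roughly as follows. This is only a sketch, not the script used for this report: the function names `make_tree` and `copy_tree_collect_errors` and the tree dimensions are made up for illustration, and the demo runs against a temporary directory, which you would replace with the CIFS mount point. Unlike `cp -Rf`, it records every per-file failure with its errno so the "Invalid argument" (EINVAL) cases can be counted.

```python
import errno
import os
import shutil
import tempfile


def make_tree(root, depth=3, breadth=2, files_per_dir=3):
    """Create a tree `depth` levels deep with `breadth` subdirectories per level."""
    os.makedirs(root, exist_ok=True)
    for i in range(files_per_dir):
        with open(os.path.join(root, f"file{i}"), "w") as fh:
            fh.write("x" * 128)
    if depth > 0:
        for b in range(breadth):
            make_tree(os.path.join(root, f"level{depth}{b}"),
                      depth - 1, breadth, files_per_dir)


def copy_tree_collect_errors(src, dst):
    """Copy src into dst recursively, returning a list of (path, errno) failures."""
    failures = []
    for dirpath, dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        try:
            os.makedirs(target_dir, exist_ok=True)
        except OSError as exc:
            failures.append((target_dir, exc.errno))
            dirnames[:] = []  # cannot create the directory, so do not descend
            continue
        for name in filenames:
            target = os.path.join(target_dir, name)
            try:
                shutil.copyfile(os.path.join(dirpath, name), target)
            except OSError as exc:
                failures.append((target, exc.errno))
    return failures


if __name__ == "__main__":
    # Assumption: a temp dir stands in for the CIFS mount point of the volume.
    mount = tempfile.mkdtemp()
    source = os.path.join(mount, "kernel.dir")
    make_tree(source)
    failed = copy_tree_collect_errors(source, os.path.join(mount, "newdir"))
    einval = [path for path, code in failed if code == errno.EINVAL]
    print(f"{len(failed)} failures, {len(einval)} with EINVAL")
    shutil.rmtree(mount)
```

Running the same script against a FUSE mount of the volume would give a direct comparison with the CIFS result.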
Tried all the below combinations; however, seeing the issue with all of them:
1) enabled uss + cifs
2) disabled uss + cifs
3) mount with default cifs command
4) used vers=3

[root@dhcp43-157 samba]# gluster v info bangalore
Volume Name: bangalore
Type: Distributed-Disperse
Volume ID: 080d1528-91bb-4bdc-b7bb-a201c5370ff8
Status: Started
Snapshot Count: 1
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick2: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick3: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick4: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick5: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick6: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick7: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick8: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick9: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick10: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick11: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick12: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick2/table
Options Reconfigured:
features.show-snapshot-directory: enable
features.uss: enable
features.barrier: disable
performance.stat-prefetch: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@dhcp43-157 samba]# gluster v set bangalore features.show-snapshot-directory off
volume set: success
[root@dhcp43-157 samba]# gluster v set bangalore features.uss off
volume set: success
[root@dhcp43-157 samba]# gluster v info bangalore
Volume Name: bangalore
Type: Distributed-Disperse
Volume ID: 080d1528-91bb-4bdc-b7bb-a201c5370ff8
Status: Started
Snapshot Count: 1
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick2: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick3: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick4: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick5: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick6: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick7: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick8: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick9: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick10: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick11: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick12: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick2/table
Options Reconfigured:
features.show-snapshot-directory: off
features.uss: off
features.barrier: disable
performance.stat-prefetch: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@dhcp43-157 samba]# gluster v status
Status of volume: bangalore
Gluster process                                               TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick1/table  49165     0          Y       24596
Brick dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick1/table  49165     0          Y       5059
Brick dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick1/table  49165     0          Y       15729
Brick dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick1/table  49164     0          Y       20980
Brick dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick1/table   49157     0          Y       32629
Brick dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick1/table  49156     0          Y       6946
Brick dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick2/table  49166     0          Y       24615
Brick dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick2/table  49166     0          Y       5078
Brick dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick2/table  49166     0          Y       15748
Brick dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick2/table  49165     0          Y       20999
Brick dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick2/table   49158     0          Y       32649
Brick dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick2/table  49157     0          Y       6965
Self-heal Daemon on localhost                                 N/A       N/A        Y       24635
Self-heal Daemon on dhcp43-78.lab.eng.blr.redhat.com          N/A       N/A        Y       32670
Self-heal Daemon on dhcp43-164.lab.eng.blr.redhat.com         N/A       N/A        Y       15768
Self-heal Daemon on dhcp43-162.lab.eng.blr.redhat.com         N/A       N/A        Y       21019
Self-heal Daemon on dhcp41-157.lab.eng.blr.redhat.com         N/A       N/A        Y       5106
Self-heal Daemon on dhcp41-241.lab.eng.blr.redhat.com         N/A       N/A        Y       6986

Task Status of Volume bangalore
------------------------------------------------------------------------------
There are no active volume tasks
Client version:
[root@dhcp37-132 ~]# rpm -qa|egrep "cifs|samb"
cifs-utils-6.2-10.el7.x86_64
samba-common-tools-4.6.3-4.el7rhgs.x86_64
samba-common-libs-4.6.3-4.el7rhgs.x86_64
samba-common-4.6.3-4.el7rhgs.noarch
samba-libs-4.6.3-4.el7rhgs.x86_64
samba-client-libs-4.6.3-4.el7rhgs.x86_64
Changing the title, as I hit this problem even with a Linux untar. The "Invalid argument" error shows up for some of the files, seemingly at random.
With the below volume settings, I am NOT seeing the errors reported above:

Volume Name: bangalore
Type: Distributed-Disperse
Volume ID: 080d1528-91bb-4bdc-b7bb-a201c5370ff8
Status: Started
Snapshot Count: 1
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick2: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick3: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick4: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick5: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick6: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick1/table
Brick7: dhcp43-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick8: dhcp41-157.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick9: dhcp43-164.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick10: dhcp43-162.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick11: dhcp43-78.lab.eng.blr.redhat.com:/bricks/brick2/table
Brick12: dhcp41-241.lab.eng.blr.redhat.com:/bricks/brick2/table
Options Reconfigured:
performance.cache-samba-metadata: on
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
features.show-snapshot-directory: off
features.uss: off
features.barrier: disable
performance.stat-prefetch: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
logs and sosreports @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1477581/
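For anyone going through the logs, a small helper along these lines might make it easier to see which directories failed revalidation and on which disperse subvolumes. It is a sketch only: the regular expression is written against the `dht_revalidate_cbk` message format quoted in the description, and the function name `failed_revalidations` is hypothetical.

```python
import re
from collections import defaultdict

# Matches the dht_revalidate_cbk message body as it appears in the samba log.
REVALIDATE_RE = re.compile(
    r"Revalidate: subvolume (?P<subvol>\S+) for (?P<path>.+?) "
    r"\(gfid = (?P<gfid>[0-9a-f-]{36})\) returned -1 \[Invalid argument\]"
)


def failed_revalidations(log_text):
    """Group failed revalidations by path, collecting GFIDs and subvolumes."""
    by_path = defaultdict(lambda: {"gfids": set(), "subvolumes": set()})
    for match in REVALIDATE_RE.finditer(log_text):
        entry = by_path[match.group("path")]
        entry["gfids"].add(match.group("gfid"))
        entry["subvolumes"].add(match.group("subvol"))
    return dict(by_path)


# Two lines taken from the log snippet in the description.
SAMPLE = (
    "[2017-08-02 10:16:36.014294] I [MSGID: 109094] "
    "[dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: "
    "subvolume bangalore-disperse-0 for /dir1/thread1 "
    "(gfid = 227f6ddc-40e5-4b91-8233-05f7ca5a0a1a) returned -1 [Invalid argument]\n"
    "[2017-08-02 10:16:36.014401] I [MSGID: 109094] "
    "[dht-common.c:1014:dht_revalidate_cbk] 0-bangalore-dht: Revalidate: "
    "subvolume bangalore-disperse-1 for /dir1/thread1 "
    "(gfid = 227f6ddc-40e5-4b91-8233-05f7ca5a0a1a) returned -1 [Invalid argument]\n"
)

if __name__ == "__main__":
    for path, info in failed_revalidations(SAMPLE).items():
        print(path, sorted(info["subvolumes"]), sorted(info["gfids"]))
```

In the snippet from the description, each affected directory fails on both disperse subvolumes with the same GFID reported by DHT, which is the pattern this grouping is meant to surface.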
Can this be retested with the 3.4 build? I think there was a similar bug with the gfapi+EC combination (mismatching GFID) that was fixed. Hence putting it on retest.
Having stat-prefetch on while the rest of the options, such as cache-invalidation, are not enabled is not a supported configuration. Also, this happens with EC and gfapi over CIFS. A CIFS mount is not something we treat as a priority downstream, hence I would prefer closing it as wontfix/deferred. One thing, though: it could actually be a GFID mismatch bug in the gfapi and EC combination, so I would suggest changing the component to EC/gfapi and root-causing it from that perspective.
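The configuration point above could be checked mechanically from `gluster v info` output. The sketch below is an illustration, not an official supportability check: the list of companion options is an assumption taken from the configuration in this thread that did not show the errors, and the names `parse_options` and `missing_companions` are hypothetical.

```python
# Assumption: these are the options that should accompany stat-prefetch,
# taken from the volume settings in this thread that did not hit the error.
COMPANION_OPTIONS = (
    "features.cache-invalidation",
    "performance.cache-invalidation",
)

# Both spellings appear in the volume info output quoted in this report.
ENABLED_VALUES = {"on", "enable"}


def parse_options(info_text):
    """Parse 'key: value' lines from `gluster v info` output into a dict."""
    options = {}
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            options[key.strip()] = value.strip()
    return options


def missing_companions(options):
    """Return companion options that are not enabled while stat-prefetch is on."""
    if options.get("performance.stat-prefetch") not in ENABLED_VALUES:
        return []  # stat-prefetch not explicitly on, nothing to flag
    return [name for name in COMPANION_OPTIONS
            if options.get(name) not in ENABLED_VALUES]


FAILING = """Options Reconfigured:
features.show-snapshot-directory: off
features.uss: off
performance.stat-prefetch: on
"""

WORKING = """Options Reconfigured:
performance.cache-invalidation: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
performance.stat-prefetch: on
"""

if __name__ == "__main__":
    print(missing_companions(parse_options(FAILING)))
    print(missing_companions(parse_options(WORKING)))
```

The check only looks at options explicitly listed under "Options Reconfigured"; it does not know the defaults of options that are absent from the output.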