Description of problem:
With a distributed volume whose bricks are on different nodes, when one node is removed from the network (shut down), the 'ls' command returns 'Invalid argument' for some directories, while other directories list as expected.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a distributed volume with 3 bricks (each on a different node).
2. Configure the CTDB setup as suggested in 'Gluster_CTDB.pdf' (with one change: create a different 'public address' file for each node and do not put it in the shared file system).
3. Create a CIFS share from the distributed volume (step 1) and make an entry in the samba config file on each server.
4. CIFS-mount that share from a client (mount it using the virtual IP).
5. From the client, create a few directories and a few files in those directories under the mount point.
6. Reboot/shut down the server (one that has a brick of the distributed volume) which is serving the public/virtual IP (use the 'ctdb pnn' and 'ctdb ip' commands to find it).
7. CTDB will initiate failover; once failover is complete, issue the 'ls' command from the client to view directory contents.

Actual results:
A few directories list their contents from the remaining servers/nodes, but a few directories show 'Invalid argument' even though they have files on the remaining servers.

Expected results:
It should list directory contents from the servers/bricks which are up and running.

Additional info:
test_sc (attachment) - shows output of a few commands for the test environment, e.g. volume info, ctdb status, 'ls' command output for each brick
s2__mmnt-samba-DhtTest.log - log from server 2
s2__mmnt-samba-DhtTest.log - log from server 3
Can you please upload the files that you have mentioned above?
Created attachment 584237 [details] log
It looks like dht_discover is treating a "transport endpoint is not connected" error as a layout hole:

[2012-06-21 10:40:49.937151] D [nfs3-helpers.c:1627:nfs3_log_common_call] 0-nfs-nfsv3: XID: 42608524, GETATTR: args: FH: hashcount 1, exportid 84cca894-383a-41b2-a65c-160e96e9b8fc, gfid fbe6e535-b5c5-4c19-9679-020fd6d38307
[2012-06-21 10:40:49.937257] D [dht-common.c:264:dht_discover_cbk] 0-dht-dht: lookup of <gfid:fbe6e535-b5c5-4c19-9679-020fd6d38307> on dht-client-2 returned error (Transport endpoint is not connected)
[2012-06-21 10:40:49.937519] I [dht-layout.c:593:dht_layout_normalize] 0-dht-dht: found anomalies in <gfid:fbe6e535-b5c5-4c19-9679-020fd6d38307>. holes=1 overlaps=0
[2012-06-21 10:40:49.937560] D [dht-layout.c:609:dht_layout_normalize] (-->/usr/lib64/glusterfs/3.3.0/xlator/protocol/client.so(client3_1_lookup_cbk+0x6ad) [0x7f0e8c59efed] (-->/usr/lib64/glusterfs/3.3.0/xlator/cluster/distribute.so(dht_discover_cbk+0x39e) [0x7f0e87b7adde] (-->/usr/lib64/glusterfs/3.3.0/xlator/cluster/distribute.so(dht_discover_complete+0x337) [0x7f0e87b75127]))) 0-dht-dht: path=<gfid:fbe6e535-b5c5-4c19-9679-020fd6d38307> err=Transport endpoint is not connected on subvol=dht-client-2
[2012-06-21 10:40:49.937573] D [dht-common.c:192:dht_discover_complete] 0-dht-dht: normalizing failed on <gfid:fbe6e535-b5c5-4c19-9679-020fd6d38307> (overlaps/holes present: yes, ENOENT errors: 0)
[2012-06-21 10:40:49.937583] E [nfs3-helpers.c:3603:nfs3_fh_resolve_inode_lookup_cbk] 0-nfs-nfsv3: Lookup failed: <gfid:fbe6e535-b5c5-4c19-9679-020fd6d38307>: Invalid argument
[2012-06-21 10:40:49.937599] E [nfs3.c:753:nfs3_getattr_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.16.159.194:830) dht : fbe6e535-b5c5-4c19-9679-020fd6d38307

[root@nec-em3 ~]# getfattr -d -m . -e hex /exp/dht/brick1/d8
getfattr: Removing leading '/' from absolute path names
# file: exp/dht/brick1/d8
trusted.gfid=0xfbe6e535b5c54c199679020fd6d38307
trusted.glusterfs.dht=0x00000001000000000000000055555554

[root@dell-pe2900-02 ~]# getfattr -d -m . -e hex /exp/dht/brick1/d8
getfattr: Removing leading '/' from absolute path names
# file: exp/dht/brick1/d8
trusted.gfid=0xfbe6e535b5c54c199679020fd6d38307
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

[root@ibm-x3620m3-01 ~]# getfattr -d -m . -e hex /exp/dht/brick1/d8
getfattr: Removing leading '/' from absolute path names
# file: exp/dht/brick1/d8
trusted.gfid=0xfbe6e535b5c54c199679020fd6d38307
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
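The arithmetic behind the "holes=1" message can be illustrated with a short Python sketch. This is an illustration, not GlusterFS code: parse_dht_xattr and count_holes are hypothetical helper names, and the field layout of trusted.glusterfs.dht is assumed here to be four big-endian 32-bit words (count, type, start, stop). With all three bricks' xattr values from the getfattr output above, the ranges tile the 32-bit hash space; drop the range of the unreachable brick and a hole appears, which is what makes the lookup fail.

```python
def parse_dht_xattr(hex_value):
    """Extract the (start, stop) hash range from a trusted.glusterfs.dht
    hex string, assuming words 3 and 4 are the big-endian range bounds."""
    raw = bytes.fromhex(hex_value[2:])          # strip the leading "0x"
    start = int.from_bytes(raw[8:12], "big")
    stop = int.from_bytes(raw[12:16], "big")
    return start, stop

def count_holes(ranges):
    """Count gaps in the 32-bit hash space not covered by any range,
    loosely mimicking what a layout-normalize pass would detect."""
    holes = 0
    expected = 0                                # next hash value we expect covered
    for start, stop in sorted(ranges):
        if start > expected:                    # gap before this range
            holes += 1
        expected = stop + 1
    if expected <= 0xFFFFFFFF:                  # gap at the end of the space
        holes += 1
    return holes

# The three bricks' xattr values from the getfattr output above:
xattrs = [
    "0x00000001000000000000000055555554",
    "0x000000010000000055555555aaaaaaa9",
    "0x0000000100000000aaaaaaaaffffffff",
]
ranges = [parse_dht_xattr(x) for x in xattrs]

print(count_holes(ranges))       # all bricks reachable -> 0 holes
print(count_holes(ranges[:2]))   # one brick unreachable -> 1 hole
```

Note that it does not matter which of the three bricks is unreachable: removing any one of the three ranges leaves exactly one gap, matching the "holes=1 overlaps=0" line in the log.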
Can you please try to reproduce the bug with the latest git repo?
(In reply to comment #4)
> Can you please try to reproduce the bug with the latest git repo.

Not able to reproduce with the latest git repo.
Please re-open the bug if you encounter it again.
Able to reproduce in 3.3.0rhs-28.el6rhs.x86_64, hence reopening.
*** This bug has been marked as a duplicate of bug 856459 ***