Bug 819444 - for few directories, ls command is giving 'Invalid argument' when one of the server(brick, distributed volume) is down
Status: CLOSED DUPLICATE of bug 856459
Product: GlusterFS
Classification: Community
Component: distribute
Version: pre-release
Hardware: x86_64 Linux
Priority: medium Severity: medium
Assigned To: shishir gowda
amainkar
Keywords: Reopened
Depends On:
Blocks:
Reported: 2012-05-07 05:45 EDT by Rachana Patel
Modified: 2015-04-20 07:56 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-10-17 08:24:42 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---


Attachments
log (6.76 MB, application/x-tar)
2012-05-14 01:38 EDT, Rachana Patel

Description Rachana Patel 2012-05-07 05:45:09 EDT
Description of problem:
With multiple bricks on different nodes, when one node is removed from the network (shut down), the ls command returns 'Invalid argument' for a few directories.

For other directories, ls gives the expected result.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a distributed volume with 3 bricks (each on a different node).
2. Configure the CTDB setup as suggested in 'Gluster_CTDB.pdf' (with one change: create a different 'public address' file for each node and do not put it in the shared file system).
3. Create a CIFS share from the distributed volume (step 1) and add an entry for it to the samba config file on each server.

4. CIFS mount that share from a client (mount it using the virtual IP).

5. From the client, create a few directories in the mounted directory and a few files in those directories.

6. Reboot or shut down the server (having a brick of the distributed volume) that is serving the public IP/virtual IP (use the ctdb pnn and ctdb ip commands to find it).

7. CTDB will initiate failover. Once failover has completed, issue the 'ls' command from the client to view the directory contents.
  
Actual results:
A few directories list their contents from the remaining servers/nodes, but a few directories show 'Invalid argument' even though they have files on the remaining servers.


Expected results:
ls should list the directory contents from the servers/bricks that are up and running.


Additional info:
Attachment info:
test_sc - shows the output of a few commands for the test environment, such as volume info, ctdb status, and ls command output for each brick
s2__mmnt-samba-DhtTest.log - log from server 2
s2__mmnt-samba-DhtTest.log - log from server 3
Comment 1 shishir gowda 2012-05-14 00:50:34 EDT
Can you please upload the files that you have mentioned above?
Comment 2 Rachana Patel 2012-05-14 01:38:03 EDT
Created attachment 584237
log
Comment 3 shishir gowda 2012-06-22 05:07:37 EDT
It looks like dht_discover is treating a 'Transport endpoint is not connected' error as a hole in the layout.

[2012-06-21 10:40:49.937151] D [nfs3-helpers.c:1627:nfs3_log_common_call] 0-nfs-nfsv3: XID: 42608524, GETATTR: args: FH: hashcount 1, exportid 84cca894-383a-41b2-a65c-160e96e9b8fc, gfid fbe6e535-b5c5-4c19-9679-020fd6d38307
[2012-06-21 10:40:49.937257] D [dht-common.c:264:dht_discover_cbk] 0-dht-dht: lookup of <gfid:fbe6e535-b5c5-4c19-9679-020fd6d38307> on dht-client-2 returned error (Transport endpoint is not connected)
[2012-06-21 10:40:49.937519] I [dht-layout.c:593:dht_layout_normalize] 0-dht-dht: found anomalies in <gfid:fbe6e535-b5c5-4c19-9679-020fd6d38307>. holes=1 overlaps=0
[2012-06-21 10:40:49.937560] D [dht-layout.c:609:dht_layout_normalize] (-->/usr/lib64/glusterfs/3.3.0/xlator/protocol/client.so(client3_1_lookup_cbk+0x6ad) [0x7f0e8c59efed] (-->/usr/lib64/glusterfs/3.3.0/xlator/cluster/distribute.so(dht_discover_cbk+0x39e) [0x7f0e87b7adde] (-->/usr/lib64/glusterfs/3.3.0/xlator/cluster/distribute.so(dht_discover_complete+0x337) [0x7f0e87b75127]))) 0-dht-dht: path=<gfid:fbe6e535-b5c5-4c19-9679-020fd6d38307> err=Transport endpoint is not connected on subvol=dht-client-2
[2012-06-21 10:40:49.937573] D [dht-common.c:192:dht_discover_complete] 0-dht-dht: normalizing failed on <gfid:fbe6e535-b5c5-4c19-9679-020fd6d38307> (overlaps/holes present: yes, ENOENT errors: 0)
[2012-06-21 10:40:49.937583] E [nfs3-helpers.c:3603:nfs3_fh_resolve_inode_lookup_cbk] 0-nfs-nfsv3: Lookup failed: <gfid:fbe6e535-b5c5-4c19-9679-020fd6d38307>: Invalid argument
[2012-06-21 10:40:49.937599] E [nfs3.c:753:nfs3_getattr_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.16.159.194:830) dht : fbe6e535-b5c5-4c19-9679-020fd6d38307

[root@nec-em3 ~]# getfattr -d -m . -e hex /exp/dht/brick1/d8
getfattr: Removing leading '/' from absolute path names
# file: exp/dht/brick1/d8
trusted.gfid=0xfbe6e535b5c54c199679020fd6d38307
trusted.glusterfs.dht=0x00000001000000000000000055555554

[root@dell-pe2900-02 ~]# getfattr -d -m . -e hex /exp/dht/brick1/d8
getfattr: Removing leading '/' from absolute path names
# file: exp/dht/brick1/d8
trusted.gfid=0xfbe6e535b5c54c199679020fd6d38307
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

[root@ibm-x3620m3-01 ~]# getfattr -d -m . -e hex /exp/dht/brick1/d8
getfattr: Removing leading '/' from absolute path names
# file: exp/dht/brick1/d8
trusted.gfid=0xfbe6e535b5c54c199679020fd6d38307
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
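
To make the "holes" reading of the layout concrete, here is a minimal Python sketch that decodes the three trusted.glusterfs.dht values above and checks the 32-bit hash space for gaps. It assumes each value is four big-endian 32-bit integers (count, type, range start, range end); that field order is an assumption and is not stated in this report. The names decode_dht_xattr and find_anomalies are hypothetical helpers, not GlusterFS functions, and the counting is only an illustration, not the actual dht_layout_normalize logic.

```python
import struct

# Values copied from the getfattr output above, keyed by host name.
BRICK_XATTRS = {
    "nec-em3": "0x00000001000000000000000055555554",
    "dell-pe2900-02": "0x000000010000000055555555aaaaaaa9",
    "ibm-x3620m3-01": "0x0000000100000000aaaaaaaaffffffff",
}


def decode_dht_xattr(hex_value):
    """Return (start, stop) of the hash range stored in a dht xattr value."""
    if hex_value.startswith("0x"):
        hex_value = hex_value[2:]
    raw = bytes.fromhex(hex_value)
    # Assumed field order: count, type, start, stop (big-endian uint32 each).
    _count, _type, start, stop = struct.unpack(">4I", raw)
    return start, stop


def find_anomalies(ranges):
    """Count holes and overlaps in the 0..0xffffffff hash space."""
    holes = 0
    overlaps = 0
    expected = 0  # next hash value that should be covered
    for start, stop in sorted(ranges):
        if start > expected:
            holes += 1      # gap before this range
        elif start < expected:
            overlaps += 1   # this range starts inside a previous one
        expected = max(expected, stop + 1)
    if expected <= 0xFFFFFFFF:
        holes += 1          # gap at the end of the hash space
    return holes, overlaps


all_ranges = [decode_dht_xattr(v) for v in BRICK_XATTRS.values()]
print("all bricks up:", find_anomalies(all_ranges))

# Simulate one brick being unreachable by leaving its range out.
for down in BRICK_XATTRS:
    remaining = [decode_dht_xattr(v) for h, v in BRICK_XATTRS.items() if h != down]
    print(down, "down:", find_anomalies(remaining))
```

With all three ranges present the sketch finds no holes and no overlaps. Leaving out any one brick's range gives one hole and no overlaps, which is consistent with the "holes=1 overlaps=0" message in the log when the lookup on dht-client-2 fails.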
Comment 4 shishir gowda 2012-07-10 23:46:31 EDT
Can you please try to reproduce the bug with the latest git repo.
Comment 5 Rachana Patel 2012-08-01 05:23:49 EDT
(In reply to comment #4)
> Can you please try to reproduce the bug with the latest git repo.

Not able to reproduce with the latest git repo.
Comment 6 shishir gowda 2012-08-13 09:04:15 EDT
Please re-open the bug if you encounter it again.
Comment 7 Rachana Patel 2012-10-04 05:26:07 EDT
Able to reproduce in 3.3.0rhs-28.el6rhs.x86_64, hence reopening.
Comment 8 shishir gowda 2012-10-17 08:24:42 EDT

*** This bug has been marked as a duplicate of bug 856459 ***
