Bug 819444 - for few directories, ls command is giving 'Invalid argument' when one of the server(brick, distributed volume) is down
Keywords:
Status: CLOSED DUPLICATE of bug 856459
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: shishir gowda
QA Contact: amainkar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-05-07 09:45 UTC by Rachana Patel
Modified: 2015-04-20 11:56 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-10-17 12:24:42 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
log (6.76 MB, application/x-tar)
2012-05-14 05:38 UTC, Rachana Patel

Description Rachana Patel 2012-05-07 09:45:09 UTC
Description of problem:
When a volume has multiple bricks on different nodes and one node is removed from the network (shut down), the ls command returns 'Invalid argument' for some directories, while for other directories ls gives the expected result.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a distributed volume with 3 bricks (each on a different node).
2. Configure the CTDB setup as suggested in 'Gluster_CTDB.pdf' (with one change: create a separate 'public address' file for each node and do not put it on the shared file system).
3. Create a CIFS share from the distributed volume (step 1) and add an entry for it to the samba config file on each server.
4. CIFS-mount that share from a client (mount it using the virtual IP).
5. From the client, create a few directories, and a few files in those directories, under the mount point.
6. Reboot/shut down the server (having a brick of the distributed volume) that is serving the public IP/virtual IP (use the 'ctdb pnn' and 'ctdb ip' commands to find it).
7. CTDB will initiate failover. Once failover is complete, issue the 'ls' command from the client to view the directory contents.
  
Actual results:
Some directories list their contents from the remaining servers/nodes, but some directories show 'Invalid argument' even though they have files on the remaining servers.


Expected results:
It should list directory contents from the servers/bricks that are up and running.


Additional info:
Attachment info:
test_sc - shows the output of a few commands for the test environment, such as volume info, ctdb status, and ls command output for each brick
s2__mmnt-samba-DhtTest.log - log from server 2
s2__mmnt-samba-DhtTest.log - log from server 3

Comment 1 shishir gowda 2012-05-14 04:50:34 UTC
Can you please upload the files that you have mentioned above?

Comment 2 Rachana Patel 2012-05-14 05:38:03 UTC
Created attachment 584237 [details]
log

Comment 3 shishir gowda 2012-06-22 09:07:37 UTC
It looks like dht_discover is treating a 'Transport endpoint is not connected' error as a hole in the layout.

[2012-06-21 10:40:49.937151] D [nfs3-helpers.c:1627:nfs3_log_common_call] 0-nfs-nfsv3: XID: 42608524, GETATTR: args: FH: hashcount 1, exportid 84cca894-383a-41b2-a65c-160e96e9b8fc, gfid fbe6e535-b5c5-4c19-9679-020fd6d38307
[2012-06-21 10:40:49.937257] D [dht-common.c:264:dht_discover_cbk] 0-dht-dht: lookup of <gfid:fbe6e535-b5c5-4c19-9679-020fd6d38307> on dht-client-2 returned error (Transport endpoint is not connected)
[2012-06-21 10:40:49.937519] I [dht-layout.c:593:dht_layout_normalize] 0-dht-dht: found anomalies in <gfid:fbe6e535-b5c5-4c19-9679-020fd6d38307>. holes=1 overlaps=0
[2012-06-21 10:40:49.937560] D [dht-layout.c:609:dht_layout_normalize] (-->/usr/lib64/glusterfs/3.3.0/xlator/protocol/client.so(client3_1_lookup_cbk+0x6ad) [0x7f0e8c59efed] (-->/usr/lib64/glusterfs/3.3.0/xlator/cluster/distribute.so(dht_discover_cbk+0x39e) [0x7f0e87b7adde] (-->/usr/lib64/glusterfs/3.3.0/xlator/cluster/distribute.so(dht_discover_complete+0x337) [0x7f0e87b75127]))) 0-dht-dht: path=<gfid:fbe6e535-b5c5-4c19-9679-020fd6d38307> err=Transport endpoint is not connected on subvol=dht-client-2
[2012-06-21 10:40:49.937573] D [dht-common.c:192:dht_discover_complete] 0-dht-dht: normalizing failed on <gfid:fbe6e535-b5c5-4c19-9679-020fd6d38307> (overlaps/holes present: yes, ENOENT errors: 0)
[2012-06-21 10:40:49.937583] E [nfs3-helpers.c:3603:nfs3_fh_resolve_inode_lookup_cbk] 0-nfs-nfsv3: Lookup failed: <gfid:fbe6e535-b5c5-4c19-9679-020fd6d38307>: Invalid argument
[2012-06-21 10:40:49.937599] E [nfs3.c:753:nfs3_getattr_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.16.159.194:830) dht : fbe6e535-b5c5-4c19-9679-020fd6d38307

[root@nec-em3 ~]# getfattr -d -m . -e hex /exp/dht/brick1/d8
getfattr: Removing leading '/' from absolute path names
# file: exp/dht/brick1/d8
trusted.gfid=0xfbe6e535b5c54c199679020fd6d38307
trusted.glusterfs.dht=0x00000001000000000000000055555554

[root@dell-pe2900-02 ~]# getfattr -d -m . -e hex /exp/dht/brick1/d8
getfattr: Removing leading '/' from absolute path names
# file: exp/dht/brick1/d8
trusted.gfid=0xfbe6e535b5c54c199679020fd6d38307
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

[root@ibm-x3620m3-01 ~]# getfattr -d -m . -e hex /exp/dht/brick1/d8
getfattr: Removing leading '/' from absolute path names
# file: exp/dht/brick1/d8
trusted.gfid=0xfbe6e535b5c54c199679020fd6d38307
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
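
To illustrate why a single unreachable brick shows up as "holes=1", the three trusted.glusterfs.dht values above can be decoded and checked for coverage of the 32-bit hash space. This is only a sketch and not the DHT code itself: it assumes the xattr is four big-endian 32-bit fields (count, type, range start, range stop), and the helper names parse_dht_xattr, find_holes and holes_when_down are hypothetical.

```python
import struct

# Values copied from the getfattr output above, keyed by the host they came from.
layouts = {
    "nec-em3": "0x00000001000000000000000055555554",
    "dell-pe2900-02": "0x000000010000000055555555aaaaaaa9",
    "ibm-x3620m3-01": "0x0000000100000000aaaaaaaaffffffff",
}

def parse_dht_xattr(hex_value):
    """Return (start, stop) of the hash range stored in the xattr.

    Assumption: the 16 bytes are four big-endian uint32 fields in the
    order count, type, start, stop.
    """
    raw = bytes.fromhex(hex_value[2:] if hex_value.startswith("0x") else hex_value)
    _count, _layout_type, start, stop = struct.unpack(">IIII", raw)
    return start, stop

def find_holes(ranges, space_end=0xFFFFFFFF):
    """Return the parts of [0, space_end] not covered by any (start, stop) range."""
    holes = []
    expected = 0  # next hash value that still needs to be covered
    for start, stop in sorted(ranges):
        if start > expected:
            holes.append((expected, start - 1))
        expected = max(expected, stop + 1)
    if expected <= space_end:
        holes.append((expected, space_end))
    return holes

def holes_when_down(host):
    """Holes in the layout if the range held by `host` is not available."""
    remaining = [parse_dht_xattr(v) for h, v in layouts.items() if h != host]
    return find_holes(remaining)

if __name__ == "__main__":
    all_ranges = [parse_dht_xattr(v) for v in layouts.values()]
    print("holes with all bricks up:", find_holes(all_ranges))
    for host in layouts:
        print("holes with", host, "down:",
              [(hex(a), hex(b)) for a, b in holes_when_down(host)])
```

With all three ranges present the hash space is fully covered, and dropping any one of them leaves exactly one hole, which matches the "holes=1 overlaps=0" message in the log. The report does not say which of the three hosts is dht-client-2, so the sketch simply tries each host in turn.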

Comment 4 shishir gowda 2012-07-11 03:46:31 UTC
Can you please try to reproduce the bug with the latest git repo.

Comment 5 Rachana Patel 2012-08-01 09:23:49 UTC
(In reply to comment #4)
> Can you please try to reproduce the bug with the latest git repo.

Not able to reproduce with the latest git repo.

Comment 6 shishir gowda 2012-08-13 13:04:15 UTC
Please re-open the bug if you encounter it again.

Comment 7 Rachana Patel 2012-10-04 09:26:07 UTC
Able to reproduce in 3.3.0rhs-28.el6rhs.x86_64, hence reopening.

Comment 8 shishir gowda 2012-10-17 12:24:42 UTC

*** This bug has been marked as a duplicate of bug 856459 ***

