Description of problem:
git operations fail after an add-brick operation. This is because data is looked up even on the newly added bricks, which should not happen.

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.28-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a distribute volume and mount it over FUSE.
2. Create a git repo on the mount point.
3. In the git repo, add files and commit them:
   for i in `seq 1 10000`; do dd if=/dev/urandom of=ddfile_$i bs=1M count=1; git add ddfile_$i; git commit -m "ddfile"; done
4. Perform an add-brick operation on the volume.
5. Once add-brick is done, git operations start failing.

Actual results:
git operations fail with the following error message:

1048576 bytes (1.0 MB) copied, 0.20487 s, 5.1 MB/s
fatal: Unable to read current working directory: No such file or directory
fatal: Unable to read current working directory: No such file or directory
1+0 records in
1+0 records out

Expected results:
There should be no errors.

Additional info:
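The reproduction loop from step 3 can be wrapped into a small script. This is a minimal sketch, not part of the original report: the mount-point path and the `repro` helper name are assumptions, and the file count is a parameter so a scaled-down run is possible.

```shell
#!/bin/sh
# Sketch of the reproducer: commit many 1 MB files one at a time inside
# a git repo on the (assumed) FUSE mount, so that an add-brick done in
# parallel lands in the middle of the commit stream.
repro() {
    mnt=$1      # e.g. /mnt/glustervol (assumed mount point)
    count=$2    # number of files; the report used 10000

    mkdir -p "$mnt/repo" || return 1
    cd "$mnt/repo" || return 1
    git init -q .

    i=1
    while [ "$i" -le "$count" ]; do
        # 1 MB of random data per file, each committed individually.
        dd if=/dev/urandom of="ddfile_$i" bs=1M count=1 2>/dev/null
        git add "ddfile_$i"
        git commit -q -m "ddfile"
        i=$((i + 1))
    done
}

# Example: repro /mnt/glustervol 10000
```

Running `gluster volume add-brick` from another shell while this loop is in flight is what triggers the "Unable to read current working directory" failures described above.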
Created attachment 935306 [details] Client log
On a Windows client (Samba), add-brick results in permission denied errors (I/O errors).

Errors on the client:
cp: cannot create regular file ‘971.docx’: Permission denied
cp: cannot create regular file ‘974.docx’: Permission denied
cp: cannot create regular file ‘975.docx’: Permission denied
cp: cannot create regular file ‘977.docx’: Permission denied
cp: cannot create regular file ‘978.docx’: Permission denied
cp: cannot create regular file ‘979.docx’: Permission denied
cp: cannot create regular file ‘983.docx’: Permission denied
cp: cannot create regular file ‘984.docx’: Permission denied
cp: cannot create regular file ‘985.docx’: Permission denied
cp: cannot create regular file ‘986.docx’: Permission denied
cp: cannot create regular file ‘987.docx’: Permission denied
cp: cannot create regular file ‘989.docx’: Permission denied
cp: cannot create regular file ‘991.docx’: Permission denied
cp: cannot create regular file ‘993.docx’: Permission denied

From /var/log/samba/glusterfs-testvol.10.70.35.48.log:
[2014-09-08 12:31:34.958229] W [client-rpc-fops.c:2677:client3_3_opendir_cbk] 2-testvol-client-2: remote operation failed: Stale file handle. Path: /rhsdata01/ms (b34da6ae-b39f-431d-a111-06be74d44caa)
[2014-09-08 12:31:35.010739] W [client-rpc-fops.c:2677:client3_3_opendir_cbk] 2-testvol-client-2: remote operation failed: Stale file handle. Path: /rhsdata01/ms (b34da6ae-b39f-431d-a111-06be74d44caa)
[2014-09-08 12:31:36.014934] W [client-rpc-fops.c:2677:client3_3_opendir_cbk] 2-testvol-client-2: remote operation failed: Stale file handle. Path: /rhsdata01/ms (b34da6ae-b39f-431d-a111-06be74d44caa)
[2014-09-08 12:31:37.063339] W [client-rpc-fops.c:2677:client3_3_opendir_cbk] 2-testvol-client-2: remote operation failed: Stale file handle. Path: /rhsdata01/ms (b34da6ae-b39f-431d-a111-06be74d44caa)
Here is the issue:
1. Create a 6x2 volume and start it.
2. Enable quota.
3. Set a quota limit of 800GB on "/".
4. Mount the volume over NFS.
5. touch <mount-point>/a
6. mkdir <mount-point>/dir
7. Start iozone -a.
8. Add a brick:
[root@nfs1 ~]# gluster volume add-brick vol0 10.70.37.156:/bricks/d1r1-add 10.70.37.81:/bricks/d1r2-add
volume add-brick: success
9. Start rebalance with force:
[root@nfs1 ~]# gluster volume rebalance vol0 start force
volume rebalance: vol0: success: Starting rebalance on volume vol0 has been successful.
ID: 2adeb93f-54bb-42ad-97df-9bade1e1c847
10. Check the iozone execution on the mount point.

Result: iozone fails:
4096    64  1194472 1818012 288408 283895 3340563 2154808  50890 5306022 187451 2254337 2708986 240291 2329851
4096   128  1897111 2985308 405790 5867831 7642951 3263773  84846 4399673 222428 2213100 2639075 297523 5020702
4096   256  1110972 2711124 298542 2594046 3182161 1667656 106100 3680502 594916 1023232 1231813 270307 4971306
4096   512  1577320 2522443 268591 5032468 5894001 3136268 205025 3771811 357922 2031724 2925823 311078 4487004
4096  1024  1231195 2609016 184329 4779081 6013663 3330202 200962 4639689 386196 1628607 2222548 333934 5418140
4096  2048  1990298 1758645 349370 5127086 4545172 2212815 296942 4297324 219165 1565819 3512686 383626 3650781
4096  4096  2031724 2972395 309063 5013377 7816828 3657777 345533 3806915 315147 2542602 2059243 298563 3893185
8192     4  Error writing block 1844, fd= 3
write: No such file or directory

iozone: interrupted

exiting iozone

real    1m2.843s
user    0m0.551s
sys     0m13.098s

[root@nfs1 ~]# gluster volume rebalance vol0 status
        Node  Rebalanced-files        size     scanned    failures     skipped      status   run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ----------   ----------------
   localhost                 1      0Bytes           3           0           0   completed               0.00
 10.70.37.81                 0      0Bytes           2           0           0   completed               0.00
 10.70.37.50                 0      0Bytes           2           0           0   completed               0.00
 10.70.37.95                 0      0Bytes           2           0           0   completed               1.00
volume rebalance: vol0: success:
Please review and sign off on the edited doc text.
RCA:
====
After add-brick, a named lookup was not sent on the directory in question, so the directory was not created on the new brick. However, opendir is wound to that node, and hence fails with ENOENT/ESTALE.

Fix:
====
Two solutions are possible:
1. Send a named lookup during resolution of that inode in the new graph.
2. Make sure opendir is sent only to those bricks which were part of the parent directory's layout in the previous graph.

The tricky part of option 2 is finding the parent of the inode on which opendir is done, since during opendir we only get the gfid of the directory (without the pargfid). One option is to read the symbolic link for that gfid from the gfid backend and extract the pargfid from it, but that seems like a messy hack.
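To make the "messy hack" concrete: on a gluster brick, a directory's gfid is stored under `.glusterfs/<g0g1>/<g2g3>/<gfid>` as a symlink whose target contains the parent's gfid and the directory's basename, so the pargfid can be recovered with a readlink. The sketch below illustrates this; the `pargfid_of` helper name and all gfid values are made up for illustration, and this is not the fix that was actually merged.

```shell
#!/bin/sh
# Illustrative sketch: recover a directory's pargfid from the gfid
# symlink in a brick's .glusterfs backend. On real bricks the layout is:
#   <brick>/.glusterfs/<g0g1>/<g2g3>/<gfid> -> ../../<p0p1>/<p2p3>/<pargfid>/<basename>
pargfid_of() {
    brick=$1
    gfid=$2
    # First two and next two hex chars of the gfid index the subdirs.
    d1=$(printf %s "$gfid" | cut -c1-2)
    d2=$(printf %s "$gfid" | cut -c3-4)
    target=$(readlink "$brick/.glusterfs/$d1/$d2/$gfid") || return 1
    # The second-to-last component of the link target is the parent gfid.
    basename "$(dirname "$target")"
}
```

This is exactly the extra backend round trip the comment calls a hack: the pargfid is reachable, but only by reading on-disk state that the opendir code path was never meant to consult.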
A duplicate of: https://bugzilla.redhat.com/show_bug.cgi?id=1278399 Fixed by: https://code.engineering.redhat.com/gerrit/#/c/61036/2
Tested with build glusterfs-server-3.7.5-6: created a 2x2 volume, mounted it on a client using FUSE, created 200 nested directories, and cd'd to the leaf directory (../dir199/dir200). Then added two new bricks to the volume. While rebalance was in progress, the client was able to run ls and mkdir, but mkdir hung; no I/O error popped up for the hang. Filed bug 1283990 for the hang and marking this bug verified.
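The deep-directory setup used in this verification can be scripted. This is a minimal sketch of that setup only, assuming a helper name (`make_chain`) and a FUSE mount path that are not in the original comment; the add-brick and rebalance steps still have to be run from the gluster CLI.

```shell
#!/bin/sh
# Build a dir1/dir2/.../dirN chain under the given root and print the
# leaf path, matching the dir199/dir200 naming used in the verification.
make_chain() {
    root=$1
    depth=$2
    path=$root
    i=1
    while [ "$i" -le "$depth" ]; do
        path="$path/dir$i"
        i=$((i + 1))
    done
    mkdir -p "$path" && printf '%s\n' "$path"
}

# Example (mount path assumed): cd "$(make_chain /mnt/testvol 200)"
```

Sitting in the leaf directory while bricks are added exercises the same opendir/resolution path on the new graph that the RCA describes.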
Hi Susant,

The doc text is edited. Please sign off on it if it looks OK.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html