Description of problem:
=======================
"split-brain observed [Input/output error]" error messages in samba logs during parallel rm -rf.

Version-Release number of selected component (if applicable):
3.8.4-25.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
===================
1) Create a distributed-replicate volume and start it.
2) Set nl-cache, parallel readdirp, md-cache settings on the volume. (Please see the gluster volume info output for distrep below for more info.)
3) CIFS mount the volume on multiple clients.
4) Create a very large data set containing small files, empty directories, and directories with files.
5) Simultaneously issue rm -rf * from multiple clients. Check the samba logs and the rm -rf * terminal.

Actual results:
===============
When rm -rf * is issued simultaneously from multiple clients, "split-brain observed [Input/output error]" error messages appear in the samba logs and rm -rf * throws permission denied errors as below:

rm: cannot remove ‘file_dstdir/10.70.47.52/thrd_02/d_005/d_005’: Permission denied
rm: cannot remove ‘file_dstdir/10.70.47.52/thrd_02/d_005/d_000’: Permission denied
rm: cannot remove ‘file_dstdir/10.70.47.52/thrd_00/d_002/d_005’: Permission denied
rm: cannot remove ‘file_dstdir/10.70.47.52/thrd_00/d_002/d_000’: Permission denied
rm: cannot remove ‘file_dstdir/10.70.47.52/thrd_04/d_001/d_000’: Permission denied
rm: cannot remove ‘file_srcdir/10.70.47.52/thrd_01/d_002/d_000’: Permission denied
rm: cannot remove ‘file_srcdir/10.70.47.15/thrd_05/d_003/d_000’: Permission denied
rm: cannot remove ‘new/file_dstdir/10.70.47.15/thrd_04/d_004/d_008’: Permission denied

Expected results:
=================
No split-brain errors in samba logs, and rm -rf * should not throw any errors.
1. The 'Operation not permitted' errors in server_fstat_cbk and posix_do_readdir were ENOENTs incorrectly being propagated as EPERMs. This is fixed by https://review.gluster.org/#/c/17414/
2. https://review.gluster.org/#/c/17413/ (see the commit message for a description of why we got EIO in afr) attempts to fix the split-brain messages. Additionally, https://review.gluster.org/#/c/16879/ also needs to be taken in.

Having said that, the 'permission denied' errors on the cifs mount still occur with the above fixes, and this seems to be expected behaviour because samba converts any unmapped errors to EACCES. I traced all errors unwound by the io-stats xlator (which is the topmost one) in the smb process and did not see gluster propagate an EACCES. I'll share the various errors we send during parallel rm -rf soon and get a confirmation from gfapi/smb folks that this is expected behaviour. FWIW, if I stop the volume and do an ls from the cifs mount, I get EACCES, presumably because samba converts ENOTCONN to EACCES.
(In reply to Ravishankar N from comment #4)
> I'll share the various errors we send during parallel rm -rf soon and get a
> confirmation from gfapi/smb folks that this is expected behaviour. FWIW, if
> I stop the volume and do an ls from the cifs mount, I get EACCES, presumably
> because samba converts ENOTCONN to EACCES.

Hi Poorinma, as discussed, I checked all stack unwind errors from the io-stats xlator on the cifs mount where I got permission denied errors (mount log is attached for reference). I only see ENODATA/ENOENT/ESTALE errors being unwound by io-stats:

#grep -rne stack-trace glusterfs-distrep.10.70.43.238.log |grep io-stats| awk '{print $14,$15.$16,$17,$18}'|sort|uniq
No dataavailable [No data
No suchfile or directory
Stale filehandle [Stale file

Can you confirm that Samba is converting these errors into EACCES?
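For anyone who wants to repeat the tally on another mount log, here is a minimal Python sketch of what the grep/awk/sort/uniq pipeline above does. It makes no assumption about the log format beyond the pipeline itself (whitespace-separated fields, error text starting at field 14). The function names and the sample line are hypothetical and are not taken from any real log. It also reproduces the fused words in the output ("suchfile", "filehandle"), which appear to come from `$15.$16` joining fields 15 and 16 without a separator.

```python
from typing import Iterable, List


def awk_fields(line: str) -> str:
    """Mimic awk '{print $14,$15.$16,$17,$18}' for one line."""
    fields = line.split()  # awk's default splitting: runs of whitespace

    def field(n: int) -> str:
        # awk fields are 1-indexed; a missing field is the empty string.
        return fields[n - 1] if n <= len(fields) else ""

    # "$15.$16" ends up concatenating the two fields with no separator,
    # which is why the output shows "suchfile" rather than "such file".
    return " ".join([field(14), field(15) + field(16), field(17), field(18)])


def unique_unwound_errors(lines: Iterable[str]) -> List[str]:
    """Mimic: grep stack-trace | grep io-stats | awk ... | sort | uniq."""
    picked = {
        awk_fields(line)
        for line in lines
        if "stack-trace" in line and "io-stats" in line
    }
    return sorted(picked)


if __name__ == "__main__":
    # Hypothetical line: 13 filler tokens, then the error text in fields 14-18.
    sample = (
        "t1 t2 t3 stack-trace t5 t6 io-stats t8 t9 t10 t11 t12 t13 "
        "No such file or directory"
    )
    print(unique_unwound_errors([sample, sample]))
```

Note that Python's sorted() orders by code point, while sort follows the current locale, so the ordering may differ slightly from the shell output.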
Created attachment 1283353: CIFS mount log for comment #5
ENODATA is NT_STATUS_END_OF_FILE
ESTALE is NT_STATUS_ACCESS_DENIED
ENOENT is NT_STATUS_NO_SUCH_FILE

So I think ESTALE is the one that gets converted to permission denied.
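To make the consequence concrete, here is a minimal Python sketch of the three mappings listed above. It is not Samba's actual translation table, which lives in the Samba source and covers many more errno values. The fallback to NT_STATUS_ACCESS_DENIED for unmapped values is an assumption based on the earlier observation in this thread that samba converts unmapped errors to EACCES. The names REPORTED_NT_STATUS, nt_status_for and client_sees_permission_denied are hypothetical.

```python
import errno

# Mappings exactly as reported in this thread; NOT the full Samba table.
REPORTED_NT_STATUS = {
    errno.ENODATA: "NT_STATUS_END_OF_FILE",
    errno.ESTALE: "NT_STATUS_ACCESS_DENIED",
    errno.ENOENT: "NT_STATUS_NO_SUCH_FILE",
}


def nt_status_for(err: int) -> str:
    """Return the NT status name for a Unix errno, per this thread.

    Assumption: anything not listed falls back to NT_STATUS_ACCESS_DENIED,
    following the earlier remark that unmapped errors become EACCES.
    """
    return REPORTED_NT_STATUS.get(err, "NT_STATUS_ACCESS_DENIED")


def client_sees_permission_denied(err: int) -> bool:
    """True if the CIFS client would be told 'access denied' for this errno."""
    return nt_status_for(err) == "NT_STATUS_ACCESS_DENIED"


if __name__ == "__main__":
    # The three errors unwound by io-stats, plus ENOTCONN from the stopped-volume test.
    for e in (errno.ENODATA, errno.ENOENT, errno.ESTALE, errno.ENOTCONN):
        print(errno.errorcode[e], "->", nt_status_for(e))
```

Under these assumptions, of the three errors gluster was seen to unwind, only ESTALE surfaces on the client as permission denied, which matches the rm output in the description.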
Corresponding downstream patches for the ones mentioned in comment #4:
https://code.engineering.redhat.com/gerrit/#/c/108107/1
https://code.engineering.redhat.com/gerrit/#/c/108106
https://code.engineering.redhat.com/gerrit/#/c/108105
Patches merged.
There is a fix to the posix patch in comment #11: https://code.engineering.redhat.com/gerrit/#/c/109378/. It has been merged downstream and should be available in the next build. Moving the BZ back to modified.
Verified this BZ on glusterfs version 3.8.4-31.el7rhgs.x86_64. Followed the same steps as in the description. I no longer see any "split-brain observed [Input/output error]" error messages in the samba logs during parallel rm -rf. However, I still see "permission denied" errors on the mountpoint, because samba converts ESTALE to permission denied as per Comment 7. Moving this BZ to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774