Bug 1454689

Summary: "split-brain observed [Input/output error]" error messages in samba logs during parallel rm -rf
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Prasad Desala <tdesala>
Component: replicate Assignee: Ravishankar N <ravishankar>
Status: CLOSED ERRATA QA Contact: Prasad Desala <tdesala>
Severity: medium Docs Contact:
Priority: unspecified    
Version: rhgs-3.3 CC: amukherj, pgurusid, rhinduja, rhs-bugs, storage-qa-internal
Target Milestone: ---   
Target Release: RHGS 3.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-29 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1456582 (view as bug list) Environment:
Last Closed: 2017-09-21 04:45:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1417151, 1456582, 1457616, 1457732, 1460661    
Attachments:
Description Flags
CIFS mount log for comment #5 none

Description Prasad Desala 2017-05-23 10:42:30 UTC
Description of problem:
=======================
"split-brain observed [Input/output error]" error messages in samba logs during parallel rm -rf.

Version-Release number of selected component (if applicable):
3.8.4-25.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
===================
1) Create a distributed-replicate volume and start it.
2) Set the nl-cache, parallel readdirp and md-cache options on the volume. (Please see the gluster volume info output for distrep below for more info)
3) CIFS mount the volume on multiple clients.
4) Create a very large data set which contains small files, empty directories and directories with files.
5) Simultaneously issue rm -rf * from multiple clients.

Check the samba logs and the rm -rf * terminal output.
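
The race in steps 4 and 5 can be illustrated on a plain local directory. The following is only a minimal sketch: it does not involve Gluster, CIFS or Samba, and the names `make_dataset`, `remove_tree` and `parallel_rm` are hypothetical helpers, not part of any tool mentioned in this bug. It shows that several workers deleting the same tree routinely lose races and see errors such as ENOENT, which is the kind of error the layers above have to translate.

```python
import errno
import os
import tempfile
import threading


def make_dataset(root, dirs=20, files_per_dir=10):
    """Create small files, empty directories and directories with files."""
    for d in range(dirs):
        full = os.path.join(root, f"d_{d:03d}")
        os.makedirs(os.path.join(full, "empty"))  # an empty sub-directory
        for f in range(files_per_dir):
            with open(os.path.join(full, f"f_{f:03d}"), "w") as fh:
                fh.write("x")


def remove_tree(path, errors):
    """Recursive delete that records errno values instead of aborting."""
    try:
        entries = list(os.scandir(path))
    except OSError as exc:  # e.g. another worker already removed this dir
        errors.append(exc.errno)
        return
    for entry in entries:
        try:
            if entry.is_dir(follow_symlinks=False):
                remove_tree(entry.path, errors)
            else:
                os.unlink(entry.path)
        except OSError as exc:
            errors.append(exc.errno)
    try:
        os.rmdir(path)
    except OSError as exc:
        errors.append(exc.errno)


def parallel_rm(root, workers=4):
    """Run several deleters against the same tree, like step 5 does."""
    errors = []
    threads = [
        threading.Thread(target=remove_tree, args=(root, errors))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Calling `parallel_rm(root)` on a directory populated by `make_dataset(root)` returns the list of errno values the losing workers saw; on a local filesystem these are expected to be benign (typically `errno.ENOENT`).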

Actual results:
===============
When rm -rf * is issued simultaneously from multiple clients, "split-brain observed [Input/output error]" error messages appear in the samba logs and rm -rf * throws permission denied errors as below:

rm: cannot remove ‘file_dstdir/10.70.47.52/thrd_02/d_005/d_005’: Permission denied
rm: cannot remove ‘file_dstdir/10.70.47.52/thrd_02/d_005/d_000’: Permission denied
rm: cannot remove ‘file_dstdir/10.70.47.52/thrd_00/d_002/d_005’: Permission denied
rm: cannot remove ‘file_dstdir/10.70.47.52/thrd_00/d_002/d_000’: Permission denied
rm: cannot remove ‘file_dstdir/10.70.47.52/thrd_04/d_001/d_000’: Permission denied
rm: cannot remove ‘file_srcdir/10.70.47.52/thrd_01/d_002/d_000’: Permission denied
rm: cannot remove ‘file_srcdir/10.70.47.15/thrd_05/d_003/d_000’: Permission denied
rm: cannot remove ‘new/file_dstdir/10.70.47.15/thrd_04/d_004/d_008’: Permission denied

Expected results:
=================
No split-brain errors in samba logs and rm -rf * should not throw any errors.

Comment 4 Ravishankar N 2017-05-29 17:06:05 UTC
1. The 'Operation not permitted' errors in server_fstat_cbk and posix_do_readdir were ENOENTs incorrectly being propagated as EPERMs. This is fixed by https://review.gluster.org/#/c/17414/

2. https://review.gluster.org/#/c/17413/ (see commit message for description of why we got EIO in afr) attempts to fix split-brain messages. Additionally, https://review.gluster.org/#/c/16879/ also needs to be taken in.

Having said that, the 'permission denied' errors on the cifs mount still occur with the above fixes, and it seems that this is expected behaviour because samba converts any unmapped errors to EACCES. I traced all errors unwound by the io-stats xlator (which is the topmost one) in the smb process and did not see gluster propagate an EACCES.

I'll share the various errors we send during parallel rm -rf soon and get a confirmation from gfapi/smb folks that this is expected behaviour. FWIW, if I stop the volume and do an ls from the cifs mount, I get EACCES, presumably because samba converts ENOTCONN to EACCES.

Comment 5 Ravishankar N 2017-05-30 05:33:14 UTC
(In reply to Ravishankar N from comment #4)
> I'll share the various errors we send during parallel rm -rf soon and get a
> confirmation from gfapi/smb folks that this is expected behaviour. FWIW, if
> I stop the volume and do an ls from the cifs mount, I get EACCES, presumably
> because samba converts ENOTCONN to EACCES.

Hi Poornima, as discussed, I checked all stack unwind errors from the io-stats xlator on the cifs mount where I got permission denied errors (mount log is attached for reference). I only see ENODATA/ENOENT/ESTALE errors being unwound by io-stats:

#grep -rne stack-trace glusterfs-distrep.10.70.43.238.log |grep io-stats| awk '{print $14,$15.$16,$17,$18}'|sort|uniq
No dataavailable [No data
No suchfile or directory
Stale filehandle [Stale file

Can you confirm that Samba is converting these errors into EACCES?
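
The awk field positions in the command above depend on the exact number of whitespace-separated columns in each log line. A less position-sensitive way to get the same tally can be sketched as below. This is a rough sketch, not a tested tool: it assumes each stack-trace line ends in `error: <message> [<message>]`, which is inferred from the output above, and `tally_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line ending: "... error: <strerror text> [<strerror text>]".
# This shape is inferred from the awk output in this comment, not from
# the gluster sources.
ERROR_RE = re.compile(r"error: (.+?) \[")


def tally_errors(lines, xlator="io-stats"):
    """Count distinct error strings on stack-trace lines from one xlator."""
    counts = Counter()
    for line in lines:
        # Same filters as the grep pipeline: stack-trace lines, io-stats only.
        if "stack-trace" not in line or xlator not in line:
            continue
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open handle on the mount log (for example glusterfs-distrep.10.70.43.238.log) as `lines` returns a `Counter` keyed by error message, so each message is reported whole instead of being cut at a fixed column.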

Comment 6 Ravishankar N 2017-05-30 05:35:02 UTC
Created attachment 1283353
CIFS mount log for comment #5

Comment 7 Poornima G 2017-05-30 05:48:30 UTC
ENODATA is NT_STATUS_END_OF_FILE
ESTALE is NT_STATUS_ACCESS_DENIED
ENOENT is NT_STATUS_NO_SUCH_FILE

So I think ESTALE gets converted to permission denied.
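
The mapping above can be written down as a small lookup to make the effect on the client explicit. This is a minimal sketch and not Samba's actual code: the three table entries come from this comment, the fallback to access denied for unmapped errors is taken from the observation in comment 4, and `nt_status_for` and `client_sees_permission_denied` are hypothetical names.

```python
import errno

# Entries as listed in this comment.
ERRNO_TO_NT_STATUS = {
    errno.ENODATA: "NT_STATUS_END_OF_FILE",
    errno.ESTALE: "NT_STATUS_ACCESS_DENIED",
    errno.ENOENT: "NT_STATUS_NO_SUCH_FILE",
}

# Assumption based on comment 4: errors without a mapping end up as
# access denied on the client.
DEFAULT_NT_STATUS = "NT_STATUS_ACCESS_DENIED"


def nt_status_for(err):
    """Return the NT status name this sketch associates with an errno."""
    return ERRNO_TO_NT_STATUS.get(err, DEFAULT_NT_STATUS)


def client_sees_permission_denied(err):
    """True if the CIFS client would report 'Permission denied'."""
    return nt_status_for(err) == "NT_STATUS_ACCESS_DENIED"
```

Under these assumptions `client_sees_permission_denied(errno.ESTALE)` is true while the same call with `errno.ENOENT` is false, which matches the rm output in the description: only the ESTALE cases surface as "Permission denied".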

Comment 12 Ravishankar N 2017-06-05 06:08:22 UTC
Patches merged.

Comment 14 Ravishankar N 2017-06-19 10:46:23 UTC
There is a fix to the posix patch in comment #11. https://code.engineering.redhat.com/gerrit/#/c/109378/ is the fix.

It has been merged downstream, should be available in the next build. Moving the BZ back to modified.

Comment 16 Prasad Desala 2017-06-28 13:04:19 UTC
Verified this BZ on glusterfs version 3.8.4-31.el7rhgs.x86_64.
Followed the same steps as in the description. I no longer see any "split-brain observed [Input/output error]" error messages in the samba logs during parallel rm -rf. However, I still see "permission denied" errors on the mountpoint, because samba converts ESTALE to permission denied as per Comment 7.

Moving this BZ to Verified.

Comment 18 errata-xmlrpc 2017-09-21 04:45:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774