Bug 1454689 - "split-brain observed [Input/output error]" error messages in samba logs during parallel rm -rf
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Ravishankar N
QA Contact: Prasad Desala
URL:
Whiteboard:
Depends On:
Blocks: 1417151 1456582 1457616 1457732 1460661
Reported: 2017-05-23 10:42 UTC by Prasad Desala
Modified: 2017-09-21 04:58 UTC (History)

Fixed In Version: glusterfs-3.8.4-29
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1456582
Environment:
Last Closed: 2017-09-21 04:45:37 UTC
Embargoed:


Attachments
CIFS mount log for comment #5 (773.01 KB, application/x-gzip)
2017-05-30 05:35 UTC, Ravishankar N


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2774 0 normal SHIPPED_LIVE glusterfs bug fix and enhancement update 2017-09-21 08:16:29 UTC

Description Prasad Desala 2017-05-23 10:42:30 UTC
Description of problem:
=======================
"split-brain observed [Input/output error]" error messages in samba logs during parallel rm -rf.

Version-Release number of selected component (if applicable):
3.8.4-25.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
===================
1) Create a distributed-replicate volume and start it.
2) Set the nl-cache, parallel readdirp, and md-cache settings on the volume. (Please see the gluster volume info output for distrep below for more details.)
3) Mount the volume over CIFS on multiple clients.
4) Create a very large data set containing small files, empty directories, and directories with files.
5) Simultaneously issue rm -rf * from multiple clients.

Check the samba logs and the output of rm -rf * on each terminal.
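
For anyone who wants to exercise steps 4 and 5 in isolation, here is a minimal local sketch. It is not a reproducer for this bug: it runs against a temporary directory on a local filesystem, with threads standing in for the CIFS clients, and only shows which errnos racing removers see. The helper names (build_tree, remove_tree, parallel_remove) and the tree sizes are made up for illustration; the thrd_NN/d_NNN naming follows the paths in the rm output.

```python
import errno
import os
import tempfile
import threading
from collections import Counter


def build_tree(root, threads=3, dirs=4, files=5):
    """Create small files, empty directories, and directories with files."""
    for t in range(threads):
        for d in range(dirs):
            path = os.path.join(root, f"thrd_{t:02d}", f"d_{d:03d}")
            os.makedirs(path)
            if d % 2 == 0:  # even-numbered directories get files, odd ones stay empty
                for f in range(files):
                    with open(os.path.join(path, f"f_{f:03d}"), "w") as fh:
                        fh.write("x")


def remove_tree(root, errors, lock):
    """Bottom-up removal that records each failure by errno name and keeps going."""
    def attempt(func, path):
        try:
            func(path)
        except OSError as exc:
            with lock:
                errors[errno.errorcode.get(exc.errno, str(exc.errno))] += 1

    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            attempt(os.unlink, os.path.join(dirpath, name))
        for name in dirnames:
            attempt(os.rmdir, os.path.join(dirpath, name))
    attempt(os.rmdir, root)


def parallel_remove(root, workers=4):
    """Run several removers over the same tree and return the errno tally."""
    errors = Counter()
    lock = threading.Lock()
    pool = [threading.Thread(target=remove_tree, args=(root, errors, lock))
            for _ in range(workers)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return errors


if __name__ == "__main__":
    root = tempfile.mkdtemp(prefix="parallel_rm_")
    build_tree(root)
    print(dict(parallel_remove(root)))
```

On a local filesystem, the losers of each race should only see "no such file or directory" style failures. On the CIFS mount, the errno has to pass through gluster and samba before it reaches rm.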

Actual results:
===============
When rm -rf * is issued simultaneously from multiple clients, "split-brain observed [Input/output error]" error messages appear in the samba logs and rm -rf * fails with permission denied errors such as the following:

rm: cannot remove ‘file_dstdir/10.70.47.52/thrd_02/d_005/d_005’: Permission denied
rm: cannot remove ‘file_dstdir/10.70.47.52/thrd_02/d_005/d_000’: Permission denied
rm: cannot remove ‘file_dstdir/10.70.47.52/thrd_00/d_002/d_005’: Permission denied
rm: cannot remove ‘file_dstdir/10.70.47.52/thrd_00/d_002/d_000’: Permission denied
rm: cannot remove ‘file_dstdir/10.70.47.52/thrd_04/d_001/d_000’: Permission denied
rm: cannot remove ‘file_srcdir/10.70.47.52/thrd_01/d_002/d_000’: Permission denied
rm: cannot remove ‘file_srcdir/10.70.47.15/thrd_05/d_003/d_000’: Permission denied
rm: cannot remove ‘new/file_dstdir/10.70.47.15/thrd_04/d_004/d_008’: Permission denied

Expected results:
=================
No split-brain errors should appear in the samba logs, and rm -rf * should complete without any errors.

Comment 4 Ravishankar N 2017-05-29 17:06:05 UTC
1. The 'Operation not permitted' errors in server_fstat_cbk and posix_do_readdir were ENOENTs being incorrectly propagated as EPERMs. This is fixed by https://review.gluster.org/#/c/17414/

2. https://review.gluster.org/#/c/17413/ (see the commit message for a description of why we got EIO in afr) attempts to fix the split-brain messages. Additionally, https://review.gluster.org/#/c/16879/ needs to be taken in as well.

Having said that, the 'permission denied' errors on the cifs mount still occur with the above fixes. This appears to be expected behaviour, because samba converts any unmapped errors to EACCES. I traced all errors unwound by the io-stats xlator (which is the topmost one) in the smb process and did not see gluster propagate an EACCES.

I'll share the various errors we send during parallel rm -rf soon and get a confirmation from gfapi/smb folks that this is expected behaviour. FWIW, if I stop the volume and do an ls from the cifs mount, I get EACCES, presumably because samba converts ENOTCONN to EACCES.

Comment 5 Ravishankar N 2017-05-30 05:33:14 UTC
(In reply to Ravishankar N from comment #4)
> I'll share the various errors we send during parallel rm -rf soon and get a
> confirmation from gfapi/smb folks that this is expected behaviour. FWIW, if
> I stop the volume and do an ls from the cifs mount, I get EACCES, presumably
> because samba converts ENOTCONN to EACCES.

Hi Poornima, as discussed, I checked all stack unwind errors from the io-stats xlator on the cifs mount where I got permission denied errors (the mount log is attached for reference). I only see ENODATA/ENOENT/ESTALE errors being unwound by io-stats:

#grep -rne stack-trace glusterfs-distrep.10.70.43.238.log |grep io-stats| awk '{print $14,$15.$16,$17,$18}'|sort|uniq
No dataavailable [No data
No suchfile or directory
Stale filehandle [Stale file

Can you confirm that Samba is converting these errors into EACCES?
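
For reference, a rough Python equivalent of the grep/awk pipeline above, in case someone wants to repeat the tally on another mount log. It is only a sketch: the log line layout is not shown in this report, so the field positions (14 to 18) are taken straight from the awk command, and the function name is made up. Joining the fields with single spaces also avoids the fused words in the output above ("dataavailable", "suchfile"), which appear to come from the "." between $15 and $16 in the awk expression.

```python
from collections import Counter


def unwound_errors(lines, first_field=14, last_field=18):
    """Tally whitespace-separated fields 14..18 of io-stats stack-trace lines.

    Field numbers are 1-based, matching awk's $14..$18.
    """
    counts = Counter()
    for line in lines:
        # Same filters as the two grep stages in the original pipeline.
        if "stack-trace" not in line or "io-stats" not in line:
            continue
        selected = line.split()[first_field - 1:last_field]
        if selected:
            counts[" ".join(selected)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally.py <mount-log>
    with open(sys.argv[1], errors="replace") as log:
        for message, count in sorted(unwound_errors(log).items()):
            print(count, message)
```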

Comment 6 Ravishankar N 2017-05-30 05:35:02 UTC
Created attachment 1283353 [details]
CIFS mount log for comment #5

CIFS mount log for comment #5

Comment 7 Poornima G 2017-05-30 05:48:30 UTC
ENODATA is NT_STATUS_END_OF_FILE
ESTALE is NT_STATUS_ACCESS_DENIED
ENOENT is NT_STATUS_NO_SUCH_FILE

So I think ESTALE gets converted to permission denied.
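
To make the effect on the client concrete, the mapping above can be written as a small lookup. This is an illustration only and is not copied from Samba's source: the three entries are exactly the ones listed in this comment, and the default for everything else is an assumption taken from comment 4 (unmapped errors end up as access denied). The names ERRNO_TO_NT_STATUS and nt_status_for are made up.

```python
import errno

# The three mappings exactly as listed in this comment.
ERRNO_TO_NT_STATUS = {
    errno.ENODATA: "NT_STATUS_END_OF_FILE",
    errno.ESTALE: "NT_STATUS_ACCESS_DENIED",
    errno.ENOENT: "NT_STATUS_NO_SUCH_FILE",
}

# Assumption from comment 4: errors without a mapping surface as access denied.
DEFAULT_NT_STATUS = "NT_STATUS_ACCESS_DENIED"


def nt_status_for(err):
    """Return the NT status a CIFS client would see for a given errno."""
    return ERRNO_TO_NT_STATUS.get(err, DEFAULT_NT_STATUS)


def client_sees_permission_denied(err):
    """True if the errno ends up as 'Permission denied' on the CIFS mount."""
    return nt_status_for(err) == "NT_STATUS_ACCESS_DENIED"


if __name__ == "__main__":
    for err in (errno.ENODATA, errno.ESTALE, errno.ENOENT, errno.ENOTCONN):
        print(errno.errorcode[err], "->", nt_status_for(err))
```

Of the three errors gluster unwinds during the parallel rm -rf, only ESTALE lands on access denied under this mapping, which matches the "Permission denied" lines in the description.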

Comment 12 Ravishankar N 2017-06-05 06:08:22 UTC
Patches merged.

Comment 14 Ravishankar N 2017-06-19 10:46:23 UTC
There is a fix to the posix patch in comment #11. https://code.engineering.redhat.com/gerrit/#/c/109378/ is the fix.

It has been merged downstream and should be available in the next build. Moving the BZ back to modified.

Comment 16 Prasad Desala 2017-06-28 13:04:19 UTC
Verified this BZ on glusterfs version 3.8.4-31.el7rhgs.x86_64.
I followed the same steps as in the description. I no longer see any "split-brain observed [Input/output error]" error messages in the samba logs during parallel rm -rf. However, I still see "permission denied" errors on the mount point, because samba converts ESTALE to permission denied as per comment 7.

Moving this BZ to Verified.

Comment 18 errata-xmlrpc 2017-09-21 04:45:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774


