Bug 1412566 - [Scale] : I/O errors out with ENOTCONN during rebalance
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: Ravishankar N
QA Contact: Ambarish
Depends On:
Reported: 2017-01-12 10:15 UTC by Ambarish
Modified: 2017-03-28 06:51 UTC

Fixed In Version:
Doc Text:
Clone Of:
Last Closed: 2017-01-31 06:25:56 UTC
Target Upstream Version:


Description Ambarish 2017-01-12 10:15:23 UTC
Description of problem:

The intent was to scale from 1*2 to 6*2 and then back to 1*2 amidst continuous I/O from FUSE mounts.

During the add-brick from 3*2 to 4*2, Bonnie++ errored out on one of my clients:


Changing to the specified mountpoint
executing bonnie
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...Bonnie: drastic I/O error (re-write read): Transport endpoint is not connected
Can't read a full block, only got 8550 bytes.


I was running Bonnie++, finds, dds, and kernel untars.

The location of the sosreports and statedumps will be shared in the comments.

Version-Release number of selected component (if applicable):


How reproducible:

Reporting the first occurrence.

Actual results:

Bonnie++ errors out on the application side.

Expected results:


Additional info:

Client and Server OS: RHEL 7.3

*Vol Config*:

[root@gqas009 ~]# gluster v status
Status of volume: butcher
Gluster process                             TCP Port  RDMA Port  Online  Pid
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks1/A                                    49152     0          Y       23269
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks1/A                                    49152     0          Y       23170
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks2/A                                    49153     0          Y       23466
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks2/A                                    49153     0          Y       23380
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks3/A                                    49154     0          Y       24074
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks3/A                                    49154     0          Y       24472
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks4/A                                    49155     0          Y       24872
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks4/A                                    49155     0          Y       25346
Self-heal Daemon on localhost               N/A       N/A        Y       27002
Quota Daemon on localhost                   N/A       N/A        Y       27010
Self-heal Daemon on gqas015.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       25917
Quota Daemon on gqas015.sbu.lab.eng.bos.red
hat.com                                     N/A       N/A        Y       25925
Self-heal Daemon on gqas014.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       25484
Quota Daemon on gqas014.sbu.lab.eng.bos.red
hat.com                                     N/A       N/A        Y       25492
Self-heal Daemon on gqas010.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       26554
Quota Daemon on gqas010.sbu.lab.eng.bos.red
hat.com                                     N/A       N/A        Y       26562
Task Status of Volume butcher
Task                 : Rebalance           
ID                   : 86df50c3-00fc-409c-aac8-02c64dd5faa5
Status               : completed           
[root@gqas009 ~]#
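
The status output above wraps long brick and daemon names onto a second line, which makes it awkward to grep. A minimal Python sketch for rejoining the wrapped rows and counting bricks is given below. The function names (parse_volume_status, count_bricks) and the list of skipped line prefixes are illustrative assumptions based only on the output pasted in this report, not part of any Gluster tooling.

```python
import re

# A complete row ends with: TCP port, RDMA port, Online (Y/N), Pid.
ROW_TAIL = re.compile(
    r"^(?P<name>.*?)\s+(?P<tcp>\S+)\s+(?P<rdma>\S+)\s+(?P<online>[YN])\s+(?P<pid>\S+)\s*$"
)

# Assumption: header, task-status, separator, and shell-prompt lines start
# with one of these prefixes in the pasted output.
SKIP_PREFIXES = ("Status", "Gluster process", "Task", "ID", "[", "-")


def parse_volume_status(text):
    """Rejoin wrapped rows of pasted `gluster v status` output into dicts."""
    rows = []
    pending = ""  # first half of a name that wrapped onto the next line
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.startswith(SKIP_PREFIXES):
            pending = ""
            continue
        match = ROW_TAIL.match(line)
        if match is None:
            # No port/pid columns yet: this is the wrapped head of a name.
            pending += line
            continue
        rows.append({
            "name": pending + match.group("name"),
            "tcp_port": match.group("tcp"),
            "rdma_port": match.group("rdma"),
            "online": match.group("online") == "Y",
            "pid": match.group("pid"),
        })
        pending = ""
    return rows


def count_bricks(rows):
    """Count rows that describe bricks rather than daemons."""
    return sum(1 for row in rows if row["name"].startswith("Brick "))
```

Applied to the output above, the eight brick rows divided by the replica count of 2 correspond to the 4*2 layout mentioned in the description.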

Comment 2 Ambarish 2017-01-12 10:22:50 UTC
From client mount logs :

[2017-01-12 06:30:50.721491] W [MSGID: 108035] [afr-transaction.c:2221:afr_changelog_fsync_cbk] 6-butcher-replicate-3: fsync(317da8ef-9dc3-41ea-824a-88f9af31066a) failed on subvolume butcher-client-7. Transaction was WRITE [Transport endpoint is not connected]
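
For anyone sifting through the mount logs for similar failures, here is a minimal Python sketch that splits a log line of the shape shown above into fields. The regular expressions are inferred from this single line, and the function name parse_client_log_line is hypothetical; other Gluster log messages may not follow the same layout.

```python
import re

# Layout assumed from the one line quoted in this comment:
# [timestamp] LEVEL [MSGID: n] [file:line:function] translator: message
LOG_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<source>[^\]]+)\]\s+"
    r"(?P<xlator>\S+):\s+"
    r"(?P<message>.*)"
)
SUBVOL_RE = re.compile(r"failed on subvolume (?P<subvol>\S+?)\.")
TRAILING_ERROR_RE = re.compile(r"\[(?P<error>[^\]]+)\]\s*$")


def parse_client_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    subvol = SUBVOL_RE.search(fields["message"])
    error = TRAILING_ERROR_RE.search(fields["message"])
    # Both extras are optional: not every message names a subvolume
    # or ends with a bracketed error string.
    fields["subvolume"] = subvol.group("subvol") if subvol else None
    fields["error"] = error.group("error") if error else None
    return fields
```

Filtering parsed lines on the error field equal to "Transport endpoint is not connected" and grouping by subvolume would show which client connections were dropping at the time of the failure.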

Comment 5 Ambarish 2017-01-12 10:37:41 UTC

Client 1: dd in a loop

Client 2: Bonnie++

Client 3: tarball untar

Client 4: finds and fileop

Comment 15 Ambarish 2017-01-31 06:25:56 UTC
I scaled out from 1*2 to 6*2 and then back to 1*2 on 3.8.4-13 over FUSE.

It worked seamlessly.

Closing as WFM after discussion with Atin/Ravi, for lack of a reproducer from QE.
