Description of problem:
----------------------
The intent was to scale from 1*2 to 6*2 and then back to 1*2 amidst continuous I/O from FUSE mounts.

While doing the add-brick from 3*2 to 4*2, I saw that Bonnie++ errored out on one of my clients:

<snip>
Changing to the specified mountpoint
/gluster-mount/d2/run3638
executing bonnie
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...Bonnie: drastic I/O error (re-write read): Transport endpoint is not connected
Can't read a full block, only got 8550 bytes.
</snip>

I was running Bonnie++, finds, dds and kernel untars.

sosreports and statedump location will be shared in comments.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
glusterfs-3.8.4-11.el7rhgs.x86_64

How reproducible:
-----------------
Reporting the first occurrence.

Actual results:
---------------
Bonnie++ errors out on the application side.

Expected results:
-----------------
No EIO.

Additional info:
----------------
Client and Server OS: RHEL 7.3

*Vol Config* :

[root@gqas009 ~]# gluster v status
Status of volume: butcher
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gqas010.sbu.lab.eng.bos.redhat.com:/bricks1/A     49152     0          Y       23269
Brick gqas009.sbu.lab.eng.bos.redhat.com:/bricks1/A     49152     0          Y       23170
Brick gqas010.sbu.lab.eng.bos.redhat.com:/bricks2/A     49153     0          Y       23466
Brick gqas009.sbu.lab.eng.bos.redhat.com:/bricks2/A     49153     0          Y       23380
Brick gqas010.sbu.lab.eng.bos.redhat.com:/bricks3/A     49154     0          Y       24074
Brick gqas009.sbu.lab.eng.bos.redhat.com:/bricks3/A     49154     0          Y       24472
Brick gqas010.sbu.lab.eng.bos.redhat.com:/bricks4/A     49155     0          Y       24872
Brick gqas009.sbu.lab.eng.bos.redhat.com:/bricks4/A     49155     0          Y       25346
Self-heal Daemon on localhost                           N/A       N/A        Y       27002
Quota Daemon on localhost                               N/A       N/A        Y       27010
Self-heal Daemon on gqas015.sbu.lab.eng.bos.redhat.com  N/A       N/A        Y       25917
Quota Daemon on gqas015.sbu.lab.eng.bos.redhat.com      N/A       N/A        Y       25925
Self-heal Daemon on gqas014.sbu.lab.eng.bos.redhat.com  N/A       N/A        Y       25484
Quota Daemon on gqas014.sbu.lab.eng.bos.redhat.com      N/A       N/A        Y       25492
Self-heal Daemon on gqas010.sbu.lab.eng.bos.redhat.com  N/A       N/A        Y       26554
Quota Daemon on gqas010.sbu.lab.eng.bos.redhat.com      N/A       N/A        Y       26562

Task Status of Volume butcher
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 86df50c3-00fc-409c-aac8-02c64dd5faa5
Status               : completed

[root@gqas009 ~]#
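For context, below is a minimal sketch of one scale-out/scale-in step (4*2 -> 5*2 and back), not the exact CLI history from the run. The /bricks5/A paths are an assumption following the /bricksN/A naming visible in the status output above; only bricks1-4 are shown there.

# One scale-out step: add a replica pair, then rebalance
gluster volume add-brick butcher replica 2 \
    gqas010.sbu.lab.eng.bos.redhat.com:/bricks5/A \
    gqas009.sbu.lab.eng.bos.redhat.com:/bricks5/A
gluster volume rebalance butcher start
gluster volume rebalance butcher status        # wait for "completed"

# One scale-in step: drain the same pair, then commit once migration finishes
gluster volume remove-brick butcher replica 2 \
    gqas010.sbu.lab.eng.bos.redhat.com:/bricks5/A \
    gqas009.sbu.lab.eng.bos.redhat.com:/bricks5/A start
gluster volume remove-brick butcher replica 2 \
    gqas010.sbu.lab.eng.bos.redhat.com:/bricks5/A \
    gqas009.sbu.lab.eng.bos.redhat.com:/bricks5/A status
gluster volume remove-brick butcher replica 2 \
    gqas010.sbu.lab.eng.bos.redhat.com:/bricks5/A \
    gqas009.sbu.lab.eng.bos.redhat.com:/bricks5/A commit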
From client mount logs:

[2017-01-12 06:30:50.721491] W [MSGID: 108035] [afr-transaction.c:2221:afr_changelog_fsync_cbk] 6-butcher-replicate-3: fsync(317da8ef-9dc3-41ea-824a-88f9af31066a) failed on subvolume butcher-client-7. Transaction was WRITE [Transport endpoint is not connected]
************** EXACT WORKLOAD **************
Client 1 : dd in a loop
Client 2 : Bonnie++
Client 3 : tarball untar
Client 4 : finds and fileop
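A rough per-client sketch of the above, for anyone trying to reproduce. The /gluster-mount paths, file sizes, loop counts, tarball version and fileop flags are assumptions; the exact invocations were not captured here.

# Client 1: dd in a loop (size/count assumed)
for i in $(seq 1 100); do
    dd if=/dev/zero of=/gluster-mount/d1/ddfile.$i bs=1M count=1024 conv=fsync
done

# Client 2: Bonnie++, run as root (matches "Using uid:0, gid:0" in the snippet above)
bonnie++ -d /gluster-mount/d2/run3638 -u root

# Client 3: kernel tarball untar (kernel version assumed)
tar -xf /tmp/linux-4.9.tar.xz -C /gluster-mount/d3

# Client 4: finds and fileop (fileop here is iozone's fileop; flags assumed)
find /gluster-mount/d4 -type f -exec stat {} + > /dev/null
fileop -f 10 -d /gluster-mount/d4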
I scaled out from 1*2 to 6*2 and then back to 1*2 on 3.8.4-13 on FUSE. It worked seamlessly.

Closing this as WFM after discussion with Atin/Ravi, for lack of a reproducer from QE.