Bug 1899386 - IO's failing with Input/output error on rebalance
Keywords:
Status: CLOSED DUPLICATE of bug 1937314
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Barak Sason Rofman
QA Contact: Pranav Prakash
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-11-19 04:11 UTC by Upasana
Modified: 2021-05-26 09:01 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-26 09:01:14 UTC
Embargoed:



Description Upasana 2020-11-19 04:11:48 UTC
Provide version-Release number of selected component (if applicable):
====================================================================
glusterfs-server-6.0-48.el7rhgs.x86_64
 
Have you searched the Bugzilla archives for same/similar issues reported
=========================================================================
Yes. https://bugzilla.redhat.com/show_bug.cgi?id=1848532 looks similar, but this needs to be confirmed.

Describe the issue:
===================
After rebalance is started, I/O fails with Input/output error while the rebalance is in progress.

Is this issue reproducible? If yes, share more details.:
=======================================================
Yes, reproduced in 2 out of 2 attempts.

Steps to Reproduce:
===================
1. Create an arbiter volume and mount it via glusterfs.
2. Start I/O (tested with: mkdir test1; cd test1; for i in `seq 1 1000`; do dd if=/dev/urandom of=file$i bs=10MB count=1 oflag=append conv=notrunc; done)
3. While I/O is in progress, run add-brick and start rebalance.
4. Rebalance completes successfully, but Input/output errors are seen on the mount point.
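
For anyone retrying this, the steps above could be scripted roughly as follows. This is only a sketch: the host names (host1..host3), brick paths, and mount point are placeholders and not the values from this setup, and the initial volume layout before add-brick is an assumption. By default the script only prints the gluster and mount commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch of the reproduction steps. Host names, brick paths, and the
# mount point are placeholders (assumptions), not values from this report.
VOL=${VOL:-arbiter-vol}
MNT=${MNT:-/mnt/arbiter-vol}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

repro() {
    # Step 1: create, start, and mount a 1 x (2 + 1) arbiter volume.
    run gluster volume create "$VOL" replica 3 arbiter 1 \
        host1:/gluster/brick1/b1 host2:/gluster/brick1/b2 host3:/gluster/brick1/b3
    run gluster volume start "$VOL"
    run mount -t glusterfs "host1:/$VOL" "$MNT"

    # Step 2: start the dd loop from the description on the mount point
    # (in a second shell) before continuing.

    # Step 3: add one more replica set, then start rebalance.
    run gluster volume add-brick "$VOL" \
        host1:/gluster/brick1/b4 host2:/gluster/brick1/b5 host3:/gluster/brick1/b6
    run gluster volume rebalance "$VOL" start

    # Step 4: poll until rebalance reports completed, then check the dd output.
    run gluster volume rebalance "$VOL" status
}
```

Calling `repro` with the default DRY_RUN=1 lists the six commands in order, which can be reviewed and adjusted to the real hosts before running with DRY_RUN=0.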

Actual results:
==============
I/O fails:
dd: failed to open ‘file181’: Input/output error
dd: failed to open ‘file182’: Input/output error
dd: failed to open ‘file183’: Input/output error
dd: failed to open ‘file184’: Input/output error
dd: failed to open ‘file185’: Input/output error
dd: failed to open ‘file186’: Input/output error
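
To quantify how many files are affected in a run, the dd errors can be captured and counted. A minimal sketch, assuming the stderr of the dd loop was redirected to a log file (the file name dd.err is hypothetical):

```shell
# Count the lines in a captured dd stderr log that report EIO.
# The log path is passed as the first argument.
count_eio() {
    grep -cF 'Input/output error' "$1"
}
```

For example, run the loop with `2>>dd.err` appended to the dd command and then call `count_eio dd.err`.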
 
Expected results:
=================
I/O should succeed without errors.

 
Additional info:
===============
I first hit the issue on an nfs-ganesha mount point and then confirmed that it also occurs on a glusterfs mount point, so it is not protocol-specific.

[root@dhcp35-100 ~]# gluster v status arbiter-vol
Status of volume: arbiter-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-202.lab.eng.blr.redhat.com:/gl
uster/brick1/vol1                           49161     0          Y       16173
Brick dhcp35-104.lab.eng.blr.redhat.com:/gl
uster/brick1/vol2                           49162     0          Y       15039
Brick 10.70.35.31:/gluster/brick1/vol3      49157     0          Y       29695
Brick dhcp35-202.lab.eng.blr.redhat.com:/gl
uster/brick1/vol4                           49154     0          Y       30604
Brick dhcp35-104.lab.eng.blr.redhat.com:/gl
uster/brick1/vol5                           49154     0          Y       29730
Brick 10.70.35.31:/gluster/brick1/vol6      49158     0          Y       14512
Brick dhcp35-202.lab.eng.blr.redhat.com:/gl
uster/brick1/vol7                           49155     0          Y       32646
Brick dhcp35-104.lab.eng.blr.redhat.com:/gl
uster/brick1/vol8                           49155     0          Y       31759
Brick 10.70.35.31:/gluster/brick1/vol9      49159     0          Y       16513
Self-heal Daemon on localhost               N/A       N/A        Y       931  
Self-heal Daemon on 10.70.35.104            N/A       N/A        Y       31789
Self-heal Daemon on dhcp35-202.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       32677
Self-heal Daemon on 10.70.35.31             N/A       N/A        Y       16619
 
Task Status of Volume arbiter-vol
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 2466186d-c128-4414-ae4d-3db1e028da46
Status               : completed           
 
[root@dhcp35-100 ~]# gluster v info arbiter-vol
 
Volume Name: arbiter-vol
Type: Distributed-Replicate
Volume ID: 58427682-9e72-4057-9aff-76ab05b2be20
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/vol1
Brick2: dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/vol2
Brick3: 10.70.35.31:/gluster/brick1/vol3 (arbiter)
Brick4: dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/vol4
Brick5: dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/vol5
Brick6: 10.70.35.31:/gluster/brick1/vol6 (arbiter)
Brick7: dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/vol7
Brick8: dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/vol8
Brick9: 10.70.35.31:/gluster/brick1/vol9 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
features.cache-invalidation: on
ganesha.enable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@dhcp35-100 ~]#

Comment 3 Csaba Henk 2020-11-19 09:18:11 UTC
Hi Barak, can you please take a look at it?

