Bug 1758432

Summary: Rebalance causing IO Error - File descriptor in bad state
Product: Red Hat Gluster Storage Reporter: Upasana <ubansal>
Component: distribute Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED ERRATA QA Contact: Upasana <ubansal>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.5 CC: amukherj, moagrawa, nbalacha, rhs-bugs, rkothiya, sheggodu, storage-qa-internal, vdas
Target Milestone: --- Keywords: Regression
Target Release: RHGS 3.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-6.0-18 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1758579 1761692 1761907 Environment:
Last Closed: 2019-10-30 12:23:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1761692, 1696809, 1758579, 1761907, 1761910, 1806996    
Attachments:
Description              Flags
cmvlt script             none
script logs              none
drill logs from client   none
drill logs from client   none

Description Upasana 2019-10-04 06:54:55 UTC
Created attachment 1622497 [details]
cmvlt script

Description of problem:
======================
After adding bricks and starting rebalance on a disperse volume, the cmvlt script fails with an IO Error.


Version-Release number of selected component (if applicable):
==============================================================

[root@dhcp35-146 ~]# rpm -qa|grep gluster
glusterfs-libs-6.0-15.2.git02dd9a3ad.el7rhgs.x86_64
glusterfs-cli-6.0-15.2.git02dd9a3ad.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-23.el7_7.1.x86_64
vdsm-gluster-4.30.18-1.0.el7rhgs.x86_64
glusterfs-api-6.0-15.2.git02dd9a3ad.el7rhgs.x86_64
glusterfs-fuse-6.0-15.2.git02dd9a3ad.el7rhgs.x86_64
glusterfs-geo-replication-6.0-15.2.git02dd9a3ad.el7rhgs.x86_64
glusterfs-events-6.0-15.2.git02dd9a3ad.el7rhgs.x86_64
glusterfs-rdma-6.0-15.2.git02dd9a3ad.el7rhgs.x86_64
glusterfs-client-xlators-6.0-15.2.git02dd9a3ad.el7rhgs.x86_64
python2-gluster-6.0-15.2.git02dd9a3ad.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-6.0-15.2.git02dd9a3ad.el7rhgs.x86_64
glusterfs-server-6.0-15.2.git02dd9a3ad.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
[root@dhcp35-146 ~]# 

Test build provided for bug https://bugzilla.redhat.com/show_bug.cgi?id=1756325 and https://bugzilla.redhat.com/show_bug.cgi?id=1744881


How reproducible:
==================
2/2

Steps to Reproduce:
===================
1. Mounted a disperse volume.
2. Started the cmvlt script from 2 clients (one from /mnt/EC and another from /mnt/EC/dir1).
3. Added bricks and started rebalance.
4. Rebalance completed.
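
Step 3 can be sketched as a small Python 3 wrapper around the gluster CLI. The volume name and brick paths are taken from the status output later in this report. The exact add-brick invocation used during the test is not recorded, so the brick list and the helper names (expansion_commands, run_all) are assumptions for illustration only.

```python
import subprocess

VOLUME = "vol_-2-11"  # volume name as shown in the status output

# Brick paths as they appear in the status output. Whether they were
# passed in this order (or in one call) is an assumption.
NEW_BRICKS = [
    "10.70.35.45:/gluster/brick2/addnew1",
    "10.70.35.227:/gluster/brick2/addnew2",
    "10.70.35.146:/gluster/brick2/addnew3",
    "10.70.35.129:/gluster/brick2/addnew4",
    "10.70.35.111:/gluster/brick2/addnew5",
    "10.70.35.232:/gluster/brick2/addnew6",
]


def expansion_commands(volume, bricks):
    """Return the gluster CLI calls for step 3 as argument lists."""
    return [
        ["gluster", "volume", "add-brick", volume] + list(bricks),
        ["gluster", "volume", "rebalance", volume, "start"],
        ["gluster", "volume", "rebalance", volume, "status"],
    ]


def run_all(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default so nothing is changed on a live cluster.
    run_all(expansion_commands(VOLUME, NEW_BRICKS))
```

The cmvlt script should already be running on both clients before these commands are issued, as in step 2.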

I/O failed on one client (10.70.41.186) with the error below, and then resumed.

Thread [6] starting
Thread [3] starting
Thread [3], Iteration [0] starting
Thread [6], Iteration [0] starting
Thread [3], Exception: [Errno 77] File descriptor in bad state
Thread [3], Traceback: Traceback (most recent call last):
  File "cmvlt.py", line 443, in run
    Thread.run (self)
  File "/usr/lib64/python2.7/threading.py", line 765, in run
    self.__target(*self.__args, **self.__kwargs)
  File "cmvlt.py", line 405, in main
    if not sf.verify ():
  File "cmvlt.py", line 361, in verify
    bytes = f.read (7)
IOError: [Errno 77] File descriptor in bad state

close failed in file object destructor:
IOError: [Errno 77] File descriptor in bad state
Thread [5], Iteration [0] completed
Thread [5], Iteration [1] starting
Thread [1], Iteration [0] completed
Thread [1], Iteration [1] starting
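
The traceback above shows a 7-byte read failing with errno 77 (EBADFD). As a rough illustration of how a test harness could single out this error from other I/O failures, here is a minimal Python 3 sketch. The helpers read_marker and is_bad_fd_state are hypothetical and are not taken from cmvlt.py; only the 7-byte read size comes from the traceback.

```python
import errno

# EBADFD is Linux-specific; fall back to the numeric value shown in the
# traceback (77) where the errno module does not define the name.
EBADFD = getattr(errno, "EBADFD", 77)


def is_bad_fd_state(exc):
    """Return True if the exception carries errno 77 (EBADFD)."""
    return getattr(exc, "errno", None) == EBADFD


def read_marker(path, size=7):
    """Read `size` bytes from path.

    Returns (data, None) on success and (None, "EBADFD") when the read
    or the implicit close fails with EBADFD. Other errors propagate.
    """
    try:
        with open(path, "rb") as f:
            return f.read(size), None
    except OSError as exc:
        if is_bad_fd_state(exc):
            return None, "EBADFD"
        raise
```

Catching the error this way only helps with reporting; it does not make the failure acceptable, since the expected result is that no IO error is seen at all.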



Actual results:
================
IO error observed

Expected results:
=================
IO Error should not be seen



Additional info:
===================

[root@dhcp35-129 ~]# gluster v status vol_-2-11 
Status of volume: vol_-2-11
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.45:/gluster/brick2/vol_-2-11 49154     0          Y       27052
Brick 10.70.35.227:/gluster/brick2/vol_-2-1
1                                           49154     0          Y       8721 
Brick 10.70.35.146:/gluster/brick2/vol_-2-1
1                                           49154     0          Y       24948
Brick 10.70.35.129:/gluster/brick2/vol_-2-1
1                                           49154     0          Y       18662
Brick 10.70.35.111:/gluster/brick2/vol_-2-1
1                                           49154     0          Y       16863
Brick 10.70.35.232:/gluster/brick2/vol_-2-1
1                                           49154     0          Y       17553
Brick 10.70.35.45:/gluster/brick2/addnew1   49154     0          Y       27052
Brick 10.70.35.227:/gluster/brick2/addnew2  49154     0          Y       8721 
Brick 10.70.35.146:/gluster/brick2/addnew3  49154     0          Y       24948
Brick 10.70.35.129:/gluster/brick2/addnew4  49154     0          Y       18662
Brick 10.70.35.111:/gluster/brick2/addnew5  49154     0          Y       16863
Brick 10.70.35.232:/gluster/brick2/addnew6  49154     0          Y       17553
Self-heal Daemon on localhost               N/A       N/A        Y       23604
Self-heal Daemon on 10.70.35.146            N/A       N/A        Y       29837
Self-heal Daemon on 10.70.35.227            N/A       N/A        Y       16248
Self-heal Daemon on 10.70.35.111            N/A       N/A        Y       24529
Self-heal Daemon on 10.70.35.45             N/A       N/A        Y       1484 
Self-heal Daemon on 10.70.35.232            N/A       N/A        Y       23001
 
Task Status of Volume vol_-2-11
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 84cd8bbd-8aab-4f2b-aba5-d8dc54d76e7f
Status               : completed           
 
[root@dhcp35-129 ~]# 
[root@dhcp35-129 ~]# 



Client where I/O failed: 10.70.41.186 (root-redhat)


Client logs -
[2019-10-03 10:11:44.028541] W [MSGID: 114061] [client-common.c:2871:client_pre_fstat_v2] 2-vol_-2-11-client-6:  (5a4e775b-afc4-4872-8269-10f7f9c9cf5f) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-10-03 10:11:44.028629] W [MSGID: 114061] [client-common.c:2871:client_pre_fstat_v2] 2-vol_-2-11-client-7:  (5a4e775b-afc4-4872-8269-10f7f9c9cf5f) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-10-03 10:11:44.028656] W [MSGID: 114061] [client-common.c:2871:client_pre_fstat_v2] 2-vol_-2-11-client-8:  (5a4e775b-afc4-4872-8269-10f7f9c9cf5f) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-10-03 10:11:44.028681] W [MSGID: 114061] [client-common.c:2871:client_pre_fstat_v2] 2-vol_-2-11-client-9:  (5a4e775b-afc4-4872-8269-10f7f9c9cf5f) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-10-03 10:11:44.028711] W [MSGID: 114061] [client-common.c:2871:client_pre_fstat_v2] 2-vol_-2-11-client-10:  (5a4e775b-afc4-4872-8269-10f7f9c9cf5f) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-10-03 10:11:44.028779] W [MSGID: 114061] [client-common.c:2871:client_pre_fstat_v2] 2-vol_-2-11-client-11:  (5a4e775b-afc4-4872-8269-10f7f9c9cf5f) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-10-03 10:11:44.028823] W [fuse-bridge.c:1269:fuse_attr_cbk] 0-glusterfs-fuse: 70782: FSTAT() ERR => -1 (File descriptor in bad state)
[2019-10-03 10:11:44.033748] W [MSGID: 122033] [ec-common.c:1914:ec_locked] 2-vol_-2-11-disperse-0: Failed to complete preop lock [Stale file handle]
[2019-10-03 10:11:44.038882] I [MSGID: 114024] [client-helpers.c:96:this_fd_set_ctx] 2-vol_-2-11-client-6: /DrillHoleTest.log (1f697779-794a-4f9e-b4de-56310b23b9e3): trying duplicate remote fd set. 
[2019-10-03 10:11:44.039147] I [MSGID: 114024] [client-helpers.c:96:this_fd_set_ctx] 2-vol_-2-11-client-7: /DrillHoleTest.log (1f697779-794a-4f9e-b4de-56310b23b9e3): trying duplicate remote fd set. 
[2019-10-03 10:11:44.039200] I [MSGID: 114024] [client-helpers.c:96:this_fd_set_ctx] 2-vol_-2-11-client-9: /DrillHoleTest.log (1f697779-794a-4f9e-b4de-56310b23b9e3): trying duplicate remote fd set. 
[2019-10-03 10:11:44.039245] I [MSGID: 114024] [client-helpers.c:96:this_fd_set_ctx] 2-vol_-2-11-client-8: /DrillHoleTest.log (1f697779-794a-4f9e-b4de-56310b23b9e3): trying duplicate remote fd set. 
[2019-10-03 10:11:44.039277] I [MSGID: 114024] [client-helpers.c:96:this_fd_set_ctx] 2-vol_-2-11-client-10: /DrillHoleTest.log (1f697779-794a-4f9e-b4de-56310b23b9e3): trying duplicate remote fd set. 
[2019-10-03 10:11:44.039315] I [MSGID: 114024] [client-helpers.c:96:this_fd_set_ctx] 2-vol_-2-11-client-11: /DrillHoleTest.log (1f697779-794a-4f9e-b4de-56310b23b9e3): trying duplicate remote fd set. 
[2019-10-03 10:11:44.375281] W [MSGID: 114061] [client-common.c:2625:client_pre_flush_v2] 2-vol_-2-11-client-6:  (5a4e775b-afc4-4872-8269-10f7f9c9cf5f) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-10-03 10:11:44.375331] W [MSGID: 114061] [client-common.c:2625:client_pre_flush_v2] 2-vol_-2-11-client-7:  (5a4e775b-afc4-4872-8269-10f7f9c9cf5f) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-10-03 10:11:44.375359] W [MSGID: 114061] [client-common.c:2625:client_pre_flush_v2] 2-vol_-2-11-client-8:  (5a4e775b-afc4-4872-8269-10f7f9c9cf5f) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-10-03 10:11:44.375384] W [MSGID: 114061] [client-common.c:2625:client_pre_flush_v2] 2-vol_-2-11-client-9:  (5a4e775b-afc4-4872-8269-10f7f9c9cf5f) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-10-03 10:11:44.375407] W [MSGID: 114061] [client-common.c:2625:client_pre_flush_v2] 2-vol_-2-11-client-10:  (5a4e775b-afc4-4872-8269-10f7f9c9cf5f) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-10-03 10:11:44.375431] W [MSGID: 114061] [client-common.c:2625:client_pre_flush_v2] 2-vol_-2-11-client-11:  (5a4e775b-afc4-4872-8269-10f7f9c9cf5f) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-10-03 10:11:44.375471] W [fuse-bridge.c:1826:fuse_err_cbk] 0-glusterfs-fuse: 70811: FLUSH() ERR => -1 (File descriptor in bad state)
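
To see at a glance which client translators and file operations reported "remote_fd is -1" in a log like the one above, the warnings can be tallied with a short Python 3 sketch. The regular expression is written against the line layout shown in this report only, and the function name count_bad_fd_warnings is an assumption for illustration.

```python
import re
import sys
from collections import Counter

# Matches "[file.c:LINE:function] translator-name: ... remote_fd is -1"
# as laid out in the client log lines quoted in this report.
PATTERN = re.compile(
    r"\[[^\]]*:\d+:(?P<func>\w+)\]\s+(?P<xlator>\S+):.*remote_fd is -1"
)


def count_bad_fd_warnings(lines):
    """Count 'remote_fd is -1' warnings per (function, translator)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("func"), match.group("xlator"))] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_bad_fd.py <path-to-client-log>
    with open(sys.argv[1]) as log:
        for (func, xlator), n in sorted(count_bad_fd_warnings(log).items()):
            print(func, xlator, n)
```

Run against the excerpt above, this would list client-6 through client-11 once each for client_pre_fstat_v2 and once each for client_pre_flush_v2.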




Server details - 10.70.35.45(root-redhat)
Client details - 10.70.41.186 & 10.70.43.113

Comment 14 Upasana 2019-10-10 07:53:02 UTC
Created attachment 1624187 [details]
script logs

Comment 22 Upasana 2019-10-11 10:12:19 UTC
Created attachment 1624678 [details]
drill logs from client

Comment 23 Upasana 2019-10-11 10:12:53 UTC
Created attachment 1624679 [details]
drill logs from client

Comment 31 errata-xmlrpc 2019-10-30 12:23:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249