Bug 1225566

Summary: [geo-rep]: Traceback "ValueError: filedescriptor out of range in select()" observed while creating huge set of data on master
Product: [Community] GlusterFS
Reporter: Aravinda VK <avishwan>
Component: geo-replication
Assignee: Aravinda VK <avishwan>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: mainline
CC: avishwan, bugs, chrisw, csaba, nlevinki, rhinduja, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1224928
: 1225567 (view as bug list)
Environment:
Last Closed: 2016-06-16 13:05:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1223636, 1225567

Description Aravinda VK 2015-05-27 16:47:00 UTC
+++ This bug was initially created as a clone of Bug #1224928 +++

Description of problem:
=======================

While creating a huge set of data on the master volume (FUSE and NFS mounts), the following traceback was observed:

[2015-05-26 13:54:30.792651] I [syncdutils(/rhs/brick1/b1):220:finalize] <top>: exiting.
[2015-05-26 13:54:30.796233] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-05-26 13:54:30.796593] I [syncdutils(agent):220:finalize] <top>: exiting.
[2015-05-26 13:54:31.320] E [syncdutils(/rhs/brick2/b2):276:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 306, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1518, in syncjob
    po = self.sync_engine(pb)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1573, in rsync
    *(gconf.rsync_ssh_options.split() + [self.slaveurl]))
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 55, in sup
    sys._getframe(1).f_code.co_name)(*a, **kw)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 945, in rsync
    po.terminate_geterr(fail_on_err=False)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 259, in terminate_geterr
    if not select([self.stderr], [], [], 0.1)[0]:
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 373, in select
    return eintr_wrap(oselect.select, oselect.error, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 365, in eintr_wrap
    return func(*a)
ValueError: filedescriptor out of range in select()
[2015-05-26 13:54:31.2243] I [syncdutils(/rhs/brick2/b2):220:finalize] <top>: exiting.
[2015-05-26 13:54:31.13803] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-05-26 13:54:31.14297] I [syncdutils(agent):220:finalize] <top>: exiting.


Nothing specific was in progress apart from the data creation. Before this test, the ignore_deletes test mentioned in bug 1224906 was performed.
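
The error itself comes from select(2)'s fixed-size fd_set: Python's select.select() rejects any descriptor numbered FD_SETSIZE (typically 1024) or higher, which is what eventually happens once a process has leaked enough fds. A minimal standalone sketch (not geo-rep code; it assumes a Linux host with /dev/null) that reproduces the same ValueError:

    import os
    import resource
    import select

    # Raise the soft fd limit so that more than 1024 descriptors can be opened.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target = 2048 if hard == resource.RLIM_INFINITY else min(2048, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))

    # Open enough descriptors that the last one is numbered >= 1024.
    fds = [os.open("/dev/null", os.O_RDONLY) for _ in range(1100)]
    try:
        select.select([fds[-1]], [], [], 0.1)
    except ValueError as err:
        print(err)  # filedescriptor out of range in select()
    finally:
        for fd in fds:
            os.close(fd)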



Version-Release number of selected component (if applicable):
=============================================================



Steps carried out:
1. Create the master and slave clusters
2. Create and start the master, meta, and slave volumes
3. Mount the volume over FUSE and NFS on the clients
4. Create a huge set of data (copy /etc multiple times over the FUSE and NFS clients)
5. Perform the basic test of the ignore_deletes option and hit bug 1224906
6. Set ignore_deletes to false
7. Start creating a huge set of data from the master (copy /etc multiple times over the FUSE and NFS clients)
8. Observe this traceback (the fd-watcher sketch after this list can be used to confirm the leak while step 7 runs)
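
To watch the leak while step 7 is running, the worker's open-fd count can be polled from /proc; this is just a Python equivalent of the "watch ls /proc/$pid/fd" suggestion in the fix's commit message (comment 7), with the worker PID found via "ps -ax | grep feedback-fd":

    import os
    import sys
    import time

    # Poll /proc/<pid>/fd of the geo-rep worker and print the open-fd count.
    # Usage (illustrative): python fdwatch.py <worker-pid>
    pid = int(sys.argv[1])
    while True:
        try:
            count = len(os.listdir("/proc/%d/fd" % pid))
        except OSError:
            print("worker %d exited" % pid)
            break
        print("open fds: %d" % count)
        time.sleep(5)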

Comment 1 Vijay Bellur 2015-10-11 15:27:39 UTC
REVIEW: http://review.gluster.org/12332 (geo-rep: Fix FD leak from Active Geo-rep worker) posted (#1) for review on master by Aravinda VK (avishwan)

Comment 2 Vijay Bellur 2015-10-14 06:21:35 UTC
REVIEW: http://review.gluster.org/12320 (geo-rep: Handle fd leak during rsync/tarssh) posted (#2) for review on master by Aravinda VK (avishwan)

Comment 3 Vijay Bellur 2015-11-02 11:45:57 UTC
REVIEW: http://review.gluster.org/12332 (geo-rep: Fix FD leak from Active Geo-rep worker) posted (#2) for review on master by Aravinda VK (avishwan)

Comment 4 Vijay Bellur 2015-11-03 09:03:27 UTC
REVIEW: http://review.gluster.org/12332 (geo-rep: Fix FD leak from Active Geo-rep worker) posted (#3) for review on master by Venky Shankar (vshankar)

Comment 5 Vijay Bellur 2015-11-16 14:37:26 UTC
REVIEW: http://review.gluster.org/12332 (geo-rep: Fix FD leak from Active Geo-rep worker) posted (#9) for review on master by Aravinda VK (avishwan)

Comment 6 Vijay Bellur 2015-11-17 05:41:53 UTC
REVIEW: http://review.gluster.org/12332 (geo-rep: Fix FD leak from Active Geo-rep worker) posted (#10) for review on master by Aravinda VK (avishwan)

Comment 7 Vijay Bellur 2015-11-19 04:03:49 UTC
COMMIT: http://review.gluster.org/12332 committed in master by Venky Shankar (vshankar) 
------
commit 42def948ac8e5d24278cb000cc8f8906b83a8592
Author: Aravinda VK <avishwan>
Date:   Sun Oct 11 20:26:16 2015 +0530

    geo-rep: Fix FD leak from Active Geo-rep worker
    
    Active worker tries to acquire the lock in each iteration. On every successful
    lock acquisition it was not closing the previously opened lock fd.
    
    To see the leak, get the PID of worker,
        ps -ax | grep feedback-fd
        watch ls /proc/$pid/fd
    
    BUG: 1225566
    Change-Id: Ic476c24c306e7ab372c5560fbb80ef39f4fb31af
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: http://review.gluster.org/12332
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Milind Changire <mchangir>
    Reviewed-by: Saravanakumar Arumugam <sarumuga>
    Reviewed-by: Venky Shankar <vshankar>
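
The pattern described in the commit message, and one way to avoid it, can be sketched as follows; the names (LOCK_PATH, take_mgmt_lock) are illustrative and this is not the actual syncdaemon change. The buggy loop opened the lock file on every iteration without closing the previously opened descriptor, so the worker's fd numbers kept climbing until select() hit its limit; reusing a single descriptor keeps the count in /proc/<pid>/fd flat.

    import fcntl
    import os

    LOCK_PATH = "/tmp/georep-demo.lock"  # hypothetical path, demo only
    _lock_fd = None                      # one descriptor, reused across iterations


    def take_mgmt_lock():
        """Try to take the exclusive lock without leaking one fd per attempt."""
        global _lock_fd
        if _lock_fd is None:
            _lock_fd = os.open(LOCK_PATH, os.O_CREAT | os.O_RDWR)
        try:
            fcntl.lockf(_lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True       # this worker is (still) the Active one
        except (IOError, OSError):
            return False      # another worker holds the lock


    # Each monitoring iteration re-checks the lock; the fd count stays constant.
    for _ in range(5):
        take_mgmt_lock()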

Comment 8 Niels de Vos 2016-06-16 13:05:25 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user