Bug 1225567 - [geo-rep]: Traceback "ValueError: filedescriptor out of range in select()" observed while creating huge set of data on master
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.7.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Aravinda VK
QA Contact:
URL:
Whiteboard:
Depends On: 1225566
Blocks: glusterfs-3.7.7
 
Reported: 2015-05-27 16:47 UTC by Aravinda VK
Modified: 2016-04-19 07:46 UTC
CC: 8 users

Fixed In Version: glusterfs-3.7.7
Doc Type: Bug Fix
Doc Text:
Clone Of: 1225566
Environment:
Last Closed: 2016-02-15 06:26:03 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Aravinda VK 2015-05-27 16:47:46 UTC
+++ This bug was initially created as a clone of Bug #1225566 +++

+++ This bug was initially created as a clone of Bug #1224928 +++

Description of problem:
=======================

While creating a huge set of data on the master volume (over FUSE and NFS mounts), the following traceback was observed:

[2015-05-26 13:54:30.792651] I [syncdutils(/rhs/brick1/b1):220:finalize] <top>: exiting.
[2015-05-26 13:54:30.796233] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-05-26 13:54:30.796593] I [syncdutils(agent):220:finalize] <top>: exiting.
[2015-05-26 13:54:31.320] E [syncdutils(/rhs/brick2/b2):276:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 306, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1518, in syncjob
    po = self.sync_engine(pb)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1573, in rsync
    *(gconf.rsync_ssh_options.split() + [self.slaveurl]))
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 55, in sup
    sys._getframe(1).f_code.co_name)(*a, **kw)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 945, in rsync
    po.terminate_geterr(fail_on_err=False)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 259, in terminate_geterr
    if not select([self.stderr], [], [], 0.1)[0]:
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 373, in select
    return eintr_wrap(oselect.select, oselect.error, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 365, in eintr_wrap
    return func(*a)
ValueError: filedescriptor out of range in select()
[2015-05-26 13:54:31.2243] I [syncdutils(/rhs/brick2/b2):220:finalize] <top>: exiting.
[2015-05-26 13:54:31.13803] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-05-26 13:54:31.14297] I [syncdutils(agent):220:finalize] <top>: exiting.
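For context (this explanation is not part of the original report): `select(2)` can only watch file descriptors numbered below FD_SETSIZE (1024 on Linux), so once leaked descriptors push a worker's fd numbers past that limit, Python's `select.select()` raises exactly this ValueError before even issuing the syscall. A minimal sketch reproducing the error:

```python
import os
import select

# Duplicate stdin onto a descriptor number above FD_SETSIZE (1024 on
# Linux); select() then rejects it with ValueError, the same error
# seen in the traceback above.
os.dup2(0, 2000)
try:
    select.select([2000], [], [], 0)
except ValueError as e:
    print(e)  # "filedescriptor out of range in select()" on CPython/Linux
finally:
    os.close(2000)
```

This is why a descriptor leak, rather than select() itself, is the root cause here: the error appears only after the leak has consumed enough fd numbers to cross the 1024 boundary.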


Nothing specific was running except the data creation. Before this test, the ignore_deletes test described in bug id 1224906 was performed.



Version-Release number of selected component (if applicable):
=============================================================



Steps Carried:
1. Create master and slave clusters
2. Create and start the master, meta, and slave volumes
3. Mount the volume (FUSE and NFS) on clients
4. Create a huge set of data (copy /etc multiple times over the FUSE and NFS clients)
5. Perform a basic test of the ignore_deletes option; hit bug id 1224906
6. Set ignore_deletes to false
7. Start creating a huge set of data from the master (copy /etc multiple times over the FUSE and NFS clients)
8. Observe this traceback

Comment 1 Niels de Vos 2015-06-02 08:20:21 UTC
The required changes to fix this bug have not made it into glusterfs-3.7.1. This bug is now getting tracked for glusterfs-3.7.2.

Comment 2 Niels de Vos 2015-06-20 10:08:39 UTC
Unfortunately glusterfs-3.7.2 did not contain a code change that was associated with this bug report. This bug is now proposed to be a blocker for glusterfs-3.7.3.

Comment 3 Kaushal 2015-07-30 13:17:45 UTC
This bug could not be fixed in time for glusterfs-3.7.3. This is now being tracked for being fixed in glusterfs-3.7.4.

Comment 4 Kaushal 2015-10-28 12:28:32 UTC
This bug could not be fixed in time for glusterfs-3.7.4 or glusterfs-3.7.5. This is now being tracked for being fixed in glusterfs-3.7.6.

Comment 5 Raghavendra Talur 2015-11-08 20:24:52 UTC
This bug could not be fixed in time for glusterfs-3.7.6.
This is now being tracked for being fixed in glusterfs-3.7.7.

Comment 6 Vijay Bellur 2015-11-19 04:41:33 UTC
REVIEW: http://review.gluster.org/12650 (geo-rep: Fix FD leak from Active Geo-rep worker) posted (#1) for review on release-3.7 by Aravinda VK (avishwan@redhat.com)

Comment 7 Vijay Bellur 2015-11-21 14:20:52 UTC
REVIEW: http://review.gluster.org/12650 (geo-rep: Fix FD leak from Active Geo-rep worker) posted (#2) for review on release-3.7 by Aravinda VK (avishwan@redhat.com)

Comment 8 Vijay Bellur 2015-12-02 07:13:09 UTC
COMMIT: http://review.gluster.org/12650 committed in release-3.7 by Venky Shankar (vshankar@redhat.com) 
------
commit 58539176e0152fdb09f093d0cdd1cfc7840a5a4f
Author: Aravinda VK <avishwan@redhat.com>
Date:   Sun Oct 11 20:26:16 2015 +0530

    geo-rep: Fix FD leak from Active Geo-rep worker
    
    Active worker tries to acquire lock in each iteration. On every successful
    lock acquisition it was not closing the previously opened lock fd.
    
    To see the leak, get the PID of worker,
        ps -ax | grep feedback-fd
        watch ls /proc/$pid/fd
    
    BUG: 1225567
    Change-Id: Ic476c24c306e7ab372c5560fbb80ef39f4fb31af
    Signed-off-by: Aravinda VK <avishwan@redhat.com>
    Reviewed-on: http://review.gluster.org/12332
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Milind Changire <mchangir@redhat.com>
    Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com>
    Reviewed-by: Venky Shankar <vshankar@redhat.com>
     (cherry picked from commit 42def948ac8e5d24278cb000cc8f8906b83a8592)
    Reviewed-on: http://review.gluster.org/12650
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
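A hedged sketch of the leak pattern the patch addresses (names here are illustrative, not the actual syncdaemon code): the active worker re-acquired its lock on every iteration but never closed the fd from the previous acquisition, leaking one descriptor per iteration until select() hit its FD_SETSIZE limit. The fix is to close the old fd before taking the lock again:

```python
import fcntl
import os
import tempfile

def acquire_lock(path, prev_fd=None):
    """Take an exclusive flock on `path`.

    Closing the previously held fd first mirrors the fix; without
    that close, every call leaks one descriptor (the reported bug).
    """
    if prev_fd is not None:
        os.close(prev_fd)  # releases the old lock and frees its fd
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    return fd

lock_path = os.path.join(tempfile.mkdtemp(), "worker.lock")
fd = acquire_lock(lock_path)
for _ in range(5):  # each pass stands in for one worker iteration
    fd = acquire_lock(lock_path, fd)
# Because old fds are closed, the kernel reuses the same descriptor
# number instead of the open-fd count growing by one per iteration,
# which is what `watch ls /proc/$pid/fd` would have shown.
os.close(fd)
```

Note that closing the previous fd also releases its flock, so the fresh LOCK_EX request never conflicts with a lock the same process still holds.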

Comment 9 Kaushal 2016-04-19 07:46:47 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.7, please open a new bug report.

glusterfs-3.7.7 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-February/025292.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

