Bug 1225567 - [geo-rep]: Traceback "ValueError: filedescriptor out of range in select()" observed while creating huge set of data on master
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.7.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Aravinda VK
QA Contact:
URL:
Whiteboard:
Depends On: 1225566
Blocks: glusterfs-3.7.7
 
Reported: 2015-05-27 16:47 UTC by Aravinda VK
Modified: 2016-04-19 07:46 UTC
CC: 8 users

Fixed In Version: glusterfs-3.7.7
Doc Type: Bug Fix
Doc Text:
Clone Of: 1225566
Environment:
Last Closed: 2016-02-15 06:26:03 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Aravinda VK 2015-05-27 16:47:46 UTC
+++ This bug was initially created as a clone of Bug #1225566 +++

+++ This bug was initially created as a clone of Bug #1224928 +++

Description of problem:
=======================

While creating a huge set of data on the master volume (over FUSE and NFS mounts), the following traceback was observed:

[2015-05-26 13:54:30.792651] I [syncdutils(/rhs/brick1/b1):220:finalize] <top>: exiting.
[2015-05-26 13:54:30.796233] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-05-26 13:54:30.796593] I [syncdutils(agent):220:finalize] <top>: exiting.
[2015-05-26 13:54:31.320] E [syncdutils(/rhs/brick2/b2):276:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 306, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1518, in syncjob
    po = self.sync_engine(pb)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1573, in rsync
    *(gconf.rsync_ssh_options.split() + [self.slaveurl]))
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 55, in sup
    sys._getframe(1).f_code.co_name)(*a, **kw)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 945, in rsync
    po.terminate_geterr(fail_on_err=False)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 259, in terminate_geterr
    if not select([self.stderr], [], [], 0.1)[0]:
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 373, in select
    return eintr_wrap(oselect.select, oselect.error, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 365, in eintr_wrap
    return func(*a)
ValueError: filedescriptor out of range in select()
[2015-05-26 13:54:31.2243] I [syncdutils(/rhs/brick2/b2):220:finalize] <top>: exiting.
[2015-05-26 13:54:31.13803] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-05-26 13:54:31.14297] I [syncdutils(agent):220:finalize] <top>: exiting.
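For context (this explanation is not part of the original report): `select(2)` can only watch file descriptors numbered below FD_SETSIZE (1024 on Linux), so once leaked descriptors push a worker's fd numbers past that limit, Python's `select.select()` raises exactly this ValueError before even issuing the syscall. A minimal sketch reproducing the error:

```python
import os
import select

# Duplicate stdin onto a descriptor number above FD_SETSIZE (1024 on
# Linux); select() then rejects it with ValueError, the same error
# seen in the traceback above.
os.dup2(0, 2000)
try:
    select.select([2000], [], [], 0)
except ValueError as e:
    print(e)  # "filedescriptor out of range in select()" on CPython/Linux
finally:
    os.close(2000)
```

This is why a descriptor leak, rather than select() itself, is the root cause here: the error appears only after the leak has consumed enough fd numbers to cross the 1024 boundary.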


Nothing specific was running except the data creation. Before this test, the ignore_deletes test described in bug id 1224906 was performed.



Version-Release number of selected component (if applicable):
=============================================================



Steps Carried:
1. Create master and slave clusters
2. Create and start the master, meta, and slave volumes
3. Mount the volume (FUSE and NFS) on clients
4. Create a huge set of data (copy /etc multiple times over the FUSE and NFS clients)
5. Perform a basic test of the ignore_deletes option; hit bug id 1224906
6. Set ignore_deletes to false
7. Start creating a huge set of data from the master (copy /etc multiple times over the FUSE and NFS clients)
8. Observe this traceback

Comment 1 Niels de Vos 2015-06-02 08:20:21 UTC
The required changes to fix this bug have not made it into glusterfs-3.7.1. This bug is now getting tracked for glusterfs-3.7.2.

Comment 2 Niels de Vos 2015-06-20 10:08:39 UTC
Unfortunately glusterfs-3.7.2 did not contain a code change that was associated with this bug report. This bug is now proposed to be a blocker for glusterfs-3.7.3.

Comment 3 Kaushal 2015-07-30 13:17:45 UTC
This bug could not be fixed in time for glusterfs-3.7.3. This is now being tracked for being fixed in glusterfs-3.7.4.

Comment 4 Kaushal 2015-10-28 12:28:32 UTC
This bug could not be fixed in time for glusterfs-3.7.4 or glusterfs-3.7.5. This is now being tracked for being fixed in glusterfs-3.7.6.

Comment 5 Raghavendra Talur 2015-11-08 20:24:52 UTC
This bug could not be fixed in time for glusterfs-3.7.6.
This is now being tracked for being fixed in glusterfs-3.7.7.

Comment 6 Vijay Bellur 2015-11-19 04:41:33 UTC
REVIEW: http://review.gluster.org/12650 (geo-rep: Fix FD leak from Active Geo-rep worker) posted (#1) for review on release-3.7 by Aravinda VK (avishwan@redhat.com)

Comment 7 Vijay Bellur 2015-11-21 14:20:52 UTC
REVIEW: http://review.gluster.org/12650 (geo-rep: Fix FD leak from Active Geo-rep worker) posted (#2) for review on release-3.7 by Aravinda VK (avishwan@redhat.com)

Comment 8 Vijay Bellur 2015-12-02 07:13:09 UTC
COMMIT: http://review.gluster.org/12650 committed in release-3.7 by Venky Shankar (vshankar@redhat.com) 
------
commit 58539176e0152fdb09f093d0cdd1cfc7840a5a4f
Author: Aravinda VK <avishwan@redhat.com>
Date:   Sun Oct 11 20:26:16 2015 +0530

    geo-rep: Fix FD leak from Active Geo-rep worker
    
    Active worker tries to acquire lock in each iteration. On every successful
    lock acquisition it was not closing the previously opened lock fd.
    
    To see the leak, get the PID of worker,
        ps -ax | grep feedback-fd
        watch ls /proc/$pid/fd
    
    BUG: 1225567
    Change-Id: Ic476c24c306e7ab372c5560fbb80ef39f4fb31af
    Signed-off-by: Aravinda VK <avishwan@redhat.com>
    Reviewed-on: http://review.gluster.org/12332
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Milind Changire <mchangir@redhat.com>
    Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com>
    Reviewed-by: Venky Shankar <vshankar@redhat.com>
     (cherry picked from commit 42def948ac8e5d24278cb000cc8f8906b83a8592)
    Reviewed-on: http://review.gluster.org/12650
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
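A hedged sketch of the leak pattern the patch addresses (names here are illustrative, not the actual syncdaemon code): the active worker re-acquired its lock on every iteration but never closed the fd from the previous acquisition, leaking one descriptor per iteration until select() hit its FD_SETSIZE limit. The fix is to close the old fd before taking the lock again:

```python
import fcntl
import os
import tempfile

def acquire_lock(path, prev_fd=None):
    """Take an exclusive flock on `path`.

    Closing the previously held fd first mirrors the fix; without
    that close, every call leaks one descriptor (the reported bug).
    """
    if prev_fd is not None:
        os.close(prev_fd)  # releases the old lock and frees its fd
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    return fd

lock_path = os.path.join(tempfile.mkdtemp(), "worker.lock")
fd = acquire_lock(lock_path)
for _ in range(5):  # each pass stands in for one worker iteration
    fd = acquire_lock(lock_path, fd)
# Because old fds are closed, the kernel reuses the same descriptor
# number instead of the open-fd count growing by one per iteration,
# which is what `watch ls /proc/$pid/fd` would have shown.
os.close(fd)
```

Note that closing the previous fd also releases its flock, so the fresh LOCK_EX request never conflicts with a lock the same process still holds.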

Comment 9 Kaushal 2016-04-19 07:46:47 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.7, please open a new bug report.

glusterfs-3.7.7 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-February/025292.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

