Bug 1238116 - Gluster-swift object server leaks fds in failure cases (when exceptions are raised)
Summary: Gluster-swift object server leaks fds in failure cases (when exceptions are r...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-swift
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
high
high
Target Milestone: ---
: RHGS 3.1.1
Assignee: Prashanth Pai
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: 1251815
TreeView+ depends on / blocked
 
Reported: 2015-07-01 08:58 UTC by Prashanth Pai
Modified: 2019-10-10 09:54 UTC (History)
6 users (show)

Fixed In Version: swiftonfile-1.13.1-3
Doc Type: Bug Fix
Doc Text:
Previously, when there was an error and exception was raised, in some cases, open file descriptors were not closed. This resulted in file descriptors being leaked. With this fix, exceptions raised are caught first, file descriptor are closed and then the original exception caught is re-raised.
Clone Of:
Environment:
Last Closed: 2015-10-05 07:17:14 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:1845 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.1 update 2015-10-05 11:06:22 UTC

Description Prashanth Pai 2015-07-01 08:58:13 UTC
Description of problem:
The object-server process leaks file descriptors (does not close the fd) in certain cases (negative path - the ones that causes DiskFile.open() to raise some exception). Examples of such case:
1. Deleting an expired object.
2. GET on an expired object.

Other cases where there is a race will also raise an exception, resulting fd being leaked. Example:
1. Client-1 does a GET on AUTH_test/c1/o1
   Internally this results in open(), stat() and getxattr() in that order.
2. If Client-2 does a DELETE on AUTH_test/c1/o1 after say, open() in the example above has completed, stat() and getxattr() will raise exception, thus leaving the fd open.

Version-Release number of selected component (if applicable):
RHS 3.1u4
gluster-swift version 1.13.1

Steps to reproduce and see the fd leak:

Watch the object server process for open fds used by it (in another terminal):

watch -n 0.5 ls -l /proc/<pid>/fd
strace -ff -s 80 -e open,close -p <pid>

While the above monitors have been set up, invoke curl command equivalent of the following:

1. PUT an object with `X-Delete-After` set to say 3 seconds
2. Wait for 3 seconds for the object to be expired
3. Do a GET on the expired object to get a `404`.

On the watch that you've set up, you can see that this fd is never closed.

Actual results:
file descriptors are left open and not closed.

Expected results:
file descriptors to be properly closed.

Additional info:
****************
The way Python's with statement works:

with <expression> as X:
    ....

Now, when the "with" statement is executed, Python evaluates the <expression>, calls the __enter__ method on the resulting value (which is called a "context guard"), and assigns whatever __enter__ returns to the variable given by as. Python will then execute the code body, and no matter what happens in that code, call the guard object’s __exit__ method.

The problem
The with statement makes sure __exit__() is called if any exception is raised by code within the with block. BUT, if the <expression> itself raises exception, both __enter__() and __exit__() is never called!
Refer: https://stackoverflow.com/questions/30909463/what-happens-to-exceptions-raised-in-a-with-statement-expression

In gluster-swift...

with diskfile.open():
    ....

In cases where diskfile.open() raises an exception such as DiskFileExpired, __enter__() and __exit()__ is never called. So fd is not closed.

Why does this not affect vanilla/plain Swift ?
I'm guessing (I could be wrong): We use the raw fd directly where as Swift's diskfile uses the File object which when garbage collected, closes the fd.

Comment 5 Prashanth Pai 2015-08-18 06:51:14 UTC
Upstream fix: https://review.openstack.org/#/c/211866/

Comment 6 Prashanth Pai 2015-08-19 04:38:44 UTC
Fix has been merged upstream.

Comment 7 SATHEESARAN 2015-09-02 16:45:37 UTC
Tested with glusterfs-3.7.1-13.el7rhgs and swiftonfile-1.13.1-4.el7rhgs.noarch with the steps as mentioned in comment0

There are no fd leaks, and open() + close() was observed when issuing GET on the expired object

Comment 8 Prashanth Pai 2015-09-28 09:07:03 UTC
Doc text with some minor corrections:

"Previously, when there was an error and exception was raised, in some cases, open file descriptors were not closed. This resulted in file descriptors being leaked. With this fix, exceptions raised are caught first, file descriptor are closed and then the original exception caught is re-raised."

Comment 9 Prashanth Pai 2015-09-28 09:33:29 UTC
Thanks Divya. Looks good to me.

Comment 11 errata-xmlrpc 2015-10-05 07:17:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html


Note You need to log in before you can comment on or make changes to this bug.