Bug 1238116

Summary: Gluster-swift object server leaks fds in failure cases (when exceptions are raised)
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Prashanth Pai <ppai>
Component: gluster-swiftAssignee: Prashanth Pai <ppai>
Status: CLOSED ERRATA QA Contact: SATHEESARAN <sasundar>
Severity: high Docs Contact:
Priority: high    
Version: rhgs-3.0CC: asrivast, divya, nlevinki, rhs-bugs, thiago, vagarwal
Target Milestone: ---Keywords: ZStream
Target Release: RHGS 3.1.1   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: swiftonfile-1.13.1-3 Doc Type: Bug Fix
Doc Text:
Previously, when there was an error and exception was raised, in some cases, open file descriptors were not closed. This resulted in file descriptors being leaked. With this fix, exceptions raised are caught first, file descriptor are closed and then the original exception caught is re-raised.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-05 07:17:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1251815    

Description Prashanth Pai 2015-07-01 08:58:13 UTC
Description of problem:
The object-server process leaks file descriptors (does not close the fd) in certain cases (negative path - the ones that causes DiskFile.open() to raise some exception). Examples of such case:
1. Deleting an expired object.
2. GET on an expired object.

Other cases where there is a race will also raise an exception, resulting fd being leaked. Example:
1. Client-1 does a GET on AUTH_test/c1/o1
   Internally this results in open(), stat() and getxattr() in that order.
2. If Client-2 does a DELETE on AUTH_test/c1/o1 after say, open() in the example above has completed, stat() and getxattr() will raise exception, thus leaving the fd open.

Version-Release number of selected component (if applicable):
RHS 3.1u4
gluster-swift version 1.13.1

Steps to reproduce and see the fd leak:

Watch the object server process for open fds used by it (in another terminal):

watch -n 0.5 ls -l /proc/<pid>/fd
strace -ff -s 80 -e open,close -p <pid>

While the above monitors have been set up, invoke curl command equivalent of the following:

1. PUT an object with `X-Delete-After` set to say 3 seconds
2. Wait for 3 seconds for the object to be expired
3. Do a GET on the expired object to get a `404`.

On the watch that you've set up, you can see that this fd is never closed.

Actual results:
file descriptors are left open and not closed.

Expected results:
file descriptors to be properly closed.

Additional info:
****************
The way Python's with statement works:

with <expression> as X:
    ....

Now, when the "with" statement is executed, Python evaluates the <expression>, calls the __enter__ method on the resulting value (which is called a "context guard"), and assigns whatever __enter__ returns to the variable given by as. Python will then execute the code body, and no matter what happens in that code, call the guard object’s __exit__ method.

The problem
The with statement makes sure __exit__() is called if any exception is raised by code within the with block. BUT, if the <expression> itself raises exception, both __enter__() and __exit__() is never called!
Refer: https://stackoverflow.com/questions/30909463/what-happens-to-exceptions-raised-in-a-with-statement-expression

In gluster-swift...

with diskfile.open():
    ....

In cases where diskfile.open() raises an exception such as DiskFileExpired, __enter__() and __exit()__ is never called. So fd is not closed.

Why does this not affect vanilla/plain Swift ?
I'm guessing (I could be wrong): We use the raw fd directly where as Swift's diskfile uses the File object which when garbage collected, closes the fd.

Comment 5 Prashanth Pai 2015-08-18 06:51:14 UTC
Upstream fix: https://review.openstack.org/#/c/211866/

Comment 6 Prashanth Pai 2015-08-19 04:38:44 UTC
Fix has been merged upstream.

Comment 7 SATHEESARAN 2015-09-02 16:45:37 UTC
Tested with glusterfs-3.7.1-13.el7rhgs and swiftonfile-1.13.1-4.el7rhgs.noarch with the steps as mentioned in comment0

There are no fd leaks, and open() + close() was observed when issuing GET on the expired object

Comment 8 Prashanth Pai 2015-09-28 09:07:03 UTC
Doc text with some minor corrections:

"Previously, when there was an error and exception was raised, in some cases, open file descriptors were not closed. This resulted in file descriptors being leaked. With this fix, exceptions raised are caught first, file descriptor are closed and then the original exception caught is re-raised."

Comment 9 Prashanth Pai 2015-09-28 09:33:29 UTC
Thanks Divya. Looks good to me.

Comment 11 errata-xmlrpc 2015-10-05 07:17:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html