Bug 1246694

Summary: Log::reopen_log_file() must take the flusher lock to avoid closing an fd ::_flush() is still using
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Samuel Just <sjust>
Component: RADOS
Assignee: Samuel Just <sjust>
Status: CLOSED ERRATA
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: high
Docs Contact:
Priority: unspecified
Version: 1.3.0
CC: ceph-eng-bugs, dzafman, flucifre, kchai, kdreyer, tmuthami, vumrao
Target Milestone: rc
Target Release: 1.3.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-0.94.1-16.el7cp
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1247752 (view as bug list) Environment:
Last Closed: 2015-07-31 12:54:15 UTC
Type: Bug

Description Samuel Just 2015-07-24 22:53:38 UTC
Description of problem:

Log::reopen_log_file() does not take the flusher lock, so _flush() might continue writing to m_fd after reopen_log_file() has closed it.  This could result in lost log entries.  Worse, because a closed fd number can be handed out again by the next open(), the log could be written to an fd which has since been opened by the filestore.  That latter case could cause data corruption.
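
A minimal sketch of the locking pattern described above, assuming POSIX file descriptors and a std::mutex standing in for the flusher lock. SketchLog, flush_mutex_, path_, and fd_ are hypothetical names for illustration only; this is not the Ceph Log class or the actual patch. The point is just that the flush path and the reopen path take the same lock, so the descriptor cannot be closed in the middle of a write.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a logger; not the actual Ceph implementation.
class SketchLog {
  std::mutex flush_mutex_;  // plays the role of the flusher lock
  std::string path_;
  int fd_ = -1;

public:
  explicit SketchLog(std::string path) : path_(std::move(path)) {
    reopen_log_file();
  }

  ~SketchLog() {
    if (fd_ >= 0) ::close(fd_);
  }

  // Flush path: the lock is held for the whole write, so fd_ cannot be
  // closed or replaced while the write is in progress.
  bool flush(const std::string& line) {
    std::lock_guard<std::mutex> guard(flush_mutex_);
    if (fd_ < 0) return false;
    ssize_t n = ::write(fd_, line.data(), line.size());
    return n == static_cast<ssize_t>(line.size());
  }

  // Reopen path: takes the same lock before closing the old descriptor
  // and opening the new one.
  void reopen_log_file() {
    std::lock_guard<std::mutex> guard(flush_mutex_);
    if (fd_ >= 0) ::close(fd_);
    fd_ = ::open(path_.c_str(), O_CREAT | O_WRONLY | O_APPEND, 0644);
  }
};
```

With this arrangement a caller can invoke reopen_log_file() (for example after log rotation) from one thread while another thread calls flush(); each call sees either the old descriptor or the new one, never a closed one.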

Version-Release number of selected component (if applicable):

firefly, hammer, current

How reproducible:

Very hard to reproduce.  I'll probably have to create some special code to reproduce the conditions.  There is one case which may have been caused by this, and it required another bug which caused a massive amount of logging to happen in a tight loop.  I'll get back to you on this part.
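
The race itself is timing dependent, but the underlying hazard can be shown deterministically in a single thread. The sketch below is not the reproducer mentioned above; it is an illustration that relies on the POSIX rule that open() returns the lowest-numbered descriptor not currently open. The function name and file paths are hypothetical. It keeps a stale copy of a closed "log" descriptor, opens a second "data" file, and shows that a write through the stale number lands in the data file.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>
#include <fstream>
#include <iterator>
#include <string>

// Returns true if a write through a stale (already closed) descriptor
// number ended up in a different file that reused that number.
bool stale_write_hits_other_file(const char* log_path, const char* data_path) {
  int log_fd = ::open(log_path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
  if (log_fd < 0) return false;

  int stale = log_fd;  // what an unlocked flusher would still be holding
  ::close(log_fd);     // the reopen path closes the log descriptor

  // Another subsystem opens its own file; POSIX hands back the lowest
  // free descriptor number, which is the one just closed.
  int data_fd = ::open(data_path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
  if (data_fd < 0) return false;
  bool reused = (data_fd == stale);

  // The "log" write through the stale number now targets the data file.
  const std::string msg = "log line\n";
  ssize_t n = ::write(stale, msg.data(), msg.size());
  ::close(data_fd);

  std::ifstream in(data_path);
  std::string contents((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
  return reused && n == static_cast<ssize_t>(msg.size()) && contents == msg;
}
```

In a multi-threaded daemon the same reuse can happen whenever another thread calls open() between the close and the stale write, which is why the window is so hard to hit in practice.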

Comment 2 Samuel Just 2015-07-24 23:07:18 UTC
I have re-targeted this at 1.3.1 because it is a simple, low-risk fix for a potentially serious data corruption bug (though admittedly the bug has been around since before firefly and we've seen it either never or once).

Comment 3 Ken Dreyer (Red Hat) 2015-07-28 15:00:07 UTC
Fine with me. When https://github.com/ceph/ceph/pull/5348 gets merged to master, I'll take that as the sign that we should cherry-pick it downstream.

Comment 7 Ken Dreyer (Red Hat) 2015-07-29 20:13:38 UTC
For non-RHEL (Ubuntu), the fix will be in Ceph v0.94.1.5.

Comment 8 Tamil 2015-07-31 00:45:49 UTC
works fine!

Comment 10 errata-xmlrpc 2015-07-31 12:54:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1527