Bug 1299308

Summary: disable filestore_xfs_extsize by default
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Ken Dreyer (Red Hat) <kdreyer>
Component: RADOS
Assignee: Ken Dreyer (Red Hat) <kdreyer>
Status: CLOSED ERRATA
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 1.3.1
CC: ceph-eng-bugs, dzafman, kchai, mlawrenc, tganguly, vakulkar
Target Milestone: rc   
Target Release: 1.3.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: ceph-0.94.3-6.el7cp, Ubuntu: ceph_0.94.3.3-2redhat1trusty
Doc Type: Bug Fix
Doc Text:
Cause: Ceph's filestore_xfs_extsize setting reduces OSD fragmentation at the expense of large sequential write performance. This option was previously enabled by default in RHCS 1.3. As a consequence, Ceph's large sequential write performance was degraded in comparison to RHCS 1.2. The filestore_xfs_extsize setting is now disabled by default in RHCS 1.3, and Ceph's large sequential write performance is improved.
Last Closed: 2016-02-08 21:29:02 UTC
Type: Bug

Description Ken Dreyer (Red Hat) 2016-01-18 03:53:00 UTC
Description of problem:
filestore_xfs_extsize defaults to "true" in Hammer. This option is designed to reduce fragmentation on the OSDs.

Later tests have found that disabling filestore_xfs_extsize in upstream hammer improves large sequential write performance by about 20% (Ben Turner's cluster), and in some other tests by lesser amounts. This brings us closer to the large sequential write performance of Firefly.

Version-Release number of selected component (if applicable):
ceph-0.94.5-1 and earlier ceph-0.94.z versions

Steps to Reproduce:
1. Leave filestore_xfs_extsize unset (currently defaults to "true").
2. Run large sequential write tests via CBT and note the performance numbers.
3. Set filestore_xfs_extsize to "false".
4. Re-run the large sequential write tests in CBT and note the performance numbers.

Actual results:
The default filestore_xfs_extsize setting degrades large sequential write performance.

Expected results:
The default filestore_xfs_extsize setting should not degrade large sequential write performance.

The proposed fix is to set filestore_xfs_extsize back to "false" in src/common/config_opts.h.


Additional info:
This change comes with a trade-off, because it introduces fragmentation on the OSDs. To address this, we should add documentation that explains the fragmentation cost and suggests that customers with large sequential read use cases (object storage/CDN) toggle the value to reduce the fragmentation impact over time.
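
For clusters that prefer lower fragmentation over large sequential write throughput, the option could be turned back on in ceph.conf. The fragment below is a minimal sketch: placing the setting in the [osd] section and restarting the OSDs afterwards are assumptions, not steps taken from this report.

```ini
[osd]
# Assumption: re-enable the XFS extent size hint to reduce fragmentation,
# accepting slower large sequential writes in exchange.
filestore_xfs_extsize = true
```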

Comment 1 Ken Dreyer (Red Hat) 2016-01-18 15:31:55 UTC
Upstream PR @ https://github.com/ceph/ceph/pull/7265 - Mark, could you please review it?

Comment 2 Ken Dreyer (Red Hat) 2016-01-18 19:43:09 UTC
PR was merged upstream; need to cherry-pick to ceph-1.3.1-rhel-patches in Gerrit.

Comment 5 Ken Dreyer (Red Hat) 2016-01-19 15:46:50 UTC
Ubuntu build with this patch is ceph_0.94.3.3-1redhat1trusty

Comment 7 Ken Dreyer (Red Hat) 2016-01-19 19:28:52 UTC
(In reply to Ken Dreyer (Red Hat) from comment #5)
> Ubuntu build with this patch is ceph_0.94.3.3-1redhat1trusty

I had to bump the version number, so it's ceph_0.94.3.3-2redhat1trusty

Comment 8 Tanay Ganguly 2016-02-02 06:41:14 UTC
Marking this bug as Verified, as it was tested as part of the 1.3.1 async release.

Confirmed that this fix is also part of the 1.3.2 code base.

ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config get filestore_xfs_extsize

{   
    "filestore_xfs_extsize": "false"
}
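
To repeat this check for every OSD on a host rather than a single socket, something along the following lines could be used. It is a minimal sketch: the helper name extsize_value is hypothetical, the socket directory is assumed to be the default /var/run/ceph, and the parsing assumes the simple one-key output shown above.

```shell
#!/bin/sh
# Hypothetical helper: read "config get" JSON on stdin and print the
# value of filestore_xfs_extsize (assumes the one-key format shown above).
extsize_value() {
    sed -n 's/.*"filestore_xfs_extsize"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Query each OSD admin socket found in the (assumed) default directory.
for sock in /var/run/ceph/ceph-osd.*.asok; do
    [ -S "$sock" ] || continue   # skip when the glob matched no socket
    value=$(ceph --admin-daemon "$sock" config get filestore_xfs_extsize | extsize_value)
    echo "$sock: filestore_xfs_extsize=$value"
done
```

Each OSD running a fixed build would be expected to report "false" unless the option has been overridden in ceph.conf.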

Comment 10 errata-xmlrpc 2016-02-08 21:29:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0133