Bug 1924129

Summary: [RFE] write-same operation should efficiently allocate zeroed extents
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Jason Dillaman <jdillama>
Component: RADOS
Assignee: Neha Ojha <nojha>
Status: NEW ---
QA Contact: Manohar Murthy <mmurthy>
Severity: high
Docs Contact:
Priority: high
Version: 5.0
CC: akupczyk, bhubbard, ceph-eng-bugs, danken, idryomov, ndevos, nojha, pdhiran, rzarzyns, sseshasa, vereddy, vumrao
Target Milestone: ---
Keywords: FutureFeature, Performance
Target Release: 9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jason Dillaman 2021-02-02 16:42:49 UTC
Description of problem:
RBD uses the RADOS write-same operation to thick-provision RBD images: each op transfers a small zeroed buffer and sets the write-same length to the maximum RBD object size (default 4 MiB).
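The write-same pattern described above can be modeled in a few lines. This is a toy, in-memory sketch, not the librados API: `write_same`, `thick_provision_ops` and `ZERO_BUF_LEN` are hypothetical, and it assumes that the write-same length must be a non-zero multiple of the data buffer length and that an image is thick-provisioned with one write-same per backing object.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # default RBD object size (4 MiB)
ZERO_BUF_LEN = 512             # hypothetical size of the small zeroed buffer sent with each op


def write_same(data: bytes, write_len: int) -> bytes:
    """Expand a write-same op: repeat `data` until it covers `write_len` bytes."""
    # Assumption: the length must be an exact multiple of the pattern length.
    if not data or write_len % len(data) != 0:
        raise ValueError("write_len must be a non-zero multiple of len(data)")
    return data * (write_len // len(data))


def thick_provision_ops(image_size: int, object_size: int = OBJECT_SIZE):
    """Return (object_no, offset_in_object, write_len) for one write-same per object."""
    ops = []
    for object_no, start in enumerate(range(0, image_size, object_size)):
        # The last object is shorter when the image size is not object-aligned.
        write_len = min(object_size, image_size - start)
        ops.append((object_no, 0, write_len))
    return ops
```

For a 10 MiB image this yields three ops, the last one covering 2 MiB. Each op carries only `ZERO_BUF_LEN` bytes of payload, but the expanded range is what the OSD ends up writing to disk, which is the I/O cost this RFE wants to avoid.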

There is a desire to optimize the Ceph cluster IO impact for the thick-provisioned case by having BlueStore treat a write-same of zeroes as a request to allocate the specified amount of space but avoid the need to actually zero the space (i.e. track that the extent is in-use but flag it as being zeroed/uninitialized). 
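A minimal sketch of the requested behavior, using a toy extent map rather than BlueStore's real data structures: `Extent`, `ExtentMap`, `write_same_zero` and the `zeroed` flag are hypothetical names, and the model only handles whole-extent writes at matching offsets (no partial overwrites or splitting). A write-same of zeroes records the space as allocated and flagged as zeroed without issuing any data I/O; reads synthesize zeroes, and a later real write clears the flag.

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    length: int
    zeroed: bool        # allocated but never written: reads must return zeroes
    data: bytes = b""   # stored contents, only meaningful when not zeroed


@dataclass
class ExtentMap:
    extents: dict = field(default_factory=dict)  # offset -> Extent (toy: no overlap handling)
    bytes_written: int = 0                       # data bytes actually sent to the device

    def write_same_zero(self, offset: int, length: int) -> None:
        # Mark the space as in use, but skip the zero-fill I/O entirely.
        self.extents[offset] = Extent(length, zeroed=True)

    def write(self, offset: int, data: bytes) -> None:
        # A normal write stores the data and clears the zeroed flag.
        self.extents[offset] = Extent(len(data), zeroed=False, data=data)
        self.bytes_written += len(data)

    def read(self, offset: int) -> bytes:
        ext = self.extents[offset]
        # A zeroed extent has no stored data, so its contents are synthesized.
        return bytes(ext.length) if ext.zeroed else ext.data

    def allocated(self) -> int:
        return sum(e.length for e in self.extents.values())
```

After `write_same_zero(0, 4 * 1024 * 1024)`, `allocated()` reports 4 MiB while `bytes_written` stays at 0, which is the thick-provisioning outcome the description asks for: the space is reserved, but no zeroes are written.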

In the future, CephFS could also use write-same in its "fallocate" handler (which currently seems to support only punch-hole).

Version-Release number of selected component (if applicable):
5.0

Comment 1 RHEL Program Management 2021-02-02 16:42:56 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 7 Laura Flores 2021-10-25 19:30:16 UTC
> There is a desire to optimize the Ceph cluster IO impact for the thick-provisioned case by having BlueStore treat a write-same of zeroes as a request to allocate the specified amount of space but avoid the need to actually zero the space (i.e. track that the extent is in-use but flag it as being zeroed/uninitialized). 

I am working on a solution for this that avoids writing bufferlists consisting entirely of zeroes in BlueStore. See this PR, which is still a work in progress, for more details: https://github.com/ceph/ceph/pull/43337
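One way to picture skipping zero-filled writes is to scan each block of an incoming write and drop those that are entirely zero. This is a hypothetical sketch, not the code from the PR above: `is_all_zero`, `plan_write` and the fixed 4 KiB `BLOCK_SIZE` are assumptions made for illustration, whereas a real store would operate on its own allocation unit and bufferlists.

```python
BLOCK_SIZE = 4096  # assumed granularity for the zero check


def is_all_zero(buf: bytes) -> bool:
    """True if every byte in buf is zero (an empty buffer also counts)."""
    # any() over bytes iterates ints, so any non-zero byte makes it True.
    return not any(buf)


def plan_write(offset: int, data: bytes, block_size: int = BLOCK_SIZE):
    """Split a write into blocks; return (to_write, skipped) lists of (offset, length)."""
    to_write, skipped = [], []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        target = skipped if is_all_zero(block) else to_write
        target.append((offset + start, len(block)))
    return to_write, skipped
```

Skipping a block is only safe if the store still records that range as zero, so that a later read cannot return stale data left by a previous write to the same location.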