Bug 1028673

Summary: Add zerofill FOP to offload creating zeroed (VM) files
Product: [Community] GlusterFS
Reporter: M. Mohan Kumar <mohan>
Component: core
Assignee: M. Mohan Kumar <mohan>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: mainline
CC: gluster-bugs, mohan, ndevos, sasundar
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.5.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-04-17 11:50:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Program to test Zerofill FOP (none)
Program to test Zerofill FOP (none)
Zerofill tcpdump (none)

Description M. Mohan Kumar 2013-11-09 11:24:38 UTC
Description of problem:

Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in the specified range. This fop is useful when a whole file needs to be initialized with zeroes (for example, when provisioning zero-filled VM disk images or scrubbing VM disk images). The client/application issues this FOP, and the Gluster server zeroes out the required range of bytes, i.e. server-offloaded zeroing. In the absence of this fop, the client/application has to repeatedly issue write (zero) fops to the server, which is very inefficient because of the overhead of RPC calls and acknowledgements.
    
WRITESAME is a SCSI T10 command that takes a block of data as input and
writes the same data to other blocks; this write is handled entirely
within the storage device and hence is known as an offload. Linux now
supports the SCSI WRITESAME command, exposed to userspace in the form of
the BLKZEROOUT ioctl. The BD xlator can exploit the BLKZEROOUT ioctl to
implement this fop, so zeroing operations can be completely offloaded to
the storage device, making them highly efficient.


Comment 1 Anand Avati 2013-11-09 11:29:15 UTC
REVIEW: http://review.gluster.org/5327 (glusterfs: zerofill support) posted (#3) for review on master by M. Mohan Kumar (mohan.com)

Comment 2 Niels de Vos 2013-11-09 12:01:30 UTC
Hi Mohan,

I'd like to add support for ZEROFILL to Wireshark, and therefore I need to have a network capture to verify the dissection and upload that for the Wireshark regression testing.

Could you please capture a tcpdump, similar to this:

    # tcpdump -s 0 -w /tmp/zerofill.pcap -i any tcp and port <port-of-brick> &
    # ./offloaded aakash-test log 20
    # killall tcpdump
    # gzip -9 /tmp/zerofill.pcap

Please attach the /tmp/zerofill.pcap.gz file to this bug.

Or, attach the source for the 'offloaded' binary, so that I can test it easily myself :)

Thanks!

Comment 3 M. Mohan Kumar 2013-11-09 12:20:32 UTC
Created attachment 821865 [details]
Program to test Zerofill FOP

Comment 4 Anand Avati 2013-11-09 12:22:03 UTC
REVIEW: http://review.gluster.org/5327 (glusterfs: zerofill support) posted (#4) for review on master by M. Mohan Kumar (mohan.com)

Comment 5 M. Mohan Kumar 2013-11-09 12:42:11 UTC
Created attachment 821876 [details]
Program to test Zerofill FOP

Comment 6 M. Mohan Kumar 2013-11-09 12:49:28 UTC
Created attachment 821877 [details]
Zerofill tcpdump

Comment 7 Anand Avati 2013-11-11 05:25:56 UTC
COMMIT: http://review.gluster.org/5327 committed in master by Vijay Bellur (vbellur) 
------
commit c8fef37c5d566c906728b5f6f27baaa9a8d2a20d
Author: M. Mohan Kumar <mohan.com>
Date:   Sat Nov 9 14:51:53 2013 +0530

    glusterfs: zerofill support
    
    Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in
    the specified range. This fop is useful when a whole file needs to be
    initialized with zeroes (for example, when provisioning zero-filled VM
    disk images or scrubbing VM disk images).
    
    The client/application issues this FOP, and the Gluster server zeroes
    out the required range of bytes, i.e. server-offloaded zeroing. In the
    absence of this fop, the client/application has to repeatedly issue
    write (zero) fops to the server, which is very inefficient because of
    the overhead of RPC calls and acknowledgements.
    
    WRITESAME is a SCSI T10 command that takes a block of data as input and
    writes the same data to other blocks; this write is handled entirely
    within the storage device and hence is known as an offload. Linux now
    supports the SCSI WRITESAME command, exposed to userspace in the form
    of the BLKZEROOUT ioctl. The BD xlator can exploit the BLKZEROOUT ioctl
    to implement this fop, so zeroing operations can be completely
    offloaded to the storage device, making them highly efficient.
    
    The fop takes two arguments offset and size. It zeroes out 'size' number
    of bytes in an opened file starting from 'offset' position.
    
    This patch adds zerofill support to the following areas:
    	- libglusterfs
    	- io-stats
    	- performance/md-cache,open-behind
    	- quota
    	- cluster/afr,dht,stripe
    	- rpc/xdr
    	- protocol/client,server
    	- io-threads
    	- marker
    	- storage/posix
    	- libgfapi
    
    Client applications can exploit this fop by using glfs_zerofill,
    introduced in libgfapi. FUSE support for this fop has not been added,
    as there is no system call for it.
    
    Changes from previous version 3:
    * Removed redundant memory failure log messages
    
    Changes from previous version 2:
    * Rebased and fixed build error
    
    Changes from previous version 1:
    * Rebased for latest master
    
    TODO :
         * Add zerofill support to trace xlator
         * Expose zerofill capability as part of gluster volume info
    
    Here is a performance comparison of server-offloaded zerofill vs.
    zeroing out using repeated writes.
    
    [root@llmvm02 remote]# time ./offloaded aakash-test log 20
    
    real	3m34.155s
    user	0m0.018s
    sys	0m0.040s
    [root@llmvm02 remote]# time ./manually aakash-test log 20
    
    real	4m23.043s
    user	0m2.197s
    sys	0m14.457s
    [root@llmvm02 remote]# time ./offloaded aakash-test log 25;
    
    real	4m28.363s
    user	0m0.021s
    sys	0m0.025s
    [root@llmvm02 remote]# time ./manually aakash-test log 25
    
    real	5m34.278s
    user	0m2.957s
    sys	0m18.808s
    
    The second argument, log, is the file used for logging, and the third
    argument is the size in GB.
    
    As the numbers show, this fop yields a performance improvement of
    around 20%.
    
    Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18
    BUG: 1028673
    Signed-off-by: Aakash Lal Das <aakash.ibm.com>
    Signed-off-by: M. Mohan Kumar <mohan.com>
    Reviewed-on: http://review.gluster.org/5327
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 8 Anand Avati 2013-11-13 05:29:31 UTC
REVIEW: http://review.gluster.org/6255 (zerofill: Update API version version 6 adds zerofill FOP) posted (#1) for review on master by M. Mohan Kumar (mohan.com)

Comment 9 Anand Avati 2013-11-13 07:26:49 UTC
COMMIT: http://review.gluster.org/6255 committed in master by Anand Avati (avati) 
------
commit 3a4bd6ddc551179a1785c3535e477ce5867da68d
Author: M. Mohan Kumar <mohan.com>
Date:   Wed Nov 13 10:45:40 2013 +0530

    zerofill: Update API version
    version 6 adds zerofill FOP
    
    BUG: 1028673
    Change-Id: I27cfc48cd6f7f0f6daf94e1c9cfbe420a0d090af
    Signed-off-by: M. Mohan Kumar <mohan.com>
    Reviewed-on: http://review.gluster.org/6255
    Reviewed-by: Bharata B Rao <bharata.rao>
    Tested-by: Bharata B Rao <bharata.rao>
    Reviewed-by: Anand Avati <avati>

Comment 10 Niels de Vos 2014-04-17 11:50:29 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user