Bug 626561 - GFS2: [RFE] fallocate support for GFS2
Summary: GFS2: [RFE] fallocate support for GFS2
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Ben Marzinski
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 455572
Blocks: 659123 707091
Reported: 2010-08-23 20:20 UTC by Ben Marzinski
Modified: 2011-05-24 02:21 UTC

Fixed In Version: kernel-2.6.32-83.el6
Doc Type: Enhancement
Doc Text:
Clone Of: 455572
Environment:
Last Closed: 2011-05-23 20:50:08 UTC
Target Upstream Version:
Embargoed:


Attachments
patch posted to rhkernel. (10.14 KB, patch)
2010-10-14 17:43 UTC, Ben Marzinski


Links
System: Red Hat Product Errata
ID: RHSA-2011:0542
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update
Last Updated: 2011-05-19 11:58:07 UTC

Description Ben Marzinski 2010-08-23 20:20:28 UTC
+++ This bug was initially created as a clone of Bug #455572 +++

We should support fallocate even though it's just as quick to do streaming writes
since the FALLOC_FL_KEEP_SIZE flag has slightly different behaviour to extending
via streaming writes.
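
The difference mentioned above can be observed from user space. The following is a minimal sketch, not part of the patch: it calls fallocate(2) through ctypes and reports the file size and block count afterwards. It assumes a 64-bit Linux system (64-bit off_t) and a filesystem that supports fallocate; the helper names `fallocate` and `size_after_fallocate` are made up for illustration.

```python
import ctypes
import os
import tempfile

FALLOC_FL_KEEP_SIZE = 0x01  # value defined in <linux/falloc.h>


def fallocate(fd, mode, offset, length):
    """Call fallocate(2) via libc and raise OSError if it fails."""
    libc = ctypes.CDLL(None, use_errno=True)
    # Assumption: off_t is 64 bits wide, as on 64-bit Linux.
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int
    if libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def size_after_fallocate(mode, length=1 << 20, directory=None):
    """Preallocate `length` bytes in a fresh temp file.

    Returns (st_size, st_blocks) as seen right after the call.
    """
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        fallocate(f.fileno(), mode, 0, length)
        st = os.fstat(f.fileno())
        return st.st_size, st.st_blocks


if __name__ == "__main__":
    print("default mode:", size_after_fallocate(0))
    print("KEEP_SIZE:   ", size_after_fallocate(FALLOC_FL_KEEP_SIZE))
```

With the default mode the file size grows to cover the preallocated range, as it would after streaming writes. With FALLOC_FL_KEEP_SIZE the blocks are reserved but the reported size stays where it was. Pass `directory=` to point the temp file at a specific mount.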

--- Additional comment from swhiteho on 2010-05-14 05:34:27 EDT ---

We need to check that fsck.gfs2 can cope with files in which the size is less than the number of allocated blocks.

--- Additional comment from rpeterso on 2010-05-21 09:17:12 EDT ---

I checked the fsck.gfs2 code and didn't see any places where
it cared whether the di_size matches the di_blocks count.
Just to be a little more secure in the knowledge, I created
a 1MB file in gfs2, unmounted it and patched the di_size
to 0x100 bytes.  So the size was 0x100, but the allocated
blocks di_blocks was 0x102.  The fsck.gfs2 ran just fine with
no complaints about this situation.

--- Additional comment from bmarzins on 2010-07-20 02:33:40 EDT ---

This is still a work-in-progress, but this patch works (at least on some quick single-machine tests).  However, it's not quite right.  Instead of rounding the requested allocation to the nearest fs-block, it rounds it to the nearest page.
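
To illustrate why the rounding granularity matters, here is a small arithmetic sketch. It is not taken from the patch. It assumes that "rounding" means expanding the requested byte range outward to the enclosing boundaries, and the 1024-byte block size and 4096-byte page size are example values only.

```python
def round_range(offset, length, granule):
    """Expand [offset, offset + length) outward to `granule` boundaries.

    `granule` must be a power of two. Returns (start, rounded_length).
    """
    mask = granule - 1
    start = offset & ~mask                   # round the start down
    end = (offset + length + mask) & ~mask   # round the end up
    return start, end - start


if __name__ == "__main__":
    # Example request: 1000 bytes starting at byte 5000.
    print(round_range(5000, 1000, 1024))  # rounded to fs blocks
    print(round_range(5000, 1000, 4096))  # rounded to pages
```

For this request, block rounding covers 2048 bytes while page rounding covers 4096 bytes, so rounding to the page allocates more than was asked for whenever the block size is smaller than the page size.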

--- Additional comment from bmarzins on 2010-07-22 18:19:11 EDT ---

So this definitely doesn't work as is.  For starters, GFS2 isn't able to handle
an arbitrarily large allocation during one transaction.

--- Additional comment from swhiteho on 2010-07-30 07:52:06 EDT ---

Yes, that's true. You need to be especially careful of this with journaled data files. You should be able to create large enough transactions that the performance is still substantially better than the single page system though.

Also, I think you should avoid using block_prepare_write() and instead call gfs2_block_map directly to map (up to) a whole extent at once. That can then be written with zeros before attempting the next allocation. That should make things much faster, overall.
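
The idea of working through a large request in bounded pieces (map a piece, zero it, move on) can be sketched as follows. This is a toy illustration, not GFS2 code: `max_chunk` stands in for whatever limit a single transaction or extent imposes, and its value here is hypothetical.

```python
def split_request(offset, length, max_chunk):
    """Yield (offset, length) pieces, each at most `max_chunk` bytes long."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    end = offset + length
    while offset < end:
        step = min(max_chunk, end - offset)
        yield offset, step
        offset += step


if __name__ == "__main__":
    # Each piece would be mapped and zeroed before the next is attempted.
    for piece in split_request(0, 10, 4):
        print(piece)
```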

The other thing we need to look at is error handling. If there is an error, should we give up after having allocated as much as possible or should we trim off the blocks? If the latter, then I have a useful helper function which is part of my new truncate code that is waiting for the next merge window to be completed.

Other than that, it looks pretty good, and smaller than I'd expected too.

--- Additional comment from bmarzins on 2010-08-06 19:16:38 EDT ---

This is a revised version of the fallocate patch that can handle fallocate requests that are larger than a single resource group.  It starts by looking for resource groups that are more than half empty, and each time it is unable to satisfy its target size, it cuts the goal in half.  Regardless of what size it asked for, when it finds a resource group, it reserves as many blocks as it can from that resource group.  With this patch, fallocates usually are around five times as fast as dd'ing zeros to the file.
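
The search strategy described in this comment can be modelled roughly as below. This is a simplified sketch and not the patch itself: resource groups are reduced to a list of free-block counts, `initial_goal` is a hypothetical starting target, and the first group that meets the current goal is chosen.

```python
def plan_allocation(rgrp_free, wanted, initial_goal):
    """Toy model of the halving search over resource groups.

    rgrp_free    -- free-block count of each resource group
    wanted       -- total number of blocks requested
    initial_goal -- minimum free blocks a group must have at first

    Returns (plan, unsatisfied), where plan is a list of
    (rgrp_index, blocks_taken) tuples.
    """
    free = list(rgrp_free)  # work on a copy
    plan = []
    goal = initial_goal
    while wanted > 0 and goal > 0:
        idx = next((i for i, f in enumerate(free) if f >= goal), None)
        if idx is None:
            goal //= 2  # no group meets the target: cut the goal in half
            continue
        # Take as much as possible from the group, whatever the goal was.
        take = min(free[idx], wanted)
        free[idx] -= take
        wanted -= take
        plan.append((idx, take))
    return plan, wanted


if __name__ == "__main__":
    print(plan_allocation([100, 600, 300], wanted=900, initial_goal=512))
```

In the example, the 600-block group satisfies the initial goal and is drained first. No remaining group has 512 free blocks, so the goal drops to 256 and the 300-block group supplies the rest.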

--- Additional comment from bmarzins on 2010-08-06 19:21:27 EDT ---

I looked at doing away with block_prepare_write(), but it looks like I need to do most of what it does.  For the rare times when I might need to call it multiple times for one page there is probably some savings to be had, but that could only happen when using fallocate to fill in a holey file whose blocksize is less than the page size.
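
One way to picture the rare case mentioned here: when several fs blocks share a page, a page can contain more than one separate hole. The sketch below counts those holes. It reflects my reading of the comment rather than the patch logic, and the sizes used are example values.

```python
from itertools import groupby


def blocks_per_page(page_size, block_size):
    """Number of fs blocks that fit in one page."""
    return page_size // block_size


def hole_runs_in_page(allocated):
    """Count separate runs of unallocated blocks within one page.

    `allocated` holds one boolean per fs block in the page.
    """
    return sum(1 for is_allocated, _ in groupby(allocated) if not is_allocated)


if __name__ == "__main__":
    # 1024-byte blocks in a 4096-byte page: four blocks per page.
    print(blocks_per_page(4096, 1024))
    # Blocks 1 and 3 are holes, separated by an allocated block.
    print(hole_runs_in_page([True, False, True, False]))
```

When the block size equals the page size there is only one block per page, so a page can never hold more than one hole.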

--- Additional comment from swhiteho on 2010-08-09 09:06:50 EDT ---

This looks really good. Just a few (minor) niggles though:

gfs2_page_add_databufs and gfs2_write_alloc_required should be able to retain their "unsigned int" sized size arguments since the max allocation cannot be larger than one rgrp which is a max of 2^32 - (sizeof rgrp header blocks) long. Maybe that was done to try and lose some casts along the way?

At the top of gfs2_fallocate() there are a few shift and mask operations that it would be good to have meaningfully named macros or inlined functions for.

Beyond that it looks really good to me. I realised that there will be a merge order issue wrt the new truncate code, since i_disksize is going to go away once that has been merged. I'm currently waiting for the merge of the vfs tree in the current merge window, and once that has been done, I'll be able to rebase the new truncate code.

The only difference that it is likely to make is that you won't need to update i_disksize separately from i_size. One further thought occurs: if we allow the inode to grow beyond its filesize with fallocate, then should a truncate which is set above the current file size still remove any allocated blocks beyond the requested truncate point? Hmm. I wonder how other filesystems handle that. Anyway we can figure that out as we go along.

The fact that you've managed a 5x speed up with this means that we now have a target for Dave Chinner's multipage write work, too.

--- Additional comment from bmarzins on 2010-08-23 11:57:20 EDT ---

Posted.

--- Additional comment from bmarzins on 2010-08-23 15:52:54 EDT ---

This is the version of the fallocate patch that I posted, and Steve accepted.

Comment 2 Ben Marzinski 2010-10-14 17:43:00 UTC
Created attachment 453519
patch posted to rhkernel.

This patch adds fallocate support to GFS2.

Comment 3 Aristeu Rozanski 2010-11-17 19:46:08 UTC
Patch(es) available on kernel-2.6.32-83.el6

Comment 6 Nate Straz 2011-04-07 21:18:51 UTC
Verified using fallocate tests from LTP.

[root@dash-01 community]# uname -r
2.6.32-128.el6.x86_64
[root@dash-01 community]# mount -t gfs2
/dev/mapper/dash-renamer on /mnt/renamer type gfs2 (rw,seclabel,relatime,hostdata=jid=0)

[root@dash-01 community]# export TMPDIR=/mnt/renamer
[root@dash-01 community]# ./fallocate01
fallocate01    1  TPASS  :  fallocate(3, 0, 49152, 4096) returned 0
fallocate01    2  TPASS  :  write operation on fallocated(3, 0, 49152, 4096) returned 1
fallocate01    3  TPASS  :  fallocate(4, 1, 49152, 4096) returned 0
fallocate01    4  TPASS  :  write operation on fallocated(4, 1, 49152, 4096) returned 1
[root@dash-01 community]# ./fallocate02
fallocate02    1  TPASS  :  fallocate(tfile_read_3018:3, 1, 0, 4096) returned 9
fallocate02    2  TPASS  :  fallocate(tfile_write_3018:4, 1, -4096, 4096) returned 22
fallocate02    3  TPASS  :  fallocate(tfile_write_3018:4, 1, 4096, -4096) returned 22
fallocate02    4  TPASS  :  fallocate(tfile_write_3018:4, 1, 49152, 0) returned 22
fallocate02    5  TPASS  :  fallocate(tfile_write_3018:4, 1, 49152, -4096) returned 22
fallocate02    6  TPASS  :  fallocate(tfile_write_3018:4, 1, -98304, 4096) returned 22
fallocate02    7  TPASS  :  fallocate(tfile_write_3018:4, 1, 0, 4096) returned 0
[root@dash-01 community]# ./fallocate03
fallocate03    1  TPASS  :  fallocate(tfile_sparse_3019, 0, 8192, 4096) returned 0
fallocate03    2  TPASS  :  fallocate(tfile_sparse_3019, 0, 49152, 4096) returned 0
fallocate03    3  TPASS  :  fallocate(tfile_sparse_3019, 0, 69632, 4096) returned 0
fallocate03    4  TPASS  :  fallocate(tfile_sparse_3019, 0, 102400, 4096) returned 0
fallocate03    5  TPASS  :  fallocate(tfile_sparse_3019, 1, 8192, 4096) returned 0
fallocate03    6  TPASS  :  fallocate(tfile_sparse_3019, 1, 49152, 4096) returned 0
fallocate03    7  TPASS  :  fallocate(tfile_sparse_3019, 1, 77824, 4096) returned 0
fallocate03    8  TPASS  :  fallocate(tfile_sparse_3019, 1, 106496, 4096) returned 0
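
For readers unfamiliar with the numbers in the fallocate02 output: assuming the value after "returned" is the errno reported by the failed call (which is how I read these expected-failure cases), 9 and 22 can be decoded with a short sketch like this. The helper name `describe` is made up for illustration.

```python
import errno
import os


def describe(code):
    """Return (symbolic name, message) for an errno value."""
    name = errno.errorcode.get(code, "unknown")
    return name, os.strerror(code)


if __name__ == "__main__":
    for code in (9, 22):
        print(code, *describe(code))
```

On Linux, 9 is EBADF and 22 is EINVAL. The second argument in each test line is the mode: 0 is the default behaviour and 1 is FALLOC_FL_KEEP_SIZE.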

Comment 7 errata-xmlrpc 2011-05-23 20:50:08 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html

