Bug 490071

Summary: [LTC 6.0 FEAT] Request to include fallocate support in glibc [201881]
Product: Red Hat Enterprise Linux 6 Reporter: IBM Bug Proxy <bugproxy>
Component: glibc Assignee: Jakub Jelinek <jakub>
Status: CLOSED NEXTRELEASE QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: low    
Version: 6.0 CC: jjarvis, jlarrew, notting
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-03-14 19:46:59 UTC Type: ---
Bug Depends On:    
Bug Blocks: 356741    

Description IBM Bug Proxy 2009-03-13 03:50:42 UTC
=Comment: #0=================================================
Emily J. Ratliff <ratliff.com> - 
1. Feature Overview:
Feature Id:	[201881]
a. Name of Feature:	Request to include fallocate support in glibc
b. Feature Description
fallocate() is supported in glibc only indirectly, through the posix_fallocate() call; glibc lacks a direct
wrapper for fallocate(2). Today, applications that want it have to invoke the fallocate system call
directly via syscall(), which makes them hard to port across architectures.
 
It seems Ulrich Drepper has added fallocate(2) and fallocate64(2) upstream, and DB2 would like to have this
added to RHEL 6.

2. Feature Details:
Sponsor:	SWG - DB2
Architectures:
x86_64
ppc64
s390x

Arch Specificity: Both
Affects Toolchain: Yes
Delivery Mechanism: Request Red Hat development assistance
Category:	Toolchain
Request Type:	Code Development Support from Distributor
d. Upstream Acceptance:	Accepted
Sponsor Priority	2
f. Severity: Medium
IBM Confidential:	no
Code Contribution:	3rd party code
g. Component Version Target:	

3. Business Case
Having this code in RHEL 6 would make it easier for DB2 to port across archs.

4. Primary contact at Red Hat: 
John Jarvis
jjarvis

5. Primary contacts at Partner:
Project Management Contact:
Stephanie Glass, sglass.com, 512-838-9284

Technical contact(s):

Mingming Cao, mcao.com

Comment 1 Bill Nottingham 2009-03-13 19:50:26 UTC
...
Application today who would like to have to call syscall fallocate()
directly.  This makes it hard to port cross archs
...

Given that posix_fallocate is, well, POSIX, and exists as a glibc call on all arches, why would you need the nonstandard Linux-only version that needs an entry in each architecture's syscall table to make porting across arches easier?
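For comparison, a minimal sketch of the portable route described above, using only the POSIX posix_fallocate() call. The helper name reserve_posix() is hypothetical. Note that posix_fallocate() reports failure through its return value, not errno, and that on a filesystem without native support (such as ext3) glibc emulates it by writing into every block, which is the slow path the reply below objects to.

```c
#define _FILE_OFFSET_BITS 64  /* 64-bit off_t on 32-bit targets */
#include <fcntl.h>            /* posix_fallocate() */
#include <stdio.h>            /* fprintf() */
#include <stdlib.h>           /* mkstemp() for callers */
#include <string.h>           /* strerror() */
#include <sys/stat.h>         /* fstat() for callers */
#include <unistd.h>           /* close(), unlink() */

/* Hypothetical helper: reserve [0, len) in fd via the POSIX interface.
 * Returns 0 on success or the error number posix_fallocate() returned. */
static int reserve_posix(int fd, off_t len)
{
    int err = posix_fallocate(fd, 0, len);  /* error is the return value */
    if (err != 0)
        fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
    return err;
}
```

Because POSIX requires the file size to grow to cover the range, a successful call always leaves st_size at least offset + len.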

Comment 2 IBM Bug Proxy 2009-03-13 20:50:34 UTC
(In reply to comment #6)
> ...
> Application today who would like to have to call syscall fallocate()
> directly.  This makes it hard to port cross archs
> ...
>
> Given that posix_fallocate is, well, POSIX, and exists as a glibc call on all
> arches, why would you need the nonstandard Linux-only version that needs an
> entry in each architecture's syscall table to make porting across arches
> easier?
>

The Linux fallocate system call has been added to each architecture since the 2.6.23 kernel, but glibc exposes it
only through the posix_fallocate() interface. Unfortunately, that does not cover the following two cases:

1) If the underlying filesystem doesn't support fallocate() (for example ext2/ext3), posix_fallocate() falls back to slowly zeroing out the preallocated range. Applications such as cp or databases don't want that: they would like fast preallocation when the underlying filesystem supports it (like ext4), but no zero-filling when it doesn't (like ext3), since they will rewrite the data immediately anyway.

2) As described in http://sourceware.org/bugzilla/show_bug.cgi?id=7083
(contents copied here):

the Linux fallocate system call is only exposed via glibc's
posix_fallocate() interface, which means it is not possible for
the user to pass in the FALLOC_FL_KEEP_SIZE flag. This flag reserves blocks
for use by the inode without changing the size of the file as reported by
i_size. This can be useful in some circumstances. For example, a program
that rotates logfiles might want to use fallocate() with FALLOC_FL_KEEP_SIZE to
reserve space for the new /var/log/messages file without changing its reported
i_size. This allows the new /var/log/messages file to be
contiguous, while still allowing tail -f to work and letting log analyzers read
the file without getting a large number of trailing zeros, which would be
the case if posix_fallocate() (which, as required by the POSIX specification,
does not use the FALLOC_FL_KEEP_SIZE flag) were used to preallocate the file.
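A minimal sketch of the log-rotation case, assuming glibc 2.10 or later (the first release with the fallocate() wrapper) and a filesystem that implements FALLOC_FL_KEEP_SIZE, such as ext4 or xfs. The helper name reserve_keep_size() is hypothetical. It reserves space past the current end of file while leaving the reported size untouched:

```c
#define _GNU_SOURCE     /* fallocate() and FALLOC_FL_KEEP_SIZE in <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>     /* mkstemp() for callers */
#include <sys/stat.h>   /* fstat() */
#include <unistd.h>     /* write(), close(), unlink() */

/* Hypothetical helper: reserve len bytes after the current end of fd
 * without changing i_size, so readers see no trailing zeros.
 * Returns 0 on success, otherwise an errno value (EOPNOTSUPP when the
 * filesystem, e.g. ext3, cannot preallocate). */
static int reserve_keep_size(int fd, off_t len)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return errno;
    /* Start at the current size; KEEP_SIZE leaves st_size unchanged. */
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, st.st_size, len) != 0)
        return errno;
    return 0;
}
```

A log rotator would call reserve_keep_size(fd, expected_bytes) right after creating the new file; tail -f and log analyzers then see only the bytes actually written.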

For these reasons, we need a glibc-mediated interface that exposes the Linux
fallocate system call's flags field. Without it, we will have to recommend
that programs wishing to use this functionality (which is implemented in ext4
and xfs) call the system call directly, which is hard to make
portable across different Linux architectures.

Comment 3 Bill Nottingham 2009-03-13 20:58:01 UTC
syscall(SYS_fallocate, ...)

It's no less portable across arches than a glibc wrapper. (I'm not denying that having the non-POSIX behavior may be useful... there's just nothing architecture-specific about it.)

Comment 4 IBM Bug Proxy 2009-03-13 21:50:23 UTC
The prototype for sys_fallocate is
long sys_fallocate(int fd, int mode, loff_t offset, loff_t len);

The parameters 'offset' and 'len' are 64 bit. For the 64-bit system call
they fit into a register, so on 64-bit systems the call is simple:

syscall(__NR_fallocate, fd, 0, offset, len);

For the native 32-bit system call and the compat system call, a register
holds only 32 bits, so 'offset' and 'len' are each split into two 32-bit values:

syscall(__NR_fallocate, fd, 0, (unsigned int)(offset >> 32), (unsigned int)offset,
        (unsigned int)(len >> 32), (unsigned int)len);

On top of that, the order of the two halves depends on whether the 32-bit
machine is big endian or little endian.

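This is what an application has to do if it calls the system call directly, without a glibc wrapper:

#define _GNU_SOURCE          // for syscall() and off64_t
#include <endian.h>          // __BYTE_ORDER, __BIG_ENDIAN
#include <stdint.h>          // pulls in __WORDSIZE
#include <sys/syscall.h>     // __NR_fallocate
#include <sys/types.h>       // off64_t
#include <unistd.h>          // syscall()

// Hypothetical helper name: invoke the raw fallocate system call with mode 0.
// offset and len are 64-bit values on every architecture.
static long raw_fallocate(int fd, off64_t offset, off64_t len)
{
#if __WORDSIZE == 64
    // all 64-bit systems: x86_64, s390x, ppc64
    return syscall(__NR_fallocate, fd, 0, offset, len);
#elif __BYTE_ORDER == __BIG_ENDIAN
    // big-endian 32-bit systems: s390, ppc (high word first)
    return syscall(__NR_fallocate, fd, 0,
                   (unsigned int)(offset >> 32), (unsigned int)offset,
                   (unsigned int)(len >> 32), (unsigned int)len);
#else
    // little-endian 32-bit systems: i386 (low word first)
    return syscall(__NR_fallocate, fd, 0,
                   (unsigned int)offset, (unsigned int)(offset >> 32),
                   (unsigned int)len, (unsigned int)(len >> 32));
#endif
}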
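To make the word order concrete: the value 0x500000007 (21474836487) has high word 5 and low word 7, so a big-endian 32-bit system receives 5 then 7, while i386 receives 7 then 5. The hypothetical helper split_loff() below, which is not part of glibc or the kernel, computes that order without making any system call:

```c
#include <stdint.h>

/* Hypothetical helper: split a 64-bit loff_t argument into the two
 * 32-bit words a 32-bit kernel expects, in argument order.
 * big_endian != 0 selects the s390/ppc order (high word first);
 * otherwise the i386 order (low word first) is used. */
static void split_loff(uint64_t value, int big_endian, uint32_t words[2])
{
    uint32_t hi = (uint32_t)(value >> 32);
    uint32_t lo = (uint32_t)value;
    words[0] = big_endian ? hi : lo;
    words[1] = big_endian ? lo : hi;
}
```

A glibc wrapper does exactly this bookkeeping once per architecture, which is the argument for having it in the library rather than in every application.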

Note that, according to his comment in the following bug, Ulrich Drepper added fallocate(2) and fallocate64(2) to glibc CVS a few weeks ago:
http://sourceware.org/bugzilla/show_bug.cgi?id=7083

This request is to update glibc in RHEL 6 to a version that includes these fallocate(2) and fallocate64(2) wrappers. Thanks,

Regards,
Mingming

Comment 5 Jakub Jelinek 2009-03-14 19:46:59 UTC
Given that this is already in glibc CVS and RHEL6 will be based on glibc 2.10 or newer, this is already fixed.