Bug 213635

Summary: 32-bit fstat on 64-bit kernel returns EOVERFLOW when st_ino gets too large
Product: Red Hat Enterprise Linux 3 Reporter: Issue Tracker <tao>
Component: kernelAssignee: Jeff Layton <jlayton>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0CC: esandeen, petrides, steved, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2007-0436 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-06-11 17:57:29 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 212183    
Attachments:
Description Flags
patch 1
none
kernel patch 1
none
kernel patch 2
none
kernel patch 3
none
patch 4
none
patch 5 none

Description Issue Tracker 2006-11-02 12:38:01 UTC
Escalated to Bugzilla from IssueTracker

Comment 14 Jeff Layton 2006-11-02 13:13:04 UTC
Opening this as a glibc case since that's where the EOVERFLOW is generated, but
a kernel fix might actually be appropriate. This is also a problem on RHEL4 and
FC6 (and I'm presuming it's likely a problem upstream as well).

The fundamental problem here is that the kernel declares ino_t as "unsigned
long". On ia32, this is a 32-bit uint, and on x86_64, it's a 64-bit uint.
Everything works correctly when you run 32-bit apps on a 32-bit kernel, or
64-bit apps on a 64-bit kernel. The problem is running 32-bit apps on a 64-bit
kernel.

fstat seems to use fstat64, and from what I can tell everything there between
glibc and the kernel seems to operate correctly. glibc allocates a stat64 struct
to hold the data from the fstat64 call.

From there, though, it has to copy that data into the statbuf that was allocated
by the userspace app. Unless the app was compiled with -D_FILE_OFFSET_BITS=64,
the stat buffer has a 32-bit st_ino field. __xstat32_conv does best effort to
copy the info returned by the kernel into this buffer, but if it won't fit, it
generates an EOVERFLOW error in userspace.

The upshot here is that we have 32-bit apps (not compiled with
-D_FILE_OFFSET_BITS=64) that work just fine on a 32-bit kernel, but when run in
compatability mode on a 64-bit kernel, they fall down once the inode counter
gets high enough.

The right fix would seem to be to just make ino_t be "unsigned long long" so
that the sizes are the same everywhere, but that would seem to be a huge ABI
breaker and might have performance impact in the kernel. Obviously that would
need to be an upstream change.

In the meantime, we probably need a workaround of sorts in glibc. The kernel
seems to start the inode counter at 1. Perhaps we could make an environment var
switch that zeroes out the st_ino field instead of returning EOVERFLOW in this
condition?


Comment 15 Jeff Layton 2006-11-02 13:43:28 UTC
Created attachment 140129 [details]
patch 1

Here's a proposed (but untested) patch. This makes the st_ino EOVERFLOW
behavior conditional on $ST_INO_64_BIT_. If that env var is set, we zero out
the st_ino field instead of throwing an EOVERFLOW error.

Comment 16 Jakub Jelinek 2006-11-02 13:47:41 UTC
The glibc behaviour is correct and shouldn't change, if some fd has st_ino
that doesn't fit into 32-bit ino_t, reporting a wrong value is worse than
reporting an error, where the user has a chance to use *stat64 instead and get
the correct value.

"The right fix would seem to be to just make ino_t be "unsigned long long" so
that the sizes are the same everywhere, but that would seem to be a huge ABI
breaker and might have performance impact in the kernel. Obviously that would
need to be an upstream change."
Hm, that's exactly what -D_FILE_OFFSET_BITS=64 is about, it switches to an ABI
with 64-bit ino_t (among other things).

The right fix is to make sure everything that might ever need 64-bit
ino_t/off_t/etc. is built with -D_FILE_OFFSET_BITS=64.  Not sure what's the
exact state of this in RHEL3, but in RHEL4 and even more in FC6 most programs
that matter are built with this.

In the mean time, the kernel should try to use values that fit into 32-bit
fields where it has the luxury to control that (e.g. when some on-disk filesystem
uses >= 4GB inode numbers, it doesn't have that luxury, but e.g. in the pipe
fs inode cases it has that, only it can mean slightly slower inode generation).
That could help a little bit non-LFS apps before they are finally fixed up
(recompiled with LFS after fixing possible LFS issues in them (using wrong types
etc.)).  In new_inode that pipes use there is a static unsigned long counter
that is incremented each time, which means that e.g. for 32-bit kernels it
really doesn't care that there can be duplicate st_ino values.  So, 64-bit
kernels either could equally not care so much and change the counter to unsigned
int, or use some data structure which would allow it to track what inode numbers
are currently taken and what are free to use.

Comment 17 Jakub Jelinek 2006-11-02 13:50:00 UTC
For c#15 patch, there is no way we will add something like this into glibc.

Comment 18 Jeff Layton 2006-11-02 13:57:13 UTC
Ok, that sounds reasonable. It does seem like if 32-bit uints are OK for
new_inode() on a 32-bit kernel, then they ought to be ok on a 64-bit kernel.
I'll code up such a patch and see what it breaks ;-).


Comment 19 Jeff Layton 2006-11-02 14:07:02 UTC
Created attachment 140130 [details]
kernel patch 1

Untested kernel patch. Changes the last_inode counter in new_inode to be an
"unsigned int" rather than a "unsigned long". This will mean that the counter
will overflow much sooner, but presumably this is expected at some point, and
the effects shouldn't be dire.

i.e. if 32-bits is enough on a 32-bit kernel, it should be enough on a 64-bit
kernel. If we're really concerned here, we could also consider making this an
x86_64-only change.

Comment 20 Jeff Layton 2006-11-02 14:16:29 UTC
Created attachment 140132 [details]
kernel patch 2

Updated patch. Make this conditional on CONFIG_IA32_EMULATION. That should just
make this just happen on our x86_64 kernels.

Comment 21 Jeff Layton 2006-11-02 14:48:01 UTC
Created attachment 140136 [details]
kernel patch 3

Might as well do this for ia64->ia32 compatability mode as well.

Comment 23 Jeff Layton 2006-11-09 15:43:28 UTC
Created attachment 140783 [details]
patch 4

This changes the #ifdef to be dependent on CONFIG_COMPAT (to catch the same
problem on other arches with a compatability mode). Also, change the static
counter in iunique in a similar fashion.

Comment 24 Jeff Layton 2006-11-09 15:58:30 UTC
Created attachment 140787 [details]
patch 5

Slight revision. Don't set the static counter to 0 in iunique. It's not needed.

Comment 28 RHEL Program Management 2006-12-15 14:47:25 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 29 Jay Turner 2006-12-18 14:14:34 UTC
QE ack for 3.9.

Comment 30 Ernie Petrides 2006-12-21 02:58:51 UTC
A fix for this problem has just been committed to the RHEL3 U9
patch pool this evening (in kernel version 2.4.21-47.4.EL).


Comment 36 Red Hat Bugzilla 2007-06-11 17:57:30 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2007-0436.html