Bug 241348

Summary:	large inode number patch breaks lots of applications
Product:	Red Hat Enterprise Linux 5	Reporter:	Pierre Ossman <pierre-bugzilla>
Component:	kernel	Assignee:	Peter Staubach <staubach>
Status:	CLOSED DUPLICATE	QA Contact:	Martin Jenner <mjenner>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	5.0	CC:	astrand, dhowells, john.johansson, k.georgiou, steved
Target Milestone:	---	Keywords:	Reopened
Target Release:	---
Hardware:	All
OS:	Linux
Fixed In Version:		Doc Type:	Bug Fix
Last Closed:	2007-12-04 13:58:06 UTC	Type:	---

Description Pierre Ossman 2007-05-25 11:19:34 UTC

The kernel included with RHEL has a patch called
"linux-2.6-nfs-64-bit-inode-support.patch" which makes the NFS client expose the
large inode numbers it gets from the server.

The addition of this patch is based on the flawed assumption that getting such
large inode numbers from the server is rare and/or that user space doesn't have
a problem with large inode numbers.

Assumption 1:

The NFS server included in Linux uses the local inode number as the fileid,
which usually means a low number. The assumption breaks if the underlying
file system does not have inodes (e.g. FAT) and the system generates them in
any way other than counting up from a low number.

The assumption is also flawed as there are many other NFS servers, e.g. the unfs3
server, unfsd, Microsoft's NFS server, Novell's NFS server, etc.
At least the unfs* ones regularly give out large fileids (unfs3 works like
the Linux kernel server on Unix systems these days, but when run on Windows it
always gives out large inode numbers).

Assumption 2:

glibc always calls stat64() and similar, meaning the kernel never has to
return -EOVERFLOW. But glibc has its own logic for the 64->32 bit conversion,
so applications calling the 32-bit stat() will fail.

This is a very serious problem for us and our customers, who currently have to
build custom kernels. Since the packages of e.g. KDE and OpenOffice.org are not
built with large file support, it's more or less impossible to run the standard
kernel on a production system where NFS access is needed. User space is not yet
ready for large inode numbers, so please revert this patch.

Comment 1 John Johansson 2007-05-28 09:51:17 UTC

I can do nothing but join the appeal for the patch to be reverted. In our
environment we use RHEL 5 Client on the desktop, and building custom kernels and
managing updates for these is not what we had expected when we chose Red Hat
as the distribution for our environment.

Comment 2 David Howells 2007-06-01 13:54:54 UTC

I'll see what I can do.  Note, however, that this patch fixes another bug 
whereby ld.so breaks horribly because it sees two objects as being the same 
because they have the same device numbers, and, *apparently*, the same inode 
numbers.  Whilst ld.so does check the 64-bit inode numbers, it is still 
vulnerable to anything that compresses 64-bit inode number space down to 
32-bit.

Can you give more details on how KDE and OpenOffice fail?

I can see a couple of ways offhand of dealing with this in the immediate term 
in the kernel:

 (1) Control the use of this feature with a mount option or some other option.

 (2) Drop the second half of the patch entirely (relating to NFS), but replace 
the NFS inode number hashing function with something that produces numbers 
that are less likely to conflict.

I'll discuss this with various people.

Comment 3 Pierre Ossman 2007-06-05 06:42:06 UTC

(In reply to comment #2)
> 
> Can you give more details on how KDE and OpenOffice fail?
> 

They just say "unable to read file" and give some nonsense about permissions.
"cat" and other tools work fine, so that is complete bollocks.

Running it under strace shows a couple of stat64() calls as the last operations
before the failure. They succeed, of course, so the problem lies deeper, inside
glibc's stat() function.

The problem is easily attributed to inode numbers: two files that are identical
except for their inode numbers cause different behaviour.

Comment 4 Peter Åstrand 2007-06-07 21:54:43 UTC

>Note, however, that this patch fixes another bug 
>whereby ld.so breaks horribly because it sees two objects as being the same 
>because they have the same device numbers, and, *apparently*, the same inode 
>numbers.  Whilst ld.so does check the 64-bit inode numbers, it is still 
>vulnerable to anything that compresses 64-bit inode number space down to 
>32-bit.

Can you give us a bit more of information on this problem? Do you have a
Bugzilla reference?

Comment 5 David Howells 2007-06-08 09:57:02 UTC

Bug 171702 pertains to the RHEL4 version of this bug.  I don't know whether 
you'll be able to see most of the text as it's mostly private.  The bug was 
originally detected with AFS, but they also reproduced it with NFS (bug 171702 
comment 11), as have I.

Bug 202461 pertains to the RHEL5 version of this bug.

To sum the problem up, the original report stated that "_dl_map_object_from_fd 
from dl-load.c only uses st_ino and st_dev to determine if a library is 
already loaded".

With AFS, the OpenAFS client produced a 64-bit virtual inode number from a 
combination of vnode ID and key that represented the volume the vnode resided 
in.  With NFSv3+, the server can issue a 64-bit file ID.  In both cases, these 
were being munged into 32-bit quantities by the VFS on a 32-bit kernel, 
resulting in inode number collisions.

The ELF loader then malfunctioned when it was asked to load two separate 
libraries that had the same apparent inode number.  It incorrectly *thought* 
that the two libraries were the same as they had identical dev ID and inode 
numbers, when in fact they weren't.

Comment 6 Peter Åstrand 2007-06-18 14:43:30 UTC

I'm unable to access any of those bugs, but I think I understand the problem
from your description. I believe the current solution is not very good, though.
There are at least two other solutions:

1) Keep the kernel patch, so that the entire 64 bits are passed to glibc, but
patch glibc so that it can do the truncation instead of the kernel. This would
make it possible to expose all 64 bits to LFS-aware applications, but truncate
to 32 bits for non-LFS applications calling stat. A glibc solution would also
allow runtime configuration, perhaps using an environment variable. 

An additional advantage of this solution is that it would allow 32-bit non-LFS
apps to run on 64-bit kernels with large fileids, which is currently not
possible (stat gives EOVERFLOW). One problem might be that stat gives different
inode numbers depending on whether LFS is used or not, but I don't think this is
a major one. 


2) Revert linux-2.6-nfs-64-bit-inode-support.patch. This has the advantage that
the RH kernel would behave like the vanilla one. Instead, ld.so could be modified
to take some extra care. For example, instead of just checking st_dev and st_ino,
it could also check, say, st_size and st_mtime, or even a hash of some portion
of the library. 

We won't achieve perfection until everyone is using LFS or 64-bit kernels and
applications, so we need some kind of "good enough" solution until this
happens. But having *every* non-LFS application fail upon large NFS fileids,
even on 32 bit kernels, is not good enough, IMHO. It could be argued that it
would be safer and more deterministic to remove support for non-LFS applications
altogether, then; it's not nice to have applications that start failing
"someday", just because the file server volume has grown beyond a certain point,
or perhaps due to an NFS server software upgrade. 

NFSv3, which added support for 64 bit fileids, is 12 years old now; we shouldn't
be satisfied with a solution that doesn't support this.

Comment 7 Peter Staubach 2007-08-15 14:06:58 UTC

I'd like a little more information, please.  Most of the NFS servers
which were mentioned in the first comment are not commonly used
servers.  The Novell NFS server may be more commonly used, but whether
it returns 64 bit fileids is really dependent upon the underlying file
system on the server.  What file system is it and is there a minimum
size that it needs to be before returning fileids which do not fit into
32 bits?  Is there a way to turn off the 64 bit fileids in any of the
mentioned NFS servers?

By the way, if the Novell NFS server is returning fileids that don't
fit into 32 bit, then the mentioned applications won't work on the
Novell system either.  They will be encountering these files locally.

Comment 8 Peter Åstrand 2007-08-15 19:24:23 UTC

(I'm answering also on behalf of the original reporter Pierre Ossman.) 

>Most of the NFS servers which were mentioned in the first comment are not
>commonly used servers.

True, most Linux clients connect to other servers, such as the Linux server. So
if you just look at the percentages, these servers are rare. In absolute
numbers, however, this may affect many systems and users. For example, the UNFS3
server is used in HPC installations, the ThinLinc product and embedded systems.
I believe the use of the NFSv3 protocol will increase further along with the
adoption of Linux on the desktop, and interoperability with Novell and Microsoft
systems is getting more and more important, don't you think?


>The Novell NFS server may be more commonly used, but whether
>it returns 64 bit fileids is really dependent upon the underlying file
>system on the server.

Actually, Novell NFS servers are not very common, due to bugs and technical
restrictions. We have a great deal of experience with Linux+Novell integration
and our conclusion is that NCP/ncpfs is the more stable, and common, solution.
(As a side note, see bug 235074). But that doesn't mean that we should make the
Linux NFS client less tolerant or less capable; who knows, the next version of
Netware might have an excellent NFS implementation? 


>...but whether it returns 64 bit fileids is really dependent upon the
>underlying file system on the server.

Do you have any evidence for this? That is how the Linux NFS server works, but it
is not a requirement of the NFSv3 protocol; servers are free to generate
arbitrary fileids. Perhaps Netware NFS servers always return small fileids, or
perhaps they always return large fileids, or perhaps large fileids are only
returned for files with names starting with "_", or perhaps it depends on the
file system. Or it might depend on which Service Pack is installed. We don't
know, and the point is that we shouldn't need to know: the client should be able
to handle large fileids in any case. 


>Is there a way to turn off the 64 bit fileids in any of the
>mentioned NFS servers?

I don't know about the Netware or MS servers, but for unfs3, there is currently
no such option. In principle, I could add such an option, but that would have
several drawbacks:

1) We would still have the interoperability problem with all currently deployed
versions. (For example, since the unfs3 server is embedded in the ThinLinc
client, it has been installed on something like thousands of client systems.)

2) The Windows version of unfs3 generates fileids by hashing. Restricting
fileids to 32 bits would greatly increase the risk of collisions. 

3) It wouldn't make sense to give out different fileids to different clients, so
an option for truncating to 32 bits should probably be a server global option.
That would mean that *all* clients would see truncated fileids, even 64 bit
kernels with 64 bit applications, fully capable of large fileids. 

Handling large fileids is only a problem for 32 bit non-LFS applications. In
short, it's a client problem. Solving a client problem on the server seems
fundamentally wrong to me. 


>By the way, if the Novell NFS server is returning fileids that don't
>fit into 32 bit, then the mentioned applications won't work on the
>Novell system either.  They will be encountering these files locally.

Again, nothing requires that NFS servers should return the file inode number as
the fileid. Many servers don't even have a concept of inode numbers. 

The Netware NOS does not provide a POSIX API (that I'm aware of), so saying that
"applications won't work locally either" doesn't really make sense. 


Let's take a step back. The purpose of this patch is to give 32 bit LFS
applications access to all 64 bits. This is good. However, the drawback is that
we'll get EOVERFLOW in non-LFS apps. This is bad, worse than the truncation
we had before, right? So, the question is: is the drawback so minor that we
can ignore it (thus keeping this patch)? I'm arguing that it's not.
Or, are the positive effects of this patch so minor that we can ignore them,
thus dropping this patch? Trond is arguing that this is not the case. 

I think we'll need a runtime option, so that the behaviour is configurable.
Implementing this should be easy.

Comment 9 Peter Staubach 2007-08-15 20:21:59 UTC

Let's be clear here, please.  This is not an NFS bug nor an NFS issue.
The situation is not unique to NFS.  This is an application problem
and it is the applications which should be fixed.

I asked about the server options because, historically, that's where
it has been worked around.

Without more specific information on real-life situations where
critical applications have malfunctioned, I will need to close this
as WONTFIX.  There is no technical reason that NFS is broken.  The
real question is what the business impact is, and I think the
situation is being greatly overstated.  Yes, Windows and Linux
interoperability is becoming more common, but not with NFS.  CIFS
is the file system choice there.

I also haven't heard about any specific servers which actually
return 64 bit fileids.  What are they and why do they use the
large fileids and if that is valuable, then why is it also not
valuable in this situation?

Comment 10 Peter Åstrand 2007-08-16 08:33:25 UTC

>This is not an NFS bug nor an NFS issue. The situation is not unique to NFS.

True, but I would say that NFS is suffering the most from it. This problem of
squeezing 64 bits into 32 has been around since NFSv3 was created 12 years ago,
but during this time, very few local file systems have supported 64 bit inode
numbers. 


>This is an application problem and it is the applications which should be fixed.

Good, then we have consensus about this NOT being a NFS server problem. 


>I asked about the server options because, historically, that's where
>it has been worked around.

No, historically, the Linux NFSv3 client has truncated. As far as I know, it has
done so from the beginning. 


>Yes, Windows and Linux interoperability is becoming more common, but not with
>NFS.  CIFS is the file system choice there.

You cannot have $HOME on CIFS, since it doesn't support POSIX semantics. Btw, an
addition to the latest Windows Server version (Windows 2003R2) was SFU, which
contains the MS NFS server. 


>I also haven't heard about any specific servers which actually
>return 64 bit fileids.  What are they and why do they use the
>large fileids and if that is valuable, then why is it also not
>valuable in this situation?

As I said, unfs3 uses large fileids in some versions. Earlier versions used
them because unfs3 supports exporting multiple file systems under a single
export point. To avoid inum collisions, the device number was put into the high
bits of the fileid. In recent Linux versions, there is now support for
automatically creating new mounts upon fsid crossing. Because of this, we have
also changed recent unfs3 versions to return the real inum instead. But older
versions are still around, and the Windows port, as I said, is creating inums
from hashing paths. 


>Without more specific information on real life situation where
>critical applications have malfunctioned, I will need to close this
>as WONTFIX.

I thought I was pretty specific, mentioning both HPC installations, ThinLinc
servers and embedded systems...

In any case, you argue that the applications need to be fixed, and we have
identified two applications in RHEL5 that need to be fixed, yet you plan to
close as WONTFIX? Why, do you want us to open new bug reports for each
identified broken application instead?

Still, I'm arguing that we should have a runtime kernel option. When things that
have been working for 10 years suddenly stop working, you have a regression.

Comment 11 Peter Staubach 2007-08-16 12:23:41 UTC

Yes, I want you to open bugzillas on the broken applications.

This is not an NFS bug.  These same issues occurred when large file
support was implemented and large files started appearing.  The
applications that needed to get fixed were fixed.  Now, we find
that more applications need to be fixed because the world has
changed around them.

The right solution is not to add complexity to the NFS implementation
in the Linux kernel.  The right solution is to fix the applications
which need fixing.

Comment 12 Peter Åstrand 2007-08-16 13:02:29 UTC

>Yes, I want you to open bugzillas on the broken applications.

Ok, we will do that. 


>This is not an NFS bug.  These same issues occurred when large file
>support was implemented and large files started appearing.  The
>applications that needed to get fixed were fixed.  Now, we find
>that more applications need to be fixed because the world has
>changed around them.

It's not an NFS bug, it's a NFS/kernel incompatibility problem. A flexible,
capable and configurable implementation could still support old apps and thus
avoid changing the world around them and breaking things. 

I don't think the comparison with large files makes sense: In that case, you
could still use the applications as long as you didn't try to handle large
files. In this case, you can't even do a single stat(), even if you aren't
interested in the st_ino value. That's quite severe. 


>The right solution is not to add complexity to the NFS implementation
>in the Linux kernel.  

An option for truncating would be simple, probably a few lines of code. It could
even be added to the VFS layer instead of the NFS client, to make it handle the
case with local file systems as well.

Comment 13 Peter Staubach 2007-08-16 13:19:37 UTC

Why do you keep thinking that the mere existence of 64 bit ino support
in things like NFS automatically mean that no non-LFS applications will
work?  Just because the server _can_ generate 64 bit fileids does not
mean that it _will_.  I am not aware of any NFS servers, that ship with
any volume, that generate 64 bit fileids and without some form of
workaround.

A few lines here, a few lines there, and the code gets more and more
complicated.  More options equals more complexity.  The system should
just do the right thing and if that means fixing applications which
don't do the right thing, then so be it.  Perhaps they worked when they
were first designed and implemented, but the world has changed, and so
must they.

It is time to stop hobbling the NFS client and the system.

Comment 14 Peter Åstrand 2007-08-16 14:40:55 UTC

Ok, let's say that servers that return 64 bit fileids are basically
non-existent. In that case, why are you pushing this patch at all? It won't make
a difference if servers keep returning small fileids. In that case, the earlier
implementation worked just fine. 


>but the world has changed

Not yet, but you are trying to change it for a reason I don't really understand. 


Btw, which kind of server and file system was used in 
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=171702#c11? Apparently,
whoever wrote that comment had access to a server which generated large fileids...

I fail to understand why this quest for a 100% correct system that "does the
right thing" for LFS applications is more important than preserving backwards
compatibility, especially when backwards compatibility is so cheap and easy. 

If support for non-LFS applications is worth nothing, then why not remove the 32
bit stat() system call altogether?

Comment 15 Pierre Ossman 2007-08-16 14:59:27 UTC

If this patch is such an important thing, and the right way to go is to fix up
all non-LFS applications, then this can be done without forcing all users to
become beta testers:

1. Remove the non-LFS calls from glibc.

2. Do a full repository rebuild.

3. Fix all applications that fail to rebuild because of missing symbols.

Starting by silently and randomly breaking it in a way that customers will have
to discover in the field is not the correct approach.

It's also rather odd that such a disruptive patch is added to the supposedly
stable RHEL, and not Fedora.

Comment 16 Peter Staubach 2007-08-16 15:16:57 UTC

To respond to comment #14, I said that the set of servers which might
respond with 64 bit fileids was small.  I didn't say it was non-existent.
I also have another bugzilla, 213518, where the 64 bit ino support is
required.  From that bugzilla, there is a clear technical and business
requirement for the support.

Yes, the world has changed.  64 bit ino fields are a reality and have
been so since the LFS architecture was designed.

You are greatly underestimating the cost of maintaining the backwards
compatibility.  Once implemented, we can _never_ get rid of it.  We
are stuck supporting it, forever.

Please stay constructive.  I don't think that the sarcasm or non-useful
suggestions will help.

To respond to comment #15, please read the comments regarding the
set of NFS servers that will respond with 64 bit fileids.  Also
please note that the client side patch is already upstream and the
server side patch has been submitted once and will be submitted
again soon.

---

I have a customer, a large customer, with a clear technical and
business case.  I haven't seen anything to contradict that,
except for broken applications, that could have been fixed for
years, which may or may not even break and probably won't, for
the majority of our customers.

My suggestion would be that if an application is noticed, that
fails to work in some specific configuration, then please file
a bugzilla against that application so that it can be fixed.

Comment 17 Peter Åstrand 2007-08-16 15:31:31 UTC

>I also have another bugzilla, 213518, where the 64 bit ino support is
>required.  From that bugzilla, there is a clear technical and business
>requirement for the support.

As most other Bugzilla references in this bug, this one is inaccessible. If
there are additional arguments for this patch, please show them to us. We cannot
draw conclusions from inaccessible bugzilla entries. 


>You are greatly underestimating the cost of maintaining the backwards
>compatibility.  Once implemented, we can _never_ get rid of it.  We
>are stuck supporting it, forever.

In my opinion, we got stuck 10-12 years ago when the decision was made to
truncate large inums. As you say, once you have something in place, you should
keep it. 


>I have a customer, a large customer, with a clear technical and
>business case.

Please tell us about it. 


>I haven't seen anything to contradict that, except for broken applications,
>that could have been fixed for years, 

How were application developers supposed to know that their apps were broken?
There have been no warning messages, neither from the kernel nor from toolchains.
As I said earlier, I do not believe that 32 bit non-LFS apps are "broken"; they
just use a different, slightly older ABI.

Comment 18 Pierre Ossman 2007-12-04 09:22:20 UTC

Could you consider adding the backward compatibility flag Trond added:

http://linux-nfs.org/cgi-bin/gitweb.cgi?p=nfs-2.6.git;a=commit;h=f43bf0bebed7c33b698a8a25f95812f9e87c3843

Comment 19 Peter Staubach 2007-12-04 13:58:06 UTC

That support is included in the proposed patch.

*** This bug has been marked as a duplicate of 253589 ***

Comment 20 Daniel Riek 2008-05-21 15:36:40 UTC

This has been addressed in 
http://rhn.redhat.com/errata/RHBA-2008-0314.html