The kernel included with RHEL has a patch called "linux-2.6-nfs-64-bit-inode-support.patch" which makes the NFS client expose the large inode numbers it gets from the server. The addition of this patch is based on the flawed assumption that getting such large inode numbers from the server is rare and/or that user space has no problem with large inode numbers.

Assumption 1: The NFS server included in Linux uses the local inode number as the file handle, which usually means a low number. This assumption breaks if the underlying file system does not have inodes (e.g. FAT) and the system generates them in any way other than counting from the bottom. The assumption is also flawed because there are many other NFS servers, e.g. the unfs3 server, unfsd, Microsoft's NFS server, Novell's NFS server, etc. At least the unfs* ones regularly give out large file handles (unfs3 works like the Linux kernel server on Unix systems these days, but when run on Windows it always gives out large inode numbers).

Assumption 2: glibc always calls stat64() and similar, meaning the kernel never has to return -EOVERFLOW. But glibc has its own logic for dealing with the 64->32 conversion, so applications calling stat[32]() will fail.

This is a very serious problem for us and our customers, who currently have to build custom kernels. Since packages such as KDE and OpenOffice.org are built without large file support, it is more or less impossible to run the standard kernel on a production system where NFS access is needed. User space is not yet ready for large inodes, so please revert this patch.
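For anyone who wants to reproduce the failure, here is a minimal sketch; the mount path is a placeholder, and it assumes a 32-bit build without -D_FILE_OFFSET_BITS=64 against a file whose fileid exceeds 32 bits:

#define _LARGEFILE64_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

int main(void)
{
    /* Placeholder path: a file on an NFS mount whose server-side
     * fileid does not fit in 32 bits. */
    const char *path = "/mnt/nfs/somefile";

    struct stat st;
    if (stat(path, &st) == -1)
        /* glibc reports EOVERFLOW here even though the underlying
         * stat64 syscall succeeded, as seen in strace. */
        printf("stat: %s\n", strerror(errno));

    struct stat64 st64;
    if (stat64(path, &st64) == 0)   /* the LFS call sees all 64 bits */
        printf("stat64: ino=%llu\n", (unsigned long long) st64.st_ino);
    return 0;
}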
I can do nothing but join the appeal for the patch to be reverted. In our environment we have RHEL 5 clients as desktops, and building custom kernels and managing updates for these is not what we expected when we chose Red Hat as the distribution for our environment.
I'll see what I can do. Note, however, that this patch fixes another bug whereby ld.so breaks horribly because it sees two objects as being the same because they have the same device numbers, and, *apparently*, the same inode numbers. Whilst ld.so does check the 64-bit inode numbers, it is still vulnerable to anything that compresses the 64-bit inode number space down to 32 bits.

Can you give more details on how KDE and OpenOffice fail?

I can see a couple of ways offhand of dealing with this in the immediate term in the kernel:

(1) Control the use of this feature with a mount option or some other option.

(2) Drop the second half of the patch entirely (relating to NFS), but replace the NFS inode number hashing function with something that produces numbers that are less likely to conflict (see the sketch below).

I'll discuss this with various people.
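On point (2), a sketch of what I mean: the current 64->32 squeeze is essentially an XOR fold (paraphrased from the client's nfs_fileid_to_ino_t), and a multiplicative mix would scatter structured fileids more evenly. Illustrative only, not a committed design:

#include <stdint.h>

/* Paraphrase of the existing fold: XOR the top half into the bottom. */
static inline uint32_t fileid_fold(uint64_t fileid)
{
    return (uint32_t) fileid ^ (uint32_t) (fileid >> 32);
}

/* Illustrative alternative: a 64->32 multiplicative mix (constant
 * derived from the golden ratio), which spreads structured fileids
 * more evenly across the 32-bit space. */
static inline uint32_t fileid_hash(uint64_t fileid)
{
    return (uint32_t) ((fileid * 0x9e3779b97f4a7c15ULL) >> 32);
}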
(In reply to comment #2)
> Can you give more details on how KDE and OpenOffice fail?

They just say "unable to read file" and give some nonsense about permissions. "cat" and other tools work fine, so that is complete bollocks. strace'ing the thing shows a couple of stat64() calls as the last operations before failure. They of course succeed, so the problem is deeper, inside glibc's stat() function. The problem is easily attributed to inodes, as two identical files, differing only in inode number, cause different behaviour.
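What happens inside glibc is roughly the following; this is a paraphrase of the 64->32 conversion in sysdeps/unix/sysv/linux/xstatconv.c with simplified stand-in struct names, not the verbatim source:

#include <errno.h>
#include <stdint.h>

/* Minimal stand-ins for the kernel and user structs (illustrative). */
struct kstat64 { uint64_t st_ino; /* ... */ };
struct stat32  { uint32_t st_ino; /* ... */ };

/* Paraphrase of glibc's 64->32 stat conversion: the stat64 syscall
 * succeeds, but narrowing an inode number that does not fit fails. */
static int xstat32_conv(const struct kstat64 *kbuf, struct stat32 *buf)
{
    buf->st_ino = (uint32_t) kbuf->st_ino;
    if (buf->st_ino != kbuf->st_ino) {
        errno = EOVERFLOW;  /* this is what the application's stat() sees */
        return -1;
    }
    /* ... remaining fields are copied similarly ... */
    return 0;
}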
>Note, however, that this patch fixes another bug whereby ld.so breaks
>horribly because it sees two objects as being the same because they have
>the same device numbers, and, *apparently*, the same inode numbers.
>Whilst ld.so does check the 64-bit inode numbers, it is still vulnerable
>to anything that compresses the 64-bit inode number space down to 32 bits.

Can you give us a bit more information on this problem? Do you have a Bugzilla reference?
Bug 171702 pertains to the RHEL4 version of this bug. I don't know whether you'll be able to see most of the text as it's mostly private. The bug was originally detected with AFS, but they also reproduced it with NFS (bug 171702 comment 11), as have I. Bug 202461 pertains to the RHEL5 version of this bug.

To sum the problem up, the original report stated that "_dl_map_object_from_fd from dl-load.c only uses st_ino and st_dev to determine if a library is already loaded". With AFS, the OpenAFS client produced a 64-bit virtual inode number from a combination of vnode ID and key that represented the volume the vnode resided in. With NFSv3+, the server can issue a 64-bit file ID. In both cases, these were being munged into 32-bit quantities by the VFS on a 32-bit kernel, resulting in inode number collisions. The ELF loader then malfunctioned when it was asked to load two separate libraries that had the same apparent inode number. It incorrectly *thought* that the two libraries were the same as they had identical dev ID and inode numbers, when in fact they weren't.
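The check in question boils down to something like this; a paraphrase of _dl_map_object_from_fd in glibc's elf/dl-load.c, with the link map simplified:

#define _LARGEFILE64_SOURCE
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Simplified stand-in for glibc's link map entry. */
struct link_map_lite {
    dev_t l_dev;
    ino64_t l_ino;
    struct link_map_lite *l_next;
};

/* Paraphrase of the duplicate check: a new object whose (dev, ino)
 * pair matches an already loaded one is assumed to *be* that object.
 * Fileids folded from 64 to 32 bits can make this match falsely. */
static struct link_map_lite *find_loaded(struct link_map_lite *head,
                                         const struct stat64 *st)
{
    for (struct link_map_lite *l = head; l != NULL; l = l->l_next)
        if (l->l_dev == st->st_dev && l->l_ino == st->st_ino)
            return l;   /* treated as "already mapped" */
    return NULL;
}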
I'm unable to access any of those bugs, but I think I understand the problem from your description. I believe the current solution is not very good, though. There are at least two other solutions:

1) Keep the kernel patch, so that the entire 64 bits are passed to glibc, but patch glibc so that it can do the truncation instead of the kernel. This would make it possible to expose all 64 bits to LFS-aware applications, but truncate to 32 bits for non-LFS applications calling stat. A glibc solution would also allow runtime configuration, perhaps using an environment variable. An additional advantage of this solution is that it would allow 32-bit non-LFS apps to run on 64-bit kernels with large fileids, which is currently not possible (stat gives EOVERFLOW). One problem might be that stat gives different inode numbers depending on whether LFS is used or not, but I don't think this is a major one.

2) Revert linux-2.6-nfs-64-bit-inode-support.patch. This has the advantage that the RH kernel would behave like the standard one. Instead, ld.so could be modified to take some extra care. For example, instead of just checking st_dev and st_ino, it could also check, say, st_size and st_mtime. Or even a hash of some portion of the library. (See the sketch below.)

We won't achieve perfection until everyone is using LFS or 64-bit kernels and applications, so we need some kind of "good enough" solution until that happens. But having *every* non-LFS application fail upon large NFS fileids, even on 32 bit kernels, is not good enough, IMHO. It could be argued that it would then be safer/more deterministic to remove support for non-LFS applications altogether; it's not nice to have applications that start failing "someday", just because the file server volume has grown beyond a certain point, or perhaps due to an NFS server software upgrade.

NFSv3, which added support for 64 bit fileids, is 12 years old now; we shouldn't be satisfied with a solution that doesn't support this.
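To make option 2 concrete, the extra care in ld.so could be an identity test along these lines; purely a sketch, and the field choice is illustrative:

#include <stdbool.h>
#include <sys/stat.h>

/* Illustrative identity test for option 2: treat two files as the
 * same object only if dev, ino, size and mtime all agree.  Truncated
 * inode numbers alone can collide; all four fields rarely will. */
static bool same_object(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev
        && a->st_ino == b->st_ino
        && a->st_size == b->st_size
        && a->st_mtime == b->st_mtime;
}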
I'd like a little more information, please. Most of the NFS servers which were mentioned in the first comment are not commonly used servers. The Novell NFS server may be more commonly used, but whether it returns 64 bit fileids is really dependent upon the underlying file system on the server. What file system is it, and is there a minimum size it needs to be before returning fileids which do not fit into 32 bits? Is there a way to turn off the 64 bit fileids in any of the mentioned NFS servers?

By the way, if the Novell NFS server is returning fileids that don't fit into 32 bits, then the mentioned applications won't work on the Novell system either. They will be encountering these files locally.
(I'm answering also on behalf of the original reporter Pierre Ossman.)

>Most of the NFS servers which were mentioned in the first comment are not
>commonly used servers.

True, most Linux clients connect to other servers, such as the Linux server. So if you just look at the percentage figures, these servers are rare. In absolute figures, however, this may affect many systems and users. For example, the UNFS3 server is used in HPC installations, the ThinLinc product and embedded systems. I believe the use of the NFSv3 protocol will increase further along with the adoption of Linux on the desktop, and interoperability with Novell and Microsoft systems is getting more and more important, don't you think?

>The Novell NFS server may be more commonly used, but whether
>it returns 64 bit fileids is really dependent upon the underlying file
>system on the server.

Actually, Novell NFS servers are not very common, due to bugs and technical restrictions. We have a great deal of experience with Linux+Novell integration, and our conclusion is that NCP/ncpfs is the more stable, and more common, solution. (As a side note, see bug 235074.) But that doesn't mean that we should make the Linux NFS client less tolerant or less capable; who knows, the next version of Netware might have an excellent NFS implementation?

>...but whether it returns 64 bit fileids is really dependent upon the
>underlying file system on the server.

Do you have any evidence for this? This is how the Linux NFS server works, but it is not a requirement of the NFSv3 protocol; servers are free to generate arbitrary fileids. Perhaps Netware NFS servers always return small fileids, or perhaps they always return large fileids, or perhaps large fileids are only returned for files with names starting with "_", or perhaps it depends on the file system. Or it might depend on which Service Pack is installed. We don't know, and the point is that we shouldn't need to know: the client should be able to handle large fileids in any case.

>Is there a way to turn off the 64 bit fileids in any of the
>mentioned NFS servers?

I don't know about the Netware or MS servers, but for unfs3, there is currently no such option. In principle, I could add such an option, but that would have several drawbacks:

1) We would still have the interoperability problem with all currently deployed versions. (For example, since the unfs3 server is embedded in the ThinLinc client, it has been installed on something like thousands of client systems.)

2) The Windows version of unfs3 generates fileids by hashing. Restricting fileids to 32 bits would greatly increase the risk of collisions (see the sketch below).

3) It wouldn't make sense to give out different fileids to different clients, so an option for truncating to 32 bits would probably have to be a server-global option. That would mean that *all* clients would see truncated fileids, even 64 bit kernels with 64 bit applications fully capable of handling large fileids.

Handling large fileids is only a problem for 32 bit non-LFS applications. In short, it's a client problem. Solving a client problem on the server seems fundamentally wrong to me.

>By the way, if the Novell NFS server is returning fileids that don't
>fit into 32 bits, then the mentioned applications won't work on the
>Novell system either. They will be encountering these files locally.

Again, nothing requires that NFS servers return the file inode number as the fileid. Many servers don't even have a concept of inode numbers.
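To illustrate drawback 2: a hash-based fileid scheme looks something like the following (FNV-1a here purely for illustration; unfs3's actual algorithm may differ). By the birthday bound, a 32-bit hash has roughly even odds of a collision somewhere once you have ~77,000 files, whereas a 64-bit hash does not reach that point until billions of files.

#include <stdint.h>

/* Illustrative only: derive a 64-bit fileid by hashing the path
 * (FNV-1a); unfs3's real scheme may differ.  Truncating the result
 * to 32 bits sharply raises the chance that two paths collide. */
static uint64_t fileid_from_path(const char *path)
{
    uint64_t h = 0xcbf29ce484222325ULL;        /* FNV offset basis */
    for (; *path != '\0'; path++) {
        h ^= (uint64_t)(unsigned char) *path;
        h *= 0x100000001b3ULL;                 /* FNV prime */
    }
    return h;
}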
The Netware NOS does not provide a POSIX API (that I'm aware of), so saying that "applications won't work locally either" doesn't really make sense.

Let's take a step back. The purpose of this patch is to give 32 bit LFS applications access to all 64 bits. This is good. However, the drawback is that we'll get EOVERFLOW in non-LFS apps. This is something bad, worse than the truncation we had before, right?

So, the question is: is the drawback so minor that we can ignore it (and thus keep this patch)? I'm arguing that it's not. Or are the positive effects of this patch so minor that we can ignore them, thus dropping this patch? Trond is arguing that this is not the case.

I think we'll need a runtime option, so that the behaviour is configurable. Implementing this should be easy.
Let's be clear here, please. This is not an NFS bug nor an NFS issue. The situation is not unique to NFS. This is an application problem, and it is the applications which should be fixed. I asked about the server options because, historically, that's where it has been worked around.

Without more specific information on real-life situations where critical applications have malfunctioned, I will need to close this as WONTFIX. There is no technical sense in which NFS is broken. The real question is what the business impact is, and I think the situation is being greatly overstated. Yes, Windows and Linux interoperability is becoming more common, but not with NFS; CIFS is the file system of choice there.

I also haven't heard about any specific servers which actually return 64 bit fileids. What are they, and why do they use the large fileids? And if that is valuable, then why is it also not valuable in this situation?
>This is not an NFS bug nor an NFS issue. The situation is not unique to NFS.

True, but I would say that NFS suffers the most from it. This problem of squeezing 64 bits into 32 has been around since NFSv3 was created 12 years ago, but during this time, very few local file systems have supported 64 bit inode numbers.

>This is an application problem, and it is the applications which should be
>fixed.

Good, then we have consensus about this NOT being an NFS server problem.

>I asked about the server options because, historically, that's where
>it has been worked around.

No, historically, the Linux NFSv3 client has truncated. As far as I know, it has done so from the beginning.

>Yes, Windows and Linux interoperability is becoming more common, but not with
>NFS; CIFS is the file system of choice there.

You cannot have $HOME on CIFS, since it doesn't support POSIX semantics. Btw, an addition to the latest Windows Server version (Windows 2003 R2) was SFU, which contains the MS NFS server.

>I also haven't heard about any specific servers which actually
>return 64 bit fileids. What are they, and why do they use the
>large fileids?

As I said, unfs3 uses large fileids in some versions. Earlier versions used them because unfs3 supports exporting multiple file systems under a single export point; to avoid inum collisions, the device number was put into the high bits of the fileid (sketched below). Recent Linux versions can automatically create new mounts upon fsid crossing, so we have changed recent unfs3 versions to return the real inum instead. But older versions are still around, and the Windows port, as I said, creates inums by hashing paths.

>Without more specific information on real-life situations where
>critical applications have malfunctioned, I will need to close this
>as WONTFIX.

I thought I was pretty specific, mentioning HPC installations, ThinLinc servers and embedded systems... In any case, you argue that the applications need to be fixed, and we have identified two applications in RHEL5 that need to be fixed, yet you plan to close this as WONTFIX? Why, do you want us to open new bug reports for each identified broken application instead?

Still, I'm arguing that we should have a runtime kernel option. When things that have been working for 10 years suddenly stop working, you have a regression.
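For clarity, the earlier unfs3 scheme described above amounts to this; a sketch from the description, not the actual source:

#include <stdint.h>

/* Sketch of the earlier unfs3 scheme as described above: keep fileids
 * unique across multiple exported file systems by placing the device
 * number in the high 32 bits.  Any such fileid needs all 64 bits. */
static uint64_t make_fileid(uint32_t dev, uint32_t ino)
{
    return ((uint64_t) dev << 32) | ino;
}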
Yes, I want you to open bugzillas on the broken applications. This is not an NFS bug. These same issues occurred when large file support was implemented and large files started appearing. The applications that needed to get fixed were fixed. Now we find that more applications need to be fixed, because the world has changed around them.

The right solution is not to add complexity to the NFS implementation in the Linux kernel. The right solution is to fix the applications which need fixing.
>Yes, I want you to open bugzillas on the broken applications.

Ok, we will do that.

>This is not an NFS bug. These same issues occurred when large file
>support was implemented and large files started appearing. The
>applications that needed to get fixed were fixed. Now we find
>that more applications need to be fixed, because the world has
>changed around them.

It's not an NFS bug; it's an NFS/kernel incompatibility problem. A flexible, capable and configurable implementation could still support old apps, and thus avoid changing the world around them and breaking things.

I don't think the comparison with large files makes sense: in that case, you could still use the applications as long as you didn't try to handle large files. In this case, you can't even do a single stat(), even if you aren't interested in the st_ino value. That's quite severe.

>The right solution is not to add complexity to the NFS implementation
>in the Linux kernel.

An option for truncating would be simple, probably a few lines of code. It could even be added to the VFS layer instead of the NFS client, to make it handle the case with local file systems as well.
Why do you keep thinking that the mere existence of 64 bit ino support in things like NFS automatically means that no non-LFS applications will work? Just because the server _can_ generate 64 bit fileids does not mean that it _will_. I am not aware of any NFS servers shipping in any volume that generate 64 bit fileids without some form of workaround.

A few lines here, a few lines there, and the code gets more and more complicated. More options equals more complexity. The system should just do the right thing, and if that means fixing applications which don't do the right thing, then so be it. Perhaps they worked when they were first designed and implemented, but the world has changed, and so must they. It is time to stop hobbling the NFS client and the system.
Ok, let's say that servers that return 64 bit fileids are basically non-existent. In that case, why are you pushing this patch at all? It won't make a difference if servers keep returning small fileids, and in that case, the earlier implementation worked just fine.

>but the world has changed

Not yet, but you are trying to change it for a reason I don't really understand.

Btw, which kind of server and file system was used in https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=171702#c11? Apparently, whoever wrote that comment had access to a server which generated large fileids...

I fail to understand why this quest for a 100% correct system that "does the right thing" for LFS applications is more important than preserving backwards compatibility, especially when backwards compatibility is so cheap and easy. If support for non-LFS applications is worth nothing, then why not remove the 32 bit stat() system call altogether?
If this patch is such an important thing, and the right way to go is to fix up all non-LFS applications, then this can be done without forcing all users to become beta testers:

1. Remove the non-LFS calls from glibc.
2. Do a full repository rebuild.
3. Fix all applications that fail to rebuild because of missing symbols.

Silently and randomly breaking things in a way that customers have to discover in the field is not the correct approach. It's also rather odd that such a disruptive patch is added to the supposedly stable RHEL, and not to Fedora.
To respond to comment #14: I said that the set of servers which might respond with 64 bit fileids was small. I didn't say it was non-existent. I also have another bugzilla, 213518, where the 64 bit ino support is required. From that bugzilla, there is a clear technical and business requirement for the support.

Yes, the world has changed. 64 bit ino fields are a reality and have been ever since the LFS architecture was designed.

You are greatly underestimating the cost of maintaining the backwards compatibility. Once implemented, we can _never_ get rid of it. We are stuck supporting it, forever.

Please stay constructive. I don't think that sarcasm or unhelpful suggestions will help.

To respond to comment #15: please read the comments regarding the set of NFS servers that will respond with 64 bit fileids. Also please note that the client side patch is already upstream, and the server side patch has been submitted once and will be submitted again soon.

---

I have a customer, a large customer, with a clear technical and business case. I haven't seen anything to contradict that, except for broken applications that could have been fixed years ago, which may or may not break, and probably won't for the majority of our customers.

My suggestion would be that if an application is noticed that fails to work in some specific configuration, then please file a bugzilla against that application so that it can be fixed.
>I also have another bugzilla, 213518, where the 64 bit ino support is
>required. From that bugzilla, there is a clear technical and business
>requirement for the support.

Like most other Bugzilla references in this bug, this one is inaccessible. If there are additional arguments for this patch, please show them to us. We cannot draw conclusions from inaccessible bugzilla entries.

>You are greatly underestimating the cost of maintaining the backwards
>compatibility. Once implemented, we can _never_ get rid of it. We
>are stuck supporting it, forever.

In my opinion, we got stuck 10-12 years ago, when the decision was made to truncate large inums. As you say, once you have something in place, you should keep it.

>I have a customer, a large customer, with a clear technical and
>business case.

Please tell us about it.

>I haven't seen anything to contradict that, except for broken applications
>that could have been fixed years ago,

How were application developers supposed to know that their apps were broken? There have been no warning messages, neither from the kernel nor from the toolchains. As before, I do not believe that 32 bit non-LFS apps are "broken"; they just use a different, slightly older ABI.
Could you consider adding the backward compatibility flag Trond added?

http://linux-nfs.org/cgi-bin/gitweb.cgi?p=nfs-2.6.git;a=commit;h=f43bf0bebed7c33b698a8a25f95812f9e87c3843
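For reference, that commit appears to gate the fold behind an nfs.enable_ino64 module parameter, along these lines (a paraphrase from the commit, not verbatim):

#include <linux/types.h>

/* Paraphrase of the compatibility fold from the referenced commit
 * (fs/nfs/inode.c).  enable_ino64 is the module parameter it adds;
 * clearing it restores the old 32-bit folding behaviour. */
static int enable_ino64 = 1;

u64 nfs_compat_user_ino64(u64 fileid)
{
    int ino;

    if (enable_ino64)
        return fileid;          /* expose the full 64-bit fileid */
    ino = fileid;
    if (sizeof(ino) < sizeof(fileid))
        ino ^= fileid >> (sizeof(fileid) - sizeof(ino)) * 8;
    return ino;                 /* legacy fold for 32-bit userland */
}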
That support is included in the proposed patch. *** This bug has been marked as a duplicate of 253589 ***
This has been addressed in http://rhn.redhat.com/errata/RHBA-2008-0314.html