+++ This bug was initially created as a clone of Bug #250609 +++ +++ This bug was initially created as a clone of Bug #250594 +++ Escalated to Bugzilla from IssueTracker -- Additional comment from tao on 2007-08-02 07:51 EST -- Description of problem: rpc.mountd crashes with particular netgroup setup How reproducible: [Customers Setup] We have a particular setup for nfs using netgroups on a large clustered NFS server with many home directories. Every user has a netgroup of its own and we use member netgroups to customize exports for each home directory to sets of machines. One particular research groups has machines where over 150 home directories and other NFS shares exported via netgroups are accessible. This crashes rpc.mountd. We have located the bug in function auth_authenticate_internal() in utils/mountd/auth.c in the source for the RHEL4 nfs-utils-1.0.6-80 RPM. Here are lines 77 to 84: <snip> char *n; my_client.m_naddr = 0; my_client.m_addrlist[0] = caller->sin_addr; n = client_compose(caller->sin_addr); *error = unknown_host; if (!n) return NULL; strcpy(my_client.m_hostname, *n?n:"DEFAULT"); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ </snip> The problem is that my_client.m_hostname is only 1024 bytes large, strcpy and not strncpy is used and that the string returned by client_compose() in our case is over 1024 characters long. The call to strcpy corrupts memory. The bug is still present in the upstream source at http://linux-nfs.org/ and thuss affects also RHEL5. This event sent from IssueTracker by sprabhu [SEG - Storage] issue 128322 -- Additional comment from jlayton on 2007-08-02 09:12 EST -- Fixing this will take more than just changing this to strncpy. We could (and maybe should) trivially do that, but we'll still have artificial limits due to the string length. The ideal thing would be to eliminate the need for this concatenated string entirely. Unfortunately, my last effort at this was shot down upstream by Neil B (for good reason -- it didn't work the way it needed to). I'll have look again at this code and see if we can do it in a way that takes care of Neil's objections.
Cloning for fedora to track upstream work...
Created attachment 160915 [details] patch 1 -- make m_hostname a dynamically allocated string This patch just makes it so that m_hostname is a dynamically allocated string, rather than a fixed length array of NFSCLNT_IDMAX.
Created attachment 160916 [details] patch 2 -- make e_hostname a dynamically allocated string This makes the exportent.e_hostname a dynamically allocated string. This needs to be done since we copy m_hostname into it in some places.
Created attachment 160917 [details] patch3 -- make a client_resolve function and have client_compose take a hostent This patch may not be strictly needed, but it should get rid of one set of hostname resolution from the auth_authenticate codepath. Rather than resolving the name in auth_authenticate and then doing it again in client_compose, we just use reuse the first one in client_compose.
These patches should get rid of the userspace limitations. It also adds a bit of optimization. The current code does a lot of string copying that can be avoided, and also should help eliminate some extra hostname resolution. We should be able to have arbitrarily long m_hostname strings with this. I believe, however, we have a limitation on how long this string can be when it's written to the kernel cache. So the big issue is what to do now: 1) hash these strings and use the hashes as cache keys, but that means pulling in a hashing routine (maybe libcrypt.a?). This is also doesn't deal with another problem which is that when using a lot of netgroups, you have to resolve them any that are listed in any export. 2) use Neil B's idea and use the ipaddr as a cache key. This means some duplication in the export cache, but maybe that's acceptable to cut down on the number of netgroup lookups. Still investigating...
Reproducer is apparently here. Though by inspection, I can see what the issue is I think... https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=160598
Created attachment 161172 [details] patch 4 -- add "use_ipaddr" command line option This patch adds a use_ipaddr command-line option and makes it so that ipaddr maps to ipaddr in auth.unix.ip cache when it's enabled. When we go to look up in the nfsd.export cache, we have an IP address and a path. mountd can then just compare the ip address/hostname to entries that match the path. In addition to working around the string length limitations of the kernel caches, this should prevent the (rather severe) performance penalty incurred when dealing with an export table that contains a large number of netgroups.
As a side note, NFSCLNT_IDMAX is set to 1024, but that seems to be a number that was grabbed out of thin air. The actual string length limitation seems to be ~945 or so (between 940 and 949, determined emprirically). I suspect that the total string length of a cache line maxes out at 1024). I'm not thrilled with using a command-line flag for this, but I don't see a great way to automatically switch between the two modes. I need to do some more testing and then I'll post it to nfs list as an RFC. Perhaps someone there will have a good way to switch between the two modes.
My testing hasn't shown any obvious problems. Sent patches to upstream list as an RFC to see if the general approach is valid. Awaiting feedback...
Created attachment 161874 [details] patch4 -- add new "mode" for mountd to change how ipaddrs are mapped Neil Brown commented that the patches looked good to him. My testing has been successful with them so far, so I've told him to go ahead and commit the first 3 patches as is. I've respun the 4th patch though to make it so that the "use_ipaddr" mode gets enabled automatically when the list of netgroups is more than 1/2 of NFSCLNT_IDMAX. This is the explanation I sent with the upstream patch: ----------[snip]----------- If the total length of all m_hostnames in the MCL_NETGROUP list is more than half NFSCLNT_IDMAX, then switch to the "use_ipaddr" mode. I considered doing what Neil suggested and just using all of the m_hostnames from all of the clientlists. Consider however a diskless client setup where someone has a ton of exports all exported to different IP addresses. We probably don't want to enable this there, but it would probably be enabled if we based this on all of the m_hostnames. I figure someone will want to enable this new mode for 2 reasons: 1) a client is likely to overflow NFSCLNT_IDMAX (which is a decent approximation of the max length of a line in the kernel cache) ...or... 2) we have a lot of netgroups in /etc/exports and resolving them all for a single mount is prohibitively expensive I think the only reasonable way to end up in situation #1 is if you are dealing with a lot of netgroups. A client isn't likely to match a bunch of different MCL_WILDCARD entries (though maybe it can), nor is it likely to match a bunch of different MCL_SUBNET entries. So it really only makes sense to pay attention to the MCL_NETGROUP list for this. Why NFSCLNT_IDMAX/2? No particular reason. I figured that would mean that we had a lot of netgroups, but still had enough space in the string to allow for non-netgroup matches. -------[snip]---------- Waiting for upstream comments at the moment. Ideally we'd like to make this not need a new command-line option at all. I think that would entail getting rid of the "old" behavior altogether, but Neilb seems to want to keep it since it does reduce the size of the kernel caches. So we may be stuck with a new command-line option unless he can be convinced.
New patchset posted upstream. Awaiting comments/commits...
Patchset is now upstream in nfs-utils git tree. There is still some ongoing discussion about whether we should also include a commandline option to turn the new behavior on and off, but I'm going to close this as UPSTREAM for now.