674314 – NFSD: memory corruption due to writing beyond the array

Bug 674314 - NFSD: memory corruption due to writing beyond the array

Summary: NFSD: memory corruption due to writing beyond the array

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	J. Bruce Fields
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Duplicates (3):	717937 717940 717941 (view as bug list)
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2011-02-01 12:45 UTC by Konstantin Khorenko
Modified:	2011-09-29 09:23 UTC (History)
CC List:	4 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-06-21 16:49:32 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
A patch to fix memory corruption due to writing beyond the array in NFSD. (886 bytes, patch) 2011-02-01 12:45 UTC, Konstantin Khorenko	no flags	Details \| Diff
View All

Description Konstantin Khorenko 2011-02-01 12:45:07 UTC

Created attachment 476384 [details]
A patch to fix memory corruption due to writing beyond the array in NFSD.

Parallels Virtuozzo Containers/OpenVZ linux kernel team found a memory corruption due to writing beyond the array in NFSD code.

kernel-2.6.18-238.1.1.el5 is affected.

./fs/nfsd/vfs.c:
 782 static inline struct raparms *
 783 nfsd_get_raparms(dev_t dev, ino_t ino)
 784 {
...
       // here we searched our file in a readahead cache
       // but not found it
...
 801         depth = nfsdstats.ra_size*11/10;
...
       // some stuff, but "depth" is not changed
...
 819         nfsdstats.ra_depth[depth*10/nfsdstats.ra_size]++;
...

And here we write to the 12th array element (nfsdstats.ra_size*11/10*10/nfsdstats.ra_size == 11), while ra_depth has only 11 elements:

struct nfsd_stats {
...
        unsigned int    ra_depth[11];   /* number of times ra entry was found that deep
                                         * in the cache (10percentiles). [10] = not found */
#ifdef CONFIG_NFSD_V4
        unsigned int    nfs4_opcount[LAST_NFS4_OP + 1]; /* count of individual nfsv4 operations */
#endif
};

This means that each time we did not find a file in a readahead cache we corrupt memory. Fortunately in a kernel with NFSDv4 compiled in it corrupts (inc) NFS4 ops counter, but in a kernel with NFSDv4 disabled it corrupts (inc) some other data, that lays in the memory beyond nfsdstats.

Proposed fix is attached.

Comment 2 J. Bruce Fields 2011-02-02 19:04:28 UTC

Yep, thanks, it appears the bug has been there since the nfsd code was originally added (in 2.3.99pre3-4!) all the way up to the latest upstream.

I believe the bug only triggers in configurations with a certain number of nfsd threads (multiples of 40 or any of the 7 numbers preceding the multiple; so 33 is the smallest problematic number, and no power of 2 is a problem).  But we encourage people to use something larger than the default, so people probably do hit it.  But the worst consequence for our kernels (since they have NFSv4 configured in) is an inaccurate v4 opcount.

Comment 3 J. Bruce Fields 2011-02-02 19:15:19 UTC

Oh, but actually that first opcount is meaningless anyway, since there is no NFSv4 op 0.

Comment 4 Konstantin Khorenko 2011-02-03 09:33:38 UTC

Yes, you are surely correct, and we faced this just because compiled some versions of PVC kernels with NFSv4 disabled, got corruptions caused by this bug and unfortunately for us there was a list next to this struct in memory => pointer got corrupted => we got a kernel panic.
So we had to find out the cause. :)

Thank you for handling this!

Comment 5 RHEL Program Management 2011-06-20 22:30:56 UTC

This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7 and Red Hat does not plan to fix this issue the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 6 J. Bruce Fields 2011-06-21 16:49:32 UTC

This has been fixed upstream, and has a most extremely minor consequences for existing RHEL versions, so I think we'll leave this be.

Comment 7 Petr Matousek 2011-09-29 09:23:01 UTC

*** Bug 717937 has been marked as a duplicate of this bug. ***

Comment 8 Petr Matousek 2011-09-29 09:23:11 UTC

*** Bug 717940 has been marked as a duplicate of this bug. ***

Comment 9 Petr Matousek 2011-09-29 09:23:17 UTC

*** Bug 717941 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.