Created attachment 476384 [details] A patch to fix memory corruption due to writing beyond the array in NFSD. Parallels Virtuozzo Containers/OpenVZ linux kernel team found a memory corruption due to writing beyond the array in NFSD code. kernel-2.6.18-238.1.1.el5 is affected. ./fs/nfsd/vfs.c: 782 static inline struct raparms * 783 nfsd_get_raparms(dev_t dev, ino_t ino) 784 { ... // here we searched our file in a readahead cache // but not found it ... 801 depth = nfsdstats.ra_size*11/10; ... // some stuff, but "depth" is not changed ... 819 nfsdstats.ra_depth[depth*10/nfsdstats.ra_size]++; ... And here we write to the 12th array element (nfsdstats.ra_size*11/10*10/nfsdstats.ra_size == 11), while ra_depth has only 11 elements: struct nfsd_stats { ... unsigned int ra_depth[11]; /* number of times ra entry was found that deep * in the cache (10percentiles). [10] = not found */ #ifdef CONFIG_NFSD_V4 unsigned int nfs4_opcount[LAST_NFS4_OP + 1]; /* count of individual nfsv4 operations */ #endif }; This means that each time we did not find a file in a readahead cache we corrupt memory. Fortunately in a kernel with NFSDv4 compiled in it corrupts (inc) NFS4 ops counter, but in a kernel with NFSDv4 disabled it corrupts (inc) some other data, that lays in the memory beyond nfsdstats. Proposed fix is attached.
Yep, thanks, it appears the bug has been there since the nfsd code was originally added (in 2.3.99pre3-4!) all the way up to the latest upstream. I believe the bug only triggers in configurations with a certain number of nfsd threads (multiples of 40 or any of the 7 numbers preceding the multiple; so 33 is the smallest problematic number, and no power of 2 is a problem). But we encourage people to use something larger than the default, so people probably do hit it. But the worst consequence for our kernels (since they have NFSv4 configured in) is an inaccurate v4 opcount.
Oh, but actually that first opcount is meaningless anyway, since there is no NFSv4 op 0.
Yes, you are surely correct, and we faced this just because compiled some versions of PVC kernels with NFSv4 disabled, got corruptions caused by this bug and unfortunately for us there was a list next to this struct in memory => pointer got corrupted => we got a kernel panic. So we had to find out the cause. :) Thank you for handling this!
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7 and Red Hat does not plan to fix this issue the currently developed update. Contact your manager or support representative in case you need to escalate this bug.
This has been fixed upstream, and has a most extremely minor consequences for existing RHEL versions, so I think we'll leave this be.
*** Bug 717937 has been marked as a duplicate of this bug. ***
*** Bug 717940 has been marked as a duplicate of this bug. ***
*** Bug 717941 has been marked as a duplicate of this bug. ***