Red Hat Bugzilla – Bug 118839
RPC: buffer allocation failures for NFS client
Last modified: 2007-11-30 17:07:00 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; Linux i686; U) Opera 7.22 [en]
Description of problem:
We have several systems experiencing this problem. It seems to
possibly be related to memory pressure. These systems are all
running Oracle version 188.8.131.52 but NFS is only used for home
The original symptom is that a system will hang when doing a simple
'ls -l' on any nfs mounted directory. A standard 'ls' and even other
normal file operations seem to work fine. You can even still umount
and mount the same NFS file system or other NFS systems, but the 'ls
-l' command will always hang. The only way to kill it is to do a
'kill -9' on the ls process, after which it will hang in 'D' state; then
you can 'kill -9 rpciod' and the process will release.
Whenever you get the hanging 'ls -l' commands you also get entries like
RPC: buffer allocation failed for task da8fbca8
RPC: buffer allocation failed for task da8fbca8
RPC: buffer allocation failed for task dc00fca8
This is happening on a 4-way Dell 8450 w/8GB of RAM, a 4-way Dell
6450 w/4GB, and a 2-way 2550 w/4GB RAM. All of these systems are
under significant memory pressure but generally perform well. This
problem has appeared since the recent upgrade of these systems from
AS 2.1 to AS 3.
The NFS server is a Dell 2650 running ES 3.
Please let me know what other information I can provide.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Mount filesystem via NFS
2. Let system run for a while until memory pressure occurs
3. Perform 'ls -l' until it hangs
Actual Results: Hangs when trying to do 'ls -l' of NFS mounted
Expected Results: Should list directories as normal.
These systems are connected to a Dell/EMC CX400 disk array
This issue occurred again today on our production Oracle box. That
system can't seem to make more than 24 hours with a working NFS.
The hang today was worse than previous ones; pretty much any access to the
NFS share was completely broken. The 8450 with 8GB of RAM seems to
have many more problems than the other systems with only 4GB. Is this
likely to be a low-mem issue? Would it be possible that running the
hugemem kernel might be an improvement even though the system has only
8GB of RAM?
If it's a VM problem with allocating contiguous buffers, then yes
switching to the hugemem kernel may help since the amount of available
kernel memory will increase by about 400%.
Assigning to SteveD in case it's an NFS problem anyway ;)
Yow. This happened once, was hoping it was a fluke...but twice now.
I'm also seeing this on a lightly loaded backup server, that doesn't
do a ton of NFS:
|eowyn# cp -a web web2ls -l web web2
|-rw-r--r-- 1 brian support 2435809 Mar 18 18:28 apache_1.3.29.tar.gz
|-rw-r--r-- 1 brian support 4630171 Mar 22 20:32 php-4.3.4.tar.gz
|-rw------- 1 brian support 2435809 Mar 18 18:28 apache_1.3.29.tar.gz
This hangs, but is able to be interrupted. Now that the machine is
having problems, there's some interesting things. dmesg reports:
RPC: buffer allocation failed for task d1fe5ca
for each attempt (diff hex code). I can ls -R the whole dir, but try
one ls -l and then it hangs again. I can rm -r it as well.
I have a machine in this state now, which I could create a login
for a tech, I'll prob have to reboot the machine tomorrow before
an upgrade to RHE 3 on my other servers.
NFS server: Red Hat 9 2.4.20-30.9smp (upgrading tomorrow to AS)
Client: Single processor Xeon, 1GB RAM 2.4.21-9.0.1.ELsmp
I cannot recall if I was seeing this with the old kernel.
Ok. This is creepy:
cp -r web h
cp -rp web i
hangs. dmesg shows:
RPC: buffer allocation failed for task f2b51ca8
but the message only appears when you ^c the cp.
Last one, I promise. I was thinking kernel as well, but I note that
with the cp -a or -rp commands I have done, the owner and perms have
not been changed when I interrupt them.
So I tried some chown, chown -R, chmod, and chmod -R. All worked
fine, verifying with ls -l on another system.
I did foreach to the stat command with every individual format option.
It did not hang on any of them.
I have no idea what this means, but I d/l'ed and compiled the fileutils
4.1 package from gnu.org, and bingo, ls -l and cp -a work. (I did
have to configure on another RHE3 AS system, due to config.status
hanging on creating the makefiles. But the compile ran on the
That's the limit of my mojo tonight.
I suspect you'll be able to find more info in /proc/slabinfo ;)
In particular, after the first copy the dentry and inode caches should
still be at a more or less reasonable size, but after the second copy
they are probably really really big.
I have a suspicion on what your problem could be: if there is enough
memory free, we don't reclaim slab cache memory to fulfill higher
order memory allocations, but only normal user/cache pages.
If the slab cache is really big, maybe we need to reclaim slab and
buffer headers from the defragmentation routine that's used when
higher order memory allocations can't be immediately satisfied...
Larry, does this make sense ?
Created attachment 98780 [details]
FYI: Attached is the /proc/slabinfo output of the affected machine, before and
during an ls -l.
This may help: an strace on ls -l to_do and cp -p to_do both hang at:
and print <unfinished> when ^c
Hi, the tech at redhat directed me to this bug report, which appears
to be the exact same behavior that I'm seeing; namely, 'ls' works fine
on my nfs-mounted home directories, but 'ls -l' just hangs, with a
corresponding "kernel: RPC: buffer allocation failed for task d4dc3ca8".
'ls -l' seems to be working fine on local directories. 'cp -rp'
exhibits the same hanging behavior that Brian mentioned, with that RPC
error popping up when you eventually ctrl-c it.
NFS server is RHEL ES 3.0, Dell PE4600 with 4GB ram and HT on.
NFS client is also RHEL ES 3.0, single-cpu Dell dimension with 1GB
ram, and kernel 2.4.21-9.0.1.ELsmp. I'm trying to update the kernel
on the nfs client box to see if that will fix anything.
I upgraded to 2.4.21-9.0.3.ELsmp and this is still happening, after
about 5 days uptime. It doesn't seem to happen with a non-smp kernel,
as I ran one of those for 20 days w/o problem.
After upgrading to 2.4.21-14.ELsmp (from the Rhel 3.0 beta channel), I
haven't yet seen the problem, but this is my secondary server and
isn't highly stressed. I'll keep an eye on it, especially if this is
something that disappears for a while after rebooting (such as for
upgrading the kernel). If it does work, that kernel is going on my
I also haven't been able to reproduce this problem since upgrading
all but two of my systems that were having this problem to 2.4.21-12.
ELsmp several weeks ago, even on the servers that regularly showed
this problem in only a few hours with 2.4.21-9.0.1.
I'll give that a shot, then!
Unfortunately, vmlinuz-2.4.21-14.ELsmp produced the error after about
5 days. Back to 2.4.21-9.0.3.EL.img.
2.4.21-15.ELsmp is producing the error after about five days, now not
just on my back server, but the web server as well. Marvelous.
On the backup server if I run the non-smp version the behavior does
not manifest (9 days and counting). Both of these machines are dual
xeon motherboards. Backup server has 1 hyperthreading processor, web
has two non-HT xeons. Another server I have on the same MB as the
backup server has two HT xeons in it, and has not shown the problem.
Why does this seem to happen with two procs on the smp kernel?
Any way to turn off acls on nfs shares?
According to "man exports" you can add "no_acl" to the export line to
mask off acl permissions on nfs shares.
Also, I think I've discovered that my problem went away when I
actually mounted up the filesystem that I was exporting with the
"acl" option. In other words I was exporting the directory
/mnt/misfiles which was a standard ext3 filesystem but was not
mounted with acl support. Other systems accessing this filesystem
via NFS would generate the RPC errors on occasion. However, once I
modified the /mnt/misfiles ext3 filesystem on the nfs server to
enable acl support the problem seems to go away on the clients.
I haven't seen this issue in months on my machines. Maybe just luck,
but it was happening regularly before I made the changes to enable
ACL's on the underlying filesystems that were being exported.
I'm having this issue too. The box is a development server that
supports a few developers and serves up an SVN repository and some
development web pages/cgis. Our home directories are NFS mounted
from a Solaris box via the automounter, which in turn reads its
configuration from an NIS domain hosted by a different Solaris box.
Every few weeks, I'll run an 'ls -al' command and it will hang.
Nothing will dump to the syslog until you try to kill the process,
at which point the syslog is spammed with the following:
Jun 3 13:54:34 roclindev1 kernel: RPC: buffer allocation failed for
Jun 3 13:54:54 roclindev1 last message repeated 331 times
Once a particular path is hung, any attempt to access it will hang,
even a plain ls. However, other paths on the same NFS mount will be
fine. In addition, attempting an strace on the hung process will
result in a hung strace.
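For anyone trying to gauge how often this hits, here is a minimal Python sketch that tallies the failures from syslog lines, including syslog's "last message repeated N times" compression. The sample lines are illustrative only: the task address is borrowed from another comment in this report because the line quoted above is cut off, and the function name is hypothetical.

```python
import re

# Patterns for the kernel message and for syslog's repeat compression.
FAIL = re.compile(r"RPC: buffer allocation failed for task ([0-9a-f]+)")
REPEAT = re.compile(r"last message repeated (\d+) times")

def count_failures(lines):
    """Return (total failures, set of task addresses) seen in syslog lines."""
    total = 0
    tasks = set()
    last_was_failure = False
    for line in lines:
        match = FAIL.search(line)
        if match:
            total += 1
            tasks.add(match.group(1))
            last_was_failure = True
            continue
        repeat = REPEAT.search(line)
        if repeat and last_was_failure:
            # The repeat line stands for N more copies of the previous message.
            total += int(repeat.group(1))
        else:
            last_was_failure = False
    return total, tasks

# Illustrative sample, not a real log excerpt.
sample = [
    "Jun 3 13:54:34 roclindev1 kernel: RPC: buffer allocation failed for task d4dc3ca8",
    "Jun 3 13:54:54 roclindev1 last message repeated 331 times",
]
print(count_failures(sample))
```

Feeding it the contents of the messages file should show whether the failures cluster around the moment the hung process is killed.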
I swear I was able to recover one time through a combination of
'kill -9 pid', 'umount -f /affected/mount', 'umount
-l /affected/mount', 'mount /affected/mount', and
'/etc/init.d/autofs restart'. But I haven't been able to do that
since, so I don't waste my time; I just reboot the box. I've spent a
few hours researching the problem on the web and to the best of my
knowledge but I've never been very good at NFS troubleshooting. If
I can provide any more information, please let me know.
Sadly, adding acl to the mounts on the servers didn't fix the problem,
and combinations of no_acl don't seem to matter. Thus, maybe it's a
kernel memory issue.
Before 2.4.21-15.ELsmp I was seeing this only on a machine with an
Intel SE7501BR2 7501 chipset motherboard, now I'm seeing it on Tyan
S2720GN 7500 chipset boards.
Interestingly, with 2.4.21-15.ELsmp on the initial machine the problem seems
to come and go now. ls -l hangs, and then I ^C and sometime later for
some unknown reason it works.
Some more data:
Happened again just now. I was doing an 'mv -f ' on a large (581MB)
backup file to an NFS mount, noticed it was taking longer than
necessary. Here's exactly the steps I followed:
1) log into linux box
2) ls on backup directory -- this worked fine
3) ls -al on backup directory -- hang
4) log in again
5) ls on backup directory -- hang
I'll see if I can reproduce tonight once no one needs the box any
longer -- this is the quickest the hang has happened to me after a
reboot. Maybe the large file triggers something?
i see exactly the same problem. 2.4.21-15ELsmp on both client and
server. it's extremely annoying. it also only seems to do it on 'high
load' clients. on one client ls -l works fine on a directory. on
another which has heavy network (nfs) load it doesn't.
some additional information - we do a reasonable amount of NFS.
approximately 3-4Tbyte/day over nfs (the joys of a popular internet
archive). i have the following in my sysctl.conf (if it makes any
# Performance tuning to increase fragment buffer memory
#net.ipv4.ipfrag_high_thresh = 4194304
#net.ipv4.ipfrag_low_thresh = 1048576
# Performance tuning to increase tcp read/write buffers
#net.ipv4.tcp_rmem = 4096 349520 1048576
#net.ipv4.tcp_wmem = 4096 131072 1048576
this bug is very reproducible and affects every nfs client we have
that does any kind of load.
Created attachment 101372 [details]
slabinfo from nfs client which locks up with ls -l
this is from an nfs client running 2.4.15ELsmp.
with some further testing the bug is strange.
ftpd isn't affected (proftpd internal ls doesn't appear to do an ls -l)
httpd (apache 1.3.29) isn't affected. it seems able to browse directory
trees without issues
rsyncd (2.6.2) isn't affected. i can rsync list in long form
directories remotely (rsyncd) as well as over ssh.
so really i can only reproduce this by using the command line, i.e going
to an NFS client and trying to ls -l in any nfs mounted directory.
this is particularly annoying for home directories of course..
other info - all the clients are mounted RO. except for one which is
RW (and interestingly it doesn't have the ls problems). i suspect
this is because it doesn't do that much reading across nfs, just
relatively small (10-20G/day) writes (compared with 2000-3000G/day
reads on clients).
I have noticed the same problem.
The server is an L200 with 2GB RAM and the client is an N400 with 4GB
RAM. We are running RHEL 3 Update 2 (2.4.21-15.0.2.ELsmp) on both systems.
I have noticed that running (on the client) a program that
allocates/touches/frees a considerable amount of memory (1GB for
example), and hence decreasing buffers & cached, makes "ls -l" work
again for some time. Maybe this helps to diagnose the problem, I
can attach some slabinfo if needed.
Has anyone opened an official support case with Redhat on this issue?
This seems to be a pretty big issue which is starting to show up for
quite a few users, perhaps opening an official support case would get
this bug some attention.
I would do it but, even though I originally reported this bug, I'm
actually no longer having the issue so I wouldn't be a good choice to
pursue a support case.
The issue has been opened with Red Hat support.
I've taken a brief look through the code. It looks to me that the
problem is based around two things:
1) The fact that RPC in Linux reserves response buffers before the RPC
call is made
2) The size of the buffer that Linux has to reserve for getacl requests.
Since we reserve response buffers before we make an RPC call, and
since we have no way of knowing the maximum size of a response for a
getacl call (which is required for getxattr requests on systems that
support ACL's), we have to assume a worst case scenario. In the case
of the getacl call, this requires quite a bit of space (since 1024 ACE
objects are supported per response). This means that we need to make
at least an order 2 allocation for every getacl request sent. I would
imagine that systems under a high level of memory pressure with
significant fragmentation would not be able to satisfy this request.
I believe the rpc code, in an effort to prevent all NFS action from
blocking, simply fails this request, rather than putting the calling
process to sleep.
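To make the fragmentation point concrete, here is a toy Python sketch. It is not kernel code: it only assumes that an order-n request needs an aligned run of 2**n free pages, as in a buddy-style allocator, and the function name and page counts are made up for illustration.

```python
def has_free_block(free_pages, order, total_pages):
    """True if some aligned run of 2**order pages is entirely free."""
    size = 1 << order
    for start in range(0, total_pages - size + 1, size):
        if all(page in free_pages for page in range(start, start + size)):
            return True
    return False

TOTAL = 64
# Half of memory is free, but no two free pages are adjacent.
fragmented = {page for page in range(TOTAL) if page % 2 == 0}
# The same number of free pages, all contiguous.
compact = set(range(TOTAL // 2))

for name, free in (("fragmented", fragmented), ("compact", compact)):
    print(name, [has_free_block(free, order, TOTAL) for order in (0, 1, 2)])
```

Both layouts have 32 free pages, yet only the compact one can satisfy an order 2 request. That matches the reports in this bug where plenty of memory appears free and the RPC allocation still fails.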
As a side note, it may be helpful, if you find yourself in this
situation, to mount your NFS shares with an rsize and a wsize of 4096.
This will decrease your performance somewhat, but doing so should
alleviate some of the system's demands for order 2 allocations and may
prevent these errors from occurring.
Just a ditto of the above. I'm seeing the same problems on our Oracle
server doing a trivial amount of NFS per day (<5GB). Running kernel
2.4.21-9.0.1.ELsmp, tried reducing rsize and wsize to 4096 and still
had the problem. Only 'ls -l' appears to be affected.
We are running 15.0.3.ELsmp with NFS on a NAS server from Dell running
Win2k3 and we suffer the same symptoms. The ls lookups happen irregularly.
We are going to use another distro to check whether it is a RHEL problem.
So far the only thing the new kernels have done for me is:
1) make more systems experience the bug
2) make the interval of useful uptime less
Has anyone seen this with a non-smp kernel?
I have seen this issue on non-smp, but only on one server. Once
again it was a server that was under significant memory pressure.
Actually, adding more memory seemed to resolve the issue.
Another thing I have noted when trying to figure out why I no longer
see this problem while others still do, I remember that I made a
change to the inactive_clean_percent to return it to the behavior of
2.4.21-4.EL which I think was the original kernel. I did this to
resolve another issue where the kernel goes way too far into swap
instead of reclaiming memory, however, maybe the extra memory reclaim
gives NFS a better chance of completing its order 2 allocations. I
added the following to my sysctl.conf to make this change:
vm.inactive_clean_percent = 100
It probably doesn't have anything to do with it, but, since I haven't
seen this issue in months, and this is one of the changes I made
during those months, I thought I'd mention it.
You probably shouldn't set your inactive_clean_percent to 100. That
implies that anytime you have dirty memory pages, the VM is going to
try to clean them. That's a lot of unneeded overhead to put on your VM,
and might in fact make your system quite unresponsive. It would of
course help optimize the number of order 2 allocations you had
available, but at a pretty terrible cost.
Actually, as far as I can tell, in any system under memory pressure
from applications doing lots of IO, not having this set causes the
system load to be much higher because the system fails to reclaim
memory aggressively enough to keep it from going heavily into swap, and
using swap is a much higher cost than using CPU cycles to reclaim
In my experience, and many others' if you do a search on Bugzilla, or
the Taroon or Oracle mailing lists, this setting actually has a
positive effect. I have one server here where it is VERY noticeable.
Still, you don't have to set it to 100 (even though that was the
behaviour of the original RHEL 3 kernel) you can set it to 75 or 50
and see if it helps. I'm just trying to offer some suggestions since
so many people are having this issue and there doesn't seem to be any
progress on fixing it. If this setting kills performance then they
can change it back, but on the 15 servers I have the performance
impact is negligible at best, and on a few memory constrained
servers it actually improved performance by making the system use
less swap. Even if it has a small impact on performance, some people
may be willing to trade a few percent of their CPU cycles for a
working NFS client (assuming that actually helps with the problem).
Neil, while setting inactive_clean_percent to 100 could result in bad
disk IO when doing swapping, it should be pretty harmless when doing
mmap()d IO over NFS.
In fact, starting the write out earlier should guarantee that there is
more free memory available - making it easier to allocate the RPC
buffers and reducing the chance of an allocation failure.
This is a special case, though. For normal disk bound systems it's
probably best to leave inactive_clean_percent at a lower value...
I've set it up on my backup server (performance won't matter a whole lot), to try to see if it
hits the problem. I'll report back in 5 days or so when it would be sure to do it by then.
So what is the value set at in the current kernels, and what might be a good value to avoid
this problem if it works?
believe the default on the -15 kernels is 30
We are running RedHat Enterprise Linux ES release 3 with linux kernel
2.4.21-9.ELsmp and once every day or 2 days the 'mv' command hangs
when moving a file from local directory to a NFS mounted directory.
This is accompanied by repeated errors in messages file of
kernel: RPC: buffer allocation failed for task e0c43cb4 for different tasks
The only fix we have found is to reboot.
Running kernel-2.4.21-15.0.3.EL and 2.4.21-15.0.2.ELsmp
I am getting this exact same issue with Dell 2650s. It is really
really annoying and making the machines virtually unusable at times.
The machines in question are under fairly heavy cpu loads from
statistical analysis programs but not heavy nfs load.
I can't reboot without the stats jobs failing so this isn't an option
as they run for weeks at a time.
Things that hang in these circumstances include vi, vim, rm, ls.
It's also happened on machines that are not heavily loaded.
I never got this problem with redhat 9.
If anyone got a fix from Red Hat support could they please mail it to
me urgently? Cheers
FYI: Though it seemed to last a day longer, the backup server with
vm.inactive_clean_percent = 100 did experience the problem again. :(
same problem here. 2GB Memory with kernel-smp-2.4.21-15.0.3.EL.
Typical paging and disk I/O info are following,
11:30:03 AM pgpgin/s pgpgout/s activepg inadtypg inaclnpg inatarpg
11:40:03 AM 242.07 1659.09 235921 195099 7315 97402
11:50:01 AM 5.45 759.29 235434 196169 8065 98205
12:00:00 PM 0.52 1522.38 213007 213161 7239 98026
12:10:00 PM 215.22 1449.89 236690 196093 8192 98322
12:20:00 PM 413.29 1162.54 240586 192554 7018 98195
12:30:04 PM 28.32 1307.60 214242 210747 7385 97660
12:40:00 PM 424.62 1443.24 241216 192619 7463 98339
12:50:00 PM 12.09 928.98 228875 201079 7220 98179
01:00:01 PM 132.47 1861.38 221040 212119 4645 96681
01:10:03 PM 386.30 1753.47 247961 179431 6705 94704
01:20:03 PM 48.78 790.09 245355 173183 6429 92946
01:30:00 PM 99.52 1086.99 251283 180604 21462 95238
11:30:03 AM DEV tps sect/s
11:40:03 AM dev8-0 134.28 3802.32
11:50:01 AM dev8-0 58.85 1529.47
12:00:00 PM dev8-0 120.93 3045.80
12:10:00 PM dev8-0 120.58 3330.22
12:20:00 PM dev8-0 127.76 3151.68
12:30:04 PM dev8-0 102.59 2671.85
12:40:00 PM dev8-0 134.42 3735.71
12:50:00 PM dev8-0 65.35 1882.15
01:00:01 PM dev8-0 166.50 3987.71
01:10:03 PM dev8-0 159.99 4279.54
01:20:03 PM dev8-0 70.15 1677.75
01:30:00 PM dev8-0 98.44 2373.01
It happened again in less than 24 hours.
13:20:36 up 23:15, 5 users, load average: 5.04, 5.00, 3.98
119 processes: 118 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 0.4% 0.0% 1.4% 0.0% 0.0% 0.0% 98.0%
cpu00 0.9% 0.0% 1.9% 0.0% 0.0% 0.0% 97.0%
cpu01 0.0% 0.0% 0.9% 0.0% 0.0% 0.0% 99.0%
Mem: 2061636k av, 1934868k used, 126768k free, 0k shrd,
1046136k actv, 469100k in_d, 28304k in_c
Swap: 2096440k av, 2240k used, 2094200k free
[root@log]# cat /proc/slabinfo
slabinfo - version: 1.1 (SMP)
kmem_cache 96 96 244 6 6 1 : 1008 252
nfs_write_data 40 40 384 4 4 1 : 496 124
nfs_read_data 360 360 384 36 36 1 : 496 124
nfs_page 1020 1020 128 34 34 1 : 1008 252
ip_fib_hash 16 224 32 2 2 1 : 1008 252
ip_conntrack 260 260 384 26 26 1 : 496 124
urb_priv 0 0 64 0 0 1 : 1008 252
ext3_xattr 0 0 44 0 0 1 : 1008 252
journal_head 1241 13321 48 45 173 1 : 1008 252
revoke_table 5 250 12 1 1 1 : 1008 252
revoke_record 224 224 32 2 2 1 : 1008 252
clip_arp_cache 0 0 256 0 0 1 : 1008 252
ip_mrt_cache 0 0 128 0 0 1 : 1008 252
tcp_tw_bucket 23 30 128 1 1 1 : 1008 252
tcp_bind_bucket 224 224 32 2 2 1 : 1008 252
tcp_open_request 30 30 128 1 1 1 : 1008 252
inet_peer_cache 3 116 64 2 2 1 : 1008 252
secpath_cache 0 0 128 0 0 1 : 1008 252
xfrm_dst_cache 0 0 256 0 0 1 : 1008 252
ip_dst_cache 225 225 256 15 15 1 : 1008 252
arp_cache 30 30 256 2 2 1 : 1008 252
flow_cache 0 0 128 0 0 1 : 1008 252
blkdev_requests 3072 3360 128 112 112 1 : 1008 252
kioctx 0 0 128 0 0 1 : 1008 252
kiocb 0 0 128 0 0 1 : 1008 252
dnotify_cache 0 0 20 0 0 1 : 1008 252
file_lock_cache 120 120 96 3 3 1 : 1008 252
async_poll_table 0 0 140 0 0 1 : 1008 252
fasync_cache 0 0 16 0 0 1 : 1008 252
uid_cache 224 224 32 2 2 1 : 1008 252
skbuff_head_cache 1495 1495 168 65 65 1 : 1008 252
sock 290 290 1408 58 58 2 : 240 60
sigqueue 261 261 132 9 9 1 : 1008 252
kiobuf 0 0 128 0 0 1 : 1008 252
cdev_cache 269 290 64 5 5 1 : 1008 252
bdev_cache 7 116 64 2 2 1 : 1008 252
mnt_cache 27 116 64 2 2 1 : 1008 252
inode_cache 3205 4823 512 689 689 1 : 496 124
dentry_cache 1469 2550 128 85 85 1 : 1008 252
dquot 0 0 128 0 0 1 : 1008 252
filp 1453 1470 128 49 49 1 : 1008 252
names_cache 44 44 4096 44 44 1 : 240 60
buffer_head 254395 273980 108 7827 7828 1 : 1008 252
mm_struct 250 250 384 25 25 1 : 496 124
vm_area_struct 4536 4536 68 81 81 1 : 1008 252
fs_cache 406 406 64 7 7 1 : 1008 252
files_cache 210 210 512 30 30 1 : 496 124
signal_cache 580 580 64 10 10 1 : 1008 252
sighand_cache 178 180 1408 36 36 2 : 240 60
pte_chain 5494 12150 128 284 405 1 : 1008 252
pae_pgd 406 406 64 7 7 1 : 1008 252
size-131072(DMA) 0 0 131072 0 0 32 : 0 0
size-131072 0 0 131072 0 0 32 : 0 0
size-65536(DMA) 0 0 65536 0 0 16 : 0 0
size-65536 0 0 65536 0 0 16 : 0 0
size-32768(DMA) 0 0 32768 0 0 8 : 0 0
size-32768 0 0 32768 0 0 8 : 0 0
size-16384(DMA) 1 1 16384 1 1 4 : 0 0
size-16384 22 23 16384 22 23 4 : 0 0
size-8192(DMA) 0 0 8192 0 0 2 : 0 0
size-8192 6 8 8192 6 8 2 : 0 0
size-4096(DMA) 0 0 4096 0 0 1 : 240 60
size-4096 662 722 4096 662 722 1 : 240 60
size-2048(DMA) 0 0 2048 0 0 1 : 240 60
size-2048 350 350 2048 175 175 1 : 240 60
size-1024(DMA) 0 0 1024 0 0 1 : 496 124
size-1024 100 100 1024 25 25 1 : 496 124
size-512(DMA) 0 0 512 0 0 1 : 496 124
size-512 576 576 512 72 72 1 : 496 124
size-256(DMA) 0 0 256 0 0 1 : 1008 252
size-256 1080 1080 256 72 72 1 : 1008 252
size-128(DMA) 1 30 128 1 1 1 : 1008 252
size-128 2277 2400 128 80 80 1 : 1008 252
size-64(DMA) 0 0 128 0 0 1 : 1008 252
size-64 660 660 128 22 22 1 : 1008 252
size-32(DMA) 17 58 64 1 1 1 : 1008 252
size-32 883 1044 64 18 18 1 : 1008 252
vm.inactive_clean_percent = 100 does not help
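To see which caches actually hold memory in a dump like the one above, here is a minimal Python sketch. It assumes the slabinfo 1.1 column order is name, active objects, total objects, object size, active slabs, total slabs, pages per slab, and a 4096-byte page; the function name is hypothetical. The three sample rows are copied from the listing above.

```python
PAGE_SIZE = 4096  # assumed i386 page size

def parse_slabinfo(text):
    """Map cache name -> bytes held, computed as total slabs * pages per slab."""
    caches = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the version header and anything that is not a cache row.
        if len(fields) < 7 or not fields[1].isdigit():
            continue
        num_slabs = int(fields[5])
        pages_per_slab = int(fields[6])
        caches[fields[0]] = num_slabs * pages_per_slab * PAGE_SIZE
    return caches

sample = """slabinfo - version: 1.1 (SMP)
inode_cache 3205 4823 512 689 689 1 : 496 124
buffer_head 254395 273980 108 7827 7828 1 : 1008 252
size-16384 22 23 16384 22 23 4 : 0 0
"""

usage = parse_slabinfo(sample)
for name, nbytes in sorted(usage.items(), key=lambda item: -item[1]):
    print(f"{name:<16} {nbytes / 1024 / 1024:8.1f} MB")
```

On this dump buffer_head dominates at roughly 30 MB, while size-16384 (the order 2 general cache) holds 23 slabs.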
We are running RedHat Enterprise Linux ES release 3 with linux kernel
2.4.21-9.ELsmp and once every day or 2 days the 'mv' command hangs
when moving a file from local directory to a NFS mounted directory.
Currently, we just had a vi session hang accompanied by errors in the
RPC: buffer allocation failed for task
The hanging up of the nfs mounted filesystem is (ae)ffecting our
production. Can someone give me a time frame for when this bug may be
fixed? That way, we can decide whether it's worth the effort to find a
Would replacing the nfs system that comes with RedHat Enterprise ES
with the nfs system that comes with Red Hat Enterprise 2.1 avoid the bug?
That's a pretty tall order. Have you all tried using the hugemem
kernel? It's not exactly a fix for the problem, but it will certainly
avoid the problem by potentially quadrupling the amount of lowmem that
you have to work with (assuming that you have 4GB of ram in these
> Have you tried the hugemem kernel? ...
> assuming that you have 4GB...
What happens if one installs hugemem on a box with less than 4GB?
Would that be bad?
Nope, nothing wrong with it, but it won't maximize the advantage that
the hugemem configuration provides. Non-hugemem kernels have 1GB of
kernel address space (lowmem), while hugemem kernels have 4GB. So if
you have less than 1GB of total RAM, hugemem isn't helpful. 1GB to
4GB ram provides a scaled advantage, by which I mean a system with 2GB
of RAM will have 2GB of lowmem, a system with 3GB of RAM has 3GB
of lowmem, up to 4GB, after which you're back into adding highmem.
Since RPC allocates kernel memory for response buffers, it uses
lowmem. More lowmem means more memory for RPC to allocate if it needs it.
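The scaling described above can be written down as a small Python sketch. It uses only the round figures from this comment (1GB of lowmem for the regular kernels, 4GB for hugemem); the lowmem actually usable on a real system will be somewhat smaller, and the function name is made up for illustration.

```python
def lowmem_gb(ram_gb, hugemem=False):
    """Approximate lowmem in GB, using the round limits quoted in this comment."""
    limit_gb = 4.0 if hugemem else 1.0
    return min(ram_gb, limit_gb)

for ram in (0.5, 1, 2, 3, 4, 8):
    print(f"{ram:>4} GB RAM: smp={lowmem_gb(ram)} GB, hugemem={lowmem_gb(ram, hugemem=True)} GB")
```

By these figures a 1GB client gains nothing from hugemem, a 2GB box doubles its lowmem, and anything at 4GB or above gets the full fourfold increase.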
The above solution:
you can 'kill -9 rpciod' and the proccess will release
Is true, but the system is hosed. I cannot further nfs mount anything.
Not a Dell box, not running Oracle... just RHEL3 with the -15 kernel.
The client is mounting NFS partitions that are GFS file systems on the
Yeah, don't do that. Killing your rpc/nfs tasks won't lead to
anything good. This is a memory problem. The best solution right now
that I can think of is moving to the hugemem kernel (at least for
those systems that have > 1GB of RAM).
Moving to hugemem only delays the inevitable.
Can someone get "AltSysrq M" outputs for both smp and hugemem kernels
when this problem is happening so we can figure out if this is a
memory fragmentation issue or lowmem exhaustion issue?
Thanks, Larry Woodman
i think we need to backport the mempool_alloc infrastructure from 2.6,
that should solve this issue
I just got the error message in a machine (dual Xeon 2GB) that was moved recently to
RHEL3 from 7.3. It managed to survive without a reboot for about a year and about 20
users running remote X sessions (Exceed) and heavy computational problems (often
needing more than 1GB of memory). After the rebuild it was only used by one user for
about two weeks before nfs failed.
Until the problem is fixed (backporting mempool_alloc, whatever) what other options are
there besides running the hugemem kernel?
Will it help to lower vm.max_map_count? It is listed as a possible solution when you run
out of lowmem pages in a RedHat document. Is there a way to check how many VMAs are
used by each process to check if it's going to cause problems ?
With Hugemem kernel, our system still behaves normally after 1 day (24
hours). It acts up in one night with other kernels.
That's going to be the best solution, at least for now. If you
absolutely can't move to hugemem, it may help to
reduce the NFS_ACL_MAX_ENTIRES definition from its current setting of
1024 to something smaller. At a value of 1024 it implies an order 2
allocation (~12KB) for each NFS_ACL request, at 512 it requires an
order 1 allocation, and at 256, it requires an order 0 allocation.
This should help alleviate the need for large contiguous buffer
allocation in the RPC layer for NFS_ACL requests.
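The order arithmetic above can be checked with a short Python sketch. It assumes 4096-byte pages and 12 bytes per ACL entry; the 12-byte figure is inferred from the "~12KB for 1024 entries" number in this comment, not taken from the kernel source, and any fixed reply header is ignored. The function names are hypothetical.

```python
import math

PAGE_SIZE = 4096   # assumed i386 page size
ACE_BYTES = 12     # assumed size of one ACL entry, inferred from ~12KB / 1024

def allocation_order(nbytes, page_size=PAGE_SIZE):
    """Smallest order n such that 2**n contiguous pages can hold nbytes."""
    pages = max(1, math.ceil(nbytes / page_size))
    return (pages - 1).bit_length()

def acl_reply_order(max_entries, ace_bytes=ACE_BYTES):
    """Allocation order needed for a worst-case ACL reply buffer."""
    return allocation_order(max_entries * ace_bytes)

for entries in (1024, 512, 256):
    nbytes = entries * ACE_BYTES
    print(f"{entries:>5} entries: {nbytes:>6} bytes, order {acl_reply_order(entries)}")
```

Under those assumptions the three settings come out at order 2, order 1, and order 0, so at 256 entries the reply buffer fits in a single page and no contiguous multi-page allocation is needed.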
Created attachment 102284 [details]
patch to reduce memory requirements for NFS_ACL responses
If someone wants to give it a try, here's a patch I think might help. I haven't
been able to determine if reducing the number of ACE objects we support in a
single response message violates any germane standards or RFCs, but it seems
at the least it should relieve some of the memory demand the NFS_ACL has.
Hugemem is an ok idea, but since this problem appears mainly on
clients, most of mine have 1GB of memory, so it's probably not going
to help small clients.
>Can someone get "AltSysrq M" outputs for both smp and hugemem kernels
I can do it for smp, how exactly does one do this?
>patch to reduce memory requirements for NFS_ACL responses
Sure, I'll attempt a new kernel on my backup server.
Would lowering NFS_ACL_MAX_ENTIRES approximate the "noacl" client
mount option and cause the NFS servers to thrash (and slow NFS file
>I can do it for smp, how exactly does one do this?
echo 1 > /proc/sys/kernel/sysrq
>Would lowering NFS_ACL_MAX_ENTIRES approximate the "noacl"
Performance will probably be degraded slightly, although I can't put a
number on how much. Certainly it will be better than the performance
you get when you receive the out of memory errors documented above.
Created attachment 102301 [details]
sysrq m output
Okay here's the sysrq m output from two machines with similar setups. One that
is currently experiencing the bug, and another that will in the future.
They are both running 2.4.21-15.0.3.ELsmp. I don't have any hugemem kernels
running at the moment.
*** Bug 127830 has been marked as a duplicate of this bug. ***
My system is still functioning normally with hugemem kernel after 5
days. The physical memory size is 2GB.
That's good to hear. Any results yet from the patch I posted?
Neil, I haven't had a chance to test your one-line patch yet. I use an ICP RAID
controller and its driver is provided as an unsupported module. The
config file for unsupported couldn't be found in the src package. Do you
know where I could find/download the config file for unsupported hugemem
In the configs directory, the hugemem kernel config is the one that you want.
I just rebooted my back and web servers with the NFS_ACL_MAX_ENTIRES
from 1024 -> 256 nfs3.h change.
I should know in a few days if it extends the lifetime.
My system got kernel panic after 6 days with standard hugemem kernel.
The last two lines displayed on console are,
Code: 8b 81 84 00 00 00 42 39 41 70 89 d9 0f 43 54 24 10 81 e1 00
Kernel Panic: Fatal Exception
Couldn't get sysrq+m output...
NFS_ACL_MAX_ENTIRES from 1024 to 256 looks to be having a positive impact. One
of my systems couldn't live longer than 1 day with the default kernel, but
now it is still behaving normally after 2 days with Neil's patch.
OK, this issue just hit me again for the first time in months. What's
the current consensus on the best solution? It seems the options for
1. Run hugemem -- live with the performance drop that comes with this.
2. Try NFS patch and hope it helps.
The patch seems more attractive to me. Do we know if it's really
helping to work around this issue?
We've been seeing positive results from people who've applied the patch.
I second that. I'm about 2 days past where I would normally see the ls -l
hang, after changing to NFS_ACL_MAX_ENTRIES = 256.
OK, I compiled a kernel with the patch and decided to try something a
little wild. I had several systems with the NFS hang, I couldn't even
copy a file from NFS. I really needed NFS on these systems to work,
but they also run production critical apps that I couldn't really reboot.
Since NFS client support is compiled as a module I thought I could
unload the nfs.o module and replace it on the fly with a binary
compatible nfs.o module that includes the patch. So I compiled an
nfs.o module for 15.0.3-ELsmp and 15.0.4-ELsmp that I could drop in to
the module directory to replace the delivered nfs.o module from
Redhat. After testing on a non-production system I unmounted my NFS
mounts, did a 'rmmod nfs', copied my new custom nfs.o over the
existing one in the modules directory, ran depmod and then remounted
my NFS exports. Sure enough after this NFS would work fine. I could
even swap back and forth between the Redhat nfs.o and the custom nfs.o
and switch between working and non-working NFS, so this patch
definitely seems to help. Basically I can fix my critical systems on
the fly by simply replacing this one module.
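The swap described above can be sketched as an ordered command list. This is a dry run only: it builds and prints the commands without executing anything. The mount point, the path of the patched module, and the module directory layout are assumptions for illustration, and the backup copy is an extra precaution that the comment above does not mention.

```python
# Dry-run sketch: builds the command sequence only, runs nothing.
# Mount points, module paths and the kernel version are hypothetical.

def swap_plan(mounts, new_module, kernel_version):
    """Return the shell commands for swapping nfs.o, in order."""
    # Assumed 2.4-style module location; check your own module tree.
    installed = f"/lib/modules/{kernel_version}/kernel/fs/nfs/nfs.o"
    plan = [f"umount {m}" for m in mounts]           # 1. release all NFS mounts
    plan.append("rmmod nfs")                         # 2. unload the stock module
    plan.append(f"cp {installed} {installed}.orig")  #    keep a copy to swap back
    plan.append(f"cp {new_module} {installed}")      # 3. drop in the patched build
    plan.append("depmod -a")                         # 4. rebuild module dependencies
    plan += [f"mount {m}" for m in mounts]           # 5. remount (assumes fstab entries)
    return plan

if __name__ == "__main__":
    for cmd in swap_plan(["/home"], "/root/nfs-patched.o", "2.4.21-15.0.4.ELsmp"):
        print(cmd)
```

The module can only be unloaded once every NFS mount is released, which is why the umount steps come first.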
My new question is, does this have any other side effects and does
Redhat consider this an official fix? Certainly an official fix is
No this is not an official fix. Ideally we would like to fix the
problem without needing to reduce the number of ACE objects supported
in a single rpc response. I expect we'll post the alternative here
just as soon as it's ready.
Tom's nfs module switch looks like a good procedure to fix NFS on the fly. My
system is still good after 6 days with Neil's patch. Thanks.
Created attachment 102796 [details]
patch to make NFS3_ACL_MAX_ENTRIES configurable
This is a variation on my previous patch, which makes the number of ACE objects
supported by the nfs client configurable. Could someone test this please and
confirm that it works as well as the previous patch? Thanks!
Created attachment 102803 [details]
follow on patch to add same functionality to nfsd module
This patch adds on the same acl ace entry module option to the nfsd module for
those who are interested.
I've installed a kernel built with Neil's latest patch. It seems to
be working, but I'll have to use the new kernel for about a week to be
really sure that this bug is fixed (since it usually takes a few days
to show up after a reboot).
I'm having exactly the same problem on my web server EL 3 clients. The
NFS server is running redhat Linux 9.
I have 2 identical web servers mounted to a common nfs server. The ll
lockup condition can occur on one client while it is working correctly
on the other. The problem is fixable only by rebooting either the nfs
server or the troubled client.
The problem didn't appear until I upgraded the web servers from RH 7.3.
Now we're sitting on a time bomb. I've set the rsize and wsize to
4,096. I don't have the option of setting no_acl in my nfs
mounts. It is not recognized by the nfs clients in my install of EL
3, yet the documentation says it should be.
I'm running kernel 2.4.21-15.0.3.ELsmp on the clients. Linux version
2.4.20-8smp on the server.
Don't really want to go and patch the source for the webservers. Will
probably have to uninstall EL 3 and rebuild the servers with RH 9.
I don't have any way of predicting the problem. They have been trouble
free at times anywhere from 2 weeks to just a few hours. These are
relatively low bandwidth web servers.
Thomas, how has the patch been running for you this week?
I've had no problems since I rebooted to the new kernel six days ago.
Note that the machine where this problem occurs hosts a cluster of
about 130 nodes. I added 256 more nodes to the cluster, and the
problem went from about a two week cycle to a two day cycle.
What would change (more than linearly) with the additional nodes is
the amount of ssh and rsh'ing occurring from the host. At least 4k of
each per hour, estimated.
There would also be NIS pressure, but the host load got so high on the
host (i.e. whenever 1000 processes would start simultaneously on the
nodes), that I turned NIS off in favor of local files... and the "RPC:
buffer allocation failed" still occurred.
Note that I turned off attribute caching altogether on the host
client mounts (where this host is an NFS client), using the NFS "noac"
option, and that did not help.
NIS and NFS are the only two things in your comments that would have
made any effect on the situation, since those are the only two
subsystems that you mentioned which use RPC. Did you by any chance
try the latest patch that I have attached that allows you to reduce
the number of ACE objects that the nfs client supports?
Created attachment 103306 [details]
enhancement on prior patch to display module acl option via proc file
Same patch as before (nfs/nfsd changes combined into one patch), with
nfsd_acl_max_entries exported as a read-only proc file.
Created attachment 103367 [details]
new patch to add nfs_acl_max_entries module option to nfs.o
This patch (split out from the last larger patch) adds the aforementioned
nfs_acl_max_entries module option to nfs.o.
Created attachment 103368 [details]
follow-on patch to add same functionality to nfsd.o
This patch builds on the last patch, adding the same module parameter
functionality to nfsd.o, and includes the addition of a sysctl to report the
Created attachment 103553 [details]
same patch with added nfs sysctl
same patch as before, but adds nfs sysctl and is diffed against latest kernel
Created attachment 103555 [details]
follow on nfsd patch for new kernel
same as last nfsd patch, but diffed against new kernel.
Customer, finally got to installing the kernel generated
(kernel-2.4.21-20.EL.RPCTEST.i686.rpm). The machine ran successfully
for 10 hours then locked up. No response from console or
anywhere. Required a power off.
I'm seeing this hard lockup on multiple systems since upgrading to
2.4.21-20.EL, systems both with and without the patch, everything from
an 8-way Dell 8450 with 12GB of RAM and a Qlogic 2300 FC adapter to my
700MHz 1-CPU Dell PowerApp.web 100 with standard IDE drives. Have
also seen the lockup on an IBM HS20 Dual 3GHz system. I'm falling back
to 15.0.4 on everything but my test server until I get a better feel
for 20, but right now I think 20 is buggy.
The last two comments regarding lockups sound like they are related to
a different problem than the one being addressed here. I'd open a
separate bugzilla. Just out of curiosity though, are the affected
machines all running with more than 4GB of RAM? If so, try booting
with less than 4GB of RAM (you can use the mem=4G kernel command line
option). If that relieves the lock up it might give us a clue as to
where the problem lies.
I agree they are likely a different problem. I'm going to apply the
existing patch for 20 to my 15.0.4 kernels and see how that goes.
I'm still researching existing bugzilla entries on the hang to see if
they match my problem. One of the systems that hard locked has only
512MB of RAM, but I think it may have been bitten by Bug 132547. The
other two systems have 8GB and 12GB respectively, but booting with
less memory is not really possible on these systems as they run large,
active Oracle databases.
A fix for this problem has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.7.EL).
One of my systems just got hit by the bug again under
kernel-smp-2.4.21-20.9.EL and fs.nfs.nfs3_acl_max_entries set to 256
after 22 days of uptime :(
Can you get a sysrq-m off the system and post it here during the failure?
Regarding the latest set of patches (comment #96 and comment #97),
exactly what *is* the nfs module parameter name/syntax, and how to use
it in practice? Put something in /etc/modules.conf?
/etc/sysctl.conf? It's not entirely clear to me.
There are two module options, one for the client and one for the
server (nfs3_acl_max_entries and nfsd3_acl_max_entries). You specify
them with an options directive in modules.conf. Both parameters take
integer values, and allow you to specify the number of acl entries
listed per NFSACL transaction. By specifying a lower value, you save
memory when the RPC subsystem allocates buffers to store the
transaction response, thereby avoiding buffer allocation failures from
this particular problem. Comment 61 in this bugzilla provides
interesting values for these options that correlate to allocation size
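As a rough illustration of why a lower entry count saves memory, the following sketch estimates the worst-case ACL payload and the allocation block it would fall into. It rests on assumptions that are not confirmed in this thread: the sizing formula is the NFSACL_MAXWORDS definition from later mainline kernels, the allocator is taken to round up to a power of two, and the page size is 4 KB. The RHEL3 patch may size its buffers differently.

```python
# Assumption: the reply is sized by the NFSACL_MAXWORDS formula used in
# later mainline kernels, 2 * (2 + 3 * max_entries) 32-bit words.
# The RHEL3 patch may compute its buffer size differently.

PAGE_SIZE = 4096  # assumed i686 page size

def acl_reply_bytes(max_entries):
    """Worst-case XDR size of the two ACLs in a GETACL reply, in bytes."""
    words = 2 * (2 + 3 * max_entries)  # two ACLs, 2 header words + 3 per entry
    return 4 * words

def alloc_bytes(size):
    """Round up to the next power of two, as a kmalloc-style allocator would."""
    block = 32
    while block < size:
        block *= 2
    return block

for entries in (1024, 256):
    need = acl_reply_bytes(entries)
    block = alloc_bytes(need)
    print(entries, need, block, block // PAGE_SIZE)
```

Under these assumptions, 1024 entries need a block spanning eight contiguous pages while 256 entries need two, and the larger request is the one more likely to fail when memory is fragmented.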
In regards to comment #105, which patch was actually ported to U4, so
that those of us affected can patch now?
The last two on the attachment list below (the only two patches that
are not obsoleted), dated:
FYI, this fix does *not* correct the bug described in bug 126598 or
Re: comment #113, so I'd add the following to /etc/modules.conf, for
options nfsd nfs3_acl_max_entries=256 nfsd3_acl_max_entries=256
Ken (Comment #119), did you add the module parameters required
to implement this fix? Just running the kernel isn't enough;
you have to add the options to your modules.conf.
Of course, you may still be correct, your problem might not even be
related to this bug, but I just wanted to make sure you actually
tested it with the proper changes to modules.conf as it seems there is
some confusion as to how to actually implement this fix.
Just to clear things up, my understanding is as follows:
1. Just running the new kernel doesn't change anything
2. To actually implement the fix you must load the modules with the
new options, either via manually loading the driver and explicitly
supplying the options, or by adding them to modules.conf.
3. The /proc interface is a READ-ONLY interface so that you can see
what values the modules were loaded with, you cannot actually change
the values via this interface.
There seem to be multiple people confused about this (even discussed on
Taroon list). Could someone verify that my understanding is correct?
Re: comment #120, here's the correct modules.conf syntax/usage after
some trial and error:
options nfsd nfsd3_acl_max_entries=256
options nfs nfs3_acl_max_entries=256
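Since each option belongs to exactly one module, a small check can catch a line that attaches an option to the wrong one. This is a minimal sketch: the option-to-module mapping comes from this thread, while the function name and the parsing rules are illustrative and do not cover every modules.conf directive.

```python
# Option-to-module mapping taken from this thread; the rest is illustrative.
EXPECTED_MODULE = {
    "nfs3_acl_max_entries": "nfs",
    "nfsd3_acl_max_entries": "nfsd",
}

def misplaced_options(conf_text):
    """Return (module, option) pairs where an ACL option is on the wrong module."""
    problems = []
    for line in conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on spaces
        if len(fields) < 3 or fields[0] != "options":
            continue
        module = fields[1]
        for setting in fields[2:]:
            name = setting.split("=", 1)[0]
            owner = EXPECTED_MODULE.get(name)
            if owner is not None and owner != module:
                problems.append((module, name))
    return problems

if __name__ == "__main__":
    with open("/etc/modules.conf") as conf:
        for module, name in misplaced_options(conf.read()):
            print(f"{name} is set on {module}, expected {EXPECTED_MODULE[name]}")
```

An empty result only means the two ACL options sit on the right modules; it does not show that the modules were reloaded with them.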
*** Bug 133246 has been marked as a duplicate of this bug. ***
In response to comment #119, Tom is correct. The parameters which
this patchset adds to nfs and nfsd are settable once via module
parameters at load time. The proc interface is a read-only interface,
allowing you to see what the load time settings were. It was decided
some time ago, that allowing the dynamic resizing of the max acl
message size could be rather racy, and so we decided to require one
time initialization only at module load time.
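Because the setting is fixed at load time, reading it back is the way to confirm that the options took effect. The sketch below assumes the value appears at /proc/sys/fs/nfs/nfs3_acl_max_entries, a path inferred from the sysctl name fs.nfs.nfs3_acl_max_entries mentioned earlier in this thread; the function names are hypothetical, and the path should be checked on the system in question.

```python
from pathlib import Path

# Assumed path, inferred from the sysctl name fs.nfs.nfs3_acl_max_entries;
# verify it on your own system.
NFS_ACL_SYSCTL = "/proc/sys/fs/nfs/nfs3_acl_max_entries"

def loaded_acl_limit(path=NFS_ACL_SYSCTL):
    """Return the value the module was loaded with, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None

def fix_is_active(expected, path=NFS_ACL_SYSCTL):
    """True only if the module reports the expected load-time value."""
    return loaded_acl_limit(path) == expected

if __name__ == "__main__":
    print("nfs3_acl_max_entries:", loaded_acl_limit())
```

A result of None suggests the module is not loaded or the kernel lacks the patch; a value other than the one in modules.conf suggests the module was loaded before the options were added.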
kernel-2.4.21-20.EL with patch applied on a DELL 1GB RAM box using:
options nfsd nfsd3_acl_max_entries=256
options nfs nfs3_acl_max_entries=256
This box is an NFS server (and client, in this case against a
venerable redhat9 server) with random processes hanging twice in the
last 2 days... here are the latest seemingly relevant syslog entries
Nov 22 12:07:59 x kernel: lockd: cannot monitor x.x.x.133
Nov 22 12:08:24 x kernel: lockd: cannot monitor x.x.x.133
Nov 22 12:09:07 x kernel: lockd: cannot unmonitor x.x.x.103
Nov 22 12:09:32 x kernel: lockd: cannot monitor x.x.x.133
Nov 22 12:09:57 x kernel: lockd: cannot monitor x.x.x.133
Nov 22 12:10:22 x kernel: lockd: cannot monitor x.x.x.133
Nov 22 12:10:47 x kernel: lockd: cannot monitor x.x.x.6
Nov 22 12:11:12 x kernel: lockd: cannot monitor x.x.x.133
Does this sound like something related to this bug, or should I open a
It might be a similar type of issue, but it's probably unrelated. I'd
open another bugzilla for it.
OK, submitted as bug #140385 "lockd: cannot monitor/unmonitor"
Exactly what is the current status of this ? Should we get the test
kernel from RHEL ES3QU4 Beta and run it on.. the server ? the client ?
If we do, are there any patches to apply or are they already applied ?
Are there fixes for the other nfs bugs in the beta kernel or are there
separate patches that have to be applied for that ?
The fix as attached in the latest patch set to this bz is applied to
the U4 beta kernel
Run it on the machine in which the log messages appeared. This could
be the client or the server, but is in most cases the client. Don't
forget to set the new nfs module options appropriately.
"Are there fixes for the other nfs bugs in the beta kernel"
What exactly do you mean here? Are there other bugzillas you are
specifically concerned about?
*** Bug 139952 has been marked as a duplicate of this bug. ***
thanks muchly neil - i've started the process of applying this to our
nfs clients and i've fired it up on the server (rhel3qu4 beta kernel).
we currently do between 200-400Mbit/sec (around 3-4Tbyte per day) so
nfs is hammered fairly hard. this is expected to double within the
next 12 months.
the other bugs i was thinking of were the ones listed in #119.
*** Bug 121803 has been marked as a duplicate of this bug. ***
An errata has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
*** Bug 136423 has been marked as a duplicate of this bug. ***