From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; Linux i686; U) Opera 7.22 [en]

Description of problem:
We have several systems experiencing this problem. It seems to possibly be related to memory pressure. These systems are all running Oracle version 9.2.0.4, but NFS is only used for home directories. The original symptom is that a system will hang when doing a simple 'ls -l' on any NFS-mounted directory. A standard 'ls' and even other normal file operations seem to work fine. You can even still umount and mount the same NFS file system or other NFS systems, but the 'ls -l' command will always hang. The only way to kill it is to do a 'kill -9' on the ls process, after which it will hang in 'D' state; then you can 'kill -9' rpciod and the process will release. Whenever you get the hanging 'ls -l' commands you also get entries like the following:

RPC: buffer allocation failed for task da8fbca8
RPC: buffer allocation failed for task da8fbca8
RPC: buffer allocation failed for task dc00fca8

This is happening on a 4-way Dell 8450 w/8GB of RAM, a 4-way Dell 6450 w/4GB, and a 2-way 2550 w/4GB RAM. All of these systems are under significant memory pressure but generally perform well. This problem has appeared since the recent upgrade of these systems from AS 2.1 to AS 3. The NFS server is a Dell 2650 running ES 3. Please let me know what other information I can provide. Thanks, Tom

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-9.0.1.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Mount filesystem via NFS
2. Let system run for a while until memory pressure occurs
3. Perform 'ls -l' until it hangs

Actual Results: Hangs when trying to do 'ls -l' of NFS-mounted directories

Expected Results: Should list directories as normal.

Additional info: These systems are connected to a Dell/EMC CX400 disk array.
This issue occurred again today on our production Oracle box. That system can't seem to make it more than 24 hours with a working NFS. The hang today was worse than the previous ones; pretty much any access to the NFS share was completely broken. The 8450 with 8GB of RAM seems to have many more problems than the other systems with only 4GB. Is this likely to be a low-mem issue? Would it be possible that running the hugemem kernel might be an improvement even though the system has only 8GB of RAM? Thanks, Tom
If it's a VM problem with allocating contiguous buffers, then yes switching to the hugemem kernel may help since the amount of available kernel memory will increase by about 400%. Assigning to SteveD in case it's an NFS problem anyway ;)
Yow. This happened once, and I was hoping it was a fluke... but twice now. I'm also seeing this on a lightly loaded backup server that doesn't do a ton of NFS:

|eowyn# cp -a web web2
|eowyn# ls -l web web2
|web:
|total 2384
|-rw-r--r--  1 brian  support  2435809 Mar 18 18:28 apache_1.3.29.tar.gz
|-rw-r--r--  1 brian  support  4630171 Mar 22 20:32 php-4.3.4.tar.gz
|
|web2:
|total 2384
|-rw-------  1 brian  support  2435809 Mar 18 18:28 apache_1.3.29.tar.gz

This hangs, but can be interrupted. Now that the machine is having problems, there are some interesting things. dmesg reports:

RPC: buffer allocation failed for task d1fe5ca

for each attempt (different hex code). I can 'ls -R' the whole dir, but try one 'ls -l' and then it hangs again. I can 'rm -r' it as well. I have a machine in this state now, for which I could create a login for a tech; I'll probably have to reboot the machine tomorrow before an upgrade to RHE 3 on my other servers. More info:
NFS server: Red Hat 9, 2.4.20-30.9smp (upgrading tomorrow to AS)
Client: single-processor Xeon, 1GB RAM, 2.4.21-9.0.1.ELsmp
I cannot recall if I was seeing this with the old kernel.
OK, this is creepy: 'cp -r web h' works fine. 'cp -rp web i' hangs. dmesg shows:

RPC: buffer allocation failed for task f2b51ca8

but the message only appears when you ^C the cp.
Last one, I promise. I was thinking kernel as well, but I note that with the 'cp -a' or '-rp' commands I have done, the owner and perms have not been changed when I interrupt them. So I tried some chown, chown -R, chmod, and chmod -R. All worked fine, verified with 'ls -l' on another system. I ran the stat command in a foreach loop with every individual format option. It did not hang on any of them. I have no idea what this means, but I downloaded and compiled the fileutils 4.1 package from gnu.org, and bingo, 'ls -l' and 'cp -a' work. (I did have to run configure on another RHE3 AS system, due to config.status hanging on creating the makefiles. But the compile ran on the affected machine.) That's the limit of my mojo tonight.
I suspect you'll be able to find more info in /proc/slabinfo ;) In particular, after the first copy the dentry and inode caches should still be at a more or less reasonable size, but after the second copy they are probably really, really big. I have a suspicion about what your problem could be: if there is enough memory free, we don't reclaim slab cache memory to fulfill higher-order memory allocations, but only normal user/cache pages. If the slab cache is really big, maybe we need to reclaim slab and buffer headers from the defragmentation routine that's used when higher-order memory allocations can't be immediately satisfied... Larry, does this make sense?
Created attachment 98780 [details] /proc/slabinfo output FYI: Attached is the /proc/slabinfo output of the affected machine, before and during an ls -l.
This may help: an strace of 'ls -l to_do' and 'cp -p to_do a' both hang at:

getxattr("to_do", "system.posix_acl_access"

and print <unfinished> when ^C'd.
Hi, the tech at Red Hat directed me to this bug report, which appears to be the exact same behavior that I'm seeing; namely, 'ls' works fine on my NFS-mounted home directories, but 'ls -l' just hangs, with a corresponding "kernel: RPC: buffer allocation failed for task d4dc3ca8". 'ls -l' seems to be working fine on local directories. 'cp -rp' exhibits the same hanging behavior that Brian mentioned, with that RPC error popping up when you eventually ctrl-c it. NFS server is RHEL ES 3.0, Dell PE4600 with 4GB RAM and HT on. NFS client is also RHEL ES 3.0, single-CPU Dell Dimension with 1GB RAM, and kernel 2.4.21-9.0.1.ELsmp. I'm trying to update the kernel on the NFS client box to see if that will fix anything.
I upgraded to 2.4.21-9.0.3.ELsmp and this is still happening, after about 5 days uptime. It doesn't seem to happen with a non-smp kernel, as I ran one of those for 20 days without problems.
After upgrading to 2.4.21-14.ELsmp (from the Rhel 3.0 beta channel), I haven't yet seen the problem, but this is my secondary server and isn't highly stressed. I'll keep an eye on it, especially if this is something that disappears for a while after rebooting (such as for upgrading the kernel). If it does work, that kernel is going on my primary server.
I also haven't been able to reproduce this problem since upgrading all but two of my affected systems to 2.4.21-12.ELsmp several weeks ago, even on the servers that regularly showed this problem within only a few hours on 2.4.21-9.0.1.
I'll give that a shot, then!
Unfortunately, vmlinuz-2.4.21-14.ELsmp produced the error after about 5 days. Back to 2.4.21-9.0.3.EL.img.
2.4.21-15.ELsmp is producing the error after about five days, now not just on my backup server but on the web server as well. Marvelous. On the backup server, if I run the non-smp version the behavior does not manifest (9 days and counting). Both of these machines are dual-Xeon motherboards. The backup server has 1 hyperthreading processor; the web server has two non-HT Xeons. Another server I have on the same MB as the backup server has two HT Xeons in it and has not shown the problem. Why does this seem to happen with two procs on the smp kernel? Any way to turn off ACLs on NFS shares?
According to "man exports" you can add "no_acl" to the export line to mask off ACL permissions on NFS shares. Also, I think I've discovered that my problem went away when I actually mounted the filesystem I was exporting with the "acl" option. In other words, I was exporting the directory /mnt/misfiles, which was a standard ext3 filesystem but was not mounted with ACL support. Other systems accessing this filesystem via NFS would generate the RPC errors on occasion. However, once I modified the /mnt/misfiles ext3 filesystem on the NFS server to enable ACL support, the problem seems to have gone away on the clients. I haven't seen this issue in months on my machines. Maybe it's just luck, but it was happening regularly before I made the changes to enable ACLs on the underlying filesystems that were being exported. Later, Tom
I'm having this issue too. The box is a development server that supports a few developers and serves up an SVN repository and some development web pages/CGIs. Our home directories are NFS mounted from a Solaris box via the automounter, which in turn reads its configuration from an NIS domain hosted by a different Solaris box. Every few weeks, I'll run an 'ls -al' command and it will hang. Nothing will dump to the syslog until you try to kill the process, at which point the syslog is spammed with the following:

Jun 3 13:54:34 roclindev1 kernel: RPC: buffer allocation failed for task c8b31ca8
Jun 3 13:54:54 roclindev1 last message repeated 331 times

Once a particular path is hung, any attempt to access it will hang, even a plain ls. However, other paths on the same NFS mount will be fine. In addition, attempting an strace on the hung process results in a hung strace. I swear I was able to recover one time through a combination of 'kill -9 pid', 'umount -f /affected/mount', 'umount -l /affected/mount', 'mount /affected/mount', and '/etc/init.d/autofs restart'. But I haven't been able to do that since; now I don't waste my time, I just reboot the box. I've spent a few hours researching the problem on the web to the best of my ability, but I've never been very good at NFS troubleshooting. If I can provide any more information, please let me know.
Sadly, adding acl to the mounts on the servers didn't fix the problem, and combinations of no_acl don't seem to matter. Thus, maybe it's a kernel memory issue. Before 2.4.21-15.ELsmp I was seeing this only on a machine with an Intel SE7501BR2 7501-chipset motherboard; now I'm seeing it on Tyan S2720GN 7500-chipset boards. Interestingly, with 2.4.21-15.ELsmp on the initial machine the problem seems to come and go now: 'ls -l' hangs, then I ^C, and some time later for some unknown reason it works.
Some more data: it happened again just now. I was doing an 'mv -f' of a large (581MB) backup file to an NFS mount and noticed it was taking longer than necessary. Here are exactly the steps I followed:
1) log into linux box
2) ls on backup directory -- this worked fine
3) ls -al on backup directory -- hang
4) log in again
5) ls on backup directory -- hang
I'll see if I can reproduce tonight once no one needs the box any longer -- this is the quickest the hang has happened to me after a reboot. Maybe the large file triggers something?
I see exactly the same problem: 2.4.21-15.ELsmp on both client and server. It's extremely annoying. It also only seems to do it on 'high load' clients. On one client 'ls -l' works fine on a directory; on another, which has heavy network (NFS) load, it doesn't. -jason
Some additional information: we do a reasonable amount of NFS, approximately 3-4 TB/day (the joys of a popular internet archive). I have the following in my sysctl.conf (if it makes any difference):

# Performance tuning to increase fragment buffer memory
#net.ipv4.ipfrag_high_thresh = 4194304
#net.ipv4.ipfrag_low_thresh = 1048576
# Performance tuning to increase tcp read/write buffers
#net.ipv4.tcp_rmem = 4096 349520 1048576
#net.ipv4.tcp_wmem = 4096 131072 1048576

This bug is very reproducible and affects every NFS client we have that does any kind of load. -jason
Created attachment 101372 [details] slabinfo from nfs client which locks up with ls -l This is from an NFS client running 2.4.21-15.ELsmp.
With some further testing, the bug is strange. ftpd isn't affected (proftpd's internal ls doesn't appear to do an 'ls -l'). httpd (Apache 1.3.29) isn't affected; it seems able to browse directory trees without issues. rsyncd (2.6.2) isn't affected; I can list directories in long form remotely (rsyncd) as well as over ssh. So really I can only reproduce this from the command line, i.e. going to an NFS client and trying to 'ls -l' in any NFS-mounted directory. This is particularly annoying for home directories, of course. Other info: all the clients are mounted RO, except for one which is RW (and interestingly it doesn't have the ls problems). I suspect this is because it doesn't do that much reading across NFS, just relatively small writes (10-20GB/day, compared with 2000-3000GB/day of reads on the other clients).
Hi, I have noticed the same problem. The server is an L200 with 2GB RAM and the client is an N400 with 4GB RAM. We are running RHEL 3 Update 2 (2.4.21-15.0.2.ELsmp) on both systems. I have noticed that running (on the client) a program that allocates/touches/frees a considerable amount of memory (1GB for example), and hence decreasing buffers & cached, makes "ls -l" work again for some time. Maybe this helps to diagnose the problem; I can attach some slabinfo if needed. Regards, Juanjo
Has anyone opened an official support case with Red Hat on this issue? This seems to be a pretty big issue which is starting to show up for quite a few users; perhaps opening an official support case would get this bug some attention. I would do it, but even though I originally reported this bug, I'm actually no longer having the issue, so I wouldn't be a good choice to pursue a support case. Later, Tom
The issue has been opened with Red Hat support. I've taken a brief look through the code. It looks to me like the problem comes down to two things:
1) The fact that RPC in Linux reserves response buffers before the RPC call is made
2) The size of the buffer that Linux has to reserve for getacl requests
Since we reserve response buffers before we make an RPC call, and since we have no way of knowing the maximum size of a response for a getacl call (which is required for getxattr requests on systems that support ACLs), we have to assume a worst-case scenario. In the case of the getacl call, this requires quite a bit of space (since 1024 ACE objects are supported per response). This means that we need to make at least an order-2 allocation for every getacl request sent. I would imagine that systems under a high level of memory pressure with significant fragmentation would not be able to satisfy this request. I believe the RPC code, in an effort to prevent all NFS activity from blocking, simply fails this request rather than putting the calling process to sleep. As a side note, it may be helpful, if you find yourself in this situation, to mount your NFS shares with an rsize and a wsize of 4096. This will decrease your performance somewhat, but doing so should relieve some of the system's demand for order-2 allocations and may prevent these errors from occurring.
Just a ditto of the above. I'm seeing the same problems on our Oracle server doing a trivial amount of NFS per day (<5GB). Running kernel 2.4.21-9.0.1.ELsmp; I tried reducing rsize and wsize to 4096 and still had the problem. Only 'ls -l' appears to be affected.
We are running 15.0.3.ELsmp with NFS against a NAS server from Dell running Win2k3, and we suffer the same symptoms. The ls hangs happen irregularly. We are going to try another distro to check whether it is a RHEL problem.
So far the only thing the new kernels have done for me is: 1) make more systems experience the bug 2) make the interval of useful uptime less Has anyone seen this with a non-smp kernel?
I have seen this issue on non-smp, but only on one server. Once again it was a server that was under significant memory pressure. Actually, adding more memory seemed to resolve the issue. Another thing I have noted while trying to figure out why I no longer see this problem while others still do: I remember that I made a change to inactive_clean_percent to return it to the behavior of 2.4.21-4.EL, which I think was the original kernel. I did this to resolve another issue where the kernel goes way too far into swap instead of reclaiming memory; however, maybe the extra memory reclaim gives NFS a better chance of completing its order-2 allocations. I added the following to my sysctl.conf to make this change:

vm.inactive_clean_percent = 100

It probably doesn't have anything to do with it, but since I haven't seen this issue in months, and this is one of the changes I made during those months, I thought I'd mention it. Later, Tom
You probably shouldn't set your inactive_clean_percent to 100. That implies that any time you have dirty memory pages, the VM is going to try to clean them. That's a lot of unneeded overhead to put on your VM, and it might in fact make your system quite unresponsive. It would of course help keep order-2 allocations available, but at a pretty terrible cost.
Actually, as far as I can tell, on any system under memory pressure from applications doing lots of IO, not having this set causes the system load to be much higher, because the system fails to reclaim memory aggressively enough to keep it from going heavily into swap, and using swap is a much higher cost than using CPU cycles to reclaim memory. In my experience, and that of many others if you search Bugzilla or the Taroon or Oracle mailing lists, this setting actually has a positive effect. I have one server here where it is VERY noticeable. Still, you don't have to set it to 100 (even though that was the behavior of the original RHEL 3 kernel); you can set it to 75 or 50 and see if it helps. I'm just trying to offer some suggestions, since so many people are having this issue and there doesn't seem to be any progress on fixing it. If this setting kills performance they can change it back, but on the 15 servers I have the performance impact is negligible at best, and on a few memory-constrained servers it actually improved performance by making the system use less swap. Even if it has a small impact on performance, some people may be willing to trade a few percent of their CPU cycles for a working NFS client (assuming that actually helps with the problem). Later, Tom
Neil, while setting inactive_clean_percent to 100 could result in bad disk IO when doing swapping, it should be pretty harmless when doing mmap()d IO over NFS. In fact, starting the write out earlier should guarantee that there is more free memory available - making it easier to allocate the RPC buffers and reducing the chance of an allocation failure. This is a special case, though. For normal disk bound systems it's probably best to leave inactive_clean_percent at a lower value...
I've set it on my backup server (performance won't matter a whole lot there) to see if it still hits the problem. I'll report back in 5 days or so, by which time it would be sure to have happened. So what is the value set to in the current kernels, and what might be a good value to avoid this problem, if it works?
I believe the default on the -15 kernels is 30.
We are running Red Hat Enterprise Linux ES release 3 with kernel 2.4.21-9.ELsmp, and once every day or two the 'mv' command hangs when moving a file from a local directory to an NFS-mounted directory. This is accompanied by repeated errors in the messages file:

kernel: RPC: buffer allocation failed for task e0c43cb4

for different tasks. The only fix we have found is to reboot.
Running kernel-2.4.21-15.0.3.EL and 2.4.21-15.0.2.ELsmp, I am getting this exact same issue with Dell 2650s. It is really, really annoying and making the machines virtually unusable at times. The machines in question are under fairly heavy CPU loads from statistical analysis programs, but not heavy NFS load. I can't reboot without the stats jobs failing, so this isn't an option as they run for weeks at a time. Things that hang in these circumstances include vi, vim, rm, and ls. It's also happened on machines that are not heavily loaded. I never got this problem with Red Hat 9. If anyone got a fix from Red Hat support could they please mail it to me urgently? Cheers
FYI: Though it seemed to last a day longer, the backup server with vm.inactive_clean_percent = 100 did experience the problem again. :(
Same problem here: 2GB of memory with kernel-smp-2.4.21-15.0.3.EL. Typical paging and disk I/O info follow:

11:30:03 AM  pgpgin/s pgpgout/s  activepg  inadtypg  inaclnpg  inatarpg
11:40:03 AM    242.07   1659.09    235921    195099      7315     97402
11:50:01 AM      5.45    759.29    235434    196169      8065     98205
12:00:00 PM      0.52   1522.38    213007    213161      7239     98026
12:10:00 PM    215.22   1449.89    236690    196093      8192     98322
12:20:00 PM    413.29   1162.54    240586    192554      7018     98195
12:30:04 PM     28.32   1307.60    214242    210747      7385     97660
12:40:00 PM    424.62   1443.24    241216    192619      7463     98339
12:50:00 PM     12.09    928.98    228875    201079      7220     98179
01:00:01 PM    132.47   1861.38    221040    212119      4645     96681
01:10:03 PM    386.30   1753.47    247961    179431      6705     94704
01:20:03 PM     48.78    790.09    245355    173183      6429     92946
01:30:00 PM     99.52   1086.99    251283    180604     21462     95238

11:30:03 AM       DEV       tps    sect/s
11:40:03 AM    dev8-0    134.28   3802.32
11:50:01 AM    dev8-0     58.85   1529.47
12:00:00 PM    dev8-0    120.93   3045.80
12:10:00 PM    dev8-0    120.58   3330.22
12:20:00 PM    dev8-0    127.76   3151.68
12:30:04 PM    dev8-0    102.59   2671.85
12:40:00 PM    dev8-0    134.42   3735.71
12:50:00 PM    dev8-0     65.35   1882.15
01:00:01 PM    dev8-0    166.50   3987.71
01:10:03 PM    dev8-0    159.99   4279.54
01:20:03 PM    dev8-0     70.15   1677.75
01:30:00 PM    dev8-0     98.44   2373.01
It happened again in less than 24 hours.

 13:20:36  up 23:15,  5 users,  load average: 5.04, 5.00, 3.98
119 processes: 118 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    0.4%    0.0%    1.4%   0.0%     0.0%    0.0%   98.0%
           cpu00    0.9%    0.0%    1.9%   0.0%     0.0%    0.0%   97.0%
           cpu01    0.0%    0.0%    0.9%   0.0%     0.0%    0.0%   99.0%
Mem:  2061636k av, 1934868k used,  126768k free,      0k shrd,  48852k buff
      1046136k actv,  469100k in_d,   28304k in_c
Swap: 2096440k av,    2240k used, 2094200k free               1736384k cached

[root@log]# cat /proc/slabinfo
slabinfo - version: 1.1 (SMP)
kmem_cache            96     96    244     6     6    1 : 1008  252
nfs_write_data        40     40    384     4     4    1 :  496  124
nfs_read_data        360    360    384    36    36    1 :  496  124
nfs_page            1020   1020    128    34    34    1 : 1008  252
ip_fib_hash           16    224     32     2     2    1 : 1008  252
ip_conntrack         260    260    384    26    26    1 :  496  124
urb_priv               0      0     64     0     0    1 : 1008  252
ext3_xattr             0      0     44     0     0    1 : 1008  252
journal_head        1241  13321     48    45   173    1 : 1008  252
revoke_table           5    250     12     1     1    1 : 1008  252
revoke_record        224    224     32     2     2    1 : 1008  252
clip_arp_cache         0      0    256     0     0    1 : 1008  252
ip_mrt_cache           0      0    128     0     0    1 : 1008  252
tcp_tw_bucket         23     30    128     1     1    1 : 1008  252
tcp_bind_bucket      224    224     32     2     2    1 : 1008  252
tcp_open_request      30     30    128     1     1    1 : 1008  252
inet_peer_cache        3    116     64     2     2    1 : 1008  252
secpath_cache          0      0    128     0     0    1 : 1008  252
xfrm_dst_cache         0      0    256     0     0    1 : 1008  252
ip_dst_cache         225    225    256    15    15    1 : 1008  252
arp_cache             30     30    256     2     2    1 : 1008  252
flow_cache             0      0    128     0     0    1 : 1008  252
blkdev_requests     3072   3360    128   112   112    1 : 1008  252
kioctx                 0      0    128     0     0    1 : 1008  252
kiocb                  0      0    128     0     0    1 : 1008  252
dnotify_cache          0      0     20     0     0    1 : 1008  252
file_lock_cache      120    120     96     3     3    1 : 1008  252
async_poll_table       0      0    140     0     0    1 : 1008  252
fasync_cache           0      0     16     0     0    1 : 1008  252
uid_cache            224    224     32     2     2    1 : 1008  252
skbuff_head_cache   1495   1495    168    65    65    1 : 1008  252
sock                 290    290   1408    58    58    2 :  240   60
sigqueue             261    261    132     9     9    1 : 1008  252
kiobuf                 0      0    128     0     0    1 : 1008  252
cdev_cache           269    290     64     5     5    1 : 1008  252
bdev_cache             7    116     64     2     2    1 : 1008  252
mnt_cache             27    116     64     2     2    1 : 1008  252
inode_cache         3205   4823    512   689   689    1 :  496  124
dentry_cache        1469   2550    128    85    85    1 : 1008  252
dquot                  0      0    128     0     0    1 : 1008  252
filp                1453   1470    128    49    49    1 : 1008  252
names_cache           44     44   4096    44    44    1 :  240   60
buffer_head       254395 273980    108  7827  7828    1 : 1008  252
mm_struct            250    250    384    25    25    1 :  496  124
vm_area_struct      4536   4536     68    81    81    1 : 1008  252
fs_cache             406    406     64     7     7    1 : 1008  252
files_cache          210    210    512    30    30    1 :  496  124
signal_cache         580    580     64    10    10    1 : 1008  252
sighand_cache        178    180   1408    36    36    2 :  240   60
pte_chain           5494  12150    128   284   405    1 : 1008  252
pae_pgd              406    406     64     7     7    1 : 1008  252
size-131072(DMA)       0      0 131072     0     0   32 :    0    0
size-131072            0      0 131072     0     0   32 :    0    0
size-65536(DMA)        0      0  65536     0     0   16 :    0    0
size-65536             0      0  65536     0     0   16 :    0    0
size-32768(DMA)        0      0  32768     0     0    8 :    0    0
size-32768             0      0  32768     0     0    8 :    0    0
size-16384(DMA)        1      1  16384     1     1    4 :    0    0
size-16384            22     23  16384    22    23    4 :    0    0
size-8192(DMA)         0      0   8192     0     0    2 :    0    0
size-8192              6      8   8192     6     8    2 :    0    0
size-4096(DMA)         0      0   4096     0     0    1 :  240   60
size-4096            662    722   4096   662   722    1 :  240   60
size-2048(DMA)         0      0   2048     0     0    1 :  240   60
size-2048            350    350   2048   175   175    1 :  240   60
size-1024(DMA)         0      0   1024     0     0    1 :  496  124
size-1024            100    100   1024    25    25    1 :  496  124
size-512(DMA)          0      0    512     0     0    1 :  496  124
size-512             576    576    512    72    72    1 :  496  124
size-256(DMA)          0      0    256     0     0    1 : 1008  252
size-256            1080   1080    256    72    72    1 : 1008  252
size-128(DMA)          1     30    128     1     1    1 : 1008  252
size-128            2277   2400    128    80    80    1 : 1008  252
size-64(DMA)           0      0    128     0     0    1 : 1008  252
size-64              660    660    128    22    22    1 : 1008  252
size-32(DMA)          17     58     64     1     1    1 : 1008  252
size-32              883   1044     64    18    18    1 : 1008  252
vm.inactive_clean_percent = 100 does not help
We are running Red Hat Enterprise Linux ES release 3 with kernel 2.4.21-9.ELsmp, and once every day or two the 'mv' command hangs when moving a file from a local directory to an NFS-mounted directory. Currently, we just had a vi session hang, accompanied by errors in /var/log/messages like:

RPC: buffer allocation failed for task

The hanging of the NFS-mounted filesystem is affecting our production. Can someone give me a time frame for when this bug may be fixed? That way we can decide whether it's worth the effort to find a workaround. Thanks.
Would replacing the NFS code that comes with Red Hat Enterprise ES with the NFS code that comes with Red Hat Enterprise 2.1 avoid the bug?
That's a pretty tall order. Have you all tried using the hugemem kernel? It's not exactly a fix for the problem, but it will certainly avoid it by potentially quadrupling the amount of lowmem that you have to work with (assuming that you have 4GB of RAM in these systems).
> Have you tried the hugemem kernel? ... > assuming that you have 4GB... What happens if one installs hugemem on a box with less than 4GB? Would that be bad?
Nope, nothing wrong with it, but it won't maximize the advantage that the hugemem configuration provides. Non-hugemem kernels have 1GB of kernel address space (lowmem), while hugemem kernels have 4GB. So if you have less than 1GB of total RAM, hugemem isn't helpful. From 1GB to 4GB of RAM the advantage scales, by which I mean a system with 2GB of RAM will have 2GB of lowmem, a system with 3GB of RAM will have 3GB of lowmem, and so on up to 4GB, after which you're back into adding highmem. Since RPC allocates kernel memory for response buffers, it uses lowmem. More lowmem means more memory for RPC to allocate if it needs it.
The above solution, "you can 'kill -9 rpciod' and the process will release", is true, but the system is then hosed: I cannot NFS-mount anything further. Not a Dell box, not running Oracle... just RHEL3 with the -15 kernel. The client is mounting NFS partitions that are GFS file systems on the server.
Yeah, don't do that. Killing your rpc/nfs tasks won't lead to anything good. This is a memory problem. The best solution right now that I can think of is moving to the hugemem kernel (at least for those systems that have > 1GB of RAM).
Moving to hugemem only delays the inevitable.
Can someone get "AltSysrq M" outputs for both smp and hugemem kernels when this problem is happening so we can figure out if this is a memory fragmentation issue or lowmem exhaustion issue? Thanks, Larry Woodman
I think we need to backport the mempool_alloc infrastructure from 2.6; that should solve this issue.
I just got the error message on a machine (dual Xeon, 2GB) that was moved recently to RHEL3 from 7.3. It managed to survive without a reboot for about a year with about 20 users running remote X sessions (Exceed) and heavy computational jobs (often needing more than 1GB of memory). After the rebuild it was only used by one user for about two weeks before NFS failed. Until the problem is fixed (backporting mempool_alloc, whatever), are there any other options besides running the hugemem kernel? Will it help to lower vm.max_map_count? It is listed as a possible solution when you run out of lowmem pages in a Red Hat document. Is there a way to check how many VMAs are used by each process, to check if it's going to cause problems?
With the hugemem kernel, our system still behaves normally after 1 day (24 hours). It acts up within one night with other kernels.
That's going to be the best solution, at least for now. If you absolutely can't move to hugemem, it may alternatively help to reduce the NFS_ACL_MAX_ENTIRES definition from its current setting of 1024 to something smaller. At a value of 1024 it implies an order-2 allocation (~12KB) for each NFS_ACL request; at 512 it requires an order-1 allocation; and at 256 it requires an order-0 allocation. This should help alleviate the need for large contiguous buffer allocations in the RPC layer for NFS_ACL requests.
Created attachment 102284 [details] patch to reduce memory requirements for NFS_ACL responses If someone wants to give it a try, here's a patch I think might help. I haven't been able to determine whether reducing the number of ACE objects we support in a single response message violates any germane standards or RFCs, but at the least it seems it should relieve some of the memory demand NFS_ACL creates.
Hugemem is an OK idea, but since this problem appears mainly on clients, and most of mine have 1GB of memory, it's probably not going to help small clients. >Can someone get "AltSysrq M" outputs for both smp and hugemem kernels I can do it for smp; how exactly does one do this? >patch to reduce memory requirements for NFS_ACL responses Sure, I'll attempt a new kernel on my backup server.
Would lowering NFS_ACL_MAX_ENTIRES approximate the "noacl" client mount option and cause the NFS servers to thrash (and slow NFS file access)?
>I can do it for smp, how exactly does one do this?

echo 1 > /proc/sys/kernel/sysrq
press alt-printscreen-m

>Would lowering NFS_ACL_MAX_ENTIRES approximate the "noacl"

Performance will probably be degraded slightly, although I can't put a number on how much. Certainly it will be better than the performance you get when you receive the out-of-memory errors documented above.
Created attachment 102301 [details] sysrq m output Okay, here's the sysrq-m output from two machines with similar setups: one that is currently experiencing the bug, and another that will in the future. They are both running 2.4.21-15.0.3.ELsmp. I don't have any hugemem kernels running at the moment.
*** Bug 127830 has been marked as a duplicate of this bug. ***
My system is still functioning normally with the hugemem kernel after 5 days. The physical memory size is 2GB.
That's good to hear. Any results yet from the patch I posted?
Neil, I haven't had a chance to test your one-line patch yet. I use an ICP RAID controller and the driver is shipped as an unsupported module. The config file for the unsupported hugemem kernel couldn't be found in the src package. Do you know where I could find/download it? Thanks.
In the configs directory, the hugemem config is the one you want.
I just rebooted my backup and web servers with the NFS_ACL_MAX_ENTIRES 1024 -> 256 nfs3.h change. I should know in a few days whether it extends the lifetime.
My system got a kernel panic after 6 days with the standard hugemem kernel. The last two lines displayed on the console were:

Code: 8b 81 84 00 00 00 42 39 41 70 89 d9 0f 43 54 24 10 81 e1 00
Kernel Panic: Fatal Exception

Couldn't get sysrq+m output...
The NFS_ACL_MAX_ENTIRES change from 1024 to 256 looks to be having a positive impact. One of my systems couldn't live longer than 1 day with the default kernel, but it is still behaving normally after 2 days with Neil's patch.
OK, this issue just hit me again for the first time in months. What's the current consensus on the best solution? It seems the options for now are:

1. Run hugemem and live with the performance drop that comes with it.
2. Try the NFS patch and hope it helps.

The patch seems more attractive to me. Do we know if it's really helping to work around this issue? Later, Tom
We've been seeing positive results from people who've applied the patch.
I second that; I'm about 2 days past the point where the 'ls -l' hang would normally occur, after changing NFS_ACL_MAX_ENTRIES to 256.
OK, I compiled a kernel with the patch and decided to try something a little wild. I had several systems with the NFS hang; I couldn't even copy a file from NFS. I really needed NFS on these systems to work, but they also run production-critical apps that I couldn't really reboot.

Since NFS client support is compiled as a module, I thought I could unload the nfs.o module and replace it on the fly with a binary-compatible nfs.o module that includes the patch. So I compiled an nfs.o module for 15.0.3-ELsmp and 15.0.4-ELsmp that I could drop into the module directory to replace the nfs.o delivered by Red Hat.

After testing on a non-production system, I unmounted my NFS mounts, did a 'rmmod nfs', copied my new custom nfs.o over the existing one in the modules directory, ran depmod, and then remounted my NFS exports. Sure enough, after this NFS would work fine. I could even swap back and forth between the Red Hat nfs.o and the custom nfs.o and switch between working and non-working NFS, so this patch definitely seems to help. Basically I can fix my critical systems on the fly by simply replacing this one module.

My new question is: does this have any other side effects, and does Red Hat consider this an official fix? Certainly an official fix is needed. Thanks, Tom
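The live-swap procedure described above can be sketched like this. It must be run as root, and the module path and the name of the patched build are illustrative only (adjust for your kernel version and wherever you put the rebuilt module):

```shell
# Sketch of the on-the-fly nfs.o swap described above (run as root).
# /root/nfs-patched.o is a placeholder for your binary-compatible
# rebuild of nfs.o with the patch applied.
KVER=$(uname -r)
MOD=/lib/modules/$KVER/kernel/fs/nfs/nfs.o

umount -a -t nfs                # unmount all NFS filesystems first
rmmod nfs                       # unload the stock client module
cp /root/nfs-patched.o "$MOD"   # drop the patched module into place
depmod -a                       # refresh module dependency data
mount -a -t nfs                 # remount; the patched module loads on demand
```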
No, this is not an official fix. Ideally we would like to fix the problem without needing to reduce the number of ACE objects supported in a single RPC response. I expect we'll post the alternative here just as soon as it's ready.
Tom's nfs module swap looks like a good procedure to fix NFS on the fly. My system is still good after 6 days with Neil's patch. Thanks.
Created attachment 102796 [details] patch to make NFS3_ACL_MAX_ENTRIES configurable This is a variation on my previous patch, which makes the number of ACE objects supported by the nfs client configurable. Could someone test this please and confirm that it works as well as the previous patch? Thanks!
Created attachment 102803 [details] follow on patch to add same functionality to nfsd module This patch adds on the same acl ace entry module option to the nfsd module for those who are interested.
I've installed a kernel built with Neil's latest patch. It seems to be working, but I'll have to use the new kernel for about a week to be really sure that this bug is fixed (since it usually takes a few days to show up after a reboot).
I'm having exactly the same problem on my EL 3 web server clients. The NFS server is running Red Hat Linux 9. I have 2 identical web servers mounted to a common NFS server. The 'ls -l' lockup condition can occur on one client while it is working correctly on the other. The problem is fixable only by rebooting either the NFS server or the troubled client.

The problem didn't appear until I upgraded the web servers from RH 7.3. Now we're sitting on a time bomb. I've set the rsize and wsize to 4,096. I don't have the option of setting noacl in my NFS mounts; it is not recognized by the NFS clients in my install of EL 3, yet the documentation says it should be. I'm running kernel 2.4.21-15.0.3.ELsmp on the clients and 2.4.20-8smp on the server.

I don't really want to go and patch the source for the web servers; I will probably have to uninstall EL 3 and rebuild the servers with RH 9. I don't have any way of predicting the problem — they have been trouble-free at times anywhere from 2 weeks to just a few hours. These are relatively low-bandwidth web servers.
Thomas, how has the patch been running for you this week?
I've had no problems since I rebooted to the new kernel six days ago.
Note that the machine where this problem occurs hosts a cluster of about 130 nodes. I added 256 more nodes to the cluster, and the problem went from about a two-week cycle to a two-day cycle. What would change (more than linearly) with the additional nodes is the amount of ssh'ing and rsh'ing occurring from the host: at least 4k of each per hour, estimated. There would also be NIS pressure, but the load got so high on the host (i.e. whenever 1000 processes would start simultaneously on the nodes) that I turned NIS off in favor of local files... and the "RPC: buffer allocation failed" still occurred. Note that I turned off attribute caching altogether on the host's client mounts (where this host is an NFS client), using the NFS "noac" option, and that did not help.
NIS and NFS are the only two things in your comments that would have had any effect on the situation, since those are the only two subsystems you mentioned which use RPC. Did you by any chance try the latest patch I attached, which allows you to reduce the number of ACE objects that the NFS client supports?
Created attachment 103306 [details] enhancement on prior patch to display module acl option via proc file same patch as before (combined nfs/nfsd changes into one patch) and exported nfsd_acl_max_entries as a read only proc file
Created attachment 103367 [details] new patch to add nfs_acl_max_entries module option to nfs.o This patch (split out from the last larger patch) adds the aforementioned nfs_acl_max_entries module option to nfs.o.
Created attachment 103368 [details] follow-on patch to add same functionality to nfsd.o This patch builds on the last patch, adding the same module parameter functionality to nfsd.o, and includes the addition of a sysctl to report the assigned value.
Created attachment 103553 [details] same patch with added nfs sysctl same patch as before, but adds nfs sysctl and is diffed against latest kernel
Created attachment 103555 [details] follow on nfsd patch for new kernel same as last nfsd patch, but diffed against new kernel.
The customer finally got to installing the generated kernel (kernel-2.4.21-20.EL.RPCTEST.i686.rpm). The machine ran successfully for 10 hours, then locked up. No response from the console or anywhere; it required a power-off. Any ideas?
I'm seeing this hard lockup on multiple systems since upgrading to 2.4.21-20.EL, systems both with and without the patch, everything from an 8-way Dell 8450 with 12GB of RAM and a Qlogic 2300 FC adapter to my 700Mhz 1-CPU Dell PowerApp.web 100 with standard IDE drives. Have also seen the lockup on a IBM HS20 Dual 3Ghz system. I'm falling back to 15.0.4 on everything but my test server until I get a better feel for 20, but right now I think 20 is buggy. Later, Tom
The last two comments regarding lockups sound like they are related to a different problem than the one being addressed here. I'd open a separate bugzilla. Just out of curiosity, though: are the affected machines all running with more than 4GB of RAM? If so, try booting with less than 4GB (you can use the mem=4G kernel command line option). If that relieves the lockup, it might give us a clue as to where the problem lies.
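For anyone unfamiliar with passing mem=4G at boot, a GRUB entry might look like the following sketch. The title, kernel/initrd paths, and root device here are purely illustrative, not taken from any system in this report:

```
# /boot/grub/grub.conf -- illustrative entry; adjust paths and root
# device for your own system before using.
title Red Hat Enterprise Linux (2.4.21-20.ELsmp, mem=4G test)
        root (hd0,0)
        kernel /vmlinuz-2.4.21-20.ELsmp ro root=/dev/sda2 mem=4G
        initrd /initrd-2.4.21-20.ELsmp.img
```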
I agree they are likely a different problem. I'm going to apply the existing patch for 20 to my 15.0.4 kernels and see how that goes. I'm still researching existing bugzilla entries on the hang to see if they match my problem. One of the systems that hard-locked has only 512MB of RAM, but I think it may have been bitten by Bug 132547. The other two systems have 8GB and 12GB respectively, but booting with less memory is not really possible on these systems as they run large, active Oracle databases. Later, Tom
A fix for this problem has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-20.7.EL).
One of my systems just got hit by the bug again under kernel-smp-2.4.21-20.9.EL and fs.nfs.nfs3_acl_max_entries set to 256 after 22 days of uptime :(
can you get a sysrq-m off the system and post it here during the failure?
Regarding the latest set of patches (comment #96 and comment #97), exactly what *is* the nfs module parameter name/syntax, and how does one use it in practice? Put something in /etc/modules.conf? /etc/sysctl.conf? It's not entirely clear to me.
there are two module options, one for the client and one for the server (nfs3_acl_max_entries and nfsd3_acl_max_entries). You specify them with an options directive in modules.conf. Both parameters take integer values, and allow you to specify the number of acl entries listed per NFSACL transaction. By specifying a lower value, you save memory when the RPC subsystem allocates buffers to store the transaction response, thereby avoiding buffer allocation failures from this particular problem. Comment 61 in this bugzilla provides interesting values for these options that correlate to allocation size thresholds.
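For example, a minimal /etc/modules.conf fragment using these options might look like the following. The value 256 is just an illustrative choice; pick one based on the allocation-size thresholds referenced above:

```
options nfs nfs3_acl_max_entries=256
options nfsd nfsd3_acl_max_entries=256
```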
In regards to comment #105, which patch was actually ported to U4, so that those of us affected can patch now?
The last two on the attachment list below (the only two patches that are not obsoleted), dated 2004-09-07 15:32 and 2004-09-07 15:34.
FYI, this fix does *not* correct the bug described in bug 126598 or bug 129861.
Re: comment #113, so I'd add the following to /etc/modules.conf, for example: options nfsd nfs3_acl_max_entries=256 nfsd3_acl_max_entries=256
Ken (comment #119), did you actually add the module parameters required to implement this fix? Just running the kernel isn't enough; you have to add the options to your modules.conf. Of course, you may still be correct that your problem isn't even related to this bug, but I wanted to make sure you tested it with the proper changes to modules.conf, as there seems to be some confusion about how to implement this fix. Just to clear things up, my understanding is as follows:

1. Just running the new kernel doesn't change anything.
2. To actually implement the fix you must load the modules with the new options, either by manually loading the driver and explicitly supplying the options, or by adding them to modules.conf.
3. The /proc interface is a READ-ONLY interface so that you can see what values the modules were loaded with; you cannot change the values via this interface.

There seem to be multiple people confused about this (it was even discussed on the Taroon list). Could someone verify that my understanding is correct? Thanks, Tom
Re: comment #120, here's the correct modules.conf syntax/usage after some trial and error: options nfsd nfsd3_acl_max_entries=256 options nfs nfs3_acl_max_entries=256
*** Bug 133246 has been marked as a duplicate of this bug. ***
In response to comment #119, Tom is correct. The parameters which this patchset adds to nfs and nfsd are settable once, via module parameters at load time. The proc interface is a read-only interface, allowing you to see what the load-time settings were. It was decided some time ago that allowing dynamic resizing of the max acl message size could be rather racy, so we decided to require one-time initialization at module load time only.
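To check what a running system was loaded with, the read-only interface can be queried like this. This is a sketch assuming the sysctl name quoted in an earlier comment (fs.nfs.nfs3_acl_max_entries); the path only exists on kernels carrying this patchset with the nfs module loaded:

```shell
# Read-only: shows the value the nfs module was loaded with.
# Only present on patched kernels with the nfs module loaded.
sysctl fs.nfs.nfs3_acl_max_entries

# Equivalently, read the proc file directly:
cat /proc/sys/fs/nfs/nfs3_acl_max_entries
```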
kernel-2.4.21-20.EL with the patch applied, on a Dell box with 1GB of RAM, using:

options nfsd nfsd3_acl_max_entries=256
options nfs nfs3_acl_max_entries=256

This box is an NFS server (and client, in this case against a venerable Red Hat 9 server) with random processes hanging twice in the last 2 days. Here are the latest seemingly relevant syslog entries:

Nov 22 12:07:59 x kernel: lockd: cannot monitor x.x.x.133
Nov 22 12:08:24 x kernel: lockd: cannot monitor x.x.x.133
Nov 22 12:09:07 x kernel: lockd: cannot unmonitor x.x.x.103
Nov 22 12:09:32 x kernel: lockd: cannot monitor x.x.x.133
Nov 22 12:09:57 x kernel: lockd: cannot monitor x.x.x.133
Nov 22 12:10:22 x kernel: lockd: cannot monitor x.x.x.133
Nov 22 12:10:47 x kernel: lockd: cannot monitor x.x.x.6
Nov 22 12:11:12 x kernel: lockd: cannot monitor x.x.x.133

Does this sound like something related to this bug, or should I open a new one?
It might be a similar type of issue, but it's probably unrelated. I'd open another bugzilla for it.
OK, submitted as bug #140385 "lockd: cannot monitor/unmonitor"
Hi, exactly what is the current status of this? Should we get the test kernel from the RHEL ES3 QU4 beta and run it on the server? The client? Both? If we do, are there any patches to apply, or are they already applied? Are there fixes for the other NFS bugs in the beta kernel, or are there separate patches that have to be applied for those?
The fix as attached in the latest patch set on this bz is applied to the U4 beta kernel. Run it on the machine on which the log messages appeared; this could be the client or the server, but in most cases it is the client. Don't forget to set the new nfs module options appropriately. "Are there fixes for the other nfs bugs in the beta kernel" — what exactly do you mean here? Are there other bugzillas you are specifically concerned about?
*** Bug 139952 has been marked as a duplicate of this bug. ***
thanks muchly neil - i've started the process of applying this to our nfs clients and i've fired it up on the server (rhel3qu4 beta kernel). we currently do between 200-400Mbit/sec (around 3-4Tbyte per day) so nfs is hammered fairly hard. this is expected to double within the next 12 months. the other bugs i was thinking of were the ones listed in #119. regards, -jason
*** Bug 121803 has been marked as a duplicate of this bug. ***
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html
*** Bug 136423 has been marked as a duplicate of this bug. ***