Description of problem:
Commands such as cp -p and ls -l hang on automounted filesystems.

Strace of cp -p <filename> /usr/local/writeable/japatel (NFS-mounted directory):
------------------------------------------------------------------------
write(4, "i386.\nInstalling netdump-server-"..., 512) = 512
read(3, "07.i386.\n", 512) = 9
write(4, "07.i386.\n", 9) = 9
read(3, "", 512) = 0
close(4) = 0
close(3) = 0
utime("/usr/local/writeable/japatel/install.log", [2004/04/23-14:22:12, 2004/01/23-18:47:04]) = 0
getxattr("install.log", "system.posix_acl_access", 0xbfff8a70, 132) = -1 EOPNOTSUPP (Operation not supported)
setxattr("/usr/local/writeable/japatel/install.log", "system.posix_acl_access", 0x8057c08, 28, ) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
+++ killed by SIGINT +++

The command hangs while executing setxattr(). When we interrupt the command, the following message appears in /var/log/messages:

Apr 23 21:29:06 stagp14 kernel: RPC: buffer allocation failed for task ec387cb4

We checked /proc/meminfo and /proc/slabinfo, and they look normal. (The data below was captured while the problem was occurring.) Once this condition occurs, the only way to recover is to reboot the machine.
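The 28-byte value passed to setxattr() in the strace matches the size of a minimal POSIX access ACL. That is a 4-byte version header plus three 8-byte entries for owner, group and other. An ACL-aware cp -p can derive such an ACL from the source file's mode bits when the source has no ACL of its own; here getxattr() on install.log returned EOPNOTSUPP.

The sketch below builds that value and replays the single setxattr() call. It assumes the Linux little-endian posix_acl xattr layout. It might give a smaller reproducer than cp -p, but that is untested. The helper names (minimal_access_acl, set_acl_like_cp) are hypothetical.

```python
import os
import stat
import struct
import sys

# Constants from the Linux POSIX ACL xattr encoding.
POSIX_ACL_XATTR_VERSION = 0x0002
ACL_USER_OBJ = 0x01
ACL_GROUP_OBJ = 0x04
ACL_OTHER = 0x20
ACL_UNDEFINED_ID = 0xFFFFFFFF  # (u32)-1: these tags carry no uid/gid


def minimal_access_acl(mode: int) -> bytes:
    """Encode the three-entry access ACL equivalent to the given mode bits (hypothetical helper)."""
    header = struct.pack("<I", POSIX_ACL_XATTR_VERSION)
    entries = [
        (ACL_USER_OBJ, (mode >> 6) & 0o7),   # owner rwx bits
        (ACL_GROUP_OBJ, (mode >> 3) & 0o7),  # group rwx bits
        (ACL_OTHER, mode & 0o7),             # other rwx bits
    ]
    # Each entry is little-endian: u16 tag, u16 perm, u32 id (8 bytes, no padding).
    body = b"".join(struct.pack("<HHI", tag, perm, ACL_UNDEFINED_ID)
                    for tag, perm in entries)
    return header + body


def set_acl_like_cp(path: str) -> None:
    """Issue the same setxattr() seen in the strace; this may block on the affected NFS mount."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.setxattr(path, "system.posix_acl_access", minimal_access_acl(mode))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        set_acl_like_cp(sys.argv[1])
    else:
        # Without a path, just show the encoded size (4 + 3 * 8 bytes).
        print(len(minimal_access_acl(0o644)))
```

To use it, pass a file on the automounted NFS directory as the only argument. If the hang is specific to the ACL setxattr path, the script should stop at the same point as cp -p.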
Kernel installed:
-----------------
[root@stagp14 root]# uname -a
Linux stagp14 2.4.21-9.ELsmp #1 SMP Thu Jan 8 17:08:56 EST 2004 i686 i686 i386 GNU/Linux

Mod Utils installed:
-------------------
[root@stagp14 root]# rpm -qa | grep modutils
modutils-2.4.25-11.EL

autofs installed:
----------------
[root@stagp14 root]# rpm -qa | grep autofs
autofs-4.1.0-3

meminfo:
---------
total: used: free: shared: buffers: cached:
Mem: 6074359808 4538081280 1536278528 0 309878784 3981922304
Swap: 2146787328 24576 2146762752
MemTotal: 5931992 kB
MemFree: 1500272 kB
MemShared: 0 kB
Buffers: 302616 kB
Cached: 3888572 kB
SwapCached: 24 kB
Active: 1094420 kB
ActiveAnon: 28216 kB
ActiveCache: 1066204 kB
Inact_dirty: 2964768 kB
Inact_laundry: 95724 kB
Inact_clean: 95828 kB
Inact_target: 850148 kB
HighTotal: 5111680 kB
HighFree: 1361904 kB
LowTotal: 820312 kB
LowFree: 138368 kB
SwapTotal: 2096472 kB
SwapFree: 2096448 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB

slabinfo:
----------
slabinfo - version: 1.1 (SMP)
kmem_cache 80 80 244 5 5 1 : 1008 252
nfs_write_data 50 50 384 5 5 1 : 496 124
nfs_read_data 180 180 384 18 18 1 : 496 124
nfs_page 300 300 128 10 10 1 : 1008 252
ip_fib_hash 11 224 32 2 2 1 : 1008 252
ext3_xattr 0 0 44 0 0 1 : 1008 252
journal_head 855 13013 48 16 169 1 : 1008 252
revoke_table 1 250 12 1 1 1 : 1008 252
revoke_record 336 336 32 3 3 1 : 1008 252
clip_arp_cache 0 0 256 0 0 1 : 1008 252
ip_mrt_cache 0 0 128 0 0 1 : 1008 252
tcp_tw_bucket 210 210 128 7 7 1 : 1008 252
tcp_bind_bucket 336 336 32 3 3 1 : 1008 252
tcp_open_request 30 30 128 1 1 1 : 1008 252
inet_peer_cache 58 58 64 1 1 1 : 1008 252
secpath_cache 0 0 128 0 0 1 : 1008 252
xfrm_dst_cache 0 0 256 0 0 1 : 1008 252
ip_dst_cache 1365 1365 256 91 91 1 : 1008 252
arp_cache 60 60 256 4 4 1 : 1008 252
flow_cache 0 0 128 0 0 1 : 1008 252
blkdev_requests 3072 3090 128 103 103 1 : 1008 252
kioctx 0 0 128 0 0 1 : 1008 252
kiocb 0 0 128 0 0 1 : 1008 252
dnotify_cache 0 0 20 0 0 1 : 1008 252
file_lock_cache 120 120 96 3 3 1 : 1008 252
async_poll_table 0 0 140 0 0 1 : 1008 252
fasync_cache 0 0 16 0 0 1 : 1008 252
uid_cache 9 224 32 2 2 1 : 1008 252
skbuff_head_cache 1426 1426 168 62 62 1 : 1008 252
sock 355 355 1408 71 71 2 : 240 60
sigqueue 1015 1015 132 35 35 1 : 1008 252
kiobuf 0 0 128 0 0 1 : 1008 252
cdev_cache 2088 2088 64 36 36 1 : 1008 252
bdev_cache 3 116 64 2 2 1 : 1008 252
mnt_cache 232 232 64 4 4 1 : 1008 252
inode_cache 48259 52962 512 7566 7566 1 : 496 124
dentry_cache 40380 40380 128 1346 1346 1 : 1008 252
dquot 0 0 128 0 0 1 : 1008 252
filp 9380 9420 128 314 314 1 : 1008 252
names_cache 12 12 4096 12 12 1 : 240 60
buffer_head 552994 872970 108 24807 24942 1 : 1008 252
mm_struct 160 160 384 16 16 1 : 496 124
vm_area_struct 2324 2576 68 45 46 1 : 1008 252
fs_cache 406 406 64 7 7 1 : 1008 252
files_cache 161 161 512 23 23 1 : 496 124
signal_cache 348 348 64 6 6 1 : 1008 252
sighand_cache 115 115 1408 23 23 2 : 240 60
pte_chain 3042 16650 128 277 555 1 : 1008 252
pae_pgd 406 406 64 7 7 1 : 1008 252
size-131072(DMA) 0 0 131072 0 0 32 : 0 0
size-131072 0 0 131072 0 0 32 : 0 0
size-65536(DMA) 0 0 65536 0 0 16 : 0 0
size-65536 2 2 65536 2 2 16 : 0 0
size-32768(DMA) 0 0 32768 0 0 8 : 0 0
size-32768 8 8 32768 8 8 8 : 0 0
size-16384(DMA) 0 0 16384 0 0 4 : 0 0
size-16384 20 20 16384 20 20 4 : 0 0
size-8192(DMA) 0 0 8192 0 0 2 : 0 0
size-8192 6 6 8192 6 6 2 : 0 0
size-4096(DMA) 0 0 4096 0 0 1 : 240 60
size-4096 649 649 4096 649 649 1 : 240 60
size-2048(DMA) 0 0 2048 0 0 1 : 240 60
size-2048 246 306 2048 137 153 1 : 240 60
size-1024(DMA) 0 0 1024 0 0 1 : 496 124
size-1024 584 584 1024 146 146 1 : 496 124
size-512(DMA) 0 0 512 0 0 1 : 496 124
size-512 576 576 512 72 72 1 : 496 124
size-256(DMA) 0 0 256 0 0 1 : 1008 252
size-256 1095 1095 256 73 73 1 : 1008 252
size-128(DMA) 0 0 128 0 0 1 : 1008 252
size-128 2730 2730 128 91 91 1 : 1008 252
size-64(DMA) 0 0 128 0 0 1 : 1008 252
size-64 4410 4410 128 147 147 1 : 1008 252
size-32(DMA) 0 0 64 0 0 1 : 1008 252
size-32 580 580 64 10 10 1 : 1008 252

Version-Release number of selected component (if applicable):
RHEL 3 U1
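The meminfo figures can look healthy even though a kernel allocation fails. On this 32-bit highmem kernel, slab and kmalloc() allocations come only from low memory (ZONE_NORMAL/ZONE_DMA), never from HighMem. The RPC buffer presumably comes from there too, though the report does not confirm it.

Of the 1,500,272 kB reported as MemFree, 1,361,904 kB is HighFree. Only 138,368 kB (about 17% of LowTotal) is free low memory.

The sketch below parses the meminfo lines posted above and computes those ratios. The helper names are hypothetical. It reads only totals, so it cannot show fragmentation inside low memory.

```python
# A subset of the /proc/meminfo output posted in this report.
SAMPLE = """\
total: used: free: shared: buffers: cached:
Mem: 6074359808 4538081280 1536278528 0 309878784 3981922304
MemTotal: 5931992 kB
MemFree: 1500272 kB
HighTotal: 5111680 kB
HighFree: 1361904 kB
LowTotal: 820312 kB
LowFree: 138368 kB
HugePages_Total: 0
"""


def parse_meminfo(text: str) -> dict:
    """Collect 'Key: <n> kB' lines into {key: n}; other rows are skipped."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # The 2.4-style 'total:'/'Mem:'/'Swap:' rows are byte counts, not 'n kB'.
        if sep and len(fields) == 2 and fields[1] == "kB":
            values[key.strip()] = int(fields[0])
    return values


def zone_free_pct(info: dict, zone: str) -> float:
    """Percentage of the given zone ('Low' or 'High') that is free."""
    return 100.0 * info[zone + "Free"] / info[zone + "Total"]


if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    for zone in ("Low", "High"):
        print(f"{zone}Free: {info[zone + 'Free']} kB "
              f"({zone_free_pct(info, zone):.1f}% of {zone}Total)")
```

In this sample, MemFree equals LowFree plus HighFree exactly. Most of the apparently spare memory is therefore memory the kernel cannot use for its own buffers.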
Hi Van, I don't know where you got autofs-4.1.0-3, but we don't support it in any of our distributions. We could take autofs out of the loop and try again, so we can at least see whether the problem is strictly NFS-related. Would you mind trying that? If that doesn't reproduce the problem, we'll see about getting you closer to a supported autofs configuration. Thanks! Jeff
Jeff, do we have any reason to believe that autofs is at fault here? The RPC message above leads Oracle and me to believe otherwise. If there is a good reason, or any evidence, pointing to autofs, we'd like to know before trying to reproduce this with a different autofs.
It doesn't look to be autofs at first glance. That's partly why I suggested taking autofs out of the loop. Please give it a try.
Jeff, turning off autofs on these systems would essentially make them useless. Autofs4 is used *heavily* on them as an integral part of the environment. Can we please begin debugging this problem? They're hitting it often. Regarding autofs, if you can give a good reason why it's an autofs issue, I'll be happy to treat it as such, but otherwise taking autofs out of the loop isn't really an option :/
Ok, that's too bad. I will talk with our NFS maintainer and see if we can narrow things down based on your bug report.
It looks like this could be a memory fragmentation problem. Could you capture the Alt-SysRq-m output when it happens?
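Alt-SysRq-m dumps memory information, including how many free blocks each zone has at each buddy order. That breakdown is what distinguishes fragmentation from a plain shortage. A request for 2**n contiguous pages can only be served from a free block of order n or larger, however much memory is free in total.

The toy model below illustrates that rule. The counts are made up, not taken from this system. It also assumes, without confirmation from this report, that the failing RPC buffer needed a multi-page contiguous allocation.

```python
PAGE_KB = 4  # page size on i686, in kB


def can_allocate(free_blocks: dict, order: int) -> bool:
    """Buddy rule: a request for 2**order contiguous pages succeeds if any free
    block of that order or larger exists, since larger blocks can be split."""
    return any(count > 0 for o, count in free_blocks.items() if o >= order)


def total_free_kb(free_blocks: dict) -> int:
    """Total free memory: the single number that can hide fragmentation."""
    return sum(count * (PAGE_KB << o) for o, count in free_blocks.items())


# Hypothetical per-order free counts (order -> number of free blocks) for one zone:
# plenty of single pages, but no two adjacent free pages anywhere.
fragmented = {0: 30000, 1: 0, 2: 0, 3: 0}

if __name__ == "__main__":
    print(total_free_kb(fragmented), "kB free")
    for order in range(4):
        print(f"order {order} ({PAGE_KB << order} kB):",
              can_allocate(fragmented, order))
```

In this made-up zone, about 117 MB is free, yet even an 8 kB (order-1) request fails. Free totals in meminfo alone would not reveal that.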
No problem. I'll ask our guys to do that next time they hit the issue (shouldn't be long).
Where do we stand on this bug? Please verify whether this is still a problem with the latest RHEL3-U3 kernel. U3 includes VM changes to help deal with the memory fragmentation issue. Larry
Created attachment 102373: patch to fail over to ZONE_DMA in case of fragmentation

Sorry, I should have pushed harder on this. The fallback issue is not resolved in 17.EL; I've attached the one-line patch (from wli) that resolves it. Greg
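Attachment 102373 is described only as a one-line patch to fail over to ZONE_DMA in case of fragmentation; its contents are not reproduced here. As a rough illustration of the idea in that title, the toy model below walks a zone list in order and takes the first zone that has a large-enough free block.

The zone names mirror the kernel's. The free-block counts and helper names are hypothetical. This is neither the patch nor the real 2.4 allocator.

```python
def can_allocate(free_blocks: dict, order: int) -> bool:
    """Buddy rule: any free block of the requested order or larger will do."""
    return any(count > 0 for o, count in free_blocks.items() if o >= order)


def alloc_from_zonelist(zones: dict, zonelist: list, order: int):
    """Return the first zone in zonelist that can satisfy the request, else None."""
    for name in zonelist:
        if can_allocate(zones[name], order):
            return name
    return None


# Hypothetical free-block counts: Normal is fragmented, DMA still has larger blocks.
zones = {
    "Normal": {0: 30000, 1: 0, 2: 0},
    "DMA": {0: 200, 1: 40, 2: 10},
}

if __name__ == "__main__":
    print("Normal only:  ", alloc_from_zonelist(zones, ["Normal"], 2))
    print("Normal -> DMA:", alloc_from_zonelist(zones, ["Normal", "DMA"], 2))
```

Without the fallback, the order-2 request finds nothing and returns None. With DMA appended to the list, it succeeds from the DMA zone.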
Van, I think this problem has been fixed in RHEL3-U4. Can you verify this so we can close this bug? Larry
From what I've seen, this is fixed by the reduced-size ACLs for NFS in U4. Is there a bug number we can reference for this? Then this bug can be closed. Cheers, Greg
Thanks for the info, Greg. I'm closing this as a dup of bug 118839. *** This bug has been marked as a duplicate of 118839 ***
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html