Bug 121803

Summary: cp -p, ls -l on automounted filesystems hang
Product: Red Hat Enterprise Linux 3
Reporter: Van Okamura <van.okamura>
Component: kernel
Assignee: Steve Dickson <steved>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 3.0
CC: dkl, greg.marsden, jmoyer, lwoodman, mark.fasheh, nhorman, petrides, riel, robinson
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-12-03 03:10:45 UTC
Attachments:
  patch to failover to ZONE_DMA in case of fragmentation (flags: none)

Description Van Okamura 2004-04-27 22:31:23 UTC
Description of problem:
Commands such as cp -p and ls -l hang on automounted filesystems.

strace of cp -p <filename> /usr/local/writeable/japatel (an NFS-mounted directory):
------------------------------------------------------------------------ 
write(4, "i386.\nInstalling netdump-server-"..., 512) = 512
read(3, "07.i386.\n", 512)              = 9
write(4, "07.i386.\n", 9)              = 9
read(3, "", 512)                        = 0
close(4)                                = 0
close(3)                                = 0
utime("/usr/local/writeable/japatel/install.log", [2004/04/23-14:22:12, 2004/01/23-18:47:04]) = 0
getxattr("install.log", "system.posix_acl_access", 0xbfff8a70, 132) = -1 EOPNOTSUPP (Operation not supported)
setxattr("/usr/local/writeable/japatel/install.log", "system.posix_acl_access", 0x8057c08, 28, ) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
+++ killed by SIGINT +++

The command hangs in the setxattr() call.
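For reference, the 28-byte value passed to setxattr is consistent with a minimal POSIX access ACL, i.e. one carrying only the owner, group and other permissions, in the Linux `system.posix_acl_access` xattr encoding: a 4-byte little-endian version header (2) followed by one 8-byte entry (u16 tag, u16 perm, u32 id) per ACL entry, so 4 + 3 x 8 = 28. The sketch below illustrates that encoding under the assumption that cp -p falls back to setting a mode-derived ACL when the source filesystem answers getxattr with EOPNOTSUPP, as it does in the strace above. The function name `acl_xattr_from_mode` is hypothetical; the constants follow the kernel's POSIX ACL headers.

```python
import struct

# Constants of the Linux POSIX ACL xattr format (kernel posix_acl.h / posix_acl_xattr.h).
ACL_XATTR_VERSION = 0x0002
ACL_USER_OBJ = 0x01
ACL_GROUP_OBJ = 0x04
ACL_OTHER = 0x20
ACL_UNDEFINED_ID = 0xFFFFFFFF  # e_id used by entries that name no specific user/group


def acl_xattr_from_mode(mode: int) -> bytes:
    """Encode a minimal access ACL (owner, group, other) matching the rwx bits of `mode`."""
    entries = [
        (ACL_USER_OBJ, (mode >> 6) & 0o7),   # owner permissions
        (ACL_GROUP_OBJ, (mode >> 3) & 0o7),  # group permissions
        (ACL_OTHER, mode & 0o7),             # everyone else
    ]
    # Header: little-endian 32-bit version number.
    blob = struct.pack("<I", ACL_XATTR_VERSION)
    # Each entry: little-endian u16 tag, u16 perm, u32 id (8 bytes, no padding).
    for tag, perm in entries:
        blob += struct.pack("<HHI", tag, perm, ACL_UNDEFINED_ID)
    return blob


if __name__ == "__main__":
    value = acl_xattr_from_mode(0o644)
    print(len(value))  # 28: 4-byte header + 3 entries of 8 bytes
    print(value.hex())
```

Building the value is trivial on the client side; per the strace, the process only returns from setxattr (with ERESTARTSYS) after SIGINT, so the hang happens after the value is handed to the kernel.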

When we interrupt the command, the following message appears in
/var/log/messages:

Apr 23 21:29:06 stagp14 kernel: RPC: buffer allocation failed for task ec387cb4

We checked /proc/meminfo and /proc/slabinfo and they seem fine. (The data,
captured while the problem was occurring, is posted below.)

Once this happens, the only way out is to reboot the machine.

Kernel installed:
-----------------
[root@stagp14 root]# uname -a
Linux stagp14 2.4.21-9.ELsmp #1 SMP Thu Jan 8 17:08:56 EST 2004 i686 i686 i386 GNU/Linux

Mod Utils installed: 
------------------- 
[root@stagp14 root]# rpm -qa | grep modutils 
modutils-2.4.25-11.EL 

autofs installed: 
---------------- 
[root@stagp14 root]# rpm -qa | grep autofs 
autofs-4.1.0-3 

meminfo:
--------

        total:    used:    free:  shared: buffers:  cached:
Mem:  6074359808 4538081280 1536278528        0 309878784 3981922304
Swap: 2146787328    24576 2146762752
MemTotal:        5931992 kB
MemFree:         1500272 kB
MemShared:             0 kB
Buffers:          302616 kB
Cached:          3888572 kB
SwapCached:           24 kB
Active:          1094420 kB
ActiveAnon:        28216 kB
ActiveCache:     1066204 kB
Inact_dirty:     2964768 kB
Inact_laundry:     95724 kB
Inact_clean:       95828 kB
Inact_target:     850148 kB
HighTotal:       5111680 kB
HighFree:        1361904 kB
LowTotal:         820312 kB
LowFree:          138368 kB
SwapTotal:       2096472 kB
SwapFree:        2096448 kB
HugePages_Total:       0
HugePages_Free:        0
Hugepagesize:       2048 kB
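On this i686 highmem kernel, the overall MemFree figure overstates what the kernel itself can use: slab caches and kmalloc()-style buffers, which is where the RPC buffers are expected to come from, are served from low memory, so LowFree/LowTotal is the more relevant ratio. A minimal sketch that pulls those figures out of a /proc/meminfo dump like the one above; the helper names are illustrative, and the parser deliberately skips the 2.4-style summary table at the top.

```python
def parse_meminfo(text: str) -> dict:
    """Map each 'Key: value [kB]' line of a /proc/meminfo dump to an integer."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip blank lines and the 2.4 summary table ("total: used: ...", "Mem: ...", "Swap: ...").
        if not sep or not parts or len(parts) > 2 or not parts[0].isdigit():
            continue
        fields[key.strip()] = int(parts[0])
    return fields


def lowmem_free_ratio(fields: dict) -> float:
    """Fraction of low memory (where ordinary kernel allocations come from) that is free."""
    return fields["LowFree"] / fields["LowTotal"]


# Excerpt of the dump above.
SAMPLE = """\
MemTotal:        5931992 kB
MemFree:         1500272 kB
HighTotal:       5111680 kB
HighFree:        1361904 kB
LowTotal:         820312 kB
LowFree:          138368 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(f"free overall: {info['MemFree'] / info['MemTotal']:.1%}")
    print(f"free lowmem:  {lowmem_free_ratio(info):.1%}")
```

Even a healthy LowFree says nothing about contiguity: a request for several physically contiguous pages can still fail when the free low-memory pages are scattered, which is the fragmentation theory raised later in this report.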

slabinfo:
---------

slabinfo - version: 1.1 (SMP) 
kmem_cache            80    80    244    5    5    1 : 1008  252 
nfs_write_data        50    50    384    5    5    1 :  496  124 
nfs_read_data        180    180    384  18  18    1 :  496  124 
nfs_page            300    300    128  10  10    1 : 1008  252 
ip_fib_hash          11    224    32    2    2    1 : 1008  252 
ext3_xattr            0      0    44    0    0    1 : 1008  252 
journal_head        855  13013    48  16  169    1 : 1008  252 
revoke_table          1    250    12    1    1    1 : 1008  252 
revoke_record        336    336    32    3    3    1 : 1008  252 
clip_arp_cache        0      0    256    0    0    1 : 1008  252 
ip_mrt_cache          0      0    128    0    0    1 : 1008  252 
tcp_tw_bucket        210    210    128    7    7    1 : 1008  252 
tcp_bind_bucket      336    336    32    3    3    1 : 1008  252 
tcp_open_request      30    30    128    1    1    1 : 1008  252 
inet_peer_cache      58    58    64    1    1    1 : 1008  252 
secpath_cache          0      0    128    0    0    1 : 1008  252 
xfrm_dst_cache        0      0    256    0    0    1 : 1008  252 
ip_dst_cache        1365  1365    256  91  91    1 : 1008  252 
arp_cache            60    60    256    4    4    1 : 1008  252 
flow_cache            0      0    128    0    0    1 : 1008  252 
blkdev_requests    3072  3090    128  103  103    1 : 1008  252 
kioctx                0      0    128    0    0    1 : 1008  252 
kiocb                  0      0    128    0    0    1 : 1008  252 
dnotify_cache          0      0    20    0    0    1 : 1008  252 
file_lock_cache      120    120    96    3    3    1 : 1008  252 
async_poll_table      0      0    140    0    0    1 : 1008  252 
fasync_cache          0      0    16    0    0    1 : 1008  252 
uid_cache              9    224    32    2    2    1 : 1008  252 
skbuff_head_cache  1426  1426    168  62  62    1 : 1008  252 
sock                355    355  1408  71  71    2 :  240  60 
sigqueue            1015  1015    132  35  35    1 : 1008  252 
kiobuf                0      0    128    0    0    1 : 1008  252 
cdev_cache          2088  2088    64  36  36    1 : 1008  252 
bdev_cache            3    116    64    2    2    1 : 1008  252 
mnt_cache            232    232    64    4    4    1 : 1008  252 
inode_cache        48259  52962    512 7566 7566    1 :  496  124 
dentry_cache      40380  40380    128 1346 1346    1 : 1008  252 
dquot                  0      0    128    0    0    1 : 1008  252 
filp                9380  9420    128  314  314    1 : 1008  252 
names_cache          12    12  4096  12  12    1 :  240  60 
buffer_head      552994 872970    108 24807 24942    1 : 1008  252 
mm_struct            160    160    384  16  16    1 :  496  124 
vm_area_struct      2324  2576    68  45  46    1 : 1008  252 
fs_cache            406    406    64    7    7    1 : 1008  252 
files_cache          161    161    512  23  23    1 :  496  124 
signal_cache        348    348    64    6    6    1 : 1008  252 
sighand_cache        115    115  1408  23  23    2 :  240  60 
pte_chain          3042  16650    128  277  555    1 : 1008  252 
pae_pgd              406    406    64    7    7    1 : 1008  252 
size-131072(DMA)      0      0 131072    0    0  32 :    0    0 
size-131072            0      0 131072    0    0  32 :    0    0 
size-65536(DMA)        0      0  65536    0    0  16 :    0    0 
size-65536            2      2  65536    2    2  16 :    0    0 
size-32768(DMA)        0      0  32768    0    0    8 :    0    0 
size-32768            8      8  32768    8    8    8 :    0    0 
size-16384(DMA)        0      0  16384    0    0    4 :    0    0 
size-16384            20    20  16384  20  20    4 :    0    0 
size-8192(DMA)        0      0  8192    0    0    2 :    0    0 
size-8192              6      6  8192    6    6    2 :    0    0 
size-4096(DMA)        0      0  4096    0    0    1 :  240  60 
size-4096            649    649  4096  649  649    1 :  240  60 
size-2048(DMA)        0      0  2048    0    0    1 :  240  60 
size-2048            246    306  2048  137  153    1 :  240  60 
size-1024(DMA)        0      0  1024    0    0    1 :  496  124 
size-1024            584    584  1024  146  146    1 :  496  124 
size-512(DMA)          0      0    512    0    0    1 :  496  124 
size-512            576    576    512  72  72    1 :  496  124 
size-256(DMA)          0      0    256    0    0    1 : 1008  252 
size-256            1095  1095    256  73  73    1 : 1008  252 
size-128(DMA)          0      0    128    0    0    1 : 1008  252 
size-128            2730  2730    128  91  91    1 : 1008  252 
size-64(DMA)          0      0    128    0    0    1 : 1008  252 
size-64            4410  4410    128  147  147    1 : 1008  252 
size-32(DMA)          0      0    64    0    0    1 : 1008  252 
size-32              580    580    64  10  10    1 : 1008  252 
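As we read the 2.4 "slabinfo - version: 1.1 (SMP)" layout, each data line is name, active_objs, num_objs, objsize, active_slabs, num_slabs and pages_per_slab, followed on SMP kernels by `: limit batchcount`. Multiplying num_slabs by pages_per_slab gives the pages a cache holds, which is a quick way to see where low memory goes. A sketch assuming 4 KB pages; the function names are illustrative.

```python
PAGE_KB = 4  # page size on i686, in kB


def slab_usage_kb(slabinfo: str) -> dict:
    """Return {cache name: kB of pages held} for each data line of a slabinfo 1.1 dump."""
    usage = {}
    for line in slabinfo.splitlines():
        # Drop the SMP-only "limit batchcount" tail that follows the colon.
        fields = line.split(":", 1)[0].split()
        if len(fields) != 7 or not all(f.isdigit() for f in fields[1:]):
            continue  # version header, blank line, or something unexpected
        name = fields[0]
        num_slabs, pages_per_slab = int(fields[5]), int(fields[6])
        usage[name] = num_slabs * pages_per_slab * PAGE_KB
    return usage


def top_caches(usage: dict, n: int = 3) -> list:
    """The n caches holding the most memory, largest first."""
    return sorted(usage.items(), key=lambda item: item[1], reverse=True)[:n]


# Excerpt of the dump above.
SAMPLE = """\
slabinfo - version: 1.1 (SMP)
sock                355    355  1408   71   71    2 :  240   60
inode_cache       48259  52962   512 7566 7566    1 :  496  124
dentry_cache      40380  40380   128 1346 1346    1 : 1008  252
buffer_head      552994 872970   108 24807 24942    1 : 1008  252
"""

if __name__ == "__main__":
    for name, kb in top_caches(slab_usage_kb(SAMPLE)):
        print(f"{name:<14} {kb:>7} kB")
```

On the full dump, buffer_head (24942 one-page slabs, about 97 MB) and inode_cache (7566 slabs, about 30 MB) are by far the largest caches, and all of it sits in the roughly 800 MB low-memory zone.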


Version-Release number of selected component (if applicable):
RHEL 3 U1

Comment 2 Jeff Moyer 2004-05-03 21:10:32 UTC
Hi Van,

I don't know where you got autofs-4.1.0-3, but we don't support it in
any of our distributions.

We could take autofs out of the loop and try again, so we can at least
see whether the problem is strictly NFS-related.  Would you mind trying
that?

If that doesn't elicit the problem, then we'll see about getting you
closer to a supported autofs configuration.

Thanks!

Jeff

Comment 3 Jp Robinson 2004-05-05 21:13:11 UTC
Jeff, 

Do we have a reason to believe that autofs may be at fault here?

The RPC message above leads Oracle and me to believe otherwise. If we
have a good reason to believe autofs may be an issue, or any evidence
that points that way, we'd like to know before trying to reproduce
with a different autofs.



Comment 4 Jeff Moyer 2004-05-05 21:41:43 UTC
It doesn't look to be autofs at first glance.  That's partly why I
suggested taking autofs out of the loop.  Please give it a try.

Comment 6 Mark Fasheh 2004-05-05 22:10:46 UTC
Jeff,
Turning off autofs on these systems would essentially make them
useless. Autofs4 is used *heavily* on them as an integral part of the
environment. Can we please begin debugging this problem? They're
hitting it often.

Regarding autofs, if you can give a good reason why it's an autofs
issue, I'll be happy to treat it as such, but otherwise taking autofs
out of the loop isn't really an option :/


Comment 7 Jeff Moyer 2004-05-05 22:20:34 UTC
Ok, that's too bad.

I will talk with our NFS maintainer and see if we can narrow things
down based on your bug report.

Comment 8 Steve Dickson 2004-05-06 14:23:47 UTC
This looks like it could be a memory fragmentation problem.
Would it be possible to get Alt-SysRq-m output?
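Among other things, Alt-SysRq-m prints, for each memory zone, how many free blocks of each size (buddy order) remain. Because the buddy allocator can satisfy a request for 2^k contiguous pages only from a free block of order k or higher, those counts show whether fragmentation is the culprit even when the total free figure looks healthy. The sketch below assumes per-zone lines of the form `Normal: 10*4kB 5*8kB ... = 80kB)`; the exact layout differs between kernel versions, the helper names are hypothetical, and watermarks are ignored.

```python
import re

PAGE_KB = 4  # page size on i686, in kB

# Matches "COUNT*SIZEkB" tokens, e.g. "10*4kB".
BLOCK_RE = re.compile(r"(\d+)\*(\d+)kB")


def free_blocks_by_order(zone_line: str) -> dict:
    """Map buddy order -> number of free blocks, from a 'Zone: N*4kB N*8kB ...' line."""
    blocks = {}
    for count, size_kb in BLOCK_RE.findall(zone_line):
        pages = int(size_kb) // PAGE_KB
        blocks[pages.bit_length() - 1] = int(count)  # 1 page -> order 0, 2 -> 1, 4 -> 2, ...
    return blocks


def free_kb(blocks: dict) -> int:
    """Total free memory represented by the per-order counts."""
    return sum(count * (PAGE_KB << order) for order, count in blocks.items())


def can_allocate(blocks: dict, order: int) -> bool:
    """Buddy rule: a 2**order-page request needs a free block of that order or higher."""
    return any(count > 0 for o, count in blocks.items() if o >= order)


if __name__ == "__main__":
    # Hypothetical zone line: 80 kB free, but no free block larger than 8 kB.
    zone = free_blocks_by_order("Normal: 10*4kB 5*8kB 0*16kB 0*32kB = 80kB)")
    print(free_kb(zone), can_allocate(zone, 1), can_allocate(zone, 2))  # 80 True False
```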


Comment 9 Mark Fasheh 2004-05-06 17:45:34 UTC
No problem. I'll ask our guys to do that next time they hit the issue
(shouldn't be long).


Comment 10 Larry Woodman 2004-07-30 14:10:43 UTC
Where do we stand on this BUG?  Please verify that this is still a
problem with the latest RHEL3-U3 kernel.  We have made VM changes to
help deal with the memory fragmentation issue that are included in U3.

Larry


Comment 11 Greg Marsden 2004-08-02 20:57:28 UTC
Created attachment 102373
patch to failover to ZONE_DMA in case of fragmentation

Sorry, I should have pushed harder on this.

The fallback issue is not resolved in 17.EL; I've attached the
one-line patch that resolves it (patch from wli).
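The report describes the attached patch only by its title, failing over to ZONE_DMA when fragmentation blocks an allocation, so the following is not that patch. It is a toy model of the general idea, assuming a low-memory kernel request on i386 that may be served from ZONE_NORMAL or, as a fallback, ZONE_DMA: walk the zones in order and take the first one holding a free block of the required order. Zone contents are hypothetical, and the real allocator also applies per-zone watermarks and reserves that this model ignores.

```python
from typing import Dict, List, Optional

FreeArea = Dict[int, int]  # buddy order -> number of free blocks


def has_block(free_area: FreeArea, order: int) -> bool:
    """True if the zone has a free block of at least the requested order."""
    return any(count > 0 for o, count in free_area.items() if o >= order)


def pick_zone(zonelist: List[str], zones: Dict[str, FreeArea], order: int) -> Optional[str]:
    """Walk the zones in fallback order and return the first that can satisfy the request."""
    for name in zonelist:
        if has_block(zones[name], order):
            return name
    return None  # every zone in the list is too fragmented: the allocation fails


if __name__ == "__main__":
    zones = {
        "Normal": {0: 5000, 1: 800, 2: 0, 3: 0},  # fragmented: nothing of 16 kB or more
        "DMA": {0: 20, 1: 10, 2: 4, 3: 1},
    }
    print(pick_zone(["Normal"], zones, 2))         # None: no fallback configured
    print(pick_zone(["Normal", "DMA"], zones, 2))  # DMA
```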

Greg

Comment 12 Larry Woodman 2004-11-29 20:09:23 UTC
Van, I think this problem has been fixed in RHEL3-U4. Can you verify
this so we can close this bug?

Larry


Comment 13 Greg Marsden 2004-11-30 00:20:59 UTC
From what I've seen, this is fixed with the reduced-size ACLs for NFS
in U4. Is there a bug number we can reference here for this? Then this
bug can be closed.
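A plausible reason smaller ACL buffers avoid the hang is that the page allocator hands out physically contiguous memory in power-of-two page counts: a smaller buffer needs a lower order, and low-order blocks are much easier to find in a fragmented low-memory zone. A sketch of that rounding, following the semantics of the kernel's get_order() helper and assuming 4 KB pages; the sizes in the example loop are arbitrary, not the actual NFS ACL buffer sizes before or after U4.

```python
PAGE_SIZE = 4096  # bytes per page on i686


def get_order(size: int) -> int:
    """Smallest order such that (PAGE_SIZE << order) >= size, for size >= 1."""
    if size < 1:
        raise ValueError("size must be positive")
    pages = (size + PAGE_SIZE - 1) // PAGE_SIZE  # round up to whole pages
    return (pages - 1).bit_length()              # round up to a power of two


if __name__ == "__main__":
    for size in (2048, 4096, 6000, 16384, 65536):
        order = get_order(size)
        print(f"{size:>6} bytes -> order {order} ({PAGE_SIZE << order} bytes contiguous)")
```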
Cheers,
Greg

Comment 14 Ernie Petrides 2004-12-03 03:10:45 UTC
Thanks for the info, Greg.  I'm closing this as a dup of bug 118839.

*** This bug has been marked as a duplicate of 118839 ***

Comment 15 John Flanagan 2004-12-20 20:55:03 UTC
An erratum has been issued that should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html