Bug 65772 - close() hangs on file in NFS-mounted dir using
Summary: close() hangs on file in NFS-mounted dir using
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Ben LaHaise
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2002-05-31 16:16 UTC by Erik Williamson
Modified: 2007-04-18 16:42 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2002-06-03 22:09:14 UTC
Embargoed:




Links:
System: Red Hat Product Errata
ID: RHBA-2002:110
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel with bugfixes available
Last Updated: 2002-06-10 04:00:00 UTC

Description Erik Williamson 2002-05-31 16:16:39 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 Galeon/1.2.0 (X11; Linux i686; U;) Gecko/20020516

Description of problem:
With my home directory mounted via NFS from a Solaris 8 server, if I try to use 'ar'
(other apps are affected as well), the program freezes when attempting to close
the output file (I found this out using strace). Note that if I do the same task
in a directory mounted from a Linux box, all is well.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Copy some object files (.o) to an NFS share on a Solaris 8 box.
2. Run 'ar rc outlib.a *.o'.
3. Wait!
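A minimal sketch of a scripted version of these steps, assuming Python 3 on the affected client. Because the Linux NFS client writes dirty data back to the server when a file is closed, close() is where a stalled write shows up, so the sketch simply times os.close() on a freshly written file. The function name time_close and the 256 KiB default size are illustrative choices, not anything taken from this report.

```python
import os
import tempfile
import time


def time_close(directory, size=256 * 1024):
    """Write `size` bytes to a new file in `directory`; return close() latency in seconds."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        try:
            data = b"\0" * size
            offset = 0
            while offset < size:
                # os.write may write fewer bytes than requested, so loop.
                offset += os.write(fd, data[offset:])
        except BaseException:
            os.close(fd)
            raise
        start = time.monotonic()
        # On an NFS mount the client flushes dirty pages to the server here,
        # so this is the call that corresponds to the hanging close().
        os.close(fd)
        return time.monotonic() - start
    finally:
        os.unlink(path)  # remove the scratch file whether or not close() succeeded
```

Run it once against a directory on the Solaris-backed NFS mount and once against a local or Linux-backed directory, e.g. `time_close("/home/erik")` (a hypothetical path). A hang or an unusually long result on the Solaris mount alone would match the behaviour described in this report.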
	

Actual Results:  'ar' successfully creates its temporary file in the directory,
yet when it tries to close() it, it hangs. The rest of the machine is responsive,
though. 'ps' shows that the ar process is in state 'D'. I can magically
un-hang the process by ssh-ing to the box, and the process then completes
successfully. Weird, huh?
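State 'D' is uninterruptible sleep: the process is blocked inside the kernel, usually on I/O, and will not act on signals until that wait ends. As a hedged sketch (Python 3 and a Linux /proc assumed; the helper names are hypothetical), the same state letter that ps reports can be read from the field after the command name in /proc/<pid>/stat:

```python
def process_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # Layout: "pid (comm) state ppid ...". comm may contain spaces or ')',
    # so locate the state relative to the *last* closing parenthesis.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def read_state(pid):
    """Read the state letter of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return process_state(f.read())


def is_uninterruptible(pid):
    """True if the process is in 'D' (uninterruptible sleep), e.g. blocked on I/O."""
    return read_state(pid) == "D"
```

For example, while 'ar' is stuck, `is_uninterruptible(<pid of ar>)` should return True, and it should flip to False once the close() completes.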

Additional info:

Sometimes this works, though, and I can't figure out why! With a single file it
sometimes fails and sometimes succeeds. Sometimes, if there's an existing output
file and I'm adding a file to the archive, it works. While it fails almost
consistently, sometimes it works...

Anyhow, thanks for the help!

Comment 1 Erik Williamson 2002-05-31 16:26:18 UTC
Sorry, I forgot to mention that I can perform the same task on RH 7.1 and 7.2 boxes
(fully patched) with the same directory mounted from the same server, and it works
just ducky.

Thanks Again!
Erik.

Comment 2 Ben LaHaise 2002-06-03 19:03:54 UTC
What network card/driver is being used?  It sounds like the driver is missing a
wakeup, which in turn causes NFS traffic to be delayed.

Comment 3 Erik Williamson 2002-06-03 19:23:28 UTC
The affected systems are Dell Precision 530s; lsmod tells me they're
using the 3c59x module.

Here's what dmesg says:
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
04:0b.0: 3Com PCI 3c905C Tornado at 0xec80. Vers LK1.1.16

FWIW, this is an on-board NIC.

Cheers & thanks for getting on this so quick!
e.


Comment 4 Ben LaHaise 2002-06-03 19:30:56 UTC
Hmm, the 3c59x driver is in pretty good shape. What are rsize/wsize set to?
(cat /proc/mounts)  Try limiting them to 4K if they're set larger, and see if
that makes a difference. The next 2.4.18 kernel erratum includes patches to
default to a smaller [rw]size.
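A small sketch of the check suggested above, assuming Python 3 and the standard /proc/mounts layout (device, mount point, filesystem type, comma-separated options, dump, pass). It reports the rsize/wsize of each NFS mount and lists mounts where either exceeds 4096 bytes; the function names are hypothetical. Mount points containing spaces appear escaped (e.g. `\040`) in /proc/mounts and are returned as-is.

```python
NFS_TYPES = ("nfs", "nfs4")  # exact matches, so the unrelated "nfsd" filesystem is skipped


def nfs_rw_sizes(mounts_text):
    """Map each NFS mount point to (rsize, wsize) parsed from /proc/mounts text.

    A size is None when that option is absent from the mount line.
    """
    sizes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected layout: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] not in NFS_TYPES:
            continue
        opts = {}
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            opts[key] = value
        rsize = int(opts["rsize"]) if "rsize" in opts else None
        wsize = int(opts["wsize"]) if "wsize" in opts else None
        sizes[fields[1]] = (rsize, wsize)
    return sizes


def oversized(mounts_text, limit=4096):
    """Return NFS mount points whose rsize or wsize exceeds `limit` bytes."""
    return [
        mount_point
        for mount_point, (rsize, wsize) in nfs_rw_sizes(mounts_text).items()
        if (rsize or 0) > limit or (wsize or 0) > limit
    ]
```

On a live client, call `oversized(open("/proc/mounts").read())`. Any mount it returns can be remounted with the standard NFS options `rsize=4096,wsize=4096` to try the workaround that resolved this report.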

Comment 5 Erik Williamson 2002-06-03 22:09:08 UTC
Beautiful - the smaller [rw]size fixed it - Thanks for the help!  

When do you anticipate the kernel release?

Thanks - e.


Comment 6 Ben LaHaise 2002-06-04 00:54:18 UTC
I can't give an exact timeframe other than "soon".  The errata kernel will be
2.4.18-4 or higher; please reopen if the fix included in that kernel doesn't work.

