Bug 53758

Summary: Linux Error: eepro100: cmd_wait for(0xffffff80) timedout with(0xffffff80)!
Product: [Retired] Red Hat Linux
Reporter: David S. Brown <dsbrown>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED CURRENTRELEASE
QA Contact: Brock Organ <borgan>
Severity: medium
Priority: medium
Version: 6.2
CC: alan
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-09-30 15:39:10 UTC

Description David S. Brown 2001-09-17 21:17:02 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)

Description of problem:
Intel eepro100 on an STL2 motherboard with two 1000 MHz processors and 1 GB of memory.
Running: Linux l2 2.2.19-6.2.7smp #1 SMP Thu Jun 14 07:42:45 EDT 2001 i686 unknown

lsmod
Module                  Size  Used by
nfs                    73472  40 (autoclean)
lockd                  45040   1 (autoclean) [nfs]
sunrpc                 63824   1 (autoclean) [nfs lockd]
e1000                  23408   1 (autoclean)
eepro100               17392   1 (autoclean)
aic7xxx               132960   4

gives error:
Linux Error: eepro100: cmd_wait for(0xffffff80) timedout with(0xffffff80)!

and then, sometimes, huge numbers of:
kernel: nfs: task 291659 can't get a request slot 

This is similar but not exactly like:
http://www.tux.org/hypermail/linux-eepro100/2001-May/0010.html
http://www.cs.helsinki.fi/linux/linux-kernel/2001-00/0792.html

If I see the nfs task slot message, the machine will hang. If I don't see 
that message, the machine may be slow on nfs, but will probably work in a 
half-hearted way.

One or both errors appear consistently once the machine has been up for over 24 hours.

I called Red Hat telephone support. They said yes, they've seen this, and 
no, they don't have a solution.

So, it's up to you.  They claim this is hard to fix because it's 
intermittent, but it isn't for me.  I am flogging this machine with very high 
traffic in a test environment.  It's pretty easy to reproduce if I flog it 
enough.
 

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Simulate my environment (as described above).
2. Flog it with the compile and link of 25,000 files, with everything 
mounted via nfs.
3. Repeat the above for 24 - 48 hours.
	

Actual Results:  The machine will spit out one or both of the error messages 
above, and ethernet may lock up.  A reboot, or possibly ifdown eth0; ifup eth0, 
will fix it.
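
One way to tell how often each of the two symptoms occurs is to count the messages in the system log. This is a minimal sketch, assuming kernel messages are logged to /var/log/messages; the function name count_nic_errors is hypothetical and not part of any Red Hat tool.

```shell
#!/bin/sh
# Count occurrences of the two error messages from this report in a log file.
# Usage: count_nic_errors [logfile]   (assumed default: /var/log/messages)
count_nic_errors() {
    log="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines (0 when there are none).
    cmd_wait=$(grep -c 'eepro100: cmd_wait for' "$log")
    nfs_slot=$(grep -c "nfs: task .* can't get a request slot" "$log")
    echo "cmd_wait=$cmd_wait nfs_slot=$nfs_slot"
}
```

If the counts keep climbing, the ifdown eth0; ifup eth0 workaround mentioned above can be tried as root before resorting to a reboot.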

Expected Results:  obvious.

Additional info:

I can't upgrade beyond 6.2, this is a test machine, to test interaction of 
our product with RH6.2.

Comment 1 Arjan van de Ven 2001-09-17 21:21:21 UTC
2.2.19 also has an "e100" driver for the same cards. Could you try this driver
instead? (See /etc/modules.conf or /etc/conf.modules for where to change the
driver used.)
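
A minimal sketch of that change, assuming the card is bound through a line of the form "alias ethN eepro100" in the modules file. Which ethN the card uses is not stated in this report (an e1000 card is also loaded), and the function name use_e100 is hypothetical.

```shell
#!/bin/sh
# Print a modules.conf-style file with every "alias ethN eepro100" line
# rewritten to load the e100 driver instead. The input file is not modified;
# redirect the output to a new file to keep the result.
use_e100() {
    sed 's/^\(alias[[:space:]][[:space:]]*eth[0-9][0-9]*[[:space:]][[:space:]]*\)eepro100[[:space:]]*$/\1e100/' "$1"
}
```

For example, use_e100 /etc/modules.conf > /tmp/modules.conf.new writes a candidate file that can be reviewed and then copied into place as root. The old module still has to be unloaded, or the machine rebooted, before the new driver takes over.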

Comment 2 David S. Brown 2001-09-17 21:58:31 UTC
I have made the suggested change and will test for another 48 hours.

--dsbrown

Comment 3 David S. Brown 2001-09-19 18:44:40 UTC
Tested under e100 module.  

I no longer see: eepro100: cmd_wait for(0xffffff80) timedout with(0xffffff80)!

At least at this point, 48 hours later.

But I still see: 

kernel: nfs: task 291659 can't get a request slot 

I also got a very nasty:

Stuck on TLB IPI wait (CPU#3)
followed by a non-responsive terminal; I had to power off and reset.
(Those may be ones in the error message; I can't read my writing.)

So, does this mean RedHat thinks it has Four(4) processors on my Two(2) 
processor box?

I suspect a kernel problem?  A lot of the similar errors I've read on Bugzilla 
mention a context switch problem. 
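
The number of processors the kernel actually brought up can be checked separately from the CPU number printed in the message. This is a minimal sketch, assuming /proc/cpuinfo lists one "processor" line per CPU as it does on i386; the function name count_cpus is hypothetical.

```shell
#!/bin/sh
# Count the processors the running kernel reports.
# Usage: count_cpus [file]   (defaults to /proc/cpuinfo)
count_cpus() {
    # Each CPU has one line beginning with "processor" in /proc/cpuinfo.
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}
```

On this two-processor box the count should be 2, whatever CPU number appears in the TLB IPI message.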



Comment 4 Arjan van de Ven 2001-09-19 18:52:07 UTC
Stuck on TLB IPI wait (CPU#3)


That is the internal number of the CPU. Basically, BIOSes number the CPUs, but only
number 0 is required; the rest are "free form"...

The message often indicates a hardware problem; passing "noapic" on the kernel
command line (e.g. at the lilo prompt) often seems to work around it.
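
After rebooting with the option, it can be worth confirming that it reached the kernel. This is a minimal sketch, assuming the kernel exposes its boot arguments in /proc/cmdline; the function name has_noapic is hypothetical.

```shell
#!/bin/sh
# Succeed (exit status 0) if "noapic" appears as a whole word in the
# kernel command line.
# Usage: has_noapic [file]   (defaults to /proc/cmdline)
has_noapic() {
    grep -qw noapic "${1:-/proc/cmdline}"
}
```

At the lilo prompt the option would be entered as "linux noapic", where "linux" stands for whatever image label this machine's lilo.conf defines (an assumption here). To make it permanent, the usual route is an append="noapic" line in /etc/lilo.conf followed by rerunning /sbin/lilo.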

Comment 5 Bugzilla owner 2004-09-30 15:39:10 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/