Bug 128109

Summary: Dell PowerEdge 400SC grinds to a halt during startup.
Product: Red Hat Enterprise Linux 3 Reporter: Mike Zanker <past.bell9759>
Component: kernel Assignee: John W. Linville <linville>
Status: CLOSED WONTFIX QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0 CC: anderson, jburke, linville, lwoodman, petrides, riel
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-08-03 15:31:27 UTC Type: ---

Description Mike Zanker 2004-07-18 07:00:14 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)

Description of problem:
System begins booting normally. Then, when individual services are 
starting (named, ntpd, etc.) the system will suddenly run extremely 
slowly with individual services taking a few minutes each to start. 
After 10-15 minutes when the system has finally started it is 
impossible to log in - login eventually times out before the 
Password: prompt. The only way to recover is to power off the machine 
resulting in possible file system corruption - Ctrl-Alt-Del has no 
effect.
The system has a single P4 2.4 hyperthreading processor. With HT 
enabled and running an SMP kernel, this slow startup happens every 
time. With the standard kernel I *think* it only happens after 
a "shutdown -r". If I "shutdown -h" so that the power is switched 
off, it starts normally.



Version-Release number of selected component (if applicable):
kernel-2.4.21-15.0.3.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Enable hyperthreading in BIOS
2. Boot with SMP kernel, or "warm boot" with standard kernel.
3. Problem occurs when services are starting
    

Actual Results:  Problem as described above

Expected Results:  System should start normally

Additional info:

When the system is in this very slow state it responds normally to 
network pings.
I can't see anything obvious in /var/log/messages during one of these 
very slow startups other than the amount of time it takes for 
services to log their startup messages.

Comment 2 Mike Zanker 2004-08-08 16:31:35 UTC
Had to reboot the box today, so did a shutdown -h to avoid above 
issue. Left it a couple of minutes then powered on. Startup 
progressed normally until services were starting. It then went slow, 
as described above. Powered off, left a few minutes, powered on 
again. Same problem. Went through this procedure a few times. On the 
5th bootup it started normally. After about 5 minutes I noticed that 
it suddenly started behaving slowly again because I couldn't ssh in 
from another machine. My existing ssh session was still OK, so I did 
a shutdown -h. The machine took 20 minutes to shut down - each service 
was taking 1-2 minutes to stop. After next bootup it was OK and is 
still OK after 2 hours.
I don't think that this is a Dell issue - I am running Enterprise 3 
on an old PII 400 MHz machine, too, and this same problem has 
happened once on shutdown.

Comment 4 Mike Zanker 2004-09-03 19:54:44 UTC
No improvements with the kernel from Update 3 which I installed 
earlier. Cannot reboot the box at all now - have tried 5 times so far 
but it grinds to a halt during startup as described above :(
I downloaded and burned a KNOPPIX 3.4 CD earlier today - it boots 
fine.

Comment 6 Dave Anderson 2005-06-07 12:38:16 UTC
I'm sure I don't know what to ask either...

Jeff Burke -- do we have this type of Dell machine in-house?



Comment 7 Mike Zanker 2005-06-07 12:47:27 UTC
I've now discovered that if I disconnect the network cable during boot up then 
it starts normally, every time. Only with the network cable connected does the 
slowness occur. The network interface is detected as

e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection

lspci shows:

02:0c.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)

It's plugged into a cheap Belkin 8-port 10/100 switch.

Comment 8 Dave Anderson 2005-06-07 12:54:09 UTC
Maybe John has some ideas?


Comment 9 John W. Linville 2005-06-07 13:25:31 UTC
If you boot-up w/ the network unplugged, then plug-in the network and ifup the 
interface, do you still get the problem? 
 
Have you tried other network cards with this box, plugged-in to the same 
network port?  And/or other boxes plugged-in to the same port?  Do they behave 
correctly? 
 
If you boot-up w/ the card plugged-in (so you get the "slowness"), then unplug 
the card, does the slowness disappear?  If the slowness disappears after 
unplugging (or if you can survive the slowness long enough), please post the 
contents of /proc/interrupts. 
 
I would tend to suspect that there is a problem w/ the card, perhaps resulting 
in an inordinately large number of interrupts being processed.  Just a guess, 
really... 
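
One way to test the interrupt guess is to take two snapshots of /proc/interrupts a few seconds apart and compare them. The following is only a sketch: it assumes Python 3 (which RHEL 3 itself did not ship, so the snapshots may need to be copied to another machine), and the function names parse_interrupts, interrupt_rates and sample are hypothetical, not part of any existing tool.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq_label: (total_count, description)}."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row lists CPU0, CPU1, ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # The first ncpu fields are per-CPU counters; some rows (e.g. ERR) have fewer.
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        if not counts:
            continue
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def interrupt_rates(before_text, after_text, seconds):
    """Return (irq, description, interrupts_per_second) tuples, busiest first."""
    before = parse_interrupts(before_text)
    after = parse_interrupts(after_text)
    rates = []
    for irq, (count, description) in after.items():
        previous = before.get(irq, (0, ""))[0]
        rates.append((irq, description, (count - previous) / seconds))
    rates.sort(key=lambda row: row[2], reverse=True)
    return rates


def sample(path="/proc/interrupts", seconds=5.0):
    """Take two live snapshots `seconds` apart and return the per-IRQ rates."""
    with open(path) as f:
        before = f.read()
    time.sleep(seconds)
    with open(path) as f:
        after = f.read()
    return interrupt_rates(before, after, seconds)
```

If the line for the e1000 interface (eth1 here) sits at the top of the list with a rate far above the timer's, that would be consistent with the interrupt theory; a quiet eth1 line would argue against it.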

Comment 10 Jeff Burke 2005-06-07 13:39:14 UTC
  In comment #2 it was said, "I don't think that this is a Dell issue - I am
running Enterprise 3 on an old PII 400 MHz machine, too, and this same problem
has happened once on shutdown." Is this still true?

  Are both of these systems plugged into the same "cheap Belkin 8-port 10/100
switch"? If so, can you move them to a different switch? Also, by the sounds of it 
the Belkin does not have a management interface. If it does, could you get the
port statistics for the systems that are having the issue?

  I would also like to confirm that this issue happens regardless of whether you 
do a shutdown -h or a shutdown -r. Is that correct?

  Could you also check whether your system is running DKMS?
      /sbin/chkconfig --list | grep dkms
  If it is, could you send the status of the DKMS application?
      /usr/sbin/dkms status


Comment 11 Mike Zanker 2005-06-07 14:26:42 UTC
It has happened on the PII 400 machine just once more, when I rebooted after 
installing Update 5. This machine is connected to a Netgear 10/100 hub on a 
different LAN segment. However, it sorted itself out and, after about 45 
minutes, was accepting SSH sessions again.

Back to the Dell machine - the Belkin is unmanaged.
It happens less frequently with a shutdown -h (including power-off). It also 
seems to happen more when the environment is hot. Sounds odd, I know, but it 
has happened less in winter when the room is around 20C than summer when it's 
up to 30C.
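
With an unmanaged switch there are no port statistics to collect, but the host keeps its own per-interface counters in /proc/net/dev. The sketch below pulls out the error-related ones. It assumes Python 3 (not available on RHEL 3 itself, so the file contents may need to be copied elsewhere) and the usual 16-column layout of that file; parse_net_dev and link_trouble are hypothetical names chosen for illustration.

```python
# Column names as they appear in the /proc/net/dev header.
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame",
             "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls",
             "carrier", "compressed"]


def parse_net_dev(text):
    """Parse /proc/net/dev text into {iface: {"rx": {...}, "tx": {...}}}."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        # Older kernels may print "eth1:12345" with no space after the colon;
        # partition() handles both forms.
        values = [int(v) for v in rest.split()]
        if len(values) < 16:
            continue
        stats[name.strip()] = {
            "rx": dict(zip(RX_FIELDS, values[:8])),
            "tx": dict(zip(TX_FIELDS, values[8:16])),
        }
    return stats


def link_trouble(stats, iface):
    """Return only the counters that usually indicate link-level problems."""
    entry = stats[iface]
    return {
        "rx_errs": entry["rx"]["errs"],
        "rx_frame": entry["rx"]["frame"],
        "tx_errs": entry["tx"]["errs"],
        "tx_colls": entry["tx"]["colls"],
        "tx_carrier": entry["tx"]["carrier"],
    }
```

Calling link_trouble(parse_net_dev(open("/proc/net/dev").read()), "eth1") once during a slow boot and once during a normal one would show whether these counters climb only in the bad case. Steadily rising frame errors or collisions can point to a speed or duplex negotiation problem between the card and the switch, though that is only one possible reading.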

I have a second card in the Dell - a cheap RealTek. This one doesn't appear to 
cause any problems but it is plugged into a separate hub. 

I'll try a different switch and report back.

Comment 12 Ernie Petrides 2005-06-07 15:44:22 UTC
Reassigning this to John Linville and reverting state to NEEDINFO.

Comment 13 John W. Linville 2005-08-03 15:31:27 UTC
I'm going to close this based on inactivity.  Please reopen if the problem 
remains and include the information discussed in comment 9 through comment 11.  
Thanks!