From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)

Description of problem:
System begins booting normally. Then, when individual services are starting (named, ntpd, etc.) the system will suddenly run extremely slowly with individual services taking a few minutes each to start. After 10-15 minutes when the system has finally started it is impossible to log in - login eventually times out before the Password: prompt. The only way to recover is to power off the machine resulting in possible file system corruption - Ctrl-Alt-Del has no effect.

The system has a single P4 2.4 hyperthreading processor. With HT enabled and running an SMP kernel, this slow startup happens every time. With the standard kernel I *think* it only happens after a "shutdown -r". If I "shutdown -h" so that the power is switched off, it starts normally.

Version-Release number of selected component (if applicable):
kernel-2.4.21-15.0.3.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Enable hyperthreading in BIOS
2. Boot with SMP kernel, or "warm boot" with standard kernel.
3. Problem occurs when services are starting

Actual Results: Problem as described above

Expected Results: System should start normally

Additional info:
When the system is in this very slow state it responds normally to network pings. I can't see anything obvious in /var/log/messages during one of these very slow startups other than the amount of time it takes for services to log their startup messages.
Had to reboot the box today, so did a shutdown -h to avoid the above issue. Left it a couple of minutes, then powered on. Startup progressed normally until services were starting. It then went slow, as described above. Powered off, left it a few minutes, powered on again. Same problem. Went through this procedure a few times. On the 5th bootup it started normally. After about 5 minutes I noticed that it had suddenly started behaving slowly again, because I couldn't ssh in from another machine. My existing ssh session was still OK, so I did a shutdown -h. The machine took 20 minutes to shut down - each service was taking 1-2 minutes to stop. After the next bootup it was OK and is still OK after 2 hours. I don't think that this is a Dell issue - I am running Enterprise 3 on an old PII 400 MHz machine, too, and this same problem has happened once on shutdown.
No improvement with the kernel from Update 3, which I installed earlier. Cannot reboot the box at all now - have tried 5 times so far but it grinds to a halt during startup as described above :( I downloaded and burned a KNOPPIX 3.4 CD earlier today - it boots fine.
I'm sure I don't know what to ask either... Jeff Burke -- do we have this type of Dell machine in-house?
I've now discovered that if I disconnect the network cable during boot up then it starts normally, every time. Only with the network cable connected does the slowness occur.

The network interface is detected as:
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection

lspci shows:
02:0c.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)

It's plugged into a cheap Belkin 8-port 10/100 switch.
Maybe John has some ideas?
If you boot up w/ the network unplugged, then plug in the network and ifup the interface, do you still get the problem? Have you tried other network cards with this box, plugged in to the same network port? And/or other boxes plugged in to the same port? Do they behave correctly? If you boot up w/ the card plugged in (so you get the "slowness"), then unplug the card, does the slowness disappear? If the slowness disappears after unplugging (or if you can survive the slowness long enough), please post the contents of /proc/interrupts. I would tend to suspect that there is a problem w/ the card, perhaps resulting in an inordinately large number of interrupts being processed? Just a guess, really...
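In case it helps when collecting that data: a single dump of /proc/interrupts only shows totals since boot, so two dumps taken a few seconds apart are more telling. Below is a rough Python sketch of that comparison. The function names (parse_interrupts, interrupt_rates, sample_rates) are made up for illustration and are not part of any existing tool, and the parsing assumes the usual layout of a CPU header line followed by "label: count per CPU, then description" rows.

```python
import time


def parse_interrupts(text):
    """Return {irq_label: (total_count_across_cpus, description)}."""
    lines = text.strip("\n").splitlines()
    # The header row ("CPU0 CPU1 ...") tells us how many count columns to expect.
    ncpu = len(lines[0].split())
    totals = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        # Rows such as ERR may carry fewer counters than there are CPUs.
        for token in fields[:ncpu]:
            if not token.isdigit():
                break
            counts.append(int(token))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def interrupt_rates(before, after, seconds):
    """Compare two snapshots; return (label, description, per_second), busiest first."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    rates = []
    for label, (count, description) in new.items():
        previous = old.get(label, (0, ""))[0]
        rates.append((label, description, (count - previous) / seconds))
    rates.sort(key=lambda row: row[2], reverse=True)
    return rates


def sample_rates(interval=5.0, path="/proc/interrupts"):
    """Read the file twice, `interval` seconds apart, and return the rates."""
    with open(path) as handle:
        first = handle.read()
    time.sleep(interval)
    with open(path) as handle:
        second = handle.read()
    return interrupt_rates(first, second, interval)
```

Calling sample_rates() while the box is in the slow state and printing the first few rows would show whether the line for eth1 is climbing much faster than the others. That would be consistent with the interrupt guess above, though it would not prove that the card itself is at fault.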
In comment #2 it was said: "I don't think that this is a Dell issue - I am running Enterprise 3 on an old PII 400 MHz machine, too, and this same problem has happened once on shutdown." Is this still true? Are both of these systems plugged into the same "cheap Belkin 8-port 10/100 switch"? If so, can you move them to a different switch? Also, by the sounds of it, the Belkin does not have a management interface. If it does, could you get the port statistics for the systems that are having the issues? I would also like to confirm that this issue happens regardless of whether you do a shutdown -h or a shutdown -r, correct?

Could you also check whether your system is running DKMS:
/sbin/chkconfig --list | grep dkms

If it is, could you send the status of the DKMS application:
/usr/sbin/dkms status
It has happened on the PII 400 machine just once more, when I rebooted after installing Update 5. This machine is connected to a Netgear 10/100 hub on a different LAN segment. However, it sorted itself out and, after about 45 minutes, was accepting SSH sessions again. Back to the Dell machine - the Belkin is unmanaged. It happens less frequently with a shutdown -h (including power-off). It also seems to happen more when the environment is hot. Sounds odd, I know, but it has happened less in winter, when the room is around 20C, than in summer, when it's up to 30C. I have a second card in the Dell - a cheap RealTek. This one doesn't appear to cause any problems, but it is plugged into a separate hub. I'll try a different switch and report back.
Reassigning this to John Linville and reverting state to NEEDINFO.
I'm going to close this based on inactivity. Please reopen if the problem remains and include the information discussed in comment 9 through comment 11. Thanks!