Bug 10203
Summary: | Kernel panic on SMP HP Netserver | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Sergio Tadini <sergio.tadini> |
Component: | kernel | Assignee: | Michael K. Johnson <johnsonm> |
Status: | CLOSED ERRATA | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 6.0 | CC: | dautrevaux, sergio.tadini |
Hardware: | i386 | ||
OS: | Linux | ||
Doc Type: | Bug Fix | ||
Last Closed: | 2000-09-05 09:11:51 UTC | ||
Description
Sergio Tadini
2000-03-16 10:13:32 UTC
I have about the same kind of problem. I'm trying to add the second processor to a Linux box on a dual PIII/Xeon-550MHz Intel C440GX+ board, and I get a bunch of problems: the machine runs perfectly for about 24 hours (and it's incredible how fast it compiles :-)), then freezes ?-(. I'm usually not able to get any indication as to why it crashed (as it seems to like crashing in the middle of the nightly builds :-)), but occasionally it crashes during the day, and then I get the following behaviour:

As long as you are not accessing an NFS-mounted file system, for example when logging in as root from the system console, everything works perfectly, but as soon as you try to access one, you're dead :-(

As long as it is working I get occasional complaints like these:

svc: unknown program 100227 (me 100003)
svc: unknown version (3)

Note however that I also get these messages in single-processor mode, so I'm not sure they are related to the problem.

Once it is frozen, you see the following message on the system console from time to time:

nfs: task 37637 can't get a request slot

where the task number may change from message to message (I've seen at least 37638 and 37639). At this point the CPU is idle: top reports 1 running process and 99.8% idle CPU, with about 60Mb free memory out of a total of 1Gb and no swap at all (swap is not even configured).

Note that all these messages are related to NFS accesses to filesystems exported from a Solaris-2.6/PC system (running on a dual PII-450 SMP platform).

I was using kernel 2.2.12-5 from RedHat-6.0, then 2.2.12-32 from RedHat-6.1 in uniprocessor mode, then switched to 2.2.12-32smp and now kernel 2.2.14-8smp (as provided by Ed Schlunder on http://www.ajusd.org/~edward/silkhat-6.1/i386/kernel-smp-2.2.14-8.i686.rpm) on my RedHat-6.1 install. I get 'svc:' messages on all configs and crashes on all SMP kernels.

Is there any other workaround than unplugging the second PII-550?
Even if it were aesthetic, I don't think my boss would appreciate me displaying a 1K$ processor on the wall over my desk :-(

A significant amount of SMP work was done for 2.2.16. Has the 2.2.16 errata kernel helped?

I just installed kernel-2.2.16-3smp today to experiment with it (taking advantage of the fact that the whole team is now on holidays), and I'll keep you informed of the result. However, I am also leaving for about two weeks, so don't expect anything new before then, unless it starts crashing faster than usual :-) Thanks for the good job :-)

It's now been about one month that I've been running the 2.2.16 errata kernel in SMP mode and it has never crashed! So this seems to have cured my problem. Note that I still get the "kernel: svc: " messages from NFS, however, so those were not related to the SMP crashes at all :-)

The svc message is logged when the Solaris box tries to talk NFSv3 to us. It's probably a bit of excess verbosity on the Linux side to log this, I agree. Glad to hear it's happier. Reopen the bug if it turns out to be luck only.

Back to my problem of the SMP kernel crashing. As said above, I updated to kernel-2.2.16-3smp in July and all worked fine until about the end of September. I then got one or two "silent" crashes during October: not fun, but still not too bad, except when it crashes during a weekend rebuild!
But now I'm starting a new phase in our projects and I have HUGE make batches running for several days, getting the source files from a Solaris-7 box and putting all resulting files on a local SCSI disk. Since then it has crashed consistently about twice a day, with the kernel frozen with the dreaded

nfs: task xxxxx can't get a request slot

(replace xxxxx by your favorite task ID).

Note that since July several users have been compiling in parallel, but their current directory was also NFS-mounted from the Solaris box and we seldom crashed; the difference now is that the current directory for the make runs is local and only the source files are picked up (using VPATH) from an NFS-mounted tree.

So it seems that the errata kernel does not fix this problem; IIRC I got this problem when testing the build environment I'm now using, and I stopped compiling locally at about the time I installed the errata. I'm afraid I did not enforce the "all other things equal" paradigm strictly enough :-o
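
For anyone trying to tell the harmless 'svc:' noise apart from the request-slot messages that accompany the freezes, a small log tally may help. The following is only a rough sketch: the regular expressions are built from the message texts quoted in this report, while the function name `summarize` and the idea of feeding it saved console or syslog lines are assumptions made for illustration, not anything provided by the kernel or by Red Hat.

```python
import re
from collections import Counter

# Patterns taken from the console messages quoted in this bug report.
SLOT_RE = re.compile(r"nfs: task (\d+) can't get a request slot")
SVC_PROG_RE = re.compile(r"svc: unknown program (\d+) \(me (\d+)\)")
SVC_VERS_RE = re.compile(r"svc: unknown version \((\d+)\)")


def summarize(lines):
    """Count each message kind and collect the distinct NFS task IDs.

    `summarize` is a hypothetical helper name used only for this sketch.
    """
    counts = Counter()
    tasks = set()
    for line in lines:
        match = SLOT_RE.search(line)
        if match:
            counts["request_slot"] += 1
            tasks.add(int(match.group(1)))
        elif SVC_PROG_RE.search(line):
            counts["svc_unknown_program"] += 1
        elif SVC_VERS_RE.search(line):
            counts["svc_unknown_version"] += 1
    return counts, sorted(tasks)
```

Passing it the lines of a saved log (for example `summarize(open(path))`, where `path` is wherever the messages were captured) returns the per-kind counts and the list of task IDs. Steady 'svc:' counts with or without a freeze, against request-slot messages that only show up around a freeze, would fit what is described above.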