Red Hat Bugzilla – Bug 186584
Oops after network setup during boot
Last modified: 2015-01-04 17:26:07 EST
On a new FC5 system, I cannot boot into anything other than single user mode.
During a boot to runlevel 3 (with "rhgb quiet" removed from the boot options), I
get oopses during or right after network setup.
This is an Abit AV8 socket 939 mboard. I am using the built-in gigabit
(via_velocity, listed as eth2) as well as a Compaq dual 10/100 PCI card (e100,
creates eth0 and eth1). I have only configured eth0 and eth2 at this point.
When I boot in single user mode, I can ifup eth0 and ifup eth2, but at that
point, pretty much anything causes an oops. I captured one with the full boot
messages (will attach); it oopsed when I ran "ps".
I tried loading the updates/testing/5 kernel and got the same behavior.
Created attachment 126639
boot log and oops
Created attachment 126655
log from 64 bit kernel
Here's a boot log from a minimal 64 bit install. The common thing I see is the
"last sysfs file: /class/net/eth2/address" (eth2 is the via_velocity NIC).
Can you give that box a going-over with memtest86?
The value it oopsed on is kinda strange, and could be a random bit flip that
would force us to bypass a null ptr check.
I'll give it a try after I post this. However, this box has been running FC3
(32 bit) for a while okay (I did the FC5 installs to alternate partitions). I
also figured that if it was a RAM thing, I wouldn't have oopses at essentially
the same stage of boot on both 32 bit and 64 bit installs (since RAM use would
be significantly different).
I have the same motherboard, and hence the same onboard nic, and it seems to
work fine under my FC5 installation.
memtest86+ ran for 13 hours with no errors.
I'll try pulling the e100 NIC and see what happens.
Created attachment 126741
Okay, I pulled the e100 NIC. The only card plugged in is the video (OEM ATI
Radeon 9200 AGP). I moved the ifcfg-eth out of the way and renamed
ifcfg-eth2 to eth0 (and modified the file), and modified modprobe.conf to only
reference eth0 (as via_velocity).
It still crashed during boot. To get a log, I booted in single user mode and
started running S* scripts from rc3.d manually. After starting S10network, I
got a line in dmesg output about sed segfaulting. I continued to S12syslog and
got a general protection fault.
The system was still running at that point (I was able to copy off the dmesg
output), so I went on with no further problems until I started bluetooth. At
that point, the kernel panicked (and it scrolled off the screen).
I'm attaching the log up through the syslog GPF. If it is really needed, I can
try to get the later panic, but it'll take some time (the system has a serial
port but nothing else has one so I'll have to find a USB adapter for another
system). Let me know if you need it or if you can make some sense from the
attached dmesg output.
If there's anything else I can try, let me know.
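For anyone repeating the manual step-through of the rc3.d start scripts described above, it could be sketched roughly as below. This is only an illustration, not what was actually run: the function name step_through_rc, the log directory /root/rc-step-logs, and the DMESG override variable are all made up for the example. The idea is to save the kernel log after each script so that the last snapshot before a crash survives on disk.

```sh
#!/bin/sh
# Sketch: start the runlevel scripts one at a time and save the kernel log
# after each, so the first script that triggers an oops can be identified.
# step_through_rc, the default log directory, and the DMESG variable are
# hypothetical names chosen for this example.

step_through_rc() {
    rc_dir=${1:-/etc/rc3.d}
    log_dir=${2:-/root/rc-step-logs}
    dmesg_cmd=${DMESG:-dmesg}

    mkdir -p "$log_dir" || return 1
    # The glob expands in sorted order, so S10network runs before S12syslog.
    for script in "$rc_dir"/S*; do
        [ -x "$script" ] || continue
        name=$(basename "$script")
        echo "starting $name"
        "$script" start
        echo "$name exit status: $?" >> "$log_dir/status.txt"
        # Snapshot the kernel ring buffer and flush it to disk before moving on.
        $dmesg_cmd > "$log_dir/$name.dmesg" 2>&1
        sync
    done
}
```

Called with no arguments from single user mode, it would walk /etc/rc3.d; if the box panics partway through, the newest file in the log directory shows how far it got.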
Okay, I found the culprit. The network bit was a red herring; the real problem
was something that started earlier: cpuspeed. When I disable the "Cool 'n
Quiet" option in my BIOS, the system boots with no problems.
I exchanged some more email with Jay Cliburn and compared systems. We are
running the same BIOS version (and I also tried a new version just released last
week). He's got a 3000+ CPU while I've got a 3200+. Cool 'n Quiet works for
him (and cpuspeed works under Linux), while mine crashes.
I guess chalk this up to "user error" (although I'll blame the manufacturer).
When I built the system, I installed my 2 sticks of RAM in slots 3 and 4. When
I move them to slots 1 and 2, the system appears to work. I can't fully switch
to FC5 just yet (probably this weekend), but on FC3 I went ahead and loaded
powernow-k8 and started cpuspeed, and it is working there as well.
I blame Abit because they put a sticker over the RAM slot labels on the
motherboard. I can see DIM on each slot but no numbers.
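When comparing two boxes like this, a quick way to see whether frequency scaling is actually active is to look at the cpufreq entries in sysfs. The sketch below assumes the usual /sys/devices/system/cpu/cpuN/cpufreq layout with scaling_driver and scaling_governor files; the function name cpufreq_status is made up for the example. With Cool 'n Quiet disabled in the BIOS, I would expect powernow-k8 not to load and the "no cpufreq driver active" branch to be hit, but that is an assumption, not something checked here.

```sh
#!/bin/sh
# Sketch: report the cpufreq driver and governor for each CPU, if any.
# cpufreq_status is a hypothetical helper; the optional argument overrides
# the sysfs CPU root (assumed to be /sys/devices/system/cpu).

cpufreq_status() {
    cpu_root=${1:-/sys/devices/system/cpu}
    found=0
    for dir in "$cpu_root"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        found=1
        cpu=$(basename "$(dirname "$dir")")
        driver=$(cat "$dir/scaling_driver" 2>/dev/null)
        governor=$(cat "$dir/scaling_governor" 2>/dev/null)
        echo "$cpu driver=${driver:-unknown} governor=${governor:-unknown}"
    done
    [ "$found" -eq 1 ] || echo "no cpufreq driver active"
}
```

Running cpufreq_status before and after "modprobe powernow-k8" should show whether the driver bound to the CPU.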
Bugzilla needs a PEBKAM resolution state.