Bug 186584

Summary: Oops after network setup during boot
Product: [Fedora] Fedora
Reporter: Chris Adams <linux>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5
CC: pfrields, wtogami
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Last Closed: 2006-03-28 01:42:25 UTC
Attachments:
boot log and oops (flags: none)
log from 64 bit kernel (flags: none)
dmesg output (flags: none)

Description Chris Adams 2006-03-24 15:56:10 UTC
On a new FC5 system, I cannot boot into anything other than single user mode.
During a boot to runlevel 3 (with "rhgb quiet" removed from the kernel command
line), I get oopses during or right after network setup.

This is an Abit AV8 socket 939 mboard.  I am using the built-in gigabit
(via_velocity, listed as eth2) as well as a Compaq dual 10/100 PCI card (e100,
creates eth0 and eth1).  I have only configured eth0 and eth2 at this point.

When I boot in single user mode, I can ifup eth0 and ifup eth2, but at that
point, pretty much anything causes an oops.  I captured one with the full boot
messages (will attach); it oopsed when I ran "ps".

I tried loading the updates/testing/5 kernel and got the same behavior.

Comment 1 Chris Adams 2006-03-24 15:56:11 UTC
Created attachment 126639
boot log and oops

Comment 2 Chris Adams 2006-03-24 17:44:28 UTC
Created attachment 126655
log from 64 bit kernel

Here's a boot log from a minimal 64 bit install.  The common thing I see is the
"last sysfs file: /class/net/eth2/address" (eth2 is the via_velocity
interface).
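A comparison like this can be automated when there are several logs to go through. The following is a minimal Python 3 sketch that pulls the "last sysfs file:" path out of each log and reports the paths common to all of them. The function names and the sample log excerpts are hypothetical; only the "last sysfs file" line itself is taken from this report. A shared path only shows a correlation between the oopses, not a cause.

```python
import re

# Matches the "last sysfs file:" line quoted in this report.
LAST_SYSFS = re.compile(r"last sysfs file:\s*(\S+)")


def last_sysfs_files(log_text):
    """Return every path reported on a 'last sysfs file:' line, in order."""
    return LAST_SYSFS.findall(log_text)


def common_paths(logs):
    """Return the set of paths that appear in every log in `logs`."""
    sets = [set(last_sysfs_files(text)) for text in logs]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Made-up excerpts for illustration; the real logs are in the attachments.
    log_32 = "first sample line\nlast sysfs file: /class/net/eth2/address\n"
    log_64 = "second sample line\nlast sysfs file: /class/net/eth2/address\n"
    print(common_paths([log_32, log_64]))
```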

Comment 3 Dave Jones 2006-03-24 23:31:00 UTC
Can you give that box a going-over with memtest86?
The value it oopsed on is kinda strange, and could be a random bit flip that
would force us to bypass a null ptr check.


Comment 4 Chris Adams 2006-03-24 23:39:11 UTC
I'll give it a try after I post this.  However, this box has been running FC3
(32 bit) without problems for a while (I did the FC5 installs to alternate partitions).  I
also figured that if it was a RAM thing, I wouldn't have oopses at essentially
the same stage of boot on both 32 bit and 64 bit installs (since RAM use would
be significantly different).


Comment 5 Jay Cliburn 2006-03-25 02:43:13 UTC
I have the same motherboard, and hence the same onboard nic, and it seems to
work fine under my FC5 installation.

Comment 6 Chris Adams 2006-03-25 13:16:22 UTC
memtest86+ ran for 13 hours with no errors.

I'll try pulling the e100 NIC and see what happens.


Comment 7 Chris Adams 2006-03-25 22:03:58 UTC
Created attachment 126741
dmesg output

Okay, I pulled the e100 NIC.  The only card plugged in is the video (OEM ATI
Radeon 9200 AGP).  I moved the ifcfg-eth[01] out of the way and renamed
ifcfg-eth2 to eth0 (and modified the file), and modified modprobe.conf to only
reference eth0 (as via_velocity).

It still crashed during boot.  To get a log, I booted in single user mode and
started running S* scripts from rc3.d manually.  After starting S10network, I
got a line in dmesg output about sed segfaulting.  I continued to S12syslog and
got a general protection fault.

The system was still running at that point (I was able to copy off the dmesg
output), so I went on with no further problems until I started bluetooth.  At
that point, the kernel panicked (and the message scrolled off the screen).

I'm attaching the log up through the syslog GPF.  If it is really needed, I can
try to get the later panic, but it'll take some time (the system has a serial
port but nothing else has one so I'll have to find a USB adapter for another
system).  Let me know if you need it or if you can make some sense from the
attached dmesg output.

If there's anything else I can try, let me know.
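For anyone repeating the manual procedure described above (running the S* scripts from rc3.d one at a time and checking dmesg after each), here is a rough Python 3 sketch. It assumes the usual SysV layout where start scripts are named S<number><service> and take a "start" argument, that plain sorted order matches the order init would use, and that it is run as root from single user mode. The /root/bootlogs directory is a hypothetical location and must already exist.

```python
import os
import subprocess


def start_scripts(rc_dir):
    """Return the S* entries of rc_dir in sorted (start) order."""
    return sorted(n for n in os.listdir(rc_dir) if n.startswith("S"))


def step_through(rc_dir, log_dir):
    """Run each start script, then save dmesg so a fault can be tied to a script."""
    for name in start_scripts(rc_dir):
        input(f"Press Enter to run {name} ")
        subprocess.run([os.path.join(rc_dir, name), "start"], check=False)
        dmesg = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
        # One file per script: the first file that shows a fault points at the
        # script that was running (or had just run) when it happened.
        with open(os.path.join(log_dir, f"dmesg-after-{name}.txt"), "w") as fh:
            fh.write(dmesg.stdout)


if __name__ == "__main__":
    # Hypothetical log directory; create it before running.
    step_through("/etc/rc3.d", "/root/bootlogs")
```

Writing the dmesg output to disk after every script means something survives even if a later script panics the kernel, provided the filesystem is mounted read-write and the data has been flushed.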

Comment 8 Chris Adams 2006-03-27 00:12:58 UTC
Okay, I found the culprit.  The network bit was a red herring; the real problem
was something that started earlier: cpuspeed.  When I disable the "Cool 'n
Quiet" option in my BIOS, the system boots with no problems.

Comment 9 Chris Adams 2006-03-27 02:17:52 UTC
I exchanged some more email with Jay Cliburn and compared systems.  We are
running the same BIOS version (and I also tried a new version just released last
week).  He's got a 3000+ CPU while I've got a 3200+.  Cool 'n Quiet works for
him (and cpuspeed works under Linux), while mine crashes.


Comment 10 Chris Adams 2006-03-28 01:42:25 UTC
I guess chalk this up to "user error" (although I'll blame the manufacturer).

When I built the system, I installed my 2 sticks of RAM in slots 3 and 4.  When
I move them to slots 1 and 2, the system appears to work.  I can't fully switch
to FC5 just yet (probably this weekend), but on FC3 I went ahead and loaded
powernow-k8 and started cpuspeed, and it is working there as well.
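One way to confirm that frequency scaling is really active after loading powernow-k8 is to read the cpufreq files under sysfs. The sketch below is Python 3 and assumes a kernel that exposes the standard cpufreq sysfs interface under /sys/devices/system/cpu; the function name and the sysfs_root parameter are made up for illustration.

```python
import os


def read_cpufreq(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return the scaling driver, governor and current frequency for one CPU.

    A value of None means the file could not be read, which is what to
    expect when no cpufreq driver is loaded for that CPU.
    """
    base = os.path.join(sysfs_root, f"cpu{cpu}", "cpufreq")
    info = {}
    for name in ("scaling_driver", "scaling_governor", "scaling_cur_freq"):
        try:
            with open(os.path.join(base, name)) as fh:
                info[name] = fh.read().strip()
        except OSError:
            info[name] = None
    return info


if __name__ == "__main__":
    print(read_cpufreq())
```

If scaling_driver comes back as None, the driver did not bind to the CPU; if scaling_cur_freq changes between runs under varying load, the governor or cpuspeed is adjusting the clock.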

I blame Abit because they put a sticker over the RAM slot labels on the
motherboard.  I can see DIM on each slot but no numbers.

Bugzilla needs a PEBKAM resolution state.