Bug 186584

Summary:

Oops after network setup during boot

Product:

[Fedora] Fedora

Reporter:

Chris Adams <linux>

Component:

kernel

Assignee:

Dave Jones <davej>

Status:

CLOSED NOTABUG

QA Contact:

Brian Brock <bbrock>

Severity:

medium

Priority:

medium

CC:

pfrields, wtogami

Hardware:

All

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2006-03-28 01:42:25 UTC

Attachments:

Description	Flags
boot log and oops	none
log from 64 bit kernel	none
dmesg output	none

Description Chris Adams 2006-03-24 15:56:10 UTC

On a new FC5 system, I cannot boot into anything other than single user mode. 
During a boot to runlevel 3 (without "rhgb quiet" on the kernel command line), I
get oopses during or right after network setup.

This is an Abit AV8 socket 939 motherboard.  I am using the built-in gigabit
(via_velocity, listed as eth2) as well as a Compaq dual 10/100 PCI card (e100,
creates eth0 and eth1).  I have only configured eth0 and eth2 at this point.

When I boot in single user mode, I can ifup eth0 and ifup eth2, but at that
point, pretty much anything causes an oops.  I captured one with the full boot
messages (will attach); it oopsed when I ran "ps".

I tried loading the updates/testing/5 kernel and got the same behavior.

Comment 1 Chris Adams 2006-03-24 15:56:11 UTC

Created attachment 126639
boot log and oops

Comment 2 Chris Adams 2006-03-24 17:44:28 UTC

Created attachment 126655
log from 64 bit kernel

Here's a boot log from a minimal 64 bit install.  The common thing I see is the
"last sysfs file: /class/net/eth2/address" (eth2 is the via_velocity
interface).
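
For anyone comparing several saved boot logs the same way, here is a minimal Python sketch that pulls out every "last sysfs file:" line and counts which paths show up in more than one log. The regular expression is based only on the line quoted above, and the function names are made up for illustration; other kernel versions may word the line differently.

```python
import re
from collections import Counter

# Pattern taken from the line quoted in this comment; this is an assumption
# about the format, not something checked against other kernels.
LAST_SYSFS = re.compile(r"last sysfs file:\s*(\S+)")


def last_sysfs_files(log_text):
    """Return every path reported on a 'last sysfs file:' line, in order."""
    return LAST_SYSFS.findall(log_text)


def common_files(logs):
    """Count in how many of the given log texts each path appears."""
    counts = Counter()
    for text in logs:
        # set() so that a path repeated inside one log is counted once.
        counts.update(set(last_sysfs_files(text)))
    return counts
```

A path with a count equal to the number of logs is common to all of them, which is only a hint about where to look, not proof of the cause.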

Comment 3 Dave Jones 2006-03-24 23:31:00 UTC

Can you give that box a going-over with memtest86?
The value it oopsed on is kinda strange, and could be a random bit flip that
would cause us to bypass a null ptr check.

Comment 4 Chris Adams 2006-03-24 23:39:11 UTC

I'll give it a try after I post this.  However, this box has been running FC3
(32 bit) for a while okay (I did the FC5 installs to alternate partitions).  I
also figured that if it was a RAM thing, I wouldn't have oopses at essentially
the same stage of boot on both 32 bit and 64 bit installs (since RAM use would
be significantly different).

Comment 5 Jay Cliburn 2006-03-25 02:43:13 UTC

I have the same motherboard, and hence the same onboard nic, and it seems to
work fine under my FC5 installation.

Comment 6 Chris Adams 2006-03-25 13:16:22 UTC

memtest86+ ran for 13 hours with no errors.

I'll try pulling the e100 NIC and see what happens.

Comment 7 Chris Adams 2006-03-25 22:03:58 UTC

Created attachment 126741
dmesg output

Okay, I pulled the e100 NIC.  The only card plugged in is the video (OEM ATI
Radeon 9200 AGP).  I moved the ifcfg-eth[01] out of the way and renamed
ifcfg-eth2 to eth0 (and modified the file), and modified modprobe.conf to only
reference eth0 (as via_velocity).

It still crashed during boot.  To get a log, I booted in single user mode and
started running S* scripts from rc3.d manually.  After starting S10network, I
got a line in dmesg output about sed segfaulting.  I continued to S12syslog and
got a general protection fault.

The system was still running at that point (I was able to copy off the dmesg
output), so I went on with no further problems until I started bluetooth.  At
that point, the kernel panicked (and it scrolled off the screen).

I'm attaching the log up through the syslog GPF.  If it is really needed, I can
try to get the later panic, but it'll take some time (the system has a serial
port but nothing else has one so I'll have to find a USB adapter for another
system).  Let me know if you need it or if you can make some sense from the
attached dmesg output.

If there's anything else I can try, let me know.
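
The manual procedure described above (start the S* scripts from rc3.d one at a time and look at dmesg after each) could be sketched roughly as below. This is an illustration under assumptions, not the exact steps that were run: the helper names are hypothetical, the scripts are ordered with a plain lexical sort, and the dmesg comparison assumes the kernel ring buffer has not wrapped between two reads.

```python
import glob
import os
import subprocess


def start_scripts(rc_dir="/etc/rc3.d"):
    """Return the S* entries of rc_dir in lexical order (S10... before S12...)."""
    return sorted(glob.glob(os.path.join(rc_dir, "S*")))


def new_lines(before, after):
    """Return the lines of `after` that were appended since `before`.

    If `after` does not start with `before` (for example because the ring
    buffer wrapped), give up on diffing and return all of `after`.
    """
    if after[:len(before)] == before:
        return after[len(before):]
    return after


def read_dmesg():
    """Capture the current dmesg output as a list of lines."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True)
    return result.stdout.splitlines()


def step_through(rc_dir="/etc/rc3.d"):
    """Run each start script after a keypress and print fresh dmesg lines."""
    before = read_dmesg()
    for script in start_scripts(rc_dir):
        input("Press Enter to run %s start " % script)
        subprocess.run([script, "start"])
        after = read_dmesg()
        for line in new_lines(before, after):
            print("  dmesg:", line)
        before = after
```

Calling step_through() as root from single user mode would pause before each script, so the last script started before a fault appears is easy to identify.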

Comment 8 Chris Adams 2006-03-27 00:12:58 UTC

Okay, I found the culprit.  The network bit was a red herring; the real problem
was something that started earlier: cpuspeed.  When I disable the "Cool 'n
Quiet" option in my BIOS, the system boots with no problems.
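
To check whether frequency scaling is actually active on a given boot, one option is to read the cpufreq entries under sysfs. The sketch below assumes the usual /sys/devices/system/cpu/cpu0/cpufreq directory and the scaling_driver, scaling_governor and scaling_cur_freq files; whether they are present depends on the kernel and on a cpufreq driver being loaded, and the function name is made up for illustration.

```python
import os

# Assumed location of the cpufreq sysfs entries for the first CPU.
CPUFREQ_DIR = "/sys/devices/system/cpu/cpu0/cpufreq"


def read_cpufreq(cpufreq_dir=CPUFREQ_DIR,
                 names=("scaling_driver", "scaling_governor", "scaling_cur_freq")):
    """Return a dict of cpufreq values; a missing or unreadable file maps to None."""
    info = {}
    for name in names:
        path = os.path.join(cpufreq_dir, name)
        try:
            with open(path) as handle:
                info[name] = handle.read().strip()
        except OSError:
            # No cpufreq driver loaded, or the file does not exist on this kernel.
            info[name] = None
    return info
```

If every value comes back as None, no cpufreq driver is bound to that CPU, which is what one would expect with the BIOS option disabled.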

Comment 9 Chris Adams 2006-03-27 02:17:52 UTC

I exchanged some more email with Jay Cliburn and compared systems.  We are
running the same BIOS version (and I also tried a new version just released last
week).  He's got a 3000+ CPU while I've got a 3200+.  Cool 'n Quiet works for
him (and cpuspeed works under Linux), while mine crashes.

Comment 10 Chris Adams 2006-03-28 01:42:25 UTC

I guess chalk this up to "user error" (although I'll blame the manufacturer).

When I built the system, I installed my 2 sticks of RAM in slots 3 and 4.  When
I move them to slots 1 and 2, the system appears to work.  I can't fully switch
to FC5 just yet (probably this weekend), but on FC3 I went ahead and loaded
powernow-k8 and started cpuspeed, and it is working there as well.

I blame Abit because they put a sticker over the RAM slot labels on the
motherboard.  I can see DIM on each slot but no numbers.

Bugzilla needs a PEBKAM resolution state.