Bug 242181 - eth0 hangs on system boot
Summary: eth0 hangs on system boot
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: i386
OS: Linux
Priority: low
Severity: urgent
Target Milestone: ---
Assignee: Jeff Garzik
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-06-02 00:15 UTC by David
Modified: 2013-07-03 02:33 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-01-08 22:29:07 UTC
Type: ---
Embargoed:


Description David 2007-06-02 00:15:17 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4

Description of problem:
eth0 hangs on bootup. I have to physically unplug the Ethernet cable for about 2 seconds before bootup resumes. Until then the mouse and screen are completely frozen.

Version-Release number of selected component (if applicable):
2.6.21-1.3194

How reproducible:
Always


Steps to Reproduce:
1. Boot; the system freezes
2. Unplug the Ethernet cable and plug it back in
3. The system boots and runs perfectly

Actual Results:
It's frozen and does nothing.

Expected Results:


Additional info:
Sorry, this bug is for the current F7 release, i386. The card is:
00:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)

It ran perfectly on FC6.

Comment 1 Mike McGuire 2007-06-03 00:10:21 UTC
This happens here on my laptop.  If the cable is plugged in during boot I have
to remove it; if it is unplugged during boot I have to plug it in to unfreeze
the boot process.  If I disable the Network Manager services, it boots normally
whether the cable is plugged in or unplugged.

Comment 2 David Wald 2007-06-03 11:21:23 UTC
I get this also on a Gateway desktop with a Linksys Gigabit card.  If I do an
interactive boot, skip network init, and then manually start the card, it works
OK.

Comment 3 David 2007-06-03 22:18:16 UTC
I just updated my last fc6 machine to f7, and it has the bug as well.  So I have
got 2 machines out of 4 that do this.

It's the same Ethernet card: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit
Ethernet.

I even got inventive and moved one realtek from a good server to one of these
bad servers, it still failed and the good one still was okay.  Also tried
shuffling the PCI cards around to change the order, still the same.

It seems it's going to be some issue based on motherboard chipset / CPU / BIOS /
etc. against certain brands / models of Ethernet adapters.

Comment 4 guyfeuillet 2007-06-04 13:20:21 UTC
Same problem here, with a D-Link DGE-528T gigabit Ethernet adapter.
When booting F7 (no such issue in FC6 and previous releases), whether my modem
is switched on or off, the boot always blocks at the "starting networking" stage
(bringing up interface eth0).  The only workaround is to switch the modem off
and back on; if the modem is off, I simply switch it on.  Then I get the
following line: "Determining IP information for eth0... done".  This is not
ideal.  Is it a real bug or a bad network configuration on my side?
Sorry for my English.

Comment 5 Christopher Johnson 2007-06-04 20:37:15 UTC
I have an r8169 as eth1, and a tulip as eth0.  Booting
kernel-2.6.21-1.3194.fc7 (686 arch) with eth1 set to start on boot, the system
hangs at "Determining IP information for eth1".  If I set eth0 to start at boot
and eth1 to not start at boot, the system boots normally.

If I boot kernel-xen-2.6.20-2925.9.fc7 (686 arch), the hang does not occur when
starting the r8169 interface at boot.
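
For readers who want to try the same workaround, here is a minimal Python sketch of flipping the start-at-boot setting. It assumes the setting is the ONBOOT key in an ifcfg-style file such as /etc/sysconfig/network-scripts/ifcfg-eth1; the comment above does not say how the setting was changed, and the helper name and sample file contents are invented for illustration.

```python
def set_ifcfg_option(text, key, value):
    """Return ifcfg-style text with KEY=value set.

    An existing KEY=... line is replaced in place; if the key is
    absent, a new line is appended at the end.
    """
    out = []
    found = False
    for line in text.splitlines():
        name = line.split("=", 1)[0].strip()
        if name == key:
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Hypothetical file contents, not taken from the report.
example = "DEVICE=eth1\nBOOTPROTO=dhcp\nONBOOT=yes\n"
print(set_ifcfg_option(example, "ONBOOT", "no"))
```

The function only transforms text, so it can be tried on a copy of the file before anything under /etc is touched.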

Comment 6 Christopher Johnson 2007-06-04 20:42:48 UTC
Bugs 242301 and 242357 appear to be duplicates of this.

Comment 7 David 2007-06-06 22:12:20 UTC
I read them and they look the same.  I filed this bug well before those.  I
disagree that it's a driver bug as stated in 242357; as you can read above,
different brands are affected.

My guess is that it has to do with the ARP test when the device fires up.  It
reminds me of a bug that surfaced in FC6 a while ago after a policy update,
which basically caused eth to fail so that it had to be restarted manually after
boot.

However, I consider this bug a high priority: if you happen to be affected by
it, your server won't start up without physically removing and replugging the
offending eth card.

If the server is on a remote location that is very bad.

Comment 8 Mike McGuire 2007-06-07 19:33:02 UTC
I don't know if you are getting the same error, but here is a dump of the log:

Jun  2 20:13:52 localhost kernel: r8169: eth0: link up
Jun  2 20:14:10 localhost kernel: r8169: eth0: link up
Jun  2 20:14:10 localhost kernel: BUG: soft lockup detected on CPU#0!
Jun  2 20:14:10 localhost kernel:  [<c0451f3e>] softlockup_tick+0xa5/0xb4
Jun  2 20:14:10 localhost kernel:  [<c042e930>] update_process_times+0x3b/0x5e
Jun  2 20:14:10 localhost kernel:  [<c043d2bd>] tick_sched_timer+0x78/0xbb
Jun  2 20:14:10 localhost kernel:  [<c0439df5>] hrtimer_interrupt+0x12b/0x1b6
Jun  2 20:14:10 localhost kernel:  [<c043d245>] tick_sched_timer+0x0/0xbb
Jun  2 20:14:10 localhost kernel:  [<c0408534>] timer_interrupt+0x2c/0x32
Jun  2 20:14:10 localhost kernel:  [<c04521aa>] handle_IRQ_event+0x1a/0x3f
Jun  2 20:14:10 localhost kernel:  [<c04535ea>] handle_level_irq+0x81/0xc7
Jun  2 20:14:10 localhost kernel:  [<c04072c7>] do_IRQ+0xb8/0xd1
Jun  2 20:14:10 localhost kernel:  [<c04058ff>] common_interrupt+0x23/0x28
Jun  2 20:14:10 localhost kernel:  [<c04058ff>] common_interrupt+0x23/0x28
Jun  2 20:14:10 localhost kernel:  [<c0561704>] yenta_interrupt+0x13/0xb4
Jun  2 20:14:10 localhost kernel:  [<c04521aa>] handle_IRQ_event+0x1a/0x3f
Jun  2 20:14:10 localhost kernel:  [<c04535ea>] handle_level_irq+0x81/0xc7
Jun  2 20:14:10 localhost kernel:  [<c0453569>] handle_level_irq+0x0/0xc7
Jun  2 20:14:10 localhost kernel:  [<c04072bb>] do_IRQ+0xac/0xd1
Jun  2 20:14:10 localhost kernel:  [<c04058ff>] common_interrupt+0x23/0x28
Jun  2 20:14:10 localhost kernel:  [<c042b2dc>] __do_softirq+0x54/0xba
Jun  2 20:14:10 localhost kernel:  [<c04071b7>] do_softirq+0x59/0xb1
Jun  2 20:14:10 localhost kernel:  [<c0453569>] handle_level_irq+0x0/0xc7
Jun  2 20:14:10 localhost kernel:  [<c042b194>] irq_exit+0x38/0x6b
Jun  2 20:14:10 localhost kernel:  [<c04072cc>] do_IRQ+0xbd/0xd1
Jun  2 20:14:10 localhost kernel:  [<c04058ff>] common_interrupt+0x23/0x28
Jun  2 20:14:10 localhost kernel:  [<f8b0007b>] rtl8169_init_one+0x5c7/0x9d7 [r8169]
Jun  2 20:14:10 localhost kernel:  [<c060171d>] _spin_unlock_irqrestore+0x8/0x9
Jun  2 20:14:10 localhost kernel:  [<f8aff1f7>] rtl8169_open+0x139/0x194 [r8169]
Jun  2 20:14:10 localhost kernel:  [<c05a2f8d>] dev_open+0x2b/0x62
Jun  2 20:14:10 localhost kernel:  [<c05a19e1>] dev_change_flags+0x47/0xe4
Jun  2 20:14:10 localhost kernel:  [<c05a977b>] rtnl_setlink+0x264/0x365
Jun  2 20:14:10 localhost kernel:  [<c05a9517>] rtnl_setlink+0x0/0x365
Jun  2 20:14:10 localhost kernel:  [<c05a8dad>] rtnetlink_rcv_msg+0x1c1/0x1e6
Jun  2 20:14:10 localhost kernel:  [<c05b4e19>] netlink_run_queue+0x50/0xbe
Jun  2 20:14:10 localhost kernel:  [<c05a8bec>] rtnetlink_rcv_msg+0x0/0x1e6
Jun  2 20:14:10 localhost kernel:  [<c05a8bab>] rtnetlink_rcv+0x25/0x3d
Jun  2 20:14:10 localhost kernel:  [<c05b51b6>] netlink_data_ready+0x12/0x4c
Jun  2 20:14:10 localhost kernel:  [<c05b426a>] netlink_sendskb+0x19/0x30
Jun  2 20:14:10 localhost kernel:  [<c05b5198>] netlink_sendmsg+0x277/0x283
Jun  2 20:14:10 localhost kernel:  [<c0599180>] sock_sendmsg+0xd0/0xeb
Jun  2 20:14:10 localhost kernel:  [<c0436e71>] autoremove_wake_function+0x0/0x35
Jun  2 20:14:10 localhost kernel:  [<c0436e71>] autoremove_wake_function+0x0/0x35
Jun  2 20:14:10 localhost kernel:  [<c04e7100>] copy_from_user+0x3a/0x66
Jun  2 20:14:10 localhost kernel:  [<c059932d>] sys_sendmsg+0x192/0x1f7
Jun  2 20:14:10 localhost kernel:  [<c0599e0d>] sys_recvmsg+0x1b9/0x1cd
Jun  2 20:14:10 localhost kernel:  [<c04e7350>] copy_to_user+0x3c/0x50
Jun  2 20:14:10 localhost kernel:  [<c0599c3c>] move_addr_to_user+0x50/0x68
Jun  2 20:14:13 localhost kernel:  [<c059a0d6>] sys_getsockname+0x9f/0xb0
Jun  2 20:14:13 localhost kernel:  [<c06016f4>] _spin_lock_bh+0x8/0x18
Jun  2 20:14:13 localhost kernel:  [<c059adb6>] release_sock+0x12/0x9d
Jun  2 20:14:13 localhost kernel:  [<c059a4fc>] sys_socketcall+0x240/0x261
Jun  2 20:14:13 localhost kernel:  [<c0404f70>] syscall_call+0x7/0xb
Jun  2 20:14:13 localhost kernel:  =======================
Jun  2 20:14:13 localhost kernel: r8169: eth0: link down

It makes no difference whether it is set for static or DHCP.  As mentioned, it
is a hard lock until the cable is either removed or plugged in.
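
For anyone comparing their own logs with the trace in this comment, here is a minimal Python sketch that pulls the frames out of such a dump and lists the ones belonging to a given module. It is not part of the original report: the regular expression and function names are illustrative assumptions about the frame format shown above, and it tolerates a module tag that has wrapped onto the next line.

```python
import re

# One call-trace frame as it appears in the syslog dump, for example
# " [<f8aff1f7>] rtl8169_open+0x139/0x194 [r8169]".
# The trailing "[module]" tag is optional and may follow a line break.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return (symbol, module) pairs; module is None when no tag is present."""
    return [(m.group("symbol"), m.group("module"))
            for m in FRAME_RE.finditer(log_text)]


def module_frames(log_text, module):
    """Return the symbols in the trace that carry the given module tag."""
    return [sym for sym, mod in parse_frames(log_text) if mod == module]


# A few frames copied from the trace in this comment.
sample = """\
Jun  2 20:14:10 localhost kernel:  [<c04058ff>] common_interrupt+0x23/0x28
Jun  2 20:14:10 localhost kernel:  [<f8b0007b>] rtl8169_init_one+0x5c7/0x9d7 [r8169]
Jun  2 20:14:10 localhost kernel:  [<c060171d>] _spin_unlock_irqrestore+0x8/0x9
Jun  2 20:14:10 localhost kernel:  [<f8aff1f7>] rtl8169_open+0x139/0x194 [r8169]
Jun  2 20:14:10 localhost kernel:  [<c05a2f8d>] dev_open+0x2b/0x62
"""

print(module_frames(sample, "r8169"))  # ['rtl8169_init_one', 'rtl8169_open']
```

Running the same helper over traces from other affected machines would show whether frames from the same module appear there as well.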


Comment 9 Seth R 2007-06-08 23:55:12 UTC
Same issue here,
using the r8169 driver:

00:0b.0 Ethernet controller: Linksys Gigabit Network Adapter (rev 10)
        Subsystem: Linksys EG1032 v3 Instant Gigabit Network Adapter
        Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 11
        I/O ports at e800 [size=256]
        Memory at df000000 (32-bit, non-prefetchable) [size=256]
        [virtual] Expansion ROM at 20000000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2




Comment 10 David 2007-06-09 00:05:33 UTC
Any progress on this bug?  It's a very early bug number against F7, and it's
important: if the server is rebooted or suffers a power failure, it will not
start without physical intervention.

Comment 11 Mike McGuire 2007-06-09 12:48:34 UTC
OK, I just found a little bit more.  When booting Fedora 7, if I select "I" for
interactive mode, it gets to the line:

Start Service Network   Y/N (C)ontinue   I select Yes and error happens

FATAL:   Module not found   (Next Line)

Bringing up loopback interface      (OK)

FATAL:   Module not found


But the computer boots without plugging the Ethernet cable in...

Comment 12 Mike McGuire 2007-06-09 13:00:31 UTC
But again, starting with the cable plugged in, I go to "I"nteractive mode, and
it gets to the line:

Start NetworkManagerDispatcher

and it hangs until I disconnect the cable.  It will probably get past this part
if I shut off the Dispatcher service.

Comment 13 Jonathan Jordan 2007-06-09 16:58:38 UTC
I found a driver for the 8169 chipset on the Realtek website that was released
May 23, but when I try to compile it, I get a few errors. You can find the
driver here,
http://www.realtek.com.tw/downloads/downloadsView.aspx?Langid=1&PNid=13&PFid=4&Level=5&Conn=4&DownTypeID=3&GetDown=false&Downloads=true#5,7,8,10,982

Can someone try to install it?

Comment 14 David 2007-06-10 01:21:31 UTC
There is no point trying this, as it's not just Realtek that has this issue.
It's something else besides the actual driver.  It's something in Fedora that
does an ARP test on bootup / starting.

Therefore rebuilding the driver is a pointless exercise; as with all these
issues, some people run the same card and have no issues, and others do.


Comment 15 David 2007-06-11 23:48:58 UTC
I changed this to kernel.  The bug is one of the first filed against F7 and it's
still new :(

I think it's arpwatch; it's something in the Ethernet startup.

It's got to be fixed.  It's really urgent, as you can't restart your server
unless you are physically there to unplug and replug the Ethernet cable.

Comment 16 Mike McGuire 2007-06-14 01:22:51 UTC
David,

They might be focused on this one as it is related....

http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=242572

Comment 17 David 2007-06-14 01:48:05 UTC
Yes, it's similar, except they are totally focused on the Realtek. Other cards are affected. It's also not the kernel. I forced on the FC6 kernel and it still does it. It's Network Manager or arpwatch.

Comment 18 David 2007-06-18 22:51:18 UTC
I changed this to kernel.  After a bunch of package updates, it was still
hanging on boot.  However, I forced on the FC6 kernel and the machine boots
flawlessly with no eth0 hang.

Therefore it's the F7 kernel.

Comment 19 David 2007-07-04 23:13:01 UTC
Any update?  Can someone at redhat at least assign this bug?  I know it requires
a new kernel to be tested.

Comment 20 Seth R 2007-07-05 03:55:43 UTC
It's no longer an issue with the latest F7 kernel, at least for me.


Comment 21 David 2007-07-18 00:26:34 UTC
Nope, it's still broken.  I have tried a few kernels from updates-testing and
there is no change at all.

I still have two machines doing this.  Note that you need to be careful: I have
found that the two affected machines intermittently boot up normally, regardless
of the F7 kernel version.

Comment 22 Andy Gospodarek 2007-07-18 19:50:45 UTC
Reproduced this on an installed system.  

Looks like the trick might be a 32-bit UP system with an smp kernel (the default
f7 kernel is smp).  A UP kernel might die too, but we don't build one of those
anymore, so I can't say if that makes a difference.

I did notice that it doesn't *always* hang while booting, but it seems to happen
most of the time.  Seeing soft-lockups like that makes me wonder if some code
was added recently that works well on true SMP systems but not on UP ones.

Bug 242572 seems to be a dup of this....
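
To check whether a machine matches the configuration described in this comment (an SMP kernel on a single-CPU system), a rough Python sketch follows. It is an illustration only: the helper name is made up, it assumes the `uname -v` string contains the token "SMP" for SMP builds, and `os.cpu_count()` reports logical CPUs, so a hyperthreaded single-socket machine would not count as UP here.

```python
import os
import platform


def is_smp_kernel_on_up(cpu_count, kernel_version):
    """True when an SMP-built kernel is running with exactly one CPU.

    kernel_version is the `uname -v` style string; the check looks
    for a standalone "SMP" token in it.
    """
    return cpu_count == 1 and "SMP" in kernel_version.split()


if __name__ == "__main__":
    cpus = os.cpu_count() or 1       # cpu_count() may return None
    version = platform.version()     # same text as `uname -v` on Linux
    print(platform.release(), cpus, is_smp_kernel_on_up(cpus, version))
```

A True result only says the machine fits the pattern suspected above; it does not by itself confirm the hang.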

Comment 23 Christopher Brown 2007-09-13 19:39:41 UTC
Hello folks,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel?

The bug mentioned in #22 looks resolved and indeed appears a dupe.

If the problem has gone away then please close this bug or I'll do so in a few
days if there is no additional information lodged.

Cheers
Chris

Comment 24 David 2008-01-08 22:29:07 UTC
Chris,

Yes, it was fixed and never resurfaced on Fedora 7.  I have also never seen it
in Fedora 8.

