Bug 190571 - REGRESSION: e100 hang on HP Integrity
Summary: REGRESSION: e100 hang on HP Integrity
Keywords:
Status: CLOSED DUPLICATE of bug 190162
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: ia64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Thomas Graf
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-05-03 17:25 UTC by Doug Chapman
Modified: 2014-06-18 08:29 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-05-12 15:19:13 UTC
Target Upstream Version:
Embargoed:


Description Doug Chapman 2006-05-03 17:25:17 UTC
Description of problem:
The latest kernels have an issue where the e100 driver causes a hang on bootup.
I am not clear exactly when this issue was introduced, since we have been seeing
serial console hangs over the past few weeks.

I see the hang in 2 different ways:
If the system is up and running an older good kernel and I do a warm reboot it
hangs as it brings up the network device.

If I cold boot the system (power on or virtual reset button via the MP) it hangs
at the udev step in the boot:


It does not look like this was caused by the latest e100 driver update that went
into 2.6.9-34.15, since I did have kernels working after that.  I am in the
process of building from source so I can pull out specific patches to determine
the culprit.
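
Editorial sketch: pulling patches out one at a time can be shortened by bisecting the ordered patch series. The Python below is only an illustration under stated assumptions: the base kernel boots, the full series hangs, and once the offending patch is applied every longer prefix also hangs. The names `find_culprit` and `boots_ok`, and the placeholder patch names, are hypothetical; `boots_ok` stands in for "build a kernel with these patches and try to boot it on the rx2600".

```python
def find_culprit(patches, boots_ok):
    """Return the first patch whose inclusion makes the kernel stop booting.

    patches  -- ordered list of patch names (hypothetical placeholders here)
    boots_ok -- callable taking a list of applied patches and returning True
                if a kernel built with exactly those patches boots
    Assumes: no patches -> boots, all patches -> hangs, and the hang persists
    for every prefix that contains the culprit.
    """
    lo, hi = 0, len(patches)  # invariant: prefix of length lo boots, length hi hangs
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots_ok(patches[:mid]):
            lo = mid          # culprit is later in the series
        else:
            hi = mid          # culprit is within the first mid patches
    return patches[hi - 1]


if __name__ == "__main__":
    # Stand-in for real build-and-boot tests: pretend "patch-c" is the culprit.
    series = ["patch-a", "patch-b", "patch-c", "patch-d", "patch-e"]
    print(find_culprit(series, lambda applied: "patch-c" not in applied))
```

With N patches this needs roughly log2(N) build-and-boot cycles rather than N, which matters when every test is a full kernel build plus a reset via the MP.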

Version-Release number of selected component (if applicable):
kernel-2.6.9-35 and possibly earlier.

How reproducible:
100% on the rx2600 system

Steps to Reproduce:
1. Install the updated kernel.
2. Reset the system via the MP.
3. The system hangs at the udev step in the bootup.

  
Actual results:


Expected results:


Additional info:

Comment 1 Doug Chapman 2006-05-03 17:28:19 UTC
More info:

I did a sysrq-c to get a stack trace.  This is from when the system hung at the
udev step after a reset.  I don't see anything e100-specific here, but I have
verified that if I remove the e100.ko module the system boots cleanly.


 [<a000000100016d00>] show_stack+0x80/0xa0
                                sp=e00000003dd1f440 bsp=e00000003dd19758
 [<a000000100016d50>] dump_stack+0x30/0x60
                                sp=e00000003dd1f610 bsp=e00000003dd19740
 [<a000000100077280>] panic+0x660/0x6a0
                                sp=e00000003dd1f610 bsp=e00000003dd196c0
 [<a00000010003e8c0>] die+0x1c0/0x240
                                sp=e00000003dd1f670 bsp=e00000003dd19680
 [<a0000001000643a0>] ia64_do_page_fault+0x8c0/0xbc0
                                sp=e00000003dd1f670 bsp=e00000003dd19618
 [<a00000010000f560>] ia64_leave_kernel+0x0/0x260
                                sp=e00000003dd1f720 bsp=e00000003dd19618
 [<a000000100325f10>] sysrq_handle_crash+0x10/0x20
                                sp=e00000003dd1f8f0 bsp=e00000003dd19600
 [<a000000100326530>] __handle_sysrq+0xf0/0x280
                                sp=e00000003dd1f8f0 bsp=e00000003dd195b0
 [<a000000100326700>] handle_sysrq+0x40/0x60
                                sp=e00000003dd1f8f0 bsp=e00000003dd19580
 [<a0000001003453b0>] receive_chars+0x3d0/0x800
                                sp=e00000003dd1f8f0 bsp=e00000003dd194b0
 [<a000000100345ff0>] serial8250_interrupt+0x1b0/0x240
                                sp=e00000003dd1f900 bsp=e00000003dd19450
 [<a000000100013050>] handle_IRQ_event+0x90/0x120
                                sp=e00000003dd1f910 bsp=e00000003dd19410
 [<a000000100013b90>] do_IRQ+0x2d0/0x560
                                sp=e00000003dd1f910 bsp=e00000003dd193a0
 [<a000000100015c70>] ia64_handle_irq+0xf0/0x1e0
                                sp=e00000003dd1f910 bsp=e00000003dd19358
 [<a00000010000f560>] ia64_leave_kernel+0x0/0x260
                                sp=e00000003dd1f910 bsp=e00000003dd19358
 [<a000000100008ca0>] ia64_spinlock_contention+0x20/0x60
                                sp=e00000003dd1fae0 bsp=e00000003dd19358
 [<a000000100590740>] __lock_text_start+0x40/0x60
                                sp=e00000003dd1fae0 bsp=e00000003dd19350
 [<a000000100493fc0>] net_rx_action+0x380/0x460
                                sp=e00000003dd1fae0 bsp=e00000003dd192e8
 [<a000000100084a30>] __do_softirq+0x1f0/0x240
                                sp=e00000003dd1faf0 bsp=e00000003dd19258
 [<a000000100084af0>] do_softirq+0x70/0xc0
                                sp=e00000003dd1faf0 bsp=e00000003dd191f0
 [<a000000100015d30>] ia64_handle_irq+0x1b0/0x1e0
                                sp=e00000003dd1faf0 bsp=e00000003dd191a8
 [<a00000010000f560>] ia64_leave_kernel+0x0/0x260
                                sp=e00000003dd1faf0 bsp=e00000003dd191a8
 [<a0000001000f5ef0>] clear_page_tables+0x150/0x680
                                sp=e00000003dd1fcc0 bsp=e00000003dd19078
 [<a000000100103b60>] exit_mmap+0x180/0x4c0
                                sp=e00000003dd1fd70 bsp=e00000003dd19020
 [<a000000100072c80>] mmput+0x100/0x1a0
                                sp=e00000003dd1fe30 bsp=e00000003dd19000
 [<a00000010007cbe0>] __exit_mm+0x340/0x460
                                sp=e00000003dd1fe30 bsp=e00000003dd18fb8
 [<a00000010007e9e0>] do_exit+0x200/0x640
                                sp=e00000003dd1fe30 bsp=e00000003dd18f30
 [<a00000010007ef30>] do_group_exit+0x90/0x1c0
                                sp=e00000003dd1fe30 bsp=e00000003dd18f00
 [<a00000010007f080>] sys_exit_group+0x20/0x40
                                sp=e00000003dd1fe30 bsp=e00000003dd18ea8
 [<a00000010000f400>] ia64_ret_from_syscall+0x0/0x20
                                sp=e00000003dd1fe30 bsp=e00000003dd18ea8
 [<a000000000010640>] 0xa000000000010640
                                sp=e00000003dd20000 bsp=e00000003dd18ea8
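
Editorial sketch: a trace like the one above is easier to read once the frame names are pulled out and the interruption points are marked. The Python below is a minimal illustration, not a tool used in this report. It assumes that each `ia64_leave_kernel` frame marks a point where an interrupt or fault was taken, so the frame listed directly after it is the code that was running at that moment. The helper names `parse_frames` and `interrupted_frames` are hypothetical, and the sample is a shortened excerpt of the trace.

```python
import re

# Matches frame lines such as " [<a000000100016d00>] show_stack+0x80/0xa0".
# The "sp=... bsp=..." continuation lines do not match and are skipped.
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\S+?)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(trace_text):
    """Return the function names in the trace, innermost frame first."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(match.group(2))
    return frames


def interrupted_frames(frames, marker="ia64_leave_kernel"):
    """Return the frame listed directly after each marker frame.

    Assumption: the marker separates a handler from the code it interrupted.
    """
    return [frames[i + 1] for i, name in enumerate(frames[:-1]) if name == marker]


sample = """
 [<a000000100325f10>] sysrq_handle_crash+0x10/0x20
                                sp=e00000003dd1f8f0 bsp=e00000003dd19600
 [<a000000100015c70>] ia64_handle_irq+0xf0/0x1e0
                                sp=e00000003dd1f910 bsp=e00000003dd19358
 [<a00000010000f560>] ia64_leave_kernel+0x0/0x260
                                sp=e00000003dd1f910 bsp=e00000003dd19358
 [<a000000100008ca0>] ia64_spinlock_contention+0x20/0x60
                                sp=e00000003dd1fae0 bsp=e00000003dd19358
 [<a000000100590740>] __lock_text_start+0x40/0x60
                                sp=e00000003dd1fae0 bsp=e00000003dd19350
 [<a000000100493fc0>] net_rx_action+0x380/0x460
                                sp=e00000003dd1fae0 bsp=e00000003dd192e8
"""

if __name__ == "__main__":
    frames = parse_frames(sample)
    print(frames)
    print(interrupted_frames(frames))
```

Read this way, the serial interrupt that delivered the sysrq arrived while the CPU was in `ia64_spinlock_contention`, reached from `net_rx_action` in softirq context. That would be consistent with the CPU spinning on a lock in the network receive path, although the trace alone does not show which lock or who holds it.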


Comment 2 John W. Linville 2006-05-03 17:49:03 UTC
Adding e100 maintainers to CC:... 
 
John, can you speculate on why e100 might have problems w/ ia64? 

Comment 3 John Ronciak 2006-05-03 18:08:11 UTC
Can the driver be loaded after the system is booted up without the driver being
loaded at boot?  This might show an error during the load of the driver which
might point to something.

Without seeing an error from loading it's hard to even guess at this.  Does this
happen with a kernel.org kernel? I know that might be hard for you to test since
the RH kernel is so different.

What PRO/100 NIC is it (lspci -vv)?  I don't think it would matter, but you never
know.  Did older kernels work fine on this exact system?

Comment 4 Doug Chapman 2006-05-03 18:24:49 UTC
John,

answers to your questions above...

When I try to load the driver after the system is up it just locks up the
system.  No errors, just hung.

I have not yet tried a kernel.org kernel.

It had been working until today's kernel build.  Nothing specific to the e100
driver has changed, so it must be a side effect of another change.

Here is the lspci -vv info for the card:

00:03.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 0d)
        Subsystem: Hewlett-Packard Company: Unknown device 1274
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 128 (2000ns min, 14000ns max)
        Interrupt: pin A routed to IRQ 53
        Region 0: Memory at 0000000080020000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at 0d00 [size=64]
        Region 2: Memory at 0000000080000000 (32-bit, non-prefetchable) [size=128K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-



Comment 5 Doug Chapman 2006-05-03 19:39:27 UTC
We have found the cause of this.  It appears to be the netpoll-bonding patch. 
We don't understand why this hangs here (and not on other configurations) but
backing it out does fix the problem.



