Bug 498854 - Kernel reports badness while running ifup in Fedora11 Alpha on JS22 blade
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 11
Hardware: ppc64 All
Priority: low Severity: high
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Keywords: Reopened
Reported: 2009-05-03 22:40 EDT by IBM Bug Proxy
Modified: 2009-10-27 05:51 EDT (History)
Fixed In Version: 2.6.29.6-213.fc11
Doc Type: Bug Fix
Last Closed: 2009-07-22 17:58:18 EDT
Attachments
Command sequence and call traces produced. Machine hangs when 'reboot' command is executed. (17.06 KB, text/plain)
2009-05-03 22:40 EDT, IBM Bug Proxy


External Trackers
Tracker ID Priority Status Summary Last Updated
IBM Linux Technology Center 51563 None None None Never

Description IBM Bug Proxy 2009-05-03 22:40:35 EDT
Problem Description :
=================
A JS22 blade installed with Fedora 11 Alpha produces the call trace shown below. The machine contains
2 Host Ethernet Adapters (eth0 and eth2). The machine hangs when the 'reboot' command is executed.

[root@mjs22lp1 ~]# ifup eth0

[ INFO: possible circular locking dependency detected ]
2.6.29-0.66.rc3.fc11.ppc64 #1
-------------------------------------------------------
ip/2515 is trying to acquire lock:
 (&ehea_fw_handles.lock){--..}, at: [<d00000000051eae8>] .ehea_up+0x6c/0x730 [ehea]

but task is already holding lock:
 (&port->port_lock){--..}, at: [<d00000000051f378>] .ehea_open+0x3c/0x118 [ehea]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&port->port_lock){--..}:
       [<c0000000000c292c>] .lock_acquire+0x54/0x80
       [<c000000000575ff8>] .mutex_lock_nested+0x1ac/0x49c
       [<d00000000051f378>] .ehea_open+0x3c/0x118 [ehea]
       [<c0000000004bcf5c>] .dev_open+0xe8/0x168
       [<c0000000004bc64c>] .dev_change_flags+0x10c/0x214
       [<c0000000004c7528>] .do_setlink+0x36c/0x484
       [<c0000000004c9188>] .rtnl_newlink+0x38c/0x5b4
       [<c0000000004c8db0>] .rtnetlink_rcv_msg+0x264/0x2b0
       [<c0000000004e022c>] .netlink_rcv_skb+0x74/0x108
       [<c0000000004c8b28>] .rtnetlink_rcv+0x38/0x5c
       [<c0000000004dfa7c>] .netlink_unicast+0x304/0x40c
       [<c0000000004dfe58>] .netlink_sendmsg+0x2d4/0x314
       [<c0000000004a6f7c>] .sock_sendmsg+0xe0/0x11c
       [<c0000000004a71ac>] .SyS_sendmsg+0x1f4/0x288
       [<c0000000004d1870>] .compat_sys_socketcall+0x1ec/0x238
       [<c0000000000085f0>] syscall_exit+0x0/0x40

-> #1 (rtnl_mutex){--..}:
       [<c0000000000c292c>] .lock_acquire+0x54/0x80
       [<c000000000575ff8>] .mutex_lock_nested+0x1ac/0x49c
       [<c0000000004c8ad8>] .rtnl_lock+0x20/0x38
       [<c0000000004bdb28>] .register_netdev+0x1c/0x80
       [<d00000000051d2e8>] .ehea_setup_single_port+0x2d8/0x41c [ehea]
       [<d00000000052423c>] .ehea_probe_adapter+0x300/0x3e8 [ehea]
       [<c0000000004a4124>] .of_platform_device_probe+0x80/0xb8
       [<c000000000380d24>] .driver_probe_device+0x114/0x1fc
       [<c000000000380ea0>] .__driver_attach+0x94/0xd8
       [<c0000000003802e8>] .bus_for_each_dev+0x7c/0xdc
       [<c000000000380ab0>] .driver_attach+0x28/0x40
       [<c00000000037f9ac>] .bus_add_driver+0xcc/0x280
       [<c0000000003811a4>] .driver_register+0xe4/0x1bc
       [<c0000000004a3fdc>] .of_register_driver+0x44/0x58
       [<c000000000024c70>] .ibmebus_register_driver+0x30/0x4c
       [<d0000000005244f8>] .ehea_module_init+0x1d4/0x2374 [ehea]
       [<c000000000009434>] .do_one_initcall+0x9c/0x1dc
       [<c0000000000ce684>] .SyS_init_module+0xd8/0x234
       [<c0000000000085f0>] syscall_exit+0x0/0x40

-> #0 (&ehea_fw_handles.lock){--..}:
       [<c0000000000c292c>] .lock_acquire+0x54/0x80
       [<c000000000575ff8>] .mutex_lock_nested+0x1ac/0x49c
       [<d00000000051eae8>] .ehea_up+0x6c/0x730 [ehea]
       [<d00000000051f3a0>] .ehea_open+0x64/0x118 [ehea]
       [<c0000000004bcf5c>] .dev_open+0xe8/0x168
       [<c0000000004bc64c>] .dev_change_flags+0x10c/0x214
       [<c0000000004c7528>] .do_setlink+0x36c/0x484
       [<c0000000004c9188>] .rtnl_newlink+0x38c/0x5b4
       [<c0000000004c8db0>] .rtnetlink_rcv_msg+0x264/0x2b0
       [<c0000000004e022c>] .netlink_rcv_skb+0x74/0x108
       [<c0000000004c8b28>] .rtnetlink_rcv+0x38/0x5c
       [<c0000000004dfa7c>] .netlink_unicast+0x304/0x40c
       [<c0000000004dfe58>] .netlink_sendmsg+0x2d4/0x314
       [<c0000000004a6f7c>] .sock_sendmsg+0xe0/0x11c
       [<c0000000004a71ac>] .SyS_sendmsg+0x1f4/0x288
       [<c0000000004d1870>] .compat_sys_socketcall+0x1ec/0x238
       [<c0000000000085f0>] syscall_exit+0x0/0x40

other info that might help us debug this:

2 locks held by ip/2515:
 #0:  (rtnl_mutex){--..}, at: [<c0000000004c8b18>] .rtnetlink_rcv+0x28/0x5c
 #1:  (&port->port_lock){--..}, at: [<d00000000051f378>] .ehea_open+0x3c/0x118 [ehea]

stack backtrace:
Call Trace:
[c000000051156bf0] [c0000000000117d8] .show_stack+0x6c/0x16c (unreliable)
[c000000051156ca0] [c0000000000c0cb0] .print_circular_bug_tail+0xd8/0xfc
[c000000051156d70] [c0000000000c2178] .__lock_acquire+0x1080/0x17e0
[c000000051156e70] [c0000000000c292c] .lock_acquire+0x54/0x80
[c000000051156f00] [c000000000575ff8] .mutex_lock_nested+0x1ac/0x49c
[c000000051157010] [d00000000051eae8] .ehea_up+0x6c/0x730 [ehea]
[c000000051157120] [d00000000051f3a0] .ehea_open+0x64/0x118 [ehea]
[c0000000511571c0] [c0000000004bcf5c] .dev_open+0xe8/0x168
[c000000051157250] [c0000000004bc64c] .dev_change_flags+0x10c/0x214
[c0000000511572f0] [c0000000004c7528] .do_setlink+0x36c/0x484
[c0000000511573d0] [c0000000004c9188] .rtnl_newlink+0x38c/0x5b4
[c0000000511575e0] [c0000000004c8db0] .rtnetlink_rcv_msg+0x264/0x2b0
[c000000051157690] [c0000000004e022c] .netlink_rcv_skb+0x74/0x108
[c000000051157720] [c0000000004c8b28] .rtnetlink_rcv+0x38/0x5c
[c0000000511577b0] [c0000000004dfa7c] .netlink_unicast+0x304/0x40c
[c000000051157880] [c0000000004dfe58] .netlink_sendmsg+0x2d4/0x314
[c000000051157970] [c0000000004a6f7c] .sock_sendmsg+0xe0/0x11c
[c000000051157b70] [c0000000004a71ac] .SyS_sendmsg+0x1f4/0x288
[c000000051157d90] [c0000000004d1870] .compat_sys_socketcall+0x1ec/0x238
[c000000051157e30] [c0000000000085f0] syscall_exit+0x0/0x40
ehea: eth0: Physical port up
ehea: External switch port is backup port
eth0: no IPv6 routers present

Machine : JS22 blade
CPU Type: power6
Model Type: 7998-61X

Issue is reproducible.

Attachment:  ifup.log for command sequence I followed.
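
Note on reading the lockdep report above: it can be viewed as a cycle in a graph whose edges mean "this lock was taken while that one was already held". The following is a minimal Python sketch of that idea only, not lockdep's actual implementation. The edge list is an interpretation of the three chains in the report; in particular, the assumption that ehea_fw_handles.lock is held around register_netdev() during probe is inferred from chain #1 and is not shown explicitly in the trace. The name find_cycle is hypothetical.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one cycle as a list of lock names, or None if the order is consistent.

    Each edge (a, b) means: lock b was acquired while lock a was held.
    """
    graph = defaultdict(list)
    for held, acquired in edges:
        graph[held].append(acquired)

    def visit(node, path, on_path, done):
        if node in on_path:
            # Found a lock that is already on the current path: close the loop.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        on_path.add(node)
        path.append(node)
        for nxt in graph[node]:
            cycle = visit(nxt, path, on_path, done)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        done.add(node)
        return None

    done = set()
    for start in list(graph):
        cycle = visit(start, [], set(), done)
        if cycle:
            return cycle
    return None


# Edges as read from the report (an interpretation, see the note above).
existing = [
    ("ehea_fw_handles.lock", "rtnl_mutex"),  # chain #1: probe -> register_netdev
    ("rtnl_mutex", "port->port_lock"),       # chain #2: rtnetlink -> ehea_open
]
new_edge = ("port->port_lock", "ehea_fw_handles.lock")  # chain #0: ehea_open -> ehea_up

print(find_cycle(existing))
print(find_cycle(existing + [new_edge]))
```

With only the two existing edges the order is consistent; adding the edge from ehea_open to ehea_up closes the loop, which is the situation the report warns about.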

=Comment: #4=================================================
ANOOP VIJAYAN <anoop.vijayan@in.ibm.com> - 

I think the more important issues here are the list corruption while doing ifdown-ifup:

[root@mjs22lp1 ~]# ifup eth0
list_add corruption. next->prev should be prev (c000000055868060), but was (null).
(next=c0000000558689c0).
------------[ cut here ]------------
Badness at lib/list_debug.c:26

and the soft lockup (hang) while doing rmmod ehea:
[root@mjs22lp1 ~]# rmmod ehea
BUG: soft lockup - CPU#0 stuck for 61s! [rmmod:8837]
Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nfsd lockd exportfs auth_rpcgss nfs_acl sco
bridge stp llc bnep l2cap bluetooth sunrpc ipv6 ext2 dm_multipath uinput ibmveth ehea(-) ibmvscsic
scsi_transport_srp scsi_tgt ext4 jbd2 crc16 [last unloaded: scsi_wait_scan]
irq event stamp: 0
hardirqs last  enabled at (0): [<(null)>] (null)
hardirqs last disabled at (0): [<c00000000008d2ac>] .copy_process+0x534/0x1178
softirqs last  enabled at (0): [<c00000000008d2ac>] .copy_process+0x534/0x1178
softirqs last disabled at (0): [<(null)>] (null)
NIP: c0000000004b8944 LR: c0000000004b8990 CTR: 0000000000000000
REGS: c00000004a1df4d0 TRAP: 0901   Tainted: G        W   (2.6.29-0.66.rc3.fc11.ppc64)
MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 24000422  XER: 00000000
TASK = c0000000583e0000[8837] 'rmmod' THREAD: c00000004a1dc000 CPU: 0
GPR00: c0000000004b8990 c00000004a1df750 c000000000f509a8 0000000000000000
GPR04: c00000000014d62c c000000000149a24 0000000000000000 c000000000f8dd28
GPR08: c0000000583e0b20 0000000000000000 0000000000000000 c0000000517189c0
GPR12: d000000000630d00 c000000000f97400
NIP [c0000000004b8944] .netif_napi_del+0x84/0x90
LR [c0000000004b8990] .free_netdev+0x40/0xc8
Call Trace:
[c00000004a1df750] [c0000000004b8990] .free_netdev+0x40/0xc8 (unreliable)
[c00000004a1df7e0] [d00000000062658c] .ehea_shutdown_single_port+0x64/0x90 [ehea]
[c00000004a1df870] [d00000000062fe50] .ehea_remove+0x4c/0x124 [ehea]
[c00000004a1df910] [c0000000004a3e48] .of_platform_device_remove+0x40/0x58
[c00000004a1df980] [c0000000003808f4] .__device_release_driver+0xb8/0xfc
[c00000004a1dfa10] [c000000000380a00] .driver_detach+0xc8/0xfc
[c00000004a1dfaa0] [c00000000037f7c0] .bus_remove_driver+0xbc/0x114
[c00000004a1dfb30] [c000000000381078] .driver_unregister+0x58/0x78
[c00000004a1dfbc0] [c0000000004a3f84] .of_unregister_driver+0x14/0x28
[c00000004a1dfc30] [c000000000024c2c] .ibmebus_unregister_driver+0x10/0x24
[c00000004a1dfca0] [d00000000062fd88] .ehea_module_exit+0x3c/0xb8 [ehea]
[c00000004a1dfd30] [c0000000000cead4] .SyS_delete_module+0x244/0x2ec
[c00000004a1dfe30] [c0000000000085f0] syscall_exit+0x0/0x40
Instruction dump:
fb890000 4bff71b1 60000000 7fa9eb78 2fa90000 7d234b78 409effe4 38210090
f93f0088 e8010010 eb81ffe0 7c0803a6 <eba1ffe8> ebe1fff8 4e800020 fba1ffe8

=Comment: #11=================================================
Jan-Bernd Themann <THEMANN@de.ibm.com> - 

A patch that solves this problem has been posted to the mailing list and has been accepted by David
Miller:

Patch: http://www.spinics.net/lists/netdev/msg91156.html
Applied: http://www.spinics.net/lists/netdev/msg91286.html
Comment 1 IBM Bug Proxy 2009-05-03 22:40:45 EDT
Created attachment 342270 [details]
Command sequence and call traces produced. Machine hangs when 'reboot' command is executed.
Comment 2 Chuck Ebbert 2009-05-07 20:55:04 EDT
Please test something more recent than the alpha. There have been at least two releases since then: beta and preview.
Comment 3 IBM Bug Proxy 2009-05-08 01:40:32 EDT
------- Comment From pavan.naregundi@in.ibm.com 2009-05-08 01:30 EDT-------
(In reply to comment #19)
> Please test something more recent than the alpha. There have been at least two
> releases since then: beta and preview.
>

Tested on the preview build. The list corruption bug is still present.

http://pastebin.com/f6875ed2a

Thanks
Pavan
Comment 4 IBM Bug Proxy 2009-05-08 02:40:32 EDT
------- Comment From THEMANN@de.ibm.com 2009-05-08 02:34 EDT-------
Hi,

which eHEA driver version is currently used (shown in dmesg when the eHEA module is loaded)?

Regards,
Jan-Bernd
Comment 5 IBM Bug Proxy 2009-05-08 03:10:27 EDT
------- Comment From pavan.naregundi@in.ibm.com 2009-05-08 03:00 EDT-------
(In reply to comment #21)
> Hi,
>
> which eHEA driver version is currently used (shown in dmesg when the eHEA
> module is loaded)?
>
> Regards,
> Jan-Bernd
>

# dmesg | grep -i ehea
IBM eHEA ethernet device driver (Release EHEA_0096)
ehea: eth0: Jumbo frames are disabled
ehea: eth0 -> logical port id #2
ehea: eth0: Physical port up
ehea: External switch port is backup port
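
For anyone collecting the driver version from several machines, the banner line above can be picked out of saved dmesg text. This is a small sketch; the function name ehea_release is hypothetical, and the pattern assumes later releases keep the same "Release EHEA_..." banner format (possibly with a dotted suffix), which the report does not confirm.

```python
import re

# Pattern for the driver banner line seen in the dmesg output above.
# Allowing dots in the release string is an assumption about later versions.
BANNER = re.compile(r"IBM eHEA ethernet device driver \(Release (EHEA_[0-9.]+)\)")


def ehea_release(dmesg_text):
    """Return the eHEA release string found in dmesg text, or None if absent."""
    match = BANNER.search(dmesg_text)
    return match.group(1) if match else None


sample = """IBM eHEA ethernet device driver (Release EHEA_0096)
ehea: eth0: Jumbo frames are disabled
ehea: eth0 -> logical port id #2
"""
print(ehea_release(sample))  # EHEA_0096
```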
Comment 6 Chuck Ebbert 2009-05-08 11:49:13 EDT
Fixed by upstream commit 52e21b1bd96444c452f6eab7dc438a8a898aa14a ("ehea: fix circular locking problem")
Comment 7 IBM Bug Proxy 2009-06-01 02:42:48 EDT
------- Comment From anoop.vijayan@in.ibm.com 2009-06-01 02:32 EDT-------
Red Hat, this patch is still not present in rawhide, and the list corruption issue can still be recreated.

(In reply to comment #23)
> Fixed by upstream commit 52e21b1bd96444c452f6eab7dc438a8a898aa14a ("ehea: fix
> circular locking problem")
>
Comment 8 Chuck Ebbert 2009-06-03 03:38:39 EDT
This should have been a release blocker but it never got added to the list. It can't be fixed until the first kernel update after release now.
Comment 9 Chuck Ebbert 2009-06-08 18:41:30 EDT
Fix went into kernel-2.6.29.4-172.
Comment 10 Bug Zapper 2009-06-09 11:05:29 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 11 Fedora Update System 2009-06-17 07:53:36 EDT
kernel-2.6.29.5-191.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/kernel-2.6.29.5-191.fc11
Comment 12 Fedora Update System 2009-06-19 09:44:09 EDT
kernel-2.6.29.5-191.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update kernel'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-6768
Comment 13 IBM Bug Proxy 2009-06-22 01:41:33 EDT
------- Comment From pavan.naregundi@in.ibm.com 2009-06-22 01:32 EDT-------
Tested on 2.6.29.5-191.fc11.ppc64. The list corruption issue still persists on this kernel.

Steps to reproduce:

1. Assume eth0 is down
a. ifup eth0
b. ifdown eth0
c. ifup eth0     // the second ifup produces the call trace below.

=======================
------------[ cut here ]------------
Badness at lib/list_debug.c:26
NIP: c0000000002ece88 LR: c0000000002ece84 CTR: 0000000000000001
REGS: c000000078ad2a40 TRAP: 0700   Not tainted  (2.6.29.5-191.fc11.ppc64)
MSR: 8000000000029032 <EE,ME,CE,IR,DR>  CR: 24020484  XER: 20000020
TASK = c000000078bada80[1616] 'ip' THREAD: c000000078ad0000 CPU: 3
GPR00: c0000000002ece84 c000000078ad2cc0 c000000000e6d2b8 000000000000006f
GPR04: 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000010
GPR08: 00000000000036d8 0000000000000000 00000000000036da 00000000006c7000
GPR12: 0000000024020482 c000000000ea2a00 0000000000000000 0000000000000000
GPR16: c00000007b7c1c20 c000000078ad32c0 c000000078ad3340 fffffffffffff000
GPR20: c00000007b7c1c00 c000000000fa05b0 0000000000000000 c00000007a8d8618
GPR24: 0000000000000000 0000000000000040 d0000000009eb3a0 c00000007a8d8800
GPR28: c00000007a8d8060 c00000007a8d8800 c000000000e0a150 c000000078ad2cc0
NIP [c0000000002ece88] .__list_add+0x50/0xac
LR [c0000000002ece84] .__list_add+0x4c/0xac
Call Trace:
[c000000078ad2cc0] [c0000000002ece84] .__list_add+0x4c/0xac (unreliable)
[c000000078ad2d60] [c000000000553948] .netif_napi_add+0x6c/0xc0
[c000000078ad2e10] [d0000000009d9d4c] .ehea_init_port_res+0x32c/0x418 [ehea]
[c000000078ad2ec0] [d0000000009d9f60] .ehea_up+0x128/0x6e4 [ehea]
[c000000078ad2fe0] [d0000000009da730] .ehea_open+0x70/0x128 [ehea]
[c000000078ad3080] [c000000000559368] .dev_open+0xf8/0x170
[c000000078ad3120] [c0000000005589f0] .dev_change_flags+0xec/0x1ec
[c000000078ad31d0] [c0000000005645a0] .do_setlink+0x340/0x460
[c000000078ad32b0] [c00000000056638c] .rtnl_newlink+0x35c/0x534
[c000000078ad34c0] [c000000000565fe4] .rtnetlink_rcv_msg+0x25c/0x2a8
[c000000078ad3570] [c000000000580a8c] .netlink_rcv_skb+0x84/0x120
[c000000078ad3610] [c000000000565d68] .rtnetlink_rcv+0x38/0x58
[c000000078ad36a0] [c00000000058028c] .netlink_unicast+0x310/0x41c
[c000000078ad3780] [c000000000580680] .netlink_sendmsg+0x2e8/0x32c
[c000000078ad3880] [c0000000005412f4] .sock_sendmsg+0xf8/0x138
[c000000078ad3a90] [c000000000541540] .SyS_sendmsg+0x20c/0x2a4
[c000000078ad3cd0] [c00000000056ebd4] .compat_sys_sendmsg+0x40/0x5c
[c000000078ad3d70] [c000000000570538] .compat_sys_socketcall+0x200/0x244
[c000000078ad3e30] [c0000000000085f0] syscall_exit+0x0/0x40
Instruction dump:
7c3f0b78 ebc2cbf8 7cbd2b78 e8a50008 7c9c2378 7c7b1b78 7fa52000 41be0018
e87e8010 7fa6eb78 483494c1 60000000 <0fe00000> e8bc0000 7fa5e800 41be001c
===================================

Thanks
Pavan
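
The reproduction steps above could be scripted roughly as follows. This is a sketch under assumptions: the helper names find_kernel_problems and reproduce are hypothetical, the signature strings are simply copied from kernel messages quoted in this report, and reproduce() needs root on a machine where ifup, ifdown and dmesg are available.

```python
import subprocess

# Signature strings copied from kernel messages quoted in this report.
SIGNATURES = (
    "possible circular locking dependency detected",
    "list_add corruption",
    "Badness at",
    "BUG: soft lockup",
    "Oops:",
    "Kernel panic",
)


def find_kernel_problems(log_text):
    """Return (line_number, signature, line) for each log line that matches."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        for signature in SIGNATURES:
            if signature in line:
                hits.append((number, signature, line.strip()))
                break  # report each line only once
    return hits


def reproduce(interface="eth0"):
    """Run the ifup / ifdown / ifup sequence, then scan dmesg (needs root)."""
    for command in ("ifup", "ifdown", "ifup"):
        subprocess.run([command, interface], check=False)
    dmesg = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return find_kernel_problems(dmesg.stdout)
```

The scan is kept separate from the command sequence so that it can also be run on a saved log such as the attached ifup.log.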
Comment 14 IBM Bug Proxy 2009-06-22 15:03:42 EDT
------- Comment From THEMANN@de.ibm.com 2009-06-22 11:30 EDT-------
(In reply to comment #30)
I checked the source code of this kernel.

Patches that seem to be missing:

http://lkml.org/lkml/2009/1/21/191
http://lkml.org/lkml/2009/1/21/192
http://lkml.org/lkml/2009/1/21/190

http://lkml.org/lkml/2009/3/12/313    (circular locking)
http://lkml.org/lkml/2009/2/11/131    (list_del)

The last two are especially important for this bug. It is up to Fedora whether to include the other three patches as well; I would recommend it.

Regards,
Jan-Bernd
Comment 15 Fedora Update System 2009-06-24 15:22:43 EDT
kernel-2.6.29.5-191.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 16 Chuck Ebbert 2009-07-06 13:47:33 EDT
Added these fixes and changed version to 0096.4 in kernel-2.6.29.6-212:

51621fbdb1ea8709ab67170b54e71be6d9fa29ad
ehea: Fix: Remove adapter from adapter list in error path

3faf2693bd6800c2521799f6a9ae174d9f080ed2
ehea: Fix mem allocations which require page alignment
Comment 17 Fedora Update System 2009-07-08 08:13:36 EDT
kernel-2.6.29.6-213.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/kernel-2.6.29.6-213.fc11
Comment 18 IBM Bug Proxy 2009-07-13 04:30:33 EDT
------- Comment From pavan.naregundi@in.ibm.com 2009-07-13 04:24 EDT-------
(In reply to comment #33)
> Added these fixes and changed version to 0096.4 in kernel-2.6.29.6-212:
>
> 51621fbdb1ea8709ab67170b54e71be6d9fa29ad
> ehea: Fix: Remove adapter from adapter list in error path
>
> 3faf2693bd6800c2521799f6a9ae174d9f080ed2
> ehea: Fix mem allocations which require page alignment
>

Updated to kernel-2.6.29.6-213.fc11.ppc64. Running ifup produced a kernel panic, as shown below.
=============
# ifup eth0
Oops: Exception in kernel mode, sig: 4 [#1]
SMP NR_CPUS=128 NUMA pSeries
Modules linked in: sunrpc nf_conntrack_ipv6 ip6t_REJECT ip6table_filter ip6_tables ipv6 dm_multipath ehea ibmvscsic scsi_transport_srp scsi_tgt [last unloaded: scsi_wait_scan]
NIP: d0000000009dbcf8 LR: c000000000554098 CTR: d0000000009dbcf8
REGS: c00000000ffffb50 TRAP: 0700   Not tainted  (2.6.29.6-213.fc11.ppc64)
MSR: 8000000000089032 <EE,ME,IR,DR>  CR: 48000022  XER: 00000001
TASK = c000000000dc75d0[0] 'swapper' THREAD: c000000000e6c000 CPU: 0
GPR00: d0000000009dbcf8 c00000000ffffdd0 d0000000009f3720 c00000007b7407c8
GPR04: 0000000000000040 0000000000000850 0000000000000860 000000000009f580
GPR08: 00000000006ac000 d0000000009eb390 c000000000ea2400 d0000000009dc020
GPR12: 00000000000000c0 c000000000ea2400 0000000000375400 0000000000000000
GPR16: 0000000001000000 0000000002a046d8 0000000000000040 0000000000000000
GPR20: c00000007b7407c8 00000000fffcbb7d 0000000000000001 ffffffffffffffff
GPR24: 0000000000000000 0000000000000000 c000000000fc5820 000000000000012c
GPR28: 0000000000000000 c000000000fc5800 c000000000e190f8 c00000000ffffdd0
NIP [d0000000009dbcf8] .ehea_poll+0x0/0x328 [ehea]
LR [c000000000554098] .net_rx_action+0x124/0x2ac
Call Trace:
[c00000000ffffdd0] [c0000000005541ac] .net_rx_action+0x238/0x2ac (unreliable)
[c00000000ffffeb0] [c0000000000ad63c] .__do_softirq+0xf8/0x1e8
[c00000000fffff90] [c00000000002f3c0] .call_do_softirq+0x14/0x24
[c000000000e6f850] [c00000000000e624] .do_softirq+0xa0/0x104
[c000000000e6f8f0] [c0000000000acff8] .irq_exit+0x74/0xcc
[c000000000e6f970] [c00000000000e1ac] .do_IRQ+0x1e0/0x258
[c000000000e6fa30] [c000000000004d28] hardware_interrupt_entry+0x28/0x2c
--- Exception: 501 at .raw_local_irq_restore+0xa4/0xc0
LR = .cpu_idle+0x13c/0x1e0
[c000000000e6fd20] [0000000000375400] 0x375400 (unreliable)
[c000000000e6fdc0] [c000000000014e04] .cpu_idle+0x13c/0x1e0
[c000000000e6fe60] [c00000000062ebf0] .rest_init+0x94/0xb0
[c000000000e6fee0] [c0000000008b3ce0] .start_kernel+0x484/0x4a8
[c000000000e6ff90] [c000000000008408] .start_here_common+0x2c/0xa4
Instruction dump:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 <00000000> 00000000 00000000 00000000
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 180 seconds..
====================
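
A note on reading the "Instruction dump" in traces like the one above: the word in angle brackets is the instruction at the reported NIP. The sketch below (the helper name faulting_word is hypothetical) extracts that word from oops text. In the panic above the marked word, like its neighbours, is 00000000, which would be consistent with the "sig: 4" in the Oops line, although the report itself does not state the cause.

```python
import re


def faulting_word(oops_text):
    """Return the hex word marked with <...> in an 'Instruction dump:' section."""
    lines = oops_text.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == "Instruction dump:":
            # The dump in this report spans the two lines after the heading.
            dump = " ".join(lines[index + 1:index + 3])
            match = re.search(r"<([0-9a-fA-F]{8})>", dump)
            return match.group(1) if match else None
    return None


# Dump lines copied from the panic above.
panic_dump = """Instruction dump:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 <00000000> 00000000 00000000 00000000
Kernel panic - not syncing: Fatal exception in interrupt
"""
word = faulting_word(panic_dump)
print(word, "all zero" if word is not None and int(word, 16) == 0 else "")
```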
Comment 19 IBM Bug Proxy 2009-07-13 06:50:54 EDT
------- Comment From TKLEIN@de.ibm.com 2009-07-13 06:42 EDT-------
*** Bug 52807 has been marked as a duplicate of this bug. ***
Comment 20 IBM Bug Proxy 2009-07-13 07:02:22 EDT
------- Comment From pavan.naregundi@in.ibm.com 2009-07-13 06:52 EDT-------
I could also reproduce the list corruption issue.

=============
list_add corruption. next->prev should be prev (c0000000fef00060), but was (null). (next=c0000000fef00800).
------------[ cut here ]------------
Badness at lib/list_debug.c:26
NIP: c0000000002eceac LR: c0000000002ecea8 CTR: 0000000000000001
REGS: c0000000fcad2a40 TRAP: 0700   Not tainted  (2.6.29.6-213.fc11.ppc64)
MSR: 8000000000029032 <EE,ME,CE,IR,DR>  CR: 24020484  XER: 20000020
TASK = c0000000fa72cf80[1774] 'ip' THREAD: c0000000fcad0000 CPU: 0
GPR00: c0000000002ecea8 c0000000fcad2cc0 c000000000e6d2c0 000000000000006f
GPR04: 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000010
GPR08: 0000000000003364 0000000000000000 0000000000003366 00000000006ad000
GPR12: 0000000024020482 c000000000ea2400 0000000000000000 0000000000000000
GPR16: c0000000fee2a020 c0000000fcad32c0 c0000000fcad3340 fffffffffffff000
GPR20: c0000000fee2a000 c000000000fa06b0 0000000000000000 c0000000fef00618
GPR24: 0000000000000000 0000000000000040 d000000000685390 c0000000fef00800
GPR28: c0000000fef00060 c0000000fef00800 c000000000e0a150 c0000000fcad2cc0
NIP [c0000000002eceac] .__list_add+0x50/0xac
LR [c0000000002ecea8] .__list_add+0x4c/0xac
Call Trace:
[c0000000fcad2cc0] [c0000000002ecea8] .__list_add+0x4c/0xac (unreliable)
[c0000000fcad2d60] [c000000000553a60] .netif_napi_add+0x6c/0xc0
[c0000000fcad2e10] [d000000000673d10] .ehea_init_port_res+0x32c/0x418 [ehea]
[c0000000fcad2ec0] [d000000000673f24] .ehea_up+0x128/0x6e0 [ehea]
[c0000000fcad2fe0] [d0000000006746f0] .ehea_open+0x70/0x128 [ehea]
[c0000000fcad3080] [c000000000559480] .dev_open+0xf8/0x170
[c0000000fcad3120] [c000000000558b08] .dev_change_flags+0xec/0x1ec
[c0000000fcad31d0] [c0000000005646b8] .do_setlink+0x340/0x460
[c0000000fcad32b0] [c0000000005664a4] .rtnl_newlink+0x35c/0x534
[c0000000fcad34c0] [c0000000005660fc] .rtnetlink_rcv_msg+0x25c/0x2a8
[c0000000fcad3570] [c000000000580ba4] .netlink_rcv_skb+0x84/0x120
[c0000000fcad3610] [c000000000565e80] .rtnetlink_rcv+0x38/0x58
[c0000000fcad36a0] [c0000000005803a4] .netlink_unicast+0x310/0x41c
[c0000000fcad3780] [c000000000580798] .netlink_sendmsg+0x2e8/0x32c
[c0000000fcad3880] [c00000000054140c] .sock_sendmsg+0xf8/0x138
[c0000000fcad3a90] [c000000000541658] .SyS_sendmsg+0x20c/0x2a4
[c0000000fcad3cd0] [c00000000056ecec] .compat_sys_sendmsg+0x40/0x5c
[c0000000fcad3d70] [c000000000570650] .compat_sys_socketcall+0x200/0x244
[c0000000fcad3e30] [c0000000000085f0] syscall_exit+0x0/0x40
Instruction dump:
7c3f0b78 ebc2cbf8 7cbd2b78 e8a50008 7c9c2378 7c7b1b78 7fa52000 41be0018
e87e8010 7fa6eb78 483495b9 60000000 <0fe00000> e8bc0000 7fa5e800 41be001c
ehea: eth0: Logical port up: 100Mbps Full Duplex
ehea: eth0: Physical port up
ehea: External switch port is backup port
SELinux: initialized (dev 0:14, type nfs), uses genfs_contexts
===================
Comment 21 IBM Bug Proxy 2009-07-14 11:10:48 EDT
------- Comment From HERING2@de.ibm.com 2009-07-14 11:05 EDT-------
Hello,

I had a look at the list_add problem described here. It seems that the Fedora kernel configuration has the DEBUG_LIST flag enabled, which causes the stack trace to be shown. This is not a serious error; however, we need to investigate where the improper list handling comes from.

As the Kernel Panic problem is not related to this one, we will track that problem in the reopened bug #52807.

Regards

Hannes
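
The DEBUG_LIST check mentioned above can be modelled roughly as follows. This is a simplified Python sketch, not the kernel code (the real check lives in lib/list_debug.c). The Node class and the scenario are hypothetical: it assumes that a list entry is zeroed while still linked and then added again on the second ifup, which is one way to get the reported message but is not confirmed by this report.

```python
class Node:
    """Hypothetical stand-in for the kernel's struct list_head."""

    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self  # an empty list head points at itself


def list_add(new, head, warnings):
    """Insert `new` right after `head`, first checking the neighbour's back link."""
    prev, nxt = head, head.next
    if nxt.prev is not prev:
        was = "(null)" if nxt.prev is None else nxt.prev.name
        warnings.append(
            f"list_add corruption. next->prev should be prev ({prev.name}), "
            f"but was {was}. (next={nxt.name})."
        )
    # The insertion still proceeds after the warning; in the report the
    # interface also came up after the trace was printed.
    nxt.prev = new
    new.next = nxt
    new.prev = prev
    prev.next = new


warnings = []
napi_list = Node("napi_list")
napi = Node("napi")

list_add(napi, napi_list, warnings)  # first ifup: clean insert
napi.prev = napi.next = None         # assumed: entry zeroed while still linked
list_add(napi, napi_list, warnings)  # second ifup: head.next is the stale entry
print(warnings)
print(napi.next is napi)
```

In this model the re-added entry ends up pointing at itself, so a walk over the list would never terminate. That would at least be consistent with the soft lockup seen in free_netdev during rmmod, but it remains an assumption until the list handling has been investigated.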
Comment 22 IBM Bug Proxy 2009-07-14 11:51:12 EDT
------- Comment From HERING2@de.ibm.com 2009-07-14 11:40 EDT-------
Hello,

as the problem shown here is not the same as in bug #52807, I suggested opening a new bug for it. Bug #52807 seems to be solved.

Regards

Hannes
Comment 23 Fedora Update System 2009-07-16 03:12:33 EDT
kernel-2.6.29.6-213.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update kernel'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-7617
Comment 24 Fedora Update System 2009-07-22 17:57:15 EDT
kernel-2.6.29.6-213.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.
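
To check whether a given machine already runs a kernel at or beyond the "Fixed In Version" of this bug, a comparison along these lines may help. It is a simplified sketch: the helper names kernel_key and has_fix are hypothetical, and the comparison only looks at the dotted version and the first number of the release, which is not the full RPM version-comparison algorithm.

```python
import re


def kernel_key(release):
    """Turn '2.6.29.6-213.fc11.ppc64' into ((2, 6, 29, 6), 213) for comparison."""
    match = re.search(r"(\d+(?:\.\d+)*)-(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    return version, int(match.group(2))


FIXED_IN = "2.6.29.6-213.fc11"  # from the "Fixed In Version" field of this bug


def has_fix(running_release):
    """Return True if the release is at or beyond the fixed version (simplified)."""
    return kernel_key(running_release) >= kernel_key(FIXED_IN)


print(has_fix("2.6.29.5-191.fc11.ppc64"))
print(has_fix("2.6.29.6-213.fc11.ppc64"))
```

On a live system the running release string can be taken from `uname -r` or from Python's platform.release().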
Comment 25 IBM Bug Proxy 2009-10-27 05:51:05 EDT
------- Comment From pavan.naregundi@in.ibm.com 2009-10-27 05:41 EDT-------
Closing this bug as the issue is fixed.
