A DDN Lustre customer has network problems, which seem to trigger an ixgbe bug. A NULL pointer dereference then occurs and the system locks up entirely with a kernel panic.

LustreError: 8094:0:(ost_handler.c:1094:ost_brw_write()) client csum 4a41e0df, server csum 4739e073
LustreError: 168-f: datafs-OST0009: BAD WRITE CHECKSUM: changed in transit before arrival at OST from 12345-10.128.130.174@tcp inum 2107295/28010499 object 3891970/0 extent [47185920-48234495]
LustreError: 8094:0:(ost_handler.c:1169:ost_brw_write()) client csum 4a41e0df, original server csum 4739e073, server csum now 4739e073
LustreError: 10973:0:(ost_handler.c:1094:ost_brw_write()) client csum 3bd08cb7, server csum c7e08c7a
LustreError: 168-f: datafs-OST000d: BAD WRITE CHECKSUM: changed in transit before arrival at OST from 12345-10.128.128.202@tcp inum 7963256/28017359 object 3892458/0 extent [0-1048575]
LustreError: 10973:0:(ost_handler.c:1169:ost_brw_write()) client csum 3bd08cb7, original server csum c7e08c7a, server csum now c7e08c7a
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff800273b8>] eth_type_trans+0x3d/0xf0
PGD 0
Oops: 0000 [1] SMP
last sysfs file: /class/infiniband_mad/umad0/port
CPU 11
Modules linked in: hidp(U) l2cap(U) bluetooth(U) obdfilter(U) lquota(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ptlrpc(U) ib_srp(U) sunrpc(U) cpufreq_ondemand(U) acpi_cpufreq(U) freq_table(U) bnx2i(U) libiscsi2(U) cnic(U) uio(U) scsi_transport_iscsi2(U) scsi_transport_iscsi(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_uverbs(U) ib_umad(U) iw_nes(U) iw_cxgb3(U) cxgb3(U) mlx4_en(U) mlx4_ib(U) ib_mthca(U) ib_mad(U) ib_core(U) ldiskfs(U) crc16(U) ksocklnd(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) dm_round_robin(U) dm_multipath(U) scsi_dh(U) video(U) hwmon(U) backlight(U) sbs(U) i2c_ec(U) i2c_core(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U) sr_mod(U) cdrom(U) ixgbe(U) sg(U) mlx4_core(U) 8021q(U) dca(U) serio_raw(U) bnx2(U) pcspkr(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_mem_cache(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_log(U) dm_mod(U) ata_piix(U) libata(U) shpchp(U) megaraid_sas(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 0, comm: swapper Tainted: G 2.6.18-164.11.1.el5_lustre.1.8.2 #1
RIP: 0010:[<ffffffff800273b8>] [<ffffffff800273b8>] eth_type_trans+0x3d/0xf0
RSP: 0018:ffff8101b5703dc8 EFLAGS: 00010202
RAX: 00000000000005dc RBX: ffff8101920cdbc0 RCX: 0000000000000000
RDX: ffff810313ff07c0 RSI: ffff810328baa000 RDI: ffff8101920cdbc0
RBP: 0000000000000000 R08: ffff81011ed87000 R09: 00000000313ddbbc
R10: 0000000000000000 R11: 0000000000000000 R12: ffff810313ff07d0
R13: ffff81032879b120 R14: 0000000000000063 R15: ffffc200109d8360
FS: 0000000000000000(0000) GS:ffff8101afe3e540(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff8101b56fe000, task ffff8101afe3f100)
Stack: ffffffff88250043 000000002879b120 ffff8101b5703f08 0000000b109dfd68
 ffff81000902e780 0000004000000000 ffff8101b5703eac ffff81032fda3800
 ffff810328baa500 ffff8101af9be000 ffff810313ff07c0 0000007d00000000
Call Trace:
 <IRQ> [<ffffffff88250043>] :ixgbe:ixgbe_clean_rx_irq+0x523/0xbe0
 [<ffffffff88251fd3>] :ixgbe:ixgbe_clean_rxonly+0x83/0x1a0
 [<ffffffff8825ac22>] :ixgbe:__kc_adapter_clean+0x32/0x60
 [<ffffffff8000c845>] net_rx_action+0xac/0x1e0
 [<ffffffff8001231d>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006cb3c>] do_softirq+0x2c/0x85
 [<ffffffff8006c9c4>] do_IRQ+0xec/0xf5
 [<ffffffff8005d615>] ret_from_intr+0x0/0xa
 <EOI> [<ffffffff80198732>] acpi_processor_idle_simple+0x17d/0x30e
 [<ffffffff80197e78>] acpi_safe_halt+0x25/0x36
 [<ffffffff80198695>] acpi_processor_idle_simple+0xe0/0x30e
 [<ffffffff801985b5>] acpi_processor_idle_simple+0x0/0x30e
 [<ffffffff8004947e>] cpu_idle+0x95/0xb8
 [<ffffffff80077474>] start_secondary+0x498/0x4a7
Code: f6 01 01 74 3e 48 8d 86 c8 01 00 00 66 8b 50 02 66 8b 40 04
RIP [<ffffffff800273b8>] eth_type_trans+0x3d/0xf0
 RSP <ffff8101b5703dc8>
CR2: 0000000000000000
<0>Kernel panic - not syncing: Fatal exception
This appears to be a bug in the Intel ixgbe driver downloaded from SourceForge. The function '__kc_adapter_clean' comes from that driver and is not included in the ixgbe driver shipped in the Red Hat kernel RPM. I will be happy to look at this if the ixgbe driver provided by Red Hat also has this problem. Please re-open this bug if it does. Thanks!
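For anyone triaging a similar oops, the check described above (which frames in the call trace belong to a loadable module, and which symbols they name) can be sketched in a few lines of Python. This is only an illustration: the helper name `module_frames` and the regular expression are assumptions written for the frame format seen in this report (`[<addr>] :module:symbol+0xoff/0xlen`), not part of any kernel or Red Hat tooling.

```python
import re

# Frames in this report look like "[<addr>] :module:symbol+0xoff/0xlen" for
# module code and "[<addr>] symbol+0xoff/0xlen" for core kernel code.
# This pattern is an assumption based on that format only.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def module_frames(oops_text):
    """Return (module, symbol) pairs for frames that belong to a module."""
    return [
        (m.group("module"), m.group("symbol"))
        for m in FRAME_RE.finditer(oops_text)
        if m.group("module")
    ]


# A few frames copied from the call trace in this report.
SAMPLE = """\
[<ffffffff88250043>] :ixgbe:ixgbe_clean_rx_irq+0x523/0xbe0
[<ffffffff88251fd3>] :ixgbe:ixgbe_clean_rxonly+0x83/0x1a0
[<ffffffff8825ac22>] :ixgbe:__kc_adapter_clean+0x32/0x60
[<ffffffff8000c845>] net_rx_action+0xac/0x1e0
"""

if __name__ == "__main__":
    for module, symbol in module_frames(SAMPLE):
        print(module, symbol)
```

The resulting symbol list can then be compared against the driver source shipped with the distribution kernel to see whether a name such as '__kc_adapter_clean' exists there.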