Heads-up from QE: without bonding, a normal user could trigger a crash like this:

Unable to handle kernel NULL pointer dereference at 00000000000000a8 RIP:
 [<ffffffff888096a2>] :sctp:sctp_sock_rfree+0x19/0x2a
PGD 3470ff067 PUD 34a820067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:09.0/0000:06:00.1/irq
CPU 0
Modules linked in: md5 sctp autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev ixgbe i2c_i801 8021q sr_mod cdrom tpm_tis i2c_core tpm tpm_bios dca sg pcspkr i7core_edac edac_mc dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage qla2xxx scsi_transport_fc shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 21814, comm: acc Not tainted 2.6.18-268.el5 #1
RIP: 0010:[<ffffffff888096a2>] [<ffffffff888096a2>] :sctp:sctp_sock_rfree+0x19/0x2a
RSP: 0018:ffffffff804a9d18 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff8101f5561cc0 RCX: 00000000000006a4
RDX: ffff8101dc83a980 RSI: 00000000000005ac RDI: ffff810363856318
RBP: ffff8103638562c0 R08: 000000000000016d R09: ffff8101e7c616c0
R10: ffff8101e68bf2c0 R11: ffff8101e2859f60 R12: 0000000000000000
R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
FS: 00002b93fd25a6e0(0000) GS:ffffffff8042a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000a8 CR3: 000000034a771000 CR4: 00000000000006a0
Process acc (pid: 21814, threadinfo ffff810345e12000, task ffff81036edbf080)
Stack: ffffffff80230e6f ffff810367fcb628 ffff8103638562c0 ffff810367fcb358
 ffffffff80028da5 ffff810367fcb628 ffffffff88808bdc ffff810367fca000
 ffffffff888028d1 ffffffff804a9df0 ffff810367fca000 ffff810367aac580
Call Trace:
 <IRQ> [<ffffffff80230e6f>] skb_release_head_state+0xab/0xf8
 [<ffffffff80028da5>] __kfree_skb+0x9/0x1a
 [<ffffffff88808bdc>] :sctp:sctp_ulpq_free+0x49/0x88
 [<ffffffff888028d1>] :sctp:sctp_association_free+0x54/0x118
 [<ffffffff887ff93e>] :sctp:sctp_do_sm+0x11d/0xe61
 [<ffffffff8006ebc9>] do_timer_jiffy+0x9/0x1d
 [<ffffffff888007d4>] :sctp:sctp_generate_t5_shutdown_guard_event+0x0/0xa
 [<ffffffff88800790>] :sctp:sctp_generate_timeout_event+0x7b/0xab
 [<ffffffff80099bb0>] run_timer_softirq+0x18d/0x23a
 [<ffffffff80012562>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006d636>] do_softirq+0x2c/0x7d
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI> [<ffffffff800b3ba1>] audit_log_lost+0xb/0x9d
 [<ffffffff800b44c9>] audit_log_start+0x1a8/0x3a3
 [<ffffffff8000f470>] __alloc_pages+0x78/0x308
 [<ffffffff8002b59d>] flush_tlb_page+0xae/0xdc
 [<ffffffff8012d729>] avc_audit+0x74/0x9b4
 [<ffffffff8000769e>] find_get_page+0x21/0x51
 [<ffffffff8012e0af>] avc_has_perm+0x46/0x58
 [<ffffffff8012e983>] inode_has_perm+0x56/0x63
 [<ffffffff800a4e94>] ktime_get_ts+0x1a/0x4e
 [<ffffffff8012ea24>] file_has_perm+0x94/0xa3
 [<ffffffff80131510>] selinux_file_permission+0x9f/0xb4
 [<ffffffff80016b6b>] vfs_write+0xa7/0x174
 [<ffffffff8001745b>] sys_write+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Code: 48 83 b8 a8 00 00 00 00 74 06 01 8a e4 00 00 00 c3 55 48 89
RIP [<ffffffff888096a2>] :sctp:sctp_sock_rfree+0x19/0x2a
 RSP <ffffffff804a9d18>
CR2: 00000000000000a8
 <0>Kernel panic - not syncing: Fatal exception

The panic backtrace is different from that of the rhel-6.1 bug. For rhel-5.7, you don't need a bonded interface to trigger the issue.
Upstream commit: http://git.kernel.org/linus/ea2bc483ff5caada7c4aa0d5fbf87d3a6590273d
Commit 68ef2a9129 introduced proper receive memory management but left out a chunk of upstream ea2bc483ff5 which was not needed at the time of the backport. A few months later, commit 73f34f99 backported the updated socket memory accounting, which would have required the missing chunk. As a result, the memory of the chunks on the reassembly and lobby queues is not reclaimed when migrating a socket. The queues are also assumed to be purged later on.
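
The accounting problem described above can be illustrated with a small toy model. This is only a sketch in Python, not kernel code: the names Sock, Skb, set_owner_r, rfree and migrate are hypothetical stand-ins, the byte sizes are made up, and the "crash" is modelled as an exception raised when a buffer's destructor runs against a socket that is already gone. It does not reproduce the exact code path behind the oops in sctp_sock_rfree; it only shows why buffers left on the reassembly and lobby queues stay charged to the old socket if the migration skips them.

```python
class Sock:
    """Toy socket: tracks receive memory charged to it."""
    def __init__(self, name):
        self.name = name
        self.rmem_alloc = 0   # bytes currently charged to this socket
        self.alive = True     # False once the socket has been torn down


class Skb:
    """Toy buffer: has a size and an owning socket."""
    def __init__(self, size):
        self.size = size
        self.owner = None


def set_owner_r(skb, sk):
    # Charge the buffer to the socket's receive accounting.
    skb.owner = sk
    sk.rmem_alloc += skb.size


def rfree(skb):
    # Destructor: uncharge the buffer from whichever socket owns it.
    sk = skb.owner
    if sk is None or not sk.alive:
        raise RuntimeError("receive destructor ran against a dead socket")
    sk.rmem_alloc -= skb.size
    skb.owner = None


def migrate(old, new, queues, include_pending):
    # Re-own queued buffers. With include_pending=False the reassembly
    # and lobby queues are skipped, which models the missing chunk.
    names = ["receive"] + (["reasm", "lobby"] if include_pending else [])
    for name in names:
        for skb in queues[name]:
            rfree(skb)
            set_owner_r(skb, new)


def run(include_pending):
    old, new = Sock("old"), Sock("new")
    queues = {"receive": [Skb(100)], "reasm": [Skb(200)], "lobby": [Skb(300)]}
    for q in queues.values():
        for skb in q:
            set_owner_r(skb, old)

    migrate(old, new, queues, include_pending)
    left_on_old, charged_to_new = old.rmem_alloc, new.rmem_alloc

    old.alive = False  # the old socket goes away after the migration
    try:
        # Later teardown of the association frees every queued buffer.
        for q in queues.values():
            for skb in q:
                rfree(skb)
        crashed = False
    except RuntimeError:
        crashed = True
    return left_on_old, charged_to_new, crashed
```

In this model, run(False) leaves 500 bytes charged to the old socket and the later teardown fails, while run(True) moves all 600 bytes to the new socket and the teardown completes cleanly.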
Statement: This issue did not affect the Linux kernel as shipped with Red Hat Enterprise Linux 4 as it did not backport the upstream commit 3ab224be6d6. It did not affect the Linux kernels as shipped with Red Hat Enterprise Linux 6 and Red Hat Enterprise MRG, as they have backported the upstream commit ea2bc483ff5 that Red Hat Enterprise Linux 5 did not. This has been addressed in Red Hat Enterprise Linux 5 via https://rhn.redhat.com/errata/RHSA-2011-1212.html.
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 5

Via RHSA-2011:1212 https://rhn.redhat.com/errata/RHSA-2011-1212.html
Created kernel tracking bugs for this issue

Affects: fedora-all [bug 748680]
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 5.6.Z - Server Only

Via RHSA-2011:1813 https://rhn.redhat.com/errata/RHSA-2011-1813.html