
Bug 552886

Summary: [RHEL5] ip_mc_sf_allow() has a lock problem
Product: Red Hat Enterprise Linux 5
Reporter: Flavio Leitner <fleitner>
Component: kernel
Assignee: Flavio Leitner <fleitner>
Status: CLOSED ERRATA
QA Contact: Network QE <network-qe>
Severity: medium
Docs Contact:
Priority: medium
Version: 5.3
CC: haliu, kzhang, skito
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 572202 578932
Environment:
Last Closed: 2011-01-13 20:58:31 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 502912, 572202, 578932, 600363

Description Flavio Leitner 2010-01-06 13:13:48 UTC
Description of problem:
Almost all igmp functions accessing inet->mc_list are protected by
rtnl_lock(), but there is one exception, ip_mc_sf_allow(). As a result,
either ip_mc_drop_socket() or ip_mc_leave_group() can remove an entry
while ip_mc_sf_allow() is running, causing a crash.
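
The race described above can be illustrated with a small, deterministic Python model. This is only a sketch under stated assumptions: the Node class, the POISON marker, and the generator-based reader are hypothetical stand-ins for the socket's multicast list, freed memory, and ip_mc_sf_allow(). None of it is kernel code or the proposed patch. It parks a reader in the middle of its walk, lets a writer unlink and invalidate that entry, and shows that the reader then follows a stale pointer, while a reader that takes the same lock as the writer does not.

```python
import threading

POISON = object()  # hypothetical marker standing in for freed, reused memory


class Node:
    """One entry of a singly linked list, loosely modelling an mc_list entry."""

    def __init__(self, group, next_node=None):
        self.group = group
        self.next = next_node


class McList:
    def __init__(self, groups):
        self.lock = threading.Lock()  # plays the role of the writers' lock
        self.head = None
        for group in reversed(groups):
            self.head = Node(group, self.head)

    def remove(self, group):
        """Writer: unlink the entry under the lock, then invalidate it."""
        with self.lock:
            prev, cur = None, self.head
            while cur is not None and cur.group != group:
                prev, cur = cur, cur.next
            if cur is None:
                return False
            if prev is None:
                self.head = cur.next
            else:
                prev.next = cur.next
            cur.group = cur.next = POISON  # simulate the memory being freed
            return True

    def walk_unlocked(self):
        """Reader without the lock: yields at each entry so a writer can interleave."""
        cur = self.head
        while cur is not None:
            if cur is POISON or cur.group is POISON:
                raise RuntimeError("reader followed a stale pointer")
            yield cur.group
            cur = cur.next

    def lookup_locked(self, group):
        """Reader that takes the same lock as the writer for the whole walk."""
        with self.lock:
            cur = self.head
            while cur is not None:
                if cur.group == group:
                    return True
                cur = cur.next
            return False


def demo_race():
    """Interleave a paused unlocked reader with a removal; return what happened."""
    mc = McList(["224.0.0.251", "224.0.0.1"])
    reader = mc.walk_unlocked()
    next(reader)                 # reader is now parked on the first entry
    mc.remove("224.0.0.251")     # writer frees exactly that entry
    try:
        next(reader)             # reader resumes from the freed entry
    except RuntimeError as exc:
        return str(exc)
    return "no problem observed"
```

Calling demo_race() reports the stale-pointer error, whereas lookup_locked() can never observe a half-removed entry. Note that the receive path in the trace runs in softirq context, where a sleeping lock such as rtnl_lock() cannot be taken, so the locked reader here only shows the general principle and should not be read as a description of the eventual fix.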

--- snip ---
eth5.101: del 33:33:00:00:00:fb mcast address from master interface
eth5.101: del 01:00:5e:00:00:fb mcast address from master interface
eth5.100: del 33:33:00:00:00:fb mcast address from master interface
eth5.100: del 01:00:5e:00:00:fb mcast address from master interface
BUG: unable to handle kernel paging request at virtual address 005e0005
printing eip:
c05f2194
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file: /devices/pci0000:7f/0000:7f:06.0/irq
Modules linked in: nfs lockd fscache nfs_acl autofs4 hidp l2cap bluetooth sunrpc 8021q
ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core
button battery asus_acpi ac parport_pc lp parport sg e1000(U) pcspkr dm_raid45 dm_message
dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod hfcldd(FU) sd_mod
scsi_mod hfcldd_conf(U) hraslog_link(U) ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<c05f2194>]    Tainted: GF     VLI
EFLAGS: 00210202   (2.6.18-128.el5PAE #1)
EIP is at ip_mc_sf_allow+0x20/0x79
eax: 005e0001   ebx: f6cb9100   ecx: 00000008   edx: fb0000e0
esi: 5acb10ac   edi: f63414e9   ebp: f7ae3200   esp: c0732ea4
ds: 007b   es: 007b   ss: 0068
Process xlinpack_xeon32 (pid: 5194, ti=c0732000 task=cfd2daa0 task.ti=f5d30000)
Stack: f6cb9108 f6cb9100 c05ea1eb 00000008 d03e9034 5acb10ac fb0000e0 e9140000
      00000008 e91414e9 f7ae3200 c06ab4a8 00000000 00000000 c05ce1d5 f7ae3200
      00000000 f7ae3200 d03e9020 c05ce042 f7ae3200 c07d6988 c06ab560 00000008
Call Trace:
[<c05ea1eb>] udp_rcv+0x1f4/0x514
[<c05ce1d5>] ip_local_deliver+0x159/0x204
[<c05ce042>] ip_rcv+0x46f/0x4a9
[<c05b397d>] netif_receive_skb+0x30c/0x330
[<f8924be0>] e1000_clean_rx_irq+0xf0/0x3e0 [e1000]
[<f8924af0>] e1000_clean_rx_irq+0x0/0x3e0 [e1000]
[<f8922bd4>] e1000_clean+0xf4/0x340 [e1000]
[<c04074d6>] do_IRQ+0xb5/0xc3
[<c05b52d4>] net_rx_action+0x92/0x175
[<c042900f>] __do_softirq+0x87/0x114
[<c04073d7>] do_softirq+0x52/0x9c
[<c04059d7>] apic_timer_interrupt+0x1f/0x24
=======================
Code: 81 c4 8c 00 00 00 5b 5e 5f 5d c3 56 89 ce 53 89 c3 8b 4c 24 0c 89 d0 25 f0 00 00 00
3d e0 00 00 00 75 59 8b 83 84 01 00 00 eb 0c <39> 50 04 75 05 39 48 0c 74 08 8b 00 85 c0 75 f0 eb 3f 8b 50 14
EIP: [<c05f2194>] ip_mc_sf_allow+0x20/0x79 SS:ESP 0068:c0732ea4
<0>Kernel panic - not syncing: Fatal exception in interrupt
--- snip ---
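
The register dump in the oops above can be cross-checked with a few lines of Python. The sketch below assumes a little-endian i386 machine (consistent with the PAE kernel in the report) and uses hypothetical helper names. It reinterprets the EAX and EDX values as raw bytes and recomputes the faulting address from the instruction marked <39> 50 04 in the Code line, which decodes to a compare of EDX against the 32-bit value at EAX+4.

```python
import socket
import struct


def reg_bytes(value):
    """Return the four bytes a 32-bit register value occupies in little-endian memory."""
    return struct.pack("<I", value)


def as_ipv4(value):
    """Interpret a register holding a network-byte-order IPv4 address."""
    return socket.inet_ntoa(reg_bytes(value))


def as_hex_bytes(value):
    """Format the register's in-memory bytes like a MAC address prefix."""
    return ":".join(f"{b:02x}" for b in reg_bytes(value))


EAX = 0x005E0001  # list entry pointer being dereferenced when the fault occurred
EDX = 0xFB0000E0  # assumption: the destination (group) address passed in a register

fault_address = EAX + 4          # memory operand of "cmp %edx,0x4(%eax)"
group = as_ipv4(EDX)             # EDX read as an IPv4 address
eax_as_bytes = as_hex_bytes(EAX) # EAX read as a run of bytes
```

With these inputs, fault_address equals 0x005e0005, matching the reported paging request address, group is 224.0.0.251, and the bytes of EAX read 01:00:5e:00, the start of the 01:00:5e:00:00:fb address deleted just before the oops. That suggests, but does not prove, that the list entry had already been freed and its memory reused by the time ip_mc_sf_allow() walked the list.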

The proposed patch is still under discussion on netdev:
http://www.spinics.net/lists/netdev/msg116969.html


Version-Release number of selected component (if applicable):
2.6.18-128.el5PAE
The problem also occurs upstream.

How reproducible:
The customer reproduces it at boot time. I also have a Python script
that sends and receives multicast traffic while changing multicast
group membership.


Steps to Reproduce:
1. Run ./repro --sender on one system.
2. Run ./repro --recv.
3. Wait for the oops.
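
The attached reproducer is not shown in this report, so the following is only a guess at the general approach and not the actual repro script. It is a minimal Python sketch in which the group address, port, function names, and loop structure are all assumptions: a sender transmits UDP datagrams to a multicast group while a receiver bound to the same port joins and leaves the group in a tight loop, so that incoming packets are delivered while the socket's membership list is changing.

```python
import socket

GROUP = "239.1.2.3"   # assumption: any administratively scoped group would do
PORT = 5007           # assumption: arbitrary unprivileged UDP port


def make_mreq(group, ifaddr="0.0.0.0"):
    """Build the 8-byte struct ip_mreq: group address followed by interface address."""
    return socket.inet_aton(group) + socket.inet_aton(ifaddr)


def run_sender(count, group=GROUP, port=PORT):
    """Send `count` small datagrams to the multicast group."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        for _ in range(count):
            sock.sendto(b"ping", (group, port))


def run_receiver(iterations, group=GROUP, port=PORT):
    """Join and leave the group repeatedly while traffic arrives on the bound port."""
    mreq = make_mreq(group)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setblocking(False)
        for _ in range(iterations):
            # Joining adds an entry to the socket's multicast list.
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
            try:
                sock.recv(2048)      # drain one datagram if any has arrived
            except BlockingIOError:
                pass
            # Leaving removes that entry again while packets may still be in flight.
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)
```

A plausible way to use it would be run_sender(1000000) on one machine and run_receiver(1000000) on another, on a disposable test system only. Whether this particular loop hits the race window on an affected kernel is untested.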

Comment 1 Flavio Leitner 2010-01-06 13:20:16 UTC
Created attachment 381982
reproducer

Comment 2 Flavio Leitner 2010-02-03 12:12:33 UTC
The patch has been accepted upstream:
http://www.spinics.net/lists/netdev/msg119908.html

This is the commit on net-next-2.6:
commit c85bb41e93184bf5494dde6d8fe5a81b564c84c8
Author: Flavio Leitner <fleitner>
Date:   Tue Feb 2 07:32:29 2010 -0800

    igmp: fix ip_mc_sf_allow race [v5]
    
    Almost all igmp functions accessing inet->mc_list are protected by
    rtnl_lock(), but there is one exception which is ip_mc_sf_allow(),
    so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group
    remove an entry while ip_mc_sf_allow is running causing a crash.
    
    Signed-off-by: Flavio Leitner <fleitner>
    Signed-off-by: David S. Miller <davem>


The next step is to backport this to the RHEL versions.
Flavio

Comment 6 RHEL Program Management 2010-05-20 12:44:28 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 8 Jarod Wilson 2010-05-25 21:11:05 UTC
in kernel-2.6.18-200.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 14 errata-xmlrpc 2011-01-13 20:58:31 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html