Bug 552886
| Summary: | [RHEL5] ip_mc_sf_allow() has a lock problem | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Flavio Leitner <fleitner> | |
| Component: | kernel | Assignee: | Flavio Leitner <fleitner> | |
| Status: | CLOSED ERRATA | QA Contact: | Network QE <network-qe> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 5.3 | CC: | haliu, kzhang, skito | |
| Target Milestone: | rc | |||
| Target Release: | --- | |||
| Hardware: | All | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 572202 578932 | Environment: | ||
| Last Closed: | 2011-01-13 20:58:31 UTC | Type: | --- | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 502912, 572202, 578932, 600363 | |||
Created attachment 381982
reproducer
Patch is accepted upstream:
http://www.spinics.net/lists/netdev/msg119908.html

This is the commit on net-next-2.6:

commit c85bb41e93184bf5494dde6d8fe5a81b564c84c8
Author: Flavio Leitner <fleitner>
Date:   Tue Feb 2 07:32:29 2010 -0800

    igmp: fix ip_mc_sf_allow race [v5]

    Almost all igmp functions accessing inet->mc_list are protected by
    rtnl_lock(), but there is one exception which is ip_mc_sf_allow(),
    so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group
    remove an entry while ip_mc_sf_allow is running causing a crash.

    Signed-off-by: Flavio Leitner <fleitner>
    Signed-off-by: David S. Miller <davem>

The next step is to backport this to RHEL versions.

Flavio

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

in kernel-2.6.18-200.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2011-0017.html
Description of problem:

Almost all igmp functions accessing inet->mc_list are protected by rtnl_lock(). The one exception is ip_mc_sf_allow(), so either ip_mc_drop_socket() or ip_mc_leave_group() can remove an entry while ip_mc_sf_allow() is running, causing a crash.

--- snip ---
eth5.101: del 33:33:00:00:00:fb mcast address from master interface
eth5.101: del 01:00:5e:00:00:fb mcast address from master interface
eth5.100: del 33:33:00:00:00:fb mcast address from master interface
eth5.100: del 01:00:5e:00:00:fb mcast address from master interface
BUG: unable to handle kernel paging request at virtual address 005e0005
 printing eip:
c05f2194
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file: /devices/pci0000:7f/0000:7f:06.0/irq
Modules linked in: nfs lockd fscache nfs_acl autofs4 hidp l2cap bluetooth sunrpc 8021q ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi ac parport_pc lp parport sg e1000(U) pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod hfcldd(FU) sd_mod scsi_mod hfcldd_conf(U) hraslog_link(U) ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<c05f2194>]    Tainted: GF     VLI
EFLAGS: 00210202   (2.6.18-128.el5PAE #1)
EIP is at ip_mc_sf_allow+0x20/0x79
eax: 005e0001   ebx: f6cb9100   ecx: 00000008   edx: fb0000e0
esi: 5acb10ac   edi: f63414e9   ebp: f7ae3200   esp: c0732ea4
ds: 007b   es: 007b   ss: 0068
Process xlinpack_xeon32 (pid: 5194, ti=c0732000 task=cfd2daa0 task.ti=f5d30000)
Stack: f6cb9108 f6cb9100 c05ea1eb 00000008 d03e9034 5acb10ac fb0000e0 e9140000
       00000008 e91414e9 f7ae3200 c06ab4a8 00000000 00000000 c05ce1d5 f7ae3200
       00000000 f7ae3200 d03e9020 c05ce042 f7ae3200 c07d6988 c06ab560 00000008
Call Trace:
 [<c05ea1eb>] udp_rcv+0x1f4/0x514
 [<c05ce1d5>] ip_local_deliver+0x159/0x204
 [<c05ce042>] ip_rcv+0x46f/0x4a9
 [<c05b397d>] netif_receive_skb+0x30c/0x330
 [<f8924be0>] e1000_clean_rx_irq+0xf0/0x3e0 [e1000]
 [<f8924af0>] e1000_clean_rx_irq+0x0/0x3e0 [e1000]
 [<f8922bd4>] e1000_clean+0xf4/0x340 [e1000]
 [<c04074d6>] do_IRQ+0xb5/0xc3
 [<c05b52d4>] net_rx_action+0x92/0x175
 [<c042900f>] __do_softirq+0x87/0x114
 [<c04073d7>] do_softirq+0x52/0x9c
 [<c04059d7>] apic_timer_interrupt+0x1f/0x24
 =======================
Code: 81 c4 8c 00 00 00 5b 5e 5f 5d c3 56 89 ce 53 89 c3 8b 4c 24 0c 89 d0 25 f0 00 00 00 3d e0 00 00 00 75 59 8b 83 84 01 00 00 eb 0c <39> 50 04 75 05 39 48 0c 74 08 8b 00 85 c0 75 f0 eb 3f 8b 50 14
EIP: [<c05f2194>] ip_mc_sf_allow+0x20/0x79 SS:ESP 0068:c0732ea4
 <0>Kernel panic - not syncing: Fatal exception in interrupt
--- snip ---

The proposed patch is still under discussion on netdev:
http://www.spinics.net/lists/netdev/msg116969.html

Version-Release number of selected component (if applicable):
2.6.18-128.el5PAE, but it happens upstream too.

How reproducible:
The customer reproduces it at boot time, but I have a Python script that sends and receives multicast traffic while playing with multicast groups.

Steps to Reproduce:
1. run ./repro --sender on one system
2. run ./repro --recv
3. wait for the oops.
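
For readers without access to the attachment, the steps above could be approximated with something like the following. This is a hypothetical sketch, not the attached reproducer: the group address, port, timings, and function names are all assumptions, and it assumes that "playing with multicast groups" means repeatedly joining and leaving a group on a bound UDP socket while traffic for that group is arriving. Only the `--sender` and `--recv` flags come from the report.

```python
#!/usr/bin/env python3
"""Hypothetical multicast join/leave stress sketch (NOT the attached reproducer)."""
import socket
import sys
import time

GROUP = "239.255.0.1"  # assumption: arbitrary administratively scoped group
PORT = 50000           # assumption: arbitrary unprivileged UDP port


def make_mreq(group, iface="0.0.0.0"):
    """Build the 8-byte struct ip_mreq: group address, then local interface."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def sender(group=GROUP, port=PORT, interval=0.001):
    """Send small datagrams to the group forever."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    while True:
        sock.sendto(b"x" * 64, (group, port))
        time.sleep(interval)


def receiver(group=GROUP, port=PORT, hold=0.01):
    """Join, briefly receive, then leave the group, in a tight loop.

    Each IP_DROP_MEMBERSHIP modifies the socket's membership list from
    process context while incoming packets are still being delivered.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.settimeout(hold)
    mreq = make_mreq(group)
    while True:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        try:
            sock.recvfrom(2048)
        except socket.timeout:
            pass  # no packet within `hold` seconds; leave and rejoin anyway
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)


if __name__ == "__main__":
    if "--sender" in sys.argv[1:]:
        sender()
    elif "--recv" in sys.argv[1:]:
        receiver()
    else:
        print("usage: repro --sender | --recv")
```

Whether this hits the race on a given kernel depends on timing, so it may need to run for a while, possibly with several receivers at once. Run it only on a disposable test machine, since the expected outcome on an affected kernel is a panic.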