Bug 250842

Summary: oopses when multicasting with connection oriented socket
Product: Red Hat Enterprise Linux 4
Reporter: Issue Tracker <tao>
Component: kernel
Assignee: Anton Arapov <anton>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: medium
Priority: high
Version: 4.5
CC: jbaron, nobody, tao
Hardware: All
OS: Linux
Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Last Closed: 2008-07-24 19:14:58 UTC
Bug Blocks: 259261, 422551, 430698
Attachments: Proposed patch

Description Issue Tracker 2007-08-03 21:13:59 UTC
Escalated to Bugzilla from IssueTracker

Comment 1 Issue Tracker 2007-08-03 21:14:05 UTC
Description of problem:

The attached code runs as root because it sends ICMP but otherwise
is a normal userspace process; its purpose is to find a 'partner'
host in the same network for some automated network testing later.

After the code has run, the latest RHEL4 kernels panic.  We assume we
have a bug in our code, but so far we have been unable to find one.

How reproducible:
very likely

Steps to Reproduce:
Run "make && ./pair" on two machines behind the same switch.

Actual results:
Jun 11 11:14:32 lxb7913.cern.ch login: Unable to handle kernel NULL pointer dereference at 0000000000000130 RIP: 
Jun 11 11:14:32 <ffffffff80162798>{free_block+179}
Jun 11 11:14:32 PML4 220d23067 PGD 21f847067 PMD 0 
Jun 11 11:14:32 Oops: 0002 [1] SMP 
Jun 11 11:14:32 CPU 3 
Jun 11 11:14:33 Modules linked in: edac_mc ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 i2c_dev i2c_core md5 ipv6 ds yenta_socket pcmcia_core dm_mirror dm_mod button battery ac joydev uhci_hcd ehci_hcd e1000 ext3 jbd ata_piix libata sd_mod scsi_mod
Jun 11 11:14:33 Pid: 13, comm: events/3 Not tainted 2.6.9-55.ELsmp
Jun 11 11:14:33 RIP: 0010:[<ffffffff80162798>] <ffffffff80162798>{free_block+179}
Jun 11 11:14:33 RSP: 0018:00000100cfc5bde8  EFLAGS: 00010006
Jun 11 11:14:33 RAX: 0000000000000128 RBX: 00000100cfedd1c0 RCX: 0000010000012000
Jun 11 11:14:33 RDX: 0000000000000000 RSI: 000001022605d700 RDI: 000001021bea5000
Jun 11 11:14:33 RBP: 00000100cfe1c0d0 R08: 00000100cfe02090 R09: 0000000000000000
Jun 11 11:14:33 R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000001
Jun 11 11:14:33 R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff80492740
Jun 11 11:14:33 FS:  0000000000000000(0000) GS:ffffffff804ed880(0000) knlGS:0000000000000000
Jun 11 11:14:33 CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Jun 11 11:14:34 CR2: 0000000000000130 CR3: 0000000006974000 CR4: 00000000000006e0
Jun 11 11:14:34 Process events/3 (pid: 13, threadinfo 00000100cfc5a000, task 00000100cfc31030)
Jun 11 11:14:34 Stack: 00000100cfe1c0c0 00000100cfe1c0d0 0000000000000001 00000100cfe1c0c0 
Jun 11 11:14:34        00000100cfedd1c0 ffffffff801628ae 000001000107de40 000001000107de48 
Jun 11 11:14:34        00000100cfedd2b0 ffffffff80163774 
Jun 11 11:14:34 Call Trace:<ffffffff801628ae>{drain_array_locked+99} <ffffffff80163774>{cache_reap+162} 
Jun 11 11:14:34        <ffffffff801636d2>{cache_reap+0} <ffffffff80147d1a>{worker_thread+419} 
Jun 11 11:14:34        <ffffffff801342a4>{default_wake_function+0} <ffffffff801342f5>{__wake_up_common+67} 
Jun 11 11:14:34        <ffffffff801342a4>{default_wake_function+0} <ffffffff80147b77>{worker_thread+0} 
Jun 11 11:14:34        <ffffffff8014ba3f>{kthread+200} <ffffffff80110f47>{child_rip+8} 
Jun 11 11:14:34        <ffffffff8014b977>{kthread+0} <ffffffff80110f3f>{child_rip+0} 
Jun 11 11:14:34        
Jun 11 11:14:34 
Jun 11 11:14:34 Code: 48 89 50 08 48 89 02 48 2b 7e 18 48 c7 06 00 01 10 00 48 c7 
Jun 11 11:14:34 RIP <ffffffff80162798>{free_block+179} RSP <00000100cfc5bde8>
Jun 11 11:14:34 CR2: 0000000000000130
Jun 11 11:14:34  <0>Kernel panic - not syncing: Oops
Jun 11 11:15:05  NMI Watchdog detected LOCKUP, CPU=1, registers:
Jun 11 11:15:05 CPU 1 
Jun 11 11:15:05 Modules linked in: edac_mc ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 i2c_dev i2c_core md5 ipv6 ds yenta_socket pcmcia_core dm_mirror dm_mod button battery ac joydev uhci_hcd ehci_hcd e1000 ext3 jbd ata_piix libata sd_mod scsi_mod
Jun 11 11:15:05 Pid: 10704, comm: fsprobe Not tainted 2.6.9-55.ELsmp
Jun 11 11:15:05 RIP: 0010:[<ffffffff8030cf66>] <ffffffff8030cf66>{.text.lock.spinlock+5}
Jun 11 11:15:05 RSP: 0018:000001022143f718  EFLAGS: 00000086
Jun 11 11:15:05 RAX: 0000000000000010 RBX: 00000100cfedd268 RCX: 0000000000000001
Jun 11 11:15:05 RDX: 0000000000000000 RSI: 0000000000000850 RDI: 00000100cfedd268
Jun 11 11:15:05 RBP: 00000100cfeb90c0 R08: 0000000000003e9f R09: 0000010220791ba8
Jun 11 11:15:05 R10: 0000010227610a28 R11: 0000010227610a28 R12: 00000100cfedd208
Jun 11 11:15:05 R13: 00000100cfedd1c0 R14: 0000000000000850 R15: 000001022143f8ec
Jun 11 11:15:06 FS:  0000002a95584b00(0000) GS:ffffffff804ed780(0000) knlGS:0000000000000000
Jun 11 11:15:06 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 11 11:15:06 CR2: 0000002a95780b70 CR3: 0000000037e88000 CR4: 00000000000006e0
Jun 11 11:15:06 Process fsprobe (pid: 10704, threadinfo 000001022143e000, task 00000102190c57f0)
Jun 11 11:15:06 Stack: 000000000000000c ffffffff80161f75 00000100cfedd1c0 00000100cfedd1c0 
Jun 11 11:15:06        0000000000000850 000001021bb00190 0000010224f888c0 00000102269714c0 
Jun 11 11:15:06        000001022143f8ec ffffffff80161e47 
Jun 11 11:15:06 Call Trace:<ffffffff80161f75>{cache_alloc_refill+93} <ffffffff80161e47>{__kmalloc+123} 
Jun 11 11:15:06        <ffffffffa0058acc>{:jbd:__jbd_kmalloc+21} <ffffffffa0054543>{:jbd:do_get_write_access+1046} 
Jun 11 11:15:06        <ffffffff8015a97c>{find_get_page+65} <ffffffff8017b62c>{__find_get_block_slow+255} 
Jun 11 11:15:06        <ffffffffa0054780>{:jbd:journal_get_undo_access+50} 
Jun 11 11:15:06        <ffffffffa0065aba>{:ext3:ext3_try_to_allocate_with_rsv+84} 
Jun 11 11:15:06        <ffffffffa006613e>{:ext3:ext3_new_block+497} <ffffffffa00683e6>{:ext3:ext3_alloc_block+7} 
Jun 11 11:15:06        <ffffffffa0069fcb>{:ext3:ext3_get_block_handle+881} 
Jun 11 11:15:07        <ffffffff8016ae3b>{get_user_pages+1339} <ffffffffa006a564>{:ext3:ext3_direct_io_get_blocks+164} 
Jun 11 11:15:07        <ffffffff8019b4f8>{__blockdev_direct_IO+1820} <ffffffffa0069950>{:ext3:ext3_mark_iloc_dirty+803} 
Jun 11 11:15:07        <ffffffffa006b532>{:ext3:ext3_direct_IO+251} <ffffffffa006a4c0>{:ext3:ext3_direct_io_get_blocks+0} 
Jun 11 11:15:07        <ffffffff8015c2b6>{generic_file_direct_IO+78} <ffffffff8015c343>{generic_file_direct_write+103} 
Jun 11 11:15:07        <ffffffff8015c660>{__generic_file_aio_write_nolock+662} 
Jun 11 11:15:07        <ffffffff8015c943>{generic_file_aio_write_nolock+32} 
Jun 11 11:15:07        <ffffffff8015ca0d>{generic_file_aio_write+126} <ffffffffa0066f01>{:ext3:ext3_file_write+22} 
Jun 11 11:15:07        <ffffffff80179da3>{do_sync_write+178} <ffffffff8030bcf5>{thread_return+88} 
Jun 11 11:15:07        <ffffffff80135c64>{autoremove_wake_function+0} <ffffffff80179e9e>{vfs_write+207} 
Jun 11 11:15:07        <ffffffff80179f86>{sys_write+69} <ffffffff8011026a>{system_call+126} 
Jun 11 11:15:07        
Jun 11 11:15:07 
Jun 11 11:15:08 Code: 7e f9 e9 60 fc ff ff f3 90 83 3b 00 7e f9 e9 ce fc ff ff e8 
Jun 11 11:15:08 Kernel panic - not syncing: nmi watchdog
Jun 11 11:15:08  <1>Unable to handle kernel NULL pointer dereference at 00000000000000ff RIP: 
Jun 11 11:15:08 [<00000000000000ff>]
Jun 11 11:15:08 PML4 220d23067 PGD 21f847067 PMD 0 
Jun 11 11:15:08 Oops: 0000 [2] SMP 
Jun 11 11:15:08 CPU 1 
Jun 11 11:15:08 Modules linked in: edac_mc ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 i2c_dev i2c_core md5 ipv6 ds yenta_socket pcmcia_core dm_mirror dm_mod button battery ac joydev uhci_hcd ehci_hcd e1000 ext3 jbd ata_piix libata sd_mod scsi_mod
Jun 11 11:15:08 Pid: 10704, comm: fsprobe Not tainted 2.6.9-55.ELsmp
Jun 11 11:15:08 RIP: 0010:[<00000000000000ff>] [<00000000000000ff>]
Jun 11 11:15:08 RSP: 0018:0000010037e9bfa0  EFLAGS: 00010006
Jun 11 11:15:08 RAX: 000001022143ffd8 RBX: 0000000000000000 RCX: 0000000000000002
Jun 11 11:15:08 RDX: 00000000000000ff RSI: 0000000000000000 RDI: 0000000000000002
Jun 11 11:15:08 RBP: 0000010037e8bcf8 R08: 0000000000000008 R09: 0000000000000000
Jun 11 11:15:08 R10: 0000000000000000 R11: 0000000000000002 R12: 00000100cfedd208
Jun 11 11:15:09 R13: 00000100cfedd1c0 R14: 0000000000000850 R15: 000001022143f8ec
Jun 11 11:15:09 FS:  0000002a95584b00(0000) GS:ffffffff804ed780(0000) knlGS:0000000000000000
Jun 11 11:15:09 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 11 11:15:09 CR2: 00000000000000ff CR3: 0000000037e88000 CR4: 00000000000006e0
Jun 11 11:15:09 Process fsprobe (pid: 10704, threadinfo 000001022143e000, task 00000102190c57f0)
Jun 11 11:15:09 Stack: ffffffff8011c6b2 ffffffff8031fbd4 ffffffff80110b69 0000010037e8bcf8  <EOI> 
Jun 11 11:15:09        000001022143f8ec 0000000000000850 00000100cfedd1c0 00000100cfedd208 
Jun 11 11:15:09        0000010037e8bf58 ffffffff8031fbd4 
Jun 11 11:15:09 Call Trace:<IRQ> <ffffffff8011c6b2>{smp_call_function_interrupt+64} 
Jun 11 11:15:09        <ffffffff80110b69>{call_function_interrupt+133}  <EOI> <ffffffff8011c665>{smp_send_stop+76} 
Jun 11 11:15:09        <ffffffff80137ea2>{panic+235} <ffffffff80111860>{show_stack+241} 
Jun 11 11:15:09        <ffffffff8011198a>{show_registers+277} <ffffffff80111c91>{die_nmi+130} 
Jun 11 11:15:10        <ffffffff8011d16d>{nmi_watchdog_tick+210} <ffffffff80144ac3>{notifier_call_chain+31} 
Jun 11 11:15:10        <ffffffff8011255e>{default_do_nmi+112} <ffffffff8011d223>{do_nmi+115} 
Jun 11 11:15:10        <ffffffff80111173>{paranoid_exit+0} <ffffffff8030d37c>{bad_gs+259} 
Jun 11 11:15:10        
Jun 11 11:15:10 
Jun 11 11:15:10 Code:  Bad RIP value.
Jun 11 11:15:10 RIP [<00000000000000ff>] RSP <0000010037e9bfa0>
Jun 11 11:15:10 CR2: 00000000000000ff
Jun 11 11:15:10  <0>Kernel panic - not syncing: Oops
Jun 11 11:15:10  Badness in panic at kernel/panic.c:118
Jun 11 11:15:10 
Jun 11 11:15:10 Call Trace:<IRQ> <ffffffff80137fc6>{panic+527} <ffffffff80110833>{ret_from_intr+0} 
Jun 11 11:15:10        <ffffffff80111aec>{oops_end+38} <ffffffff80111b07>{oops_end+65} 
Jun 11 11:15:10        <ffffffff80124651>{do_page_fault+1125} <ffffffff801326d3>{activate_task+124} 
Jun 11 11:15:10        <ffffffff80132bfe>{try_to_wake_up+876} <ffffffff80110d91>{error_exit+0} 
Jun 11 11:15:10        <ffffffff8011c6b2>{smp_call_function_interrupt+64} 
Jun 11 11:15:10        <ffffffff80110b69>{call_function_interrupt+133}  <EOI> <ffffffff8011c665>{smp_send_stop+76} 
Jun 11 11:15:10        <ffffffff80137ea2>{panic+235} <ffffffff80111860>{show_stack+241} 
Jun 11 11:15:11        <ffffffff8011198a>{show_registers+277} <ffffffff80111c91>{die_nmi+130} 
Jun 11 11:15:11        <ffffffff8011d16d>{nmi_watchdog_tick+210} <ffffffff80144ac3>{notifier_call_chain+31} 
Jun 11 11:15:11        <ffffffff8011255e>{default_do_nmi+112} <ffffffff8011d223>{do_nmi+115} 
Jun 11 11:15:11        <ffffffff80111173>{paranoid_exit+0} <ffffffff8030d37c>{bad_gs+259} 
Jun 11 11:15:11        
Jun 11 11:15:11 Badness in i8042_panic_blink at drivers/input/serio/i8042.c:987
Jun 11 11:15:11 
Jun 11 11:15:11 Call Trace:<IRQ> <ffffffff802434c3>{i8042_panic_blink+238} <ffffffff80137f74>{panic+445} 
Jun 11 11:15:12        <ffffffff80110833>{ret_from_intr+0} <ffffffff80111aec>{oops_end+38} 
Jun 11 11:15:12        <ffffffff80111b07>{oops_end+65} <ffffffff80124651>{do_page_fault+1125} 
Jun 11 11:15:12        <ffffffff801326d3>{activate_task+124} <ffffffff80132bfe>{try_to_wake_up+876} 
Jun 11 11:15:12        <ffffffff80110d91>{error_exit+0} <ffffffff8011c6b2>{smp_call_function_interrupt+64} 
Jun 11 11:15:12        <ffffffff80110b69>{call_function_interrupt+133}  <EOI> <ffffffff8011c665>{smp_send_stop+76} 
Jun 11 11:15:12        <ffffffff80137ea2>{panic+235} <ffffffff80111860>{show_stack+241} 
Jun 11 11:15:12        <ffffffff8011198a>{show_registers+277} <ffffffff80111c91>{die_nmi+130} 
Jun 11 11:15:12        <ffffffff8011d16d>{nmi_watchdog_tick+210} <ffffffff80144ac3>{notifier_call_chain+31} 
Jun 11 11:15:12        <ffffffff8011255e>{default_do_nmi+112} <ffffffff8011d223>{do_nmi+115} 
Jun 11 11:15:12        <ffffffff80111173>{paranoid_exit+0} <ffffffff8030d37c>{bad_gs+259} 
Jun 11 11:15:12        
Jun 11 11:15:12 Badness in i8042_panic_blink at drivers/input/serio/i8042.c:990
Jun 11 11:15:12 
Jun 11 11:15:13 Call Trace:<IRQ> <ffffffff80243555>{i8042_panic_blink+384} <ffffffff80137f74>{panic+445} 
Jun 11 11:15:13        <ffffffff80110833>{ret_from_intr+0} <ffffffff80111aec>{oops_end+38} 
Jun 11 11:15:13        <ffffffff80111b07>{oops_end+65} <ffffffff80124651>{do_page_fault+1125} 
Jun 11 11:15:13        <ffffffff801326d3>{activate_task+124} <ffffffff80132bfe>{try_to_wake_up+876} 
Jun 11 11:15:13        <ffffffff80110d91>{error_exit+0} <ffffffff8011c6b2>{smp_call_function_interrupt+64} 
Jun 11 11:15:13        <ffffffff80110b69>{call_function_interrupt+133}  <EOI> <ffffffff8011c665>{smp_send_stop+76} 
Jun 11 11:15:13        <ffffffff80137ea2>{panic+235} <ffffffff80111860>{show_stack+241} 
Jun 11 11:15:13        <ffffffff8011198a>{show_registers+277} <ffffffff80111c91>{die_nmi+130} 
Jun 11 11:15:13        <ffffffff8011d16d>{nmi_watchdog_tick+210} <ffffffff80144ac3>{notifier_call_chain+31} 
Jun 11 11:15:13        <ffffffff8011255e>{default_do_nmi+112} <ffffffff8011d223>{do_nmi+115} 
Jun 11 11:15:13        <ffffffff80111173>{paranoid_exit+0} <ffffffff8030d37c>{bad_gs+259} 
Jun 11 11:15:13        
Jun 11 11:15:13 Badness in i8042_panic_blink at drivers/input/serio/i8042.c:992
Jun 11 11:15:13 
Jun 11 11:15:14 Call Trace:<IRQ> <ffffffff802435ba>{i8042_panic_blink+485} <ffffffff80137f74>{panic+445} 
Jun 11 11:15:14        <ffffffff80110833>{ret_from_intr+0} <ffffffff80111aec>{oops_end+38} 
Jun 11 11:15:14        <ffffffff80111b07>{oops_end+65} <ffffffff80124651>{do_page_fault+1125} 
Jun 11 11:15:14        <ffffffff801326d3>{activate_task+124} <ffffffff80132bfe>{try_to_wake_up+876} 
Jun 11 11:15:14        <ffffffff80110d91>{error_exit+0} <ffffffff8011c6b2>{smp_call_function_interrupt+64} 
Jun 11 11:15:14        <ffffffff80110b69>{call_function_interrupt+133}  <EOI> <ffffffff8011c665>{smp_send_stop+76} 
Jun 11 11:15:14        <ffffffff80137ea2>{panic+235} <ffffffff80111860>{show_stack+241} 
Jun 11 11:15:14        <ffffffff8011198a>{show_registers+277} <ffffffff80111c91>{die_nmi+130} 
Jun 11 11:15:14        <ffffffff8011d16d>{nmi_watchdog_tick+210} <ffffffff80144ac3>{notifier_call_chain+31} 
Jun 11 11:15:14        <ffffffff8011255e>{default_do_nmi+112} <ffffffff8011d223>{do_nmi+115} 
Jun 11 11:15:14        <ffffffff80111173>{paranoid_exit+0} <ffffffff8030d37c>{bad_gs+259} 
Jun 11 11:15:14        
06/11/07 11:16:00 lxb7913 Heartbeat Stopped lxb7913.cern.ch
Jun 11 11:21:55 Badness in i8042_panic_blink at drivers/input/serio/i8042.c:989
Jun 11 11:21:55 
Jun 11 11:21:55 Call Trace:<IRQ> <ffffffff802434c3>{i8042_panic_blink+238} <ffffffff80137f74>{panic+445} 
Jun 11 11:21:55        <ffffffff80110833>{ret_from_intr+0} <ffffffff80111aec>{oops_end+38} 
Jun 11 11:21:55        <ffffffff80111b07>{oops_end+65} <ffffffff80124651>{do_page_fault+1125} 
Jun 11 11:21:55        <ffffffff801326d3>{activate_task+124} <ffffffff80132bfe>{try_to_wake_up+876} 
Jun 11 11:21:55        <ffffffff80110d91>{error_exit+0} <ffffffff8011c6b2>{smp_call_function_interrupt+64} 
Jun 11 11:21:55        <ffffffff80110b69>{call_function_interrupt+133}  <EOI> <ffffffff8011c665>{smp_send_stop+76} 
Jun 11 11:21:55        <ffffffff80137ea2>{panic+235} <ffffffff80111860>{show_stack+241} 
Jun 11 11:21:56        <ffffffff8011198a>{show_registers+277} <ffffffff80111c91>{die_nmi+130} 
Jun 11 11:21:56        <ffffffff8011d16d>{nmi_watchdog_tick+210} <ffffffff80144ac3>{notifier_call_chain+31} 
Jun 11 11:21:56        <ffffffff8011255e>{default_do_nmi+112} <ffffffff8011d223>{do_nmi+115} 
Jun 11 11:21:56        <ffffffff80111173>{paranoid_exit+0} <ffffffff8030d37c>{bad_gs+259} 
Jun 11 11:21:56        

Expected results:
no panic?

Additional info:

This event sent from IssueTracker by fleitner  [Support Engineering Group]
 issue 123624

Comment 2 Issue Tracker 2007-08-03 21:14:09 UTC
File uploaded: pairing.tar.gz


Comment 3 Issue Tracker 2007-08-03 21:14:12 UTC
File uploaded: pagealloc-debug.log


Comment 4 Issue Tracker 2007-08-03 21:14:16 UTC
Attaching the oops when the DEBUG_PAGEALLOC is enabled.



Comment 5 Issue Tracker 2007-08-03 21:14:20 UTC
Updating:

I've been able to reproduce it on upstream kernel too.

I've done some tests here, and passive mode always triggers it. By
removing parts of the code to see what triggers it and what doesn't, I
found that the bug only happens when you accept a connection
(passive mode) on a socket bound to a multicast group. With the accept()/
close() pair commented out, it did not reproduce.

There is an ioctl() setting SO_LINGER, which does not affect the bug, and
setting the socket to O_NONBLOCK makes no difference either.

I'll check the accept() and close() code paths for any special handling
of multicast packets.

-Flavio



Comment 6 Issue Tracker 2007-08-03 21:14:25 UTC
I've found the bug:

starting from sys_accept(), the call chain ends up in
tcp_v4_syn_recv_sock(), which allocates the new socket as below:
...
  newsk = tcp_create_openreq_child(sk, req, skb);
...

but this function does:
tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
                         struct sk_buff *skb) {
        struct sock *newsk = inet_csk_clone(sk, req, GFP_ATOMIC);
...

which starts by cloning the parent socket:
struct sock *inet_csk_clone(struct sock *sk, const struct request_sock *req,
                            const gfp_t priority)
{
        struct sock *newsk = sk_clone(sk, priority);


sk_clone() ends up in sock_copy(newsk, sk), which does:
memcpy(nsk, osk, osk->sk_prot->obj_size);

Here the whole struct tcp_sock is copied as-is, including the ->mc_list
pointer, which refers to refcounted objects. In the end we have two
sockets sharing the same ->mc_list, but the refcounts account for only
one of them.

The solution here is to introduce a new _copy method. The chain is
currently:

inet_csk_clone() -> sk_clone() -> sock_copy()

and it should be:

inet_csk_clone() -> inet_sk_clone() -> sk_clone() -> sock_copy()

where inet_sk_clone() bumps the refcount of every object on ->mc_list.

I'll cook a patch and test now.
-Flavio



Comment 7 Issue Tracker 2007-08-03 21:14:32 UTC
File uploaded: inet_csk_clone-null-mc_list.patch


Comment 8 Issue Tracker 2007-08-03 21:14:35 UTC
The inet_csk_clone-null-mc_list.patch sets ->mc_list = NULL when cloning
the parent sock. I did this just to make sure we are on the right track,
and with it applied I had no more oopses.

Now I have to figure out whether this list should be cloned with the
refcounts bumped, or whether we should just nullify it.
-Flavio



Comment 9 Issue Tracker 2007-08-03 21:14:38 UTC
Updating:

Looking into other protocols, I found this in net/sctp/protocol.c:
 sctp_v4_create_accept_sk() {
...
 newinet = inet_sk(newsk);
...
=> newinet->mc_list = NULL;

I sent the patch upstream, but there is no URL pointing to the
submission yet.
-Flavio




Comment 10 Issue Tracker 2007-08-03 21:14:42 UTC
Upstream submit url:
http://www.mail-archive.com/netdev%40vger.kernel.org/msg43761.html
-Flavio



Comment 11 Issue Tracker 2007-08-03 21:14:44 UTC
Updating:

This is from David Miller's answer:
"...
Multicast subscriptions cannot even be used with TCP and DCCP, which
are the only two users of these connection oriented socket functions.
..." 

and he is right: we are trying to use a connection-oriented socket for
multicast work, and that is wrong.

I'm sending another patch, fixing setsockopt() to return an error when
the socket is connection oriented.

Two things:
1) The customer's application needs to be fixed to use SOCK_DGRAM
(connectionless) instead of SOCK_STREAM (connection oriented).
See these tips:
http://www.cs.unc.edu/~jeffay/dirt/FAQ/comp249-001-F99/mcast-socket.html

2) Fix the kernel to reject connection-oriented sockets there.
- patch sent, waiting for feedback.

Is that ok with you?
-Flavio


Internal Status set to 'Waiting on Support'


Comment 12 Flavio Leitner 2007-08-03 21:22:15 UTC
Hi,

The latest patch I'm trying to push is this:
http://www.mail-archive.com/netdev%40vger.kernel.org/msg43919.html

but I'm still waiting for feedback.
Could it make it into the next update?

-Flavio

Comment 13 Flavio Leitner 2007-08-25 13:49:31 UTC
Patch accepted
http://www.mail-archive.com/netdev%40vger.kernel.org/msg46196.html
-Flavio

Comment 14 Anton Arapov 2007-08-28 10:38:33 UTC
Created attachment 175861 [details]
Proposed patch

patch backported to RHEL4 from upstream.

Comment 16 RHEL Program Management 2008-01-16 16:57:26 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 18 Vivek Goyal 2008-01-23 00:10:27 UTC
Committed in 68.8

Comment 22 errata-xmlrpc 2008-07-24 19:14:58 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html