Bug 444215 - race in inetdev_init causes system crash
Status: CLOSED DUPLICATE of bug 456653
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.6
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Ivan Vecera
QA Contact: Martin Jenner
Blocks: 461297
Reported: 2008-04-25 15:52 EDT by Fabio Olive Leite
Modified: 2009-04-06 16:58 EDT
CC: 4 users

Doc Type: Bug Fix
Last Closed: 2009-04-06 16:58:44 EDT

Attachments
Final patch sent to review (2.17 KB, patch)
2008-10-14 10:41 EDT, Ivan Vecera
Description Fabio Olive Leite 2008-04-25 15:52:11 EDT
Description of problem:

It is possible to crash the system when the first inet address is added to an
interface and a multicast packet arrives at just the right time. The pointer to
the in_device structure is attached to the net_device structure before the
in_device is fully initialized. This can happen with both IPv4 and IPv6.

This has been addressed upstream in commit
30c4cf577fb5b68c16e5750d6bdbd7072e42b279. It has already been fixed in RHEL-5
as part of another bug.
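The ordering problem can be illustrated outside the kernel. The sketch below is a minimal, hypothetical Python model (not kernel code): the InDevice class, the dictionary standing in for net_device, and the function names are all invented for illustration. It deterministically replays the bad interleaving, where a "packet" is handled between publishing the in_device pointer and initializing its multicast state, and contrasts it with the upstream fix, which initializes before publishing.

```python
# Hypothetical sketch of the inetdev_init race; names are illustrative only.
class InDevice:
    def __init__(self):
        # Stands in for the multicast rwlock that ip_mc_init_dev() sets up.
        self.mc_lock_initialized = False

def receive_multicast(net_device):
    """Models the ip_rcv -> ip_check_mc path for an incoming multicast packet."""
    in_dev = net_device.get("ip_ptr")
    if in_dev is None:
        return "no in_device yet, packet handled safely"
    if not in_dev.mc_lock_initialized:
        # Corresponds to the BUG() in _read_lock() on RHEL-4.
        return "BUG: read_lock on uninitialized lock"
    return "ok"

def inetdev_init_buggy(net_device):
    in_dev = InDevice()
    net_device["ip_ptr"] = in_dev            # pointer published too early
    result = receive_multicast(net_device)   # "interrupt" fires right here
    in_dev.mc_lock_initialized = True        # ip_mc_init_dev() runs too late
    return result

def inetdev_init_fixed(net_device):
    in_dev = InDevice()
    in_dev.mc_lock_initialized = True        # initialize first...
    net_device["ip_ptr"] = in_dev            # ...then publish (upstream ordering)
    return receive_multicast(net_device)

print(inetdev_init_buggy({}))  # BUG: read_lock on uninitialized lock
print(inetdev_init_fixed({}))  # ok
```

The fix is purely an ordering change: nothing new is computed, the pointer is simply made visible to the receive path only after every field it guards is ready.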

A beautiful stack trace from one such crash is below. Notice how a multicast
packet comes in before inetdev_init calls ip_mc_init_dev(). In the packet
reception call chain, ip_check_mc then hits a BUG() in _read_lock() because the
locks for the multicast information are still uninitialized.

Kernel BUG at spinlock:172
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: dm_mirror dm_mod hw_random nfs_acl sunrpc sd_mod scsi_mod
ext3 jbd
Pid: 2031, comm: ip    2.6.9-42.ELsmp
RIP: 0010:[<ffffffff8030b11d>] <ffffffff8030b11d>{_read_lock+12}
RSP: 0018:ffffffff80456c80  EFLAGS: 00010213
RAX: 000001020acd2218 RBX: 0000000000000000 RCX: 0000000000000070
RDX: 000000001e1b8387 RSI: 00000000120000e0 RDI: 000001020acd2218
RBP: 000001020acd2200 R08: ffffffffa01466a0 R09: 0000000000000246
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000120000e0
R13: 0000000000000070 R14: 000000001e1b8387 R15: ffffffffa01466a0
FS:  0000002a95574e00(0000) GS:ffffffff804e5080(0000) knlGS:00000000f7fdf6c0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000620008 CR3: 0000000000101000 CR4: 00000000000006e0
Process ip (pid: 2031, threadinfo 000001020b010000, task 00000101079e47f0)
Stack: ffffffff802f2e9c 0000000000000002 000001010a592c00 00000000120000e0
     000001020acd2200 000000001e1b8387 ffffffff802c68a0 000001010a593800
     ffffffffa01281b9 0000000000000001
Call Trace:<IRQ>
     <ffffffff802f2e9c>{ip_check_mc+31}
     <ffffffff802c68a0>{ip_route_input+246}
     <ffffffff802c96a2>{ip_rcv+537}
     <ffffffff802b066b>{netif_receive_skb+791}
     <ffffffff802b0730>{process_backlog+136}
     <ffffffff802b0884>{net_rx_action+203}
     <ffffffff8013c738>{__do_softirq+88}
     <ffffffff8013c7e1>{do_softirq+49}
     <ffffffff80113247>{do_IRQ+328}
     <ffffffff80110833>{ret_from_intr+0}
     <EOI>
     <ffffffff80173ed0>{alloc_pages_current+24}
     <ffffffff8015e141>{__get_free_pages+11}
     <ffffffff8016127c>{kmem_getpages+36}
     <ffffffff80161a11>{cache_alloc_refill+609}
     <ffffffff801616df>{__kmalloc+123}
     <ffffffff801ad805>{proc_create+110}
     <ffffffff801ad723>{proc_register+157}
     <ffffffff801ad8b3>{create_proc_entry+93}
     <ffffffff8013db09>{register_proc_table+180}
     <ffffffff8013db44>{register_proc_table+239}
     <ffffffff8013db44>{register_proc_table+239}
     <ffffffff8013db44>{register_proc_table+239}
     <ffffffff8013db44>{register_proc_table+239}
     <ffffffff8013dc12>{register_sysctl_table+184}
     <ffffffff802edd32>{devinet_sysctl_register+278}
     <ffffffff802ede6b>{inetdev_init+273}
     <ffffffff802edf3d>{inet_rtm_newaddr+165}
     <ffffffff802b74c5>{rtnetlink_rcv+602}
     <ffffffff802c3bbb>{netlink_data_ready+22}
     <ffffffff802c33b1>{netlink_sendskb+113}
     <ffffffff802c3b90>{netlink_sendmsg+694}
     <ffffffff802a7143>{sock_sendmsg+271}
     <ffffffff80135752>{autoremove_wake_function+0}
     <ffffffff802a8ab3>{sys_sendmsg+463}
     <ffffffff801ce6c8>{capable+24}
     <ffffffff80123ed3>{do_page_fault+575}
     <ffffffff8016c6cd>{do_brk+573}
     <ffffffff8011026a>{system_call+126}

Code: 0f 0b 6f 4d 32 80 ff ff ff ff ac 00 f0 83 28 01 0f 88 93 02
RIP <ffffffff8030b11d>{_read_lock+12} RSP <ffffffff80456c80>

Version-Release number of selected component (if applicable):

A customer hit it on 2.6.9-42.EL, but the code in the latest RHEL-4 kernel I
could find is still the same.

How reproducible:

It's a race condition, so it depends on hitting just the right timing. A
customer has hit it, and I think I was able to crash a box once while trying to
reproduce it. Unfortunately the remote console did not get any messages. The
box did lock up, though. :)

Steps to Reproduce:

A suggested reproducer (which might still need some tuning) is to have two
boxes on the same network, one sending multicast packets (to a generic group
like 224.0.0.1, all hosts) and the other constantly removing and adding the
NIC's IP address. I used the two scripts below to try to automate it.

- send-multi.py:

#!/usr/bin/python
import socket
import time

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)  # probably not needed for multicast
while True:
    s.sendto(b"bang!\n", ("224.0.0.1", 12345))
    time.sleep(1)

- flip-addr.sh:

#!/bin/bash
# Usage: flip-addr.sh <addr/prefix>, e.g. 192.168.1.12/24
while :; do
  for ((a = 0; a < 1000; a++)); do
    ip addr del "$1" dev eth0
    ip addr add "$1" dev eth0
    echo -n .
  done
  echo
  echo "Waiting 3 seconds to try again"
  sleep 3
done

Check the IP address of eth0 on the box and feed it to flip-addr.sh as its only
parameter, in the form the ip command expects, e.g. "192.168.1.12/24". The
Python script can be left running on any box on the same network segment.
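If you don't want to copy the address by hand, it can be pulled out of the
output of "ip -4 -o addr show dev eth0". The snippet below is a hypothetical
helper (not part of the original reproducer); the parsing function and the
sample output line are illustrative assumptions about the one-line format
iproute2 prints.

```python
#!/usr/bin/python
# Hypothetical helper: extract the first addr/prefix field from one line of
# `ip -4 -o addr show dev eth0` output, in the form flip-addr.sh expects.
def first_inet_addr(line):
    fields = line.split()
    # Fields look like: "2: eth0 inet 192.168.1.12/24 brd ..." -> take the
    # token right after "inet".
    if "inet" in fields:
        return fields[fields.index("inet") + 1]
    return None

# Example with a sample output line:
sample = "2: eth0    inet 192.168.1.12/24 brd 192.168.1.255 scope global eth0"
print(first_inet_addr(sample))  # 192.168.1.12/24
```

The result can then be passed straight to flip-addr.sh, e.g.
./flip-addr.sh "$(ip -4 -o addr show dev eth0 | awk '{print $4; exit}')".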

Cheers,
Fabio Olive
Comment 3 RHEL Product and Program Management 2008-09-03 09:05:08 EDT
Updating PM score.
Comment 5 Ivan Vecera 2008-09-29 11:57:26 EDT
I have prepared test kernel packages for i686 and x86_64. Could anybody test them?
They are available at:
http://people.redhat.com/ivecera/rhel-4-ivtest/
Comment 6 Ivan Vecera 2008-10-14 10:41:37 EDT
Created attachment 320305 [details]
Final patch sent to review
Comment 9 Linda Wang 2009-04-06 16:58:44 EDT

*** This bug has been marked as a duplicate of bug 456653 ***
