Bug 444215 - race in inetdev_init causes system crash
Summary: race in inetdev_init causes system crash
Status: CLOSED DUPLICATE of bug 456653
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.6
Hardware: All
OS: Linux
Target Milestone: rc
Assignee: Ivan Vecera
QA Contact: Martin Jenner
Depends On:
Blocks: 461297
Reported: 2008-04-25 19:52 UTC by Fabio Olive Leite
Modified: 2009-04-06 20:58 UTC

Clone Of:
Last Closed: 2009-04-06 20:58:44 UTC

Attachments
Final patch sent to review (2.17 KB, patch)
2008-10-14 14:41 UTC, Ivan Vecera

Description Fabio Olive Leite 2008-04-25 19:52:11 UTC
Description of problem:

It is possible to crash the system if a multicast packet arrives at just the
right time while the first inet address is being added to an interface. The
pointer to the in_device structure is attached to the net_device structure
before in_device is fully initialized. This can happen on both IPv4 and IPv6.

This has been addressed upstream in commit
30c4cf577fb5b68c16e5750d6bdbd7072e42b279. It has already been fixed in RHEL-5,
as part of another bug.
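
To make the ordering problem concrete, here is a toy Python model of it. This is
only a sketch, not the kernel code and not a rendering of the upstream patch:
the classes, the check_mc() helper and the on_window hook are made up for
illustration, and the hook stands in for a multicast packet arriving in
interrupt context at the worst possible moment. The only names borrowed from the
kernel are ip_ptr and ip_mc_init_dev(), which the comments refer to.

```python
import threading


class InDevice:
    """Toy stand-in for struct in_device (hypothetical, heavily simplified)."""

    def __init__(self):
        self.mc_lock = None  # models the not-yet-initialized multicast lock
        self.mc_list = None

    def mc_init(self):
        # Models what ip_mc_init_dev() does: set up multicast lock and list.
        self.mc_lock = threading.Lock()
        self.mc_list = []


class NetDevice:
    """Toy stand-in for struct net_device."""

    def __init__(self):
        self.ip_ptr = None  # models net_device->ip_ptr


def check_mc(dev, group):
    """Models the receive path looking up a multicast group on a device."""
    in_dev = dev.ip_ptr
    if in_dev is None:
        return False  # no in_device attached yet: packet is simply ignored
    if in_dev.mc_lock is None:
        raise RuntimeError("BUG: multicast lock used before initialization")
    with in_dev.mc_lock:
        return group in in_dev.mc_list


def init_buggy(dev, on_window=None):
    in_dev = InDevice()
    dev.ip_ptr = in_dev  # published before it is fully initialized
    if on_window:
        on_window()      # a packet arriving here sees a half-built in_device
    in_dev.mc_init()


def init_fixed(dev, on_window=None):
    in_dev = InDevice()
    in_dev.mc_init()     # finish initialization first
    if on_window:
        on_window()      # a packet arriving here sees no in_device at all
    dev.ip_ptr = in_dev  # publish last


if __name__ == "__main__":
    group = "224.0.0.1"  # arbitrary example group for the toy model
    dev = NetDevice()
    try:
        init_buggy(dev, lambda: check_mc(dev, group))
    except RuntimeError as err:
        print("buggy ordering:", err)
    dev = NetDevice()
    init_fixed(dev, lambda: print("fixed ordering:", check_mc(dev, group)))
```

In the buggy ordering the simulated packet trips over the uninitialized lock; in
the other ordering it finds no in_device and is ignored. A real fix also has to
care about memory ordering between CPUs, which this model does not attempt to
show.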

A beautiful stack trace from one such crash is below. Notice how a multicast
packet comes in before inetdev_init calls ip_mc_init_dev(). In the packet
reception callchain, ip_check_mc then hits a BUG() in _read_lock() because the
locks for the multicast information are still uninitialized.

Kernel BUG at spinlock:172
invalid operand: 0000 [1] SMP
Modules linked in: dm_mirror dm_mod hw_random nfs_acl sunrpc sd_mod scsi_mod
ext3 jbd
Pid: 2031, comm: ip    2.6.9-42.ELsmp
RIP: 0010:[<ffffffff8030b11d>] <ffffffff8030b11d>{_read_lock+12}
RSP: 0018:ffffffff80456c80  EFLAGS: 00010213
RAX: 000001020acd2218 RBX: 0000000000000000 RCX: 0000000000000070
RDX: 000000001e1b8387 RSI: 00000000120000e0 RDI: 000001020acd2218
RBP: 000001020acd2200 R08: ffffffffa01466a0 R09: 0000000000000246
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000120000e0
R13: 0000000000000070 R14: 000000001e1b8387 R15: ffffffffa01466a0
FS:  0000002a95574e00(0000) GS:ffffffff804e5080(0000) knlGS:00000000f7fdf6c0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000620008 CR3: 0000000000101000 CR4: 00000000000006e0
Process ip (pid: 2031, threadinfo 000001020b010000, task 00000101079e47f0)
Stack: ffffffff802f2e9c 0000000000000002 000001010a592c00 00000000120000e0
     000001020acd2200 000000001e1b8387 ffffffff802c68a0 000001010a593800
     ffffffffa01281b9 0000000000000001
Call Trace:<IRQ>

Code: 0f 0b 6f 4d 32 80 ff ff ff ff ac 00 f0 83 28 01 0f 88 93 02
RIP <ffffffff8030b11d>{_read_lock+12} RSP <ffffffff80456c80>
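
As a side note on reading the register dump above: RSI and R12 both hold
00000000120000e0. Assuming (and this is only an assumption, since the call trace
itself is missing here) that this is the destination group passed down the
receive path as a network-order 32-bit value on a little-endian x86_64 box, it
can be decoded with a short Python helper. The function name reg_to_ipv4 is made
up for this sketch.

```python
import socket
import struct


def reg_to_ipv4(value):
    """Decode the low 32 bits of a little-endian register that holds a
    network-order IPv4 address into dotted-quad notation."""
    raw = struct.pack("<I", value & 0xFFFFFFFF)  # bytes as laid out in memory
    return socket.inet_ntoa(raw)


if __name__ == "__main__":
    print(reg_to_ipv4(0x120000E0))  # prints 224.0.0.18
```

Under that assumption the value falls in the 224.0.0.0/4 multicast range, which
would be consistent with the crash being triggered by an incoming multicast
packet.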

Version-Release number of selected component (if applicable):

Customer hit 2.6.9-42.EL, but the code on the latest RHEL-4 kernel I could find
is still the same.

How reproducible:

It's a race condition, so it depends on hitting just the right timing. A
customer has hit it, and I think I was able to crash a box once when trying to
reproduce. Unfortunately the remote console did not get any messages. The box
did lock up though. :)

Steps to Reproduce:

A suggestion to reproduce (might still need some tuning) is to have two boxes on
the same network, one sending multicast packets (using a generic group like (all hosts)) and the other constantly removing and adding the NIC's IP
address. I used the two scripts below to try and automate it.

- send-multi.py:

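import socket

# Destination multicast group. The address is missing from this report,
# so fill it in before running.
GROUP = ""
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1) # needed?
while True:
  # Send as fast as possible to widen the race window on the receiver.
  s.sendto(b"bang!\n", (GROUP, 12345))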

- flip-addr.sh:

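#!/bin/bash
# Usage: flip-addr.sh ADDRESS
while :; do
  # Remove and re-add the address 1000 times in a row.
  for ((a = 0; a < 1000; a++)); do
    ip addr del "$1" dev eth0
    ip addr add "$1" dev eth0
    echo -n .
  done
  echo Waiting 3 seconds to try again
  sleep 3
done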

Check the IP address of eth0 on the box and feed it to flip-addr.sh as the only
parameter, in a form the ip command accepts, like "".
The Python script can just be left running on any box on the same network segment.

Fabio Olive

Comment 3 RHEL Product and Program Management 2008-09-03 13:05:08 UTC
Updating PM score.

Comment 5 Ivan Vecera 2008-09-29 15:57:26 UTC
I have prepared test kernel packages for i686 and x86_64. Could anybody test them?
They are available at:

Comment 6 Ivan Vecera 2008-10-14 14:41:37 UTC
Created attachment 320305
Final patch sent to review

Comment 9 Linda Wang 2009-04-06 20:58:44 UTC

*** This bug has been marked as a duplicate of bug 456653 ***
