Bug 444215 - race in inetdev_init causes system crash
Status: CLOSED DUPLICATE of bug 456653
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assigned To: Ivan Vecera
QA Contact: Martin Jenner
Depends On:
Blocks: 461297
Reported: 2008-04-25 15:52 EDT by Fabio Olive Leite
Modified: 2009-04-06 16:58 EDT (History)
Doc Type: Bug Fix
Last Closed: 2009-04-06 16:58:44 EDT

Attachments
Final patch sent to review (2.17 KB, patch)
2008-10-14 10:41 EDT, Ivan Vecera

Description Fabio Olive Leite 2008-04-25 15:52:11 EDT
Description of problem:

It is possible to crash the system when adding the first inet address to an
interface and a multicast packet comes in just at the right time. The pointer to
the in_device structure is attached to the net_device structure before in_device
is fully initialized. Can happen on both IPv4 and IPv6.

This has been addressed upstream on commit
30c4cf577fb5b68c16e5750d6bdbd7072e42b279. It has already been fixed on RHEL-5,
as part of another bug.
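
The ordering problem described above can be modelled in a few lines of userspace Python. This is only an illustrative sketch, not the kernel code and not the upstream patch: the class names, the `mc_lock` field and the `on_window` hook are hypothetical stand-ins, and the hook simply simulates a multicast packet arriving at the worst possible moment during initialization.

```python
import threading


class InDevice:
    """Toy stand-in for the kernel's in_device (hypothetical model)."""

    def __init__(self):
        self.mc_lock = None  # multicast lock, not yet initialized

    def init_multicast(self):
        # Plays the role of ip_mc_init_dev(): sets up the multicast lock.
        self.mc_lock = threading.Lock()


class NetDevice:
    """Toy stand-in for net_device; ip_ptr is how receivers find in_device."""

    def __init__(self):
        self.ip_ptr = None


def receive_multicast(dev):
    """Model of the receive path: returns 'ignored', 'ok' or 'BUG'."""
    in_dev = dev.ip_ptr
    if in_dev is None:
        return "ignored"  # no inet configuration visible yet
    if in_dev.mc_lock is None:
        return "BUG"      # lock used before it was initialized
    with in_dev.mc_lock:
        return "ok"


def init_publish_first(dev, on_window):
    """Problematic ordering: the pointer is visible before the lock exists."""
    in_dev = InDevice()
    dev.ip_ptr = in_dev
    result = on_window(dev)  # simulated packet arriving inside the window
    in_dev.init_multicast()
    return result


def init_publish_last(dev, on_window):
    """Safer ordering: finish initialization, then publish the pointer."""
    in_dev = InDevice()
    in_dev.init_multicast()
    result = on_window(dev)  # the same packet now sees no in_device at all
    dev.ip_ptr = in_dev
    return result


print(init_publish_first(NetDevice(), receive_multicast))  # BUG
print(init_publish_last(NetDevice(), receive_multicast))   # ignored
```

In the model, a receiver that runs after `init_publish_last` has returned finds a fully initialized structure, so it never observes the half-initialized state.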

A beautiful stack trace from one such crash is below. Notice how a multicast
packet comes in before inetdev_init calls ip_mc_init_dev(). In the packet
reception callchain, ip_check_mc then hits a BUG() in _read_lock() because the
locks for the multicast information are still uninitialized.

Kernel BUG at spinlock:172
invalid operand: 0000 [1] SMP
Modules linked in: dm_mirror dm_mod hw_random nfs_acl sunrpc sd_mod scsi_mod
ext3 jbd
Pid: 2031, comm: ip    2.6.9-42.ELsmp
RIP: 0010:[<ffffffff8030b11d>] <ffffffff8030b11d>{_read_lock+12}
RSP: 0018:ffffffff80456c80  EFLAGS: 00010213
RAX: 000001020acd2218 RBX: 0000000000000000 RCX: 0000000000000070
RDX: 000000001e1b8387 RSI: 00000000120000e0 RDI: 000001020acd2218
RBP: 000001020acd2200 R08: ffffffffa01466a0 R09: 0000000000000246
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000120000e0
R13: 0000000000000070 R14: 000000001e1b8387 R15: ffffffffa01466a0
FS:  0000002a95574e00(0000) GS:ffffffff804e5080(0000) knlGS:00000000f7fdf6c0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000620008 CR3: 0000000000101000 CR4: 00000000000006e0
Process ip (pid: 2031, threadinfo 000001020b010000, task 00000101079e47f0)
Stack: ffffffff802f2e9c 0000000000000002 000001010a592c00 00000000120000e0
     000001020acd2200 000000001e1b8387 ffffffff802c68a0 000001010a593800
     ffffffffa01281b9 0000000000000001
Call Trace:<IRQ>

Code: 0f 0b 6f 4d 32 80 ff ff ff ff ac 00 f0 83 28 01 0f 88 93 02
RIP <ffffffff8030b11d>{_read_lock+12} RSP <ffffffff80456c80>

Version-Release number of selected component (if applicable):

Customer hit 2.6.9-42.EL, but the code on the latest RHEL-4 kernel I could find
is still the same.

How reproducible:

It's a race condition, so it depends on hitting just the right timing. A
customer has hit it, and I think I was able to crash a box once when trying to
reproduce. Unfortunately the remote console did not get any messages. The box
did lock up though. :)

Steps to Reproduce:

A suggestion to reproduce (might still need some tuning) is to have two boxes on
the same network, one sending multicast packets (using a generic group like
(all hosts)) and the other constantly removing and adding the NIC's IP
address. I used the two scripts below to try and automate it.

- send-multi.py:

import socket
import time
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1) # needed?
while 1:
  s.sendto("bang!\n", ("", 12345))

- flip-addr.sh:

while :; do
  for ((a = 0; a < 1000; a++)); do
    ip addr del "$1" dev eth0
    ip addr add "$1" dev eth0
    echo -n .
  done
  echo Waiting 3 seconds to try again
  sleep 3
done

Check the IP address of eth0 on the box and feed it to flip-addr.sh as the only
parameter, in a way the ip command would like to get it, like "".
The python script can just be left running on any box on the same network segment.
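
For anyone retrying this on a newer Python, a sender along the same lines might look like the sketch below. It is an assumption-laden variant, not the script that was actually run: the function names are made up for illustration, the group address is deliberately left as a parameter because the address used is not shown in this report, and the multicast TTL option is an addition that send-multi.py does not set.

```python
import socket


def make_multicast_sender(ttl=1):
    """Create a UDP socket for sending multicast datagrams (hypothetical helper)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # TTL 1 keeps the packets on the local network segment.
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return s


def send_burst(sock, group, port, count):
    """Send `count` small datagrams to (group, port); returns how many were sent."""
    payload = b"bang!\n"
    sent = 0
    for _ in range(count):
        sock.sendto(payload, (group, port))
        sent += 1
    return sent
```

Calling `send_burst(make_multicast_sender(), group, 12345, count)` in a loop, with `group` set to the multicast address of choice, approximates the endless `while 1:` loop of the original script.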

Fabio Olive
Comment 3 RHEL Product and Program Management 2008-09-03 09:05:08 EDT
Updating PM score.
Comment 5 Ivan Vecera 2008-09-29 11:57:26 EDT
I have prepared test kernel packages for i686 and x86_64. Could anybody test them?
They are available at:
Comment 6 Ivan Vecera 2008-10-14 10:41:37 EDT
Created attachment 320305 [details]
Final patch sent to review
Comment 9 Linda Wang 2009-04-06 16:58:44 EDT

*** This bug has been marked as a duplicate of bug 456653 ***
