Bug 140328 - Kernel oops (non-terminal) on removal of "blackhole" or "unreachable" type routes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 3
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Dave Jones
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-11-22 12:13 UTC by Timothy Hinchcliffe
Modified: 2015-01-04 22:12 UTC (History)

Fixed In Version: 1.681_FC3
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-11-23 15:22:57 UTC
Type: ---
Embargoed:


Attachments:
Copy of the output of demonstration session. (6.68 KB, text/plain)
2004-11-22 12:16 UTC, Timothy Hinchcliffe

Description Timothy Hinchcliffe 2004-11-22 12:13:06 UTC
Description of problem:
When removing "blackhole" or "unreachable" type routes, e.g. with "ip
route del blackhole 192.168.0.0/16", the kernel oopses as below but
keeps running. Any subsequent "ip route list" command locks up and
does not return.

Version-Release number of selected component (if applicable):
2.6.9-1.667 and 2.6.9-678_FC3

How reproducible:
Every time. Tried on 2 separate installs, on Intel i686 and Athlon
hardware.

Steps to Reproduce:
1. ip route add blackhole 192.168.0.0/16
2. ip route del blackhole 192.168.0.0/16
You should see the kernel oops on the console/in logs here.
3. ip route list
Will not return.
  
Actual results:
Kernel oops, and lockup of route commands.

Expected results:
Silent removal of the specified blackhole route; route commands
continue to work as normal.

Additional info:
Copy of an actual session including Oops attached.
Same results using the old "route" command as opposed to the "ip
route" command.
Same results using the zebra daemon.

Comment 1 Timothy Hinchcliffe 2004-11-22 12:16:26 UTC
Created attachment 107176
Copy of the output of demonstration session.

Here I go through the steps that cause the bug, and record the kernel oops as
logged to syslog.
This is identical to what I see on the console in run level 1.

Comment 2 Timothy Hinchcliffe 2004-11-23 12:27:43 UTC
Line numbers from the source rpm of kernel 2.6.9-678_FC3:
I believe that the Oops is occurring at line 526 of
include/linux/list.h: "*pprev = next;" because pprev is null.
This was inlined at line 166 of net/ipv4/fib_semantics.c:
"hlist_del(&nh->nh_hash);" which is releasing the next hop hash list
entries.

I am guessing that a blackhole route manages to inject an incomplete
hash entry into the nexthops list with pprev set to null somehow.
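
To illustrate the guess above, here is a minimal userspace sketch of the
hlist mechanics. It is a simplified model written for this report, not
the code from include/linux/list.h; the name hlist_del_unchecked is
hypothetical. It shows why a node that was never hashed (pprev == NULL)
cannot be passed safely to an unconditional delete.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace model of the kernel's hlist types. */
struct hlist_node {
    struct hlist_node *next;
    struct hlist_node **pprev;  /* address of the pointer that points at us */
};

struct hlist_head {
    struct hlist_node *first;
};

/* Insert n at the front of chain h; this is what sets n->pprev. */
void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Returns 1 if n was never put on a chain (pprev still NULL). */
int hlist_unhashed(const struct hlist_node *n)
{
    return n->pprev == NULL;
}

/*
 * Hypothetical name: unlink n without any check, modelling the
 * "*pprev = next;" statement quoted above. Calling this on an
 * unhashed node writes through a NULL pointer.
 */
void hlist_del_unchecked(struct hlist_node *n)
{
    struct hlist_node *next = n->next;
    struct hlist_node **pprev = n->pprev;

    *pprev = next;              /* faults if pprev is NULL */
    if (next)
        next->pprev = pprev;
}
```

A node that went through hlist_add_head() can be unlinked safely; a
zero-initialised node reports hlist_unhashed() == 1 and must not reach
hlist_del_unchecked().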

Waiting for a kernel to compile on a very slow machine to confirm this
through printk...

Comment 3 Timothy Hinchcliffe 2004-11-23 14:38:52 UTC
I think the cause of the problem is lines 742,743 of
net/ipv4/fib_semantics.c in fib_create_info():
>if (!nh->nh_dev)
>   continue;
Basically, if the nh_dev member of the next hop structure is not set,
then nh_hash is never initialised, so its pprev is still null.

If this is a valid scenario, the hlist_del() call needs to be guarded
by a check of nh_dev, so that __hlist_del() only runs if it is
non-null. Otherwise the continue should become some sort of error and
the cause of an invalid nh_dev tracked down. Or alternatively, the
nh_hash needs to be initialized into a no-device type chain.
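
A rough sketch of the first option (guarding the delete on the device
pointer), again as a userspace model. The struct nexthop_model and the
model_* functions are hypothetical names invented for illustration;
this is not the patch that went into the kernel.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/*
 * Hypothetical stand-in for a next hop entry. dev == NULL models a
 * blackhole/unreachable route, which has no output device.
 */
struct nexthop_model {
    const char *dev;
    struct hlist_node hash;
};

/* Hash the next hop, skipping device-less entries as the quoted code does. */
void model_hash_nexthop(struct nexthop_model *nh, struct hlist_head *chain)
{
    struct hlist_node *first = chain->first;

    if (!nh->dev)
        return;                 /* mirrors the "continue": never hashed */

    nh->hash.next = first;
    if (first)
        first->pprev = &nh->hash.next;
    chain->first = &nh->hash;
    nh->hash.pprev = &chain->first;
}

/*
 * Release the next hop. Returns 1 if it was unlinked, 0 if it was
 * skipped because it was never hashed in the first place.
 */
int model_release_nexthop(struct nexthop_model *nh)
{
    if (!nh->dev)
        return 0;               /* the guard proposed above */

    *nh->hash.pprev = nh->hash.next;
    if (nh->hash.next)
        nh->hash.next->pprev = nh->hash.pprev;
    nh->hash.next = NULL;
    nh->hash.pprev = NULL;
    return 1;
}
```

The guard mirrors the condition used when hashing, so add and release
stay symmetric: whatever was skipped on the way in is skipped on the
way out.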

I *think* this affects the stock 2.6.9 kernel as well. I am unable to
verify that though.

Comment 4 Timothy Hinchcliffe 2004-11-23 15:22:57 UTC
Fixed in kernel-2.6.9-1.681_FC3!

I had just figured out the patch as well!

