Bug 212122 - 2.6.9-42.19.ELsmp kernel deadlock when IPv6 address is configured on an unplugged interface
Summary: 2.6.9-42.19.ELsmp kernel deadlock when IPv6 address is configured on an unplu...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
: ---
Assignee: Neil Horman
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2006-10-25 07:58 UTC by Paul Dwyer
Modified: 2007-11-30 22:07 UTC

Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-08 03:54:36 UTC
Target Upstream Version:
Embargoed:


Attachments
patch to fix addrconf deadlock (408 bytes, patch)
2006-10-27 17:31 UTC, Neil Horman
patch to fix double unlock (315 bytes, patch)
2006-11-27 14:18 UTC, Neil Horman


Links
System: Red Hat Product Errata
ID: RHBA-2007:0304
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 5
Last Updated: 2007-04-28 18:58:50 UTC

Description Paul Dwyer 2006-10-25 07:58:39 UTC
Description of problem:
When using the 2.6.9-42.19.ELsmp kernel from
http://people.redhat.com/~jbaron/rhel4/SRPMS.kernel/ the kernel deadlocks
when an IPv6 address is configured on an interface whose cable is unplugged.

Version-Release number of selected component (if applicable):
2.6.9-42.19.ELsmp

How reproducible:


Steps to Reproduce:
[root@jalmari ~]# ip link set eth1 down
[root@jalmari ~]# ip link set eth1 up
[root@jalmari ~]# mii-tool eth1
eth1: no link
[root@jalmari ~]# ip address add 2000::11/64 dev eth1
  
Actual results:
[halt sent]
SysRq : Show Regs

Pid: 3837, comm:                   ip
EIP: 0060:[<c02d39c5>] CPU: 0
EIP is at _spin_lock_bh+0x3c/0x42
EFLAGS: 00000286    Not tainted  (2.6.9-42.19.ELsmp)
EAX: cb552000 EBX: cc82f708 ECX: 9b914cf4 EDX: 0008e365
ESI: cc82f6e0 EDI: cc82f708 EBP: cfe80800 DS: 007b ES: 007b
CR0: 8005003b CR2: 09223004 CR3: 0fd17560 CR4: 000006f0
[<d0acb1e6>] addrconf_dad_stop+0x17/0x90 [ipv6]
[<d0accd6c>] addrconf_dad_start+0x84/0x90 [ipv6]
[<d0acc0e3>] inet6_addr_add+0xa6/0xc0 [ipv6]
[<d0acd3a1>] inet6_rtm_newaddr+0x0/0x5b [ipv6]
[<c02880e3>] rtnetlink_rcv+0x226/0x327
[<c0292b56>] netlink_data_ready+0x14/0x44
[<c0292263>] netlink_sendskb+0x52/0x6c
[<c0292971>] netlink_sendmsg+0x271/0x280
[<c027823d>] sock_sendmsg+0xdb/0xf7
[<c0120519>] autoremove_wake_function+0x0/0x2d
[<c027d3c2>] verify_iovec+0x76/0xc2
[<c0279988>] sys_sendmsg+0x1ee/0x23b
[<c014e82e>] handle_mm_fault+0xdc/0x193
[<c014f55e>] vma_link+0x44/0xbc
[<c0150eca>] do_brk+0x1f0/0x22a
[<c0279d8f>] sys_socketcall+0x1df/0x1fb
[<c02d4cf7>] syscall_call+0x7/0xb

Expected results:


Additional info:

Comment 1 Neil Horman 2006-10-26 11:05:50 UTC
If you still have the system available, could you please provide a sysrq-t from
when the system is hung?  It would be helpful to know which process is
holding the lock that the above backtrace is blocked on.  Thanks!

Comment 3 Neil Horman 2006-10-27 17:23:53 UTC
Sorry, I didn't even look at the code, I assumed that another process was
holding the lock, although, for the record, there is no patch posted to this
bug.  I'll fix it shortly though.
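
Editorial note: the backtrace in the description shows addrconf_dad_start calling addrconf_dad_stop, which then spins in _spin_lock_bh. That is consistent with one call path retaking a non-recursive lock it already holds, so no second process is needed to explain the hang. The attached patch itself is not shown in this report, so the following is only a minimal user-space sketch of that pattern, assuming Python's threading.Lock as a stand-in for the kernel spinlock. The function names dad_stop, dad_start_holding_lock and dad_start_dropping_lock are hypothetical, and dropping the lock before the call is one common remedy, not necessarily what the patch does.

```python
import threading


def dad_stop(addr_lock):
    """Hypothetical callee that takes the address lock itself."""
    # A kernel spinlock would spin forever here; the timeout is only
    # there so that this sketch returns instead of hanging.
    acquired = addr_lock.acquire(timeout=0.1)
    if acquired:
        addr_lock.release()
    return acquired


def dad_start_holding_lock(addr_lock):
    """Calls dad_stop while still holding the lock: self-deadlock."""
    with addr_lock:
        return dad_stop(addr_lock)


def dad_start_dropping_lock(addr_lock):
    """Releases the lock before calling dad_stop, so the callee can take it."""
    addr_lock.acquire()
    # ... work that needs the lock would go here ...
    addr_lock.release()
    return dad_stop(addr_lock)


if __name__ == "__main__":
    lock = threading.Lock()  # non-recursive, like a spinlock
    print(dad_start_holding_lock(lock))   # False: the callee could not get the lock
    print(dad_start_dropping_lock(lock))  # True: the callee got the lock
```

In the kernel there is no timeout, so the first variant never returns, which matches a CPU stuck in _spin_lock_bh as seen in the SysRq output.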


Comment 4 Neil Horman 2006-10-27 17:31:22 UTC
Created attachment 139597
patch to fix addrconf deadlock

Comment 8 Jason Baron 2006-11-03 21:59:29 UTC
committed in stream U5 build 42.23. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/

Although based on comment #7, perhaps we need to revisit this further....

Comment 11 Neil Horman 2006-11-08 15:14:54 UTC
No further movement is needed, really.  The patch has been committed.  Nokia's
observation regarding the double unlock will cause a minor gripe from the lock
validator, but no real problems.  I'm going to clean that up shortly, but as far
as this bug is concerned, the fix is in place.

Comment 15 David Miller 2006-11-24 01:45:51 UTC
I think the double unlock, if it's there, is a real bug.

If we do the first unlock, another cpu grabs the lock, then we do that
second bogus unlock, this allows a third cpu into the critical section
erroneously which will corrupt data.

It's a bug, and it can very well cause corruption, so we should fix it.
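
Editorial note: the interleaving described in this comment can be replayed deterministically. The sketch below assumes a toy test-and-set lock whose unlock does no ownership check; the real 2.6.9 spinlock is architecture-specific and is not reproduced here. ToySpinLock, replay and the cpu0/cpu1/cpu2 labels are all hypothetical names used for illustration.

```python
class ToySpinLock:
    """Toy test-and-set lock; unlock() does not check who holds it."""

    def __init__(self):
        self.locked = False

    def try_lock(self):
        if self.locked:
            return False
        self.locked = True
        return True

    def unlock(self):
        # No ownership check: any caller can clear the flag.
        self.locked = False


def replay(bogus_second_unlock):
    """Replay the interleaving; return who ends up inside the critical section."""
    lock = ToySpinLock()
    inside = set()

    # cpu0 takes the lock, does its work, and unlocks correctly.
    if lock.try_lock():
        inside.add("cpu0")
    inside.discard("cpu0")
    lock.unlock()

    # cpu1 grabs the now-free lock and stays in the critical section.
    if lock.try_lock():
        inside.add("cpu1")

    # cpu0 unlocks a second time even though it no longer holds the lock.
    if bogus_second_unlock:
        lock.unlock()

    # cpu2 tries to enter while cpu1 is still inside.
    if lock.try_lock():
        inside.add("cpu2")

    return sorted(inside)


if __name__ == "__main__":
    print(replay(bogus_second_unlock=False))  # ['cpu1']
    print(replay(bogus_second_unlock=True))   # ['cpu1', 'cpu2']
```

With the extra unlock, two CPUs are inside the critical section at the same time, which is the corruption risk raised above.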


Comment 16 Neil Horman 2006-11-27 13:54:08 UTC
Yeah, after considering it further, I agree.  I'll look at it and post a repo
patch if the double unlock exists.


Comment 17 Neil Horman 2006-11-27 14:18:38 UTC
Created attachment 142170
patch to fix double unlock

I've submitted this patch to fix the double unlock condition.

Comment 18 RHEL Program Management 2006-11-28 03:34:45 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 20 Jason Baron 2006-11-29 22:28:27 UTC
OK, I've integrated the patch from comment #17. A test kernel with this patch is
available from http://people.redhat.com/~jbaron/rhel4/


Comment 22 RHEL Program Management 2006-12-12 16:53:36 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 24 Jay Turner 2007-01-02 13:46:46 UTC
QE ack for RHEL4.5.

Comment 28 Mike Gahagan 2007-03-28 15:44:09 UTC
Both patches are in the -51 kernel, and I set an IPv6 address with the network
down with no hang.


Comment 31 Red Hat Bugzilla 2007-05-08 03:54:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html

