212122 – 2.6.9-42.19.ELsmp kernel deadlock when IPv6 address is configured on an unplugged interface

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Neil Horman
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-10-25 07:58 UTC by Paul Dwyer
Modified:	2007-11-30 22:07 UTC
CC List:	4 users
Fixed In Version:	RHBA-2007-0304
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-05-08 03:54:36 UTC
Target Upstream Version:
Embargoed:

Attachments
patch to fix addrconf deadlock (408 bytes, patch) 2006-10-27 17:31 UTC, Neil Horman
patch to fix double unlock (315 bytes, patch) 2006-11-27 14:18 UTC, Neil Horman

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2007:0304	0	normal	SHIPPED_LIVE	Updated kernel packages available for Red Hat Enterprise Linux 4 Update 5	2007-04-28 18:58:50 UTC

Description Paul Dwyer 2006-10-25 07:58:39 UTC

Description of problem:
When using the 2.6.9-42.19.ELsmp kernel from
http://people.redhat.com/~jbaron/rhel4/SRPMS.kernel/ the kernel deadlocks
when an IPv6 address is configured on an interface whose cable is unplugged.

Version-Release number of selected component (if applicable):
2.6.9-42.19.ELsmp

How reproducible:


Steps to Reproduce:
[root@jalmari ~]# ip link set eth1 down
[root@jalmari ~]# ip link set eth1 up
[root@jalmari ~]# mii-tool eth1
eth1: no link
[root@jalmari ~]# ip address add 2000::11/64 dev eth1
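
The steps above can be collected into a small script for repeated testing. This is only a sketch: the function names `repro_commands` and `run_repro` are made up for illustration, the interface name and address are the ones from this report, and it assumes `ip` and `mii-tool` are installed and that the script is run as root when `dry_run` is off. By default it only prints the commands.

```python
import subprocess


def repro_commands(iface="eth1", addr="2000::11/64"):
    """Return the command sequence from the report as argv lists."""
    return [
        ["ip", "link", "set", iface, "down"],
        ["ip", "link", "set", iface, "up"],
        ["mii-tool", iface],  # expected to report "no link" when unplugged
        ["ip", "address", "add", addr, "dev", iface],
    ]


def run_repro(iface="eth1", addr="2000::11/64", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in repro_commands(iface, addr):
        print("+", " ".join(argv))
        if not dry_run:
            # check=False: keep going even if a command exits non-zero.
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    run_repro()  # dry run: prints the four commands without running them
```

On an affected kernel the final `ip address add` is the command that hangs, so run the non-dry-run mode only on a machine that can be reset.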
  
Actual results:
[halt sent]
SysRq : Show Regs

Pid: 3837, comm:                   ip
EIP: 0060:[<c02d39c5>] CPU: 0
EIP is at _spin_lock_bh+0x3c/0x42
EFLAGS: 00000286    Not tainted  (2.6.9-42.19.ELsmp)
EAX: cb552000 EBX: cc82f708 ECX: 9b914cf4 EDX: 0008e365
ESI: cc82f6e0 EDI: cc82f708 EBP: cfe80800 DS: 007b ES: 007b
CR0: 8005003b CR2: 09223004 CR3: 0fd17560 CR4: 000006f0
[<d0acb1e6>] addrconf_dad_stop+0x17/0x90 [ipv6]
[<d0accd6c>] addrconf_dad_start+0x84/0x90 [ipv6]
[<d0acc0e3>] inet6_addr_add+0xa6/0xc0 [ipv6]
[<d0acd3a1>] inet6_rtm_newaddr+0x0/0x5b [ipv6]
[<c02880e3>] rtnetlink_rcv+0x226/0x327
[<c0292b56>] netlink_data_ready+0x14/0x44
[<c0292263>] netlink_sendskb+0x52/0x6c
[<c0292971>] netlink_sendmsg+0x271/0x280
[<c027823d>] sock_sendmsg+0xdb/0xf7
[<c0120519>] autoremove_wake_function+0x0/0x2d
[<c027d3c2>] verify_iovec+0x76/0xc2
[<c0279988>] sys_sendmsg+0x1ee/0x23b
[<c014e82e>] handle_mm_fault+0xdc/0x193
[<c014f55e>] vma_link+0x44/0xbc
[<c0150eca>] do_brk+0x1f0/0x22a
[<c0279d8f>] sys_socketcall+0x1df/0x1fb
[<c02d4cf7>] syscall_call+0x7/0xb

Expected results:


Additional info:

Comment 1 Neil Horman 2006-10-26 11:05:50 UTC

If you still have the system available, could you please provide a sysrq-t from
when the system is hung?  It would be helpful to know which process is
holding the semaphore that the above backtrace is blocked on.  Thanks!

Comment 3 Neil Horman 2006-10-27 17:23:53 UTC

Sorry, I didn't even look at the code, I assumed that another process was
holding the lock, although, for the record, there is no patch posted to this
bug.  I'll fix it shortly though.
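
The backtrace in the description shows addrconf_dad_start calling addrconf_dad_stop, which then spins in _spin_lock_bh. Together with the comment above, that suggests the same code path already held the lock it was waiting for. The following is a user-space model of that pattern, not the kernel code: `ifp_lock`, `dad_stop`, `dad_start_buggy` and `dad_start_fixed` are hypothetical names, and Python's non-reentrant `threading.Lock` stands in for the spinlock. The "fixed" variant shows one possible restructuring; the attached patch is not reproduced in this report, so this is not a statement of what it changes.

```python
import threading

# Stand-in for the lock; threading.Lock is non-reentrant, like a spinlock.
ifp_lock = threading.Lock()


def dad_stop(timeout=0.2):
    """Model of the callee: it takes the lock itself.

    A real spinlock would spin forever; the timeout lets the model
    report the deadlock (returns False) instead of hanging.
    """
    acquired = ifp_lock.acquire(timeout=timeout)
    if acquired:
        ifp_lock.release()
    return acquired


def dad_start_buggy():
    """Calls dad_stop() while still holding the lock."""
    with ifp_lock:
        return dad_stop()


def dad_start_fixed():
    """Drops the lock before calling dad_stop()."""
    ifp_lock.acquire()
    # ... work that needs the lock would go here ...
    ifp_lock.release()
    return dad_stop()


if __name__ == "__main__":
    print("buggy path got the lock:", dad_start_buggy())
    print("fixed path got the lock:", dad_start_fixed())
```

In the buggy path the inner acquire can never succeed because the only holder is the caller itself, which matches a single process spinning with no other lock owner to find in a sysrq-t.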

Comment 4 Neil Horman 2006-10-27 17:31:22 UTC

Created attachment 139597 [details]
patch to fix addrconf deadlock

Comment 8 Jason Baron 2006-11-03 21:59:29 UTC

committed in stream U5 build 42.23. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/

Although based on comment #7, perhaps we need to revisit this further....

Comment 11 Neil Horman 2006-11-08 15:14:54 UTC

No further movement is needed, really.  The patch has been committed.  Nokia's
observation regarding the double unlock will cause a minor gripe from the lock
validator, but no real problems.  I'm going to clean that up shortly, but as far
as this bug is concerned, the fix is in place.

Comment 15 David Miller 2006-11-24 01:45:51 UTC

I think the double unlock, if it's there, is a real bug.

If we do the first unlock, another cpu grabs the lock, then we do that
second bogus unlock, this allows a third cpu into the critical section
erroneously which will corrupt data.

It's a bug, and it can very well cause corruption, so we should fix it.
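
The interleaving described above can be replayed deterministically in a small model. This is a sketch under stated assumptions: `NaiveSpinLock` and `replay_double_unlock` are hypothetical names, and the lock is modelled as a single flag with no owner tracking, so an unlock by a CPU that does not hold the lock simply clears the flag.

```python
class NaiveSpinLock:
    """Model of a simple spinlock: one flag, no record of the owner."""

    def __init__(self):
        self.locked = False

    def try_lock(self):
        if self.locked:
            return False
        self.locked = True
        return True

    def unlock(self):
        # Nothing checks that the caller actually holds the lock.
        self.locked = False


def replay_double_unlock():
    """Replay the three-CPU sequence; return who is in the critical section."""
    lock = NaiveSpinLock()
    in_critical = set()

    lock.try_lock()          # CPU A takes the lock
    lock.unlock()            # CPU A: first, legitimate unlock
    if lock.try_lock():      # CPU B grabs the now-free lock
        in_critical.add("B")
    lock.unlock()            # CPU A: second, bogus unlock while B holds it
    if lock.try_lock():      # CPU C gets in although B never unlocked
        in_critical.add("C")
    return in_critical


if __name__ == "__main__":
    print(sorted(replay_double_unlock()))
```

At the end of the replay both B and C believe they hold the lock, which is the situation in which shared data can be corrupted.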

Comment 16 Neil Horman 2006-11-27 13:54:08 UTC

Yeah, after considering it further, I agree.  I'll look at it and post a repo
patch if the double unlock exists.

Comment 17 Neil Horman 2006-11-27 14:18:38 UTC

Created attachment 142170 [details]
patch to fix double unlock

I've submitted this patch to fix the double unlock condition.

Comment 18 RHEL Program Management 2006-11-28 03:34:45 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 20 Jason Baron 2006-11-29 22:28:27 UTC

OK, I've integrated the patch from comment #17. A test kernel with this patch is
available from http://people.redhat.com/~jbaron/rhel4/

Comment 22 RHEL Program Management 2006-12-12 16:53:36 UTC

This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 24 Jay Turner 2007-01-02 13:46:46 UTC

QE ack for RHEL4.5.

Comment 28 Mike Gahagan 2007-03-28 15:44:09 UTC

Both patches are in the -51 kernel, and I set an IPv6 address with the network
down without a hang.

Comment 31 Red Hat Bugzilla 2007-05-08 03:54:36 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html
