219563 – network-attaching and detaching too fast from xenbus will crash domU

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel-xen
Sub Component:
Version:	5.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Glauber Costa

Reported:	2006-12-13 22:59 UTC by Glauber Costa
Modified:	2007-11-30 22:07 UTC
CC List:	3 users
Fixed In Version:	RHBA-2007-0959
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-11-07 19:17:05 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Upstream proposal (1011 bytes, patch) 2006-12-14 16:50 UTC, Glauber Costa	no flags
upstream commit (1.32 KB, patch) 2006-12-14 19:38 UTC, Glauber Costa	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2007:0959	0	normal	SHIPPED_LIVE	Updated kernel packages for Red Hat Enterprise Linux 5 Update 1	2007-11-08 00:47:37 UTC

Description Glauber Costa 2006-12-13 22:59:43 UTC

Description of problem:

Attaching and detaching a network interface in rapid succession crashes the
domU. The frontend network device hits a BUG() in unregister_netdevice().


How reproducible:
Always

Steps to Reproduce:
1. In dom0, run:
   for i in $(seq 1000); do
     xm network-attach <domid>
     xm network-detach <domid> $i
   done
2. Wait a few iterations.
3. Watch the domU crash.
  
Actual results:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at net/core/dev.c:3341
invalid opcode: 0000 [1] SMP
last sysfs file: /class/net/eth2/address
CPU 0
Modules linked in: xennet ipv6 dm_multipath parport_pc lp parport pcspkr
dm_snapshot dm_zero dm_mirror dm_mod xenblk ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 9, comm: xenwatch Not tainted 2.6.18-1.2857.8.2.fc6_glommerxen #1
RIP: e030:[<ffffffff803f6234>]  [<ffffffff803f6234>] unregister_netdevice+0x6e/0x215
RSP: e02b:ffff8800013f1de0  EFLAGS: 00010202
RAX: 0000000000000002 RBX: ffff880001a28000 RCX: ffff8800013f1c00
RDX: 0000000000000000 RSI: 0000000000000056 RDI: ffffffff80521de0
RBP: ffffffff80395148 R08: 000000000000000a R09: 0000000000000005
R10: ffffffff8046cdfc R11: ffffffff88174aa7 R12: ffff880000035c80
R13: 0000000000000000 R14: ffff880000035c70 R15: ffff880001a28000
FS:  00002aaaaaabddb0(0000) GS:ffffffff80593000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process xenwatch (pid: 9, threadinfo ffff8800013f0000, task ffff8800013ec7d0)
Stack:  ffff880000035c70  ffff880001a28000  ffffffff80395148  ffffffff803f63ec
 ffff88000212c540  ffffffff881752d7  ffff880002081400  ffff880001a28580
 0000000000000000  ffffffff8020b44c
Call Trace:
 [<ffffffff80395148>] xenwatch_thread+0x0/0x13e
 [<ffffffff803f63ec>] unregister_netdev+0x11/0x17
 [<ffffffff881752d7>] :xennet:backend_changed+0x830/0x865
 [<ffffffff8020b44c>] kfree+0x15/0xbc
 [<ffffffff80393867>] xenbus_read_driver_state+0x26/0x36
 [<ffffffff80395148>] xenwatch_thread+0x0/0x13e
 [<ffffffff80295d9f>] keventd_create_kthread+0x0/0x66
 [<ffffffff803945a1>] xenwatch_handle_callback+0x15/0x48
 [<ffffffff8039526d>] xenwatch_thread+0x125/0x13e
 [<ffffffff80295f53>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80295d9f>] keventd_create_kthread+0x0/0x66
 [<ffffffff8023321d>] kthread+0xfe/0x132
 [<ffffffff8025dec8>] child_rip+0xa/0x12
 [<ffffffff80295d9f>] keventd_create_kthread+0x0/0x66
 [<ffffffff8025fbf3>] thread_return+0x0/0xfb
 [<ffffffff8023311f>] kthread+0x0/0x132
 [<ffffffff8025debe>] child_rip+0x0/0x12


Code: 0f 0b 68 c5 0c 49 80 c2 0d 0d f6 83 98 00 00 00 01 74 08 48
RIP  [<ffffffff803f6234>] unregister_netdevice+0x6e/0x215
 RSP <ffff8800013f1de0>


Expected results:

attach, detach, attach, detach...

Comment 1 Glauber Costa 2006-12-13 23:02:33 UTC

After a lot of research, the cause turns out to be xenbus state changes being
delivered twice to the frontend. It sees XenbusStateClosing twice, disconnects
itself twice, and so on. The second time, its internal state is no longer valid,
and the BUG() is hit. I have not yet determined the reason behind the double
delivery.

Comment 2 Jay Turner 2006-12-14 02:47:02 UTC

This seems like a bit of a corner case. In addition, without some leads to the
solution, I recommend this be deferred to 5.1.

Comment 3 Glauber Costa 2006-12-14 16:50:29 UTC

Created attachment 143657
Upstream proposal

Both Keir and Ewan confirm that, although undesirable, it is perfectly legal for
messages to be delivered twice. So the fix becomes simpler than the path I was
taking (trying to figure out why the message was being delivered twice and
delivering it only once): the frontend only needs to tolerate a duplicate
notification.
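
For illustration only, here is a minimal user-space sketch of a handler that tolerates a duplicate Closing notification. It is not the attached patch and not the real xennet code: `struct frontend`, `frontend_init`, `frontend_close`, the `STATE_*` values and this simplified `backend_changed` are hypothetical stand-ins, and the `registered` flag models whatever internal state becomes invalid after the first teardown.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the backend states the frontend reacts to. */
enum backend_state { STATE_CONNECTED, STATE_CLOSING, STATE_CLOSED };

/* Hypothetical model of the frontend; not the real xennet structure. */
struct frontend {
    bool registered;    /* true while the model net device is registered */
    int teardown_count; /* how many times teardown actually ran */
};

static void frontend_init(struct frontend *fe)
{
    fe->registered = true;
    fe->teardown_count = 0;
}

/* Tear down at most once; a repeated call is a harmless no-op. */
static void frontend_close(struct frontend *fe)
{
    if (!fe->registered)
        return; /* duplicate notification: already torn down */

    fe->registered = false;
    fe->teardown_count++;
}

/* Model of the state-change callback. It may see the same state twice. */
static void backend_changed(struct frontend *fe, enum backend_state state)
{
    switch (state) {
    case STATE_CLOSING:
        frontend_close(fe);
        break;
    default:
        break; /* other states are not modelled in this sketch */
    }
}
```

Calling `backend_changed(&fe, STATE_CLOSING)` twice on an initialised `fe` runs the teardown once, so the second delivery never reaches code that assumes the device is still registered.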

Comment 4 Glauber Costa 2006-12-14 19:38:01 UTC

Created attachment 143685
upstream commit

This is what was committed upstream.

Comment 5 RHEL Program Management 2007-03-15 02:43:42 UTC

This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.

Comment 6 Don Zickus 2007-04-23 21:58:45 UTC

in 2.6.18-16.el5

Comment 9 errata-xmlrpc 2007-11-07 19:17:05 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html
