Bug 216249 - rmmod xennet crashes domU
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Rik van Riel
Depends On: 213142
Blocks: 197865 222082
Reported: 2006-11-17 18:41 EST by Chris Lalancette
Modified: 2007-11-30 17:07 EST

Fixed In Version: 5.0.0
Doc Type: Bug Fix
Last Closed: 2007-01-26 15:44:28 EST
Description Chris Lalancette 2006-11-17 18:41:29 EST
+++ This bug was initially created as a clone of Bug #213142 +++

Description of problem:

rmmod xennet crashes domU


Version-Release number of selected component (if applicable):
kernel 2.6.18-1.2798.fc6xen 


How reproducible:
rmmod xennet

WARNING: g.e. still in use!
WARNING: leaking g.e. and page still in use!
WARNING: g.e. still in use!
WARNING: leaking g.e. and page still in use!
------------[ cut here ]------------
kernel BUG at net/core/dev.c:3298!
invalid opcode: 0000 [#1]
SMP 
last sysfs file: /class/net/lo/type
Modules linked in: ipt_LOG xt_limit xt_state iptable_filter ip_conntrack_ftp
ip_conntrack nfnetlink ip_tables x_tables xennet ipv6 dm_mirror dm_mod lp
parport_pc parport pcspkr xenblk ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0061:[<c05acc81>]    Not tainted VLI
EFLAGS: 00010293   (2.6.18-1.2798.fc6xen #1) 
EIP is at free_netdev+0x1e/0x3b
eax: 00000001   ebx: d6028400   ecx: ffffffff   edx: d6028000
esi: c0acd200   edi: d90ad524   ebp: cb796000   esp: cb796f10
ds: 007b   es: 007b   ss: 0069
Process rmmod (pid: 2193, ti=cb796000 task=c4282dd0 task.ti=cb796000)
Stack: d90a739f d90ad500 c054c4d6 c0acd2cc c0acd224 c05416a4 c0acd224 c0687728 
       d90ad524 c0541999 d90ad524 00000000 c0687550 c0540e3b d90ad524 00000020 
       00000000 c0541aac d90ad700 c0436a25 6e6e6578 d2007465 00000000 d2ffd754 
Call Trace:
 [<d90a739f>] netfront_remove+0x16/0x1a [xennet]
 [<c054c4d6>] xenbus_dev_remove+0x27/0x38
 [<c05416a4>] __device_release_driver+0x60/0x78
 [<c0541999>] driver_detach+0x99/0xc9
 [<c0540e3b>] bus_remove_driver+0x5a/0x78
 [<c0541aac>] driver_unregister+0x8/0x13
 [<c0436a25>] sys_delete_module+0x192/0x1b9
 [<c0404ea7>] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb

Leftover inexact backtrace:

 =======================
Code: 97 e6 ff e8 34 a8 05 00 e9 19 f7 e7 ff 89 c2 8b 80 94 02 00 00 85 c0 75 0d
0f b7 42 64 29 c2 89 d0 e9 d7 50 eb ff 83 f8 03 74 08 <0f> 0b e2 0c 37 c6 64 c0
c7 82 94 02 00 00 04 00 00 00 8d 82 f0 
EIP: [<c05acc81>] free_netdev+0x1e/0x3b SS:ESP 0069:cb796f10
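
For readers following the trace: the BUG fires inside free_netdev, called from netfront_remove. In kernels of this era, free_netdev refuses to release a device that was registered but has not finished unregistering. Below is a minimal user-space sketch of that check, not the kernel code itself. The names netdev_model, free_netdev_would_succeed and unregister_netdev_model are hypothetical; only the state names mirror the kernel's.

```c
#include <stdbool.h>

/* Registration states; names follow the kernel's, values are illustrative. */
enum netreg_state {
    NETREG_UNINITIALIZED = 0,
    NETREG_REGISTERED,      /* registration completed */
    NETREG_UNREGISTERING,   /* unregistration started */
    NETREG_UNREGISTERED,    /* unregistration finished */
    NETREG_RELEASED         /* free_netdev() was called */
};

/* Hypothetical stand-in for struct net_device; only the state matters here. */
struct netdev_model {
    enum netreg_state reg_state;
};

/*
 * Model of the state check in free_netdev(). Returns true if the device
 * may be released, false where the real kernel would hit BUG().
 */
bool free_netdev_would_succeed(struct netdev_model *dev)
{
    if (dev->reg_state == NETREG_UNINITIALIZED)
        return true;                  /* never registered: plain free */
    if (dev->reg_state != NETREG_UNREGISTERED)
        return false;                 /* kernel: BUG() fires here */
    dev->reg_state = NETREG_RELEASED;
    return true;
}

/* Model of unregistering: moves a registered device to UNREGISTERED. */
void unregister_netdev_model(struct netdev_model *dev)
{
    if (dev->reg_state == NETREG_REGISTERED)
        dev->reg_state = NETREG_UNREGISTERED;
}
```

In this model, calling free_netdev_would_succeed on a device that is still NETREG_REGISTERED returns false, which corresponds to the crash above; calling unregister_netdev_model first makes it return true.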

-- Additional comment from herbert.xu@redhat.com on 2006-10-30 21:12 EST --
This is a deficiency in the grant table mechanism: it currently cannot wait
for live entries to be destroyed.  If upstream's plan to use copying
instead of flipping gets extended to the domU=>dom0 direction, this should no
longer be an issue.

The original bug report is for fc6, but I see the exact same problem on the
RHEL-5 kernel (2.6.18-1.2747).  I cloned the bug for tracking purposes, and I
CC'ed both Herbert and Glauber, since there were two separate bugs opened
(213142, and 213147).
Comment 1 Glauber Costa 2006-11-17 19:05:03 EST
Not exactly. The grant table deficiency is responsible for the warnings, but
they're _not_ the reason for the bug to be triggered. When this is fixed
upstream, we most probably will be able to get rid of the part of the patch that
waits for it to be released, and (maybe, depending on how it goes) the backend
change. 

However, the extra mechanism for unload has _nothing_ to do at all with grant
tables. 
Comment 2 Herbert Xu 2006-11-17 19:16:25 EST
Sorry, but my point was that there is no point in attempting to correct crashes
caused by rmmod xennet until such a time when you can wait for the completion of
grant table entries.  IMHO having it silently leak grant table entries is worse
than failing the rmmod.
Comment 3 Glauber Costa 2006-11-17 19:36:42 EST
It does not silently leak GTEs (grant table entries).
The proposed code waits some time for completion. If the backend releases the
entry in that time, nothing leaks at all. _If_ it does not, then we get the
warning. The code does not suppress or ignore any messages.

The current behavior is to leak the grant table entries _and_ BUG() the system.
I cannot see how "just" leaking is any worse than failing rmmod (if it were
silent, I would 100% agree with you, though).
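
The wait-then-warn behavior described in this comment can be sketched roughly as follows. This is a user-space illustration under stated assumptions, not the proposed patch: grant_ref_model, grant_in_use and wait_for_grant_release are hypothetical names, and the poll count stands in for whatever timeout the real code uses.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a grant table entry; not the real Xen API. */
struct grant_ref_model {
    /* Backend releases after this many polls; negative means never. */
    int polls_until_released;
};

/* Hypothetical query: is the backend still using this entry? */
static bool grant_in_use(struct grant_ref_model *ref)
{
    if (ref->polls_until_released < 0)
        return true;
    if (ref->polls_until_released == 0)
        return false;
    ref->polls_until_released--;
    return true;
}

/*
 * Poll up to max_polls times for the backend to release the entry.
 * Returns true if it was released (nothing leaks), false if the wait
 * timed out (the entry is leaked and a warning is printed).
 */
bool wait_for_grant_release(struct grant_ref_model *ref, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (!grant_in_use(ref))
            return true;
        /* A real driver would sleep here between polls. */
    }
    fprintf(stderr, "WARNING: leaking g.e. and page still in use!\n");
    return false;
}
```

The point of contention in the thread is the false branch: the leak is reported, but only through a log message.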
Comment 4 Herbert Xu 2006-11-17 19:52:19 EST
Well having a warning show up in dmesg which most people never look at is silent
enough for me.  BTW, your patch only waits on rx ring references when skb data
references are much more likely to be a problem in the real world.
Comment 5 Glauber Costa 2006-11-24 09:18:15 EST
Most probably the best solution is to set CONFIG_XEN_NETDEV_FRONTEND to
y instead of m, so that the driver is built in and cannot be unloaded.
Obviously, we should also tell people why it was done ;-)
Comment 8 RHEL Product and Program Management 2006-11-27 21:45:14 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 11 Don Zickus 2006-12-04 13:38:18 EST
in 2.6.18-1.2789.el5
Comment 14 Jay Turner 2007-01-10 22:19:11 EST
QE ack for RHEL5.
Comment 15 Jay Turner 2007-01-26 15:44:28 EST
2.6.18-7.el5 included in 20070125.0.
