Bug 435351 - [RHEL4.7]: PV kernel can OOPs during live migrate
Summary: [RHEL4.7]: PV kernel can OOPs during live migrate
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel-xen
Version: 4.7
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Chris Lalancette
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-02-28 20:05 UTC by Chris Lalancette
Modified: 2008-07-24 19:27 UTC (History)

Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-24 19:27:07 UTC
Target Upstream Version:
Embargoed:


Attachments
Fix for the crash mentioned in this bug (3.65 KB, patch)
2008-02-28 20:34 UTC, Chris Lalancette


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2008:0665 | 0 | normal | SHIPPED_LIVE | Moderate: Updated kernel packages for Red Hat Enterprise Linux 4.7 | 2008-07-24 16:41:06 UTC

Description Chris Lalancette 2008-02-28 20:05:07 UTC
Description of problem:
When attempting to live migrate a RHEL-4.7 PV kernel, you can run into the
following OOPS:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at dev:3027
invalid operand: 0000 [1] SMP 
CPU 0 
Modules linked in: md5 ipv6 autofs4 sunrpc loop xennet dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod xenblk sd_mod scsi_mod
Pid: 7, comm: xenwatch Not tainted 2.6.9-68.15.ELxenU
RIP: e030:[<ffffffff8023e843>] <ffffffff8023e843>{free_netdev+30}
RSP: e02b:ffffff8000b97da0  EFLAGS: 00010293
RAX: 0000000000000002 RBX: ffffff801e2d8380 RCX: 00000000000017af
RDX: 00000000000017af RSI: 0000000000000000 RDI: ffffff801e2d8000
RBP: ffffff8001aeae00 R08: ffffff801fe76a08 R09: ffffff801e2d8380
R10: 0000000100000000 R11: 0000000000000001 R12: ffffffff80353100
R13: 00000000fffffffc R14: ffffff8000021d78 R15: ffffffff80144c5c
FS:  0000002a95577880(0000) GS:ffffffff80420a80(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process xenwatch (pid: 7, threadinfo ffffff8000b96000, task ffffff8000b6a7f0)
Stack: ffffffffa00989bb ffffffffa009cd28 ffffffff802240e3 ffffffffa009cd28 
       ffffff8001aeae48 ffffffffa009cd28 ffffffff8020eb94 ffffffffff5fd000 
       ffffffff803531a0 ffffff8001aeae48 
Call Trace:<ffffffffa00989bb>{:xennet:netfront_remove+25} <ffffffff802240e3>{xenbus_dev_remove+44} 
       <ffffffff8020eb94>{device_release_driver+83} <ffffffff8020ed58>{bus_remove_device+162} 
       <ffffffff8020dfd2>{device_del+104} <ffffffff8020dff8>{device_unregister+9} 
       <ffffffff802247dd>{dev_changed+149} <ffffffff80144c5c>{keventd_create_kthread+0} 
       <ffffffff802235c2>{xenwatch_handle_callback+21} <ffffffff8022375b>{xenwatch_thread+358} 
       <ffffffff8012daa0>{autoremove_wake_function+0} <ffffffff8012daa0>{autoremove_wake_function+0} 
       <ffffffff802235f5>{xenwatch_thread+0} <ffffffff80144c33>{kthread+200} 
       <ffffffff8010e056>{child_rip+8} <ffffffff80144c5c>{keventd_create_kthread+0} 
       <ffffffff80144b6b>{kthread+0} <ffffffff8010e04e>{child_rip+0} 

What's interesting is that it doesn't happen every time.  The trigger seems to
be attempting the live migrate while the hypervisor has very little memory left
(i.e. free_memory in the output of "xm info" is very low).  I believe this bug
was already fixed a long time ago by Glauber in xen-3.1-testing.hg changeset
13100; the fix is in RHEL-5, and we must simply have missed it for RHEL-4.
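
For anyone trying to reproduce this, a rough sketch of checking the precondition described above (low hypervisor free memory) before kicking off the migrate. This is only an illustration: it parses text in the "key : value" layout that "xm info" prints, the helper names and the 64 MB threshold are made up for the example, and free_memory is assumed to be reported in megabytes. The report does not say how low the value has to be.

```python
def parse_xm_info(text):
    """Parse 'key : value' lines (the layout printed by `xm info`) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing ':' stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def hypervisor_memory_is_low(text, threshold_mb=64):
    """Return True if free_memory is below threshold_mb.

    threshold_mb is a hypothetical cutoff chosen for illustration only.
    """
    info = parse_xm_info(text)
    return int(info["free_memory"]) < threshold_mb


# Hand-written sample in the same layout, not captured from a real host.
SAMPLE = """\
total_memory           : 4095
free_memory            : 12
"""

if __name__ == "__main__":
    print(hypervisor_memory_is_low(SAMPLE))
```

On a real dom0 the text would come from running "xm info" and capturing its output; the sample string here just stands in for that.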

Comment 1 Chris Lalancette 2008-02-28 20:34:08 UTC
Created attachment 296256 [details]
Fix for the crash mentioned in this bug

This is the fix for the crash in this bug.  This is upstream xen-3.1-testing
c/s 13100, massaged to apply to RHEL-4.

Comment 4 Vivek Goyal 2008-03-20 14:10:01 UTC
Committed in 68.24.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 7 errata-xmlrpc 2008-07-24 19:27:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html

