Bug 435351

Summary: [RHEL4.7]: PV kernel can OOPs during live migrate
Product: Red Hat Enterprise Linux 4 Reporter: Chris Lalancette <clalance>
Component: kernel-xen Assignee: Chris Lalancette <clalance>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.7 CC: sputhenp, xen-maint
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2008-0665 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-07-24 19:27:07 UTC Type: ---
Attachments:
Description                                 Flags
Fix for the crash mentioned in this bug     none

Description Chris Lalancette 2008-02-28 20:05:07 UTC
Description of problem:
When attempting to live migrate a RHEL-4.7 PV kernel, you can run into the
following OOPS:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at dev:3027
invalid operand: 0000 [1] SMP 
CPU 0 
Modules linked in: md5 ipv6 autofs4 sunrpc loop xennet dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod xenblk sd_mod scsi_mod
Pid: 7, comm: xenwatch Not tainted 2.6.9-68.15.ELxenU
RIP: e030:[<ffffffff8023e843>] <ffffffff8023e843>{free_netdev+30}
RSP: e02b:ffffff8000b97da0  EFLAGS: 00010293
RAX: 0000000000000002 RBX: ffffff801e2d8380 RCX: 00000000000017af
RDX: 00000000000017af RSI: 0000000000000000 RDI: ffffff801e2d8000
RBP: ffffff8001aeae00 R08: ffffff801fe76a08 R09: ffffff801e2d8380
R10: 0000000100000000 R11: 0000000000000001 R12: ffffffff80353100
R13: 00000000fffffffc R14: ffffff8000021d78 R15: ffffffff80144c5c
FS:  0000002a95577880(0000) GS:ffffffff80420a80(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process xenwatch (pid: 7, threadinfo ffffff8000b96000, task ffffff8000b6a7f0)
Stack: ffffffffa00989bb ffffffffa009cd28 ffffffff802240e3 ffffffffa009cd28 
       ffffff8001aeae48 ffffffffa009cd28 ffffffff8020eb94 ffffffffff5fd000 
       ffffffff803531a0 ffffff8001aeae48 
Call Trace:<ffffffffa00989bb>{:xennet:netfront_remove+25} <ffffffff802240e3>{xenbus_dev_remove+44}
       <ffffffff8020eb94>{device_release_driver+83} <ffffffff8020ed58>{bus_remove_device+162}
       <ffffffff8020dfd2>{device_del+104} <ffffffff8020dff8>{device_unregister+9}
       <ffffffff802247dd>{dev_changed+149} <ffffffff80144c5c>{keventd_create_kthread+0}
       <ffffffff802235c2>{xenwatch_handle_callback+21} <ffffffff8022375b>{xenwatch_thread+358}
       <ffffffff8012daa0>{autoremove_wake_function+0} <ffffffff8012daa0>{autoremove_wake_function+0}
       <ffffffff802235f5>{xenwatch_thread+0} <ffffffff80144c33>{kthread+200}
       <ffffffff8010e056>{child_rip+8} <ffffffff80144c5c>{keventd_create_kthread+0}
       <ffffffff80144b6b>{kthread+0} <ffffffff8010e04e>{child_rip+0}

What's interesting is that it doesn't happen every time.  The trigger seems to
be attempting the live migrate when the hypervisor has very little memory left
(i.e. free_memory in the output of "xm info" is very low).  I believe this bug
was already fixed a long time ago by Glauber in xen-3.1-testing.hg changeset
13100; the fix is in RHEL-5, and we must simply have missed it for RHEL-4.
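
For anyone trying to reproduce this, a rough way to check whether the host is
in the low-memory state before starting the migrate is to read free_memory out
of the "xm info" output. The sketch below is only an illustration: the function
names, the sample text, and the 64 MiB threshold are all assumptions made up
for this example, not values taken from this report.

```python
def parse_free_memory_mib(xm_info_text):
    """Return the free_memory value (in MiB) from `xm info` text, or None."""
    for line in xm_info_text.splitlines():
        # `xm info` prints "key : value" pairs, one per line.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "free_memory":
            return int(value.strip())
    return None


def is_low_memory(xm_info_text, threshold_mib=64):
    """True if free_memory is present and below the (hypothetical) threshold."""
    free = parse_free_memory_mib(xm_info_text)
    return free is not None and free < threshold_mib


# Hypothetical sample in the shape of `xm info` output; not from this report.
SAMPLE = """\
total_memory           : 4095
free_memory            : 12
xen_major              : 3
"""
```

On a dom0 the text would come from running "xm info" (for example via
subprocess.check_output(["xm", "info"])) rather than from the sample string.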

Comment 1 Chris Lalancette 2008-02-28 20:34:08 UTC
Created attachment 296256 [details]
Fix for the crash mentioned in this bug

This is the fix for the crash in this bug.  This is upstream xen-3.1-testing
c/s 13100, massaged to apply to RHEL-4.

Comment 4 Vivek Goyal 2008-03-20 14:10:01 UTC
Committed in 68.24.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 7 errata-xmlrpc 2008-07-24 19:27:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html