Description of problem:

When attempting to live migrate a RHEL-4.7 PV kernel, you can run into the following OOPS:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at dev:3027
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: md5 ipv6 autofs4 sunrpc loop xennet dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod xenblk sd_mod scsi_mod
Pid: 7, comm: xenwatch Not tainted 2.6.9-68.15.ELxenU
RIP: e030:[<ffffffff8023e843>] <ffffffff8023e843>{free_netdev+30}
RSP: e02b:ffffff8000b97da0 EFLAGS: 00010293
RAX: 0000000000000002 RBX: ffffff801e2d8380 RCX: 00000000000017af
RDX: 00000000000017af RSI: 0000000000000000 RDI: ffffff801e2d8000
RBP: ffffff8001aeae00 R08: ffffff801fe76a08 R09: ffffff801e2d8380
R10: 0000000100000000 R11: 0000000000000001 R12: ffffffff80353100
R13: 00000000fffffffc R14: ffffff8000021d78 R15: ffffffff80144c5c
FS: 0000002a95577880(0000) GS:ffffffff80420a80(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process xenwatch (pid: 7, threadinfo ffffff8000b96000, task ffffff8000b6a7f0)
Stack: ffffffffa00989bb ffffffffa009cd28 ffffffff802240e3 ffffffffa009cd28
       ffffff8001aeae48 ffffffffa009cd28 ffffffff8020eb94 ffffffffff5fd000
       ffffffff803531a0 ffffff8001aeae48
Call Trace:
<ffffffffa00989bb>{:xennet:netfront_remove+25}
<ffffffff802240e3>{xenbus_dev_remove+44}
<ffffffff8020eb94>{device_release_driver+83}
<ffffffff8020ed58>{bus_remove_device+162}
<ffffffff8020dfd2>{device_del+104}
<ffffffff8020dff8>{device_unregister+9}
<ffffffff802247dd>{dev_changed+149}
<ffffffff80144c5c>{keventd_create_kthread+0}
<ffffffff802235c2>{xenwatch_handle_callback+21}
<ffffffff8022375b>{xenwatch_thread+358}
<ffffffff8012daa0>{autoremove_wake_function+0}
<ffffffff8012daa0>{autoremove_wake_function+0}
<ffffffff802235f5>{xenwatch_thread+0}
<ffffffff80144c33>{kthread+200}
<ffffffff8010e056>{child_rip+8}
<ffffffff80144c5c>{keventd_create_kthread+0}
<ffffffff80144b6b>{kthread+0}
<ffffffff8010e04e>{child_rip+0}

What's interesting is that it doesn't happen all of the time. The right situation seems to be when you have very little hypervisor memory left (i.e. xm info -> free_memory is very low), and you attempt the live migrate.

Interestingly enough, I believe this bug was already fixed in xen-3.1-testing.hg changeset 13100 by Glauber a long time ago; the fix is in RHEL-5, we just must have missed it for RHEL-4.
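For anyone trying to reproduce this, it may help to check the hypervisor's free memory on the destination host before starting the migration. The following is only a rough sketch: it assumes the "key : value" layout that xm info prints, the 64 MB threshold is an arbitrary guess rather than a known trigger point, and the sample text is made up for illustration (it is not output from an affected machine).

```python
def parse_free_memory_mb(xm_info_text):
    """Return the free_memory value from `xm info`-style text, or None if absent.

    Assumes one "key : value" pair per line, with free_memory given in MB.
    """
    for line in xm_info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "free_memory":
            return int(value.strip())
    return None


def is_low_memory(xm_info_text, threshold_mb=64):
    """True if free_memory is present and below threshold_mb.

    threshold_mb is a hypothetical cutoff chosen for this example only.
    """
    free = parse_free_memory_mb(xm_info_text)
    return free is not None and free < threshold_mb


# Made-up sample in the assumed layout, not real output from the affected host.
SAMPLE = """\
total_memory           : 4095
free_memory            : 12
xen_major              : 3
"""

if __name__ == "__main__":
    print(parse_free_memory_mb(SAMPLE), is_low_memory(SAMPLE))
```

In practice the text would come from running xm info on the host (for example by capturing its output with subprocess) rather than from a hard-coded string.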
Created attachment 296256
Fix for the crash mentioned in this bug

This is the fix for the crash in this bug. This is upstream xen-3.1-testing c/s 13100, massaged to apply to RHEL-4.
Committed in 68.24.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html