Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 448893

Summary: Kernel oops when I migrate a HVM guest that runs xenpv
Product: Red Hat Enterprise Linux 5
Reporter: Erwan Velu <erwan>
Component: xenpv-kmod
Assignee: Don Dutile (Red Hat) <ddutile>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: urgent
Version: 5.1
CC: riek, xen-maint
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-08-13 13:29:24 UTC
Bug Blocks: 449772
Attachments: x86_64 xenpv package with possible suspend fix (flags: none)

Description Erwan Velu 2008-05-29 11:55:28 UTC
Description of problem:
I run an HVM domU that has kmod-xenpv installed. The network is much faster that way,
cool. But when I migrate this VM, I get the following trace in the guest OS.

Note that live migration works fine.
Note 2: this domU has noapic and nolapic set. With the APIC enabled, I get a similar
trace plus some apic calls.

BUG: soft lockup detected on CPU#0!

Call Trace:
 <IRQ>  [<ffffffff800b6475>] softlockup_tick+0xd5/0xe7
 [<ffffffff8009413e>] update_process_times+0x42/0x68
 [<ffffffff80075724>] smp_local_timer_interrupt+0x2c/0x61
 [<ffffffff8006d0ed>] main_timer_handler+0x25e/0x404
 [<ffffffff8004d152>] hrtimer_run_queues+0xd9/0x16d
 [<ffffffff8006d2a8>] timer_interrupt+0x15/0x2b
 [<ffffffff800107b1>] handle_IRQ_event+0x29/0x58
 [<ffffffff800b67fd>] __do_IRQ+0xa4/0x105
 [<ffffffff8009c2df>] keventd_create_kthread+0x0/0x61
 [<ffffffff8006b3bd>] do_IRQ+0xe7/0xf5
 [<ffffffff8005c615>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff880757f5>] :xen_platform_pci:__xen_suspend+0xd4/0x117
 [<ffffffff8009c2df>] keventd_create_kthread+0x0/0x61
 [<ffffffff8807547a>] :xen_platform_pci:xen_suspend+0x0/0x31
 [<ffffffff88075489>] :xen_platform_pci:xen_suspend+0xf/0x31
 [<ffffffff8807547a>] :xen_platform_pci:xen_suspend+0x0/0x31
 [<ffffffff800321fc>] kthread+0xfe/0x132
 [<ffffffff8005cfb1>] child_rip+0xa/0x11
 [<ffffffff8009c2df>] keventd_create_kthread+0x0/0x61
 [<ffffffff800320fe>] kthread+0x0/0x132
 [<ffffffff8005cfa7>] child_rip+0x0/0x11

register_blkdev: cannot get major 3 for ide
vbd vbd-768: 19 xlvbd_add at /local/domain/0/backend/vbd/9/768
register_blkdev: cannot get major 22 for ide
vbd vbd-5632: 19 xlvbd_add at /local/domain/0/backend/vbd/9/5632
register_blkdev: cannot get major 3 for ide
vbd vbd-832: 19 xlvbd_add at /local/domain/0/backend/vbd/9/832
netfront: device eth0 has copying receive path.

Version-Release number of selected component (if applicable):
guest kernel : 2.6.18-53.1.21.el5
xenpv 0.1-9.el5
host kernel : 2.6.18-53.1.19.el5xen

How reproducible:
100%

Steps to Reproduce:
1. Create an HVM domU that uses xenpv
2. Migrate it to another node
3. The migrated VM shows an oops
  
Actual results:
Oopsing the domU

Expected results:
No oops at all ;)

Additional info:
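
When comparing traces from several migration attempts, it can help to pull out the first frame that belongs to a loadable module, since that is where the guest was spinning when the soft lockup fired. The following is a minimal sketch, not part of the original report: the function names and the regular expression are illustrative assumptions, written against the trace format shown above (module frames are printed as `:module:symbol+0xoff/0xsize`).

```python
import re

# Matches one frame of the call trace format shown in this report, e.g.
#   [<ffffffff880757f5>] :xen_platform_pci:__xen_suspend+0xd4/0x117
# The ":module:" prefix is optional; core kernel frames do not have it.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return (module, symbol, offset) tuples in trace order.

    module is None for frames in the core kernel.
    """
    frames = []
    for m in FRAME_RE.finditer(log_text):
        frames.append(
            (m.group("module"), m.group("symbol"), int(m.group("offset"), 16))
        )
    return frames


def first_module_frame(log_text):
    """Return the first frame that belongs to a loadable module, or None."""
    for frame in parse_frames(log_text):
        if frame[0] is not None:
            return frame
    return None


# A shortened copy of the trace from this report, used as sample input.
SAMPLE = """\
BUG: soft lockup detected on CPU#0!

Call Trace:
 <IRQ>  [<ffffffff800b6475>] softlockup_tick+0xd5/0xe7
 [<ffffffff8005c615>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff880757f5>] :xen_platform_pci:__xen_suspend+0xd4/0x117
 [<ffffffff88075489>] :xen_platform_pci:xen_suspend+0xf/0x31
 [<ffffffff800321fc>] kthread+0xfe/0x132
"""

if __name__ == "__main__":
    print(first_module_frame(SAMPLE))
```

On the trace above, the first module frame is in `__xen_suspend` from `xen_platform_pci`, which is consistent with the lockup happening in the xenpv suspend path rather than in the core kernel.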

Comment 2 Don Dutile (Red Hat) 2008-05-30 19:17:45 UTC
Erwan,

A bit confused:
In your first sentence, you state that a migrate generates soft lockup statements;
then you state that live migration works fine; then you state the migrated vm
shows an oops.

So, are multiple events occurring? If so, what is the first event?
If not, please be clear about what is or is not occurring.

Please provide details of dom0 (kernel rev), domU (same, kernel rev),
and xend logs from the dom0s.

The migrate command you used would be nice to see as well.

Comment 3 Don Dutile (Red Hat) 2008-05-30 19:20:24 UTC
oops... I saw the kernel versions above, after reading more carefully.

I would still like the details of what occurred (migrate worked) or not
(guest oops'd).

- Don

Comment 4 Erwan Velu 2008-06-02 07:31:50 UTC
Hey Don,

To be more precise, if I do a "xm migrate mydomU mynewnode", the domU's kernel shows
that trace. If I do a "xm migrate --live mydomU mynewnode", the oops rarely appears.

By rarely, I mean that most of the time the "live" migrate works, but sometimes this
oops appears. I think live migration works about 80% of the time, whereas a plain "xm
migrate" fails 100% of the time.
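
To turn estimates like these into measured numbers, the two command forms can be driven from a small script on dom0 and the outcomes tallied. This is a hedged sketch only: the helper names are hypothetical, the only `xm` invocations assumed are the two quoted in this comment, and whether a trial "failed" still has to be decided by checking the guest console for the soft lockup trace, since the exit status of `xm` only reports whether the migration command itself succeeded.

```python
import subprocess


def build_migrate_cmd(domain, target_host, live=False):
    """Build the argument list for `xm migrate`, optionally with --live."""
    cmd = ["xm", "migrate"]
    if live:
        cmd.append("--live")
    cmd += [domain, target_host]
    return cmd


def run_migration(domain, target_host, live=False, runner=subprocess.run):
    """Run one migration from dom0 and return the exit status of xm.

    The runner parameter exists so the function can be exercised without
    a Xen host; by default it is subprocess.run.
    """
    result = runner(build_migrate_cmd(domain, target_host, live), check=False)
    return result.returncode


def failure_rate(outcomes):
    """Fraction of trials marked as failed (True = lockup trace seen)."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no trials recorded")
    return sum(1 for failed in outcomes if failed) / len(outcomes)
```

A typical use would be to call `run_migration("mydomU", "mynewnode", live=True)` in a loop, record by hand or by log inspection whether the trace appeared after each run, and pass the resulting list of booleans to `failure_rate`.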

Comment 5 Don Dutile (Red Hat) 2008-06-19 22:04:02 UTC
Created attachment 309883 [details]
x86_64 xenpv package with possible suspend fix

Erwan,

Please give the attached x86_64 patch a try.
(I'm guessing you were testing x86_64).

If you need another rpm (i686, i686-PAE), let me know and I'll post it.

- Don

Comment 6 Don Dutile (Red Hat) 2008-06-19 22:04:58 UTC
Setting status to NEEDINFO so reporter can re-assign to me when reporting
test status back.



Comment 7 RHEL Program Management 2008-06-20 13:41:52 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 9 Daniel Riek 2008-10-13 15:33:50 UTC
Bugzilla is not a support tool

The Bugzilla interface at bugzilla.redhat.com is used internally by Red Hat to process changes e.g. to Red Hat Enterprise Linux and related products, as well as by the Fedora Community to develop the Fedora Project.

It is publicly available and everyone with an email address is able to create an account, file bugs, comment on bugs she or he has access to. Not all bugs are public though and not all issues filed are handled in the same way: it makes a huge difference who is behind a bug.

Red Hat does monitor Bugzilla entries, and it does review them for inclusion in errata, etc.

Nevertheless, as noted on the login page, Bugzilla is not a Support tool. It is an Engineering tool. It is used by Red Hat Engineering to track issues and product changes, as well as to interact with Engineering partners and other parties external to Red Hat on a technical level.

So while all changes to Red Hat Enterprise Linux will at a point go through Bugzilla, this difference has a number of important consequences for general product issues filed directly through Bugzilla by external users without any privileged Engineering relationship:

    * Red Hat does NOT guarantee any response time or Service Level Agreement (SLA) for Bugzilla entries. - A review might happen immediately or after a time span of any length. The SLAs for Red Hat Enterprise Linux provided by Red Hat Support can be found at:  https://www.redhat.com/support/policy/sla/production/

    * Not all comments are publicly visible. - Red Hat Support provides customers with appropriate information excerpts and status updates from that. Red Hat does not commit to provide detailed explanations or guidance in the context of Bugzilla. Therefore, for example, Bugzilla entries might be closed seemingly without any further explanation, while Red Hat Support actually provides such explanation to customers filing through the regular support channels.

    * Issues coming through the regular support process will always be prioritized higher than issues of similar impact and severity that are being filed directly through Bugzilla (unless the latter get linked to customer support tickets, like this issue was). This means that they are more likely to be addressed and they are more likely to meet inclusion criteria consistent with the Red Hat Enterprise Linux life cycle policy: http://www.redhat.com/security/updates/errata/

    * Work-arounds and Hotfixes, if possible and appropriate, are provided by Red Hat Support and through the regular support process. - This means that even before a permanent fix is made available through RHN, customers who raised a high severity issue through Red Hat Support are likely to receive an interim solution.

Red Hat provides common Bugzilla access in order to provide efficient development community interaction and as much transparency as possible to our customers. Our Engineers are encouraged to provide non-customer specific and non-confidential information publicly as often as possible.

So while Red Hat considers issues directly entered into Bugzilla valuable feedback, be it as comments to existing Bugzilla entries or by opening a new one, for customers encountering production issues Bugzilla is not the right channel.

Therefore we ask our customers to file requests important for their production systems via our Support service. Only for those issues, we can ensure a consistent communication. Information about our production support process can be found at: http://www.redhat.com/support/process/

Bugzilla can always be used as a supportive channel to that communication.

Note: If your customer is participating in the Academic program and has chosen to run a Subscription without support, they consequently have no access to Red Hat Support and thus no SLA. If you feel that this is insufficient for your use case, you should consider contacting the Red Hat Education specialist as described at: http://www.redhat.com/solutions/education/academic/individual/

Comment 10 Daniel Riek 2008-10-13 15:34:22 UTC
Reporter, were you able to verify the patch?

Comment 11 Erwan Velu 2008-10-14 11:06:45 UTC
I'm sorry, I haven't had time for that. I'm adding this to my schedule and will keep you posted.

Comment 12 RHEL Program Management 2008-10-27 18:22:18 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.