Bug 697868
Summary: | xenfv: 32-bit guest hangs on boot
---|---
Product: | Red Hat Enterprise Linux 6
Component: | kernel
Version: | 6.1
Hardware: | i686
OS: | Linux
Status: | CLOSED ERRATA
Severity: | urgent
Priority: | urgent
Target Milestone: | rc
Reporter: | Andrew Jones <drjones>
Assignee: | Andrew Jones <drjones>
QA Contact: | Virtualization Bugs <virt-bugs>
CC: | borgan, dhoward, jwest, leiwang, lwoodman, mjenner, qwan, sforsber, syeghiay, tburke, xen-maint, yuzhou
Keywords: | Regression, TestOnly, ZStream
Doc Type: | Bug Fix
Last Closed: | 2011-12-06 13:10:53 UTC
Bug Depends On: | 691310
Bug Blocks: | 711530
Description
Andrew Jones
2011-04-19 13:54:02 UTC
Description of problem:
When booting a RHEL 6.1 32-bit HVM guest with a NIC (rtl8139|netfront), the guest hangs randomly on a 32-bit host.

Version-Release number of selected component (if applicable):
RHEL-Server-6.1-20110413.1 32bit hvm guest (kernel-2.6.32-131.0.1.el6.i686)
xen-3.0.3-128.el5
kernel-xen-2.6.18-257.el5

How reproducible:
Sometimes

Steps to Reproduce:
1. Add "xen_emul_unplug=never" to the HVM guest kernel line (not necessary for a netfront NIC).
2. Boot a pre-installed 32-bit HVM guest with NIC rtl8139|netfront.
Example:
vif = [ "type=ioemu,mac=06:16:36:63:32:a1,bridge=xenbr0,script=vif-bridge,model=netfront" ]
vif = [ "type=netfront,mac=00:01:36:63:23:b3,bridge=xenbr0,script=vif-bridge" ]

Actual results:
The guest hangs randomly.

Expected results:
The guest should boot successfully.

Additional info:
RhEL6.1-32-HVM-20110406.0 32bit HVM guest works well on a 32-bit host for both netfront and rtl8139 (guest kernel: kernel-2.6.32-130.el6.i686).

Created attachment 493345 [details]
boot log rtl8139_1
boot log when 32bit HVM guest with nic rtl8139 hangs.(version 1)
Created attachment 493346 [details]
boot log rtl8139_2
boot log when 32bit HVM guest with nic rtl8139 hangs.(version 2)
Created attachment 493347 [details]
boot log netfront_1
boot log when 32bit HVM guest with nic netfront hangs.(version 1)
Created attachment 493348 [details]
boot log netfront_2
boot log when 32bit HVM guest with nic netfront hangs.(version 2)
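The two vif strings in the description share most of their keys, so it can help to see exactly which fields differ between the emulated and the netfront setup. The following is a minimal sketch, not part of the original report: `parse_vif` and `diff_vifs` are hypothetical helper names, and the code assumes only that a vif string is a comma-separated list of key=value pairs, as in the two examples quoted above.

```python
# Sketch only: split Xen vif strings of the form "key=value,key=value" into dicts.
# parse_vif and diff_vifs are illustrative names, not part of any Xen tooling.

def parse_vif(spec):
    """Parse one vif string into a dict of its key=value fields."""
    fields = {}
    for item in spec.split(","):
        key, _, value = item.partition("=")
        fields[key.strip()] = value.strip()
    return fields


# The two vif strings quoted in the description, copied verbatim.
VIF_EMULATED = "type=ioemu,mac=06:16:36:63:32:a1,bridge=xenbr0,script=vif-bridge,model=netfront"
VIF_NETFRONT = "type=netfront,mac=00:01:36:63:23:b3,bridge=xenbr0,script=vif-bridge"


def diff_vifs(a, b):
    """Return {key: (value_in_a, value_in_b)} for differing keys; None marks a missing key."""
    pa, pb = parse_vif(a), parse_vif(b)
    keys = sorted(set(pa) | set(pb))
    return {k: (pa.get(k), pb.get(k)) for k in keys if pa.get(k) != pb.get(k)}


if __name__ == "__main__":
    for key, (left, right) in diff_vifs(VIF_EMULATED, VIF_NETFRONT).items():
        print(f"{key}: {left} -> {right}")
```

Run against the two quoted strings, this reports differences only in `mac`, `model`, and `type`; the bridge and script settings are identical in both configurations.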
Here's an update. I'm suspicious of this patch:

commit 7e5a20fa4abbd109130921bf44a96b8eb050719e
Author: Andrea Arcangeli <aarcange>
Date: Mon Feb 28 22:34:13 2011 -0500

    [mm] fix pgd_lock deadlock

However, I haven't been able to reproduce the issue consistently enough to state affirmatively that the bug was absent before this patch was integrated and present after. I'm continuing to experiment, and lersek is poking at the core dumps I've captured.

Sigh... I've reproduced the hang even with the patch pointed to in comment 9 reverted from my own build.

I got burned by a change in our git tree. I wasn't looking at the right branch while guessing suspect patches. As comment 2 shows, this issue occurs on kernel-2.6.32-131.0.1.el6.i686, which has some different patches than -131. Once I ran git-log on the right tags, I immediately saw a very suspect patch:

commit c57d7e1a2e2c96d84b3483727fdfcab4d4c0b566
Author: Larry Woodman <lwoodman>
Date: Fri Apr 1 16:00:30 2011 -0400

    [mm] pdpte registers are not flushed when PGD entry is changed in x86 PAE mode

With this patch reverted I was able to complete 26 consecutive, successful reboots.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Since this bug depends on bug 691310, which is now in POST, we will wait for bug 691310 to be fixed and then verify this bug.

The blocker aspect of this bug was that it regressed xen hvm guests. The solution was to revert the patch pointed to in comment 11. That patch will be modified to consider xen hvm guests before being brought back in; however, it has been moved to 6.2/6.1.z. Therefore I'm changing the flags of this bug to reflect that, as this bug now completely depends on that bug and is testonly.

Verified the bug with RHEL-Server-6.1-20110427.0 32bit hvm guest (kernel-2.6.32-131.0.10.el6.i686), xen-3.0.3-130.el5, and kernel-xen-2.6.18-258.el5 on both a 32-bit host and a 64-bit host. With "vcpus=4", the guest did not hang on boot in 20 consecutive reboots.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html