Bug 697868

Summary:

xenfv: 32-bit guest hangs on boot

Product:

Red Hat Enterprise Linux 6

Reporter:

Andrew Jones <drjones>

Component:

kernel

Assignee:

Andrew Jones <drjones>

Status:

CLOSED ERRATA

QA Contact:

Virtualization Bugs <virt-bugs>

Severity:

urgent

Docs Contact:

Priority:

urgent

Version:

6.1

CC:

borgan, dhoward, jwest, leiwang, lwoodman, mjenner, qwan, sforsber, syeghiay, tburke, xen-maint, yuzhou

Target Milestone:

Keywords:

Regression, TestOnly, ZStream

Target Release:

---

Hardware:

i686

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2011-12-06 13:10:53 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

691310

Bug Blocks:

711530

Attachments:

Description	Flags
boot log rtl8139_1	none
boot log rtl8139_2	none
boot log netfront_1	none
boot log netfront_2	none

Description Andrew Jones 2011-04-19 13:54:02 UTC

RHEL 6.1 32-bit HVM guests can hang on boot. I can reproduce it on my 64-bit host machine, and there was a report of a hang on 32-bit hosts as well. It doesn't reproduce every time, but I've already seen it more than once. It may be easier to reproduce with more than one vcpu, although I'm not sure of that correlation. I was able to get a core.

All tasks were stuck at c08223a3

$ addr2line -fie vmlinux c08223a3
arch_start_context_switch
/usr/src/debug/kernel-2.6.32-131.el6/linux-2.6.32-131.el6.i686/arch/x86/include/asm/paravirt.h:725
context_switch
/usr/src/debug/kernel-2.6.32-131.el6/linux-2.6.32-131.el6.i686/kernel/sched.c:2931
schedule
/usr/src/debug/kernel-2.6.32-131.el6/linux-2.6.32-131.el6.i686/kernel/sched.c:5821
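
For readers comparing several cores, the addr2line output above can be turned into a list of frames. The following is only a minimal sketch: it assumes the two-lines-per-frame layout that `addr2line -f -i` prints (function name, then file:line, innermost inlined frame first), and the names `Frame` and `parse_addr2line` are hypothetical helpers, not part of any tool mentioned in this bug.

```python
import os
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    function: str
    path: str
    line: Optional[int]  # None when addr2line could not resolve a line number


def parse_addr2line(output: str) -> List[Frame]:
    """Pair up the function/location lines printed by `addr2line -f -i`."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    frames = []
    # Each frame is a function-name line followed by a "path:line" line.
    for func, location in zip(lines[0::2], lines[1::2]):
        path, _, lineno = location.rpartition(":")
        frames.append(Frame(func, path, int(lineno) if lineno.isdigit() else None))
    return frames


# Output copied from the description of this bug.
SAMPLE = """\
arch_start_context_switch
/usr/src/debug/kernel-2.6.32-131.el6/linux-2.6.32-131.el6.i686/arch/x86/include/asm/paravirt.h:725
context_switch
/usr/src/debug/kernel-2.6.32-131.el6/linux-2.6.32-131.el6.i686/kernel/sched.c:2931
schedule
/usr/src/debug/kernel-2.6.32-131.el6/linux-2.6.32-131.el6.i686/kernel/sched.c:5821
"""

frames = parse_addr2line(SAMPLE)
for frame in frames:
    # Innermost (inlined) function first, outermost caller last.
    print(f"{frame.function} ({os.path.basename(frame.path)}:{frame.line})")
```

Read this way, the stuck address sits in arch_start_context_switch, inlined into context_switch, which is in turn inlined into schedule.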

Comment 2 Yuyu Zhou 2011-04-20 02:16:31 UTC

Description of problem:
When booting a RHEL6.1 32bit HVM guest with nic (rtl8139|netfront), it hangs randomly on a 32bit host

Version-Release number of selected component (if applicable):
RHEL-Server-6.1-20110413.1 32bit hvm guest(kernel-2.6.32-131.0.1.el6.i686)
xen-3.0.3-128.el5
kernel-xen-2.6.18-257.el5

How reproducible:
Sometimes

Steps to Reproduce:
1. add "xen_emul_unplug=never" to the HVM guest kernel line (not necessary for netfront nic)

2. boot a pre-installed 32bit HVM guest with nic rtl8139|netfront
Example:
vif = [ "type=ioemu,mac=06:16:36:63:32:a1,bridge=xenbr0,script=vif-bridge,model=netfront" ]
vif = [ "type=netfront,mac=00:01:36:63:23:b3,bridge=xenbr0,script=vif-bridge" ] 

Actual results:
The guest hangs randomly.

Expected results:
The guest should boot up successfully.

Additional info:
1. RhEL6.1-32-HVM-20110406.0 32bit HVM guest works well on 32bit host for both netfront and rtl8139 (guest kernel: kernel-2.6.32-130.el6.i686)

Comment 3 Yuyu Zhou 2011-04-20 02:18:30 UTC

Description of problem:
When booting a RHEL6.1 32bit HVM guest with nic (rtl8139|netfront), it hangs randomly on a 32bit host

Version-Release number of selected component (if applicable):
RHEL-Server-6.1-20110413.1 32bit hvm guest(kernel-2.6.32-131.0.1.el6.i686)
xen-3.0.3-128.el5
kernel-xen-2.6.18-257.el5

How reproducible:
Sometimes

Steps to Reproduce:
1. add "xen_emul_unplug=never" to the HVM guest kernel line (not necessary for netfront nic)

2. boot a pre-installed 32bit HVM guest with nic rtl8139|netfront
Example:
vif = [ "type=ioemu,mac=06:16:36:63:32:a1,bridge=xenbr0,script=vif-bridge,model=netfront" ]
vif = [ "type=netfront,mac=00:01:36:63:23:b3,bridge=xenbr0,script=vif-bridge" ] 

Actual results:
The guest hangs randomly.

Expected results:
The guest should boot up successfully.

Additional info:
RhEL6.1-32-HVM-20110406.0 32bit HVM guest works well on 32bit host for both netfront and rtl8139 (guest kernel: kernel-2.6.32-130.el6.i686)

Comment 4 Yuyu Zhou 2011-04-20 02:22:25 UTC

Created attachment 493345 [details]
boot log rtl8139_1

boot log when 32bit HVM guest with nic rtl8139 hangs (version 1)

Comment 5 Yuyu Zhou 2011-04-20 02:23:12 UTC

Created attachment 493346 [details]
boot log rtl8139_2

boot log when 32bit HVM guest with nic rtl8139 hangs (version 2)

Comment 6 Yuyu Zhou 2011-04-20 02:24:23 UTC

Created attachment 493347 [details]
boot log netfront_1

boot log when 32bit HVM guest with nic netfront hangs (version 1)

Comment 7 Yuyu Zhou 2011-04-20 02:25:21 UTC

Created attachment 493348 [details]
boot log netfront_2

boot log when 32bit HVM guest with nic netfront hangs (version 2)

Comment 9 Andrew Jones 2011-04-21 12:23:24 UTC

Here's an update. I'm suspicious of this patch

commit 7e5a20fa4abbd109130921bf44a96b8eb050719e
Author: Andrea Arcangeli <aarcange>
Date:   Mon Feb 28 22:34:13 2011 -0500

    [mm] fix pgd_lock deadlock

I haven't been able to reproduce the issue consistently enough, though, to state with confidence that the bug was absent before this patch was integrated and present after it. I'm continuing to experiment, and lersek is poking at the core dumps I've captured.

Comment 10 Andrew Jones 2011-04-21 13:25:23 UTC

Sigh... I've reproduced the hang even with the patch pointed to in comment 9 reverted from my own build.

Comment 11 Andrew Jones 2011-04-21 14:54:54 UTC

I got burned by a change in our git tree: I wasn't looking at the right branch while guessing at suspect patches. As comment 2 shows, this issue occurs on kernel-2.6.32-131.0.1.el6.i686, whose patch set differs from that of -131. Once I ran git-log on the right tags, I immediately saw a very suspect patch

commit c57d7e1a2e2c96d84b3483727fdfcab4d4c0b566
Author: Larry Woodman <lwoodman>
Date:   Fri Apr 1 16:00:30 2011 -0400

    [mm] pdpte registers are not flushed when PGD entry is changed in x86 PAE mode

With this patch reverted I was able to complete 26 consecutive, successful reboots.
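
As a rough way to read the "26 consecutive, successful reboots" figure for an intermittent hang: if each boot hung independently with probability p, the chance of n clean boots in a row would be (1 - p)^n. The sketch below only illustrates that arithmetic. This bug does not state a per-boot hang rate, so the values of p are purely hypothetical, and `prob_all_clean` is an illustrative helper name.

```python
def prob_all_clean(p_hang: float, n_boots: int) -> float:
    """Chance of n_boots consecutive clean boots, assuming each boot
    hangs independently with probability p_hang."""
    if not 0.0 <= p_hang <= 1.0:
        raise ValueError("p_hang must be between 0 and 1")
    if n_boots < 0:
        raise ValueError("n_boots must be non-negative")
    return (1.0 - p_hang) ** n_boots


# Hypothetical hang rates; the real rate is not given in this report.
for p in (0.05, 0.10, 0.25):
    print(f"p_hang={p:.2f}: P(26 clean boots) = {prob_all_clean(p, 26):.4f}")
```

The higher the hang rate was before the revert, the less likely it is that 26 clean boots in a row happened by luck with the bug still present.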

Comment 12 RHEL Program Management 2011-04-26 11:19:39 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 13 Yuyu Zhou 2011-04-27 08:14:16 UTC

Since this bug depends on bug 691310, which is now in POST, we will wait for bug 691310 to be fixed and then verify this bug.

Comment 14 Andrew Jones 2011-04-28 13:47:08 UTC

This bug was a blocker because it regressed Xen HVM guests. The solution was to revert the patch pointed to in comment 11. That patch will be modified to take Xen HVM guests into account before being brought back in; however, that work has been moved to 6.2/6.1.z. I'm therefore changing the flags of this bug to reflect that, as this bug now depends completely on that bug and is TestOnly.

Comment 16 Yuyu Zhou 2011-05-03 07:18:12 UTC

Verified the bug with RHEL-Server-6.1-20110427.0 32bit hvm guest (kernel-2.6.32-131.0.10.el6.i686), xen-3.0.3-130.el5, kernel-xen-2.6.18-258.el5 on both 32bit host and 64bit host.
With "vcpus=4", the guest did not hang on boot across 20 consecutive reboots.

Comment 24 errata-xmlrpc 2011-12-06 13:10:53 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html