Bug 829016 - kernel 3.4.0-1 does not boot on xen domU, unable to handle kernel paging request
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 832249
Depends On:
Blocks:
 
Reported: 2012-06-05 19:04 UTC by vt
Modified: 2013-01-13 12:15 UTC
CC List: 11 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2012-06-17 22:22:50 UTC
Type: Bug
Embargoed:


Attachments
console log from ec2 (13.88 KB, text/plain)
2012-06-05 19:04 UTC, vt
another console log (from linode) (14.28 KB, text/plain)
2012-06-05 19:05 UTC, vt

Description vt 2012-06-05 19:04:14 UTC
Created attachment 589619
console log from ec2

Description of problem:
The system no longer boots as a Xen DomU when running kernel 3.4.0-1 via pv-grub.

Version-Release number of selected component (if applicable):
3.4.0-1

How reproducible:
Always. Tested on Linode (Xen 3.4.4) and Amazon EC2 t1.micro (Xen 3.4.3-kaos_t1micro).

Steps to Reproduce (using EC2):
1. Start Fedora from official EC2 image (e.g. ami-e269e5d2 on us-west-2)
2. Install kernel 3.4.0-1
3. Add new kernel to /boot/grub/menu.lst
4. Reboot

Actual results:
An Oops, "unable to handle kernel paging request", occurs while loading the init binary, with a backtrace starting with atomic64_read_cx8/unmap_single_vma/generic_file_aio_read. The boot process does not continue.

Expected results:
System boots up normally.

Additional info:

Comment 1 vt 2012-06-05 19:05:15 UTC
Created attachment 589620
another console log (from linode)

Comment 2 Andrew Jones 2012-06-06 08:54:20 UTC
# git log --oneline v3.3..v3.4 arch/x86/lib/atomic64_cx8_32.S
cb8095b x86: atomic64 assembly improvements

Maybe a regression from ^^? I sent an email to Jan and xen-devel.

Comment 3 Andrew Jones 2012-06-06 11:58:48 UTC
(In reply to comment #2)
> # git log --oneline v3.3..v3.4 arch/x86/lib/atomic64_cx8_32.S
> cb8095b x86: atomic64 assembly improvements
> 
> Maybe a regression from ^^? I sent an email to Jan and xen-devel.

Jan says there is no chance that commit is the problem. The problem should be bisectable, though, for anybody who can reproduce it.

Comment 4 vt 2012-06-07 01:43:41 UTC
mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch is the culprit

Comment 5 Andrew Jones 2012-06-07 10:38:12 UTC
(In reply to comment #4)
> mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch is the culprit

Ah, drat! I didn't check what Fedora pulled in on top of 3.4. Had I done that, I would have immediately suspected this patch instead. We've already encountered one problem with this patch for RHEL6 and fixed it, and the patch F17 carries is already the "fixed" version. The difference between RHEL6 and F17, though, is that F17 has CONFIG_TRANSPARENT_HUGEPAGE=y for 32-bit guests, but RHEL6 does not. So with the addition of this patch, F17 now calls atomic64_read() from pmd_none_or_trans_huge_or_clear_bad(). The question now is: why is Xen sensitive to this?
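A note on the question above. On 32-bit x86 with CMPXCHG8B, atomic64_read() (atomic64_read_cx8, the top frame of the backtrace) does not read a 64-bit value with a plain load. It uses a locked compare-and-exchange. The sketch below is a hypothetical user-space illustration of that idea using GCC's `__atomic` builtins. `read64_via_cas` is an invented name, not a kernel function, and it is not the kernel's actual assembly.

The detail that matters, per Intel's description of CMPXCHG8B, is that the locked instruction performs a write cycle on the destination whether or not the comparison succeeds. Our assumption, consistent with Comment 9's pointer at Xen's cmpxchg8b emulation, is as follows. Under Xen PV the guest's page-table pages are mapped read-only, so that write traps and must be emulated by the hypervisor, and the older Xen 3.4 hypervisors do not handle this case.

```c
#include <stdint.h>

/*
 * Hypothetical illustration (not the kernel's atomic64_read_cx8): read a
 * 64-bit value atomically by "swapping" it with itself.
 *
 * 'expected' starts as a guess (0) and the desired value is that same guess:
 *  - if *p == 0, the CAS succeeds and stores 0 back (no visible change);
 *  - otherwise it fails and copies the current contents of *p into
 *    'expected'.
 * Either way 'expected' ends up holding the value of *p. On x86 the locked
 * compare-and-exchange still issues a write cycle to *p in both cases, so
 * *p must live in writable memory; that is why 'p' is not const.
 */
static uint64_t read64_via_cas(uint64_t *p)
{
    uint64_t expected = 0;

    __atomic_compare_exchange_n(p, &expected, expected, /* weak = */ 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}
```

Built with `gcc -m32` for a CPU that has CMPXCHG8B, the compare-and-exchange typically compiles to `lock cmpxchg8b`. On x86-64 it becomes a 64-bit `lock cmpxchg`. In either case the value is returned unchanged, but the memory is still written.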

Comment 6 Andrew Jones 2012-06-07 12:02:58 UTC
This issue is being discussed upstream now as well.

http://permalink.gmane.org/gmane.comp.emulators.xen.devel/132522

Comment 7 Josh Boyer 2012-06-07 13:09:46 UTC
(In reply to comment #4)
> mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch is the culprit

That patch fixes a CVE, so I doubt we're going to drop it. We'll watch upstream to see what Andrea comes up with and backport it.

Comment 9 Andrew Jones 2012-06-07 15:45:46 UTC
I don't have a later hypervisor (Xen 4) set up for testing, but it would be nice to know whether guests work on one. If so, then maybe RHEL5 and EC2 need to look at Xen hypervisor c/s 17498 and/or other changesets to patch their emulation of cmpxchg8b.

Comment 11 Andrew Jones 2012-06-08 06:34:06 UTC
Andrea posted a patch, http://www.spinics.net/lists/kernel/msg1353628.html
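For reference, here is a hedged user-space sketch of a cmpxchg8b-free way to read a 32-bit PAE pmd entry. We understand Andrea's patch to take this direction, but the linked posting is authoritative for the actual code. The approach has three steps:

- Load the low 32 bits.
- If they are zero, treat the entry as none.
- Otherwise, order the loads with a read barrier and then load the high 32 bits.

Only plain loads are issued, so nothing is written to the read-only page-table page. `struct pae_pmd_sketch` and `pmd_read_split` are hypothetical names. The sketch assumes x86's little-endian layout and a writer that publishes the entry so that a non-zero low word implies the high word is already valid.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of a 32-bit PAE pmd entry as two 32-bit words,
 * low word first (x86 is little-endian). */
struct pae_pmd_sketch {
    uint32_t low;
    uint32_t high;
};

static_assert(sizeof(struct pae_pmd_sketch) == sizeof(uint64_t),
              "a PAE pmd entry is 64 bits wide");

/*
 * Read the entry with plain loads only, so nothing is ever written to it.
 * A zero low word is treated as "none" and the high word is not read.
 * Otherwise an acquire fence (standing in for a read barrier such as the
 * kernel's smp_rmb()) orders the low-word load before the high-word load.
 * Assumption: the writer publishes the entry so that a non-zero low word
 * implies the high word is already valid; this is NOT a general-purpose
 * atomic 64-bit read.
 */
static uint64_t pmd_read_split(const struct pae_pmd_sketch *pmdp)
{
    uint64_t ret = __atomic_load_n(&pmdp->low, __ATOMIC_RELAXED);

    if (ret) {
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
        ret |= (uint64_t)__atomic_load_n(&pmdp->high, __ATOMIC_RELAXED) << 32;
    }
    return ret;
}
```

The design trade-off is that this gives up a single atomic 64-bit read in exchange for never touching the entry with a store. The write is exactly what the cmpxchg8b-based read in the earlier comments cannot avoid.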

Comment 12 Josh Boyer 2012-06-11 12:34:55 UTC
I believe Andrew tested Andrea's patch successfully.  I'll get this in today.

Comment 13 Josh Boyer 2012-06-11 12:47:22 UTC
Applied to F17 and rawhide.

Comment 14 Fedora Update System 2012-06-15 06:24:31 UTC
kernel-3.4.2-4.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/kernel-3.4.2-4.fc17

Comment 15 Fedora Update System 2012-06-15 06:25:41 UTC
kernel-3.4.2-1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.4.2-1.fc16

Comment 16 Andrew Jones 2012-06-15 06:52:18 UTC
*** Bug 832249 has been marked as a duplicate of this bug. ***

Comment 17 Fedora Update System 2012-06-15 23:49:11 UTC
Package kernel-3.4.2-4.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.4.2-4.fc17'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-9501/kernel-3.4.2-4.fc17
then log in and leave karma (feedback).

Comment 18 Fedora Update System 2012-06-17 22:22:50 UTC
kernel-3.4.2-4.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 19 Fedora Update System 2012-06-20 00:26:38 UTC
kernel-3.4.2-1.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

