Bug 829016 - kernel 3.4.0-1 does not boot on xen domU, unable to handle kernel paging request
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Duplicates: 832249
 
Reported: 2012-06-05 15:04 EDT by vt
Modified: 2013-01-13 07:15 EST

Doc Type: Bug Fix
Last Closed: 2012-06-17 18:22:50 EDT
Type: Bug

Attachments
console log from ec2 (13.88 KB, text/plain)
2012-06-05 15:04 EDT, vt
another console log (from linode) (14.28 KB, text/plain)
2012-06-05 15:05 EDT, vt
Description vt 2012-06-05 15:04:14 EDT
Created attachment 589619
console log from ec2

Description of problem:
The system no longer boots as a Xen DomU when running kernel 3.4.0-1 via pv-grub.

Version-Release number of selected component (if applicable):
3.4.0-1

How reproducible:
Always, tested on Linode (Xen 3.4.4) and Amazon EC2 t1.micro (Xen 3.4.3-kaos_t1micro)

Steps to Reproduce (using EC2):
1. Start Fedora from official EC2 image (e.g. ami-e269e5d2 on us-west-2)
2. Install kernel 3.4.0-1
3. Add new kernel to /boot/grub/menu.lst
4. Reboot
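
For step 3, a stanza for /boot/grub/menu.lst can be generated along these lines. This is a minimal sketch and not taken from the report: the helper name `menu_entry`, its three arguments, and the assumption that /boot lives on the root filesystem (so paths start with /boot/) are all illustrative. Adjust them to the guest's actual layout.

```shell
#!/bin/sh
# Hypothetical helper: print a GRUB-legacy (pv-grub) menu.lst stanza.
# All three arguments are supplied by the caller; none come from the report.
#   $1  full kernel version string as used in the /boot file names
#   $2  root filesystem device as seen by the guest
#   $3  GRUB device that holds /boot, e.g. "(hd0)"
menu_entry() {
    kver=$1
    rootdev=$2
    grubroot=$3
    printf 'title Fedora (%s)\n' "$kver"
    printf '\troot %s\n' "$grubroot"
    # Assumes /boot is a directory on the root filesystem, not its own partition.
    printf '\tkernel /boot/vmlinuz-%s ro root=%s\n' "$kver" "$rootdev"
    printf '\tinitrd /boot/initramfs-%s.img\n' "$kver"
}
```

The output can be reviewed and then appended to /boot/grub/menu.lst by hand; keeping the old kernel's stanza in place leaves a fallback if the new kernel fails to boot.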

Actual results:
Oops "unable to handle kernel paging request" while loading init binary, with backtrace starting with atomic64_read_cx8/unmap_single_vma/generic_file_aio_read. Boot process does not continue.

Expected results:
System boots up normally.

Additional info:
Comment 1 vt 2012-06-05 15:05:15 EDT
Created attachment 589620
another console log (from linode)
Comment 2 Andrew Jones 2012-06-06 04:54:20 EDT
# git log --oneline v3.3..v3.4 arch/x86/lib/atomic64_cx8_32.S
cb8095b x86: atomic64 assembly improvements

Maybe a regression from ^^? I sent an email to Jan and xen-devel.
Comment 3 Andrew Jones 2012-06-06 07:58:48 EDT
(In reply to comment #2)
> # git log --oneline v3.3..v3.4 arch/x86/lib/atomic64_cx8_32.S
> cb8095b x86: atomic64 assembly improvements
> 
> Maybe a regression from ^^? I sent an email to Jan and xen-devel.

Jan says no chance that that commit is the problem. This problem should be bisectable though for anybody that can reproduce it.
Comment 4 vt 2012-06-06 21:43:41 EDT
mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch is the culprit
Comment 5 Andrew Jones 2012-06-07 06:38:12 EDT
(In reply to comment #4)
> mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch is the culprit

Ah drat! I didn't check to see what Fedora pulled in on top of 3.4. Had I done that, I would have immediately suspected this patch instead. We've already encountered one problem with this patch for RHEL6 and fixed it. The patch F17 has, however, is the "fixed" version. The difference between RHEL6 and F17, though, is that F17 has CONFIG_TRANSPARENT_HUGEPAGE=y for 32-bit guests, but RHEL6 does not. So with the addition of this patch, F17 is now calling atomic64_read() from pmd_none_or_trans_huge_or_clear_bad(). So now the question is, why is Xen sensitive to this?
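
To see whether a given guest kernel has the combination described above (32-bit PAE plus transparent hugepages), its config file can be checked roughly as follows. This is a sketch under assumptions: the function name `pae_thp_enabled` is made up for illustration, and the default path relies on the usual /boot/config-<release> file being installed. It only inspects the config and says nothing about whether the patch itself is applied.

```shell
#!/bin/sh
# Sketch: succeed when a kernel config enables both 32-bit PAE and
# transparent hugepages.
# Usage: pae_thp_enabled [config-file]
# With no argument, falls back to /boot/config-<running kernel release>.
pae_thp_enabled() {
    cfg=${1:-/boot/config-$(uname -r)}
    grep -q '^CONFIG_X86_PAE=y$' "$cfg" &&
        grep -q '^CONFIG_TRANSPARENT_HUGEPAGE=y$' "$cfg"
}
```

For example, `pae_thp_enabled && echo "PAE + THP enabled"` run inside the guest reports on the kernel that is currently booted.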
Comment 6 Andrew Jones 2012-06-07 08:02:58 EDT
This issue is being discussed upstream now as well.

http://permalink.gmane.org/gmane.comp.emulators.xen.devel/132522
Comment 7 Josh Boyer 2012-06-07 09:09:46 EDT
(In reply to comment #4)
> mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch is the culprit

That patch fixes a CVE, so I doubt we're going to drop it.  We'll watch upstream to see what Andrea comes up with and bring it back.
Comment 9 Andrew Jones 2012-06-07 11:45:46 EDT
I don't have a later hypervisor (Xen 4) set up for testing, but it'd be nice to know if guests work on them. If so, then maybe RHEL5 and EC2 need to look at Xen hypervisor c/s 17498 and/or others to patch their emulation of cmpxchg8b.
Comment 11 Andrew Jones 2012-06-08 02:34:06 EDT
Andrea posted a patch, http://www.spinics.net/lists/kernel/msg1353628.html
Comment 12 Josh Boyer 2012-06-11 08:34:55 EDT
I believe Andrew tested Andrea's patch successfully.  I'll get this in today.
Comment 13 Josh Boyer 2012-06-11 08:47:22 EDT
Applied to F17 and rawhide.
Comment 14 Fedora Update System 2012-06-15 02:24:31 EDT
kernel-3.4.2-4.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/kernel-3.4.2-4.fc17
Comment 15 Fedora Update System 2012-06-15 02:25:41 EDT
kernel-3.4.2-1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.4.2-1.fc16
Comment 16 Andrew Jones 2012-06-15 02:52:18 EDT
*** Bug 832249 has been marked as a duplicate of this bug. ***
Comment 17 Fedora Update System 2012-06-15 19:49:11 EDT
Package kernel-3.4.2-4.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.4.2-4.fc17'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-9501/kernel-3.4.2-4.fc17
then log in and leave karma (feedback).
Comment 18 Fedora Update System 2012-06-17 18:22:50 EDT
kernel-3.4.2-4.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 19 Fedora Update System 2012-06-19 20:26:38 EDT
kernel-3.4.2-1.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.
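
To check whether a kernel version-release string is at or beyond one of the fixed builds named above (3.4.2-4.fc17 or 3.4.2-1.fc16), a comparison along these lines may help. It is a rough sketch: `version_ge` is a hypothetical helper, and it relies on GNU `sort -V`, which only approximates RPM's own version ordering.

```shell
#!/bin/sh
# version_ge A B: succeed when version string A sorts at or after B.
# Uses GNU sort -V; this approximates, but is not identical to,
# the comparison rules RPM applies to version-release strings.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}
```

With this helper, `version_ge 3.4.0-1.fc17 3.4.2-4.fc17` fails, while `version_ge 3.4.2-4.fc17 3.4.2-4.fc17` succeeds.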
