Created attachment 589619 [details]
console log from ec2

Description of problem:
The system no longer boots as a Xen DomU when running kernel 3.4.0-1 via pv-grub.

Version-Release number of selected component (if applicable):
3.4.0-1

How reproducible:
Always, tested on Linode (Xen 3.4.4) and Amazon EC2 t1.micro (Xen 3.4.3-kaos_t1micro)

Steps to Reproduce (using EC2):
1. Start Fedora from official EC2 image (e.g. ami-e269e5d2 on us-west-2)
2. Install kernel 3.4.0-1
3. Add new kernel to /boot/grub/menu.lst
4. Reboot

Actual results:
Oops "unable to handle kernel paging request" while loading init binary, with backtrace starting with atomic64_read_cx8/unmap_single_vma/generic_file_aio_read. Boot process does not continue.

Expected results:
System boots up normally.

Additional info:
Created attachment 589620 [details] another console log (from linode)
# git log --oneline v3.3..v3.4 arch/x86/lib/atomic64_cx8_32.S
cb8095b x86: atomic64 assembly improvements

Maybe a regression from ^^? I sent an email to Jan and xen-devel.
(In reply to comment #2)
> # git log --oneline v3.3..v3.4 arch/x86/lib/atomic64_cx8_32.S
> cb8095b x86: atomic64 assembly improvements
>
> Maybe a regression from ^^? I sent an email to Jan and xen-devel.

Jan says there is no chance that commit is the problem. This problem should be bisectable, though, for anybody who can reproduce it.
mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch is the culprit
(In reply to comment #4)
> mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch
> is the culprit

Ah drat! I didn't check to see what Fedora pulled in on top of 3.4. Had I done that I would have immediately suspected this patch instead. We've already encountered one problem with this patch for RHEL6 and fixed it. The patch F17 has, however, is the "fixed" version.

The difference between RHEL6 and F17, though, is that F17 has CONFIG_TRANSPARENT_HUGEPAGE=y for 32-bit guests, but RHEL6 does not. So with the addition of this patch, F17 is now calling atomic64_read() from pmd_none_or_trans_huge_or_clear_bad(). So now the question is, why is Xen sensitive to this?
This issue is being discussed upstream now as well. http://permalink.gmane.org/gmane.comp.emulators.xen.devel/132522
(In reply to comment #4)
> mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch
> is the culprit

That patch fixes a CVE, so I doubt we're going to drop it. We'll watch upstream to see what Andrea comes up with and bring it back.
I don't have a later hypervisor (Xen 4) set up for testing, but it'd be nice to know if guests work on one. If so, then maybe RHEL5 and EC2 need to look at Xen hypervisor c/s 17498 and/or others to patch their emulation of cmpxchg8b.
Andrea posted a patch, http://www.spinics.net/lists/kernel/msg1353628.html
I believe Andrew tested Andrea's patch successfully. I'll get this in today.
Applied to F17 and rawhide.
kernel-3.4.2-4.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/kernel-3.4.2-4.fc17
kernel-3.4.2-1.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/kernel-3.4.2-1.fc16
*** Bug 832249 has been marked as a duplicate of this bug. ***
Package kernel-3.4.2-4.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.4.2-4.fc17'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-9501/kernel-3.4.2-4.fc17
then log in and leave karma (feedback).
kernel-3.4.2-4.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
kernel-3.4.2-1.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.