Bug 1331092

Summary: kvm+THP/corruption in 4.5.x kernel
Product: [Fedora] Fedora
Reporter: Dr. David Alan Gilbert <dgilbert>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 24
CC: crobinso, gansalmon, itamar, jforbes, jonathan, kernel-maint, madhu.chinakonda, mchehab
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-4.5.3-300.fc24
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1331113 (view as bug list) Environment:
Last Closed: 2016-05-08 10:29:38 UTC
Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1331113    

Description Dr. David Alan Gilbert 2016-04-27 17:11:51 UTC
Description of problem:
Hi,
  There's a KVM+THP corruption in 4.5 (and current 4.6) kernels that I've triggered on Fedora while testing the QEMU postcopy feature.  It might also trigger with the more common balloon feature, so it's probably worth fixing, since the symptom is random guest memory corruption.

Fixed by Andrea Arcangeli's patch:
[PATCH 1/1] mm: thp: kvm: fix memory corruption in KVM with THP enabled
  currently being discussed on lkml/linux-mm/qemu-devel

Version-Release number of selected component (if applicable):
4.5, 4.6 kernels

How reproducible:
Ah well, that's rather complicated: I can reproduce it 100% of the time in a nested guest, and in about 1 in 1000 runs of my test suite on a real host.  It disappears if you turn THP off or include the patch noted.
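
Since the corruption goes away with THP off, it helps to confirm which THP mode a host is actually running before drawing conclusions from a test run. The following is a minimal sketch, assuming the standard sysfs file /sys/kernel/mm/transparent_hugepage/enabled, which lists the available modes with the active one in square brackets. The function names are illustrative and not part of any existing tool.

```python
from pathlib import Path

# Standard sysfs location of the THP mode on Linux.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Return the active mode from text such as 'always [madvise] never'."""
    for word in text.split():
        # The kernel marks the active mode with square brackets.
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active mode marked in {text!r}")


def current_thp_mode(path=THP_ENABLED):
    """Read and parse the THP mode of the running host."""
    return parse_thp_mode(path.read_text())


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print("THP mode:", current_thp_mode())
    else:
        print("THP sysfs file not found; this kernel may lack THP support")
```

To turn THP off for a comparison run, root can write "never" to the same file; the setting does not persist across reboots.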

Steps to Reproduce:
I've seen reports that migrating a busy guest using postcopy will hang on 4.5,
but I'm about to post tests/postcopy-test for QEMU.  I run it repeatedly; it fails on the first run in a VM, but only randomly on hardware.
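
Running the test repeatedly until it fails can be scripted. This is only a sketch of such a loop, not the harness actually used here: the helper name, the run count and the way the command is passed in are all assumptions, and a non-zero exit status is assumed to mean the test failed.

```python
import subprocess
import sys


def first_failure(cmd, runs):
    """Run cmd up to `runs` times.

    Returns the 1-based number of the first run that exits non-zero,
    or None if every run succeeded.
    """
    for i in range(1, runs + 1):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical usage: the test command is given on the command line,
    # and the run count of 1000 is an arbitrary assumption.
    if len(sys.argv) > 1:
        failed_at = first_failure(sys.argv[1:], runs=1000)
        if failed_at is None:
            print("no failure in 1000 runs")
        else:
            print("first failure on run", failed_at)
    else:
        print("usage: repeat.py COMMAND [ARGS...]")
```

On a real host, where the failure is rare, a clean batch of runs does not prove the bug is absent, so the run count needs to be well above the expected failure interval.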

Actual results:
A migrated VM whose contents aren't quite the same as the source.

Expected results:
A nice happy migrated VM

Additional info:

Comment 1 Justin M. Forbes 2016-04-28 19:28:07 UTC
This patch has been added to rawhide and f24 kernels. It should make the next build.

Comment 2 Fedora Update System 2016-05-05 12:15:47 UTC
kernel-4.5.3-300.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-4ce97823af

Comment 3 Dr. David Alan Gilbert 2016-05-05 12:31:19 UTC
It's worth keeping an eye out for other THP fixes going in; I know there's at least one other being discussed that affects VFIO.

Dave

Comment 4 Fedora Update System 2016-05-06 11:28:27 UTC
kernel-4.5.3-300.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-4ce97823af

Comment 5 Fedora Update System 2016-05-08 10:28:48 UTC
kernel-4.5.3-300.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 6 Dr. David Alan Gilbert 2016-05-09 09:21:02 UTC
I can confirm 4.5.3-300.fc24 fixes it for me.