Bug 1331113

Summary: kvm+THP/corruption in 4.5.x kernel
Product: Red Hat Enterprise Linux 7 Reporter: Andrew Jones <drjones>
Component: kernel-aarch64 Assignee: Andrew Jones <drjones>
kernel-aarch64 sub component: KVM QA Contact: Virtualization Bugs <virt-bugs>
Status: CLOSED WONTFIX Docs Contact:
Severity: unspecified    
Priority: unspecified CC: chayang, crobinso, dgilbert, extras-qa, gansalmon, itamar, jonathan, juzhang, kernel-maint, madhu.chinakonda, mchehab, virt-maint
Version: 7.3   
Target Milestone: rc   
Target Release: ---   
Hardware: aarch64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1331092 Environment:
Last Closed: 2016-06-13 17:35:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1331092    
Bug Blocks: 1174832    

Description Andrew Jones 2016-04-27 18:45:29 UTC
+++ This bug was initially created as a clone of Bug #1331092 +++

Description of problem:
Hi,
  There's a KVM+THP corruption in 4.5 (and current 4.6) kernels that I've triggered on Fedora while testing the QEMU postcopy feature.  It might also trigger with the more common balloon feature, so it's probably worth fixing, since the symptom is random guest memory corruption.

Fixed by Andrea Arcangeli's patch:
[PATCH 1/1] mm: thp: kvm: fix memory corruption in KVM with THP enabled
  currently being discussed on lkml/linux-mm/qemu-devel

Version-Release number of selected component (if applicable):
4.5, 4.6 kernels

How reproducible:
Ah well, that's rather complicated;  I can reproduce it 100% of the time in a nested guest, and in about 1 in 1000 runs of my test suite on a real host.  It disappears if you turn THP off or apply the patch noted above.
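
Since the failure depends on THP being on, it can help to record the host's THP mode next to each test result. A minimal sketch, assuming the standard sysfs file /sys/kernel/mm/transparent_hugepage/enabled, where the active mode is the token in square brackets; the function names are illustrative and not part of any existing tool:

```python
import re
from pathlib import Path

# Standard sysfs location of the THP policy on Linux.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Return the active mode, i.e. the token inside square brackets."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no active mode found in {text!r}")
    return match.group(1)


def current_thp_mode(path=THP_ENABLED):
    """Return the active THP mode, or None if the sysfs file is absent
    (for example, on a kernel built without THP support)."""
    try:
        return parse_thp_mode(path.read_text())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print("THP mode:", current_thp_mode())
```

A result of "never" or None means THP is not in play on that host, which is the configuration in which the corruption was not seen.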

Steps to Reproduce:
I've seen reports that migrating a busy guest using postcopy will hang on 4.5,
and I'm about to post tests/postcopy-test for qemu.  I run it repeatedly; it fails on the first run in a VM but only randomly on hardware.
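
Because the failure is intermittent on real hardware, the test has to be looped and the failures counted. A rough sketch of such a loop, assuming only that the test signals failure through a non-zero exit status; count_failures is a hypothetical helper, not part of qemu's test suite:

```python
import subprocess


def count_failures(cmd, runs):
    """Run cmd `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures
```

For example, count_failures(["tests/postcopy-test"], 1000) would give the failure count for a thousand runs; the path to the test binary is an assumption and depends on the build tree.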

Actual results:
A migrated VM whose contents aren't quite the same as the source.

Expected results:
A nice happy migrated VM

Additional info:

Comment 1 Andrew Jones 2016-06-13 17:35:28 UTC
I see we don't have THP enabled on the RHELSA kernel. Closing as won't fix.