Bug 854584 - mmu_notifier: updates for RHEL6.4
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.4
Hardware: All
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Andrea Arcangeli
QA Contact: Madper Xie
Reported: 2012-09-05 07:36 EDT by Andrea Arcangeli
Modified: 2014-06-18 04:52 EDT
Fixed In Version: kernel-2.6.32-315.el6
Doc Type: Bug Fix
Last Closed: 2013-02-21 01:34:13 EST
Type: Bug


Description Andrea Arcangeli 2012-09-05 07:36:58 EDT
Description of problem: two race conditions were discovered in the mmu notifier. The implications for KVM are zero or close to zero but other mmu notifier users might be affected.

The two SMP race conditions are in:

1) mmu_notifier_release

2) set_pte_at_notify


Version-Release number of selected component (if applicable): all RHEL6 releases. RHEL5 might be affected as well but given the mostly theoretical nature of the problem in KVM/KSM context, it's not a concern in RHEL5.

How reproducible: one of the two races is reproducible but not with KVM, and the reproducer isn't public. It might require special hardware. In short it's not reproducible.

Actual results:

Race 1): the secondary MMU mappings are not torn down after unregistering the mmu notifier

Race 2): set_pte_at_notify updates the primary MMU pte before the secondary MMU pte, so the guest sees writes with a slight delay during copy-on-write faults


Expected results:

Race 1): the secondary MMU mappings should be torn down before unregistering the mmu notifier

Race 2): as long as it is not triggering page faults, the guest should always see exactly the same page as the host, without slight delays in the spte updates

Additional info: Estimated impact for KVM/KSM common virt usage for RHEL/RHEV:

Race 1) KVM/KSM impact: the only damage it can do is to call SetPageDirty (atomic) while we may be updating page->flags non-atomically. The dirty bit could get lost (if the non-atomic side does: flags = page->flags; page->flags = flags | PG_something). But I'm not sure this can lead to real corruption of page->flags. On x86 I tend to believe it may be safe (not leading to a corrupted dirty bit) and we may just lose the dirty bit, which is fine, as we weren't supposed to set it in the first place.

So I think Race 1) cannot cause problems to KVM/KSM.
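
The lost-update interleaving described for Race 1) can be replayed deterministically in user space. This is only a sketch under stated assumptions: struct page_model, PG_DIRTY_BIT, PG_OTHER_BIT and both function names are hypothetical stand-ins rather than kernel definitions, and a single thread steps through the interleaving by hand instead of racing for real.

```c
#include <stdatomic.h>

/* Hypothetical flag bits; these are not the kernel's PG_* values. */
#define PG_DIRTY_BIT (1UL << 0)
#define PG_OTHER_BIT (1UL << 1)

/* Toy stand-in for the flags word of a struct page. */
struct page_model {
    _Atomic unsigned long flags;
};

/*
 * Replay the interleaving step by step:
 *   1. the non-atomic side loads the flags,
 *   2. the atomic side sets the dirty bit (the SetPageDirty role),
 *   3. the non-atomic side stores its stale copy plus its own bit.
 * Returns the final flags word.
 */
unsigned long replay_lost_dirty_bit(void)
{
    struct page_model page;
    atomic_init(&page.flags, 0UL);

    unsigned long stale = atomic_load(&page.flags);   /* step 1 */
    atomic_fetch_or(&page.flags, PG_DIRTY_BIT);       /* step 2 */
    atomic_store(&page.flags, stale | PG_OTHER_BIT);  /* step 3 */

    return atomic_load(&page.flags);
}

/* Same two updates, but both sides use an atomic OR. */
unsigned long replay_both_atomic(void)
{
    struct page_model page;
    atomic_init(&page.flags, 0UL);

    atomic_fetch_or(&page.flags, PG_DIRTY_BIT);
    atomic_fetch_or(&page.flags, PG_OTHER_BIT);

    return atomic_load(&page.flags);
}
```

In this replay, replay_lost_dirty_bit() ends with only PG_OTHER_BIT set: the dirty bit from step 2 is overwritten by the stale store in step 3, while the other bits stay intact. replay_both_atomic() keeps both bits.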

Race 2) KVM/KSM impact: it can only cause a write done by the host
to become visible to KVM with a minuscule delay, and only
if KSM is used. Memory corruption is not possible (the secondary MMU
must have the old page mapped readonly).

fork cannot trigger any COW with KVM, because the whole range tracked
by the secondary MMU is set as MADV_DONTFORK, so KSM is required to trigger it in KVM.
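
For reference, marking a range MADV_DONTFORK from user space looks roughly like the sketch below. madvise() and MADV_DONTFORK are the real Linux interfaces; the helper name map_dontfork and the use of an anonymous private mapping are assumptions made for illustration and are not taken from any particular KVM userspace.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS and MADV_DONTFORK on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes of anonymous memory and mark the
 * range MADV_DONTFORK, so a child created by fork() does not inherit it
 * and fork() never puts the range into a copy-on-write state.
 * Returns the mapping, or NULL on failure.
 */
void *map_dontfork(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (madvise(p, len, MADV_DONTFORK) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

With the range excluded from fork(), the remaining way to get a COW fault on it is for something else to write-protect the pages, which is the role KSM plays in the scenario above.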

The effect of seeing the host writes slightly delayed is hard to tell, but it's unlikely to cause any trouble even if it happens. The guest CPU may always get an interrupt and see host writes with a slight delay regardless, so it's probably only a theoretical problem. It's still worth fixing for peace of mind, by guaranteeing that host and guest see the exact same page at all times without delays in the guest view.

Additional info: Worst case impact for other mmu notifier users (not KVM/KSM):

Race 1): They can be impacted severely, worst case leads to memory corruption.

Race 2): as in the KVM/KSM case, it's hard to tell if it could ever cause problems, likely not.
Comment 2 Andrea Arcangeli 2012-09-05 07:54:22 EDT
Updates (6 patches) posted to rhkernel-list in the thread starting with Message-Id: <1346845953-1538-1-git-send-email-aarcange@redhat.com>.
Comment 3 RHEL Product and Program Management 2012-09-06 05:32:25 EDT
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.
Comment 4 FuXiangChun 2012-09-07 05:34:48 EDT
Andrea,
      Although this issue only leads to a minuscule delay in writes done by the host (visible to KVM with a slight delay) and hardly affects KVM, we still need to reproduce and verify it. Can you give me some clues on how to reproduce it, or tell me which memory testing tools are suitable? Otherwise, I don't have any idea how to reproduce it.
Comment 5 Andrea Arcangeli 2012-09-14 13:08:16 EDT
I don't think it can be reproduced; the race window looks too small to attempt that. Testing KSM+KVM for regressions sounds good enough.
Comment 7 Jarod Wilson 2012-10-01 13:44:09 EDT
Patch(es) available on kernel-2.6.32-315.el6
Comment 12 errata-xmlrpc 2013-02-21 01:34:13 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0496.html
