Bug 854584 - mmu_notifier: updates for RHEL6.4
Summary: mmu_notifier: updates for RHEL6.4
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.4
Hardware: All
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Andrea Arcangeli
QA Contact: Madper Xie
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-09-05 11:36 UTC by Andrea Arcangeli
Modified: 2014-06-18 08:52 UTC (History)

Fixed In Version: kernel-2.6.32-315.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-21 06:34:13 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2013:0496 | 0 | normal | SHIPPED_LIVE | Important: Red Hat Enterprise Linux 6 kernel update | 2013-02-20 21:40:54 UTC

Description Andrea Arcangeli 2012-09-05 11:36:58 UTC
Description of problem: two race conditions were discovered in the mmu notifier. The implications for KVM are zero or close to zero but other mmu notifier users might be affected.

The two SMP race conditions are in:

1) mmu_notifier_release

2) set_pte_at_notify


Version-Release number of selected component (if applicable): all RHEL6 releases. RHEL5 might be affected as well but given the mostly theoretical nature of the problem in KVM/KSM context, it's not a concern in RHEL5.

How reproducible: one of the two races is reproducible but not with KVM, and the reproducer isn't public. It might require special hardware. In short it's not reproducible.

Actual results:

Race 1): the secondary MMU mappings are not torn down after unregistering the mmu notifier

Race 2): set_pte_at_notify updates the primary MMU pte before the secondary MMU pte, so the guest sees writes with a slight delay during copy-on-write faults


Expected results:

Race 1): the secondary MMU mappings should be torn down before unregistering the mmu notifier

Race 2): a guest that is not triggering page faults should always see exactly the same page as the host, without slight delays in the spte updates
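
The ordering problem in Race 2) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the patch set: the structs, field names and functions below are hypothetical, each "MMU" is reduced to a page id plus a writable flag, and the old page is assumed to be mapped read-only on both sides before the copy-on-write. A stale guest view is possible exactly when the host can write to a page the guest does not map.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy types; nothing here is a kernel structure. */
struct toy_mapping {
    int  page;      /* illustrative page id */
    bool writable;
};

struct toy_mm {
    struct toy_mapping primary;    /* host view (primary MMU pte) */
    struct toy_mapping secondary;  /* guest view (secondary MMU spte) */
};

/* True if the host could write into a page the guest is not mapping. */
static bool host_write_invisible_to_guest(const struct toy_mm *mm)
{
    return mm->primary.writable && mm->primary.page != mm->secondary.page;
}

/* Order from "Actual results": primary pte first, secondary pte second.
 * Returns whether a stale-view window exists between the two updates. */
bool cow_primary_first(struct toy_mm *mm, int new_page)
{
    mm->primary = (struct toy_mapping){ new_page, true };
    bool window = host_write_invisible_to_guest(mm);
    mm->secondary = (struct toy_mapping){ new_page, true };
    return window;
}

/* Alternative order: secondary pte first, primary pte second.
 * Between the updates the host still maps the old page read-only. */
bool cow_secondary_first(struct toy_mm *mm, int new_page)
{
    mm->secondary = (struct toy_mapping){ new_page, true };
    bool window = host_write_invisible_to_guest(mm);
    mm->primary = (struct toy_mapping){ new_page, true };
    return window;
}
```

Starting from both sides mapping the same read-only page, cow_primary_first reports a window while cow_secondary_first does not. The model ignores locking and real page table details, so it only shows why the update order matters.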

Additional info: Estimated impact for KVM/KSM common virt usage for RHEL/RHEV:

Race 1) KVM/KSM impact: the only damage it can do is to call SetPageDirty (atomic) while we may be updating page->flags non-atomically. The dirty bit could get lost (if the non-atomic side does: flags = page->flags; page->flags = flags | PG_something). But I'm not sure this can lead to real corruption of page->flags. On x86 I tend to believe it may be safe (not leading to a corrupted dirty bit) and we may just lose the dirty bit, which is fine, as we weren't supposed to set it in the first place.

So I think Race 1) cannot cause problems to KVM/KSM.
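
The lost dirty bit described for Race 1) is a classic lost update. The sketch below replays that interleaving in a single thread so the outcome is deterministic. It is an illustration only: the flag values and the function name are hypothetical, not the kernel's real PG_* bits or helpers, and a plain OR stands in for the atomic SetPageDirty.

```c
#include <assert.h>

/* Hypothetical flag bits for illustration; not the kernel's PG_* values. */
#define PG_DIRTY     (1UL << 0)
#define PG_SOMETHING (1UL << 1)

/*
 * Replays the interleaving from the description:
 *   1. the non-atomic side loads page->flags
 *   2. the atomic side sets the dirty bit
 *   3. the non-atomic side stores its stale copy ORed with PG_SOMETHING
 * Returns the final flags word.
 */
unsigned long replay_lost_dirty_bit(unsigned long initial_flags)
{
    unsigned long page_flags = initial_flags;

    unsigned long stale = page_flags;    /* step 1: flags = page->flags */
    page_flags |= PG_DIRTY;              /* step 2: stands in for SetPageDirty */
    assert(page_flags & PG_DIRTY);       /* the bit is set at this point */
    page_flags = stale | PG_SOMETHING;   /* step 3: overwrites step 2 */

    return page_flags;
}
```

Calling replay_lost_dirty_bit(0) yields a word with PG_SOMETHING set and PG_DIRTY clear: the dirty bit is dropped, but no other bit is disturbed, which matches the "we may just lose the dirty bit" reasoning.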

Race 2) KVM/KSM impact: it can only cause a write done by the host to become visible to KVM with a minuscule delay, and only if KSM is used. Memory corruption is not possible (the secondary MMU must have the old page mapped readonly).

fork cannot trigger any COW with KVM, because the whole range tracked by the secondary MMU is set as MADV_DONTFORK, so KSM is required to trigger it in KVM.
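
For readers unfamiliar with MADV_DONTFORK, here is a minimal Linux-only sketch of marking a range so that it is not inherited by a child across fork. The helper name map_dontfork_region is hypothetical and this is not how KVM's userspace sets up guest memory; it only shows the madvise(2) call itself on an anonymous mapping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: maps `len` bytes of private anonymous memory and
 * marks the range MADV_DONTFORK. Returns the mapping, or NULL on failure.
 */
void *map_dontfork_region(size_t len)
{
    assert(len > 0);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* After this call a forked child does not get this range at all,
     * so fork cannot set up copy-on-write sharing for it. */
    if (madvise(p, len, MADV_DONTFORK) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

The caller releases the region with munmap(p, len) when done.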

The effect of seeing the host writes slightly delayed is hard to tell, but it's unlikely to cause any trouble even if it happens. The guest CPU may always get an interrupt and see host writes with a slight delay regardless, so it's probably only a theoretical problem. It's still worth fixing for peace of mind, by guaranteeing that host and guest see the exact same page at all times without delays in the guest view.

Additional info: Worst case impact for other mmu notifier users (not KVM/KSM):

Race 1): They can be impacted severely; in the worst case it leads to memory corruption.

Race 2): as in the KVM/KSM case, it's hard to tell if it could ever cause problems, likely not.

Comment 2 Andrea Arcangeli 2012-09-05 11:54:22 UTC
Updates (6 patches) posted to rhkernel-list in the thread starting with Message-Id: <1346845953-1538-1-git-send-email-aarcange>.

Comment 3 RHEL Program Management 2012-09-06 09:32:25 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 4 FuXiangChun 2012-09-07 09:34:48 UTC
Andrea,
      Although this issue only leads to a minuscule delay in writes done by the host (visible to KVM with a slight delay) and barely affects KVM, we still need to reproduce and verify it. Can you give me some clues on how to reproduce it, or tell me which memory testing tools are available? Otherwise, I don't have any idea how to reproduce it.

Comment 5 Andrea Arcangeli 2012-09-14 17:08:16 UTC
I don't think it can be reproduced; the race window looks too small to attempt that. Testing KSM+KVM for regressions sounds good enough.

Comment 7 Jarod Wilson 2012-10-01 17:44:09 UTC
Patch(es) available on kernel-2.6.32-315.el6

Comment 12 errata-xmlrpc 2013-02-21 06:34:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0496.html

