A flaw has been discovered in the Linux kernel's memory deduplication mechanism. Previous research has shown that memory deduplication can be attacked by exploiting the Copy-on-Write (COW) mechanism. However, the max page sharing limit [1] of Kernel Samepage Merging (KSM), added in Linux kernel version 4.4.0-96.119, can create another side channel. When the attacker and the victim share the same host and KSM uses its default setting of "max page sharing=256", the attacker maps 256 copies of the page it wants to learn about and waits. The attacker can then time the unmap operation to see whether those copies were merged with the victim's page. The unmapping time depends on whether the copies merged with the victim's page because additional physical pages are created once KSM's "max page sharing" limit is exceeded. Through these operations, the attacker leaks information about the victim's page. We have confirmed that Linux kernel versions 4.4.0-96.119 through 5.15.0-58-generic are affected, and we expect later versions to be affected as well. The research, titled "Exploiting Memory Page Management in KSM for Remote Memory Deduplication Attack," was presented at The 24th World Conference on Information Security Applications (WISA), 2023 [2] and will be published by Springer this year [3].
References:
[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1680513
[2] https://wisa.or.kr/accepted
[3] https://link.springer.com/conference/wisa
Created kernel tracking bugs for this issue: Affects: fedora-all [bug 2259410]
Here's my take on this one: I don't think it deserves to be classified as "medium" severity. As far as I understand it, the premise of this issue is that a bad actor can (a) prepare a page with data that is expected to be present on a given target VM and map it N times on a compromised VM, where N is the value set at /sys/kernel/mm/ksm/max_page_sharing on the host, (b) wait for KSM to perform a full deduplication scan of the compromised VM, whose duration depends on the values set for /sys/kernel/mm/ksm/pages_to_scan and /sys/kernel/mm/ksm/sleep_millisecs and on the amount of RAM available to the compromised VM, and then measure the time it takes to unmap the crafted pages to determine, from the latency, whether the crafted pages were merged with the target "secret" page.
I think this kind of attack is highly unlikely to work outside of the controlled environment described in the paper. To begin with, all the sysfs values needed for the timing factor to work in the bad actor's favor are unknowns, given that these are HOST values that can, and should, be modified by sysadmins on a case-by-case basis. This addresses (b), IMO.
The real elephant in the room, however, is the fact that the bad actor has to prepare a page with data thought to exist on the target (a). In the experiment, they worked with a convenient string, 0xDEADCAFE, filling the whole page, but in real life an attacker would have to produce a 4096-byte stream of data that exactly matches the content of their target secret; otherwise KSM won't deduplicate that page. The number of possible byte combinations in a 4 KiB page is 256^4096, roughly 1.4 x 10^9864, which means it is practically impossible to produce random data that matches any other page that is a candidate for merging by KSM.
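For a sense of scale, the number of distinct 4096-byte pages is 256^4096 = 2^32768. The following is a minimal Python sketch, assuming the common 4 KiB page size, that expresses this count in scientific notation via logarithms so the huge integer never has to be printed. The function name is illustrative only.

```python
import math

PAGE_SIZE = 4096  # bytes; assumption: the common 4 KiB page size


def page_content_space(page_size=PAGE_SIZE):
    """Return (mantissa, exponent) with 256**page_size ~= mantissa * 10**exponent."""
    # Each byte contributes 8 bits, so the count is 2**(8 * page_size).
    log10_total = page_size * 8 * math.log10(2)
    exponent = math.floor(log10_total)
    mantissa = 10 ** (log10_total - exponent)
    return mantissa, exponent


mantissa, exponent = page_content_space()
print(f"about {mantissa:.1f} x 10^{exponent} possible page contents")
```

This only bounds blind guessing; it says nothing about pages whose full content is publicly known in advance.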
Even if the attacker knows some strings that may be present in the memory of the targeted VM, they still have to produce a 4 kB block that perfectly matches that data as it is laid out in the target page to have a chance of using the timing side channel to infer that the victim does indeed have that data resident. At a glance, I'd argue that unless the bad actor knows beforehand that a VM has a particular page holding a specific pattern, which defeats the purpose of such attacks, the issue discussed in this paper has little practical application. Such an exploit can also be easily defeated by simply changing KSM's default settings for max_page_sharing, pages_to_scan and sleep_millisecs on the virtualization host, as those will not be known by the attacker.
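To illustrate how much the wait in step (b) depends on host-side settings, here is a rough Python sketch that estimates a lower bound for one full ksmd pass. It assumes ksmd scans pages_to_scan pages and then sleeps for sleep_millisecs, uses the upstream defaults (100 and 20) as parameter defaults, and ignores both the time spent scanning and the possibility that a page needs more than one pass before it is merged. The function name is hypothetical.

```python
import math


def estimate_full_scan_seconds(mergeable_bytes, pages_to_scan=100,
                               sleep_millisecs=20, page_size=4096):
    """Lower-bound estimate (sleep time only) for one full ksmd pass."""
    total_pages = math.ceil(mergeable_bytes / page_size)
    # ksmd wakes up once per batch of pages_to_scan pages.
    batches = math.ceil(total_pages / pages_to_scan)
    return batches * sleep_millisecs / 1000.0


guest_ram = 4 * 2**30  # hypothetical 4 GiB guest
print(f"defaults: {estimate_full_scan_seconds(guest_ram):.2f} s")
print(f"tuned:    {estimate_full_scan_seconds(guest_ram, 1000, 10):.2f} s")
```

With these two example settings the estimate differs by more than an order of magnitude, which is the point: an attacker who cannot read the host's sysfs values has to guess how long to wait.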
A couple of things here: (1) I stand by and reinforce my comment #8: this CVE doesn't deserve the 'medium' severity classification it currently holds; (2) there are several ways to mitigate the problem without pushing any code changes to upstream or RHEL, one of them being to simply disable KSM altogether. We should just close this tracker.
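As an illustration of the "disable KSM" mitigation, the sketch below reads the relevant sysfs knobs and stops ksmd. It is a minimal example rather than a vetted procedure: it must run as root on the host, the helper names are hypothetical, and it assumes the documented meaning of /sys/kernel/mm/ksm/run (0 stops ksmd but keeps already merged pages, 2 stops ksmd and unmerges them). Any service that manages KSM on the host may re-enable it afterwards.

```python
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")


def read_ksm_settings(ksm_dir=KSM_DIR):
    """Return the KSM knobs discussed in this bug; None if a file is absent."""
    names = ("run", "max_page_sharing", "pages_to_scan", "sleep_millisecs")
    settings = {}
    for name in names:
        path = Path(ksm_dir) / name
        settings[name] = int(path.read_text().strip()) if path.exists() else None
    return settings


def disable_ksm(ksm_dir=KSM_DIR, unmerge=True):
    """Stop ksmd; with unmerge=True also break up pages that are already merged."""
    (Path(ksm_dir) / "run").write_text("2" if unmerge else "0")


if __name__ == "__main__":
    # Reading works unprivileged; calling disable_ksm() requires root.
    print(read_ksm_settings())
```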