Red Hat Bugzilla – Bug 516909
KSM breaks encryption 157 > kernel > 139 - KSM support now disabled
Last modified: 2009-08-27 09:24:48 EDT
1) Upgrade to grubby-7.0.2-1.fc12.x86_64
2) Install grubby-7.0.2-1.fc12.x86_64
title Fedora (2.6.31-0.145.rc5.git3.fc12.x86_64)
kernel /vmlinuz-2.6.31-0.145.rc5.git3.fc12.x86_64 ro root=/dev/mapper/vg0-rootfs rhgb quiet usbcore.autosuspend=1 SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us rd_plytheme=charge
This stanza was written by new-kernel-pkg. Note that it is booting the initrd-generic shipped within the kernel RPM.
3) Attempt boot. My LVM VG is encrypted, with rootfs on an LV within that VG. I type my passphrase and it successfully unlocks the volume, but then it sits there forever.
key slot 0 unlocked.
Oops, step #2 should be: install kernel-2.6.31-0.145.rc5.git3.fc12.x86_64
This is with dracut-0.8-1.fc12.noarch. I tried to create a new initrd image with dracut, but that image exhibits the same problem as initrd-generic.
dracut on kernel-2.6.31-0.125.rc5.git2.fc12.x86_64 works. This seems to be a problem with kernel-2.6.31-0.145.rc5.git3.fc12.x86_64. Reassigning.
kernel-2.6.31-0.145.2.1.rc5.git3.fc12.x86_64 is broken in the same manner.
Same here. The last kernel that is working for me is kernel-2.6.31-0.139.rc5.git3.fc12.x86_64
kernel-2.6.31-0.149.rc5.git3.fc12.x86_64 mkinitrd FAIL
kernel-2.6.31-0.149.rc5.git3.fc12.x86_64 dracut FAIL
I built a LiveCD with kernel-2.6.31-0.149.rc5.git3.fc12.x86_64 + dracut. It gets stuck forever without any error messages and just fails to boot. It seems this has nothing to do with encrypted root.
GOOD kernel-2.6.31-0.139.rc5.git3.fc12.x86_64 mkinitrd
GOOD kernel-2.6.31-0.139.rc5.git3.fc12.x86_64 dracut
FAIL kernel-2.6.31-0.142.rc5.git3.fc12.x86_64 mkinitrd
FAIL kernel-2.6.31-0.142.rc5.git3.fc12.x86_64 dracut
Confirmed, it broke somewhere between 139 and 142.
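The 139/142 narrowing above is a bisection over kernel builds. A minimal sketch of that search, assuming failures are monotonic over the builds tested (true here only up to -149, since -157 disables KSM). `first_bad_build` and `reported_bad` are hypothetical helpers; `reported_bad` merely encodes the results posted in this thread:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate encoding this thread's reports:
   -139 and earlier work, -142 through -149 hang, -157 (KSM disabled) works. */
static bool reported_bad(int build)
{
    return build > 139 && build < 157;
}

/* Binary search for the first failing build in builds[0..n-1].
   Assumes builds[] is sorted ascending and that once a build fails,
   every later build in the array also fails.
   Returns n if no build in the array fails. */
static size_t first_bad_build(const int *builds, size_t n, bool (*is_bad)(int))
{
    size_t lo = 0, hi = n;  /* the first bad index lies in [lo, hi] */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (is_bad(builds[mid]))
            hi = mid;       /* mid fails: answer is mid or earlier */
        else
            lo = mid + 1;   /* mid works: answer is after mid */
    }
    return lo;
}
```

For example, with builds {125, 139, 142, 145, 149} it returns index 2 (build 142) after testing only two builds.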
kernel-2.6.31-0.149.rc5.git3.fc12.sparc64 works just fine here. I'm using unencrypted LVM.
dracut-0.7-4.fc12 was used in the kernel build
I'm confused. The very same livecd of Comment #6 works today, but the kernel installed on my laptop silently gets stuck after unlocking the encrypted disk.
Sven, are you using encryption? An encrypted LVM VG, specifically?
SysRq-p output after it gets stuck: it appears to be stuck in a loop.
I am using encryption.
I can reproduce those hangs on two machines - both using full vg encryption.
If I'm not mistaken, F12 Alpha is going to ship with a kernel newer than 139, which would mean full disk encryption is broken for the Alpha. As this is a rather important feature, I think making this an F12 Alpha blocker is warranted.
This seems to be the "non-boot-side" of the same bug:
Yeah, I believe I can reproduce a similar issue by plugging in a USB hard drive with a LUKS-encrypted file system:
As reported there, works for 0.139, fails for later kernels up to and including
All these kernels boot fine on my unencrypted LVM, but exhibit "cryptsetup won't die and consumes available cpu cycles".
I've posted SysRQ-p traces there....
It works with Linus' kernel; the patches that introduced the problem in Fedora are Kernel Samepage Merging (KSM).
This is quite a serious bug: probably all encrypted systems are unbootable now.
*** Bug 517545 has been marked as a duplicate of this bug. ***
Both my Rawhide machines are back in a working state with kernel -157 (which disables the KSM patches).
From the included KSM series, the problematic patch is this one:
Subject: [PATCH 9/12] ksm: fix oom deadlock
(fixes one deadlock...and introduces another one:-)
-157 fixes my "plugging in a USB hard drive with encrypted FS" issue.
FS now mounts and cryptsetup has properly exited.
The F12 Alpha kernel is kernel-2.6.31-0.125.4.2.rc5.git2.fc12, so I'm removing this from the Alpha blocker list.
- '[PATCH 9/12] ksm: fix oom deadlock' appears to cause a deadlock with an encrypted root volume
- This was added in 2.6.31-0.141.rc5.git3 by the addition of this set of KSM patches:
- The KSM patches have been disabled since 2.6.31-0.157.rc6, pending a fix for this
> - '[PATCH 9/12] ksm: fix oom deadlock' appears to cause deadlock with an
> encrypted root volume
FYI: an encrypted root volume is not needed; any "cryptsetup luksOpen" on x86_64 will cause the deadlock. For the process backtrace, see bug 517545.
Andrea suggests checking whether these programs are calling madvise() with bogus flags
(In reply to comment #23)
> Andrea suggests checking whether these programs are calling madvise() with
> bogus flags
Not explicitly, but it probably forgets to unlock memory. Try this code:
#include <sys/mman.h>
int main(int argc, char *argv[]) {
    return mlockall(MCL_CURRENT | MCL_FUTURE) ? 1 : 0; /* exit without munlockall() */
}
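For contrast with the reproducer above, a sketch of the usual lock/unlock pairing for a program that keeps secrets in RAM. It is an illustration only, not a description of what cryptsetup does; `with_locked_memory` is a hypothetical helper, and mlockall() may legitimately fail (for example with EPERM or ENOMEM) when RLIMIT_MEMLOCK is small:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: lock all current and future pages, run work(),
   then release the locks with munlockall() before returning.
   Returns 0 on success, otherwise the errno of the failing call. */
static int with_locked_memory(void (*work)(void))
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        int err = errno;
        fprintf(stderr, "mlockall: %s\n", strerror(err));
        return err;
    }
    if (work != NULL)
        work();     /* e.g. handle key material while it cannot be swapped out */
    if (munlockall() != 0)
        return errno;
    return 0;
}
```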
I'm investigating why the troublesome checks that deadlock mlocked programs were added to the page fault path. At first glance they look unnecessary, so I'm asking just in case.
Date: Tue, 25 Aug 2009 16:58:32 +0200
From: Andrea Arcangeli <firstname.lastname@example.org>
To: Hugh Dickins <email@example.com>
Cc: Izik Eidus <firstname.lastname@example.org>, Rik van Riel <email@example.com>,
Chris Wright <firstname.lastname@example.org>,
Nick Piggin <email@example.com>,
Andrew Morton <firstname.lastname@example.org>,
Subject: Re: [PATCH 9/12] ksm: fix oom deadlock
On Mon, Aug 03, 2009 at 01:18:16PM +0100, Hugh Dickins wrote:
> tables which have been freed for reuse; and even do_anonymous_page
> and __do_fault need to check they're not being called by break_ksm
> to reinstate a pte after zap_pte_range has zapped that page table.
This deadlocks exit_mmap in an infinite loop when there's some region
locked. mlock calls gup and pretends to page fault successfully if
there's a vma existing on the region, but it doesn't page fault
anymore because of the mm_count being 0 already, so follow_page fails
and gup retries the page fault forever. And generally I don't like to
add those checks to page fault fast path.
Given we check mm_users == 0 (ksm_test_exit) after taking mmap_sem in
unmerge_and_remove_all_rmap_items, why do we actually need to care
that a page fault happens? We hold mmap_sem so we're guaranteed to see
mm_users == 0 and we won't ever break COW on that mm with mm_users ==
0 so I think those troublesome checks from page fault can be simply removed.
Created attachment 358588
attempted fix (the last one was the wrong diff)
Created attachment 358624
new proposed patch
This actually makes ksm_exit simpler, and it already contains down_write(mmap_sem).
(Also, this time I checked which workstation I was running Firefox on before picking a random file from /tmp ;)
The discussion is continuing live on linux-mm with Hugh.
Hugh acked my attachment 358624, so please apply it and then we can close this bug. We still have some OOM-handling issues with KSM to discuss on linux-mm, but those aren't critical, and once we solve them, patches will flow into Rawhide.
Already applied and should be in kernel-2.6.31-0.180.rc7.git4.fc12 today. KSM has been re-enabled.