1) Upgrade to grubby-7.0.2-1.fc12.x86_64
2) Install grubby-7.0.2-1.fc12.x86_64

title Fedora (2.6.31-0.145.rc5.git3.fc12.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.31-0.145.rc5.git3.fc12.x86_64 ro root=/dev/mapper/vg0-rootfs rhgb quiet usbcore.autosuspend=1 SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us rd_plytheme=charge
        initrd /initrd-generic-2.6.31-0.145.rc5.git3.fc12.x86_64.img

This stanza was written by new-kernel-pkg. Note that it is booting the initrd-generic shipped within the kernel RPM.

3) Attempt boot. My LVM vg is encrypted, with rootfs on a lv within that vg. I type my passphrase, it successfully unlocks it, but then sits there forever.

key slot 0 unlocked.
Command successful.
Oops, step #2 should be: install kernel-2.6.31-0.145.rc5.git3.fc12.x86_64. This is with dracut-0.8-1.fc12.noarch. I tried to create a new initrd image with dracut, but that image exhibits the same problem as initrd-generic.
dracut on kernel-2.6.31-0.125.rc5.git2.fc12.x86_64 works. This seems to be a problem with kernel-2.6.31-0.145.rc5.git3.fc12.x86_64. Reassigning.
kernel-2.6.31-0.145.2.1.rc5.git3.fc12.x86_64 is broken in the same manner.
Same here. The last kernel that is working for me is kernel-2.6.31-0.139.rc5.git3.fc12.x86_64
kernel-2.6.31-0.149.rc5.git3.fc12.x86_64 mkinitrd FAIL
kernel-2.6.31-0.149.rc5.git3.fc12.x86_64 dracut FAIL
I built a LiveCD with kernel-2.6.31-0.149.rc5.git3.fc12.x86_64 + dracut. It gets stuck forever without any error messages and just fails to boot. It seems this has nothing to do with encrypted root.
GOOD kernel-2.6.31-0.139.rc5.git3.fc12.x86_64 mkinitrd
GOOD kernel-2.6.31-0.139.rc5.git3.fc12.x86_64 dracut
FAIL kernel-2.6.31-0.142.rc5.git3.fc12.x86_64 mkinitrd
FAIL kernel-2.6.31-0.142.rc5.git3.fc12.x86_64 dracut

Confirmed, it broke somewhere between 139 and 142.
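The narrowing done in the comments above can be written down as a small helper. This is only a sketch: `struct boot_result` and `regression_window` are hypothetical names introduced for illustration, and the build numbers used in the note below are the ones reported so far in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* One boot test: Fedora build number (e.g. 139) and whether it booted. */
struct boot_result {
    int build;
    int good;   /* 1 = boots, 0 = hangs */
};

/*
 * Hypothetical helper: find the highest good build and the lowest bad build.
 * Returns 0 when they form a clean window (last_good < first_bad),
 * -1 when either side is missing or the results contradict each other.
 */
int regression_window(const struct boot_result *r, size_t n,
                      int *last_good, int *first_bad)
{
    int have_good = 0, have_bad = 0;

    assert(r != NULL && last_good != NULL && first_bad != NULL);

    for (size_t i = 0; i < n; i++) {
        if (r[i].good) {
            if (!have_good || r[i].build > *last_good)
                *last_good = r[i].build;
            have_good = 1;
        } else {
            if (!have_bad || r[i].build < *first_bad)
                *first_bad = r[i].build;
            have_bad = 1;
        }
    }

    if (!have_good || !have_bad || *last_good >= *first_bad)
        return -1;
    return 0;
}
```

Fed the results reported so far (125 and 139 good; 142, 145 and 149 failing), it yields a window of 139 to 142, matching the conclusion above.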
kernel-2.6.31-0.149.rc5.git3.fc12.sparc64 works just fine here. I'm using unencrypted LVM. dracut-0.7-4.fc12 was used in the kernel build.
I'm confused. The very same livecd of Comment #6 works today, but the kernel installed on my laptop silently gets stuck after unlocking the encrypted disk. Sven, are you using encryption? An encrypted LVM vg specifically?
http://people.redhat.com/wtogami/temp/post139loop.jpg SysRQ-p after it gets stuck. It appears to be stuck in a loop.
I am using encryption. I can reproduce those hangs on two machines - both using full vg encryption.
If I'm not mistaken, F12Alpha is going to ship with a kernel >139, and that would mean full disk encryption is broken for alpha. As this is a rather important feature, I think blocking on F12Alpha is warranted.
This seems to be the "non-boot-side" of the same bug: https://bugzilla.redhat.com/show_bug.cgi?id=517545
Yeah, I believe I can reproduce a similar issue by plugging in a USB hard drive with a LUKS encrypted file system: https://bugzilla.redhat.com/show_bug.cgi?id=517545 As reported there, it works for 0.139 and fails for later kernels up to and including kernel-2.6.31-0.156.rc6.fc12.x86_64. All these kernels boot fine on my unencrypted LVM, but exhibit "cryptsetup won't die and consumes available cpu cycles". I've posted SysRQ-p traces there. Same issue?
It works with Linus' kernel. The patches that introduced the problem in Fedora are the Kernel Samepage Merging (KSM) patches: linux-2.6-ksm.patch and linux-2.6-ksm-updates.patch. This is quite a serious bug; probably all encrypted systems are unbootable now.
*** Bug 517545 has been marked as a duplicate of this bug. ***
Both my rawhide machines are back to working state with kernel -157 (which disables the ksm-patches).
From the included KSM series, the problematic patch is this one:
Subject: [PATCH 9/12] ksm: fix oom deadlock
(fixes one deadlock... and introduces another one :-)
-157 fixes my "plugging in a USB hard drive with encrypted FS" issue. FS now mounts and cryptsetup has properly exited.
The F12 Alpha kernel is kernel-2.6.31-0.125.4.2.rc5.git2.fc12, so removing this from the alpha blocker
Summary:
- '[PATCH 9/12] ksm: fix oom deadlock' appears to cause deadlock with an encrypted root volume
- This was added in 2.6.31-0.141.rc5.git3 by the addition of this set of KSM patches: http://cvs.fedoraproject.org/viewvc/rpms/kernel/devel/linux-2.6-ksm-updates.patch?revision=1.1
- the KSM patches have been disabled since 2.6.31-0.157.rc6 pending a fix for this
> - '[PATCH 9/12] ksm: fix oom deadlock' appears to cause deadlock with an
> encrypted root volume

FYI: there is no need to have an encrypted root volume; any "cryptsetup luksOpen" on x86_64 will cause the deadlock. For the process backtrace see bug 517545.
Andrea suggests checking whether these programs are calling madvise() with bogus flags
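For reference, here is a minimal sketch of what a bogus madvise() advice value looks like from userspace, assuming the usual Linux behaviour of rejecting unknown advice with EINVAL. The helper name `try_madvise` and the value 0x7fff are made up for illustration; nothing here is taken from cryptsetup itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one anonymous page, apply `advice` to it with
 * madvise(), and report the outcome.
 * Returns 0 on success, the errno value if madvise() failed,
 * or -1 if the test page could not be mapped.
 */
int try_madvise(int advice)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int err = 0;
    if (madvise(p, (size_t)page, advice) != 0)
        err = errno;

    munmap(p, (size_t)page);
    return err;
}
```

Calling `try_madvise(MADV_NORMAL)` should succeed, while an undefined value such as `try_madvise(0x7fff)` is expected to come back as EINVAL. Running the suspect program under strace and looking at its madvise() calls would show the same thing without writing any code.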
(In reply to comment #23)
> Andrea suggests checking whether these programs are calling madvise() with
> bogus flags

Not explicitly, but probably forgot to unlock memory - try this code:

#include <sys/mman.h>

int main (int argc, char *argv[])
{
        mlockall(MCL_CURRENT | MCL_FUTURE);
        // munlockall();
        return 0;
}
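The reproducer above hangs the calling shell on an affected kernel, so it can be convenient to run it in a child process and watch it from the parent. The following is a sketch only: `mlocked_child_exits` and its return codes are hypothetical, and on an affected kernel the stuck child may not respond to SIGKILL since it is looping inside the kernel's exit path.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical harness around the mlockall() reproducer.
 * Returns:
 *    0  child locked its memory and exited within timeout_sec
 *    2  child exited, but mlockall() failed (result inconclusive)
 *    1  child still running after timeout_sec (possible hang)
 *   -1  harness error (fork/waitpid failed, or unexpected child status)
 */
int mlocked_child_exits(int timeout_sec)
{
    assert(timeout_sec > 0);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: lock everything, then exit without munlockall(). */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            _exit(2);   /* e.g. RLIMIT_MEMLOCK too low */
        _exit(0);
    }

    /* Parent: poll every 100 ms until the child is reaped or time is up. */
    for (int i = 0; i < timeout_sec * 10; i++) {
        int status = 0;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r < 0)
            return -1;
        if (r == pid) {
            if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                return 0;
            if (WIFEXITED(status) && WEXITSTATUS(status) == 2)
                return 2;
            return -1;
        }
        struct timespec ts = { 0, 100 * 1000 * 1000 };
        nanosleep(&ts, NULL);
    }

    /* Best effort only; a child stuck in the kernel may ignore this. */
    kill(pid, SIGKILL);
    return 1;
}
```

A return value of 1 from something like `mlocked_child_exits(5)` would be consistent with the hang described in this bug. A return value of 2 means the lock could not be taken, so the test says nothing either way.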
Investigating why those troublesome checks that deadlock mlocked programs were added to the page fault path... at first glance they look unnecessary, so asking just in case...

Date: Tue, 25 Aug 2009 16:58:32 +0200
From: Andrea Arcangeli <aarcange>
To: Hugh Dickins <hugh.dickins.uk>
Cc: Izik Eidus <ieidus>, Rik van Riel <riel>, Chris Wright <chrisw>, Nick Piggin <nickpiggin.au>, Andrew Morton <akpm>, linux-kernel.org, linux-mm
Subject: Re: [PATCH 9/12] ksm: fix oom deadlock

On Mon, Aug 03, 2009 at 01:18:16PM +0100, Hugh Dickins wrote:
> tables which have been freed for reuse; and even do_anonymous_page
> and __do_fault need to check they're not being called by break_ksm
> to reinstate a pte after zap_pte_range has zapped that page table.

This deadlocks exit_mmap in an infinite loop when there's some region locked. mlock calls gup and pretends to page fault successfully if there's a vma existing on the region, but it doesn't page fault anymore because of the mm_count being 0 already, so follow_page fails and gup retries the page fault forever. And generally I don't like to add those checks to page fault fast path.

Given we check mm_users == 0 (ksm_test_exit) after taking mmap_sem in unmerge_and_remove_all_rmap_items, why do we actually need to care that a page fault happens? We hold mmap_sem so we're guaranteed to see mm_users == 0 and we won't ever break COW on that mm with mm_users == 0 so I think those troublesome checks from page fault can be simply removed.
Created attachment 358588
attempted fix (last one was wrong diff)
Created attachment 358624
new proposed patch

This actually makes ksm_exit simpler, and it already contains down_write(mmap_sem). (Also, this time I checked which workstation I'm running firefox on before picking a random file from /tmp ;)

Discussion is going on live on linux-mm with Hugh.
Hugh acked my attachment 358624, so please apply it and then we can close this bug. We still have some issues to discuss on OOM handling with KSM on linux-mm, but those aren't critical, and once we solve them, patches will flow into rawhide. Thanks!
Already applied and should be in kernel-2.6.31-0.180.rc7.git4.fc12 today. KSM has been re-enabled.