Created attachment 296066
Patch to new version of script where the freeze does not occur

My first impression: the first run of the script does not umount the secret/ dir, so the second run mounts onto the same mount point and crashes the kernel. After applying this patch to the script and running rm -rf secret/, everything is "fine" -- no crash.
I recreated the oops with the attached script. This is an explicit BUG() that is being tripped. It looks like some sequence of operations is leaving us with a lower dentry that has a d_count of 0, which should not be the case after we do a lookup_one_len() to find the lower dentry. I am investigating.

---
kernel BUG at fs/ecryptfs/inode.c:299!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /fs/ecryptfs/version
Modules linked in: md5 aes_generic aes_i586 ecryptfs autofs4 hidp rfcomm l2cap bluetooth sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 xfrm_nalgo crypto_api dm_multipath parport_pc lp parport floppy pcspkr serio_raw 8139too 8139cp mii ide_cd cdrom dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<e0b0e0b8>]    Not tainted VLI
EFLAGS: 00010246   (2.6.18-81.el5PAE #1)
EIP is at ecryptfs_lookup+0x1d3/0x4a2 [ecryptfs]
eax: 00000000   ebx: d0bcbee8   ecx: dffef080   edx: dc234228
esi: dfce4ac0   edi: ce5d9a40   ebp: ce5d9040   esp: cff04d94
ds: 007b   es: 007b   ss: 0068
Process cat (pid: 6763, ti=cff04000 task=c2ebcaa0 task.ti=cff04000)
Stack: cff04f3c cff04e1c 00000246 dc234228 dc234888 dc234228 c048377b dec02cc8
       d91764a0 e0b1efe0 d0bcbee8 ce5d9040 ce5d90b4 c047abc1 cff04e28 cff04e1c
       cff04f3c cea2af40 ad96d94b ce5d9040 c99d4007 cff04f3c c047c965 00000000
Call Trace:
 [<c048377b>] d_alloc+0x14f/0x17d
 [<c047abc1>] do_lookup+0xb4/0x166
 [<c047c965>] __link_path_walk+0x87a/0xd33
 [<c047ce67>] link_path_walk+0x49/0xbd
 [<c047d234>] do_path_lookup+0x20e/0x25e
 [<c047dae1>] __path_lookup_intent_open+0x42/0x72
 [<c047db60>] path_lookup_open+0xf/0x13
 [<c047dc56>] open_namei+0x6d/0x5fb
 [<c046e306>] do_filp_open+0x1c/0x31
 [<c046e359>] do_sys_open+0x3e/0xae
 [<c046e3f6>] sys_open+0x16/0x18
 [<c0404eff>] syscall_call+0x7/0xb
 =======================
Code: b2 0d 00 00 8b 44 24 1c 8b 54 24 20 8b 78 0c 8b 42 0c 8b 50 48 8b 40 44 89 55 48 89 45 44 8b 54 24 1c 83 c4 10 8b 02 85 c0 75 08 <0f> 0b 2b 01 72 55 b1 e0 a1 80 06 b2 e0 ba d0 00 00 00 e8 89 dc
EIP: [<e0b0e0b8>] ecryptfs_lookup+0x1d3/0x4a2 [ecryptfs] SS:ESP 0068:cff04d94
---

Mike
This bug is timing-sensitive and/or intermittent. I added a few interspersed echo statements in the test script, and the bug was not triggered. After removing the echo statements, the bug is still not triggered; I suspect that the lower filesystem's dcache has to be in a fresh state after boot to trigger the bug with the operations in this script.

Also note that you probably want keysize 8192 instead of 8092 in the script. In any event, a real-life deployment will never require a keysize greater than 32 bytes for symmetric ciphers such as aes, twofish, blowfish, and so forth. 16-byte keysizes are sufficient to provide strong protection; going higher than 16 bytes does not buy you any additional security. Given that data extent sizes in eCryptfs are 4096 bytes, I am tempted to add code to just reject any requested keysizes greater than 32 bytes.

Mike
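For anyone adapting the test script in the meantime, the 32-byte limit suggested above could be checked on the script side before mounting. This is only a rough sketch: the helper name keysize_ok and the variable MAX_KEY_BYTES are made up for illustration and are not part of eCryptfs or ecryptfs-utils.

```shell
#!/bin/sh
# Hypothetical pre-mount check; the 32-byte ceiling is taken from the comment above.
MAX_KEY_BYTES=32

# Succeeds only if $1 is a positive decimal integer no greater than MAX_KEY_BYTES.
keysize_ok() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac
    [ "$1" -ge 1 ] && [ "$1" -le "$MAX_KEY_BYTES" ]
}

# Example: report which requested key sizes would be accepted.
for k in 16 32 8192; do
    if keysize_ok "$k"; then
        echo "keysize $k: accepted"
    else
        echo "keysize $k: rejected"
    fi
done
```

A stress test that deliberately passes oversized keys would of course skip this check; it is meant for scripts that want to stay within the range a real deployment would use.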
> I suspect that the lower filesystem's dcache has to be in a fresh state after
> boot to trigger the bug with the operations in this script.

True. It usually happened to me on a clean boot. But yesterday it happened on a machine that had been running for some hours, when I did ls secure/ where eCryptfs had been mounted on 'secure' many times (due to the script failing).

> Also note that you probably want keysize 8192 instead of 8092 in the script.

Yes, typo, thanks.

> going higher than 16 bytes does not buy you any additional security.

...sure. The purpose is to stress the whole eCryptfs ecosystem.
Note that later versions of the kernel will reject attempts to set keysize greater than 32 for (at least) AES. Do you only observe this kernel oops when using the OpenSSL key module, or does it also happen when you run using the passphrase key module? So far, I have not been able to reproduce it when mounting with key=passphrase. Mike
I've never seen it with the passphrase key module.
Lately I have been hitting it up to ten times a day. Any progress here?
I hit this bug while trying to reproduce bz 429142. It turns out a previous run had somehow created the foo device node successfully; when I ran 'ls' on it, I got this panic (-88 xen, i386 guest on x86_64 dom0).

Error opening lower persistent file for lower_dentry [0xc958e448] and lower_mnt [0xc0e35cc0]
ecryptfs_interpose: Error attempting to initialize the persistent file for the dentry with name [foo]; rc = [-6]
ecryptfs_lookup: Error interposing
------------[ cut here ]------------
kernel BUG at fs/ecryptfs/inode.c:299!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /fs/ecryptfs/version
Modules linked in: aes_generic aes_i586 ecryptfs autofs4 hidp rfcomm l2cap bluetooth sunrpc xennet dm_multipath ipv6 xfrm_nalgo crypto_api parport_pc lp parport pcspkr dm_snapshot dm_zero dm_mirror dm_mod xenblk ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0061:[<e14530b8>]    Not tainted VLI
EFLAGS: 00010246   (2.6.18-88.el5xen #1)
EIP is at ecryptfs_lookup+0x1d3/0x4a2 [ecryptfs]
eax: 00000000   ebx: c958ee60   ecx: c0eb8080   edx: c958e448
esi: c0e35cc0   edi: c96a00d4   ebp: c96d5cc0   esp: c9568db8
ds: 007b   es: 007b   ss: 0069
Process ls (pid: 3835, ti=c9568000 task=d16b1aa0 task.ti=c9568000)
Stack: c9568f08 c9568e40 c0469812 c958e448 c94b9bb8 c958e448 c04816bb c958e778
       c9b8f300 e1463fe0 c958ee60 c96d5cc0 c96d5d34 c0478aed c9568e4c c9568e40
       c9568f08 c0e358c0 0024db2a c96d5cc0 de64c007 c9568f08 c047a891 00000000
Call Trace:
 [<c0469812>] kmem_cache_alloc+0x54/0x5e
 [<c04816bb>] d_alloc+0x14f/0x17d
 [<c0478aed>] do_lookup+0xb4/0x166
 [<c047a891>] __link_path_walk+0x87a/0xd33
 [<c0452b92>] __alloc_pages+0x57/0x297
 [<c047ad93>] link_path_walk+0x49/0xbd
 [<c060a37a>] do_page_fault+0x6de/0xbf1
 [<c060a3f3>] do_page_fault+0x757/0xbf1
 [<c046047c>] do_mmap_pgoff+0x37b/0x6c3
 [<c047b160>] do_path_lookup+0x20e/0x25e
 [<c047b8a4>] __user_walk_fd+0x29/0x3a
 [<c047536c>] vfs_stat_fd+0x15/0x3c
 [<c060a37a>] do_page_fault+0x6de/0xbf1
 [<c060a3f3>] do_page_fault+0x757/0xbf1
 [<c046047c>] do_mmap_pgoff+0x37b/0x6c3
 [<c0475420>] sys_stat64+0xf/0x23
 [<c0444349>] audit_syscall_entry+0x14b/0x17d
 [<c0407b13>] do_syscall_trace+0xab/0xb1
 [<c0405413>] syscall_call+0x7/0xb
 =======================
Code: b2 0d 00 00 8b 44 24 1c 8b 54 24 20 8b 78 0c 8b 42 0c 8b 50 48 8b 40 44 89 55 48 89 45 44 8b 54 24 1c 83 c4 10 8b 02 85 c0 75 08 <0f> 0b 2b 01 72 a5 45 e1 a1 80 56 46 e1 ba d0 00 00 00 e8 ef 66
EIP: [<e14530b8>] ecryptfs_lookup+0x1d3/0x4a2 [ecryptfs] SS:ESP 0069:c9568db8
<0>Kernel panic - not syncing: Fatal exception
Mike, is there a chance that you re-mounted ecryptfs on ecryptfs? I think that the whole problem here is when you mount an ecryptfs filesystem onto an ecryptfs filesystem; basically:

# mount -t ecryptfs secret/ secret/
# mount -t ecryptfs secret/ secret/

I think that if the originally attached test fails, it doesn't unmount when it's done, so we hit this; it seems to be a refcounting problem. I'm probably going to suggest a patch to disallow ecryptfs-on-ecryptfs mounts for RHEL5.2, for now, although as Mike says "this should work." There's a) no great use case for it, and b) not that much stack space! So let's disallow for now and keep rolling....
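Until a kernel-side check lands, a script can guard against the double mount itself by looking the mount point up in the mount table first. The following is a minimal sketch, not the proposed kernel patch: the function name is_ecryptfs_mounted is invented here, and it assumes the usual /proc/mounts layout (device, mount point, filesystem type, ...) with the mount point given as an absolute path.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if directory $1 is listed as an ecryptfs
# mount point in the mount table $2 (defaults to /proc/mounts).
# $1 must be the absolute path, as that is what the table records.
is_ecryptfs_mounted() {
    target=$1
    table=${2:-/proc/mounts}
    # Field 2 is the mount point, field 3 the filesystem type.
    awk -v t="$target" '
        $2 == t && $3 == "ecryptfs" { found = 1 }
        END { exit !found }
    ' "$table"
}

# Possible use in a test script (not executed here):
#   if is_ecryptfs_mounted "$PWD/secret"; then
#       echo "secret/ is already an ecryptfs mount, refusing to mount again" >&2
#       exit 1
#   fi
```

Taking the table as an optional second argument is purely a convenience so the check can be tried against a saved copy of the mount table without mounting anything.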
So I'd fix this test script to be sure that isCrypted and isCrippled unmount before they exit (also if "verbosity=0" is still in the options list, that's no longer a valid option IIRC...) and I'll submit a patch to make re-mounting ecryptfs on ecryptfs fail, at least 'til we get the issues sorted out.
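One way to make sure the script always unmounts, even when a test step fails partway through, is an EXIT trap. This is a sketch only, since the attached script's internals are not reproduced in this report: MOUNT_POINT, is_mounted, do_umount and cleanup are illustrative names, and mountpoint(1) is assumed to be available from util-linux.

```shell
#!/bin/sh
# Hypothetical cleanup skeleton for a test script that mounts eCryptfs.
MOUNT_POINT=${MOUNT_POINT:-secret}

# Thin wrappers (illustrative names) so the real commands sit in one place.
is_mounted() { mountpoint -q "$1" 2>/dev/null; }
do_umount()  { umount "$1"; }

# Unmount only if the directory is currently a mount point, so that a run
# that failed before mounting does not produce a spurious umount error.
cleanup() {
    if is_mounted "$MOUNT_POINT"; then
        do_umount "$MOUNT_POINT"
    fi
}

# Run cleanup whenever the script exits, whether normally or via "exit N".
trap cleanup EXIT

# ... mount and test steps (e.g. the isCrypted / isCrippled checks) go here ...
```

With the trap in place, early "exit 1" paths inside the test functions no longer leave secret/ mounted for the next run.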
Well, I was going to test this with the latest code submitted for 5.3, but userspace seems to have regressed and it no longer mounts with the mount options from the script. :/ Looking into it ...
Working around the ecryptfs-utils problems, this does appear to still be a kernel issue as well.
in kernel-2.6.18-115.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html