Bug 435115 - kernel freezes when running script which features ecryptfs parts of kernel
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Eric Sandeen
QA Contact: Martin Jenner
Reported: 2008-02-27 08:58 EST by Michal Nowak
Modified: 2013-03-07 21:03 EST (History)

Doc Type: Bug Fix
Last Closed: 2009-01-20 15:10:42 EST
Attachments
RHTS script causing crash (20.00 KB, application/x-tar)
2008-02-27 08:58 EST, Michal Nowak
Patch to new version of script where the freeze does not occur (508 bytes, patch)
2008-02-27 09:24 EST, Michal Nowak

Comment 2 Michal Nowak 2008-02-27 09:24:04 EST
Created attachment 296066
Patch to new version of script where the freeze does not occur

From my first impression:

The first run of the script does not unmount the secret/ dir, so the second run
mounts onto the same mount point again and crashes the kernel.

After applying this patch to the script and running rm -rf secret/, everything
is "fine" -- no crash.
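
A minimal sketch of the guard described above, assuming the test can consult a mount table in /proc/mounts format before mounting again. The function name is_mounted and its optional second argument are illustrative and do not come from the attached script.

```shell
#!/bin/sh
# Sketch only: is_mounted is a hypothetical helper, not part of the
# attached RHTS script.

# is_mounted DIR [MOUNTS_FILE]
# Succeeds if DIR is listed as a mount point (field 2) in MOUNTS_FILE,
# which defaults to /proc/mounts. DIR should be an absolute path with
# no trailing slash, because that is how /proc/mounts records it.
is_mounted() {
    dir=$1
    table=${2:-/proc/mounts}
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$table"
}

# Possible use before the second mount (requires root, shown as a comment):
#   if is_mounted "$PWD/secret"; then umount "$PWD/secret"; fi
```

Reading the table instead of trusting the script's own state covers the case where an earlier run died before it could unmount.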
Comment 3 Michael Halcrow 2008-02-27 15:22:57 EST
I recreated the oops with the attached script. This is an explicit BUG() that is
being tripped. It looks like some sequence of operations is leaving us with a
lower dentry that has a d_count of 0, which should not be the case after we do a
lookup_one_len() to find the lower dentry. I am investigating.

---
kernel BUG at fs/ecryptfs/inode.c:299!
invalid opcode: 0000 [#1]
SMP 
last sysfs file: /fs/ecryptfs/version
Modules linked in: md5 aes_generic aes_i586 ecryptfs autofs4 hidp rfcomm l2cap
bluetooth sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack
nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter
ip6_tables x_tables ipv6 xfrm_nalgo crypto_api dm_multipath parport_pc lp
parport floppy pcspkr serio_raw 8139too 8139cp mii ide_cd cdrom dm_snapshot
dm_zero dm_mirror dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd
ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<e0b0e0b8>]    Not tainted VLI
EFLAGS: 00010246   (2.6.18-81.el5PAE #1) 
EIP is at ecryptfs_lookup+0x1d3/0x4a2 [ecryptfs]
eax: 00000000   ebx: d0bcbee8   ecx: dffef080   edx: dc234228
esi: dfce4ac0   edi: ce5d9a40   ebp: ce5d9040   esp: cff04d94
ds: 007b   es: 007b   ss: 0068
Process cat (pid: 6763, ti=cff04000 task=c2ebcaa0 task.ti=cff04000)
Stack: cff04f3c cff04e1c 00000246 dc234228 dc234888 dc234228 c048377b dec02cc8 
       d91764a0 e0b1efe0 d0bcbee8 ce5d9040 ce5d90b4 c047abc1 cff04e28 cff04e1c 
       cff04f3c cea2af40 ad96d94b ce5d9040 c99d4007 cff04f3c c047c965 00000000 
Call Trace:
 [<c048377b>] d_alloc+0x14f/0x17d
 [<c047abc1>] do_lookup+0xb4/0x166
 [<c047c965>] __link_path_walk+0x87a/0xd33
 [<c047ce67>] link_path_walk+0x49/0xbd
 [<c047d234>] do_path_lookup+0x20e/0x25e
 [<c047dae1>] __path_lookup_intent_open+0x42/0x72
 [<c047db60>] path_lookup_open+0xf/0x13
 [<c047dc56>] open_namei+0x6d/0x5fb
 [<c046e306>] do_filp_open+0x1c/0x31
 [<c046e359>] do_sys_open+0x3e/0xae
 [<c046e3f6>] sys_open+0x16/0x18
 [<c0404eff>] syscall_call+0x7/0xb
 =======================
Code: b2 0d 00 00 8b 44 24 1c 8b 54 24 20 8b 78 0c 8b 42 0c 8b 50 48 8b 40 44 89
55 48 89 45 44 8b 54 24 1c 83 c4 10 8b 02 85 c0 75 08 <0f> 0b 2b 01 72 55 b1 e0
a1 80 06 b2 e0 ba d0 00 00 00 e8 89 dc 
EIP: [<e0b0e0b8>] ecryptfs_lookup+0x1d3/0x4a2 [ecryptfs] SS:ESP 0068:cff04d94
---

Mike
Comment 4 Michael Halcrow 2008-02-27 16:40:35 EST
This bug is timing-sensitive and/or intermittent. I added a few interspersed
echo statements in the test script, and the bug was not triggered. After
removing the echo statements, the bug is still not triggered; I suspect that the
lower filesystem's dcache has to be in a fresh state after boot to trigger the
bug with the operations in this script.

Also note that you probably want keysize 8192 instead of 8092 in the script. In
any event, a real-life deployment will never require keysize greater than 32
bytes for symmetric ciphers such as aes, twofish, blowfish, and so forth. 16
byte keysizes are sufficient to provide strong protection; going higher than 16
bytes does not buy you any additional security. Given that data extent sizes in
eCryptfs are 4096 bytes, I am tempted to add code to just reject any requested
keysizes greater than 32 bytes.

Mike
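
As an illustration of the keysize limit discussed in the comment above, a test script could validate the requested size before passing it to mount. This is a sketch under assumptions: check_keysize is a hypothetical name, the 32-byte ceiling is taken from the comment, and the lower bound of 1 is only a placeholder.

```shell
#!/bin/sh
# Sketch only: check_keysize is a hypothetical helper. The upper bound of
# 32 bytes follows the comment above; the lower bound is an assumption.

# check_keysize BYTES
# Returns 0 if BYTES is an integer in 1..32, 1 if it is out of range,
# and 2 if it is not a plain non-negative integer.
check_keysize() {
    case $1 in
        ''|*[!0-9]*)
            echo "keysize must be a positive integer: '$1'" >&2
            return 2
            ;;
    esac
    if [ "$1" -lt 1 ] || [ "$1" -gt 32 ]; then
        echo "keysize $1 bytes rejected: expected 1..32" >&2
        return 1
    fi
    return 0
}
```

A stress test that deliberately wants oversized keys would skip this check, so it fits scripts that mimic a real deployment rather than ones probing limits.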
Comment 5 Michal Nowak 2008-02-28 05:17:16 EST
> I suspect that the
> lower filesystem's dcache has to be in a fresh state after boot to trigger the
> bug with the operations in this script.

True. It usually happened to me on a clean boot. But yesterday it happened on
a machine that had been running for some hours, when I did

    ls secure/

where eCryptfs had been mounted onto 'secure' many times (because the script
failed).

> Also note that you probably want keysize 8192 instead of 8092 in the script.

Yes, typo, thanks. 

> going higher than 16 bytes does not buy you any additional security.

Sure. The purpose is to stress the whole eCryptfs ecosystem.
Comment 6 Michael Halcrow 2008-02-28 11:52:50 EST
Note that later versions of the kernel will reject attempts to set keysize
greater than 32 for (at least) AES.

Do you only observe this kernel oops when using the OpenSSL key module, or does
it also happen when you run using the passphrase key module? So far, I have not
been able to reproduce it when mounting with key=passphrase.

Mike
Comment 7 Michal Nowak 2008-02-29 05:55:50 EST
I've never seen it with the passphrase key module.
Comment 8 Michal Nowak 2008-03-07 05:28:04 EST
Recently I have been hitting it up to ten times a day. Any progress?
Comment 10 Mike Gahagan 2008-04-09 16:03:32 EDT
I hit this bug while trying to reproduce bz 429142. It turns out a previous run
had somehow created the foo device node successfully; I tried to run 'ls' on it
and got this panic (-88 xen, i386 guest on x86_64 dom0).


Error opening lower persistent file for lower_dentry [0xc958e448] and lower_mnt
[0xc0e35cc0]
ecryptfs_interpose: Error attempting to initialize the persistent file for the
dentry with name [foo]; rc = [-6]
ecryptfs_lookup: Error interposing
------------[ cut here ]------------
kernel BUG at fs/ecryptfs/inode.c:299!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /fs/ecryptfs/version
Modules linked in: aes_generic aes_i586 ecryptfs autofs4 hidp rfcomm l2cap
bluetooth sunrpc xennet dm_multipath ipv6 xfrm_nalgo crypto_api parport_pc lp
parport pcspkr dm_snapshot dm_zero dm_mirror dm_mod xenblk ext3 jbd uhci_hcd
ohci_hcd ehci_hcd
CPU:    0
EIP:    0061:[<e14530b8>]    Not tainted VLI
EFLAGS: 00010246   (2.6.18-88.el5xen #1)
EIP is at ecryptfs_lookup+0x1d3/0x4a2 [ecryptfs]
eax: 00000000   ebx: c958ee60   ecx: c0eb8080   edx: c958e448
esi: c0e35cc0   edi: c96a00d4   ebp: c96d5cc0   esp: c9568db8
ds: 007b   es: 007b   ss: 0069
Process ls (pid: 3835, ti=c9568000 task=d16b1aa0 task.ti=c9568000)
Stack: c9568f08 c9568e40 c0469812 c958e448 c94b9bb8 c958e448 c04816bb c958e778
       c9b8f300 e1463fe0 c958ee60 c96d5cc0 c96d5d34 c0478aed c9568e4c c9568e40
       c9568f08 c0e358c0 0024db2a c96d5cc0 de64c007 c9568f08 c047a891 00000000
Call Trace:
 [<c0469812>] kmem_cache_alloc+0x54/0x5e
 [<c04816bb>] d_alloc+0x14f/0x17d
 [<c0478aed>] do_lookup+0xb4/0x166
 [<c047a891>] __link_path_walk+0x87a/0xd33
 [<c0452b92>] __alloc_pages+0x57/0x297
 [<c047ad93>] link_path_walk+0x49/0xbd
 [<c060a37a>] do_page_fault+0x6de/0xbf1
 [<c060a3f3>] do_page_fault+0x757/0xbf1
 [<c046047c>] do_mmap_pgoff+0x37b/0x6c3
 [<c047b160>] do_path_lookup+0x20e/0x25e
 [<c047b8a4>] __user_walk_fd+0x29/0x3a
 [<c047536c>] vfs_stat_fd+0x15/0x3c
 [<c060a37a>] do_page_fault+0x6de/0xbf1
 [<c060a3f3>] do_page_fault+0x757/0xbf1
 [<c046047c>] do_mmap_pgoff+0x37b/0x6c3
 [<c0475420>] sys_stat64+0xf/0x23
 [<c0444349>] audit_syscall_entry+0x14b/0x17d
 [<c0407b13>] do_syscall_trace+0xab/0xb1
 [<c0405413>] syscall_call+0x7/0xb
 =======================
Code: b2 0d 00 00 8b 44 24 1c 8b 54 24 20 8b 78 0c 8b 42 0c 8b 50 48 8b 40 44 89
55 48 89 45 44 8b 54 24 1c 83 c4 10 8b 02 85 c0 75 08 <0f> 0b 2b 01 72 a5 45 e1
a1 80 56 46 e1 ba d0 00 00 00 e8 ef 66
EIP: [<e14530b8>] ecryptfs_lookup+0x1d3/0x4a2 [ecryptfs] SS:ESP 0069:c9568db8
 <0>Kernel panic - not syncing: Fatal exception
Comment 11 Eric Sandeen 2008-04-09 17:29:12 EDT
Mike, is there a chance that you re-mounted ecryptfs on ecryptfs?

I think that the whole problem here is when you mount an ecryptfs filesystem
onto an ecryptfs filesystem; basically:

# mount -t ecryptfs secret/ secret/
# mount -t ecryptfs secret/ secret/

and I think that if the originally attached test fails, it doesn't unmount when
it's done, so we hit this; seems to be a refcounting problem.

I'm probably going to suggest a patch to disallow ecryptfs-on-ecryptfs mounts
for RHEL5.2, for now, although as Mike says "this should work."

There's a) no great use case for it, and b) not that much stack space! So let's
disallow it for now and keep rolling....
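
To check whether a machine is in the stacked state described above, one option is to count the ecryptfs entries recorded for a mount point. A rough sketch, assuming a mount table in /proc/mounts format in which each stacked mount appears as its own line; ecryptfs_mount_count is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: ecryptfs_mount_count is a hypothetical helper. It assumes
# that every stacked mount shows up as a separate line in the mount table.

# ecryptfs_mount_count DIR [MOUNTS_FILE]
# Prints how many entries of type ecryptfs have DIR as their mount point.
# MOUNTS_FILE defaults to /proc/mounts; DIR must be an absolute path.
ecryptfs_mount_count() {
    awk -v d="$1" '$2 == d && $3 == "ecryptfs" { n++ } END { print n + 0 }' \
        "${2:-/proc/mounts}"
}

# Possible use (shown as a comment):
#   [ "$(ecryptfs_mount_count "$PWD/secret")" -gt 1 ] && echo "stacked mount"
```

A count above 1 would suggest that an earlier run left its mount behind and a later run mounted on top of it.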
Comment 12 Eric Sandeen 2008-04-09 18:34:04 EDT
So I'd fix this test script to be sure that isCrypted and isCrippled unmount
before they exit (also if "verbosity=0" is still in the options list, that's no
longer a valid option IIRC...) and I'll submit a patch to make re-mounting
ecryptfs on ecryptfs fail, at least 'til we get the issues sorted out.
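
One way to make sure a test step unmounts before it returns is to wrap it, so that the unmount runs whether or not the step succeeds. This is a sketch, not the actual fix to the script: run_then_umount and the UMOUNT override variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch only: run_then_umount and the UMOUNT variable are hypothetical.
# UMOUNT lets a dry run substitute another command for umount.

# run_then_umount MOUNTPOINT COMMAND [ARGS...]
# Runs COMMAND, then unmounts MOUNTPOINT regardless of COMMAND's result,
# and returns COMMAND's exit status.
run_then_umount() {
    mnt=$1
    shift
    status=0
    "$@" || status=$?
    "${UMOUNT:-umount}" "$mnt" || echo "warning: could not unmount $mnt" >&2
    return "$status"
}

# Possible use (requires root, shown as a comment):
#   run_then_umount "$PWD/secret" ./check_encrypted.sh
```

The wrapper cannot help if the wrapped step is a shell function that calls exit directly, since the shell terminates before the unmount runs; such steps would need to use return instead.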
Comment 16 Eric Sandeen 2008-08-27 14:02:48 EDT
Well, I was going to test this with the latest code submitted for 5.3, but userspace seems to have regressed and it no longer mounts with the mount options from the script.  :/  Looking into it ...
Comment 17 Eric Sandeen 2008-08-27 14:25:43 EDT
Working around the ecryptfs-utils problems, this does appear to still be a kernel issue as well.
Comment 19 Don Zickus 2008-09-15 10:16:38 EDT
The patch is in kernel-2.6.18-115.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 23 errata-xmlrpc 2009-01-20 15:10:42 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html
