Description of problem:
The following kernel panic occurred after an attempt to unmount an ecryptfs overlay. Of possible interest: on the command line I requested twofish encryption, but since the tools still insist on prompting for the cipher, I chose AES-256 at the prompt on the second mount instead.

Version-Release number of selected component (if applicable):
kernel 2.6.18-58.el5 + Eric's ecryptfs backport

How reproducible:

[root@xw4400-01 .ecryptfs]# mount -t ecryptfs -o key=openssl:keyfile=/root/.ecryptfs/pki/openssl/key.pem:ecryptfs_cipher=twofish:ecryptfs_passthrough=no:passthrough=no /secret /secret
Passphrase:
Cipher
1) Twofish
2) AES-128
3) AES-192
4) AES-256
5) CAST6
6) Triple-DES
7) Blowfish
8) CAST5
Selection [AES-128]: 1
Attempting to mount with the following options:
  ecryptfs_cipher=twofish
  ecryptfs_key_bytes=16
  ecryptfs_sig=94733f578c4b75f6
Mounted eCryptfs
[root@xw4400-01 .ecryptfs]# cat /secret/twofish.txt
twofish encryption in the house
[root@xw4400-01 .ecryptfs]# umount
Usage: umount [-hV]
       umount -a [-f] [-r] [-n] [-v] [-t vfstypes] [-O opts]
       umount [-f] [-r] [-n] [-v] special | node...
[root@xw4400-01 .ecryptfs]# umount /secret/
[root@xw4400-01 .ecryptfs]# mount -t ecryptfs -o key=openssl:keyfile=/root/.ecryptfs/pki/openssl/key.pem:ecryptfs_cipher=twofish:ecryptfs_passthrough=no:passthrough=no /secret /secret
Passphrase:
Cipher
1) Twofish
2) AES-128
3) AES-192
4) AES-256
5) CAST6
6) Triple-DES
7) Blowfish
8) CAST5
Selection [AES-128]: 4
Attempting to mount with the following options:
  ecryptfs_cipher=aes
  ecryptfs_key_bytes=32
  ecryptfs_sig=94733f578c4b75f6
Mounted eCryptfs
[root@xw4400-01 .ecryptfs]# cat /secret/twofish.txt
twofish encryption in the house
[root@xw4400-01 .ecryptfs]# umount /secret/
BUG: Dentry ffff810019a20df8{i=7ff37,n=twofish.txt} still in use (-1) [unmount of ecryptfs ecryptfs]
Unable to handle kernel NULL pointer dereference at 0000000000000098 RIP:
 [<ffffffff885b58b8>] :ecryptfs:ecryptfs_show_options+0x1c/0x83
PGD 2a2c7067 PUD 2a2c8067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /fs/ecryptfs/version
CPU 1
Modules linked in: twofish ecryptfs(U) md5 aes ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc netxen_nic cpufreq_ondemand dm_multipath video sbs backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac lp snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss sg snd_pcm snd_timer ata_piix snd shpchp floppy soundcore parport_pc ide_cd snd_page_alloc firewire_ohci cdrom parport firewire_core tg3 serio_raw pcspkr dm_snapshot dm_zero dm_mirror dm_mod ahci libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 2997, comm: hald Not tainted 2.6.18-58.el5 #1
RIP: 0010:[<ffffffff885b58b8>]  [<ffffffff885b58b8>] :ecryptfs:ecryptfs_show_options+0x1c/0x83
RSP: 0018:ffff81002a2cde68  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff810022db76c0 RCX: 0000000000000000
RDX: 0000000000000040 RSI: 0000000000000000 RDI: 00000000000000d0
RBP: ffff81003ff8a580 R08: ffff810023fb7259 R09: ffff810022db76c0
R10: ffffffffffffffff R11: 0000000000000000 R12: 0000000000000000
R13: ffff810022db76c0 R14: 0000000000000000 R15: 00002aaaaaab4000
FS:  00002aaaaaac9b00(0000) GS:ffff810037ca17c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000098 CR3: 000000002a2c6000 CR4: 00000000000006e0
Process hald (pid: 2997, threadinfo ffff81002a2cc000, task ffff81002a0147e0)
Stack:  ffff81003ff8a580 ffff810022db76c0 ffff81003ff8a580 0000000000000000
 0000000000001000 ffffffff8003076a ffff810022db76c0 ffff81003ff8a580
 0000000000000241 ffffffff8003ebb9 ffff81002a2cdf50 ffff81002a5524c0
Call Trace:
 [<ffffffff8003076a>] show_vfsmnt+0x10f/0x129
 [<ffffffff8003ebb9>] seq_read+0x1b8/0x28c
 [<ffffffff8000b270>] vfs_read+0xcb/0x171
 [<ffffffff80011508>] sys_read+0x45/0x6e
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0

Code: 48 8b 80 98 00 00 00 4c 8b 20 48 8b 68 08 e8 b3 62 a8 f7 48
RIP  [<ffffffff885b58b8>] :ecryptfs:ecryptfs_show_options+0x1c/0x83
 RSP <ffff81002a2cde68>
CR2: 0000000000000098
<0>Kernel panic - not syncing: Fatal exception
Thus far I've been unable to reproduce this one, but I will keep an eye out for it...
Based on the size of the function (0x83), this build looks like it predates my patch that shows the actual mount options. We know we had some bad pointer manipulation when bad mount options were given, so this may just be random corruption.

The real problem here is likely:

BUG: Dentry ffff810019a20df8{i=7ff37,n=twofish.txt} still in use (-1) [unmount of ecryptfs ecryptfs]

The "-1" is atomic_read(&dentry->d_count). The dentry was released one more time than it was acquired, so this points to a refcounting problem. After that, it looks like the old show_options tried to dereference some of the dentries while sb->s_root was NULL. CR2 is 0x98 and RAX is 0, which is consistent with a load from offset 0x98 of a NULL pointer. Interestingly, it was the hald thread that oopsed, presumably while reading the mount table (show_vfsmnt via seq_read). But after the above BUG() message, I think all bets are off.

I'd like to know why the dentry was in use. If we don't see this again, I'll close it and chalk it up to the memory corruption problems we fixed. I'll leave it open for a while in case it recurs.
Ah... any chance you had a read-only mount underneath, or some other failure opening the lower file? As phro pointed out:

Index: ecryptfs-kernel-2.6.24-rc3/main.c
===================================================================
--- ecryptfs-kernel-2.6.24-rc3.orig/main.c
+++ ecryptfs-kernel-2.6.24-rc3/main.c
@@ -138,11 +138,14 @@ int ecryptfs_init_persistent_file(struct
 		inode_info->lower_file = dentry_open(lower_dentry,
 						     lower_mnt,
 						     (O_RDWR | O_LARGEFILE));
-		if (IS_ERR(inode_info->lower_file))
+		if (IS_ERR(inode_info->lower_file)) {
+			dget(lower_dentry);
+			mntget(lower_mnt);
 			inode_info->lower_file = dentry_open(lower_dentry,
 							     lower_mnt,
 							     (O_RDONLY | O_LARGEFILE));
+		}
 		if (IS_ERR(inode_info->lower_file)) {
 			printk(KERN_ERR "Error opening lower persistent file "
 			       "for lower_dentry [0x%p] and lower_mnt [0x%p]\n",

This might explain the -1 refcount. dentry_open() drops the dentry and vfsmount references it was handed when it fails. So once the O_RDWR attempt fails, the O_RDONLY retry runs on references that have already been dropped. The patch takes fresh references with dget()/mntget() before retrying.
I'm not able to reproduce this so far using the -88 kernel.
We've never seen this again. I think the refcount bug fixed by the patch shown in comment #3 (now in the RHEL 5.2 and upstream code) is the likely culprit.