Bug 867344

Summary: [abrt]: kernel BUG at fs/dcache.c:2138!
Product: [Fedora] Fedora Reporter: Vit Zahradka <vit.zahradka>
Component: kernelAssignee: Sachin Prabhu <sprabhu>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 17CC: aviro, cryogenium, gansalmon, gartim, hectorguerrero1866, hollladiewaldfee, itamar, jforbes, jlayton, jonathan, kernel-maint, kevin.paetzold, korsnick, madhu.chinakonda, marcus.moeller, motoskov, sprabhu
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Unspecified   
Whiteboard: abrt_hash:23a683122a84f0f780ae9c7d5a32288dbd6d02e7
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-11-12 20:57:44 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
patch -- don't call cifs_lookup on hashed, negative dentry
none
cifs: Do not lookup hashed negative dentry in cifs_atomic_open none

Description Vit Zahradka 2012-10-17 10:59:54 UTC
Description of problem:
problem occurs probably in case I try to access indexed projects data in Eclipse

Additional info:
libreport version: 2.0.14
abrt_version:   2.0.13
cmdline:        BOOT_IMAGE=/vmlinuz-3.6.1-1.fc17.i686 root=/dev/mapper/vg_krecek-lv_root ro rd.md=0 rd.dm=0 SYSFONT=True KEYTABLE=us rd.luks=0 rd.lvm.lv=vg_krecek/lv_swap rd.lvm.lv=vg_krecek/lv_root LANG=en_US.UTF-8 rhgb quiet
kernel:         3.6.1-1.fc17.i686

backtrace:
:kernel BUG at fs/dcache.c:2138!
:invalid opcode: 0000 [#1] SMP 
:Modules linked in: fuse ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat des_generic xt_CHECKSUM iptable_mangle md4 bridge stp llc nls_utf8 cifs dns_resolver fscache lockd sunrpc rfcomm bnep be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack snd_hda_codec_conexant btusb bluetooth snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm thinkpad_acpi snd_page_alloc arc4 snd_timer iwldvm mac80211 iwlwifi cfg80211 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev snd media rfkill soundcore i2c_i801 mei r592 memstick iTCO_wdt iTCO_vendor_support lpc_ich tg3 coretemp microcode ppdev parport_pc parport vhost_net tun macvtap macvlan kvm_intel kvm uinput binfmt_misc firewire_oh
:ci firewire_core crc_itu_t sdhci_pci sdhci mmc_core yenta_socket wmi video radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core
:Pid: 2219, comm: eclipse Not tainted 3.6.1-1.fc17.i686 #1 LENOVO 27326FG/27326FG
:EIP: 0060:[<c0557577>] EFLAGS: 00010282 CPU: 1
:EIP is at __d_rehash+0x57/0x60
:EAX: ee641c80 EBX: 0000096a ECX: f7338208 EDX: f7338208
:ESI: fffffffe EDI: f6339f60 EBP: f0481d84 ESP: f0481d80
: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
:CR0: 8005003b CR2: 06567f84 CR3: 30710000 CR4: 000027d0
:DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
:DR6: ffff0ff0 DR7: 00000400
:Process eclipse (pid: 2219, ti=f0480000 task=f0490c90 task.ti=f0480000)
:Stack:
: 0000096a f0481d90 c05575b2 ee641c80 f0481d9c c05575da 00000000 f0481dd8
: f9c01e32 f0734000 00004213 00000000 f0734000 f0481df4 f11ce600 00004213
: f6b07c60 00000000 00000000 f0481ef8 ee641c80 ef1781e0 f0481e20 f9c01f52
:Call Trace:
: [<c05575b2>] _d_rehash+0x32/0x40
: [<c05575da>] d_rehash+0x1a/0x30
: [<f9c01e32>] cifs_lookup+0x212/0x300 [cifs]
: [<f9c01f52>] cifs_atomic_open+0x32/0x200 [cifs]
: [<c05584bf>] ? d_lookup+0x2f/0x50
: [<c055074c>] do_last+0x42c/0xbd0
: [<c054dd8c>] ? link_path_walk+0x1dc/0x750
: [<c0550f8d>] path_openat+0x9d/0x3b0
: [<c046d47f>] ? wake_up_state+0xf/0x20
: [<c0490ed2>] ? wake_futex+0x32/0x50
: [<c046fcb5>] ? __dequeue_entity+0x25/0x40
: [<c0551481>] do_filp_open+0x31/0x80
: [<c055bdfc>] ? alloc_fd+0x3c/0xe0
: [<c054c970>] ? getname_flags+0x40/0xd0
: [<c0542885>] do_sys_open+0xe5/0x1c0
: [<c05429d0>] sys_openat+0x30/0x40
: [<c095addf>] sysenter_do_call+0x12/0x28
:Code: 48 08 74 03 89 59 04 f6 c3 01 89 50 0c 75 20 f6 02 01 74 19 83 cb 01 89 1a 0f ba 32 00 5b 5d c3 f3 90 8b 0a 80 e1 01 75 f7 eb b9 <0f> 0b 0f 0b 0f 0b 8d 76 00 55 89 e5 53 66 66 66 66 90 8b 0d fc
:EIP: [<c0557577>] __d_rehash+0x57/0x60 SS:ESP 0068:f0481d80

Comment 1 Josh Boyer 2012-10-17 15:24:06 UTC
*** Bug 867319 has been marked as a duplicate of this bug. ***

Comment 2 Jeff Layton 2012-10-17 15:50:58 UTC
Looks like:

        BUG_ON(!d_unhashed(entry));

...and the d_rehash was called from d_add. Perhaps we should be doing something different to add this thing to the cache? I'll need to look more closely at how the atomic_open code works.

Comment 3 Jeff Layton 2012-10-17 18:43:48 UTC
cc'ing Al, in case he has any ideas while I look at this...

Comment 4 Jeff Layton 2012-10-17 20:00:13 UTC
Created attachment 629053 [details]
patch -- don't call cifs_lookup on hashed, negative dentry

Vit, does this patch fix it?

Comment 5 Hector 2012-10-18 19:18:35 UTC
I'm just reboot the system

Package: kernel
Architecture: x86_64
OS Release: Fedora release 17 (Beefy Miracle)

Comment 6 hollladiewaldfee 2012-10-19 19:29:47 UTC
upgrade to 3.6.2-4.fc17.x86_64

Package: kernel
Architecture: x86_64
OS Release: Fedora release 17 (Beefy Miracle)

Comment 7 Kevin Paetzold 2012-10-19 19:39:31 UTC
This exact error seemed to occur to me this AM on 3.6.2-4.fc17.x86_64
That is of course not the first time it has happened.  
In my case I seem to hit this whenever attempting to save a .ods file (in libreoffice calc) which is mounted via CIFS.

Comment 8 Jeff Layton 2012-10-19 19:47:09 UTC
Could any of you test the patch in comment #4 and let me know if it helps?

Comment 9 Kevin Paetzold 2012-10-19 20:18:04 UTC
i have not built a linux kernel in > 10 years so it would probably take me a little while to do that unless you could build it for me.

Comment 10 Vit Zahradka 2012-10-20 15:35:59 UTC
(In reply to comment #8)
> Could any of you test the patch in comment #4 and let me know if it helps?

Thanks for your immediate reaction. I tested the patch and the reported problem does not appear anymore. Good work!

Comment 11 Jeff Layton 2012-10-21 00:39:51 UTC
Thanks for testing it. I'll send it upstream ASAP!

Comment 12 Josh Boyer 2012-10-22 19:12:17 UTC
*** Bug 867954 has been marked as a duplicate of this bug. ***

Comment 13 g. artim 2012-10-22 19:28:27 UTC
when is the eta on pushing this out via yum? thanks, got 10 desktops borked..

Comment 14 Jeff Layton 2012-10-22 19:39:34 UTC
Josh, can you apply the patch that's attached to this email to current 3.6 kernels? Apparently a number of people are hitting this.

It should go into stable kernels eventually, but it's not in mainline yet and I'm not sure when Steve F will get around to putting it there.

Comment 15 Josh Boyer 2012-10-22 19:48:42 UTC
Applied to F16-F18.  Thanks Jeff!

Comment 16 g. artim 2012-10-23 02:21:07 UTC
does this mean the updates via yum will show up now? or is there a way for me to know the patch is on, ie patchlist for the kernel or cifs modules.

Comment 17 Jeff Layton 2012-10-23 10:43:06 UTC
It's not in yet. It'll take some time to get the patch into an actual build and then it'll need to sit in updates-testing for a bit.

If you need it sooner, you'll need to build your own kernel. Alternately you can revert to something pre-3.6.

Comment 18 Marcus Moeller 2012-10-23 14:33:58 UTC
I have seen the fs/dcache.c:2138 error under several other circumstances. The latest kernel update you pushed let's me start eclipse correctly a few times, but the error is not yet gone completly.

Comment 19 Marcus Moeller 2012-10-23 14:50:02 UTC
I was able to reproduce this by deleting .eclipse before starting it again.

Comment 20 Fedora Update System 2012-10-23 21:58:10 UTC
kernel-3.6.3-3.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.6.3-3.fc18

Comment 21 Fedora Update System 2012-10-24 16:15:58 UTC
Package kernel-3.6.3-3.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.6.3-3.fc18'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-16787/kernel-3.6.3-3.fc18
then log in and leave karma (feedback).

Comment 22 Marcus Moeller 2012-10-24 16:53:42 UTC
Sorry, noticed that kernel-3.6.3-1.fc17 does not even seem to contain the patch. Could we please prepare a F17 build for that, too?

Comment 23 Josh Boyer 2012-10-24 17:06:38 UTC
(In reply to comment #22)
> Sorry, noticed that kernel-3.6.3-1.fc17 does not even seem to contain the
> patch. Could we please prepare a F17 build for that, too?

The patch is in Fedora git.  It will be included in the next F17 build.

Comment 24 Marcus Moeller 2012-10-24 17:09:33 UTC
Thanks Josh.

Comment 25 Marcus Moeller 2012-10-26 11:21:38 UTC
@Jeff The patch seems to have introduced a new strange behaviour.

We are mounting our user homes using CIFS and autofs.

During login .bashrc does not seem to be read correctly, so PS1 is displayed as -bash-4.2$

When doing a

ls -al .bashrc && bash

PS1 is displayed correctly, again.

This does not occur on a NFS mounted home.

Comment 26 Sachin Prabhu 2012-10-26 20:12:32 UTC
(In reply to comment #4)
> Created attachment 629053 [details]
> patch -- don't call cifs_lookup on hashed, negative dentry
> 
> Vit, does this patch fix it?

Jeff,

This patch is wrong. 
The dentry used is not associated with a d_inode at this stage in the atomic_open process.

lookup_open is called on the last patch component of an open file.
We first lookup the dcache and if we do find a positive dentry, we return the dentry and let the f_op->open complete the open path.
The next stage in case we intend on opening the file after the lookup is to use atomic_open to open the file.
At this stage we do not have an inode associated with the dentry.

static int lookup_open(struct nameidata *nd, struct path *path,
                        struct file *file,
                        const struct open_flags *op,
                        bool got_write, int *opened)
{
..
       dentry = lookup_dcache(&nd->last, dir, nd->flags, &need_lookup);
        if (IS_ERR(dentry))
                return PTR_ERR(dentry);

        /* Cached positive dentry: will open in f_op->open */
        if (!need_lookup && dentry->d_inode)
                goto out_no_open;

        if ((nd->flags & LOOKUP_OPEN) && dir_inode->i_op->atomic_open) {
                return atomic_open(nd, dentry, path, file, op, got_write,
                                   need_lookup, opened);
..
}

In atomic_open, we do an explicit check to ensure that there is no inode associated with the dentry.
At this stage we have fresh dentry which needs to be initialised.
static int atomic_open(struct nameidata *nd, struct dentry *dentry,
                        struct path *path, struct file *file,
                        const struct open_flags *op,
                        bool got_write, bool need_lookup,
                        int *opened)
{
..
        BUG_ON(dentry->d_inode);
..
        file->f_path.dentry = DENTRY_NOT_SET;
        file->f_path.mnt = nd->path.mnt;
        error = dir->i_op->atomic_open(dir, dentry, file, open_flag, mode,
                                      opened);
..
}

Here we will always hit the if condition since the inode is not set in this code path. We will always fail with an ENOENT in this code path.

int
cifs_atomic_open(struct inode *inode, struct dentry *direntry,
                 struct file *file, unsigned oflags, umode_t mode,
                 int *opened)
{
..
        if (!(oflags & O_CREAT)) {
                struct dentry *res;   	
   		if (!direntry->d_inode)
   			return -ENOENT;
                *res = cifs_lookup(inode, direntry, 0);
                if (IS_ERR(res))
                        return PTR_ERR(res);

                return finish_no_open(file, res);
        }
..
}

This will however work when we do a lookup on the file and then attempt to open the file as done by Marcus in the last comment since the lookup would have populated the dcache with the required values. We find this cached value in lookup_open which would then follow a different code path and instead use f_ops->open to open the file instead of using cifs_atomic_open.

So NAK on the patch.

I am continuing investigation on the original bug report.

Comment 27 Sachin Prabhu 2012-10-29 16:31:42 UTC
I was able to reproduce this problem easily with 2 terminals open on the client. 

To reproduce
1) Mount a cifs share on the client
# mount -t cifs -o <options> //share/export /mnt

2) Open a terminal and run the following command
# while :;do  cat /mnt/b >/dev/null 2>&1; done

3) On another terminal, run the following command
# while :; do touch /mnt/b; rm /mnt/b; done

The client crashes with 

[   95.475073] ------------[ cut here ]------------
[   95.476973] kernel BUG at fs/dcache.c:2138!
[   95.478611] invalid opcode: 0000 [#1] SMP
[   95.480230] Modules linked in: md4 nls_utf8 cifs dns_resolver fscache lockd sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ip6table_filter ip6_tables virtio_net virtio_balloon i2c_piix4 microcode virtio_blk cirrus drm_kms_helper ttm drm i2c_core
[   95.491625] CPU 1
[   95.492565] Pid: 4757, comm: cat Not tainted 3.7.0-0.rc1.git1.1.sprabhu.fc17.x86_64 #1 Red Hat KVM
[   95.497119] RIP: 0010:[<ffffffff811a8b3f>]  [<ffffffff811a8b3f>] __d_rehash+0x5f/0x70
[   95.499711] RSP: 0018:ffff880036273b58  EFLAGS: 00010286
[   95.502549] RAX: 000000000001f29b RBX: ffff88003cfc5840 RCX: 0000000000000011
[   95.504972] RDX: ffffc90000000000 RSI: ffffc900000f94d8 RDI: ffff88003cfc5840
[   95.507366] RBP: ffff880036273b58 R08: 0000000000016a40 R09: ffff88003699f400
[   95.509785] R10: ffff88003db30508 R11: ffff8800368cf000 R12: ffff8800367f1400
[   95.512523] R13: 0000000000000000 R14: 00000000fffffffe R15: 000000000000000d
[   95.515224] FS:  00007fd6f90d2740(0000) GS:ffff88003fc80000(0000) knlGS:0000000000000000
[   95.517673] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   95.519393] CR2: 0000000000430000 CR3: 000000003c9ea000 CR4: 00000000000006e0
[   95.521938] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   95.525328] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   95.528883] Process cat (pid: 4757, threadinfo ffff880036272000, task ffff8800368b1720)
[   95.532794] Stack:
[   95.533590]  ffff880036273b68 ffffffff811a8b86 ffff880036273b88 ffffffff811a8bb2
[   95.535976]  ffff88003cfc5840 0000000000000000 ffff880036273be8 ffffffffa016b793
[   95.538735]  ffff88003db30508 ffff880036a74a00 ffff88003a062680 0000000000000000
[   95.541496] Call Trace:
[   95.542410]  [<ffffffff811a8b86>] _d_rehash+0x36/0x40
[   95.544133]  [<ffffffff811a8bb2>] d_rehash+0x22/0x30
[   95.546104]  [<ffffffffa016b793>] cifs_lookup+0x263/0x360 [cifs]
[   95.548148]  [<ffffffffa016b8ea>] cifs_atomic_open+0x5a/0x300 [cifs]
[   95.550398]  [<ffffffff811a9d05>] ? d_lookup+0x35/0x60
[   95.552317]  [<ffffffff8119c6c0>] ? lookup_dcache+0x80/0xd0
[   95.554213]  [<ffffffff811a1129>] do_last+0x4a9/0xe00
[   95.556139]  [<ffffffff8119dfc8>] ? inode_permission+0x18/0x50
[   95.558675]  [<ffffffff811741b0>] ? alloc_pages_current+0xb0/0x120

Comment 28 Sachin Prabhu 2012-10-30 16:07:42 UTC
Another requirement which is needed for the reproducer above to work is that the user doesn't have permission to write onto that directory. I seem to have blundered onto the reproducer above by mistake. 

Given below is another reproducer which can also be run against a share you can write to.

To reproduce:

1) Mount a cifs share on the client
# mount -t cifs -o <options> //share/export /mnt

2) Open a terminal and run the following command
# while :;do  cat /mnt/b >/dev/null 2>&1; done

3) On another terminal and run the same command from step 2.
# while :;do  cat /mnt/b >/dev/null 2>&1; done

It will crash in a similar manner.

The problem occurs because lookup_open can pass a dentry which is a hashed negative lookup dentry. When we open the file with the O_CREAT flag, cifs_atomic_open() attempts to do a lookup on that dentry. It doesn't expect to be passed a hashed dentry and so ends up crashing. When we have a hashed negative lookup dentry in cifs_atomic_lookup() we have already revalidated the dentry and found it to be fine. We should return a -ENOENT in that case.

Comment 29 Sachin Prabhu 2012-10-30 16:18:59 UTC
Created attachment 635685 [details]
cifs: Do not lookup hashed negative dentry in  cifs_atomic_open

cifs: Do not lookup hashed negative dentry in  cifs_atomic_open

We do not need to lookup a hashed negative directory since we have
already revalidated it before and have found it to be fine.

This also prevents a crash in cifs_lookup() when it attempts to rehash
the already hashed negative lookup dentry.

Signed-off-by: Sachin Prabhu <sprabhu>

Comment 30 Sachin Prabhu 2012-10-30 16:27:45 UTC
In the reproducer in c#28, the file /mnt/b must not actually exist to be able to reproduce this issue.

Comment 31 Josh Boyer 2012-10-30 17:08:08 UTC
(In reply to comment #29)
> Created attachment 635685 [details]
> cifs: Do not lookup hashed negative dentry in  cifs_atomic_open

Pushed this to F16-F18 (just updated the existing patch).

Comment 32 Marcus Moeller 2012-10-31 13:08:45 UTC
Is there a new build scheduled already?

Comment 33 Justin M. Forbes 2012-10-31 13:57:19 UTC
3.6.5 is supposed to release this evening, so I would expect a build in the morning most likely.

Comment 34 Josh Boyer 2012-11-01 11:35:11 UTC
*** Bug 872090 has been marked as a duplicate of this bug. ***

Comment 35 Fedora Update System 2012-11-01 13:24:03 UTC
kernel-3.6.5-2.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.6.5-2.fc16

Comment 36 Fedora Update System 2012-11-02 03:22:25 UTC
Package kernel-3.6.5-2.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.6.5-2.fc16'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-17479/kernel-3.6.5-2.fc16
then log in and leave karma (feedback).

Comment 37 Josh Boyer 2012-11-05 13:39:51 UTC
*** Bug 872794 has been marked as a duplicate of this bug. ***

Comment 38 Josh Boyer 2012-11-12 20:57:44 UTC
This should be fixed at this point.  Bodhi is having trouble closing bugs.