Bug 867344
Summary: | [abrt]: kernel BUG at fs/dcache.c:2138! | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Vit Zahradka <vit.zahradka> | ||||||
Component: | kernel | Assignee: | Sachin Prabhu <sprabhu> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
Severity: | unspecified | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | 17 | CC: | aviro, cryogenium, gansalmon, gartim, hectorguerrero1866, hollladiewaldfee, itamar, jforbes, jlayton, jonathan, kernel-maint, kevin.paetzold, korsnick, madhu.chinakonda, marcus.moeller, motoskov, sprabhu | ||||||
Target Milestone: | --- | ||||||||
Target Release: | --- | ||||||||
Hardware: | i686 | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | abrt_hash:23a683122a84f0f780ae9c7d5a32288dbd6d02e7 | ||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2012-11-12 20:57:44 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
Description
Vit Zahradka
2012-10-17 10:59:54 UTC
*** Bug 867319 has been marked as a duplicate of this bug. *** Looks like: BUG_ON(!d_unhashed(entry)); ...and the d_rehash was called from d_add. Perhaps we should be doing something different to add this thing to the cache? I'll need to look more closely at how the atomic_open code works. cc'ing Al, in case he has any ideas while I look at this... Created attachment 629053 [details]
patch -- don't call cifs_lookup on hashed, negative dentry
Vit, does this patch fix it?
I'm just reboot the system Package: kernel Architecture: x86_64 OS Release: Fedora release 17 (Beefy Miracle) upgrade to 3.6.2-4.fc17.x86_64 Package: kernel Architecture: x86_64 OS Release: Fedora release 17 (Beefy Miracle) This exact error seemed to occur to me this AM on 3.6.2-4.fc17.x86_64 That is of course not the first time it has happened. In my case I seem to hit this whenever attempting to save a .ods file (in libreoffice calc) which is mounted via CIFS. Could any of you test the patch in comment #4 and let me know if it helps? i have not built a linux kernel in > 10 years so it would probably take me a little while to do that unless you could build it for me. (In reply to comment #8) > Could any of you test the patch in comment #4 and let me know if it helps? Thanks for your immediate reaction. I tested the patch and the reported problem does not appear anymore. Good work! Thanks for testing it. I'll send it upstream ASAP! *** Bug 867954 has been marked as a duplicate of this bug. *** when is the eta on pushing this out via yum? thanks, got 10 desktops borked.. Josh, can you apply the patch that's attached to this email to current 3.6 kernels? Apparently a number of people are hitting this. It should go into stable kernels eventually, but it's not in mainline yet and I'm not sure when Steve F will get around to putting it there. Applied to F16-F18. Thanks Jeff! does this mean the updates via yum will show up now? or is there a way for me to know the patch is on, ie patchlist for the kernel or cifs modules. It's not in yet. It'll take some time to get the patch into an actual build and then it'll need to sit in updates-testing for a bit. If you need it sooner, you'll need to build your own kernel. Alternately you can revert to something pre-3.6. I have seen the fs/dcache.c:2138 error under several other circumstances. The latest kernel update you pushed let's me start eclipse correctly a few times, but the error is not yet gone completly. I was able to reproduce this by deleting .eclipse before starting it again. kernel-3.6.3-3.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/kernel-3.6.3-3.fc18 Package kernel-3.6.3-3.fc18: * should fix your issue, * was pushed to the Fedora 18 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing kernel-3.6.3-3.fc18' as soon as you are able to, then reboot. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-16787/kernel-3.6.3-3.fc18 then log in and leave karma (feedback). Sorry, noticed that kernel-3.6.3-1.fc17 does not even seem to contain the patch. Could we please prepare a F17 build for that, too? (In reply to comment #22) > Sorry, noticed that kernel-3.6.3-1.fc17 does not even seem to contain the > patch. Could we please prepare a F17 build for that, too? The patch is in Fedora git. It will be included in the next F17 build. Thanks Josh. @Jeff The patch seems to have introduced a new strange behaviour. We are mounting our user homes using CIFS and autofs. During login .bashrc does not seem to be read correctly, so PS1 is displayed as -bash-4.2$ When doing a ls -al .bashrc && bash PS1 is displayed correctly, again. This does not occur on a NFS mounted home. (In reply to comment #4) > Created attachment 629053 [details] > patch -- don't call cifs_lookup on hashed, negative dentry > > Vit, does this patch fix it? Jeff, This patch is wrong. The dentry used is not associated with a d_inode at this stage in the atomic_open process. lookup_open is called on the last patch component of an open file. We first lookup the dcache and if we do find a positive dentry, we return the dentry and let the f_op->open complete the open path. The next stage in case we intend on opening the file after the lookup is to use atomic_open to open the file. At this stage we do not have an inode associated with the dentry. static int lookup_open(struct nameidata *nd, struct path *path, struct file *file, const struct open_flags *op, bool got_write, int *opened) { .. dentry = lookup_dcache(&nd->last, dir, nd->flags, &need_lookup); if (IS_ERR(dentry)) return PTR_ERR(dentry); /* Cached positive dentry: will open in f_op->open */ if (!need_lookup && dentry->d_inode) goto out_no_open; if ((nd->flags & LOOKUP_OPEN) && dir_inode->i_op->atomic_open) { return atomic_open(nd, dentry, path, file, op, got_write, need_lookup, opened); .. } In atomic_open, we do an explicit check to ensure that there is no inode associated with the dentry. At this stage we have fresh dentry which needs to be initialised. static int atomic_open(struct nameidata *nd, struct dentry *dentry, struct path *path, struct file *file, const struct open_flags *op, bool got_write, bool need_lookup, int *opened) { .. BUG_ON(dentry->d_inode); .. file->f_path.dentry = DENTRY_NOT_SET; file->f_path.mnt = nd->path.mnt; error = dir->i_op->atomic_open(dir, dentry, file, open_flag, mode, opened); .. } Here we will always hit the if condition since the inode is not set in this code path. We will always fail with an ENOENT in this code path. int cifs_atomic_open(struct inode *inode, struct dentry *direntry, struct file *file, unsigned oflags, umode_t mode, int *opened) { .. if (!(oflags & O_CREAT)) { struct dentry *res; if (!direntry->d_inode) return -ENOENT; *res = cifs_lookup(inode, direntry, 0); if (IS_ERR(res)) return PTR_ERR(res); return finish_no_open(file, res); } .. } This will however work when we do a lookup on the file and then attempt to open the file as done by Marcus in the last comment since the lookup would have populated the dcache with the required values. We find this cached value in lookup_open which would then follow a different code path and instead use f_ops->open to open the file instead of using cifs_atomic_open. So NAK on the patch. I am continuing investigation on the original bug report. I was able to reproduce this problem easily with 2 terminals open on the client. To reproduce 1) Mount a cifs share on the client # mount -t cifs -o <options> //share/export /mnt 2) Open a terminal and run the following command # while :;do cat /mnt/b >/dev/null 2>&1; done 3) On another terminal, run the following command # while :; do touch /mnt/b; rm /mnt/b; done The client crashes with [ 95.475073] ------------[ cut here ]------------ [ 95.476973] kernel BUG at fs/dcache.c:2138! [ 95.478611] invalid opcode: 0000 [#1] SMP [ 95.480230] Modules linked in: md4 nls_utf8 cifs dns_resolver fscache lockd sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ip6table_filter ip6_tables virtio_net virtio_balloon i2c_piix4 microcode virtio_blk cirrus drm_kms_helper ttm drm i2c_core [ 95.491625] CPU 1 [ 95.492565] Pid: 4757, comm: cat Not tainted 3.7.0-0.rc1.git1.1.sprabhu.fc17.x86_64 #1 Red Hat KVM [ 95.497119] RIP: 0010:[<ffffffff811a8b3f>] [<ffffffff811a8b3f>] __d_rehash+0x5f/0x70 [ 95.499711] RSP: 0018:ffff880036273b58 EFLAGS: 00010286 [ 95.502549] RAX: 000000000001f29b RBX: ffff88003cfc5840 RCX: 0000000000000011 [ 95.504972] RDX: ffffc90000000000 RSI: ffffc900000f94d8 RDI: ffff88003cfc5840 [ 95.507366] RBP: ffff880036273b58 R08: 0000000000016a40 R09: ffff88003699f400 [ 95.509785] R10: ffff88003db30508 R11: ffff8800368cf000 R12: ffff8800367f1400 [ 95.512523] R13: 0000000000000000 R14: 00000000fffffffe R15: 000000000000000d [ 95.515224] FS: 00007fd6f90d2740(0000) GS:ffff88003fc80000(0000) knlGS:0000000000000000 [ 95.517673] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 95.519393] CR2: 0000000000430000 CR3: 000000003c9ea000 CR4: 00000000000006e0 [ 95.521938] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 95.525328] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 95.528883] Process cat (pid: 4757, threadinfo ffff880036272000, task ffff8800368b1720) [ 95.532794] Stack: [ 95.533590] ffff880036273b68 ffffffff811a8b86 ffff880036273b88 ffffffff811a8bb2 [ 95.535976] ffff88003cfc5840 0000000000000000 ffff880036273be8 ffffffffa016b793 [ 95.538735] ffff88003db30508 ffff880036a74a00 ffff88003a062680 0000000000000000 [ 95.541496] Call Trace: [ 95.542410] [<ffffffff811a8b86>] _d_rehash+0x36/0x40 [ 95.544133] [<ffffffff811a8bb2>] d_rehash+0x22/0x30 [ 95.546104] [<ffffffffa016b793>] cifs_lookup+0x263/0x360 [cifs] [ 95.548148] [<ffffffffa016b8ea>] cifs_atomic_open+0x5a/0x300 [cifs] [ 95.550398] [<ffffffff811a9d05>] ? d_lookup+0x35/0x60 [ 95.552317] [<ffffffff8119c6c0>] ? lookup_dcache+0x80/0xd0 [ 95.554213] [<ffffffff811a1129>] do_last+0x4a9/0xe00 [ 95.556139] [<ffffffff8119dfc8>] ? inode_permission+0x18/0x50 [ 95.558675] [<ffffffff811741b0>] ? alloc_pages_current+0xb0/0x120 Another requirement which is needed for the reproducer above to work is that the user doesn't have permission to write onto that directory. I seem to have blundered onto the reproducer above by mistake. Given below is another reproducer which can also be run against a share you can write to. To reproduce: 1) Mount a cifs share on the client # mount -t cifs -o <options> //share/export /mnt 2) Open a terminal and run the following command # while :;do cat /mnt/b >/dev/null 2>&1; done 3) On another terminal and run the same command from step 2. # while :;do cat /mnt/b >/dev/null 2>&1; done It will crash in a similar manner. The problem occurs because lookup_open can pass a dentry which is a hashed negative lookup dentry. When we open the file with the O_CREAT flag, cifs_atomic_open() attempts to do a lookup on that dentry. It doesn't expect to be passed a hashed dentry and so ends up crashing. When we have a hashed negative lookup dentry in cifs_atomic_lookup() we have already revalidated the dentry and found it to be fine. We should return a -ENOENT in that case. Created attachment 635685 [details]
cifs: Do not lookup hashed negative dentry in cifs_atomic_open
cifs: Do not lookup hashed negative dentry in cifs_atomic_open
We do not need to lookup a hashed negative directory since we have
already revalidated it before and have found it to be fine.
This also prevents a crash in cifs_lookup() when it attempts to rehash
the already hashed negative lookup dentry.
Signed-off-by: Sachin Prabhu <sprabhu>
In the reproducer in c#28, the file /mnt/b must not actually exist to be able to reproduce this issue. (In reply to comment #29) > Created attachment 635685 [details] > cifs: Do not lookup hashed negative dentry in cifs_atomic_open Pushed this to F16-F18 (just updated the existing patch). Is there a new build scheduled already? 3.6.5 is supposed to release this evening, so I would expect a build in the morning most likely. *** Bug 872090 has been marked as a duplicate of this bug. *** kernel-3.6.5-2.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/kernel-3.6.5-2.fc16 Package kernel-3.6.5-2.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing kernel-3.6.5-2.fc16' as soon as you are able to, then reboot. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-17479/kernel-3.6.5-2.fc16 then log in and leave karma (feedback). *** Bug 872794 has been marked as a duplicate of this bug. *** This should be fixed at this point. Bodhi is having trouble closing bugs. |