Bug 516742 - CIFS - crash in small_smb_init
Summary: CIFS - crash in small_smb_init
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.8
Hardware: All
OS: Linux
urgent
high
Target Milestone: rc
: ---
Assignee: Jeff Layton
QA Contact: yanfu,wang
URL:
Whiteboard:
Depends On:
Blocks: 526974 531005 532026
TreeView+ depends on / blocked
 
Reported: 2009-08-11 11:33 UTC by Sachin Prabhu
Modified: 2018-11-14 19:54 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 526974 531005 (view as bug list)
Environment:
Last Closed: 2011-02-16 15:17:51 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Preliminary analysis. (9.36 KB, text/plain)
2009-08-11 20:46 UTC, Sachin Prabhu
no flags Details
patchset -- first draft (16.14 KB, patch)
2009-08-17 12:03 UTC, Jeff Layton
no flags Details | Diff
patchset -- second draft (26.28 KB, patch)
2009-08-27 13:50 UTC, Jeff Layton
no flags Details | Diff
patchset 3 -- reduce dependencies on later patches (43.84 KB, patch)
2009-08-27 14:12 UTC, Jeff Layton
no flags Details | Diff
patchset -- alternate approach (33.32 KB, patch)
2009-09-16 01:16 UTC, Jeff Layton
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0263 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 4.9 kernel security and bug fix update 2011-02-16 15:14:55 UTC

Description Sachin Prabhu 2009-08-11 11:33:05 UTC
A user reported a kernel crash with the following Oops message

CIFS VFS: No response for cmd 114 mid 60781
CIFS VFS: No response for cmd 114 mid 60782
CIFS VFS: No response for cmd 114 mid 60783
ide-cd: cmd 0x1e timed out
hda: irq timeout: status=0xd0 { Busy }
hda: irq timeout: error=0x00
hda: ATAPI reset complete
Unable to handle kernel paging request at 000000000000eb6c RIP:
<ffffffffa02f9b12>{:cifs:small_smb_init+120}
PML4 cc431067 PGD 9ef66067 PMD 0
Oops: 0000 [1] SMP
CPU 2
Modules linked in: cifs nls_cp932 nls_utf8 smbfs mptctl mptbase ipmi_si(U) ipmi_devintf(U) ipmi_msghandler(U) appletalk(U) parport_pc lp parport autofs4 i2c_dev i2c_core hpilo(U) sunrpc 8021q ds yent
a_socket pcmcia_core cpufreq_powersave ide_dump cciss_dump scsi_dump diskdump zlib_deflate dm_mirror dm_mod button battery ac joydev md5 ipv6 uhci_hcd ohci_hcd ehci_hcd e1000 bnx2 sg qla2400 ext3 jbd
qla2xxx scsi_transport_fc cciss sd_mod scsi_mod
Pid: 25134, comm: cifsoplockd Tainted: GF     2.6.9-89.0.3.ELsmp
RIP: 0010:[<ffffffffa02f9b12>] <ffffffffa02f9b12>{:cifs:small_smb_init+120}
RSP: 0018:000001003430fdb8  EFLAGS: 00010202
RAX: 000000000000eb20 RBX: 0000000000000000 RCX: 000001003430fe98
RDX: 000001012c5824c0 RSI: 0000000000000008 RDI: 0000000000000024
RBP: 000001012c5824c0 R08: 0000000000000000 R09: 0000000000000000
R10: 00000100800233c8 R11: 0000000000040000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000024 R15: 0000000000000000
FS:  0000002a95782b00(0000) GS:ffffffff80504780(0000) knlGS:00000000f7fd66c0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000eb6c CR3: 0000000037ea2000 CR4: 00000000000006e0
Process cifsoplockd (pid: 25134, threadinfo 000001003430e000, task 00000100bd0627f0)
Stack: 000001003430fe98 000000082fd02cb8 000001012fa03568 000001012e3a35e8
      000001012fa38e40 000001012e766398 000001012ddce278 000001012fafcd60
      000001012e327098 000001012f62da80
Call Trace:<ffffffffa02fbf2a>{:cifs:CIFSSMBLock+125} <ffffffff8014c9c8>{keventd_create_kthread+0}
      <ffffffffa02f9997>{:cifs:cifs_oplock_thread+324} <ffffffffa02f9853>{:cifs:cifs_oplock_thread+0}
      <ffffffff8014c99f>{kthread+200} <ffffffff80110fd7>{child_rip+8}
      <ffffffff8014c9c8>{keventd_create_kthread+0} <ffffffff8014c8d7>{kthread+0}
      <ffffffff80110fcf>{child_rip+0}

Code: 83 78 4c 02 0f 84 37 02 00 00 48 8b 40 40 48 85 c0 0f 84 2a
RIP <ffffffffa02f9b12>{:cifs:small_smb_init+120} RSP <000001003430fdb8>
CR2: 000000000000eb6c

The customer environment is a complex environment using Appletalk as well as CIFS and the problem has not been reproduced outside this environment.

Comment 6 Sachin Prabhu 2009-08-11 20:46:01 UTC
Created attachment 357078 [details]
Preliminary analysis.

Preliminary analysis of the problem performed by Umeya-san.

Comment 9 Jeff Layton 2009-08-13 11:05:14 UTC
Erm...thought I did a writeup yesterday on this, but it doesn't seem to have made it to the BZ. Anyway...

Nice work on the analysis.

I don't think this is list corruption. 0xffffffffa0334bc0 is the address of the GlobalOplock_Q list head itself:

crash> p &GlobalOplock_Q
$1 = (struct list_head *) 0xffffffffa0334bc0

If you use the "list" command with the -H, then crash will know to skip that entry when it prints out the list:

crash> list -H GlobalOplock_Q
10128caf3c0
10128caff80

...I think you're probably correct about the race however. What likely happened is that an unmount raced with the oplock release, the tcon struct got torn down and reused. So I think this is probably a fairly straightforward use-after free situation.

What we probably need to do is have this code take a reference to the tcon and do a put on it when the call is complete.

The problem here is that the locking in this codepath is really awful and is probably going to need some redesign. That may take a little while.

Comment 14 Jeff Layton 2009-08-17 11:16:12 UTC
(In reply to comment #10)
> Event posted on 08-17-2009 12:20pm JST by tumeya
> 
> Jeff, another panic. I'm guessing this is reverse of what'd happened
> before, that it crashed down the line of cifs_umount but the list is
> empty. Do you think that would be correct? 
> 
> 

My suspicion here is that this is actually a more straightforward problem. DeleteTconOplockQEntries is walking the list and removing entries from it, but isn't using list_for_each_entry_safe. It needs to do that instead. The patchset I'm working on should address that problem as well.

Comment 16 Jeff Layton 2009-08-17 12:03:30 UTC
Created attachment 357640 [details]
patchset -- first draft

This is the patchset I've been testing on rawhide. It probably still needs work, but I think the basic concept is correct.

The main fix here is to have cifs_oplock_thread hold a lock over the entire period that the oplock queue entry is being used.

Comment 17 Jeff Layton 2009-08-17 12:21:03 UTC
Actually...you may be right (In reply to comment #14)
> (In reply to comment #10)
> > Event posted on 08-17-2009 12:20pm JST by tumeya
> > 
> > Jeff, another panic. I'm guessing this is reverse of what'd happened
> > before, that it crashed down the line of cifs_umount but the list is
> > empty. Do you think that would be correct? 
> > 
> > 
> 
> My suspicion here is that this is actually a more straightforward problem.
> DeleteTconOplockQEntries is walking the list and removing entries from it, but
> isn't using list_for_each_entry_safe. It needs to do that instead. The patchset
> I'm working on should address that problem as well.  

Actually, you may be correct on this. The thread drops the spinlock and reacquires it in DeleteOplockQEntry. The problem of course is that there is no guarantee that the entry will still be on the list at that point.

Either way, the patchset in comment #16 should take care of both problems.

I went ahead and sent the set upstream as an RFC. Awaiting feedback on it now.

Comment 18 Jeff Layton 2009-08-17 12:48:47 UTC
Another thing that might help them is reducing the number of umounts that occur. It looks like they're using the automounter. It may be helpful to increase the timeout before a umount occurs, or to consider statically mounting the shares out of /etc/fstab until we get this problem resolved.

Comment 22 Jeff Layton 2009-08-18 18:14:07 UTC
New patchset posted to upstream linux-cifs-client list. Awaiting feedback there now:

http://lists.samba.org/archive/linux-cifs-client/2009-August/005141.html

Comment 25 Jeff Layton 2009-08-27 13:50:40 UTC
Created attachment 358873 [details]
patchset -- second draft

This is a backport of the patchset that is most currently being discussed upstream. It's based on the my latest test kernels and has some dependencies on earlier patches.

I'm building a test kernel now.

Comment 26 Jeff Layton 2009-08-27 14:12:42 UTC
Created attachment 358879 [details]
patchset 3 -- reduce dependencies on later patches

This patchset should apply reasonably cleanly to current 4.9 development kernels. It also contains a cleanup patch to remove some unneeded conditional compilation.

Test kernel is building now.

Comment 27 Jeff Layton 2009-08-27 19:20:59 UTC
Patchset is in the test kernels on my people page:

http://people.redhat.com/jlayton

Please test them someplace non-critical and let me know how they do.

Comment 28 Jeff Layton 2009-09-16 01:16:33 UTC
Created attachment 361168 [details]
patchset -- alternate approach

This patchset is a backport of the one that's most likely to make it upstream. It changes the oplock break response job into a workqueue task.

I've added this set to my test kernels. If the customer who reported this is able to test them, that would be most helpful:

http://people.redhat.com/jlayton

Comment 30 Jeff Layton 2009-09-18 13:12:35 UTC
Oops from latest core:

----------------------[snip]-------------------------
VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
Unable to handle kernel paging request at 000076f0ffff18a8 RIP: 
<ffffffff801950c6>{iput+47}
PML4 0 
Oops: 0000 [1] SMP 
CPU 4 
Modules linked in: nls_utf8 cifs appletalk(U) parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc 8021q ds yenta_soc
ket pcmcia_core cpufreq_powersave ide_dump cciss_dump scsi_dump diskdump zlib_deflate dm_mirror dm_mod button battery 
ac joydev md5 ipv6 uhci_hcd ohci_hcd ehci_hcd e1000e e1000 bnx2 sg qla2400 ext3 jbd qla2xxx scsi_transport_fc cciss sd
_mod scsi_mod
Pid: 17864, comm: cifsoplockd Tainted: GF     2.6.9-89.11.EL.jtltest.69smp
RIP: 0010:[<ffffffff801950c6>] <ffffffff801950c6>{iput+47}
RSP: 0018:0000010063153ee8  EFLAGS: 00010202
RAX: 000076f0ffff1880 RBX: 0000010113f1c078 RCX: 0000000000000000
RDX: 0000000000001e00 RSI: 0000000000000002 RDI: 0000010113f1c078
RBP: 0000010113f1c078 R08: 0000000000000002 R09: 0000010063153d00
R10: 000001007ec90d10 R11: 0000000000000256 R12: 000001011a43e440
R13: 0000010113f1c038 R14: 00000100cfdefbc0 R15: ffffffff8014c9b0
FS:  0000002a95782b00(0000) GS:ffffffff80504f80(0000) knlGS:00000000f7e2c6c0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000076f0ffff18a8 CR3: 00000000cfe32000 CR4: 00000000000006e0
Process cifsoplockd (pid: 17864, threadinfo 0000010063152000, task 00000100757f7030)
Stack: 0000000000000000 ffffffffa02adab1 0000000000000000 ffffffffa02ad993 
       00000101219c1e08 00000000fffffffc 00000101219c1df8 ffffffff8014c987 
       ffffffffffffffff 00000101219c1df8 
Call Trace:<ffffffffa02adab1>{:cifs:cifs_oplock_thread+286} <ffffffffa02ad993>{:cifs:cifs_oplock_thread+0} 
       <ffffffff8014c987>{kthread+200} <ffffffff80110fd7>{child_rip+8} 
       <ffffffff8014c9b0>{keventd_create_kthread+0} <ffffffff8014c8bf>{kthread+0} 
       <ffffffff80110fcf>{child_rip+0} 

Code: 48 8b 40 28 48 85 c0 74 05 48 89 df ff d0 48 8d 7b 38 48 c7 
RIP <ffffffff801950c6>{iput+47} RSP <0000010063153ee8>
CR2: 000076f0ffff18a8
----------------------[snip]-------------------------

The "Busy inodes after umount" message means that the superblock was freed while there was still an active inode reference. I'm going to go out on a limb and assume that the active inode reference is due to a queued oplock callback.

The crash occurred when we tried to put the inode reference after attempting to write back the inode. iput was called and that referenced the inode's superblock. The superblock had been freed however and the memory was likely recycled.

Note that this crash is from an earlier kernel with an earlier version of this set, but I don't think the latest set would have prevented this panic either.

I think the right solution is probably to have the oplock break callbacks also take a vfsmount reference. That should prevent the superblock from being freed until after its work completes.

I'll need to respin the upstream patchset with these changes too, so it may be a little while before I have anything new for them to test.

Comment 31 Jeff Layton 2009-09-18 13:16:21 UTC
Adding Steve French (upstream CIFS maintainer) to this BZ...

Steve, I'll probably need to respin the patchset or add a new patch on top of it to have the oplock break callbacks take a vfsmount reference.

Comment 32 Jeff Layton 2009-09-23 20:51:06 UTC
I've got some kernels on my people.redhat.com page with the latest backport of a patchset. If my understanding of the crash described in comment #30 is correct, then this should fix that as well.

http://people.redhat.com/jlayton

Please have them test these someplace non-critical if they are able and let me know how they do.

Comment 36 Jan Tluka 2009-10-26 10:16:48 UTC
@GSS

We'll need confirmation that customer will test the fix as the bug is not reproducible outside their environment.

Thanks.

Comment 39 RHEL Program Management 2009-10-26 18:59:31 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 42 Vivek Goyal 2009-10-30 18:05:05 UTC
Committed in 89.14.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 48 yanfu,wang 2011-01-17 07:59:53 UTC
hi,
How the test result from customer since QE inside RH just do code review? Can someone provide feedback from customer?

Comment 50 errata-xmlrpc 2011-02-16 15:17:51 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html


Note You need to log in before you can comment on or make changes to this bug.