A user reported a kernel crash with the following Oops message:

CIFS VFS: No response for cmd 114 mid 60781
CIFS VFS: No response for cmd 114 mid 60782
CIFS VFS: No response for cmd 114 mid 60783
ide-cd: cmd 0x1e timed out
hda: irq timeout: status=0xd0 { Busy }
hda: irq timeout: error=0x00
hda: ATAPI reset complete
Unable to handle kernel paging request at 000000000000eb6c
RIP: <ffffffffa02f9b12>{:cifs:small_smb_init+120}
PML4 cc431067 PGD 9ef66067 PMD 0
Oops: 0000 [1] SMP
CPU 2
Modules linked in: cifs nls_cp932 nls_utf8 smbfs mptctl mptbase ipmi_si(U) ipmi_devintf(U) ipmi_msghandler(U) appletalk(U) parport_pc lp parport autofs4 i2c_dev i2c_core hpilo(U) sunrpc 8021q ds yenta_socket pcmcia_core cpufreq_powersave ide_dump cciss_dump scsi_dump diskdump zlib_deflate dm_mirror dm_mod button battery ac joydev md5 ipv6 uhci_hcd ohci_hcd ehci_hcd e1000 bnx2 sg qla2400 ext3 jbd qla2xxx scsi_transport_fc cciss sd_mod scsi_mod
Pid: 25134, comm: cifsoplockd Tainted: GF 2.6.9-89.0.3.ELsmp
RIP: 0010:[<ffffffffa02f9b12>] <ffffffffa02f9b12>{:cifs:small_smb_init+120}
RSP: 0018:000001003430fdb8 EFLAGS: 00010202
RAX: 000000000000eb20 RBX: 0000000000000000 RCX: 000001003430fe98
RDX: 000001012c5824c0 RSI: 0000000000000008 RDI: 0000000000000024
RBP: 000001012c5824c0 R08: 0000000000000000 R09: 0000000000000000
R10: 00000100800233c8 R11: 0000000000040000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000024 R15: 0000000000000000
FS: 0000002a95782b00(0000) GS:ffffffff80504780(0000) knlGS:00000000f7fd66c0
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000eb6c CR3: 0000000037ea2000 CR4: 00000000000006e0
Process cifsoplockd (pid: 25134, threadinfo 000001003430e000, task 00000100bd0627f0)
Stack: 000001003430fe98 000000082fd02cb8 000001012fa03568 000001012e3a35e8
       000001012fa38e40 000001012e766398 000001012ddce278 000001012fafcd60
       000001012e327098 000001012f62da80
Call Trace:<ffffffffa02fbf2a>{:cifs:CIFSSMBLock+125} <ffffffff8014c9c8>{keventd_create_kthread+0}
       <ffffffffa02f9997>{:cifs:cifs_oplock_thread+324} <ffffffffa02f9853>{:cifs:cifs_oplock_thread+0}
       <ffffffff8014c99f>{kthread+200} <ffffffff80110fd7>{child_rip+8}
       <ffffffff8014c9c8>{keventd_create_kthread+0} <ffffffff8014c8d7>{kthread+0}
       <ffffffff80110fcf>{child_rip+0}
Code: 83 78 4c 02 0f 84 37 02 00 00 48 8b 40 40 48 85 c0 0f 84 2a
RIP <ffffffffa02f9b12>{:cifs:small_smb_init+120} RSP <000001003430fdb8>
CR2: 000000000000eb6c

The customer environment is a complex one using AppleTalk as well as CIFS, and the problem has not been reproduced outside it.
Created attachment 357078 [details]
Preliminary analysis

Preliminary analysis of the problem performed by Umeya-san.
Erm... thought I did a writeup yesterday on this, but it doesn't seem to have made it to the BZ. Anyway... nice work on the analysis.

I don't think this is list corruption. 0xffffffffa0334bc0 is the address of the GlobalOplock_Q list head itself:

crash> p &GlobalOplock_Q
$1 = (struct list_head *) 0xffffffffa0334bc0

If you use the "list" command with the -H flag, then crash will know to skip that entry when it prints out the list:

crash> list -H GlobalOplock_Q
10128caf3c0
10128caff80

...I think you're probably correct about the race, however. What likely happened is that an unmount raced with the oplock release, and the tcon struct got torn down and reused. So I think this is probably a fairly straightforward use-after-free situation. What we probably need to do is have this code take a reference to the tcon and do a put on it when the call is complete. The problem here is that the locking in this codepath is really awful and is probably going to need some redesign. That may take a little while.
(In reply to comment #10)
> Event posted on 08-17-2009 12:20pm JST by tumeya
>
> Jeff, another panic. I'm guessing this is reverse of what'd happened
> before, that it crashed down the line of cifs_umount but the list is
> empty. Do you think that would be correct?

My suspicion here is that this is actually a more straightforward problem. DeleteTconOplockQEntries is walking the list and removing entries from it, but isn't using list_for_each_entry_safe. It needs to do that instead. The patchset I'm working on should address that problem as well.
Created attachment 357640 [details]
patchset -- first draft

This is the patchset I've been testing on rawhide. It probably still needs work, but I think the basic concept is correct. The main fix here is to have cifs_oplock_thread hold a lock over the entire period that the oplock queue entry is being used.
Actually... you may be right

(In reply to comment #14)
> (In reply to comment #10)
> > Event posted on 08-17-2009 12:20pm JST by tumeya
> >
> > Jeff, another panic. I'm guessing this is reverse of what'd happened
> > before, that it crashed down the line of cifs_umount but the list is
> > empty. Do you think that would be correct?
>
> My suspicion here is that this is actually a more straightforward problem.
> DeleteTconOplockQEntries is walking the list and removing entries from it, but
> isn't using list_for_each_entry_safe. It needs to do that instead. The patchset
> I'm working on should address that problem as well.

Actually, you may be correct on this. The thread drops the spinlock and reacquires it in DeleteOplockQEntry. The problem, of course, is that there is no guarantee that the entry will still be on the list at that point. Either way, the patchset in comment #16 should take care of both problems. I went ahead and sent the set upstream as an RFC. Awaiting feedback on it now.
Another thing that might help them is reducing the number of umounts that occur. It looks like they're using the automounter. It may be helpful to increase the timeout before a umount occurs, or to consider statically mounting the shares out of /etc/fstab until we get this problem resolved.
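As an illustration (the server, share, paths, and timeout value below are hypothetical, and the exact autofs syntax depends on the autofs version in use), the automounter idle timeout can be raised in /etc/auto.master, or the share can be mounted statically from /etc/fstab:

```
# /etc/auto.master -- raise the idle timeout (seconds) before autofs umounts
/misc   /etc/auto.misc  --timeout=600

# /etc/fstab -- or mount the share statically so it is never umounted
//server/share  /mnt/share  cifs  credentials=/etc/cifs.creds  0 0
```

Either approach reduces how often the umount-vs-oplock race can trigger; it is a mitigation, not a fix.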
New patchset posted to upstream linux-cifs-client list. Awaiting feedback there now: http://lists.samba.org/archive/linux-cifs-client/2009-August/005141.html
Created attachment 358873 [details]
patchset -- second draft

This is a backport of the patchset that is currently being discussed upstream. It's based on my latest test kernels and has some dependencies on earlier patches. I'm building a test kernel now.
Created attachment 358879 [details]
patchset 3 -- reduce dependencies on later patches

This patchset should apply reasonably cleanly to current 4.9 development kernels. It also contains a cleanup patch to remove some unneeded conditional compilation. Test kernel is building now.
Patchset is in the test kernels on my people page: http://people.redhat.com/jlayton Please test them someplace non-critical and let me know how they do.
Created attachment 361168 [details]
patchset -- alternate approach

This patchset is a backport of the one that's most likely to make it upstream. It changes the oplock break response job into a workqueue task. I've added this set to my test kernels. If the customer who reported this is able to test them, that would be most helpful:

http://people.redhat.com/jlayton
Oops from latest core:

----------------------[snip]-------------------------
VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
Unable to handle kernel paging request at 000076f0ffff18a8
RIP: <ffffffff801950c6>{iput+47}
PML4 0
Oops: 0000 [1] SMP
CPU 4
Modules linked in: nls_utf8 cifs appletalk(U) parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc 8021q ds yenta_socket pcmcia_core cpufreq_powersave ide_dump cciss_dump scsi_dump diskdump zlib_deflate dm_mirror dm_mod button battery ac joydev md5 ipv6 uhci_hcd ohci_hcd ehci_hcd e1000e e1000 bnx2 sg qla2400 ext3 jbd qla2xxx scsi_transport_fc cciss sd_mod scsi_mod
Pid: 17864, comm: cifsoplockd Tainted: GF 2.6.9-89.11.EL.jtltest.69smp
RIP: 0010:[<ffffffff801950c6>] <ffffffff801950c6>{iput+47}
RSP: 0018:0000010063153ee8 EFLAGS: 00010202
RAX: 000076f0ffff1880 RBX: 0000010113f1c078 RCX: 0000000000000000
RDX: 0000000000001e00 RSI: 0000000000000002 RDI: 0000010113f1c078
RBP: 0000010113f1c078 R08: 0000000000000002 R09: 0000010063153d00
R10: 000001007ec90d10 R11: 0000000000000256 R12: 000001011a43e440
R13: 0000010113f1c038 R14: 00000100cfdefbc0 R15: ffffffff8014c9b0
FS: 0000002a95782b00(0000) GS:ffffffff80504f80(0000) knlGS:00000000f7e2c6c0
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000076f0ffff18a8 CR3: 00000000cfe32000 CR4: 00000000000006e0
Process cifsoplockd (pid: 17864, threadinfo 0000010063152000, task 00000100757f7030)
Stack: 0000000000000000 ffffffffa02adab1 0000000000000000 ffffffffa02ad993
       00000101219c1e08 00000000fffffffc 00000101219c1df8 ffffffff8014c987
       ffffffffffffffff 00000101219c1df8
Call Trace:<ffffffffa02adab1>{:cifs:cifs_oplock_thread+286} <ffffffffa02ad993>{:cifs:cifs_oplock_thread+0}
       <ffffffff8014c987>{kthread+200} <ffffffff80110fd7>{child_rip+8}
       <ffffffff8014c9b0>{keventd_create_kthread+0} <ffffffff8014c8bf>{kthread+0}
       <ffffffff80110fcf>{child_rip+0}
Code: 48 8b 40 28 48 85 c0 74 05 48 89 df ff d0 48 8d 7b 38 48 c7
RIP <ffffffff801950c6>{iput+47} RSP <0000010063153ee8>
CR2: 000076f0ffff18a8
----------------------[snip]-------------------------

The "Busy inodes after unmount" message means that the superblock was freed while there was still an active inode reference. I'm going to go out on a limb and assume that the active inode reference is due to a queued oplock callback. The crash occurred when we tried to put the inode reference after attempting to write back the inode: iput was called, and that referenced the inode's superblock. The superblock had been freed, however, and the memory was likely recycled.

Note that this crash is from an earlier kernel with an earlier version of this set, but I don't think the latest set would have prevented this panic either. I think the right solution is probably to have the oplock break callbacks also take a vfsmount reference. That should prevent the superblock from being freed until after its work completes. I'll need to respin the upstream patchset with these changes too, so it may be a little while before I have anything new for them to test.
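A rough kernel-style sketch of that idea, assuming a queue-entry structure that can carry the mount reference (oplock_q_entry and its fields, and both function names, are hypothetical simplifications, not the actual cifs code; only mntget/mntput are real kernel APIs):

```c
/* Hypothetical sketch: pin the vfsmount for the life of the queued
 * oplock break so the superblock cannot be torn down underneath it. */
struct oplock_q_entry {
        struct list_head qhead;
        struct inode *pinode;
        struct vfsmount *mnt;   /* new: keeps the superblock alive */
};

static void queue_oplock_break(struct oplock_q_entry *e,
                               struct vfsmount *mnt)
{
        e->mnt = mntget(mnt);   /* take the reference before queuing */
        /* ... add e to GlobalOplock_Q under the spinlock ... */
}

static void finish_oplock_break(struct oplock_q_entry *e)
{
        /* ... write back the inode, send the break response ... */
        iput(e->pinode);
        mntput(e->mnt);         /* drop only after the final iput */
}
```

The ordering matters: mntput must come after iput, since iput is exactly the call that dereferenced the freed superblock in the Oops above.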
Adding Steve French (upstream CIFS maintainer) to this BZ... Steve, I'll probably need to respin the patchset or add a new patch on top of it to have the oplock break callbacks take a vfsmount reference.
I've got some kernels on my people.redhat.com page with the latest backport of a patchset. If my understanding of the crash described in comment #30 is correct, then this should fix that as well. http://people.redhat.com/jlayton Please have them test these someplace non-critical if they are able and let me know how they do.
@GSS We'll need confirmation that the customer will test the fix, as the bug is not reproducible outside their environment. Thanks.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Committed in 89.14.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Hi, what was the customer's test result? QE inside RH only did a code review. Can someone provide feedback from the customer?
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0263.html