Escalated to Bugzilla from IssueTracker
Description of problem:

System panic with the below panic message:

------------[ cut here ]------------
kernel BUG at fs/nfs/nfs4state.c:135!
invalid operand: 0000 [#1]
SMP
Modules linked in: loop netconsole netdump sg ppp_async ppp_generic slhc crc_ccitt mptctl mptbase ipmi_devintf ipmi_si ipmi_msghandler dell_rbu nfs lockd nfs_acl autofs4 sunrpc ipt_limit ipt_state ip_conntrack iptable_filter ip_tables dm_mirror dm_mod button battery ac ohci_hcd e1000 tg3 floppy ext3 jbd aacraid sd_mod scsi_mod
CPU:    1
EIP:    0060:[<f8b46ea4>]    Not tainted VLI
EFLAGS: 00010202   (2.6.9-67.ELsmp)
EIP is at nfs4_free_client+0x35/0x69 [nfs]
eax: c305ee40   ebx: c305ee00   ecx: c305ee48   edx: c1000000
esi: c3191600   edi: f8b5c860   ebp: cc66d000   esp: cc66df34
ds: 007b   es: 007b   ss: 0068
Process umount (pid: 15237, threadinfo=cc66d000 task=e5781630)
Stack: c3191600 f8b46ce8 c3191c00 f8b2faab c3191c40 c3191c00 c016103c 00000000
       bff3ebf0 08a2f11d c0175392 d039c17c f5d9aa00 c01640ba 00000202 00000000
       00000001 00000001 00000000 c01512da f6f470c4 e822fd84 c015160a b7ce8000
Call Trace:
 [<f8b46ce8>] destroy_nfsv4_state+0x2b/0x37 [nfs]
 [<f8b2faab>] nfs4_kill_super+0x3b/0x5c [nfs]
 [<c016103c>] deactivate_super+0x5b/0x70
 [<c0175392>] sys_umount+0x65/0x6c
 [<c01640ba>] sys_stat64+0xf/0x23
 [<c01512da>] unmap_vma_list+0xe/0x17
 [<c015160a>] do_munmap+0x108/0x116
 [<c01753a4>] sys_oldumount+0xb/0xe
 [<c02d8607>] syscall_call+0x7/0xb
Code: c1 74 20 8b 41 04 8b 11 89 42 04 89 10 89 c8 c7 01 00 01 10 00 c7 41 04 00 02 20 00 e8 95 16 60 c7 eb d6 8d 43 40 39 43 40 74 08 <0f> 0b 87 00 a2 eb b4 f8 8b 43 64 85 c0 74 05 e8 0e f6 f6 ff 89

How reproducible:
Random

Steps to Reproduce:
No real steps, but we have a complete vmcore and I have attached the CAS report.

Actual results:
System panics at random.

Expected results:
System should not panic.
Additional info:

Your corefile is ready for you. You may view it at core-i386.gsslab.rdu.redhat.com. Login with kerberos name/password.

$ cd /cores/20080331031722/work
/cores/20080331031722/work$ ./crash

Sosreport attached.

This event sent from IssueTracker by fleite [Support Engineering Group] issue 173813
Found a similar bugzilla:
http://devresources.linux-foundation.org/dev/nfsv4/bugzilla/show_bug.cgi?id=113

One more bugzilla with similar call traces:
https://bugzilla.redhat.com/show_bug.cgi?id=228292#c61

Regards,
Nitin

Issue escalated to Support Engineering Group by: nbansal.
nbansal assigned to issue for Production Support (Pune).
Internal Status set to 'Waiting on SEG'
Status set to: Waiting on Tech
This event sent from IssueTracker by fleite [Support Engineering Group] issue 173813
Note: Looks related to https://bugzilla.redhat.com/show_bug.cgi?id=433249 This event sent from IssueTracker by fleite [Support Engineering Group] issue 173813
Note: looks slightly related https://bugzilla.redhat.com/show_bug.cgi?id=402581 This event sent from IssueTracker by fleite [Support Engineering Group] issue 173813
I'm not very familiar with the NFSv4 structures, so I'm having a hard time navigating the vmcore on crash. I'll escalate this to Engineering to get some extra help. Issue escalated to RHEL 4 Storage by: fleite. Internal Status set to 'Waiting on Engineering' This event sent from IssueTracker by fleite [Support Engineering Group] issue 173813
Ugh... busy inodes after umount problem. We probably won't be able to tell much from the core. Typically, the conditions that cause this situation are long gone by the time the box crashes.

In this case, we crashed because of this:

    BUG_ON(!list_empty(&clp->cl_state_owners));

This list should be empty, but it's not (likely because we have busy inodes). Looking at the core, the nfs4_client address is in %ebx. Some interesting fields:

crash> struct nfs4_client c305ee00
...
  cl_state_owners = {
    next = 0xf7d1e880,
    prev = 0xf7d1e880
  },
...

...which has only one entry on the list:

struct nfs4_state_owner {
  so_list = {
    next = 0xc305ee40,
    prev = 0xc305ee40
  },
  so_client = 0xc305ee00,
  so_id = 0x0,
  so_sema = {
    count = {
      counter = 0x1
    },
    sleepers = 0x0,
    wait = {
      lock = {
        lock = 0x1,
        magic = 0xdead4ead
      },
      task_list = {
        next = 0xf7d1e8a0,
        prev = 0xf7d1e8a0
      }
    }
  },
  so_seqid = 0x500b9,
  so_count = {
    counter = 0x1
  },
  so_cred = 0xf7d1e900,
  so_states = {
    next = 0xf7d56600,
    prev = 0xf7d56600
  },
  so_delegations = {
    next = 0xf7d1e8bc,
    prev = 0xf7d1e8bc
  }
}

...I'll have to work back from here and see if I can track down the inode.
For reference, I got the nfs4_state_owner like this:

crash> list nfs4_state_owner.so_list -s nfs4_state_owner 0xf7d1e880
f7d1e880

...the so_states list is non-empty:

crash> list nfs4_state.open_states -s nfs4_state 0xf7d56600
f7d56600
struct nfs4_state {
...
  inc_open = {
    next = 0xf7054240,
    prev = 0xf7054240
  },
...
  inode = 0xc5dcf94c,
...
}

...the inc_open list also seems to be non-empty, so it looks like an open attempt failed at some point and wasn't properly cleaned up:

crash> list nfs4_inc_open.state -s nfs4_inc_open 0xf7054240
f7054240
struct nfs4_inc_open {
  state = {
    next = 0xf7d56618,
    prev = 0xf7d56618
  },
  task = 0xe5008e30,
  flags = 0x11
}

...the task here seems to be long gone. Some interesting stuff from the inode:

  i_ino = 63361
  i_mode = 100644 (regular file)
  i_sb = 0xc3190a00,

...the superblock here is also gone from the mount list.
Ok, my suspicion is that this is another manifestation of this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=234587

...with that patch I had NFSv4 track incomplete opens and clean them up if a setattr failed (i.e. O_TRUNC opens). Obviously, this doesn't seem to be sufficient. We need to make sure that we clean up these incomplete opens whenever open_namei returns an error.

I think this means we need a fs-specific "open_cleanup" inode operation. For most filesystems, this would be a no-op, but for nfsv4 we'd call this. This does mean that the fix isn't confined to NFSv4, though I think we can make sure that the impact is negligible for other filesystems.

The big question is how to not break kabi with this. If we add a new inode op, then we'll also have to add a flag of some sort to make sure that we don't try to dereference it on any filesystems that don't have the op.
This turns out to be rather tricky, actually... There are a lot of special cases in this codepath and we need to make sure that we hit the right ones.

On a good note, the work already done by Peter Staubach and David Howells gives us an extended inode_ops struct that nfs4 already uses. We should be able to just tack an open_cleanup function onto that struct and add a new SB flag for filesystems that define it (just nfsv4 here). The tricky part is knowing how to call the new operation. There are several reasons that open_namei can return an error, and I'm not sure whether the nameidata will be properly filled out in all cases.
Created attachment 309745 [details]
proposed patch -- handle nfs4 incomplete opens more comprehensively

A possible patch for this problem... This uses some of the infrastructure that Peter S. added to do 64-bit inode support. It does the following:

1) renames the INO64 flags to something more generic (since the new op doesn't have anything to do with 64-bit inodes)
2) adds a new lookup_cleanup extended inode operation, and defines such an operation for nfsv4
3) removes the place in nfs4_proc_setattr that would clean up incomplete opens
4) has open_namei() call this lookup_cleanup operation whenever the path_lookup succeeds, but it eventually returns an error

This seems to work and still fixes the original issue that the inc_open stuff is intended to fix. It probably needs more testing. If the customer is able to reproduce the oops in this case, then having them test this patch would be helpful. I'll plan to add this to my next set of test kernels and will post here when I have them built.
I have some test kernels built with this patch (plus other ones that I have queued up for 4.8): http://people.redhat.com/jlayton/ ...if the customer is able to reproduce this problem and can test these somewhere non-critical, then that would be helpful...
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Updating PM score.
Created attachment 315673 [details]
patch -- handle nfs4 incomplete opens more comprehensively

Respun patch. Since I did the original patch, someone reported bug 457407. That bug demonstrated that the earlier patch still didn't cover enough. In particular, it was possible for an open to happen in the d_revalidate codepath. An incomplete open in that case would not be tracked or cleaned up.

With the new lookup_cleanup operation though, we don't really need to keep track of incomplete opens in such a complicated fashion. Since we're always calling that when open_namei errors out, it's sufficient to just do an extra nfs4_close_state on the nfs4_state and not bother with the extra tracking. This patch implements that. It fixes the panic in bug 457407 and should also fix the problem here.

This patch is in the test kernels on my people page if anyone wishes to test it:

http://people.redhat.com/jlayton/
*** Bug 457407 has been marked as a duplicate of this bug. ***
Hello Jeff-san, Fujitsu confirmed that the patch 27-bz-446396-nfs4-handle-incomple.patch worked! Thanks! Best Regards, M Oshiro Internal Status set to 'Waiting on Engineering' Status set to: Waiting on Tech This event sent from IssueTracker by moshiro issue 191939
Created attachment 317001 [details]
patch -- handle nfs4 incomplete opens more comprehensively

Hopefully final patch. Adds calls to do_lookup_cleanup in other codepaths that do lookups with open intents. Checks for missing extended inode and file ops (since I'm using the same flag). Other than that, everything should pretty much be the same...
Created attachment 317484 [details]
updated patch -- handle nfs4 incomplete opens more comprehensively

Updated patch, some small changes and extra sanity checks for the new undo op.
Committed in 78.28.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/
*** Bug 493709 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html