Bug 1226052 - NFS: kernel crashed with "kernel BUG at fs/nfs/pagelist.c:387!", "NFS: Invalid unlock attempted" and "nfs_unlock_request"
Summary: NFS: kernel crashed with "kernel BUG at fs/nfs/pagelist.c:387!", "NFS: Invali...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.1
Hardware: All
OS: Linux
medium
medium
Target Milestone: rc
: ---
Assignee: nfs-maint
QA Contact: Yongcheng Yang
URL:
Whiteboard:
Depends On:
Blocks: 1298243 1400499 1426820
TreeView+ depends on / blocked
 
Reported: 2015-05-28 20:58 UTC by Dave Wysochanski
Modified: 2023-08-14 12:49 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1426820 (view as bug list)
Environment:
Last Closed: 2020-12-15 07:34:32 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1358574 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Bugzilla 1400499 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Knowledge Base (Solution) 1455633 0 None None None Never

Internal Links: 1358574 1400499 1831116

Comment 2 Dave Wysochanski 2015-05-28 21:09:10 UTC
#### Detailed analysis from vmcore

* Note the system which crashed due to a kernel assertion / BUG, and this is untainted vmcore.  So unless this is something unusual such as a hardware issue it is likely a kernel bug.
~~~
crash> sys
        CPUS: 48
        DATE: Tue May  5 16:47:54 2015
      UPTIME: 18 days, 18:11:59
LOAD AVERAGE: 4.89, 3.56, 3.11
       TASKS: 1044
    NODENAME: foo.bar.com
     RELEASE: 3.10.0-229.el7.x86_64
     VERSION: #1 SMP Thu Jan 29 18:37:38 EST 2015
     MACHINE: x86_64  (2599 Mhz)
      MEMORY: 127.9 GB
       PANIC: "kernel BUG at fs/nfs/pagelist.c:387!"
crash> mod -t
no tainted modules
crash> sys -t
TAINTED_MASK: 0  
~~~

* Map the crash to code
~~~

[1621342.293360] NFS: Invalid unlock attempted
[1621342.293410] ------------[ cut here ]------------
[1621342.293458] kernel BUG at fs/nfs/pagelist.c:387!
...
[1621342.294182] CPU: 3 PID: 820 Comm: kworker/3:1 Not tainted 3.10.0-229.el7.x86_64 #1
[1621342.294240] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 1.0.4 08/28/2014
[1621342.294316] Workqueue: nfsiod rpc_async_release [sunrpc]
[1621342.294361] task: ffff882024660b60 ti: ffff882024690000 task.ti: ffff882024690000
[1621342.294418] RIP: 0010:[<ffffffffa05de7d7>]  [<ffffffffa05de7d7>] nfs_unlock_request+0x47/0x50 [nfs]
...
[1621342.295325] Call Trace:
[1621342.295367]  [<ffffffffa05e28e0>] nfs_write_completion+0x90/0x150 [nfs]
[1621342.295430]  [<ffffffffa05dd62e>] nfs_pgio_release+0x2e/0x40 [nfs]
[1621342.295493]  [<ffffffffa057c87a>] rpc_free_task+0x2a/0x70 [sunrpc]
[1621342.295552]  [<ffffffffa057c8d5>] rpc_async_release+0x15/0x20 [sunrpc]
[1621342.295612]  [<ffffffff8108f0ab>] process_one_work+0x17b/0x470
[1621342.295661]  [<ffffffff8108fe8b>] worker_thread+0x11b/0x400

include/linux/nfs_page.h
95 
96-->#define NFS_WBACK_BUSY(req)        (test_bit(PG_BUSY,&(req)->wb_flags))
97 

fs/nfs/pagelist.c                                                                                                                   
378                                                                                                                                 
379 /**                                                                                                                             
380  * nfs_unlock_request - Unlock request and wake up sleepers.                                                                    
381  * @req:                                                                                                                        
382  */                                                                                                                             
383 void nfs_unlock_request(struct nfs_page *req)                                                                                   
384 {                                                                                                                               
385     if (!NFS_WBACK_BUSY(req)) {                                                                                                 
386             printk(KERN_ERR "NFS: Invalid unlock attempted\n");                                                                 
387-->          BUG();                                                                                                              
388     }                                                                                                                           
389     smp_mb__before_clear_bit();                                                                                                 
390     clear_bit(PG_BUSY, &req->wb_flags);                                                                                         
391     smp_mb__after_clear_bit();
392     wake_up_bit(&req->wb_flags, PG_BUSY);
393 }
~~~

Now obtain the nfs_page * from the stack.
~~~
crash> dis -r ffffffffa05e28e0 | tail
0xffffffffa05e28c1 <nfs_write_completion+113>:  mov    0x10(%r12),%rbx
0xffffffffa05e28c6 <nfs_write_completion+118>:  cmp    %rbx,%r14
0xffffffffa05e28c9 <nfs_write_completion+121>:  je     0xffffffffa05e2881
0xffffffffa05e28cb <nfs_write_completion+123>:  xor    %r13d,%r13d
0xffffffffa05e28ce <nfs_write_completion+126>:  jmp    0xffffffffa05e28fa
0xffffffffa05e28d0 <nfs_write_completion+128>:  mov    %rbx,%rdi
0xffffffffa05e28d3 <nfs_write_completion+131>:  callq  0xffffffffa05e0eb0 <nfs_inode_remove_request>
0xffffffffa05e28d8 <nfs_write_completion+136>:  mov    %rbx,%rdi
0xffffffffa05e28db <nfs_write_completion+139>:  callq  0xffffffffa05de790 <nfs_unlock_request>
0xffffffffa05e28e0 <nfs_write_completion+144>:  mov    %rbx,%rdi
crash> dis nfs_unlock_request | head
0xffffffffa05de790 <nfs_unlock_request>:        nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffa05de795 <nfs_unlock_request+5>:      push   %rbp
0xffffffffa05de796 <nfs_unlock_request+6>:      mov    %rsp,%rbp
0xffffffffa05de799 <nfs_unlock_request+9>:      push   %r12
0xffffffffa05de79b <nfs_unlock_request+11>:     push   %rbx
0xffffffffa05de79c <nfs_unlock_request+12>:     mov    0x40(%rdi),%rax
0xffffffffa05de7a0 <nfs_unlock_request+16>:     mov    %rdi,%rbx
0xffffffffa05de7a3 <nfs_unlock_request+19>:     test   $0x1,%al
0xffffffffa05de7a5 <nfs_unlock_request+21>:     je     0xffffffffa05de7c9
0xffffffffa05de7a7 <nfs_unlock_request+23>:     callq  0xffffffff810a0490 <__smp_mb__before_atomic>
crash> bt -f | grep nfs_write_completion
 #7 [ffff882024693d78] nfs_write_completion at ffffffffa05e28e0 [nfs]
crash> px (0xffff882024693d78 - 3*8)
$2 = 0xffff882024693d60
crash> rd 0xffff882024693d60
ffff882024693d60:  ffff8810264baa80                    ..K&....
crash> print nfs_unlock_request
$3 = {void (struct nfs_page *)} 0xffffffffa05de790
crash> kmem ffff8810264baa80
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
ffff88103f007900 kmalloc-128              128      12564     17600    275     8k
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffea0040992e80  ffff8810264ba000     0     64         62     2
  FREE / [ALLOCATED]
   ffff8810264baa80


crash> p sizeof(struct nfs_page)
$4 = 96
crash> nfs_page ffff8810264baa80
struct nfs_page {
  wb_list = {
    next = 0xffff8810264ba880, 
    prev = 0xffff8810264baa80
  }, 
  wb_page = 0x0, 
  wb_context = 0x0, 
  wb_lock_context = 0x0, 
  wb_index = 465, 
  wb_offset = 0, 
  wb_pgbase = 0, 
  wb_bytes = 313, 
  wb_kref = {
    refcount = {
      counter = 0
    }
  }, 
  wb_flags = 0, 
  wb_verf = {
    data = "\000\000\000\000\000\000\000"
  }, 
  wb_this_page = 0xffff8810264baa80, 
  wb_head = 0xffff8810264baa80
}
~~~

The nfs_page * looks like it has been freed, and clearly wb_flags = 0 hence the BUG (the PG_BUSY bit is not set, nor is any other bit).

Comment 22 RHEL Program Management 2020-12-15 07:34:32 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.


Note You need to log in before you can comment on or make changes to this bug.