Bug 1226052

Summary: NFS: kernel crashed with "kernel BUG at fs/nfs/pagelist.c:387!", "NFS: Invalid unlock attempted" and "nfs_unlock_request"
Product: Red Hat Enterprise Linux 7 Reporter: Dave Wysochanski <dwysocha>
Component: kernelAssignee: nfs-maint
kernel sub component: NFS QA Contact: Yongcheng Yang <yoyang>
Status: CLOSED WONTFIX Docs Contact:
Severity: medium    
Priority: medium CC: bcodding, eguan, jbastian, jcm, jiyin, rbergant, xzhou, yoyang
Version: 7.1Keywords: Reopened
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1426820 (view as bug list) Environment:
Last Closed: 2020-12-15 07:34:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1298243, 1400499, 1426820    

Comment 2 Dave Wysochanski 2015-05-28 21:09:10 UTC
#### Detailed analysis from vmcore

* Note the system which crashed due to a kernel assertion / BUG, and this is untainted vmcore.  So unless this is something unusual such as a hardware issue it is likely a kernel bug.
~~~
crash> sys
        CPUS: 48
        DATE: Tue May  5 16:47:54 2015
      UPTIME: 18 days, 18:11:59
LOAD AVERAGE: 4.89, 3.56, 3.11
       TASKS: 1044
    NODENAME: foo.bar.com
     RELEASE: 3.10.0-229.el7.x86_64
     VERSION: #1 SMP Thu Jan 29 18:37:38 EST 2015
     MACHINE: x86_64  (2599 Mhz)
      MEMORY: 127.9 GB
       PANIC: "kernel BUG at fs/nfs/pagelist.c:387!"
crash> mod -t
no tainted modules
crash> sys -t
TAINTED_MASK: 0  
~~~

* Map the crash to code
~~~

[1621342.293360] NFS: Invalid unlock attempted
[1621342.293410] ------------[ cut here ]------------
[1621342.293458] kernel BUG at fs/nfs/pagelist.c:387!
...
[1621342.294182] CPU: 3 PID: 820 Comm: kworker/3:1 Not tainted 3.10.0-229.el7.x86_64 #1
[1621342.294240] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 1.0.4 08/28/2014
[1621342.294316] Workqueue: nfsiod rpc_async_release [sunrpc]
[1621342.294361] task: ffff882024660b60 ti: ffff882024690000 task.ti: ffff882024690000
[1621342.294418] RIP: 0010:[<ffffffffa05de7d7>]  [<ffffffffa05de7d7>] nfs_unlock_request+0x47/0x50 [nfs]
...
[1621342.295325] Call Trace:
[1621342.295367]  [<ffffffffa05e28e0>] nfs_write_completion+0x90/0x150 [nfs]
[1621342.295430]  [<ffffffffa05dd62e>] nfs_pgio_release+0x2e/0x40 [nfs]
[1621342.295493]  [<ffffffffa057c87a>] rpc_free_task+0x2a/0x70 [sunrpc]
[1621342.295552]  [<ffffffffa057c8d5>] rpc_async_release+0x15/0x20 [sunrpc]
[1621342.295612]  [<ffffffff8108f0ab>] process_one_work+0x17b/0x470
[1621342.295661]  [<ffffffff8108fe8b>] worker_thread+0x11b/0x400

include/linux/nfs_page.h
95 
96-->#define NFS_WBACK_BUSY(req)        (test_bit(PG_BUSY,&(req)->wb_flags))
97 

fs/nfs/pagelist.c                                                                                                                   
378                                                                                                                                 
379 /**                                                                                                                             
380  * nfs_unlock_request - Unlock request and wake up sleepers.                                                                    
381  * @req:                                                                                                                        
382  */                                                                                                                             
383 void nfs_unlock_request(struct nfs_page *req)                                                                                   
384 {                                                                                                                               
385     if (!NFS_WBACK_BUSY(req)) {                                                                                                 
386             printk(KERN_ERR "NFS: Invalid unlock attempted\n");                                                                 
387-->          BUG();                                                                                                              
388     }                                                                                                                           
389     smp_mb__before_clear_bit();                                                                                                 
390     clear_bit(PG_BUSY, &req->wb_flags);                                                                                         
391     smp_mb__after_clear_bit();
392     wake_up_bit(&req->wb_flags, PG_BUSY);
393 }
~~~

Now obtain the nfs_page * from the stack.
~~~
crash> dis -r ffffffffa05e28e0 | tail
0xffffffffa05e28c1 <nfs_write_completion+113>:  mov    0x10(%r12),%rbx
0xffffffffa05e28c6 <nfs_write_completion+118>:  cmp    %rbx,%r14
0xffffffffa05e28c9 <nfs_write_completion+121>:  je     0xffffffffa05e2881
0xffffffffa05e28cb <nfs_write_completion+123>:  xor    %r13d,%r13d
0xffffffffa05e28ce <nfs_write_completion+126>:  jmp    0xffffffffa05e28fa
0xffffffffa05e28d0 <nfs_write_completion+128>:  mov    %rbx,%rdi
0xffffffffa05e28d3 <nfs_write_completion+131>:  callq  0xffffffffa05e0eb0 <nfs_inode_remove_request>
0xffffffffa05e28d8 <nfs_write_completion+136>:  mov    %rbx,%rdi
0xffffffffa05e28db <nfs_write_completion+139>:  callq  0xffffffffa05de790 <nfs_unlock_request>
0xffffffffa05e28e0 <nfs_write_completion+144>:  mov    %rbx,%rdi
crash> dis nfs_unlock_request | head
0xffffffffa05de790 <nfs_unlock_request>:        nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffa05de795 <nfs_unlock_request+5>:      push   %rbp
0xffffffffa05de796 <nfs_unlock_request+6>:      mov    %rsp,%rbp
0xffffffffa05de799 <nfs_unlock_request+9>:      push   %r12
0xffffffffa05de79b <nfs_unlock_request+11>:     push   %rbx
0xffffffffa05de79c <nfs_unlock_request+12>:     mov    0x40(%rdi),%rax
0xffffffffa05de7a0 <nfs_unlock_request+16>:     mov    %rdi,%rbx
0xffffffffa05de7a3 <nfs_unlock_request+19>:     test   $0x1,%al
0xffffffffa05de7a5 <nfs_unlock_request+21>:     je     0xffffffffa05de7c9
0xffffffffa05de7a7 <nfs_unlock_request+23>:     callq  0xffffffff810a0490 <__smp_mb__before_atomic>
crash> bt -f | grep nfs_write_completion
 #7 [ffff882024693d78] nfs_write_completion at ffffffffa05e28e0 [nfs]
crash> px (0xffff882024693d78 - 3*8)
$2 = 0xffff882024693d60
crash> rd 0xffff882024693d60
ffff882024693d60:  ffff8810264baa80                    ..K&....
crash> print nfs_unlock_request
$3 = {void (struct nfs_page *)} 0xffffffffa05de790
crash> kmem ffff8810264baa80
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
ffff88103f007900 kmalloc-128              128      12564     17600    275     8k
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffea0040992e80  ffff8810264ba000     0     64         62     2
  FREE / [ALLOCATED]
   ffff8810264baa80


crash> p sizeof(struct nfs_page)
$4 = 96
crash> nfs_page ffff8810264baa80
struct nfs_page {
  wb_list = {
    next = 0xffff8810264ba880, 
    prev = 0xffff8810264baa80
  }, 
  wb_page = 0x0, 
  wb_context = 0x0, 
  wb_lock_context = 0x0, 
  wb_index = 465, 
  wb_offset = 0, 
  wb_pgbase = 0, 
  wb_bytes = 313, 
  wb_kref = {
    refcount = {
      counter = 0
    }
  }, 
  wb_flags = 0, 
  wb_verf = {
    data = "\000\000\000\000\000\000\000"
  }, 
  wb_this_page = 0xffff8810264baa80, 
  wb_head = 0xffff8810264baa80
}
~~~

The nfs_page * looks like it has been freed, and clearly wb_flags = 0 hence the BUG (the PG_BUSY bit is not set, nor is any other bit).

Comment 22 RHEL Program Management 2020-12-15 07:34:32 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.