Bug 611938

Summary: [RHEL5u3] System panic at sunrpc xprt_autoclose()
Product: Red Hat Enterprise Linux 5 Reporter: Flavio Leitner <fleitner>
Component: kernel Assignee: Jeff Layton <jlayton>
Status: CLOSED ERRATA QA Contact: Network QE <network-qe>
Severity: high Docs Contact:
Priority: high    
Version: 5.3CC: apatrick, bfields, dtian, dwysocha, jonstanley, kzhang, moshiro, shsch21, steved, tao, trond.myklebust, yugzhang
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-01-13 21:41:12 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Flavio Leitner 2010-07-06 21:52:08 UTC
Description of problem:

Version-Release number of selected component:
Red Hat Enterprise Linux Version Number: RHEL5
Release Number                         : 5.3
Architecture                           : x86_64
Kernel Version                         : 2.6.18-128.2.1

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
[<0000000000000000>] _stext+0x7ffff000/0x1000
PGD 0
Oops: 0010 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 1
Modules linked in: xt_tcpudp iptable_filter ip_tables x_tables ipv6 xfrm_nalgo crypto_api autofs4 hidp l2cap bluetooth nfs lockd fscache nfs_acl vsock(U) vmmemctl(U) pvscsi(U) acpiphp sunrpc dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac lp sg floppy parport_pc i2c_piix4 i2c_core serio_raw ide_cd parport cdrom vmci(U) e1000 shpchp pcspkr vmxnet(U) dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 3035, comm: rpciod/1 Tainted: G      2.6.18-128.2.1.el5 #1
RIP: 0010:[<0000000000000000>]  [<0000000000000000>] _stext+0x7ffff000/0x1000
RSP: 0018:ffff810073d05e18  EFLAGS: 00010246
RAX: ffff81002cbce4c0 RBX: ffff81007e42f400 RCX: ffff81007e42f400
RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffff81007e42f400
RBP: ffff81007e42f6a0 R08: ffff81000000cb00 R09: ffff81007e42f400
R10: ffff8100656d15b0 R11: ffffffff80213e54 R12: ffff81007e42f738
R13: 0000000000000282 R14: ffff81007e42f400 R15: ffffffff883649ca
FS:  0000000000000000(0000) GS:ffff81000a6c27c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000742ee000 CR4: 00000000000006e0
Process rpciod/1 (pid: 3035, threadinfo ffff810073d04000, task ffff81007fdea040)
Stack:  ffffffff883649fe ffff81001c1c6080 ffff81007e42f698 ffff81007c6f9d40
ffffffff8004d1c5 ffff810073d05e80 ffff81007c6f9d40 ffffffff80049a3f
ffff81007b1c9d98 0000000000000282 ffff81007b1c9d88 ffffffff8009d989
Call Trace:
[<ffffffff883649fe>] :sunrpc:xprt_autoclose+0x34/0x40
[<ffffffff8004d1c5>] run_workqueue+0x94/0xe4
[<ffffffff80049a3f>] worker_thread+0x0/0x122
[<ffffffff8009d989>] keventd_create_kthread+0x0/0xc4
[<ffffffff80049b2f>] worker_thread+0xf0/0x122
[<ffffffff8008a4ad>] default_wake_function+0x0/0xe
[<ffffffff8009d989>] keventd_create_kthread+0x0/0xc4
[<ffffffff8009d989>] keventd_create_kthread+0x0/0xc4
[<ffffffff800323b8>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d989>] keventd_create_kthread+0x0/0xc4
[<ffffffff800322ba>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11


crash-4.1.1> dis xprt_autoclose
0xffffffff883649ca <xprt_autoclose>:    push   %r12
0xffffffff883649cc <xprt_autoclose+2>:  push   %rbx
0xffffffff883649cd <xprt_autoclose+3>:  mov    %rdi,%rbx        : RBX: ffff81007e42f400 rpc_xprt
0xffffffff883649d0 <xprt_autoclose+6>:  lea    0x338(%rbx),%r12 : offset of rpc_xprt->transport_lock
0xffffffff883649d7 <xprt_autoclose+13>: sub    $0x8,%rsp
0xffffffff883649db <xprt_autoclose+17>: callq  0xffffffff88364295 <xprt_disconnect>
0xffffffff883649e0 <xprt_autoclose+22>: mov    0x8(%rbx),%rax
0xffffffff883649e4 <xprt_autoclose+26>: mov    %rbx,%rdi
0xffffffff883649e7 <xprt_autoclose+29>: callq  *0x58(%rax)
0xffffffff883649ea <xprt_autoclose+32>: mov    %r12,%rdi
0xffffffff883649ed <xprt_autoclose+35>: callq  0xffffffff80064bd8 <_spin_lock_bh>
0xffffffff883649f2 <xprt_autoclose+40>: mov    0x8(%rbx),%rax
0xffffffff883649f6 <xprt_autoclose+44>: mov    %rbx,%rdi
0xffffffff883649f9 <xprt_autoclose+47>: xor    %esi,%esi    : RAX: ffff81002cbce4c0 : rpc_xprt_ops
0xffffffff883649fb <xprt_autoclose+49>: callq  *0x10(%rax)  : offset of ops->release_xprt
0xffffffff883649fe <xprt_autoclose+52>: pop    %rbx         : panic


482 static void xprt_autoclose(void *args)
483 {
484         struct rpc_xprt *xprt = (struct rpc_xprt *)args;
485
486         xprt_disconnect(xprt);
487         xprt->ops->close(xprt);
488         xprt_release_write(xprt, NULL);  <= here
:
266 static inline void xprt_release_write(struct rpc_xprt *xprt, struct rpc_task *task)
267 {
268         spin_lock_bh(&xprt->transport_lock);
269         xprt->ops->release_xprt(xprt, task); <= here

crash> struct -o rpc_xprt_ops
struct rpc_xprt_ops {
  [0] void (*set_buffer_size)(struct rpc_xprt *, size_t, size_t);
  [8] int (*reserve_xprt)(struct rpc_task *);
 [16] void (*release_xprt)(struct rpc_xprt *, struct rpc_task *); <= here

crash> rd ffff81002cbce4c0 30
ffff81002cbce4c0:  0000000000000000 0c8018ac00000002   ................
ffff81002cbce4d0:  0000000000000000 0000000000000000   ................
                   ^^^^^^^^^^^^^^^^
The rpc_xprt_ops structure is clearly invalid; the release_xprt pointer at offset 0x10 is NULL.

crash> rpc_xprt ffff81007e42f400
struct rpc_xprt {
kref = {
  refcount = {
    counter = 0
  }
},
ops = 0xffff81002cbce4c0,
sock = 0x35fe6,
inet = 0x0,
timeout = {
  to_initval = 0,
  to_maxval = 0,
  to_increment = 4,
  to_retries = 2150564641,
  to_exponential = 255 '<FF>'
},

So the rpc_xprt itself is invalid as well.

PID: 3035   CPU: 1   COMMAND: "rpciod/1" | PID: 2710   CPU: 0   COMMAND: "vsftpd"
#0 [] crash_kexec at ffffffff800aaa4b   |  #0 [] schedule at ffffffff80063036
#1 [] __die at ffffffff8006520f         |  #1 [] rpc_wait_bit_interruptible at ffffffff88367420
#2 [] do_page_fault at ffffffff80066e21 |  #2 [] __wait_on_bit at ffffffff80063ac8
#3 [] error_exit at ffffffff8005dde9    |  #3 [] out_of_line_wait_on_bit at ffffffff80063b62
#4 [] xprt_autoclose at ffffffff883649fe|  #4 [] __rpc_execute at ffffffff883679a1
#5 [] run_workqueue at ffffffff8004d1c5 |  #5 [] rpc_call_sync at ffffffff88362aad
#6 [] worker_thread at ffffffff80049b2f |  #6 [] rpc_ping at ffffffff88362b16
#7 [] kthread at ffffffff800323b8       |  #7 [] rpc_create_client at ffffffff88363665
#8 [] kernel_thread at ffffffff8005dfb1 |  #8 [] nsm_mon_unmon at ffffffff883fa936
                                        |  #9 [] nsm_monitor at ffffffff883faa5c
                                        | #10 [] nlmclnt_proc at ffffffff883f6f9b
                                        | #11 [] do_setlk at ffffffff884150b2
                                        | #12 [] fcntl_setlk at ffffffff80039924
                                        | #13 [] sys_fcntl at ffffffff8002e2a6
                                        | #14 [] system_call at ffffffff8005d116
                                        |
 RBX: ffff81007e42f400 : rpc_xprt       |  ffff81007e42f400 : nlm_rqst
 RAX: ffff81002cbce4c0 : rpc_xprt_ops   |  ffff81002cbce4c0 : nlm_host

The rpc_xprt is invalid, but I think the nlm_rqst and nlm_host at these addresses are valid:

crash> nlm_rqst ffff81007e42f400
struct nlm_rqst {
 a_flags = 0,
 a_host = 0xffff81002cbce4c0,

crash> nlm_host ffff81002cbce4c0
struct nlm_host {
 h_next = 0x0,
:
   sin_addr = {
      s_addr = 209721516 => 0x0c8018ac => 0c.80.18.ac => 12.128.24.172 (byte-swapped: 172.24.128.12)
:
 h_name = "172.24.128.12\000\000\000\000\000\000",
 h_version = 4,
 h_proto = 6,

I think vsftpd is now using the memory at address 0xffff81007e42f400, allocated as the nlm_rqst in nlmclnt_proc() below.

152 nlmclnt_proc(struct inode *inode, int cmd, struct file_lock *fl)
154         struct nlm_host         *host;
155         struct nlm_rqst         *call; <= here
:
169         host = nlmclnt_lookup_host(NFS_ADDR(inode), proto, vers);
170         if (host == NULL)
171                 return -ENOLCK;
172
173         call = nlm_alloc_call(host);  =>  kzalloc()
174         if (call == NULL)
175                 return -ENOMEM;

Comment 1 Flavio Leitner 2010-07-06 21:56:17 UTC
My first thought here is that the rpc_xprt was allocated and then freed, but a queued work item was left behind that would call xprt_autoclose() later. In the meantime, the memory was reallocated for another purpose, so by the time the work item ran, the rpc_xprt appeared corrupted.

I looked over the changelog and didn't see any recent change (-128.el5 or newer) that could be introducing a problem like that.

But I need to read the code carefully before saying anything more concrete.
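
To make that concrete, this is the sequence I suspect (annotated sketch only, pieced together from the traces above, not verified against the vmcore):

/*
 * Suspected interleaving (illustration, not actual source):
 *
 *   NLM / lockd path                      rpciod/1
 *   ----------------                      --------
 *                                         autoclose work queued:
 *                                         queue_work(..., &xprt->task_cleanup)
 *   last reference dropped, rpc_xprt
 *   freed (nothing cancels or waits
 *   for the pending work)
 *   nlmclnt_proc() -> nlm_alloc_call()
 *     kzalloc() hands back the same
 *     memory, now a struct nlm_rqst
 *                                         run_workqueue()
 *                                           xprt_autoclose(xprt)   <- stale pointer
 *                                             xprt->ops now overlays nlm_rqst->a_host
 *                                             ops->release_xprt (offset 0x10) is NULL
 *                                             -> call through NULL, the oops above
 */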

Comment 3 Jeff Layton 2010-07-19 16:31:10 UTC
I basically agree with Flavio's assessment.

What prevents the xprt from being torn down while there's still a task_cleanup job in the workqueue? I don't see anything that does right offhand. The task_cleanup job doesn't hold a reference to the xprt or anything.

There may be some magic with the xprt->state handling that is supposed to prevent this, but it's not clear to me. Steve, Bruce -- any insight?

Comment 4 Jeff Layton 2010-07-19 16:46:37 UTC
The other thing that might be interesting is to know whether the rpc_clnt that points to the rpc_xprt has also been freed.

Comment 5 Flavio Leitner 2010-07-28 15:14:45 UTC
Hi Jeff,

Unfortunately, the stack trace doesn't show the rpc_clnt because the work
item received the struct rpc_xprt pointer directly. So I searched kernel
memory for the rpc_xprt pointer value and subtracted the offset of
rpc_clnt.cl_xprt from each hit. Here is the output:

crash> size -o rpc_clnt.cl_xprt
struct rpc_clnt {
    [0x8] struct rpc_xprt *cl_xprt;
}

      [1]                [2]      [3]
ffff810073d05bb8 ffff810073d05bb0  X
ffff810073d05bd0 ffff810073d05bc8  X
ffff810073d05be8 ffff810073d05be0  X
ffff810073d05d50 ffff810073d05d48  ~
ffff810073d05d70 ffff810073d05d68  X
ffff810073d05d90 ffff810073d05d88  X
ffff810073d05da8 ffff810073d05da0  X
ffff810073d05dc0 ffff810073d05db8  X
ffff810073d05dd8 ffff810073d05dd0  ~
ffff810073d05e10 ffff810073d05e08  ~


Column [1] lists the addresses found to contain the struct rpc_xprt * pointer
value. Column [2] is the address in [1] minus 0x8 (the candidate rpc_clnt
address), and column [3] is my assessment of the contents: X means not a match, ~ means it could be, but is unlikely.
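
In other words, each candidate rpc_clnt address is the hit address minus the offset of cl_xprt. A trivial userspace illustration of that arithmetic (the struct layout here is just a placeholder; only the 0x8 offset matters):

#include <stddef.h>
#include <stdio.h>

struct rpc_xprt;
struct rpc_clnt {
	void *cl_pad;                 /* stand-in for whatever sits at offset 0 */
	struct rpc_xprt *cl_xprt;     /* offset 0x8, matching the output above */
};

int main(void)
{
	unsigned long hit = 0xffff810073d05bb8UL;  /* first hit from column [1] */
	unsigned long candidate = hit - offsetof(struct rpc_clnt, cl_xprt);

	printf("candidate rpc_clnt at %#lx\n", candidate);  /* column [2]: 0xffff810073d05bb0 */
	return 0;
}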

The vmcore is available for you at:
megatron.gsslab.rdu.redhat.com:/cores/20100702001907/work

As far as I can see, there is no valid rpc_clnt still pointing to this rpc_xprt.

fbl

Comment 6 Jeff Layton 2010-07-28 18:31:00 UTC
In looking over this code, I don't see anything that prevents the xprt from being freed before the task_cleanup workqueue job runs. It seems like the xprt_autoclose workqueue job ought to hold a reference to the xprt until it completes.

The XPRT_LOCKED bit is set for the duration of the job, so it's possible that it is supposed to prevent this somehow, but it's not completely clear to me that it does.
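
For illustration, the kind of change I mean would look roughly like this (untested sketch; it assumes the kref-based xprt_get()/xprt_put() helpers and the rpciod workqueue, and the actual queueing call site in the RHEL5 tree may differ):

/* wherever &xprt->task_cleanup gets queued: */
	xprt_get(xprt);                         /* pin the xprt for the pending work */
	if (!queue_work(rpciod_workqueue, &xprt->task_cleanup))
		xprt_put(xprt);                 /* work was already queued, drop the extra ref */

/* ...and drop the reference when the job completes: */
static void xprt_autoclose(void *args)
{
	struct rpc_xprt *xprt = (struct rpc_xprt *)args;

	xprt_disconnect(xprt);
	xprt->ops->close(xprt);
	xprt_release_write(xprt, NULL);
	xprt_put(xprt);                         /* matches the xprt_get() at queue time */
}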

Comment 7 Jeff Layton 2010-07-28 18:49:21 UTC
Bruce pointed out this patch, that might be the culprit:

commit 66af1e558538137080615e7ad6d1f2f80862de01
Author: Trond Myklebust <Trond.Myklebust>
Date:   Tue Nov 6 10:18:36 2007 -0500

    SUNRPC: Fix a race in xs_tcp_state_change()
    
    When scheduling the autoclose RPC call, we want to ensure that we don't
    race against the test_bit() call in xprt_clear_locked().
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust>

....it's hard to be sure though from the core. How reproducible is this problem?

Comment 8 Jeff Layton 2010-07-28 18:50:35 UTC
...and even with that patch, I don't see anything that prevents the xprt from being freed while xprt_autoclose is queued to the workqueue.

Comment 9 Flavio Leitner 2010-07-28 19:15:43 UTC
This is a tricky one, indeed.

Version-Release number of selected component:
 Red Hat Enterprise Linux Version Number: RHEL5
 Release Number                         : 5.3
 Architecture                           : x86_64
 Kernel Version                         : 2.6.18-128.2.1
 Related Package Version                : none
 Related Middleware / Application       : none

Drivers or hardware or architecture dependency:
 none

How reproducible:
 Unknown

Step to Reproduce:
  Unknown

Actual Results:
 The system panicked.

Expected Results:
 The system doesn't panic.

Hardware configuration:

 Model       : PRIMERGY BX920S1
 CPU Info    : Intel(R) Xeon(R) CPU 2.93 GHz
 Memory Info : 2GB

Comment 10 J. Bruce Fields 2010-07-28 19:27:35 UTC
I wonder whether we need something like

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index dcd0132..2a1f664 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1129,6 +1129,7 @@ static void xprt_destroy(struct kref *kref)
 	rpc_destroy_wait_queue(&xprt->sending);
 	rpc_destroy_wait_queue(&xprt->resend);
 	rpc_destroy_wait_queue(&xprt->backlog);
+	cancel_work_sync(&xprt->task_cleanup);
 	/*
 	 * Tear down transport state and free the rpc_xprt
 	 */

(totally untested)?

I don't see what prevents task_cleanup running after xprt destruction in the latest upstream.

Comment 11 Jeff Layton 2010-07-28 19:34:24 UTC
Adding Trond to the cc list to get his opinion...

Trond this is older RHEL-5 code, but it seems upstream is similar in this regard.

The question can be summarized as "what prevents an xprt from being freed before the task_cleanup job can run?". It seems like the task_cleanup job ought to hold an xprt reference, or xs_destroy (or something along those lines) should be canceling the task_cleanup job (as Bruce's patch suggests).

wrt Bruce's patch -- yeah, that seems reasonable at first glance. I was thinking it might be better though in xs_destroy since we cancel the connect_worker there already.

Thoughts?
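
Something like this, roughly (untested sketch, only to show the placement; cancel_work_sync() exists upstream but the RHEL5 tree may need an equivalent, and the rest of the xs_destroy teardown is elided here):

static void xs_destroy(struct rpc_xprt *xprt)
{
	cancel_delayed_work(&xprt->connect_worker);   /* already done today */
	cancel_work_sync(&xprt->task_cleanup);        /* proposed addition */
	flush_scheduled_work();

	/* ... existing teardown: disconnect, close the socket, free the xprt ... */
}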

Comment 12 Jeff Layton 2010-07-28 19:35:52 UTC
Oops, missed trond on the cc list. Trond, the summary is in comment 11 on this bug. Any help would be appreciated.

Comment 13 J. Bruce Fields 2010-07-30 15:46:50 UTC
xs_destroy is socket-specific and task_cleanup is an xprt thing, so I think it should go in the xprt cleanup somewhere, assuming cancelling is the right thing to do.

Comment 15 Jeff Layton 2010-08-05 13:01:45 UTC
Bruce sent a patch to Trond upstream and he has taken it in for 2.6.36:

http://www.spinics.net/lists/netdev/msg137026.html

Comment 16 RHEL Program Management 2010-08-05 13:09:11 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 21 Jarod Wilson 2010-09-10 21:40:12 UTC
in kernel-2.6.18-219.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 28 errata-xmlrpc 2011-01-13 21:41:12 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

Comment 29 Jeff Layton 2011-08-10 11:45:01 UTC
*** Bug 585594 has been marked as a duplicate of this bug. ***

Comment 30 Jeff Layton 2011-09-19 10:41:17 UTC
*** Bug 709505 has been marked as a duplicate of this bug. ***

Comment 31 Jeff Layton 2013-06-07 23:54:35 UTC
*** Bug 915757 has been marked as a duplicate of this bug. ***