Bug 448076

Summary: memory corruption due to portmap call succeeding after parent rpc_clnt has been freed
Product: Red Hat Enterprise Linux 4 Reporter: Jeff Layton <jlayton>
Component: kernelAssignee: Jeff Layton <jlayton>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: low Docs Contact:
Priority: urgent    
Version: 4.7CC: dhoward, qcai, rlerch, staubach, steved, tao, vgoyal, vmayatsk
Target Milestone: rcKeywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
The RPC client stores the result of a portmap call at a place in memory that can be freed and reallocated under the right circumstances. However, under some circumstances, the result of the portmap call was freed from memory too early, which may have resulted in memory corruption. With this update, reference counting has been added to the memory location where the portmap result is stored, and will only free it after it has been used.
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-05-18 19:07:28 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 432867    
Bug Blocks: 457231, 458752, 461297    
Attachments:
Description Flags
patch -- prevent mem corruption due to portmap call succeeding after parent rpc_clnt has been freed none

Description Jeff Layton 2008-05-23 11:27:29 UTC
+++ This bug was initially created as a clone of Bug #432867 +++

While working on bug 254195 yesterday, I hit the following on a RHEL5 debug kernel:
q

Feb 13 15:14:26 dhcp231-179 kernel: Slab corruption: (Tainted: GF    )
start=ffff81000eebf2b0, len=512
Feb 13 15:14:26 dhcp231-179 kernel: Redzone: 0x5a2cf071/0x5a2cf071.
Feb 13 15:14:26 dhcp231-179 kernel: Last user:
[<ffffffff8830cca4>](rpc_destroy_client+0xc6/0xcc [sunrpc])
Feb 13 15:14:26 dhcp231-179 kernel: 
Feb 13 15:14:26 dhcp231-179 kernel: Call Trace:
Feb 13 15:14:26 dhcp231-179 kernel:  <IRQ>  [<ffffffff80007137>] check_poison_ob
j+0x76/0x1d1
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff880e1bc4>]
:8139too:rtl8139_poll+0x1f6/0x424
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff880e1bc4>]
:8139too:rtl8139_poll+0x1f6/0x424
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff8000cae2>]
cache_alloc_debugcheck_after+0x30/0x1c1
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff80011374>]
__kmalloc_track_caller+0x11b/0x127
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff8002fbbe>] __alloc_skb+0x61/0x128
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff880e1a13>]
:8139too:rtl8139_poll+0x45/0x424
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff880e1bc4>]
:8139too:rtl8139_poll+0x1f6/0x424
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff8000cdcc>] net_rx_action+0xa9/0x1bb
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff80012983>] __do_softirq+0x67/0xf3
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff8006839f>] _spin_unlock+0x17/0x20
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff800613d0>] call_softirq+0x1c/0x28
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff800704dc>] do_softirq+0x35/0xa0
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff80070642>] do_IRQ+0xfb/0x104
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff8006ecef>] default_idle+0x0/0x69
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff80060652>] ret_from_intr+0x0/0xf
Feb 13 15:14:26 dhcp231-179 kernel:  <EOI>  [<ffffffff8006ed2b>]
default_idle+0x3c/0x69
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff8006ed29>] default_idle+0x3a/0x69
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff8004b278>] cpu_idle+0x9a/0xbd
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff80446824>] start_kernel+0x243/0x248
Feb 13 15:14:26 dhcp231-179 kernel:  [<ffffffff8044622f>] _sinittext+0x22f/0x236
Feb 13 15:14:26 dhcp231-179 kernel: 
Feb 13 15:14:26 dhcp231-179 kernel: 130: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 00
00 6b 6b
Feb 13 15:14:26 dhcp231-179 kernel: Prev obj: start=ffff81000eebf098, len=512
Feb 13 15:14:26 dhcp231-179 kernel: Redzone: 0x170fc2a5/0x170fc2a5.
Feb 13 15:14:26 dhcp231-179 kernel: Last user:
[<ffffffff883f1905>](nlm_lookup_file+0x115/0x210 [lockd])
Feb 13 15:14:26 dhcp231-179 kernel: 000: 00 00 00 00 00 00 00 00 10 00 01 00 01
01 00 00
Feb 13 15:14:26 dhcp231-179 kernel: 010: 00 00 dd cb 1a 00 c2 4c 12 9e 00 00 00
00 00 00
Feb 13 15:14:26 dhcp231-179 kernel: Next obj: start=ffff81000eebf4c8, len=512
Feb 13 15:14:26 dhcp231-179 kernel: Redzone: 0x5a2cf071/0x5a2cf071.
Feb 13 15:14:26 dhcp231-179 kernel: Last user:
[<ffffffff80040e85>](do_execve+0x233/0x243)
Feb 13 15:14:26 dhcp231-179 kernel: 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b
Feb 13 15:14:26 dhcp231-179 kernel: 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b

It's hard to confirm since this is memory corruption, but I suspect the problem
is this:

...the patch that I have for bug 254195 calls rpc_shutdown_client on the NLM
rpc_client when taking down lockd. If however, there is a portmap call still
pending then that is *not* taken down. When the pmap call completes, it can end
up setting the cl_port on the already freed clnt. If that memory has already
been reallocated then this is a memory corruptor.

There were some patches that went upstream to fix this by making the portmap
call take a reference to the xprt (and maybe also moving the cl_port field into
that structure). That should fix this problem, so I'll plan to dig those up and
also to see if we can get a consistent reproducer somehow.

-- Additional comment from jlayton on 2008-02-14 15:42 EST --
Hmmm...except that those patches look to be fairly invasive and probably not
kabi-friendly. We may have to fix this by doing something different than how
upstream did it...


-- Additional comment from jlayton on 2008-02-15 09:57 EST --
A quick check indicates that RHEL4 has the same problem. Fixing this isn't going
to be trivial. The best method I can see is that when we spawn a new portmap
call that we alloc a small memory region with a refcounter (maybe a kref even)
and a field for the port number. The main rpc task and the portmap task will
each hold a reference. The portmap task will release the reference when it gets
the port (or an error), and the main task will release the ref when the portmap
call completes (or the task is torn down).


-- Additional comment from jlayton on 2008-04-04 14:56 EST --
Created an attachment (id=301324)
patch -- add new reference counted response container for portmap calls

First pass at a patch to fix this problem. It seems to work, but I'm not sure
I'm cleaning up everything correctly if the child RPC task exits abnormally
(times out, etc).

The patch adds a new struct that's managed with a kref. The both the parent and
child RPC client will have a reference to it. The intent is to drop this
reference after the child task copies the port number into the container from
the RPC response. The parent will drop the reference when it copies the result
from the container into the cl_port field. We'll also need to drop the
reference if the tasks are killed.

It's this last part that I'm not sure I have correct, but I think I'm pretty
close.

I've also done my best to preserve kabi, but it may need more massaging to
prevent breakage.


-- Additional comment from jlayton on 2008-05-22 07:26 EST --
Created an attachment (id=306361)
SUNRPC: prevent memory corruption from successful portmap call with dead parent
task

Updated patch. I believe this one gets the refcounting right when the task is
killed.


-- Additional comment from jlayton on 2008-05-22 11:33 EST --
Created an attachment (id=306393)
patch -- SUNRPC: prevent memory corruption from successful portmap call with
dead parent task

Oof -- the last patch was wrongwrongwrong...

This patch does the right thing. It's also smaller.


-- Additional comment from jlayton on 2008-05-22 12:35 EST --
Actually, the original patch was right. We have to do it this way (hook the put
in rpc_destroy_client), so that the child task does a put regardless of how the
RPC task exits.


-- Additional comment from jlayton on 2008-05-23 06:38 EST --
Created an attachment (id=306475)
SUNRPC: prevent memory corruption from successful portmap call with dead parent
task

Small revision. Since there can be several portmap calls in flight at the same
time, it's possible that the cl_port will be non-zero when pmap_getport_done
runs. If a portmap call succeeds, and then a following one fails we don't want
to clobber the port we got from the earlier call, so check to make sure cl_port
zero before copying the port from the container.


-- Additional comment from jlayton on 2008-05-23 06:59 EST --
Created an attachment (id=306479)
small test program to set a blocking lock on a file

To reproduce the memory corruption:

1) set up an NFS server on a machine booted to a kernel with memory poisoning
enabled. The kernel-debug packages have this enabled already. It'll also be
easier to reproduce this if the server's kernel has the patches for bug 254195,
since that helps guarantee that the parent RPC task is killed when NFS is
stopped. After starting the nfs service, wait 90s to make sure you're past the
server's grace period.

2) mount an export from this server on another host (any RHEL client will do). 


3) use the test program to set a lock on a file on the server:

    server# lockit /export/testfile
    hit return to exit

4) do a blocking lock of the same file on the client (make sure you're past the
server's lock grace period):

    client# lockit /mnt/nfs/testfile

(you'll get no output since the program is blocked waiting on the lock)

5) on the client, set up iptables rules to drop all traffic to and from the
server to simulate a network partition:

    client# iptables -A INPUT -s <server addr> -j DROP 
    client# iptables -A OUTPUT -d <server addr> -j DROP 

6) release the lock on the server (hit return), and then run:

    server# service nfs stop

...on the server to bring down nfs and kill off lockd.

7) unpartition the network:

    client# service iptables stop

...the last 2 steps need to be timed right so that the portmap calls are still
being reattempted, but the parent RPC callback is dead.

When you remove the network parition, the portmap call succeeds and the port is
copied into the parent RPC task. Since the parent RPC task should be dead at
this point however, the result is copied into a rpc_clnt struct that has
already been freed.

The results of this are not always evident right away. The corruption will not
be noticed until the memory is reused for other purposes.

Comment 2 RHEL Program Management 2008-06-05 14:22:32 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Jeff Layton 2008-07-22 17:25:40 UTC
Created attachment 312371 [details]
patch -- prevent mem corruption due to portmap call succeeding after parent rpc_clnt has been freed

Comment 9 Vivek Goyal 2008-07-30 19:36:30 UTC
Committed in 78.3.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 10 RHEL Program Management 2008-09-03 12:53:07 UTC
Updating PM score.

Comment 13 Jeff Layton 2009-01-28 12:32:01 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: The RPC client will store the result of a portmap call in a place in memory that can be freed and reallocated in the right circumstances.
Consequence: memory corruption, random crashes
Fix: add a patch to add reference counting to the memory location where the portmap result is stored and only free it when it will no longer be used
Result: prevents memory corruption

Comment 16 Ryan Lerch 2009-02-24 03:46:41 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,4 +1 @@
-Cause: The RPC client will store the result of a portmap call in a place in memory that can be freed and reallocated in the right circumstances.
+The RPC client stores the result of a portmap call at a place in memory that can be freed and reallocated under the right circumstances. However, under some circumstances, the result of the portmap call was freed from memory too early, which may have resulted in memory corruption. With this update, reference counting has been added to the memory location where the portmap result is stored, and will only free it after it has been used.-Consequence: memory corruption, random crashes
-Fix: add a patch to add reference counting to the memory location where the portmap result is stored and only free it when it will no longer be used
-Result: prevents memory corruption

Comment 20 errata-xmlrpc 2009-05-18 19:07:28 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html