+++ This bug was initially created as a clone of Bug #432867 +++

While working on bug 254195 yesterday, I hit the following on a RHEL5 debug kernel:

Feb 13 15:14:26 dhcp231-179 kernel: Slab corruption: (Tainted: GF ) start=ffff81000eebf2b0, len=512
Feb 13 15:14:26 dhcp231-179 kernel: Redzone: 0x5a2cf071/0x5a2cf071.
Feb 13 15:14:26 dhcp231-179 kernel: Last user: [<ffffffff8830cca4>](rpc_destroy_client+0xc6/0xcc [sunrpc])
Feb 13 15:14:26 dhcp231-179 kernel:
Feb 13 15:14:26 dhcp231-179 kernel: Call Trace:
Feb 13 15:14:26 dhcp231-179 kernel: <IRQ> [<ffffffff80007137>] check_poison_obj+0x76/0x1d1
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff880e1bc4>] :8139too:rtl8139_poll+0x1f6/0x424
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff880e1bc4>] :8139too:rtl8139_poll+0x1f6/0x424
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff8000cae2>] cache_alloc_debugcheck_after+0x30/0x1c1
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff80011374>] __kmalloc_track_caller+0x11b/0x127
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff8002fbbe>] __alloc_skb+0x61/0x128
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff880e1a13>] :8139too:rtl8139_poll+0x45/0x424
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff880e1bc4>] :8139too:rtl8139_poll+0x1f6/0x424
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff8000cdcc>] net_rx_action+0xa9/0x1bb
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff80012983>] __do_softirq+0x67/0xf3
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff8006839f>] _spin_unlock+0x17/0x20
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff800613d0>] call_softirq+0x1c/0x28
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff800704dc>] do_softirq+0x35/0xa0
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff80070642>] do_IRQ+0xfb/0x104
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff8006ecef>] default_idle+0x0/0x69
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff80060652>] ret_from_intr+0x0/0xf
Feb 13 15:14:26 dhcp231-179 kernel: <EOI> [<ffffffff8006ed2b>] default_idle+0x3c/0x69
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff8006ed29>] default_idle+0x3a/0x69
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff8004b278>] cpu_idle+0x9a/0xbd
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff80446824>] start_kernel+0x243/0x248
Feb 13 15:14:26 dhcp231-179 kernel: [<ffffffff8044622f>] _sinittext+0x22f/0x236
Feb 13 15:14:26 dhcp231-179 kernel:
Feb 13 15:14:26 dhcp231-179 kernel: 130: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 00 00 6b 6b
Feb 13 15:14:26 dhcp231-179 kernel: Prev obj: start=ffff81000eebf098, len=512
Feb 13 15:14:26 dhcp231-179 kernel: Redzone: 0x170fc2a5/0x170fc2a5.
Feb 13 15:14:26 dhcp231-179 kernel: Last user: [<ffffffff883f1905>](nlm_lookup_file+0x115/0x210 [lockd])
Feb 13 15:14:26 dhcp231-179 kernel: 000: 00 00 00 00 00 00 00 00 10 00 01 00 01 01 00 00
Feb 13 15:14:26 dhcp231-179 kernel: 010: 00 00 dd cb 1a 00 c2 4c 12 9e 00 00 00 00 00 00
Feb 13 15:14:26 dhcp231-179 kernel: Next obj: start=ffff81000eebf4c8, len=512
Feb 13 15:14:26 dhcp231-179 kernel: Redzone: 0x5a2cf071/0x5a2cf071.
Feb 13 15:14:26 dhcp231-179 kernel: Last user: [<ffffffff80040e85>](do_execve+0x233/0x243)
Feb 13 15:14:26 dhcp231-179 kernel: 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Feb 13 15:14:26 dhcp231-179 kernel: 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

It's hard to confirm since this is memory corruption, but I suspect the problem is this: ...the patch that I have for bug 254195 calls rpc_shutdown_client on the NLM rpc_client when taking down lockd. If, however, a portmap call is still pending, that call is *not* taken down. When the pmap call completes, it can end up setting cl_port on the already-freed clnt. If that memory has already been reallocated, then this is a memory corruptor.

There were some patches that went upstream to fix this by making the portmap call take a reference to the xprt (and maybe also moving the cl_port field into that structure). That should fix this problem, so I'll plan to dig those up, and also to see if we can get a consistent reproducer somehow.
-- Additional comment from jlayton on 2008-02-14 15:42 EST --

Hmmm...except that those patches look to be fairly invasive and probably not kabi-friendly. We may have to fix this by doing something different than how upstream did it...

-- Additional comment from jlayton on 2008-02-15 09:57 EST --

A quick check indicates that RHEL4 has the same problem. Fixing this isn't going to be trivial. The best method I can see is this: when we spawn a new portmap call, we allocate a small memory region with a refcounter (maybe even a kref) and a field for the port number. The main RPC task and the portmap task each hold a reference. The portmap task releases its reference when it gets the port (or an error), and the main task releases its reference when the portmap call completes (or the task is torn down).

-- Additional comment from jlayton on 2008-04-04 14:56 EST --

Created an attachment (id=301324)
patch -- add new reference counted response container for portmap calls

First pass at a patch to fix this problem. It seems to work, but I'm not sure I'm cleaning up everything correctly if the child RPC task exits abnormally (times out, etc).

The patch adds a new struct that's managed with a kref. Both the parent and child RPC clients will hold a reference to it. The intent is for the child to drop its reference after it copies the port number from the RPC response into the container. The parent will drop its reference when it copies the result from the container into the cl_port field. We'll also need to drop the reference if the tasks are killed. It's this last part that I'm not sure I have correct, but I think I'm pretty close.

I've also done my best to preserve kabi, but it may need more massaging to prevent breakage.

-- Additional comment from jlayton on 2008-05-22 07:26 EST --

Created an attachment (id=306361)
SUNRPC: prevent memory corruption from successful portmap call with dead parent task

Updated patch.
I believe this one gets the refcounting right when the task is killed.

-- Additional comment from jlayton on 2008-05-22 11:33 EST --

Created an attachment (id=306393)
patch -- SUNRPC: prevent memory corruption from successful portmap call with dead parent task

Oof -- the last patch was wrongwrongwrong... This patch does the right thing. It's also smaller.

-- Additional comment from jlayton on 2008-05-22 12:35 EST --

Actually, the original patch was right. We have to do it this way (hook the put in rpc_destroy_client), so that the child task does a put regardless of how the RPC task exits.

-- Additional comment from jlayton on 2008-05-23 06:38 EST --

Created an attachment (id=306475)
SUNRPC: prevent memory corruption from successful portmap call with dead parent task

Small revision. Since there can be several portmap calls in flight at the same time, it's possible for cl_port to be non-zero when pmap_getport_done runs. If one portmap call succeeds and a later one fails, we don't want to clobber the port we got from the earlier call, so check that cl_port is zero before copying the port from the container.

-- Additional comment from jlayton on 2008-05-23 06:59 EST --

Created an attachment (id=306479)
small test program to set a blocking lock on a file

To reproduce the memory corruption:

1) Set up an NFS server on a machine booted into a kernel with memory poisoning enabled. The kernel-debug packages have this enabled already. It will also be easier to reproduce this if the server's kernel has the patches for bug 254195, since that helps guarantee that the parent RPC task is killed when NFS is stopped. After starting the nfs service, wait 90s to make sure you're past the server's grace period.

2) Mount an export from this server on another host (any RHEL client will do).
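The reference-counted response container described in the comments above can be modeled in a few lines of userspace C. This is a sketch of the idea only, not the actual RHEL patch: the names (pmap_result, pmap_result_get/put, pmap_getport_done, parent_copy_port) are hypothetical, and a plain int stands in for the kernel's kref. The point is that the portmap child writes the port into a separately allocated, refcounted container rather than directly into the parent's rpc_clnt, so a late-completing portmap call can never write into freed memory:

```c
#include <stdlib.h>

/* Hypothetical model of the refcounted portmap response container. */
struct pmap_result {
	int refcount;		/* stands in for a kernel kref */
	unsigned short port;	/* port returned by the portmap call */
};

static struct pmap_result *pmap_result_alloc(void)
{
	struct pmap_result *r = calloc(1, sizeof(*r));
	if (r)
		r->refcount = 1;	/* the parent's reference */
	return r;
}

static void pmap_result_get(struct pmap_result *r)
{
	r->refcount++;			/* child task takes its reference */
}

static void pmap_result_put(struct pmap_result *r)
{
	if (--r->refcount == 0)
		free(r);		/* last user frees the container */
}

/* Child (portmap) side: store the result in the container, then drop
 * the child's reference. This put must run no matter how the child
 * task exits, which is why the real patch hooks it into teardown. */
static void pmap_getport_done(struct pmap_result *r, unsigned short port)
{
	r->port = port;
	pmap_result_put(r);
}

/* Parent side: copy the port out only if cl_port is still zero, so a
 * later failed call can't clobber a port from an earlier success
 * (mirrors the "check cl_port is zero" revision above). */
static void parent_copy_port(unsigned short *cl_port, struct pmap_result *r)
{
	if (*cl_port == 0)
		*cl_port = r->port;
	pmap_result_put(r);		/* drop the parent's reference */
}
```

If the parent dies before the portmap call completes, it simply does its put early; the child's later pmap_getport_done then writes into the still-live container and frees it, touching nothing that belonged to the parent.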
3) Use the test program to set a lock on a file on the server:

    server# lockit /export/testfile
    hit return to exit

4) Do a blocking lock of the same file on the client (make sure you're past the server's lock grace period):

    client# lockit /mnt/nfs/testfile

(You'll get no output, since the program is blocked waiting on the lock.)

5) On the client, set up iptables rules to drop all traffic to and from the server, to simulate a network partition:

    client# iptables -A INPUT -s <server addr> -j DROP
    client# iptables -A OUTPUT -d <server addr> -j DROP

6) Release the lock on the server (hit return), and then bring down NFS and kill off lockd on the server:

    server# service nfs stop

7) Unpartition the network:

    client# service iptables stop

The last two steps need to be timed right, so that the portmap calls are still being reattempted while the parent RPC callback is dead. When you remove the network partition, the portmap call succeeds and the port is copied into the parent RPC task. Since the parent RPC task should be dead at this point, however, the result is copied into an rpc_clnt struct that has already been freed. The results of this are not always evident right away; the corruption will not be noticed until the memory is reused for other purposes.
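The "lockit" attachment itself is not included in this report, so the following is only a sketch of what such a helper presumably does: take a blocking, whole-file write lock via fcntl(F_SETLKW). On an NFS mount this is what generates the NLM blocking-lock traffic the reproducer relies on. The function name and error handling are assumptions, not the attached program:

```c
#include <fcntl.h>
#include <unistd.h>

/* Take a blocking, exclusive, whole-file lock on path. Returns the
 * open fd holding the lock, or -1 on error. The lock is held until
 * the fd is closed or the process exits. */
static int lockit(const char *path)
{
	struct flock fl = {
		.l_type   = F_WRLCK,	/* exclusive (write) lock */
		.l_whence = SEEK_SET,
		.l_start  = 0,
		.l_len    = 0,		/* len 0 == lock the whole file */
	};
	int fd = open(path, O_RDWR | O_CREAT, 0644);

	if (fd < 0)
		return -1;

	/* F_SETLKW blocks until the lock is granted or a signal
	 * arrives -- this is the call the client sits in at step 4 */
	if (fcntl(fd, F_SETLKW, &fl) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The attached program presumably wraps this in a main() that prints "hit return to exit" and waits on stdin, so the lock is held until the user hits return.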
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 312371 [details] patch -- prevent mem corruption due to portmap call succeeding after parent rpc_clnt has been freed
Committed in 78.3.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/
Updating PM score.
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: The RPC client will store the result of a portmap call in a place in memory that can be freed and reallocated in the right circumstances.
Consequence: memory corruption, random crashes
Fix: add a patch to add reference counting to the memory location where the portmap result is stored and only free it when it will no longer be used
Result: prevents memory corruption
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,4 +1 @@
-Cause: The RPC client will store the result of a portmap call in a place in memory that can be freed and reallocated in the right circumstances.
+The RPC client stores the result of a portmap call at a place in memory that can be freed and reallocated under the right circumstances. However, under some circumstances, the result of the portmap call was freed from memory too early, which may have resulted in memory corruption. With this update, reference counting has been added to the memory location where the portmap result is stored, and will only free it after it has been used.
-Consequence: memory corruption, random crashes
-Fix: add a patch to add reference counting to the memory location where the portmap result is stored and only free it when it will no longer be used
-Result: prevents memory corruption
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html