Bug 253754
Summary: | use after free in nlm subsystem | |
---|---|---|---
Product: | Red Hat Enterprise Linux 4 | Reporter: | Jeff Layton <jlayton>
Component: | kernel | Assignee: | Jeff Layton <jlayton>
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner>
Severity: | medium | Docs Contact: |
Priority: | high | |
Version: | 4.4 | CC: | dhoward, jburke, jtluka, riek, staubach, steved, vmayatsk
Target Milestone: | --- | Keywords: | Reopened
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
URL: | http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_log.cgi?id=324671 | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2009-05-18 19:19:02 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 430698, 461297 | |
Attachments: | | |
Description
Jeff Layton
2007-08-21 18:14:17 UTC
Ok, I think I see what could be the issue here. From lockd():

    /* Exit the RPC thread */
    svc_exit_thread(rqstp);

    /* release rpciod */
    rpciod_down();

...svc_exit_thread does this:

    /* Release the server */
    if (serv)
        svc_destroy(serv);

which eventually kfree's serv. So this codepath seems to be pretty much always using memory after free.

This might be reproducible with slab poisoning turned on, presuming you can hit it so that there's an outstanding lock call that needs to be cleaned up when rpciod_down is called.

It looks like a recent upstream patch removed the rpciod_up/down calls from lockd, but I don't think that will work for RHEL4.

Created attachment 172425 [details]
untested proposed patch
I think a patch like this would probably fix this particular issue (though it's
untested so far, even for compilation). It just separates the freeing of
"serv" from everything else and delays freeing it until after rpciod_down
returns.
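For illustration only, here is roughly the shape such a reordering might take in lockd()'s exit path. This is a sketch against the 2.6.9-era functions named above, not the attached patch; the helper name lockd_exit_path() and the assumption that svc_exit_thread() can be made to skip the final svc_destroy() are mine.

    /* Sketch only -- not the attached patch.  Assumes svc_exit_thread() is
     * (hypothetically) split so that detaching the thread and destroying the
     * svc_serv become separate steps. */
    #include <linux/sunrpc/svc.h>
    #include <linux/sunrpc/sched.h>

    static void lockd_exit_path(struct svc_rqst *rqstp)	/* hypothetical name */
    {
    	struct svc_serv *serv = rqstp->rq_server;

    	/* Exit the RPC thread, but do NOT free "serv" yet: callbacks that
    	 * nlmsvc_async_call() queued to rpciod may still dereference it. */
    	svc_exit_thread(rqstp);		/* assumed: no longer calls svc_destroy() */

    	/* release rpciod */
    	rpciod_down();

    	/* Only now tear down the svc_serv.  (As noted later in this bug,
    	 * rpciod_down() does not actually guarantee that the queued callback
    	 * has run, so this ordering alone is not a complete fix.) */
    	svc_destroy(serv);
    }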
The problem is going to be reproducing and testing this, since this race looks
like it will be tricky to hit. The race is (I'm pretty sure) something like
this:
    lockd
      nlmsvc_retry_blocked
        nlmsvc_grant_blocked
          nlmsvc_async_call, which queues up nlmsvc_granted_callback for rpciod
...then before that call actually gets serviced by rpciod, you'll need lockd to
begin exiting and call rpciod_down. It's probably reproducible but may take a
while.
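To make the class of bug concrete, here is a minimal userspace model of the same pattern -- purely illustrative, not kernel code, and all names are invented. A work item queued to a worker thread keeps a raw pointer to an object that the submitting thread frees before the worker runs; built with `gcc -pthread -fsanitize=address`, the late access is flagged, much as CONFIG_DEBUG_SLAB catches the kernel version below.

    /* Userspace model of the lockd/rpciod race -- illustrative only.
     * "serv" stands in for the svc_serv, the worker thread for rpciod,
     * and the queued function for the grant callback. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    struct serv {
    	int magic;			/* stands in for svc_serv contents */
    };

    static struct serv *queued_arg;		/* "work" queued for the worker */

    static void *worker(void *arg)
    {
    	(void)arg;
    	/* Simulate rpciod getting to the callback late. */
    	sleep(1);
    	/* Use-after-free: the submitter has already freed the object. */
    	printf("callback sees magic = %d\n", queued_arg->magic);
    	return NULL;
    }

    int main(void)
    {
    	pthread_t t;
    	struct serv *serv = malloc(sizeof(*serv));

    	serv->magic = 42;
    	queued_arg = serv;		/* "nlmsvc_async_call": queue work that references serv */
    	pthread_create(&t, NULL, worker, NULL);

    	free(serv);			/* "svc_exit_thread -> svc_destroy": serv freed too early */
    	pthread_join(t, NULL);		/* "rpciod_down": only now do we wait for the worker */
    	return 0;
    }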
...that last comment should be nlmsvc_grant_callback (not granted)...

I'm thinking a reproducer might go something like this:

1) local process locks file on exported filesystem on NFS server
2) nfs client tries to lock same file and blocks waiting for callback from server
3) set up iptables rules to drop traffic between client and server
4) have local process unlock file
5) do service nfs stop on server
6) remove iptables rules to allow communication to continue

The iptables rule would hopefully delay the callback long enough to allow lockd to call svc_exit_thread. We'd also want the server to run a kernel with memory poisoning.

Reproduced using the steps above on a kernel with CONFIG_DEBUG_SLAB (though it actually crashes when you run service nfs stop):

    lockd: couldn't shutdown host module!
    nfsd: last server has exited
    nfsd: unexporting all filesystems
    rpciod: active tasks at shutdown?!
    RPC: error 5 connecting to server 10.11.231.179
    eip: ffffffffa00a7b6c
    ----------- [cut here ] --------- [please bite here ] ---------
    Kernel BUG at spinlock:119
    invalid operand: 0000 [1] SMP
    CPU 0
    Modules linked in: nfsd exportfs lockd nfs_acl md5 ipv6 rpcsec_gss_krb5 auth_rpcgss des sunrpc xennet dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod xenblk sd_mod scsi_mod
    Pid: 1970, comm: rpciod Not tainted 2.6.9-56.EL.bz253754.1xenU
    RIP: e030:[<ffffffff80296687>] <ffffffff80296687>{_spin_lock_bh+51}
    RSP: e02b:ffffff801ad61da8  EFLAGS: 00010296
    RAX: 0000000000000016 RBX: ffffff801b217428 RCX: 00000000000017d2
    RDX: 00000000000017d2 RSI: 0000000000000000 RDI: ffffffff8032e960
    RBP: ffffff801b2173f8 R08: ffffff800275ba08 R09: ffffff801b217428
    R10: 0000000100000000 R11: 0000000000000000 R12: ffffff801b225bb0
    R13: 0000000000000000 R14: 00000000fffffff4 R15: ffffff801d30e048
    FS:  0000002a958a1b00(0000) GS:ffffffff80421680(0000) knlGS:0000000000000000
    CS:  e033 DS: 0000 ES: 0000
    Process rpciod (pid: 1970, threadinfo ffffff801ad60000, task ffffff801b2657f0)
    Stack: ffffff801b225b90 ffffffffa00a7b6c 0000000000000003 ffffff801b225b90
           ffffff801b224330 ffffffffa013d3ec ffffff801b225ba0 ffffff801d30e048
           0000000000000002 ffffff801b2173f8
    Call Trace:
      <ffffffffa00a7b6c>{:sunrpc:svc_wake_up+16}
      <ffffffffa013d3ec>{:lockd:nlmsvc_grant_callback+375}
      <ffffffffa013c100>{:lockd:lockd+0}
      <ffffffffa00a4ed1>{:sunrpc:__rpc_execute+745}
      <ffffffff8012b0b7>{__wake_up+54}
      <ffffffffa013c100>{:lockd:lockd+0}
      <ffffffffa00a4476>{:sunrpc:__rpc_wake_up_task+368}
      <ffffffffa00a4fa3>{:sunrpc:__rpc_schedule+74}
      <ffffffffa00a52bb>{:sunrpc:rpciod_killall+56}
      <ffffffffa00a558f>{:sunrpc:rpciod+601}
      <ffffffff8012da88>{autoremove_wake_function+0}
      <ffffffff80132e6a>{do_exit+3073}
      <ffffffff8012da88>{autoremove_wake_function+0}
      <ffffffff8010e05a>{child_rip+8}
      <ffffffffa013c100>{:lockd:lockd+0}
      <ffffffffa00a5336>{:sunrpc:rpciod+0}
      <ffffffff8010e052>{child_rip+0}
    Code: 0f 0b 87 9a 2a 80 ff ff ff ff 77 00 f0 ff 0b 0f 88 33 03 00
    RIP <ffffffff80296687>{_spin_lock_bh+51} RSP <ffffff801ad61da8>
    <0>Kernel panic - not syncing: Oops

This may also be an upstream bug. The code for this seems to be quite a bit different now, but I'm thinking it's still possible to hit this problem there. I'll plan to test this on something rawhideish soon...

I don't think the test patch in comment #2 is the correct approach. rpciod_down doesn't guarantee that rpciod actually goes down, so it's possible, even with that patch, that nlmsvc_grant_callback would get called after lockd has exited. What's really needed, I think, is a reference counter for the svc_serv struct.
We'd increment the counter before the nlmsvc_async_call and decrement it in nlmsvc_grant_callback after the spinlock is unlocked. lockd could check that before exiting and, if it's non-zero, call rpciod_wake_up(), schedule(), and loop.

I think I might have reproduced this on rawhide as well, but it took me several tries and I'm not certain it's the exact same problem:

    general protection fault: 0000 [1] SMP
    CPU 0
    Modules linked in: nfsd exportfs lockd nfs_acl auth_rpcgss autofs4 rfcomm l2cap bluetooth sunrpc ipv6 dm_multipath video output sbs button dock battery ac floppy rtc_cmos 8139too 8139cp mii sg dm_snapshot dm_zero dm_mirror dm_mod ata_generic ata_piix libata sd_mod scsi_mod ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
    Pid: 1564, comm: rpciod/0 Not tainted 2.6.23-0.130.rc3.git5.killattr.1.fc8debug #1
    RIP: 0010:[<ffffffff8112e852>]  [<ffffffff8112e852>] _raw_spin_trylock+0x2/0x2a
    RSP: 0018:ffff810018a37e38  EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b8b RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6b73
    RBP: 6b6b6b6b6b6b6b73 R08: 0000000000000002 R09: 0000000000000000
    R10: ffffffff881952a1 R11: 0000000000000005 R12: 0000000000000000
    R13: ffff810014cbd248 R14: ffffffff814cdf00 R15: 0000000000000000
    FS:  0000000000000000(0000) GS:ffffffff813ec000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 00002aaab7cc7000 CR3: 000000001f7cc000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process rpciod/0 (pid: 1564, threadinfo ffff810018a36000, task ffff810018a34000)
    Stack: ffffffff8127301e 000000000000ea60 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b73
           ffffffff881952a1 ffff8100140b48f0 ffff8100140b4800 ffff8100140b4800
           ffff8100140b48f0 ffffffff88190f15 ffffffff88190978 0000000000000000
    Call Trace:
      [<ffffffff8127301e>] _spin_lock_bh+0x33/0x58
      [<ffffffff881952a1>] :sunrpc:svc_wake_up+0x2b/0x8b
      [<ffffffff88190f15>] :sunrpc:rpc_async_schedule+0x0/0xc
      [<ffffffff88190978>] :sunrpc:rpc_exit_task+0x33/0x81
      [<ffffffff88190d6b>] :sunrpc:__rpc_execute+0x7d/0x227
      [<ffffffff810466a5>] run_workqueue+0x92/0x15e
      [<ffffffff81047055>] worker_thread+0x0/0xe7
      [<ffffffff81047132>] worker_thread+0xdd/0xe7
      [<ffffffff81049f4a>] autoremove_wake_function+0x0/0x2e
      [<ffffffff81049e08>] kthread+0x47/0x73
      [<ffffffff81272a7b>] trace_hardirqs_on_thunk+0x35/0x37
      [<ffffffff8100cb18>] child_rip+0xa/0x12
      [<ffffffff8100c22c>] restore_args+0x0/0x30
      [<ffffffff81049c91>] kthreadd+0x117/0x13c
      [<ffffffff81049dc1>] kthread+0x0/0x73
      [<ffffffff8100cb0e>] child_rip+0x0/0x12
    Code: 87 07 31 d2 85 c0 0f 9f c2 85 d2 74 18 65 8b 04 25 24 00 00
    RIP  [<ffffffff8112e852>] _raw_spin_trylock+0x2/0x2a RSP <ffff810018a37e38>
    Kernel panic - not syncing: Aiee, killing interrupt handler!

I *think* this is the same issue, but it's very hard to tell. The changes to the rpc layer make the stack less clear, but svc_wake_up isn't called from many places -- mainly only from NLM functions, so I think the problem is the same here.

Created attachment 191861 [details]
test patch 2
This patch adds a reference count to the svc_serv struct, increments it before
doing the async call and decrements it in the completion function. If the count
is non-zero when lockd goes down, it wakes up rpciod and sleeps briefly.
The patch seems to work, but I'm not certain it's the right approach. This
could keep lockd up indefinitely if the call can't be completed for some
reason. A better scheme might be to kill off the async call, but I'm not clear
on whether there's a way to actually do that...
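As an illustration of the reference-count scheme described in the last two comments, here is a sketch. The field name sv_nrpccalls and the nlm_grant_* helpers are hypothetical (not from the attached patch), and the placement of the get/put calls paraphrases the description above rather than quoting the RHEL4 source.

    /* Sketch of the refcount idea -- hypothetical field and helper names. */
    #include <linux/sched.h>
    #include <linux/sunrpc/svc.h>
    #include <linux/sunrpc/sched.h>
    #include <asm/atomic.h>

    /* Hypothetical counter added to struct svc_serv:
     *	atomic_t	sv_nrpccalls;	// outstanding async NLM calls
     */

    /* Taken just before nlmsvc_async_call() queues the callback to rpciod. */
    static inline void nlm_grant_get(struct svc_serv *serv)
    {
    	atomic_inc(&serv->sv_nrpccalls);
    }

    /* Dropped at the tail of nlmsvc_grant_callback(), after the spinlock
     * has been released. */
    static inline void nlm_grant_put(struct svc_serv *serv)
    {
    	atomic_dec(&serv->sv_nrpccalls);
    }

    /* Called from lockd()'s exit path before the svc_serv is torn down:
     * keep poking rpciod and sleeping briefly until the callbacks drain. */
    static void nlm_wait_for_grants(struct svc_serv *serv)
    {
    	while (atomic_read(&serv->sv_nrpccalls)) {
    		rpciod_wake_up();
    		set_current_state(TASK_INTERRUPTIBLE);
    		schedule_timeout(HZ / 10);
    	}
    }

As the comment above notes, this trades the use-after-free for the possibility of lockd waiting indefinitely if a call never completes, which is why it was not considered a clearly correct approach.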
Was also able to crash RHEL4 in a similar fashion with the following steps (though it took me a couple of tries -- not sure why...):

1) lock file locally on server
2) lock file on client (and block)
3) service nfs restart on server
4) release lock on server

Stack trace:

    eip: ffffffffa00a7b6c
    ----------- [cut here ] --------- [please bite here ] ---------
    Kernel BUG at spinlock:119
    invalid operand: 0000 [1] SMP
    CPU 0
    Modules linked in: nfsd exportfs lockd(U) nfs_acl md5 ipv6 rpcsec_gss_krb5 auth_rpcgss des sunrpc xennet dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod xenblk sd_mod scsi_mod
    Pid: 2132, comm: rpciod Not tainted 2.6.9-58.EL.bz253754.2xenU
    RIP: e030:[<ffffffff8029671f>] <ffffffff8029671f>{_spin_lock_bh+51}
    RSP: e02b:ffffff801bd19db8  EFLAGS: 00010292
    RAX: 0000000000000016 RBX: ffffff801d460130 RCX: 0000000000038bc9
    RDX: 0000000000038bc9 RSI: 0000000000000000 RDI: ffffffff8032eae0
    RBP: ffffff801d460100 R08: ffffff801e03aa08 R09: ffffff801d460130
    R10: 0000000100000000 R11: 0000000000000000 R12: ffffff801d086320
    R13: 000000000000000b R14: 000000000000000a R15: ffffff8002a0e2f8
    FS:  0000002a958a1b00(0000) GS:ffffffff80421800(0000) knlGS:0000000000000000
    CS:  e033 DS: 0000 ES: 0000
    Process rpciod (pid: 2132, threadinfo ffffff801bd18000, task ffffff801bc69030)
    Stack: ffffff801d086300 ffffffffa00a7b6c ffffff801e03aa08 ffffff801d086300
           ffffff801d0877a8 ffffffffa013d3fa ffffff801d086310 ffffff8002a0e2f8
           0000000000000001 ffffff80030dc490
    Call Trace:
      <ffffffffa00a7b6c>{:sunrpc:svc_wake_up+16}
      <ffffffffa013d3fa>{:lockd:nlmsvc_grant_callback+389}
      <ffffffffa013c100>{:lockd:lockd+0}
      <ffffffffa00a4ed1>{:sunrpc:__rpc_execute+745}
      <ffffffffa013c100>{:lockd:lockd+0}
      <ffffffffa00a4fa3>{:sunrpc:__rpc_schedule+74}
      <ffffffffa00a5463>{:sunrpc:rpciod+301}
      <ffffffff8012dadc>{autoremove_wake_function+0}
      <ffffffff8012dadc>{autoremove_wake_function+0}
      <ffffffff8010e05a>{child_rip+8}
      <ffffffffa013c100>{:lockd:lockd+0}
      <ffffffffa00a5336>{:sunrpc:rpciod+0}
      <ffffffff8010e052>{child_rip+0}
    Code: 0f 0b 67 9b 2a 80 ff ff ff ff 77 00 f0 ff 0b 0f 88 33 03 00
    RIP <ffffffff8029671f>{_spin_lock_bh+51} RSP <ffffff801bd19db8>
    <0>Kernel panic - not syncing: Oops

I played with a new approach on this today... Rather than having the functions call svc_wake_up(block->b_daemon), the patch creates a new lockd_wake_up() function that wakes up whatever the current lockd task is. It seems to prevent the panics I've seen, but the current iteration used a semaphore to do the locking, and that's a no-no in an RPC callback (we don't want sleeping functions holding up rpciod). So I need a new locking scheme for it, but I think that approach might be the best one...

Created attachment 199971 [details]
patch -- add lockd_wake_up() function and call it instead of svc_wake_up()
This patch seems to eliminate the panics. I think this is the best thing we can
do, but maybe I'm not considering something.
This just changes the svc_wake_up calls to a call that wakes up whatever lockd
thread happens to be up (if there is one). This eliminates the panics, but I'm
not clear on whether there are any issues with this approach that I'm not
considering...
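A rough sketch of what a lockd_wake_up() along those lines might look like, using a spinlock in place of the semaphore rejected above. The spinlock-protected task pointer, its name, and lockd_set_task() are assumptions for illustration, not the contents of the attachment.

    /* Sketch of a lockd_wake_up() helper -- not the attached patch.  Assumes
     * lockd records its current task in a file-static pointer guarded by a
     * spinlock, so an RPC callback can wake it without sleeping. */
    #include <linux/sched.h>
    #include <linux/spinlock.h>

    static struct task_struct *nlmsvc_task;	/* hypothetical: set by lockd at startup */
    static spinlock_t nlmsvc_task_lock = SPIN_LOCK_UNLOCKED;

    /* Called from lockd() when the thread starts and again (with NULL)
     * when it exits. */
    static void lockd_set_task(struct task_struct *task)
    {
    	spin_lock(&nlmsvc_task_lock);
    	nlmsvc_task = task;
    	spin_unlock(&nlmsvc_task_lock);
    }

    /* Replacement for the svc_wake_up(block->b_daemon) calls: wake whatever
     * lockd thread happens to be up right now, if there is one.  Nothing
     * here sleeps, so it is safe to call from an RPC callback. */
    static void lockd_wake_up(void)
    {
    	spin_lock(&nlmsvc_task_lock);
    	if (nlmsvc_task)
    		wake_up_process(nlmsvc_task);
    	spin_unlock(&nlmsvc_task_lock);
    }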
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

I've got a patch that I think fixes this, but it may help expose some other problems, so I'm not going to propose this for 4.7 after all. We'll have a look again for 4.8... The problem that I think this might increase exposure to is bug 432867. It's a preexisting bug, and may not be one we can reasonably fix in RHEL without breaking kabi.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.

I've backported the patchset that I pushed upstream to fix this. It seems to work for RHEL5, but on RHEL4, I can still make it fail to release a block and leave a host and call in queue. There's a difference there of some sort. It seems like the nlm_host being attached to the block is the wrong kind (client rather than server, or vice versa), so we don't end up invalidating the lock.

...added dprintk to RHEL5:

    nlmsvc_act_unlock: block->b_host=ffff81000e858b08 host=ffff81000e858b08 block->b_host->h_server=1 host->h_server=1

...similar dprintk for RHEL4 with similar reproducer:

    nlmsvc_traverse_blocks: block->b_host=000001001c7f63c0 host=000001001c7f65c0 block->b_host->h_server=0 host->h_server=1

...so the difference seems to be that the block for RHEL5 is attached to an nlm_host with h_server=1, but RHEL4 attaches it to an h_server=0 host. This makes the lockd shutdown code leave the block on the list. RHEL4 is missing this patch:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=686517f1ad1630c11964d668b556aab79b8c942e

...the host attached to the nlm_block is being pulled from the wrong list, so we're not able to properly cancel it when lockd comes down. This patch seems to fix the problem in RHEL4, but I need to look over it a bit more closely and make sure it's OK.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. Updating PM score.

Committed in 78.21.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Created attachment 326987 [details]
final patchset -- fix use-after-free problems in lockd
This is the final set of 4 patches
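To illustrate the h_server mismatch dprintk'd above: the block has to be attached to a server-side nlm_host so that lockd's shutdown walk (nlmsvc_traverse_blocks) matches it and releases it. The sketch below is illustrative only; host_for_block() is a hypothetical wrapper, the helper signatures are recalled from 2.6.9-era headers and may not match RHEL4 exactly, and which call site the final patchset actually changes is not shown here.

    /* Illustrative only -- not the final patchset. */
    #include <linux/lockd/lockd.h>

    static struct nlm_host *host_for_block(struct svc_rqst *rqstp)
    {
    	/*
    	 * Server-side lookup: the returned host has h_server == 1, so the
    	 * shutdown walk over nlm_blocked matches it and the block gets
    	 * released -- this is what the RHEL5 dprintk shows.
    	 *
    	 * If the block is instead attached to a host from a client-flavored
    	 * lookup (h_server == 0), the comparison at shutdown fails and the
    	 * block is left on the list, with its callback still queued -- the
    	 * RHEL4 behavior seen above.
    	 */
    	return nlmsvc_lookup_host(rqstp);
    }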
Patchset is in the -89.EL kernel.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html