http://marc.info/?l=linux-netdev&m=122593044330973&w=2 "The following code causes a kernel panic on Linux 2.6.26: http://darkircop.org/unix.c I haven't investigated the bug so I'm not sure what is causing it, and don't know if it's exploitable. The code passes unix sockets from one process to another using unix sockets. The bug probably has to do with closing file descriptors."
Created attachment 322676 [details] Reproducer - http://darkircop.org/unix.c
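For readers unfamiliar with the mechanism the reproducer relies on, here is a minimal and deliberately benign sketch of passing one socket over another AF_UNIX socket with SCM_RIGHTS. It is not the attached unix.c: the helper names send_fd(), recv_fd() and demo_pass_socket() are made up for illustration, and everything runs in a single process with a single descriptor in flight.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send fd_to_send over the AF_UNIX socket `via`. */
static int send_fd(int via, int fd_to_send)
{
    char byte = 'x';   /* one byte of ordinary data accompanies the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {            /* union keeps the control buffer aligned */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(via, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor from `via`, or -1 on error. */
static int recv_fd(int via)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(via, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Pass one end of `payload` across `carrier`, then use the received copy. */
static int demo_pass_socket(void)
{
    int carrier[2], payload[2], copy;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, carrier) < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, payload) < 0)
        return -1;
    if (send_fd(carrier[0], payload[0]) < 0)
        return -1;
    copy = recv_fd(carrier[1]);
    if (copy < 0)
        return -1;
    /* Data written through the received copy arrives at the payload peer. */
    if (write(copy, "k", 1) != 1 || read(payload[1], &c, 1) != 1)
        return -1;
    close(copy);
    close(payload[0]);
    close(payload[1]);
    close(carrier[0]);
    close(carrier[1]);
    return c == 'k' ? 0 : -1;
}
```

Calling demo_pass_socket() from a main() should return 0 on a Linux system. While a descriptor is in flight like this, the kernel holds a reference to the passed socket inside the queued message, which is the state the comments below are concerned with.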
Every Linux kernel is vulnerable to this as far as I can tell. The problem is that __scm_destroy() can close a socket via fput() which can lead back into __scm_destroy() and so on and so forth. I'll attach the patch I'm currently testing, it's based upon a suggested implementation from Linus.
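To illustrate the general shape of such a fix, here is a user-space model of turning the nested destroy into a deferred work list. This is only a sketch under assumptions: struct msg, destroy_msg(), release_carried_socket() and the globals are invented for the model and are not the kernel's data structures or the attached patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Model: a queued message carries a socket whose own queue holds `inner`. */
struct msg {
    struct msg *inner;  /* message that dies when the carried socket closes */
    struct msg *next;   /* work-list link */
};

static struct msg *work_list;   /* destroys deferred by nested calls */
static int in_destroy;          /* set while the outermost call is draining */
static int depth, max_depth;    /* instrumentation for the demo only */

static void destroy_msg(struct msg *m);

/* Stand-in for fput() on the carried socket: closing it destroys its queue. */
static void release_carried_socket(struct msg *m)
{
    if (m->inner)
        destroy_msg(m->inner);
}

static void destroy_msg(struct msg *m)
{
    if (in_destroy) {
        /* Nested call: queue the work instead of descending further. */
        m->next = work_list;
        work_list = m;
        return;
    }
    in_destroy = 1;
    m->next = NULL;
    work_list = m;
    while (work_list) {
        struct msg *cur = work_list;
        work_list = cur->next;
        depth++;
        if (depth > max_depth)
            max_depth = depth;
        release_carried_socket(cur);  /* may push more work, never recurses */
        depth--;
        free(cur);
    }
    in_destroy = 0;
}

/* Build a chain of n nested messages, destroy it, report the deepest nesting. */
static int destroy_chain_max_depth(size_t n)
{
    struct msg *head = NULL;

    for (size_t i = 0; i < n; i++) {
        struct msg *m = calloc(1, sizeof(*m));
        if (!m) {
            if (head)
                destroy_msg(head);  /* free what was built so far */
            return -1;
        }
        m->inner = head;
        head = m;
    }
    depth = max_depth = 0;
    if (head)
        destroy_msg(head);
    return max_depth;
}
```

With the deferral in place the nesting depth stays at 1 no matter how long the chain is, whereas a directly recursive destroy_msg() would use one stack frame per nested message.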
Created attachment 322702 [details] potential fix for __scm_destroy() recursion
I managed to reproduce the problem easily on:
kernel-rt-2.6.24.7-91.el5rt.i686
kernel-2.6.9-78.0.8.EL.i686
I had a little trouble reproducing it on kernel-2.6.18-92.1.17.el5.i686, but running the reproducer in a while loop helps.
Created attachment 322845 [details] Another reproducer - http://darkircop.org/unix2.c
Upstream commits:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f8d570a
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3b53fbf
Luis, please ensure that the patch you added to -92 is the same one as f8d570a/3b53fbf. Thanks.
(In reply to comment #9)
> Upstream commits:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f8d570a
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3b53fbf
Dave, looks like Andrea is still seeing problems with this patch?
http://marc.info/?l=linux-netdev&m=122598444310928&w=2
Thanks, Eugene
Created attachment 323161 [details] second part of fix As well as the __scm_destroy() recursion patch, this fix for AF_UNIX garbage collection is needed to cure all of the discovered problems.
Andrea's problems are fully resolved if the __scm_destroy() and the AF_UNIX garbage collector patch are both applied.
(In reply to comment #13)
> Created an attachment (id=323161) [details]
> second part of fix
>
> As well as the __scm_destroy() recursion patch, this fix
> for AF_UNIX garbage collection is needed to cure all of the
> discovered problems.
This is:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6209344
Note this is a prereq patch for the other 2: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=1fd05ba5a2f2aa8e7b9b52ef55df850e2e7d54c9
FWIW (should have done this earlier), I'm trying the test case on a 122.el5 kernel and it's not crashing. sendmsg always fails with -EPIPE (which is odd, given that the socket was created with socketpair). Investigating why.
Scratch that, it just took several tries to get it to lock up the system.
From dann frazier in oss-security list: "Thanks for following up. fyi, our testing of this fix has uncovered additional issues. Local/unprivileged users can cause soft lockups and take out system processes by triggering the OOM killer: http://marc.info/?l=linux-netdev&m=122721862313564&w=2" Dave, take note.
(In reply to comment #20)
> From dann frazier in oss-security list:
>
> "Thanks for following up.
>
> fyi, our testing of this fix has uncovered additional issues.
> Local/unprivileged users can cause soft lockups and take out system
> processes by triggering the OOM killer:
> http://marc.info/?l=linux-netdev&m=122721862313564&w=2"
Bug reported at:
http://marc.info/?l=linux-netdev&m=122721862313564&w=2
I tested 2.6.24.7-94.el5rt x86_64 by running unix or unix2 in a loop. It can invoke the oom-killer pretty quickly, but I did not see the soft lockups that Dann observed. Dave, any comments?
---
master invoked oom-killer: gfp_mask=0x1200d2, order=0, oomkilladj=0
Pid: 1798, comm: master Not tainted 2.6.24.7-94.el5rt #1
Call Trace:
 [<ffffffff81087cca>] out_of_memory+0x9d/0x2cb
 [<ffffffff8108acd5>] __alloc_pages+0x27d/0x312
 [<ffffffff810a3a44>] alloc_page_vma+0xb7/0xc6
 [<ffffffff8109e36c>] read_swap_cache_async+0x4f/0x103
 [<ffffffff81093d45>] swapin_readahead+0x61/0xcd
 [<ffffffff810952c8>] handle_mm_fault+0x408/0x764
 [<ffffffff81289ec0>] do_page_fault+0x3ba/0x76d
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff812882d9>] error_exit+0x0/0x51
 [<ffffffff8113c7fd>] ? copy_user_generic_string+0x2d/0x40
 [<ffffffff810bde13>] ? core_sys_select+0x200/0x275
 [<ffffffff81056cd4>] ? getnstimeofday+0x31/0x88
 [<ffffffff8113a2d0>] ? rb_insert_color+0x68/0xe3
 [<ffffffff81041b34>] ? timespec_add_safe+0x37/0x64
 [<ffffffff8105401e>] ? enqueue_hrtimer+0xda/0xe8
 [<ffffffff81054c41>] ? ktime_get_ts+0x46/0x4b
 [<ffffffff810be03f>] ? sys_select+0x7e/0xa6
 [<ffffffff8100c22e>] ? system_call_ret+0x0/0x5
Node 0 DMA per-cpu:
CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU 0: Hot: hi: 186, btch: 31 usd: 161 Cold: hi: 62, btch: 15 usd: 56
Active:9 inactive:32 dirty:0 writeback:0 unstable:0
 free:1174 slab:122808 mapped:1 pagetables:377 bounce:0
Node 0 DMA free:1988kB min:52kB low:64kB high:76kB active:0kB inactive:0kB present:9696kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 484 484 484
Node 0 DMA32 free:2708kB min:2788kB low:3484kB high:4180kB active:156kB inactive:0kB present:495940kB pages_scanned:174218 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1988kB
Node 0 DMA32: 17*4kB 0*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2708kB
Swap cache: add 1952, delete 1952, find 4/6, race 0+0
Free swap = 1040760kB
Total swap = 1048568kB
master invoked oom-killer: gfp_mask=0x1200d2, order=0, oomkilladj=0
Pid: 1798, comm: master Not tainted 2.6.24.7-94.el5rt #1
Call Trace:
 [<ffffffff8108782e>] oom_kill_process+0x58/0xfe
 [<ffffffff81087e58>] out_of_memory+0x22b/0x2cb
 [<ffffffff8108acd5>] __alloc_pages+0x27d/0x312
 [<ffffffff810a3a44>] alloc_page_vma+0xb7/0xc6
 [<ffffffff8109e36c>] read_swap_cache_async+0x4f/0x103
 [<ffffffff81093d45>] swapin_readahead+0x61/0xcd
 [<ffffffff810952c8>] handle_mm_fault+0x408/0x764
 [<ffffffff81289ec0>] do_page_fault+0x3ba/0x76d
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff812882d9>] error_exit+0x0/0x51
 [<ffffffff8113c7fd>] ? copy_user_generic_string+0x2d/0x40
 [<ffffffff810bde13>] ? core_sys_select+0x200/0x275
 [<ffffffff81056cd4>] ? getnstimeofday+0x31/0x88
 [<ffffffff8113a2d0>] ? rb_insert_color+0x68/0xe3
 [<ffffffff81041b34>] ? timespec_add_safe+0x37/0x64
 [<ffffffff8105401e>] ? enqueue_hrtimer+0xda/0xe8
 [<ffffffff81054c41>] ? ktime_get_ts+0x46/0x4b
 [<ffffffff810be03f>] ? sys_select+0x7e/0xa6
 [<ffffffff8100c22e>] ? system_call_ret+0x0/0x5
Node 0 DMA per-cpu:
CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU 0: Hot: hi: 186, btch: 31 usd: 164 Cold: hi: 62, btch: 15 usd: 59
Active:9 inactive:32 dirty:0 writeback:0 unstable:0
 free:1188 slab:122759 mapped:1 pagetables:377 bounce:0
Node 0 DMA free:1988kB min:52kB low:64kB high:76kB active:0kB inactive:0kB present:9696kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 484 484 484
Node 0 DMA32 free:2764kB min:2788kB low:3484kB high:4180kB active:156kB inactive:0kB present:495940kB pages_scanned:622 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1988kB
Node 0 DMA32: 15*4kB 3*8kB 2*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2772kB
Swap cache: add 1968, delete 1968, find 5/8, race 0+0
Free swap = 1040760kB
Total swap = 1048568kB
[...]
I have never seen the OOM killer trigger; what I did see is that the program could get stuck while still being killable with Ctrl-C.

The problem is that the child processes can still queue new FDs over the AF_UNIX socket to the parent's side while the parent is exit()'ing and (via exit-time FD closing) running UNIX garbage collection on those FDs.

There is no easy way at all to fix this. There isn't a one-to-one relationship between sockets and processes; the relationship is potentially many-to-one. So ideas like "don't allow sending an FD over an AF_UNIX socket for a process that is exit()'ing" are totally out of the question.

One idea that might work, however, is to throttle senders while UNIX garbage collection is in progress. I can't say how easy the implementation would be. The following might work:

1) Add a wait_queue to net/unix/garbage.c
2) Create a helper function that sleeps until gc_in_progress is false
3) At the end of unix_gc(), where gc_in_progress is cleared to false, perform a wakeup on the waitq added in #1
4) At all net/unix/af_unix.c calls of scm_send(), first invoke the "wait until gc_in_progress==false" helper added in #2

This should make sendmsg() calls block while any UNIX garbage collection is in progress.

Note that this will kill scalability in the case where many UNIX sockets are being closed while many other UNIX sockets are doing SCM fp passing. I don't know how common that is, probably not enough to care.
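The four steps above can be modelled in user space with a pthread condition variable standing in for the kernel wait queue. This is a sketch of the throttling idea only, not kernel code and not the patch attached in the next comment: gc_begin(), gc_end(), wait_for_gc(), sender() and demo_throttle() are names invented for the model, and only gc_in_progress is taken from the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gc_done = PTHREAD_COND_INITIALIZER; /* step 1: the "wait queue" */
static bool gc_in_progress;
static int sends_completed;   /* protected by gc_lock, for the demo only */

/* Step 2: sleep until no garbage collection is running. */
static void wait_for_gc(void)
{
    pthread_mutex_lock(&gc_lock);
    while (gc_in_progress)
        pthread_cond_wait(&gc_done, &gc_lock);
    pthread_mutex_unlock(&gc_lock);
}

static void gc_begin(void)
{
    pthread_mutex_lock(&gc_lock);
    gc_in_progress = true;
    pthread_mutex_unlock(&gc_lock);
}

/* Step 3: clear the flag and wake every waiting sender. */
static void gc_end(void)
{
    pthread_mutex_lock(&gc_lock);
    gc_in_progress = false;
    pthread_cond_broadcast(&gc_done);
    pthread_mutex_unlock(&gc_lock);
}

/* Step 4: a sender waits for the collector before queueing descriptors. */
static void *sender(void *unused)
{
    (void)unused;
    wait_for_gc();
    /* The real code would call scm_send() at this point. */
    pthread_mutex_lock(&gc_lock);
    sends_completed++;
    pthread_mutex_unlock(&gc_lock);
    return NULL;
}

/* Returns 0 if the sender stayed blocked for the whole collection. */
static int demo_throttle(void)
{
    pthread_t t;
    int during, after;

    sends_completed = 0;
    gc_begin();
    if (pthread_create(&t, NULL, sender, NULL) != 0)
        return -1;
    usleep(50000);   /* give the sender time to reach the wait */

    pthread_mutex_lock(&gc_lock);
    during = sends_completed;
    pthread_mutex_unlock(&gc_lock);

    gc_end();
    pthread_join(t, NULL);

    pthread_mutex_lock(&gc_lock);
    after = sends_completed;
    pthread_mutex_unlock(&gc_lock);

    return (during == 0 && after == 1) ? 0 : -1;
}
```

As in the proposal, this only throttles: a new collection may start right after wait_for_gc() returns, so senders are slowed down while a collection runs but are not strictly excluded from it. Build with -pthread where the C library requires it.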
Created attachment 324662 [details] Implementation of David's suggestion Here's my attempt at implementing David's suggestion. I've been running this for an hour or so now and haven't had a soft lockup or oom-killer trigger yet.
Patch looks mostly fine, could you please post this to netdev with proper commit message and signoff? I'd like to get this fixed upstream. Thanks Dann.
Sent: http://marc.info/?l=linux-netdev&m=122765505415944&w=2
(In reply to comment #27)
> Sent:
> http://marc.info/?l=linux-netdev&m=122765505415944&w=2
Updated patch:
http://marc.info/?l=linux-netdev&m=122771908731133&w=2
(In reply to comment #28)
> (In reply to comment #27)
> > Sent:
> > http://marc.info/?l=linux-netdev&m=122765505415944&w=2
>
> Updated patch:
> http://marc.info/?l=linux-netdev&m=122771908731133&w=2
This is a different bug triggered by the same reproducers. I have filed a new bug for this. Please refer to bug 473259. Thanks.
Debian mention of this issue: http://security-tracker.debian.net/tracker/CVE-2008-5029
A user posted an exploit[1] to bugtraq last Friday. It is the same reproducer as the one posted in comment #1. SecurityFocus listed it as a new vulnerability -- Linux Kernel Malformed 'msghdr' Structure Local Denial of Service[2]. This is incorrect, and it should be CVE-2008-5029. Take note. [1] http://seclists.org/bugtraq/2009/Jan/0000.html [2] http://www.securityfocus.com/bid/33079/info
This issue has been addressed in the following products: Red Hat Enterprise Linux 3 via RHSA-2009:1550 https://rhn.redhat.com/errata/RHSA-2009-1550.html
This was addressed via:
MRG Realtime for RHEL 5 Server (RHSA-2009:0009)
Red Hat Enterprise Linux version 4 (RHSA-2009:0014)
Red Hat Enterprise Linux (v. 5.2.z server) (RHSA-2009:0021)
Red Hat Enterprise Linux version 5 (RHSA-2009:0225)
Red Hat Enterprise Linux version 3 (RHSA-2009:1550)