Bug 470201 (CVE-2008-5029) - CVE-2008-5029 kernel: Unix sockets kernel panic
Summary: CVE-2008-5029 kernel: Unix sockets kernel panic
Keywords:
Status: CLOSED ERRATA
Alias: CVE-2008-5029
Product: Security Response
Classification: Other
Component: vulnerability
Version: unspecified
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Red Hat Product Security
QA Contact:
URL:
Whiteboard:
Depends On: 470429 470430 470431 470432 470433 470434 470435 470436
Blocks: CVE-2008-5300 510746
Reported: 2008-11-06 09:07 UTC by Eugene Teo (Security Response)
Modified: 2019-09-29 12:27 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-12-21 17:49:39 UTC
Embargoed:


Attachments
Reproducer - http://darkircop.org/unix.c (2.51 KB, text/plain)
2008-11-06 09:08 UTC, Eugene Teo (Security Response)
potential fix for __scm_destroy() recursion (1.85 KB, patch)
2008-11-06 12:04 UTC, David Miller
Another reproducer - http://darkircop.org/unix2.c (2.67 KB, text/plain)
2008-11-07 11:22 UTC, Eugene Teo (Security Response)
second part of fix (6.53 KB, patch)
2008-11-11 09:31 UTC, David Miller
Implementation of David's suggestion (2.23 KB, patch)
2008-11-25 20:24 UTC, dann frazier
Proposed patch for real-time kernel (2.49 KB, patch)
2008-11-27 06:00 UTC, Eugene Teo (Security Response)


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2009:0009 | 0 | normal | SHIPPED_LIVE | Important: kernel security and bug fix update | 2009-01-22 10:43:54 UTC
Red Hat Product Errata | RHSA-2009:0014 | 0 | normal | SHIPPED_LIVE | Important: kernel security and bug fix update | 2009-01-14 18:05:34 UTC
Red Hat Product Errata | RHSA-2009:0021 | 0 | normal | SHIPPED_LIVE | Important: kernel security update | 2009-02-25 01:04:12 UTC
Red Hat Product Errata | RHSA-2009:0225 | 0 | normal | SHIPPED_LIVE | Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update | 2009-01-20 16:06:24 UTC
Red Hat Product Errata | RHSA-2009:1550 | 0 | normal | SHIPPED_LIVE | Important: kernel security and bug fix update | 2009-11-03 21:59:47 UTC

Description Eugene Teo (Security Response) 2008-11-06 09:07:59 UTC
http://marc.info/?l=linux-netdev&m=122593044330973&w=2

"The following code causes a kernel panic on Linux 2.6.26:
http://darkircop.org/unix.c

I haven't investigated the bug so I'm not sure what is causing it, and don't know if it's exploitable.  The code passes unix sockets from one process to another using unix sockets.  The bug probably has to do with closing file descriptors."
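
For readers unfamiliar with the mechanism mentioned in the report, descriptor passing over AF_UNIX sockets uses an SCM_RIGHTS control message with sendmsg()/recvmsg(). The following is a minimal sketch of that mechanism only, not the reproducer; the helper names send_fd() and recv_fd() are hypothetical and do not come from unix.c.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send one descriptor over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd_to_send)
{
    char byte = 'x'; /* one byte of real data is sent alongside the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align; /* forces correct alignment of buf */
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL); /* the buffer was sized for exactly one header */
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on failure. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

While a passed descriptor sits unread in the receiver's queue, the kernel holds a reference to the underlying file on behalf of that queue. That in-flight reference is what the rest of this bug is about.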

Comment 1 Eugene Teo (Security Response) 2008-11-06 09:08:41 UTC
Created attachment 322676 [details]
Reproducer - http://darkircop.org/unix.c

Comment 3 David Miller 2008-11-06 12:04:03 UTC
Every Linux kernel is vulnerable to this as far as I can tell.

The problem is that __scm_destroy() can close a socket via fput()
which can lead back into __scm_destroy() and so on and so forth.

I'll attach the patch I'm currently testing, it's based upon a
suggested implementation from Linus.
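
The general technique for breaking this kind of re-entrant teardown can be sketched in user space. This is an illustration under stated assumptions, not the attached patch: struct node, node_destroy() and make_chain() are hypothetical names, and the single global work list stands in for whatever per-context state a kernel implementation would use. A nested call only queues its object and returns, so the outermost call drains the list in a loop and stack depth stays bounded no matter how long the chain is.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a socket whose queue holds a reference
 * to another socket. */
struct node {
    struct node *held;      /* object referenced from this node's queue, or NULL */
    struct node *next_work; /* link used only while queued for teardown */
};

struct node *work_list; /* objects waiting to be released */
int draining;           /* nonzero while the outermost call is draining the list */
int depth, max_depth;   /* instrumentation: call nesting actually reached */
int freed;              /* instrumentation: number of objects released */

/* Release n and everything reachable through ->held without unbounded
 * recursion. Not thread-safe: this sketch assumes a single caller. */
void node_destroy(struct node *n)
{
    if (++depth > max_depth)
        max_depth = depth;

    /* Always queue first; a nested call stops here and returns. */
    n->next_work = work_list;
    work_list = n;

    if (!draining) {
        draining = 1;
        while (work_list != NULL) {
            struct node *cur = work_list;
            struct node *held = cur->held;

            work_list = cur->next_work;
            free(cur);
            freed++;
            if (held != NULL)
                node_destroy(held); /* re-enters, but only queues */
        }
        draining = 0;
    }
    depth--;
}

/* Build a chain in which each node holds the previous one. */
struct node *make_chain(int len)
{
    struct node *head = NULL;

    for (int i = 0; i < len; i++) {
        struct node *n = calloc(1, sizeof(*n));
        assert(n != NULL);
        n->held = head;
        head = n;
    }
    return head;
}
```

Calling node_destroy(make_chain(200000)) releases every node while nesting never exceeds two calls, whereas a naive recursive version would need one stack frame per node.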

Comment 4 David Miller 2008-11-06 12:04:40 UTC
Created attachment 322702 [details]
potential fix for __scm_destroy() recursion

Comment 6 Eugene Teo (Security Response) 2008-11-07 06:26:40 UTC
I managed to reproduce the problem easily on:
kernel-rt-2.6.24.7-91.el5rt.i686
kernel-2.6.9-78.0.8.EL.i686

I had a little problem reproducing it on kernel-2.6.18-92.1.17.el5.i686, but a while loop helps.

Comment 7 Eugene Teo (Security Response) 2008-11-07 11:22:42 UTC
Created attachment 322845 [details]
Another reproducer - http://darkircop.org/unix2.c

Comment 9 Eugene Teo (Security Response) 2008-11-10 02:49:07 UTC
Upstream commits:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f8d570a
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3b53fbf

Luis, please ensure that the patch you added to -92 is the same one as f8d570a/3b53fbf. Thanks.

Comment 10 Eugene Teo (Security Response) 2008-11-10 04:49:20 UTC
(In reply to comment #9)
> Upstream commits:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f8d570a
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3b53fbf

Dave, looks like Andrea is still seeing problems with this patch?

http://marc.info/?l=linux-netdev&m=122598444310928&w=2

Thanks, Eugene

Comment 13 David Miller 2008-11-11 09:31:21 UTC
Created attachment 323161 [details]
second part of fix

As well as the __scm_destroy() recursion patch, this fix
for AF_UNIX garbage collection is needed to cure all of the
discovered problems.

Comment 14 David Miller 2008-11-11 09:32:01 UTC
Andrea's problems are fully resolved if the __scm_destroy() and
the AF_UNIX garbage collector patch are both applied.

Comment 16 Eugene Teo (Security Response) 2008-11-12 04:53:45 UTC
(In reply to comment #13)
> Created an attachment (id=323161) [details]
> second part of fix
> 
> As well as the __scm_destroy() recursion patch, this fix
> for AF_UNIX garbage collection is needed to cure all of the
> discovered problems.

This is: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6209344

Comment 18 Neil Horman 2008-11-12 20:58:26 UTC
FWIW (should have done this earlier), I'm trying the test case on a 122.el5 kernel and it's not crashing.  sendmsg always fails with -EPIPE (which is odd, given that the socket was created with socketpair).  Investigating why.

Comment 19 Neil Horman 2008-11-12 21:05:36 UTC
Scratch that, it just took several tries to get it to lock up the system.

Comment 20 Eugene Teo (Security Response) 2008-11-22 02:05:01 UTC
From dann frazier on the oss-security list:

"Thanks for following up.

fyi, our testing of this fix has uncovered additional issues.
Local/unprivileged users can cause soft lockups and take out system
processes by triggering the OOM killer:
 http://marc.info/?l=linux-netdev&m=122721862313564&w=2"

Dave, take note.

Comment 22 Eugene Teo (Security Response) 2008-11-22 02:50:04 UTC
(In reply to comment #20)
> From dann frazier in oss-security list:
> 
> "Thanks for following up.
> 
> fyi, our testing of this fix has uncovered additional issues.
> Local/unprivileged users can cause soft lockups and take out system
> processes by triggering the OOM killer:
>  http://marc.info/?l=linux-netdev&m=122721862313564&w=2"

Bug reported at:
http://marc.info/?l=linux-netdev&m=122721862313564&w=2

Comment 23 Eugene Teo (Security Response) 2008-11-25 03:42:37 UTC
I tested 2.6.24.7-94.el5rt x86_64 by running unix or unix2 in a loop. It can invoke the oom-killer pretty quickly, but I did not see the soft lockups that Dann observed. Dave, any comments?

---
master invoked oom-killer: gfp_mask=0x1200d2, order=0, oomkilladj=0
Pid: 1798, comm: master Not tainted 2.6.24.7-94.el5rt #1

Call Trace:
 [<ffffffff81087cca>] out_of_memory+0x9d/0x2cb
 [<ffffffff8108acd5>] __alloc_pages+0x27d/0x312
 [<ffffffff810a3a44>] alloc_page_vma+0xb7/0xc6
 [<ffffffff8109e36c>] read_swap_cache_async+0x4f/0x103
 [<ffffffff81093d45>] swapin_readahead+0x61/0xcd
 [<ffffffff810952c8>] handle_mm_fault+0x408/0x764
 [<ffffffff81289ec0>] do_page_fault+0x3ba/0x76d
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff812882d9>] error_exit+0x0/0x51
 [<ffffffff8113c7fd>] ? copy_user_generic_string+0x2d/0x40
 [<ffffffff810bde13>] ? core_sys_select+0x200/0x275
 [<ffffffff81056cd4>] ? getnstimeofday+0x31/0x88
 [<ffffffff8113a2d0>] ? rb_insert_color+0x68/0xe3
 [<ffffffff81041b34>] ? timespec_add_safe+0x37/0x64
 [<ffffffff8105401e>] ? enqueue_hrtimer+0xda/0xe8
 [<ffffffff81054c41>] ? ktime_get_ts+0x46/0x4b
 [<ffffffff810be03f>] ? sys_select+0x7e/0xa6
 [<ffffffff8100c22e>] ? system_call_ret+0x0/0x5

Node 0 DMA per-cpu:
CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd: 161   Cold: hi:   62, btch:  15 usd:  56
Active:9 inactive:32 dirty:0 writeback:0 unstable:0
 free:1174 slab:122808 mapped:1 pagetables:377 bounce:0
Node 0 DMA free:1988kB min:52kB low:64kB high:76kB active:0kB inactive:0kB present:9696kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 484 484 484
Node 0 DMA32 free:2708kB min:2788kB low:3484kB high:4180kB active:156kB inactive:0kB present:495940kB pages_scanned:174218 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1988kB
Node 0 DMA32: 17*4kB 0*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2708kB
Swap cache: add 1952, delete 1952, find 4/6, race 0+0
Free swap  = 1040760kB
Total swap = 1048568kB
master invoked oom-killer: gfp_mask=0x1200d2, order=0, oomkilladj=0
Pid: 1798, comm: master Not tainted 2.6.24.7-94.el5rt #1

Call Trace:
 [<ffffffff8108782e>] oom_kill_process+0x58/0xfe
 [<ffffffff81087e58>] out_of_memory+0x22b/0x2cb
 [<ffffffff8108acd5>] __alloc_pages+0x27d/0x312
 [<ffffffff810a3a44>] alloc_page_vma+0xb7/0xc6
 [<ffffffff8109e36c>] read_swap_cache_async+0x4f/0x103
 [<ffffffff81093d45>] swapin_readahead+0x61/0xcd
 [<ffffffff810952c8>] handle_mm_fault+0x408/0x764
 [<ffffffff81289ec0>] do_page_fault+0x3ba/0x76d
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff812882d9>] error_exit+0x0/0x51
 [<ffffffff8113c7fd>] ? copy_user_generic_string+0x2d/0x40
 [<ffffffff810bde13>] ? core_sys_select+0x200/0x275
 [<ffffffff81056cd4>] ? getnstimeofday+0x31/0x88
 [<ffffffff8113a2d0>] ? rb_insert_color+0x68/0xe3
 [<ffffffff81041b34>] ? timespec_add_safe+0x37/0x64
 [<ffffffff8105401e>] ? enqueue_hrtimer+0xda/0xe8
 [<ffffffff81054c41>] ? ktime_get_ts+0x46/0x4b
 [<ffffffff810be03f>] ? sys_select+0x7e/0xa6
 [<ffffffff8100c22e>] ? system_call_ret+0x0/0x5

Node 0 DMA per-cpu:
CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd: 164   Cold: hi:   62, btch:  15 usd:  59
Active:9 inactive:32 dirty:0 writeback:0 unstable:0
 free:1188 slab:122759 mapped:1 pagetables:377 bounce:0
Node 0 DMA free:1988kB min:52kB low:64kB high:76kB active:0kB inactive:0kB present:9696kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 484 484 484
Node 0 DMA32 free:2764kB min:2788kB low:3484kB high:4180kB active:156kB inactive:0kB present:495940kB pages_scanned:622 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1988kB
Node 0 DMA32: 15*4kB 3*8kB 2*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2772kB
Swap cache: add 1968, delete 1968, find 5/8, race 0+0
Free swap  = 1040760kB
Total swap = 1048568kB
[...]

Comment 24 David Miller 2008-11-25 10:58:08 UTC
I have never seen the OOM killer trigger; what I did see was
the program getting stuck while still being killable with Ctrl-C.

The problem is that the child processes can still queue new
FDs over the AF_UNIX socket to the parent's side, while the
parent is exit()'ing and (via exit time FD closing) running
UNIX garbage collection on those FDs.

There is no easy way at all to fix this.  There isn't something
like a one-to-one relationship between sockets and processes,
there is rather potentially a many-to-one relationship.  So ideas
like "don't allow sending FD over AF_UNIX socket for process that
is exit()'ing" are totally out of the question.

One idea that might work, however, is to throttle when UNIX garbage
collection is in progress.  I can't say how easy the implementation
would be.

The following might work:

1) Add wait_queue to net/unix/garbage.c
2) Create a helper function that sleeps until gc_in_progress is false
3) At the end of unix_gc() where gc_in_progress is cleared to false,
   perform a wakeup on the waitq added in #1
4) At all net/unix/af_unix.c calls of scm_send(), first invoke the
   "wait until gc_in_progress==false" helper added in #2

This should make sendmsg() calls block while any UNIX garbage
collection is in progress.  Note that this will kill scalability in
the case where many UNIX sockets are being closed while many other
UNIX sockets are doing SCM fp passing.

I don't know how common that is, probably not enough to care.
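
The four steps above can be sketched in user space with a pthread mutex and condition variable standing in for the kernel wait queue. This is an analogy under stated assumptions, not kernel code and not the patch attached later: gc_begin(), gc_end(), wait_for_gc(), throttled_send() and example_send() are hypothetical names chosen for illustration.

```c
#include <pthread.h>
#include <stdbool.h>

/* Step 1: the wait queue, modelled here as a condition variable plus its lock. */
pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t gc_wait = PTHREAD_COND_INITIALIZER;
bool gc_in_progress;

/* Step 2: helper that sleeps until gc_in_progress is false. */
void wait_for_gc(void)
{
    pthread_mutex_lock(&gc_lock);
    while (gc_in_progress) /* loop guards against spurious wakeups */
        pthread_cond_wait(&gc_wait, &gc_lock);
    pthread_mutex_unlock(&gc_lock);
}

/* Marks the start of a garbage collection pass. */
void gc_begin(void)
{
    pthread_mutex_lock(&gc_lock);
    gc_in_progress = true;
    pthread_mutex_unlock(&gc_lock);
}

/* Step 3: clear the flag at the end of the pass and wake every waiter. */
void gc_end(void)
{
    pthread_mutex_lock(&gc_lock);
    gc_in_progress = false;
    pthread_cond_broadcast(&gc_wait);
    pthread_mutex_unlock(&gc_lock);
}

/* Step 4: every sender waits for the collector before queueing anything.
 * do_send is whatever actually performs the send. */
int throttled_send(int (*do_send)(void *), void *arg)
{
    wait_for_gc();
    return do_send(arg);
}

/* Hypothetical sender used only to demonstrate the call path:
 * it counts invocations and reports success. */
int example_send(void *arg)
{
    ++*(int *)arg;
    return 0;
}
```

In this sketch a new collection pass can still begin between wait_for_gc() returning and the send itself, so it slows senders down rather than excluding them from the collector. Whether that is sufficient depends on the collector tolerating concurrent sends.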

Comment 25 dann frazier 2008-11-25 20:24:33 UTC
Created attachment 324662 [details]
Implementation of David's suggestion

Here's my attempt at implementing David's suggestion. I've been running this for an hour or so now and haven't had a soft lockup or oom-killer trigger yet.

Comment 26 David Miller 2008-11-25 22:25:57 UTC
Patch looks mostly fine, could you please post this to netdev
with proper commit message and signoff?

I'd like to get this fixed upstream.

Thanks Dann.

Comment 27 dann frazier 2008-11-25 23:30:38 UTC
Sent:
 http://marc.info/?l=linux-netdev&m=122765505415944&w=2

Comment 28 Eugene Teo (Security Response) 2008-11-27 01:14:56 UTC
(In reply to comment #27)
> Sent:
>  http://marc.info/?l=linux-netdev&m=122765505415944&w=2

Updated patch:
http://marc.info/?l=linux-netdev&m=122771908731133&w=2

Comment 33 Eugene Teo (Security Response) 2008-11-27 13:06:54 UTC
(In reply to comment #28)
> (In reply to comment #27)
> > Sent:
> >  http://marc.info/?l=linux-netdev&m=122765505415944&w=2
> 
> Updated patch:
> http://marc.info/?l=linux-netdev&m=122771908731133&w=2

This is a different bug triggered by the same reproducers. I have filed a new bug for this. Please refer to bug 473259. Thanks.

Comment 34 Jan Lieskovsky 2008-12-09 15:59:04 UTC
Debian mention of this issue:

http://security-tracker.debian.net/tracker/CVE-2008-5029

Comment 35 Eugene Teo (Security Response) 2009-01-05 06:05:16 UTC
A user posted an exploit[1] to bugtraq last Friday. It is the same reproducer as the one posted in comment #1. SecurityFocus listed it as a new vulnerability -- Linux Kernel Malformed 'msghdr' Structure Local Denial of Service[2]. This is incorrect, and it should be CVE-2008-5029. Take note.

[1] http://seclists.org/bugtraq/2009/Jan/0000.html
[2] http://www.securityfocus.com/bid/33079/info

Comment 37 errata-xmlrpc 2009-11-03 22:03:08 UTC
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 3

Via RHSA-2009:1550 https://rhn.redhat.com/errata/RHSA-2009-1550.html

Comment 40 Vincent Danen 2010-12-21 17:49:39 UTC
This was addressed via:

MRG Realtime for RHEL 5 Server (RHSA-2009:0009)
Red Hat Enterprise Linux version 4 (RHSA-2009:0014)
Red Hat Enterprise Linux (v. 5.2.z server) (RHSA-2009:0021)
Red Hat Enterprise Linux version 5 (RHSA-2009:0225)
Red Hat Enterprise Linux version 3 (RHSA-2009:1550)

