Bug 681578 - (CVE-2011-1083) CVE-2011-1083 kernel: excessive in kernel CPU consumption when creating large nested epoll structures
Status: CLOSED ERRATA
Product: Security Response
Classification: Other
Component: vulnerability
Version: unspecified
Platform: All Linux
Priority: medium  Severity: medium
Assigned To: Red Hat Product Security
Whiteboard: impact=moderate,public=20110225,repor...
Keywords: Security
Depends On: 681688 681689 681690 681691 681692 681693 748668 761343
Blocks: 717854 784298
Reported: 2011-03-02 11:07 EST by Petr Matousek
Modified: 2015-07-31 02:38 EDT (History)
CC: 31 users
Doc Type: Bug Fix
Last Closed: 2013-08-24 09:53:08 EDT

Attachments:
increase the ep path limits and add some debug output (1.13 KB, patch), 2012-03-12 17:06 EDT, Jason Baron
allow unlimited number of depth 1 paths (318 bytes, patch), 2012-03-16 15:57 EDT, Jason Baron
Description Petr Matousek 2011-03-02 11:07:57 EST
Description of problem:
The epoll subsystem allows users to create large nested epoll structures,
which the kernel will then walk with preemption disabled, causing a denial of
service via excessive CPU consumption in the kernel.

References:
http://thread.gmane.org/gmane.linux.kernel/1105744
http://thread.gmane.org/gmane.linux.kernel/1105744/focus=1105888
http://seclists.org/oss-sec/2011/q1/337

Acknowledgements:

Red Hat would like to thank Nelson Elhage for reporting this issue.
Comment 6 Eugene Teo (Security Response) 2011-06-30 04:14:15 EDT
Statement:

This issue affected the versions of Linux kernel as shipped with Red Hat Enterprise Linux 4, 5, 6, and Red Hat Enterprise MRG. It was addressed in Red Hat Enterprise Linux 5 and 6 via RHSA-2012:0150 and RHSA-2012:0862 respectively. There is no plan to address this flaw in Red Hat Enterprise Linux 4. Future updates may address this issue in Red Hat Enterprise MRG.
Comment 14 Eugene Teo (Security Response) 2011-10-24 23:38:50 EDT
Created kernel tracking bugs for this issue

Affects: fedora-all [bug 748668]
Comment 17 errata-xmlrpc 2012-02-21 00:58:12 EST
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 5

Via RHSA-2012:0150 https://rhn.redhat.com/errata/RHSA-2012-0150.html
Comment 18 David Castro 2012-02-24 18:01:05 EST
It may be useful for others to know this patch caused significant problems with dovecot 2.0.13/epoll on our CentOS 5.7 machines (2.6.18-238.19.1.el5, 2.6.18-274.3.1.el5, 2.6.18-274.17.1.el5).  Other dovecot versions may also be affected, but we've not determined that yet.  

http://dovecot.org/list/dovecot/2012-February/064004.html

IMAP & POP were affected; we identified which servers were hit via the following message in our logs:

    Panic: epoll_ctl(add, 6) failed: Invalid argument

More information is in the dovecot mailing list link above.
Comment 19 Jason Baron 2012-02-26 10:45:17 EST
Hi David,

The patch does put additional limits on the amount of nesting and the number of epoll fds that can be attached to an fd. I thought that the limits were higher than anybody would hit in practice, but perhaps not. I can re-spin a 'debug' patch that will print more info in this situation, if you are willing to test that. Otherwise, if you have a reproducer that I can try, that would be appreciated. Do you have any sense of how many epoll fds are required per fd?

Thanks,

-Jason
Comment 20 Jason Baron 2012-02-26 10:52:49 EST
Another question: I see that this was triggered in the context of a ksplice update. I'm wondering if this issue can be reproduced without ksplice too, so we can narrow down whether ksplice is involved or not.

Thanks,

-Jason
Comment 21 David Castro 2012-02-27 15:28:22 EST
Hi Jason,

Thanks for getting back to us on this.  We're going to be attempting to see if the latest EL5 kernel will have an issue sans-ksplice, since the dovecot folks also recommended doing that.  Will let you know what we find.

If we do still see the issue, we'd definitely be willing to re-produce with the debug patch to get more info.

Thanks,
David
Comment 22 Jason Baron 2012-03-05 09:59:46 EST
Hi David,

Just wondering if you got a chance to test this?

Thanks!

-Jason
Comment 23 David Castro 2012-03-06 03:29:18 EST
Hey Jason,

We haven't been able to tackle it yet, however I should get to testing it this week.  We have to verify that we can reproduce the problem with enough generated traffic using ksplice in a test environment, then swap out with the kernel update to see if the issue persists.

Will provide an update soon on the results.

David
Comment 24 David Castro 2012-03-10 08:21:40 EST
Hey Jason,

Well, it looks like I can reproduce the issue in a testing environment with the latest CentOS 5.x kernel and no ksplice uptrack.  

Here's the info:
=========

Linux n99.XXXX 2.6.18-308.1.1.el5 #1 SMP Wed Mar 7 04:16:51 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

dcastro@n99:~
$ sudo /usr/sbin/uptrack-show
Warning: The cron output configuration options have been removed.
Please visit <http://www.ksplice.com/uptrack/notification-options>
for more information.
Installed updates:
None

Effective kernel version is 2.6.18-308.1.1.el5

And...flipping back to:

Linux n99.XXXX 2.6.18-274.17.1.el5 #1 SMP Tue Jan 10 17:25:58 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

and indeed, the problems go away.


To reproduce:
========

Used mstone to generate load, using 1000 IMAP logins and 1000 Maildir folders, each consisting of 21 IMAP folders with cur/new/tmp sub-directories. No actual mail, indexes, or cache files in any of the mail folders; just a uidvalidity file that gets generated by dovecot.

Consistently dovecot becomes unusable and must be restarted to function normally again. 

From our dovecot logs:
==============
2012-03-10T07:20:42.019732-05:00 n99 dovecot: imap-login: proxy(bob@579): started proxying to XXX.XX.XX.XX:143: user=<bob@579>, method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured
2012-03-10T07:20:42.027678-05:00 n99 dovecot: imap-login: Panic: epoll_ctl(add, 6) failed: Invalid argument
2012-03-10T07:20:42.028028-05:00 n99 dovecot: imap-login: Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0 [0x370503baa0] -> /usr/lib64/dovecot/libdovecot.so.0 [0x370503baf6] -> /usr/lib64/dovecot/libdovecot.so.0 [0x370503afb3] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handle_add+0x118) [0x3705047708] -> /usr/lib64/dovecot/libdovecot.so
.0(io_add+0xa5) [0x3705046e15] -> /usr/lib64/dovecot/libdovecot.so.0(master_service_init_finish+0x1c6) [0x37050355a6] -> /usr/lib64/dovecot/libdovecot-login.so.0(main+0x136) [0x3f6100bdf6] -> /lib64/libc.so.6(__libc_start_main+0xf4) [0x370301d994] -> dovecot/imap-login(main+0x39) [0x402069]
2012-03-10T07:20:42.028434-05:00 n99 dovecot: master: Error: service(imap-login): child 10947 killed with signal 6 (core dumps disabled)
2012-03-10T07:20:42.028471-05:00 n99 dovecot: master: Error: service(imap-login): command startup failed, throttling

Let me know if you'd like me to try this with a different kernel, enable core dumps for dovecot, etc.

Cheers,
David
Comment 25 Jason Baron 2012-03-12 17:06:11 EDT
Created attachment 569503 [details]
increase the ep path limits and add some debug output

So it looks like we have run up against some of the new epoll path limits code. I have a kernel patch that doubles the limits and should print to the log when we would have hit the old soft limits. If it's possible, can you apply this patch to the latest RHEL5 kernel and report back? If you can't re-build the kernel, I can look into supplying you with a re-built kernel. Thanks!

-Jason
Comment 26 David Castro 2012-03-13 16:36:31 EDT
Thanks Jason.  I'll try to give the patch a whirl later this week.

David
Comment 27 Timo Sirainen 2012-03-16 13:14:10 EDT
Dovecot master process creates one pipe whose read side it passes to all the child processes. All of the Dovecot child processes listen on this pipe to find out when the master process dies. There are systems with tens of thousands of Dovecot child processes, so simply increasing the limit to 2000 won't help.

Postfix behaves in a similar way by polling on a pipe fd shared by all the same service processes. So if e.g. Postfix starts up over 1000 smtp or smtpd processes it'll fail in the same way.

I'd think there are also other software using a pipe to find out when the other side is dead.

Can this shared pipe be handled as a special case somehow?
Comment 28 Timo Sirainen 2012-03-16 13:20:16 EDT
Sorry, I meant write side of the pipe is passed to Dovecot child processes. They are listening for EPOLLERR | EPOLLHUP from the pipe.
Comment 29 Jason Baron 2012-03-16 15:41:15 EDT
Just so I understand, you're saying that all of the 'child' processes will do an epoll_create followed by an epoll_ctl() to attach to a single pipe? And then they are all woken up at once?

If so, this bug fix is about preventing deep nesting. So in this case it sounds like the depth is '1'. We could probably make the level '1' nesting limit really large, and then leave level '2' and beyond where they currently are, to fix this.

Thanks!

-Jason
Comment 30 Jason Baron 2012-03-16 15:43:10 EDT
Or maybe there is no level '1' limit?
Comment 31 Jason Baron 2012-03-16 15:57:53 EDT
Created attachment 570690 [details]
allow unlimited number of depth 1 paths

Here's a simple patch, which allows all depth 1 paths. The argument is that you are already limited by the number of open files and processes the sysadmin allows you to create. This should still prevent the nasty infinite wakeup paths. Do the limits seem sane then?

unlimited depth 1 paths (limited by open files and processes you can create)
500 depth 2
100 depth 3
50 depth 4
10 depth 5

I didn't think there were any apps with > 1000 depth 1 paths. But I guess I was wrong :(
Comment 32 Timo Sirainen 2012-03-16 16:02:12 EDT
Yes, tons of processes doing a single epoll_create() and adding epoll_ctl() to a single pipe in it, all woken up at once when the read end of the pipe closes. A depth of 1 sounds like it, but I don't really understand what the nesting stuff in this bug is about.
Comment 33 Jason Baron 2012-03-16 16:10:33 EDT
So this bug is about the ability to attach epoll fds to other fds. So even though the max depth of these paths is 4, you can get 1000^5 or more wakeups, which effectively brings down the box.

The fix is to limit these deep paths, such that you can't get all these wakeups. So we probably could be ok with an unlimited number of depth 1 paths. And impose the limitations just on deeper paths.

The other point here is that sane software can't be creating these 'infinite' wakeup problems, because otherwise it wouldn't work. So fundamentally, there has to be a sane limit we can impose.
Comment 34 Jason Baron 2012-03-16 16:15:08 EDT
Also, if somebody could verify the patch from comment #31, that would be greatly appreciated. This is an important fix, which should probably be included asap. Thanks!

-Jason
Comment 35 David Castro 2012-03-16 19:18:16 EDT
Jason,

I'll be sure to test with the comment #33 patch in as well.  I'm hoping to get to it over the weekend.

David
Comment 36 Jason Baron 2012-03-16 19:36:35 EDT
Thanks David. 

Just to be clear, I think we just want to test the patch from comment #31. I wouldn't bother with the patch from comment #25.

-Jason
Comment 37 David Castro 2012-03-19 07:13:01 EDT
Jason,

Good news.  It does indeed appear that dovecot is happier, at the same load tested as before, with this new patch.  FYI, I simply built the patched kernel with the following command, with the patch from #31 added to the spec file, using configs/kernel-2.6.18-x86_64.config and the kernel-2.6.18-308.el5 kernel source:

  rpmbuild -bb --with baseonly --target=`uname -m` kernel.spec

So, as for the issues that were causing us problems (Panic: epoll_ctl...), this patch appears to prevent them.

Many thanks,

David
Comment 38 Jason Baron 2012-03-19 14:49:18 EDT
Hi David,

Thanks for taking the time to confirm the fix, and for helping us get to the bottom of this issue!

Thanks,

-Jason
Comment 39 Jason Baron 2012-03-19 14:52:01 EDT
I've created a new bug:

Bug 804778 - Excessive epoll nesting fix too restrictive

For tracking purposes. Please re-direct all future comments about this to that bz.

Thanks!

-Jason
Comment 40 Jonathan Peatfield 2012-04-10 15:47:27 EDT
Hmm trying to read https://bugzilla.redhat.com/show_bug.cgi?id=804778 gives me an Access Denied error.  Is that bz intended to be marked private?
Comment 41 Nick Urbanik 2012-05-04 01:56:11 EDT
Bug 804778 remains unreadable, after nearly an additional month.  Is it a big secret?
Comment 42 Robert Scheck 2012-05-04 05:17:05 EDT
I have cross-filed case 00635459 in the Red Hat customer portal, because we
see Postfix watchdog timeouts that seem to be related to that from our point
of view. And as bug #804778 is still marked as private, I'm abusing this one
for the time being.
Comment 43 Alan Brown 2012-05-23 07:42:19 EDT
<aol> 

Me too

CaseID 00647510

</aol>
Comment 44 Alan Brown 2012-05-23 09:47:44 EDT
Jason: Is there a test or hotfix kernel with this patch onboard?
Comment 45 Richard Karhuse 2012-05-25 16:08:05 EDT
This also appears to hang Apache under any sort of reasonable load.

CaseID:  00647579 

Also, see bug:  https://bugzilla.redhat.com/show_bug.cgi?id=807860

Some other info from the Net for Apache:  http://bugs.centos.org/view.php?id=5634
Comment 46 Jason Baron 2012-05-25 16:27:18 EDT
In response to comment #44: yes, there is a kernel currently undergoing testing with a fix for this issue. It should be released shortly (I realize that this is a critical issue). The fix, as mentioned, is being tracked under bz #804778. Thanks.
Comment 47 Jonathan Peatfield 2012-05-25 17:14:12 EDT
It would probably avoid repeated questions if either bz #804778 were made readable/public or a brief comment were added as to why it isn't visible...
Comment 48 Jason Baron 2012-05-25 17:22:56 EDT
There really isn't any additional info in 804778; I was only referring to it b/c it should be updated to indicate when the fix is released. There is no reason for it not to be more public, short of my lack of knowledge on how to make bugzillas more visible. I just tried to open it up more. Let me know if you can't view it. Thanks.
Comment 49 Jonathan Peatfield 2012-05-25 17:27:51 EDT
Sorry for the noise.  Sadly #804778 still shows "You are not authorized to access bug #804778." and doesn't give any reasons.
Comment 50 Akemi Yagi 2012-06-02 11:33:55 EDT
I can see that the patch is now in the latest kernel 2.6.18-308.8.1.el5 released on 2012-05-29.

* Mon Apr 30 2012 Alexander Gordeev <agordeev@redhat.com> [2.6.18-308.7.1.el5]
- [fs] epoll: Don't limit non-nested epoll paths (Jason Baron) [809380 804778]
Comment 51 errata-xmlrpc 2012-06-20 03:38:01 EDT
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 6

Via RHSA-2012:0862 https://rhn.redhat.com/errata/RHSA-2012-0862.html
Comment 53 errata-xmlrpc 2012-07-31 16:06:34 EDT
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 6.2 EUS - Server Only

Via RHSA-2012:1129 https://rhn.redhat.com/errata/RHSA-2012-1129.html
