Bug 17175
Summary: | under load: YPBINDPROC_DOMAIN: Domain not bound | | |
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Scott Doty <scott> |
Component: | ypserv | Assignee: | Alexander Larsson <alexl> |
Status: | CLOSED RAWHIDE | Severity: | medium |
Priority: | high | Version: | 7.0 |
Hardware: | i386 | OS: | Linux |
CC: | filo_rom, teg | Keywords: | Security |
Doc Type: | Bug Fix | Last Closed: | 2002-03-29 03:28:16 UTC |
Description
Scott Doty
2000-09-01 11:33:23 UTC
Results of looking for the bug with ypbind running with the "-no-ping" flag: after hammering yp for 9 hours, no failure.

Why would this be "security"?

Vulnerability to a denial of service attack is a "level 0" insecurity as defined in the rainbow series. http://www.radium.ncsc.mil/tpep/library/rainbow/index.html ...however, for the life of me, I can't find the table that listed "denial of service" as a "level 0 insecurity" (or threat or vulnerability). So until I find that reference, I'll just point out that a DoS is commonly considered a security issue. Why the question? Does your handling change if it is "only" a DoS?

Florien, do your new packages fix this? If so, please release them as a security errata.

What sort of load is required to get this? Can you mail me the setup you use to reproduce this?

Get the test case scripts from ftp://ftp.sonic.net/pub/users/scott/pinstripe-yp/__try__/yphammer.tar.gz
-Scott

And oh yes: though it's been a while, I believe I used the h_one_match_per_second.pl script to demonstrate that the problem wasn't related to load, but only to ypbind's occasional "repinging" of servers. (Of course, a server undergoing a lot of yp load will be much more likely to use ypbind during the reping/failure window...)
-Scott

I just ran my "yphammer" tools on a RH 7.2 box -- failure induced. I'm now running ypbind with the "-no-ping" flag to see if our workaround still works.
-Scott

I got loads of errors from h_ypcat.pl after a while. It complains that the portmapper didn't work. Do you get that too?

It seems there is some sort of DoS protection in the portmapper. If it gets a load of requests from one IP, it refuses to handle that IP for a while. If ypbind were to reping the yp server while this was happening, that could cause it to become unbound, I guess. I'm not sure this is what you're experiencing, though. I will try it out more tomorrow.
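
To make the speculation above concrete, here is a minimal sketch of the kind of per-IP protection being guessed at. It is not taken from portmap or ypserv, and nothing in this thread confirms either one works this way; the class name, the parameters, and the "penalty window" behavior are all hypothetical. It is written in Python rather than the Perl of the yphammer scripts.

```python
import time
from collections import defaultdict, deque


class PerClientLimiter:
    """Hypothetical per-IP limiter; NOT the actual portmap or ypserv logic."""

    def __init__(self, max_requests, window_seconds, penalty_seconds,
                 clock=time.monotonic):
        self.max_requests = max_requests        # requests allowed per window
        self.window_seconds = window_seconds    # length of the sliding window
        self.penalty_seconds = penalty_seconds  # how long an offender is ignored
        self._clock = clock                     # injectable for testing
        self._recent = defaultdict(deque)       # ip -> timestamps of accepted requests
        self._blocked_until = {}                # ip -> time the penalty ends

    def allow(self, client_ip):
        """Return True if a request from client_ip should be served now."""
        now = self._clock()

        # An IP that is still serving a penalty is refused outright.
        if now < self._blocked_until.get(client_ip, 0.0):
            return False

        # Forget accepted requests that have aged out of the window.
        recent = self._recent[client_ip]
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()

        # Too many requests in the window: start a penalty for this IP only.
        if len(recent) >= self.max_requests:
            self._blocked_until[client_ip] = now + self.penalty_seconds
            recent.clear()
            return False

        recent.append(now)
        return True
```

Under this assumption, a ypcat flood from one host would trip the penalty for that host alone, and any request arriving from the same IP during the penalty (a ypbind reping, say) would be refused while other clients carry on normally.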
Can you try the packages at: http://people.redhat.com/alexl/RPMS/ I'm running the yphammer scripts now, and I haven't seen it become unbound yet.

I'm trying yphammer on the new packages -- while the server no longer becomes unbound, I'm seeing this error:

    do_ypcall: clnt_call: RPC: Timed out

I couldn't find anything about portmap in /var/log/messages or /var/log/secure
-Scott

Additionally, cat.out is showing the following error:

    yp_all: clnt_call: RPC: Timed out
    No such map passwd.byname. Reason: RPC failure on NIS operation

With everything running full-bore, the ypmatch script children were timing out on every read. Killing the ypcat processes seemed to bring them back.
-Scott

That somewhat matches the behavior I saw yesterday, except for the error message. The ypcat script made yp not work from that particular IP for a while (it worked from another machine at the same time, though). I suspect there is a rate limiter per client IP address somewhere.

Today I saw the "clnt_call: RPC: Timed out" too. I think this just means that the yp server is overloaded and the reply timed out. But ypbind never became unbound. I suspect these changes fixed it:

    2001-10-09 Thorsten Kukuk <kukuk>
    * src/serv_list.c (update_bindingfile): Make more robust, don't truncate old files before we have all data.

    2001-08-13 Thorsten Kukuk <kukuk>
    * src/serv_list.c (find_domain): Fix comment, get read lock back if we try a second time to find a active server.
    (test_bindings): Don't search for fastest server if current one is valid and set with ypset.
    (test_bindings): Don't invalidate old data before we have new data.

Well, this behavior still isn't going to work with a production server with this sort of load. The equivalent of multiple ypcat(1) processes running isn't out of the question, if sendmail is set to do GCOS matching. I think the "rate limiting" is a bogus decision, as a ypall() necessarily works by a query for each line in the database. In our case:

    # ypcat passwd | wc -l
    18581

BTW, it appears that ypmatch() in the C library for 7.2 talks directly with ypserv -- stracing the yphammer match script shows this to be the case. Also, ypcat(1) and ypmatch(1) work without ypbind running at all.

I'm currently running h_ypmatch.pl with multiple children to see if that breaks it. Right now I'm hammering it with 500 "match" children and no "cat" children -- so far, so good.

Finally, note that the h_ypmatch.pl script has an off-by-one error when randomly selecting a uname to look up:

    --- h_ypmatch.pl~ Tue Mar 26 21:20:45 2002
    +++ h_ypmatch.pl Thu Mar 28 14:06:06 2002
    @@ -41,7 +41,7 @@
     }
     close(Y) || die("close ypcat failed");
     $userbracket = scalar @users;
    -$userbracket++;
    +#$userbracket++;
     warn("reloaded");
     0;
     }

Yeah, I noticed and fixed the off-by-one bug.

Basically, if you overload a server, any server, yp or not, it's going to start dropping requests. There is just no theoretical way to serve an infinite number of clients. You're going to start dropping packets, overflowing queues, etc. And the clients *are* eventually going to start getting timeouts, even if the server machine has enough bandwidth and memory to queue the requests.

One thing you can do to combat this is to limit the amount of work you do for any particular host, so that an overload from one host doesn't make you DoS other hosts. I'm not sure ypserv does this; it was pure speculation on my part, but the behaviour looked like it might. If you think that this is "bogus", then you can upgrade your machine to an infinitely fast machine on an infinitely fast network to fix this problem.

That said, there might be some way to make it scale slightly better for this load so that you can handle a bit higher load. But there is no magic bullet; you need to scale up the server machine as workloads increase.

About ypcat and ypmatch working with ypbind not running: that is only partially true. You must have run ypbind once, and the binding files it writes in /var/yp/bindings for the current domain must still be valid (e.g. the yp server didn't change IP or port). The binding files are normally removed if you use the ypbind init script to shut down ypbind, but if you kill it manually they'll still be there.

All yp accesses (even the ones in libc) go directly to the server. All ypbind does is look up the best yp server to use and write its address and port in /var/yp/bindings. No communication ever goes "through" ypbind.

Anyway, I'll consider this particular bug fixed.
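
For anyone reusing the yphammer scripts, the off-by-one fixed earlier in this thread can be illustrated with a small sketch. It assumes the script picks a name by scaling a random fraction by $userbracket and truncating to an index; the thread does not show that selection code, so pick_user and its arguments are hypothetical stand-ins, written in Python rather than the script's Perl.

```python
import random


def pick_user(users, bracket, rng=random):
    """Pick a user by scaling a random fraction in [0, 1) by `bracket`.

    Hypothetical stand-in for the selection step in h_ypmatch.pl.
    Returns None when the computed index falls past the end of the
    list, which is where a lookup of an empty name would come from.
    """
    index = int(rng.random() * bracket)
    return users[index] if index < len(users) else None


def possible_indexes(bracket):
    """Every index the scaled-and-truncated draw can produce."""
    return list(range(bracket))
```

With a bracket of len(users) every possible index is valid. With len(users) + 1, which is what the extra $userbracket++ produced, one index in every len(users) + 1 draws points one slot past the last name.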