Red Hat Bugzilla – Bug 17175
under load: YPBINDPROC_DOMAIN: Domain not bound
Last modified: 2008-05-01 11:37:58 EDT
With both RH 6.2 and RH 7.0 beta, ypbind can be made to become unbound from
the NIS domain for several seconds. We observed this behavior when our RH
mail hosts occasionally bounced mail sent to legitimate addresses. Since RH 6.2
and RH 7.0 beta have different ypbinds, one would suspect a problem with
(I have reservations about that interpretation, for reasons I will
not post here;
please email me directly for those, as well as for the tools to reproduce the
failure on RH 7.0 beta.)
* Debugging output from ypbind -d:
Check new for fastest server.
ping host '127.0.0.1', domain 'nisdomain'
(last two messages repeated 24 times)
Answer for domain 'nisdomain' from server '127.0.0.1'
Inspection of the ypbind code suggests that running ypbind with "-no-ping"
should avoid this problem. I am testing that now.
127.0.0.1 : * : none : no
* /etc/rc.d/init.d/ypbind change:
#OTHER_YPBIND_OPTS="-broadcast" #(commented out)
Results of testing for the bug with ypbind running with the "-no-ping" flag:
after hammering YP for 9 hours, no failure.
Why would this be "security"?
Vulnerability to a denial of service attack is a "level 0" insecurity as defined
in the Rainbow Series.
...however, for the life of me, I can't find the table that listed
"denial of service" as a "level 0 insecurity" (or threat or vulnerability).
So until I find that reference, I'll just point out that a DoS is commonly
considered a security issue.
Why the question? Does your handling change if it is "only" a DoS?
Florien, do your new packages fix this? If so, please release them as a
What sort of load is required to trigger this? Can you mail me the setup you use to
Get the test case scripts from
And oh yes: though it's been a while, I believe I used the
h_one_match_per_second.pl script to demonstrate that the problem
wasn't related to load, but only to ypbind's occasional "repinging"
(Of course, a server undergoing a lot of yp load will be much more likely
to use ypbind during the reping/failure window...)
I just ran my "yphammer" tools on a RH 7.2 box -- failure induced.
I'm now running ypbind with the "-no-ping" flag to see if our workaround
I got loads of errors from h_ypcat.pl after a while. It complains that the
portmapper didn't work. Do you get that too?
It seems there is some sort of DoS protection in the portmapper: if it gets a
flood of requests from one IP, it refuses to handle that IP for a while. If ypbind
were to reping the YP server while this was happening, that could cause it to become
unbound, I guess.
I'm not sure this is what you're experiencing though. I will try it out more
Can you try the packages at:
I'm running the yphammer scripts now, and I haven't seen it become unbound yet.
I'm trying yphammer on the new packages -- while the server no longer
becomes unbound, I'm seeing this error:
do_ypcall: clnt_call: RPC: Timed out
I couldn't find anything about portmap in /var/log/messages or /var/log/secure
Additionally, cat.out is showing the following error:
yp_all: clnt_call: RPC: Timed out
No such map passwd.byname. Reason: RPC failure on NIS operation
With everything running full-bore, the ypmatch script children were timing
out on every read. Killing the ypcat processes seemed to bring them back.
That somewhat matches the behavior I saw yesterday except for the error message.
The ypcat script made YP stop working from that particular IP for a while (it worked
from another machine at the same time, though). I suspect there is a rate limiter
per client IP address somewhere.
Today I saw the "clnt_call: RPC: Timed out" too. I think this just means that
the YP server is overloaded and the reply timed out. But ypbind never became
I suspect these changes fixed it:
2001-10-09 Thorsten Kukuk <email@example.com>
* src/serv_list.c (update_bindingfile): Make more robust, don't
truncate old files before we have all data.
2001-08-13 Thorsten Kukuk <firstname.lastname@example.org>
* src/serv_list.c (find_domain): Fix comment, get read lock back if
we try a second time to find a active server.
(test_bindings): Don't search for fastest server if current one
is valid and set with ypset.
(test_bindings): Don't invalidate old data before we have new data.
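The update_bindingfile entry describes not truncating the old file before all the new data is in hand. A common way to get that property is write-to-a-temporary-file-then-rename, sketched below in Python. The helper name, the file name nisdomain.2 used in the usage, and the byte contents are hypothetical; this shows the general pattern, not ypbind's actual implementation or its binding-file format.

```python
import os
import tempfile


def update_binding_file(path, data):
    """Replace `path` with `data` (bytes) so readers never see a partial file.

    Hypothetical helper illustrating write-then-rename; it does not
    reproduce ypbind's real binding-file format.
    """
    directory = os.path.dirname(path) or "."
    # Write the complete new contents to a temp file in the same directory,
    # so the rename below stays on one filesystem and is atomic on POSIX.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".binding-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # Swap in the new file; until this point the old file is untouched.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

A reader that opens the path at any moment sees either the complete old file or the complete new one, because os.replace swaps the directory entry in a single step.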
Well, this behavior still isn't going to work with a production server
with this sort of load. The equivalent of multiple ypcat(1) processes
running isn't out of the question, if sendmail is set to do GCOS matching.
I think the "rate limiting" is a bogus decision, since a ypall() necessarily
works by issuing a query for each line in the database. In our case:
# ypcat passwd | wc -l
BTW, it appears that ypmatch() in the C library for 7.2 talks directly
to ypserv; stracing the yphammer match script shows this to be the
case. Also, ypcat(1) and ypmatch(1) work without ypbind running at all.
I'm currently running h_ypmatch.pl with multiple children to see if
that breaks it. Right now I'm hammering it with 500 "match" children and
no "cat" children -- so far, so good.
Finally, note that the h_ypmatch.pl script has an off-by-one error when
randomly selecting a uname to look up:
--- h_ypmatch.pl~ Tue Mar 26 21:20:45 2002
+++ h_ypmatch.pl Thu Mar 28 14:06:06 2002
@@ -41,7 +41,7 @@
close(Y) || die("close ypcat failed");
$userbracket = scalar @users;
Yeah, I noticed and fixed the off by one bug.
Basically, if you overload a server, any server, yp or not, it's gonna start
dropping requests. There is just no theoretical way to serve infinite amounts of
clients. You're gonna start dropping packets, overflowing queues, etc. And the
clients *are* eventually gonna start getting timeouts, even if the server
machine has enough bandwidth and memory to queue the requests.
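The overload argument can be checked with a toy single-server model in Python (all names and parameters are hypothetical): once requests arrive faster than they are served, a bounded queue fills up and every further excess request is dropped, however the queue is sized.

```python
def simulate(arrivals_per_tick, served_per_tick, queue_limit, ticks):
    """Toy single-server queue; returns (served, dropped) totals."""
    queued = served = dropped = 0
    for _ in range(ticks):
        # New requests enter the queue until it is full; the rest are dropped.
        accepted = min(arrivals_per_tick, queue_limit - queued)
        dropped += arrivals_per_tick - accepted
        queued += accepted
        # The server completes as many queued requests as it can this tick.
        done = min(served_per_tick, queued)
        queued -= done
        served += done
    return served, dropped
```

For example, with 10 arrivals and 8 completions per tick and a queue limit of 20, the model serves 800 requests over 100 ticks and drops 188; with 8 arrivals and 10 completions per tick it drops none. A larger queue only postpones the drops, and in a real server the extra queued requests would also wait longer, which clients see as timeouts.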
One thing you can do to combat this is to limit the amount of work you do for
any particular host, so that an overload from one host doesn't make you DoS
other hosts. I'm not sure ypserv does this; it was pure speculation on my
part, but the behaviour looked like it might.
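One way to limit the work done for any single host is a per-client token bucket. The Python sketch below is a hypothetical illustration of that idea only; as said above, it is not known whether ypserv or the portmapper does anything like this, and the class name, rate, and burst values are invented.

```python
class PerClientLimiter:
    """Hypothetical per-client token bucket (not from ypserv or portmap)."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens refilled per second, per client
        self.burst = burst    # most tokens one client can save up
        self.buckets = {}     # client address -> (tokens, time of last update)

    def allow(self, client, now):
        """Return True if `client` may be served at time `now` (seconds)."""
        tokens, last = self.buckets.get(client, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client] = (tokens - 1, now)
            return True
        self.buckets[client] = (tokens, now)
        return False
```

With rate=10 and burst=5, one client may send 5 requests back to back, is then refused until its bucket refills at 10 tokens per second, and other clients are unaffected throughout.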
If you think that this is "bogus" then you can upgrade your machine to an
infinitely fast machine on an infinitely fast network to fix this problem.
That said, there might be some way to make it scale slightly better for this
load so that you can handle a bit higher load. But there is no magic bullet; you
need to scale up the server machine as workloads increase.
About ypcat and ypmatch working with ypbind not running, that is only partially
true. You must have run ypbind once, and the binding files it writes in
/var/yp/bindings for the current domain must still be valid (i.e. the YP server
hasn't changed IP or port). The binding files are normally removed if you use the
ypbind init script to shut down ypbind, but if you kill it manually they'll
still be there.
All YP accesses (even the ones in libc) go directly to the server. All ypbind
does is look up the best YP server to use and write its address and port in
/var/yp/bindings. No communication ever goes "through" ypbind.
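This division of labor can be sketched in Python: a ypbind-like step picks the fastest server that answered a ping and records it, and each lookup then reads that record and queries the server directly. The function names, the dict standing in for /var/yp/bindings, and the RTT values are hypothetical, and real ypbind's selection logic may differ in detail.

```python
def pick_fastest(ping_times):
    """Return the server with the smallest round-trip time, or None.

    ping_times maps a server address to its RTT in seconds, or to None
    if the server did not answer the ping.
    """
    answered = {s: rtt for s, rtt in ping_times.items() if rtt is not None}
    if not answered:
        return None  # nobody answered: the domain stays unbound
    return min(answered, key=answered.get)


def lookup(bindings, domain, query_server):
    """Client side: read the recorded binding, then talk to the server itself."""
    server = bindings.get(domain)  # stands in for the binding file
    if server is None:
        raise LookupError("Domain not bound")
    return query_server(server)  # the request never passes through ypbind
```

Because lookup only reads the recorded binding, a stale record left behind by a manually killed ypbind keeps lookups working exactly as long as that server's address and port stay valid.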
Anyway, I'll consider this particular bug fixed.