Bug 17175

Summary:	under load: YPBINDPROC_DOMAIN: Domain not bound
Product:	[Retired] Red Hat Linux	Reporter:	Scott Doty <scott>
Component:	ypserv	Assignee:	Alexander Larsson <alexl>
Status:	CLOSED RAWHIDE	QA Contact:
Severity:	medium	Docs Contact:
Priority:	high
Version:	7.0	CC:	filo_rom, teg
Target Milestone:	---	Keywords:	Security
Target Release:	---
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2002-03-29 03:28:16 UTC	Type:	---

Description Scott Doty 2000-09-01 11:33:23 UTC

With both RH 6.2 and RH 7.0 beta, ypbind can be made to become unbound from
the NIS domain for several seconds.  We observed this behavior when our RH 6.2
mail hosts would occasionally bounce mail to legitimate addresses.  Since RH 6.2
and RH 7.0 beta have different ypbinds, one would suspect a problem with ypserv.
(I have reservations about that interpretation, the reasons for which I will
not post here; please email me directly for that, as well as for the tools to
duplicate the failure on RH 7.0 beta.)

* Debugging output from ypbind -d:
Check new for fastest server.
ping host '127.0.0.1', domain 'nisdomain'
ypbindproc_domain_2_svc (nisdomain)      
Status: YPBIND_FAIL_VAL                  
(last two messages repeated 24 times)    
Answer for domain 'nisdomain' from server '127.0.0.1'

Inspection of the ypbind code reveals that running ypbind with "-no-ping" will
avoid this problem.  I am testing that now.

* /etc/yp.conf:
ypserver 127.0.0.1

* /var/yp/securenets:
255.0.0.0 127.0.0.1

* /etc/ypserv.conf:
127.0.0.1       :       *       :       none    :       no

* /etc/rc.d/init.d/ypbind change:
#OTHER_YPBIND_OPTS="-broadcast" #(commented out)

 -Scott <scott>

Comment 1 Scott Doty 2000-09-01 19:41:24 UTC

Results of looking for the bug with ypbind running with the "-no-ping" flag:
after hammering yp for 9 hours, no failure.

Comment 2 Trond Eivind Glomsrød 2000-09-01 20:15:41 UTC

Why would this be "security"?

Comment 3 Scott Doty 2000-09-01 20:19:38 UTC

Vulnerability to a denial of service attack is a "level 0" insecurity as defined
in the rainbow series.

http://www.radium.ncsc.mil/tpep/library/rainbow/index.html

Comment 4 Scott Doty 2000-09-01 22:22:09 UTC

 ...however, for the life of me, I can't find the table that listed
"denial of service" as a "level 0 insecurity" (or threat or vulnerability).

So until I find that reference, I'll just point out that a DoS is commonly
considered a security issue.

Why the question?  Does your handling change if it is "only" a DoS?

Comment 5 Matt Wilson 2000-10-07 23:44:18 UTC

Florian, do your new packages fix this?  If so, please release them as a
security erratum.

Comment 6 Alexander Larsson 2002-03-26 20:45:38 UTC

What sort of load is required to trigger this? Can you mail me the setup you use
to reproduce it?

Comment 7 Scott Doty 2002-03-26 23:52:35 UTC

Get the test case scripts from
ftp://ftp.sonic.net/pub/users/scott/pinstripe-yp/__try__/yphammer.tar.gz

 -Scott

Comment 8 Scott Doty 2002-03-27 00:12:37 UTC

And oh yes:  though it's been a while, I believe I used the
h_one_match_per_second.pl script to demonstrate that the problem
wasn't related to load, but only to ypbind's occasional "repinging"
of servers.

(Of course, a server undergoing a lot of yp load will be much more likely
to use ypbind during the reping/failure window...)
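
To put a rough number on that parenthetical, here is a back-of-the-envelope
sketch. It assumes lookups arrive as a Poisson process, which is my assumption
and not something measured in this bug; the window length and lookup rates are
hypothetical inputs, not values taken from ypbind.

```python
import math


def p_hit_unbound_window(lookups_per_second: float, window_seconds: float) -> float:
    """Probability that at least one lookup lands inside one unbound window.

    Assumes Poisson arrivals (an illustrative assumption, not measured here).
    """
    if lookups_per_second < 0 or window_seconds < 0:
        raise ValueError("rate and window must be non-negative")
    return 1.0 - math.exp(-lookups_per_second * window_seconds)


if __name__ == "__main__":
    # Hypothetical numbers: a 3-second window at increasing lookup rates.
    for rate in (0.01, 0.1, 1.0, 10.0):
        print(f"{rate:>5} lookups/s -> {p_hit_unbound_window(rate, 3.0):.3f}")
```

Under that model the window itself does not depend on load; a busier host is
simply more likely to have a lookup in flight while the domain is unbound.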

 -Scott

Comment 9 Scott Doty 2002-03-27 01:22:43 UTC

I just ran my "yphammer" tools on a RH 7.2 box -- failure induced.
I'm now running ypbind with the "-no-ping" flag to see if our workaround
still works.

 -Scott

Comment 10 Alexander Larsson 2002-03-27 01:59:24 UTC

I got loads of errors from h_ypcat.pl after a while. It complains that the
portmapper didn't work. Do you get that too?

It seems there is some sort of DoS protection in the portmapper. If it gets a
load of requests from one ip it refuses to handle that ip for a while. If ypbind
were to reping the yp server when this happened, that could cause it to become
unbound, I guess.

I'm not sure this is what you're experiencing though. I will try it out more
tomorrow.

Comment 11 Alexander Larsson 2002-03-27 16:13:50 UTC

Can you try the packages at:
http://people.redhat.com/alexl/RPMS/

I'm running the yphammer scripts now, and I haven't seen it become unbound yet.

Comment 12 Scott Doty 2002-03-27 22:34:50 UTC

I'm trying yphammer on the new packages -- while the server no longer
becomes unbound, I'm seeing this error:
   do_ypcall: clnt_call: RPC: Timed out

I couldn't find anything about portmap in /var/log/messages or /var/log/secure

 -Scott

Comment 13 Scott Doty 2002-03-27 22:55:42 UTC

Additionally, cat.out is showing the following error:

  yp_all: clnt_call: RPC: Timed out
  No such map passwd.byname. Reason: RPC failure on NIS operation

With everything running full-bore, the ypmatch script children were timing
out on every read.  Killing the ypcat processes seemed to bring them back.

 -Scott

Comment 14 Alexander Larsson 2002-03-27 23:58:29 UTC

That somewhat matches the behavior I saw yesterday except for the error message.
The ypcat script made yp not work from that particular ip for a while (it worked
from another machine at the same time though). I suspect there is a rate limiter
per client ip-number somewhere.

Today I saw the "clnt_call: RPC: Timed out" too. I think this just means that
the yp server is overloaded and the reply timed out. But ypbind never became
unbound.

I suspect these changes fixed it:

2001-10-09  Thorsten Kukuk  <kukuk>

	* src/serv_list.c (update_bindingfile): Make more robust, don't
          truncate old files before we have all data.

2001-08-13  Thorsten Kukuk  <kukuk>

	* src/serv_list.c (find_domain): Fix comment, get read lock back if
	  we try a second time to find a active server.
	  (test_bindings): Don't search for fastest server if current one
	  is valid and set with ypset.
	  (test_bindings): Don't invalidate old data before we have new data.
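
The common idea in those entries ("don't truncate old files before we have all
data", "don't invalidate old data before we have new data") can be sketched as
a write-then-rename update. This is only an illustration of the general
pattern in Python; ypbind is written in C, and I have not checked whether the
actual fix uses a rename. The function name and the temp-file prefix are
made up for the example.

```python
import os
import tempfile


def update_binding_file(path: str, data: bytes) -> None:
    """Replace `path` with `data` without ever exposing a truncated file.

    Readers see either the complete old contents or the complete new
    contents, never an empty or half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new contents to a temporary file in the same directory,
    # so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-binding-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Atomic on POSIX: the old file stays valid until this succeeds.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, drop the temp file and leave the old data alone.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

The contrast is with opening the existing file for writing first, which
truncates it and leaves a window in which a reader finds no binding at all.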

Comment 15 Scott Doty 2002-03-28 22:27:08 UTC

Well, this behavior still isn't going to work with a production server
with this sort of load.  The equivalent of multiple ypcat(1) processes
running isn't out of the question, if sendmail is set to do GCOS matching.
I think the "rate limiting" is a bogus decision, as a yp_all() necessarily
works by a query for each line in the database.  In our case:

  # ypcat passwd | wc -l
    18581

BTW, it appears that yp_match() in the C library for 7.2 talks directly
with ypserv -- stracing the yphammer match script shows this to be the
case.  Also, ypcat(1) and ypmatch(1) work without ypbind running at all.

I'm currently running h_ypmatch.pl with multiple children to see if
that breaks it.  Right now I'm hammering it with 500 "match" children and
no "cat" children -- so far, so good.

Finally, note that the h_ypmatch.pl script has an off-by-one error when
randomly selecting a uname to look up:

--- h_ypmatch.pl~       Tue Mar 26 21:20:45 2002
+++ h_ypmatch.pl        Thu Mar 28 14:06:06 2002
@@ -41,7 +41,7 @@
        }
 close(Y) || die("close ypcat failed");
 $userbracket = scalar @users;
-$userbracket++;
+#$userbracket++;
 warn("reloaded");
 0;
 }
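
For anyone reading along, a minimal sketch of why the extra increment matters.
It assumes the script picks a random index below $userbracket (something like
int(rand($userbracket))), which I have not quoted here; the user names are
hypothetical. Returning None stands in for Perl's undef on an out-of-range
index.

```python
import random


def pick_user(users, bracket):
    """Pick users[i] for a random i in [0, bracket); None if i is out of range."""
    i = random.randrange(bracket)
    return users[i] if i < len(users) else None


if __name__ == "__main__":
    users = ["alice", "bob", "carol"]  # hypothetical user names
    # bracket == len(users): every draw is a valid index.
    print([pick_user(users, len(users)) for _ in range(5)])
    # bracket == len(users) + 1: some draws fall one past the end.
    print([pick_user(users, len(users) + 1) for _ in range(5)])
```

With the increment in place, roughly one lookup in (number of users + 1) asks
for an empty name, which shows up as a spurious match failure.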

Comment 16 Alexander Larsson 2002-03-29 03:28:11 UTC

Yeah, I noticed and fixed the off by one bug.

Basically, if you overload a server, any server, yp or not, it's gonna start
dropping requests. There is just no theoretical way to serve infinite amounts of
clients. You're gonna start dropping packets, overflowing queues etc. And the
clients *are* eventually gonna start getting timeouts, even if the server
machine has enough bandwidth and memory to queue the requests.

One thing you can do to combat this is to limit the amount of work you do for
any particular host, so that an overload from one host doesn't make you DoS
other hosts. I'm not sure ypserv does this, it was pure speculation on my
part, but the behaviour looked like it might.
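
A minimal sketch of what such per-host limiting could look like, as a token
bucket keyed by client address. This is purely illustrative: it is not code
from ypserv or the portmapper, and the class name, rate and burst values are
all invented for the example.

```python
import time
from collections import defaultdict


class PerHostLimiter:
    """Token bucket per client host: `rate` requests/s, bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so the behaviour can be tested
        self._tokens = defaultdict(lambda: float(burst))
        self._last = {}

    def allow(self, host):
        """Return True if a request from `host` may be served right now."""
        now = self.clock()
        last = self._last.get(host, now)
        self._last[host] = now
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, self._tokens[host] + (now - last) * self.rate)
        if tokens >= 1.0:
            self._tokens[host] = tokens - 1.0
            return True
        self._tokens[host] = tokens
        return False


if __name__ == "__main__":
    limiter = PerHostLimiter(rate=100.0, burst=200)  # hypothetical limits
    print(limiter.allow("192.0.2.10"))
```

Each host draws from its own bucket, so one client running flat out gets
refused for a while without affecting requests from other addresses.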

If you think that this is "bogus" then you can upgrade your machine to an
infinitely fast machine on an infinitely fast network to fix this problem.

That said, there might be some way to make it scale slightly better for this
load so that you can handle a bit higher load. But there is no magic bullet, you
need to scale up the server machine as workloads increase.

About ypcat and ypmatch working with ypbind not running, that is only partially
true. You must have run ypbind once, and the binding files it writes in
/var/yp/binding for the current domain must still be valid (e.g. the yp server
didn't change ip or port). The binding files are normally removed if you use the
ypbind init script to shut down ypbind, but if you kill it manually they'll
still be there.

All yp accesses (even the ones in libc) go directly to the server. All ypbind
does is look up the best yp server to use and write its address and port in
/var/yp/binding. No communication ever goes "through" ypbind.

Comment 17 Alexander Larsson 2002-04-08 15:45:36 UTC

Anyway, I'll consider this particular bug fixed.