On an up-to-date Red Hat 6.2 SMP machine, the new ypbind-mt, i.e. ypbind-1.7-
0.6.x.i386.rpm, failed after only 3 hours of operation with a message in the
logs about an svc_run failure.
On other uniprocessor machines it runs, but a cron job running rmmod -as
constantly produces these errors, which did not happen with ypbind-3.3:
yp_all: clnt_call: RPC: Timed out
This error also happens on other occasions. (Since the new ypbind failed on
me once, I made a cron job to test it on all the other machines, and it very
often produces the error above.) Again, none of these problems happened with
the old ypbind.
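For reference, the test cron job amounts to something like the sketch below. The ypwhich probe, the log path, and the message wording are my own choices for illustration, not the exact job I run; any yp_* client call (e.g. ypcat passwd.byname) exercises the same RPC path that produces the timeout above.

```shell
#!/bin/sh
# Minimal NIS health check, suitable for running from cron.
# Hypothetical sketch: probe command and log path are assumptions.

LOG=/tmp/nis-check.log

nis_check() {
    if ! command -v ypwhich >/dev/null 2>&1; then
        # Machine has no NIS client tools at all; nothing to test.
        echo "nis-check: ypwhich not installed, skipping"
        return 0
    fi
    if ypwhich >/dev/null 2>&1; then
        echo "nis-check: OK"
    else
        # A failure here is what shows up in client programs as
        # "yp_all: clnt_call: RPC: Timed out".
        echo "nis-check: FAILED on $(hostname)" | tee -a "$LOG"
        return 1
    fi
}

nis_check
```

Run from cron, any non-empty output on the failure branch gets mailed to the job owner, which is how the timeouts were noticed so often.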
Upgrading ypserv to the latest 1.3.11 helped a great deal (though things are
still not perfect). Before the upgrade, ypserv would occasionally complain
about having too many children; it no longer does.
jakub: do you think this is glibc-related?
Hard to say. Maybe it is in how glibc reacts when it doesn't get a reply
quickly (would a longer timeout be better?), but then again, why were there no
timeouts with the old ypbind? My guess is that ypbind-mt queries ypserv
(overly?) aggressively, and the older version of ypserv simply couldn't keep
up with the 70 or so clients that were now also checking every 15 minutes
which server is fastest. The newer version of ypserv handles this much better.
I am not sure why: the ChangeLog explicitly mentions only RPC protocol fixes
and better handling of fork calls (the latter could explain why ypserv no
longer complains about too many children). RPC timeout problems still occur
occasionally, but at a rate where they really don't matter (once a day?).
I forgot to add a comment about ypbind-mt dying, which is pretty serious for
us. It has happened once more since the first time, but we had a monitoring
process restart it automatically. This is on our busiest machine, so I cannot
tell whether it is due to the load or to the fact that this machine is
dual-CPU (but ypbind-mt should already be threaded and safe, hm?), as it has
never happened on workstations with exactly the same package. It could be a
problem in glibc threads: is svc_run thread-safe?
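For what it's worth, the monitoring process is essentially the following. This is a sketch of the approach, not our exact script: the pidof lookup and the Red Hat 6.x init-script path are assumptions for illustration.

```shell
#!/bin/sh
# Watchdog for a dying ypbind, run every few minutes from cron.
# Hypothetical sketch: process lookup and init-script path are
# assumptions, adjust for the local setup.

watch_ypbind() {
    if pidof ypbind >/dev/null 2>&1; then
        echo "ypbind-watch: running"
    else
        echo "ypbind-watch: ypbind died, restarting"
        # On Red Hat 6.2 the init script lives here.
        [ -x /etc/rc.d/init.d/ypbind ] && /etc/rc.d/init.d/ypbind restart
    fi
}

watch_ypbind
```

The "restarting" message ends up in the cron mail, which is how we know it has died only once more so far.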
Do you still see this? Can you try our later packages?
I haven't seen this at all in recent versions. And the reporter mentioned that
it was basically "fixed" for him, so I'm closing this bug.