169815 – ypbind needs to report when it has lost all bindings

Summary: ypbind needs to report when it has lost all bindings

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	ypbind
Sub Component:
Version:	3.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Chris Feist
QA Contact:	Ben Levenson
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-10-03 22:47 UTC by Kurtis D. Rader
Modified:	2007-11-30 22:07 UTC (History)
CC List:	3 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-10-19 18:53:37 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
ypbind_log_msg.patch (3.29 KB, text/plain) 2006-05-16 23:02 UTC, IBM Bug Proxy	no flags	Details
ypbind_log_msg.patch (6.93 KB, text/plain) 2006-05-18 17:22 UTC, IBM Bug Proxy	no flags	Details
ypbind_log_msg.patch (7.00 KB, text/plain) 2006-05-23 19:25 UTC, IBM Bug Proxy	no flags	Details
ypbind_log_msg.patch (7.00 KB, text/plain) 2007-09-03 06:00 UTC, IBM Bug Proxy	no flags	Details

Description Kurtis D. Rader 2005-10-03 22:47:49 UTC

Description of problem:
When binding to a specific set of servers and not using broadcasting
the ypbind daemon does not emit any error when it is unable to bind
to any server. Also, even if broadcasting is enabled the error message
emitted when no NIS servers can be bound to is less than clear:

    ypbind[1234]: broadcast: RPC: : Timeout.

Furthermore, the ypbind code as a whole leaves a lot to be desired. For
example, error messages for conditions that a sysadmin needs to know
about are logged as level LOG_DEBUG. In other cases unexpected failures
result in no diagnostic message, not even at LOG_DEBUG level. For
example, if the malloc of the "pings" array in ping_all() fails the
function silently returns. In other cases it would make more sense to
use a LOG_INFO level rather than LOG_DEBUG. For example, in ping_all()
if a host doesn't respond a sysadmin might want to know that without
having to also wade through a lot of debug output.

There are also coding practices that lead to errors. For example, the
(struct binding *)->server array has no sentinel element yet there
are numerous loops which iterate over the array checking only for
server.host==NULL as the termination condition. This will, obviously,
lead to walking off the end of the array if _MAXSERVER servers are
defined for a domain.
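
To illustrate the sentinel problem, here is a minimal sketch (not the actual
ypbind code) of a loop that checks the array bound before testing host for
NULL. The names MAX_SERVERS, struct server_entry, struct binding_sketch and
count_servers() are made up for this example; MAX_SERVERS stands in for
_MAXSERVER and its value is arbitrary.

```c
#include <stddef.h>

#define MAX_SERVERS 3  /* stand-in for ypbind's _MAXSERVER; value is illustrative */

/* Hypothetical, simplified server slot. */
struct server_entry {
    const char *host;  /* NULL marks an unused slot */
};

/* Hypothetical, simplified binding: the array has no sentinel element. */
struct binding_sketch {
    struct server_entry server[MAX_SERVERS];
};

/* Count configured servers without reading past the end of the array. */
static size_t count_servers(const struct binding_sketch *b)
{
    size_t i;

    /* Test the bound first; the NULL test alone overruns a full array. */
    for (i = 0; i < MAX_SERVERS && b->server[i].host != NULL; ++i)
        ;
    return i;
}
```

With every slot in use the loop stops at MAX_SERVERS instead of reading
whatever follows the array.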

In the ping_all() function implementation that is compiled when
USE_BROADCAST is defined the remove_bindingfile() function is called
if ypservers were registered with the portmapper but none responded
to the clnt_call(YPPROC_DOMAIN_NONACK). Yet if no servers were found
registered with the portmapper it simply returns without calling
remove_bindingfile().  That seems inconsistent, and incorrect, to me.
Also, since "found" is initialized to -1 and never set to zero the
following condition will never be true and thus remove_bindingfile()
will never be called:

  if (!found)
    remove_bindingfile(list->domain);

For both of those reasons it isn't possible to check for the absence
of a domain binding file as an indication that binding has been lost.
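
The effect of the initial value can be shown in isolation. This is only a
sketch of the C semantics involved, not code from ypbind; cleanup_runs() is
a hypothetical helper that reports whether the "if (!found)" branch would be
taken for a given starting value.

```c
/*
 * Hypothetical helper: returns 1 if the "if (!found)" branch would run,
 * 0 otherwise, for the given initial value of found.
 */
static int cleanup_runs(int initial_found, int any_server_answered)
{
    int found = initial_found;

    if (any_server_answered)
        found = 1;

    /* !(-1) is 0 in C, so an initial value of -1 never reaches the branch. */
    return !found;
}
```

cleanup_runs(-1, 0) returns 0, which is the behaviour described above, while
cleanup_runs(0, 0) returns 1.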

The non USE_BROADCAST version of ping_all() has similar problems.

Also, the documentation states:

    If all given server are down, ypbind will not switch to
    use broadcast.

Yet the logic of do_binding() is

      if (!ping_all (&domainlist[i]) && domainlist[i].use_broadcast)
        do_broadcast (&domainlist[i]);

Obviously it was intended that you could specify a list of NIS hostnames
as well as use the "domain $nisdomain broadcast" directive. So it would
appear the documentation needs some clarification. The man page also
needs to document the limit on number of servers for a given domain.

In short, this code needs a thorough review. But for this specific
issue it would appear sufficient to fix the ping_all() function to use
a common exit code path when servers are defined and none respond. In
that code path add a log_msg() call. Whether LOG_WARNING or LOG_ERR log
level is appropriate is debatable.
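
A rough sketch of what such a common exit path might look like follows. It
is not a patch against ypbind: struct srv, ping_all_sketch(), last_log and
binding_file_removed are all invented for illustration, the message wording
is only an example, and the message is written to a buffer rather than
passed to log_msg() so the behaviour is easy to check.

```c
#include <stddef.h>
#include <stdio.h>

/* Illustrative server record; not ypbind's real struct. */
struct srv {
    const char *host;
    int registered;   /* nonzero if the portmapper knows a ypserv there */
    int answers;      /* nonzero if it replied to the domain query */
};

static char last_log[128];        /* captures the message instead of logging it */
static int binding_file_removed;  /* stands in for removing the binding file */

/* Try each server; return 1 if one answered, 0 otherwise. */
static int ping_all_sketch(const char *domain, const struct srv *list, size_t n)
{
    int found = 0;
    size_t i;

    last_log[0] = '\0';
    binding_file_removed = 0;

    for (i = 0; i < n; ++i) {
        if (!list[i].registered)
            continue;             /* not registered: keep looking, no early return */
        if (list[i].answers) {
            found = 1;
            break;
        }
    }

    if (!found) {
        /* Single exit path for every "nobody answered" case. */
        binding_file_removed = 1;
        snprintf(last_log, sizeof last_log,
                 "No response for domain '%s' from any server", domain);
    }
    return found;
}
```

Because the unregistered and unresponsive cases fall through to the same
block, the binding file handling and the log message cannot get out of step.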

Version-Release number of selected component (if applicable):
ypbind 1.12-5

Comment 1 IBM Bug Proxy 2006-05-16 23:02:56 UTC

Created attachment 129280 [details]
ypbind_log_msg.patch

Comment 2 IBM Bug Proxy 2006-05-16 23:03:15 UTC

----- Additional Comments From samudrala.com(prefers email via sri.com)  2006-05-16 19:04 EDT -------
 
Fix ypbind to log error/info messages when a server doesn't respond

This patch fixes ypbind to log error/info messages when a server doesn't
respond. Specifically it 
- adds a new LOG_ERR level log message when no response is received
  from any server listed in the configuration file.
- changes the conditional LOG_DEBUG level log message to unconditional
  LOG_INFO level log message when a particular server doesn't respond.
- fixes a couple of bugs in the ping_all() routines in serv_list.c:
  - 'found' incorrectly initialised to -1 instead of 0.
  - remove_bindingfile() not called in certain cases when no server
    is responding.

The patch is against ypbind-1.12-5.21.9.src.rpm.

Comment 3 IBM Bug Proxy 2006-05-17 16:42:03 UTC

----- Additional Comments From mikosh.com  2006-05-17 12:00 EDT -------
Testing this patch I found that it does log error messages when losing the
binding to an explicit NIS server.  However, it always displays the server name
listed in the /etc/yp.conf file rather than the currently bound server.  For
example:

In my test example, here is the /etc/yp.conf file I used:

domain yptest server linux6
broadcast

There are two NIS servers in my test env:  linux6, and linux7

Initially, I have ypserv down on linux6 and up on linux7, and when I start
ypbind, the following messages are logged:

May 17 10:36:58 linux3 ypbind: ypbind startup succeeded
May 17 10:36:58 linux3 ypbind: bound to NIS server linux7.rsbc.ibm.com

Then, when I bring ypserv down on linux7 and up on linux6 the following messages
are logged: 

May 17 10:37:38 linux3 ypbind[31948]: host 'linux6' doesn't answer.
May 17 10:37:38 linux3 ypbind[31948]: No response for domain 'yptest' from any server

The above error message should indicate that 'linux7' doesn't answer as it was
the server it was bound to.

As I continued to toggle which server was up and which was down, the same two
error messages were displayed; however, they always indicated 'linux6' doesn't answer.
Again, I suspect that the patch is not using the server name of the current
server, but rather the name from the yp.conf file.

In addition, it would be very helpful if a message was logged when the client
was rebound to a server, and which one.

Comment 4 IBM Bug Proxy 2006-05-17 16:57:54 UTC

----- Additional Comments From samudrala.com(prefers email via sri.com)  2006-05-17 13:01 EDT -------
Ross,

Instead of broadcast, could you try the
following configuration, which explicitly lists
all the servers in /etc/yp.conf:

domain yptest server linux6
domain yptest server linux7

In the 
  host 'linux6' doesn't answer.
message, the hostname displayed is the name of the server that
didn't respond.
The ping_all() routine tries all the servers that are explicitly
listed in yp.conf. If linux7 is not listed in yp.conf, you will not
see a "host doesn't answer" message for linux7.

Also, I am not sure where you are getting these messages from:
 May 17 10:36:58 linux3 ypbind: ypbind startup succeeded
 May 17 10:36:58 linux3 ypbind: bound to NIS server linux7.rsbc.ibm.com

I didn't see them in my log, and I could not locate them in the source
code. Could you find where these messages appear in the source?

Comment 5 IBM Bug Proxy 2006-05-17 18:52:36 UTC

changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mikosh.com




------- Additional Comments From samudrala.com(prefers email via sri.com)  2006-05-17 14:54 EDT -------
Ross,
Please see my response in the previous comment.

Comment 6 IBM Bug Proxy 2006-05-17 19:53:00 UTC

----- Additional Comments From mikosh.com  2006-05-17 15:56 EDT -------
Sridhar,

When I changed the yp.conf file to:

domain yptest server linux6
domain yptest server linux7

It does display the correct server name that isn't responding.  I also found
that the:

May 17 10:36:58 linux3 ypbind: ypbind startup succeeded
May 17 10:36:58 linux3 ypbind: bound to NIS server linux7.rsbc.ibm.com

messages are coming from the /etc/init.d/ypbind startup script.

However, if the 'broadcast' statement is included in the /etc/yp.conf the error
message can be incorrect.

Comment 7 IBM Bug Proxy 2006-05-17 21:53:18 UTC

----- Additional Comments From jagana.com  2006-05-17 17:57 EDT -------
I have asked Scott Stevens of Credit Suisse to reconfirm their configuration 
setup so that we can address this issue for that configuration. Still waiting 
for his response. I don't think at this point we would try and fix for all the 
configurations yp.conf allows.

Comment 8 IBM Bug Proxy 2006-05-18 17:22:55 UTC

Created attachment 129463 [details]
ypbind_log_msg.patch

Comment 9 IBM Bug Proxy 2006-05-18 17:23:21 UTC

changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #16939|0                           |1
        is obsolete|                            |




------- Additional Comments From samudrala.com(prefers email via sri.com)  2006-05-18 13:27 EDT -------
 
Fix ypbind to log error/warn messages when a server doesn't respond and another
bug in broadcast mode.

This is the updated patch that includes a fix for the bug noticed by Ross
during testing when broadcast is enabled in /etc/yp.conf.
It turned out to be a much more serious bug than just the incorrect hostname
in the message. In broadcast mode, when a response is received from a server,
an entry for that server needs to be added to the list of bindings. Instead of
going through the list and adding it in an empty slot, the current code blindly
overwrites the first entry in the list.

Comment 10 IBM Bug Proxy 2006-05-18 17:53:03 UTC

----- Additional Comments From mikosh.com  2006-05-18 13:57 EDT -------
Patch works great with and without broadcast

Comment 11 IBM Bug Proxy 2006-05-18 18:52:39 UTC

changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Owner|jagana.com           |dmosby.com




------- Additional Comments From jagana.com  2006-05-18 14:58 EDT -------
Reassigning this bug to Dale to mirror the request to Red Hat and pass it
further to Univ of Illinois for the RPM build.

Comment 12 IBM Bug Proxy 2006-05-23 19:25:05 UTC

Created attachment 129885 [details]
ypbind_log_msg.patch

Comment 13 IBM Bug Proxy 2006-05-23 19:25:38 UTC

changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #16990|0                           |1
        is obsolete|                            |




------- Additional Comments From samudrala.com(prefers email via sri.com)  2006-05-23 13:21 EDT -------
 
Updated patch to fix ypbind error/warning messages.

This is an updated patch to ypbind that addresses the comments from CSFB.
With this patch, we display
- warning message 'NIS server <hostname> not responding for domain
<domainname>' when we lose the connection to the last bound server.
- error message 'No response for domain <domainname> from any NIS server' if we
don't get response from any server after trying all the configured servers and
broadcast if enabled.

I already did some testing on Ross's setup and it looks fine.
Ross, Could you do some additional testing and validate that it doesn't break
anything?

Comment 14 IBM Bug Proxy 2006-05-24 17:57:28 UTC

----- Additional Comments From mikosh.com  2006-05-24 14:00 EDT -------
From my testing, it appears that the new patch addresses the customer's refined
request.

Comment 15 Steve Dickson 2006-06-05 15:45:54 UTC

While the patches in both Comment #8 and Comment #12 appear to be fairly
sane, I am concerned about the increased verbosity that they will cause...
Sure, these types of error messages are good for IBM, but it's not clear
other customers will need or care to see these types of messages...
The last thing we want to do is fill up /var/log/messages with (what
could be seen as) useless error messages.

So I would suggest we introduce a -l flag (for logging connect messages)
or even a -v flag (for increasing verbosity) that would turn these types
of messages on...
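
One possible shape for such a switch, purely as a sketch: verbose_flag and
should_log() are hypothetical names, and the policy shown (LOG_ERR and more
severe always logged, everything else only when the flag is set) is just one
assumption about how the gating could work. It relies on the standard syslog
priorities, where a lower number means a more severe message.

```c
#include <syslog.h>

/* Hypothetical: would be set to 1 when a -v style option is given. */
static int verbose_flag;

/*
 * Decide whether a message of the given syslog priority is emitted.
 * LOG_ERR and anything more severe always passes; LOG_WARNING,
 * LOG_INFO and LOG_DEBUG pass only when verbose_flag is set.
 */
static int should_log(int priority)
{
    return priority <= LOG_ERR || verbose_flag;
}
```

Under this policy a "no response from any server" error would still be
logged by default, while per-server messages would need the flag.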

Comment 16 IBM Bug Proxy 2006-06-05 16:52:26 UTC

----- Additional Comments From jagana.com  2006-06-05 12:54 EDT -------
It doesn't increase verbosity since the message is displayed *only* when it is 
unable to connect to an active server or no server is responding and in fact, 
this message should help the customers in responding to the problem faster. 
BTW, Earlier comments might mislead you but please look at comment #16 (copied 
below) and which is what has been implemented in the patch:

- warning message 'NIS server <hostname> not responding for domain
<domainname>' when we lose the connection to the last bound server.
- error message 'No response for domain <domainname> from any NIS server' if we
don't get response from any server after trying all the configured servers and
broadcast if enabled.

Comment 17 IBM Bug Proxy 2006-08-15 03:36:57 UTC

changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |FIXEDAWAITINGTEST
         Resolution|                            |FIX_BY_IBM




------- Additional Comments From dmosby.com (prefers email at k7fw.com)  2006-08-14 23:43 EDT -------
PMR is closed. Patch created.

Comment 18 David Aquilina 2006-12-06 23:06:03 UTC

IBM, 

We're now adding this functionality to RHEL4, and the engineer assigned to that
task would like some clarification on the following portion of your patch.
Specifically, they would like to know how the following section of code is
related to logging server binding activities: 

+      /* Find an empty slot or an entry that matches the server */
+      for (active = 0; active < _MAXSERVER; ++active) {
+         if (in_use->server[active].host == NULL)
+            break;
+         if (in_use->server[active].addr.s_addr == addr->sin_addr.s_addr)
+             break;
+      }
+
+      if (active == _MAXSERVER) {
+         log_msg(LOG_ERR, "eachresult: exceeded the _MAXSERVER limit\n");
+         return 0;
+      }
+
+      /* Add the server to the list only if it is a new one */
+      if (in_use->server[active].host == NULL) {
+        in_use->server[active].host = strdup(host->h_name);
+        in_use->server[active].addr.s_addr = addr->sin_addr.s_addr;
+        in_use->server[active].port = addr->sin_port;
+        in_use->server[active].family = host->h_addrtype;
+        log_msg(LOG_DEBUG,
+                 "Adding hostname:%s, addr:%s, port:%d active_idx:%d\n",
+                 in_use->server[active].host,
+                 inet_ntoa(in_use->server[active].addr),
+                 in_use->server[active].port, active);
+      }

Thanks!

Comment 19 IBM Bug Proxy 2006-12-07 08:05:47 UTC

----- Additional Comments From samudrala.com (prefers email at sri.com)  2006-12-07 03:01 EDT -------
The code pointed out fixes another bug that I forgot to mention in the patch
description. The existing code simply overwrites the first entry in the bound
server array with the new address. The patch fixes it by finding the first empty
slot and inserting the new address in that slot.

Comment 20 IBM Bug Proxy 2007-06-28 22:26:10 UTC

----- Additional Comments From mranweil.com (prefers email at mjr.com)  2007-06-28 18:22 EDT -------
This patch is included in ypbind-1.12-5.21.10.src.rpm which is part of RHEL3.9.

But this is not included in ypbind-1.17.2-13.src.rpm, which is part of RHEL4.5,
nor in ypbind-1.19-7.el5.src.rpm, which is in RHEL5.  There appear to be some other
changes there, so I don't know if this is still a problem on those releases.

So this was reported in RHEL3 and is fixed in RHEL3.  I think we can close it. 
Any objections?  Anyone know if it's still a problem on RHEL4 or RHEL5?

Comment 21 IBM Bug Proxy 2007-08-09 20:50:45 UTC

------- Comment From chavez.com 2007-08-09 16:46 EDT-------
Unless there are objections, this bug will be closed  Aug 13.

Comment 22 IBM Bug Proxy 2007-09-03 06:00:45 UTC

Created attachment 185031 [details]
ypbind_log_msg.patch

Comment 23 RHEL Program Management 2007-10-19 18:53:37 UTC

This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet that criteria, it is now being closed.
 
For more information of the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
