Description of problem: When ypbind is bound to a specific set of servers and broadcasting is not used, the daemon emits no error when it is unable to bind to any server. Even when broadcasting is enabled, the error message emitted when no NIS server can be bound is less than clear:

ypbind[1234]: broadcast: RPC: : Timeout

Furthermore, the ypbind code as a whole leaves a lot to be desired. For example, error messages for conditions that a sysadmin needs to know about are logged at level LOG_DEBUG. In other cases unexpected failures produce no diagnostic message at all, not even at LOG_DEBUG level; for example, if the malloc of the "pings" array in ping_all() fails, the function silently returns. Elsewhere LOG_INFO would make more sense than LOG_DEBUG: in ping_all(), if a host doesn't respond, a sysadmin might want to know that without having to wade through a lot of debug output.

There are also coding practices that lead to errors. For example, the (struct binding *)->server array has no sentinel element, yet numerous loops iterate over the array using only server.host == NULL as the termination condition. This will obviously walk off the end of the array if _MAXSERVER servers are defined for a domain.

In the ping_all() implementation that is compiled when USE_BROADCAST is defined, remove_bindingfile() is called if ypservers were registered with the portmapper but none responded to clnt_call(YPPROC_DOMAIN_NONACK). Yet if no servers were found registered with the portmapper, the function simply returns without calling remove_bindingfile(). That seems inconsistent, and incorrect, to me.
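To illustrate the sentinel problem, here is a minimal sketch of a bounded loop over a server array. The types and the MAXSERVER_SKETCH constant are hypothetical stand-ins, not ypbind's actual definitions (the real limit is _MAXSERVER); the point is only that testing the index before dereferencing keeps a completely full array from being overrun.

```c
#include <stddef.h>

/* Hypothetical stand-in for ypbind's _MAXSERVER limit. */
#define MAXSERVER_SKETCH 3

/* Simplified, hypothetical stand-ins for ypbind's binding structures. */
struct server_sketch {
    const char *host;   /* NULL marks an unused slot */
};

struct binding_sketch {
    const char *domain;
    struct server_sketch server[MAXSERVER_SKETCH];
};

/* Count the configured servers. The index test comes first, so when
   every slot is in use (no NULL entry after the last server) the loop
   stops at the array bound instead of reading past it. */
static size_t count_servers(const struct binding_sketch *b)
{
    size_t i = 0;

    while (i < MAXSERVER_SKETCH && b->server[i].host != NULL)
        ++i;
    return i;
}
```

The same two-part guard (index below the limit, then host != NULL) would apply to each loop that currently checks only server.host == NULL.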
Also, since "found" is initialized to -1 and never set to zero, the following condition will never be true, and thus remove_bindingfile() will never be called:

    if (!found) remove_bindingfile(list->domain);

For both of those reasons it isn't possible to check for the absence of a domain binding file as an indication that binding has been lost. The non-USE_BROADCAST version of ping_all() has similar problems.

Also, the documentation states: "If all given server are down, ypbind will not switch to use broadcast." Yet the logic of do_binding() is:

    if (!ping_all (&domainlist[i]) && domainlist[i].use_broadcast) do_broadcast (&domainlist[i]);

Obviously it was intended that you could specify a list of NIS hostnames as well as use the "domain $nisdomain broadcast" directive, so the documentation needs some clarification. The man page should also document the limit on the number of servers for a given domain.

In short, this code needs a thorough review. For this specific issue, however, it should be sufficient to fix ping_all() to use a common exit path when servers are defined and none respond, and to add a log_msg() call in that path. Whether LOG_WARNING or LOG_ERR is the appropriate level is debatable.

Version-Release number of selected component (if applicable): ypbind 1.12-5
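A minimal sketch of the suggested ping_all() shape follows. It assumes a simulated "responding" flag in place of the real clnt_call(YPPROC_DOMAIN_NONACK) ping, and uses hypothetical stand-ins for log_msg() and remove_bindingfile(); it does not distinguish "no server registered with the portmapper" from "no server answered". It shows "found" starting at 0 and a single exit path that logs and removes the binding file whenever servers are configured but none answers.

```c
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>

#define MAXSERVER_SKETCH 4

/* Hypothetical stand-ins; 'responding' replaces the real RPC ping. */
struct server_sketch {
    const char *host;
    int responding;
};

struct binding_sketch {
    const char *domain;
    struct server_sketch server[MAXSERVER_SKETCH];
};

/* Stand-in for ypbind's log_msg(): prints the level and the message. */
static void log_msg_sketch(int level, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    fprintf(stderr, "<%d> ", level);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
}

/* Set by the stand-in below so the effect can be observed. */
static int binding_file_removed;

/* Stand-in for remove_bindingfile(). */
static void remove_bindingfile_sketch(const char *domain)
{
    (void)domain;
    binding_file_removed = 1;
}

/* Returns the number of servers that answered (0 means none). */
static int ping_all_sketch(const struct binding_sketch *b)
{
    int found = 0;      /* starts at 0, so "!found" can be true */
    int i;

    for (i = 0; i < MAXSERVER_SKETCH && b->server[i].host != NULL; ++i) {
        if (b->server[i].responding)
            ++found;
        else
            log_msg_sketch(LOG_INFO, "host '%s' doesn't answer.",
                           b->server[i].host);
    }

    /* Common exit path: servers are configured but none answered. */
    if (i > 0 && !found) {
        log_msg_sketch(LOG_ERR, "No response for domain '%s' from any server",
                       b->domain);
        remove_bindingfile_sketch(b->domain);
    }
    return found;
}
```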
Created attachment 129280: ypbind_log_msg.patch
----- Additional Comments From samudrala.com (prefers email via sri.com) 2006-05-16 19:04 EDT ------- Fix ypbind to log error/info messages when a server doesn't respond. This patch fixes ypbind to log error/info messages when a server doesn't respond. Specifically, it:
- adds a new LOG_ERR level log message when no response is received from any server listed in the configuration file;
- changes the conditional LOG_DEBUG level log message to an unconditional LOG_INFO level message when a particular server doesn't respond;
- fixes a couple of bugs in the ping_all() routines in serv_list.c:
  - 'found' was incorrectly initialized to -1 instead of 0;
  - remove_bindingfile() was not called in certain cases when no server was responding.
The patch is against ypbind-1.12-5.21.9.src.rpm.
----- Additional Comments From mikosh.com 2006-05-17 12:00 EDT ------- Testing this patch, I found that it does log error messages when losing the binding to an explicit NIS server. However, it always displays the server name listed in the /etc/yp.conf file rather than the currently bound server. For example, here is the /etc/yp.conf file I used in my test:

domain yptest server linux6 broadcast

There are two NIS servers in my test environment: linux6 and linux7. Initially, ypserv is down on linux6 and up on linux7, and when I start ypbind, the following messages are logged:

May 17 10:36:58 linux3 ypbind: ypbind startup succeeded
May 17 10:36:58 linux3 ypbind: bound to NIS server linux7.rsbc.ibm.com

Then, when I bring ypserv down on linux7 and up on linux6, the following messages are logged:

May 17 10:37:38 linux3 ypbind[31948]: host 'linux6' doesn't answer.
May 17 10:37:38 linux3 ypbind[31948]: No response for domain 'yptest' from any server

The error message above should indicate that 'linux7' doesn't answer, since linux7 was the server ypbind was bound to. As I continued to toggle which server was up and which was down, the same two error messages were displayed, always saying that 'linux6' doesn't answer. Again, I suspect that the patch is using the name from the yp.conf file rather than the name of the current server. In addition, it would be very helpful if a message were logged when the client is rebound to a server, saying which one.
----- Additional Comments From samudrala.com (prefers email via sri.com) 2006-05-17 13:01 EDT ------- Ross, instead of broadcast, could you try the following configuration, which explicitly lists all the servers in /etc/yp.conf?

domain yptest server linux6
domain yptest server linux7

In the "host 'linux6' doesn't answer." message, the hostname displayed is the name of the server that didn't respond. The ping_all() routine tries all the servers that are explicitly listed in yp.conf, so if linux7 is not listed in yp.conf, you will not see a "host doesn't answer" message for linux7. Also, I am not sure where you are getting these messages from:

May 17 10:36:58 linux3 ypbind: ypbind startup succeeded
May 17 10:36:58 linux3 ypbind: bound to NIS server linux7.rsbc.ibm.com

I didn't see them in my log, and I could not locate them in the source code either. Could you check where these messages come from?
Changed: CC added mikosh.com. ------- Additional Comments From samudrala.com (prefers email via sri.com) 2006-05-17 14:54 EDT ------- Ross, please see my response in the previous comment.
----- Additional Comments From mikosh.com 2006-05-17 15:56 EDT ------- Sridhar, when I changed the yp.conf file to:

domain yptest server linux6
domain yptest server linux7

it does display the correct name of the server that isn't responding. I also found that these messages:

May 17 10:36:58 linux3 ypbind: ypbind startup succeeded
May 17 10:36:58 linux3 ypbind: bound to NIS server linux7.rsbc.ibm.com

are coming from the /etc/init.d/ypbind startup script. However, if the 'broadcast' statement is included in /etc/yp.conf, the error message can be incorrect.
----- Additional Comments From jagana.com 2006-05-17 17:57 EDT ------- I have asked Scott Stevens of Credit Suisse to reconfirm their configuration setup so that we can address this issue for that configuration. I am still waiting for his response. I don't think that at this point we should try to fix this for every configuration yp.conf allows.
Created attachment 129463: ypbind_log_msg.patch
Changed: Attachment #16939 marked as obsolete. ------- Additional Comments From samudrala.com (prefers email via sri.com) 2006-05-18 13:27 EDT ------- Fix ypbind to log error/warn messages when a server doesn't respond, and fix another bug in broadcast mode. This is the updated patch, which includes a fix for the bug Ross noticed during testing when broadcast is enabled in /etc/yp.conf. It turned out to be a much more serious bug than just the incorrect hostname in the message. In broadcast mode, when a response is received from a server, an entry for that server needs to be added to the list of bindings. Instead of going through the list and adding the entry in an empty slot, the current code blindly overwrites the first entry in the list.
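The difference between overwriting slot 0 and searching for a free slot can be sketched as follows. This is a simplified, hypothetical version of the broadcast-reply bookkeeping (all names are invented for illustration, and the real code also records port and address family): a reply from a server that is already listed reuses its slot, a new server goes into the first free slot, and a full table is reported instead of being overwritten.

```c
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>

#define MAXSERVER_SKETCH 3   /* hypothetical stand-in for _MAXSERVER */

struct server_sketch {
    char *host;              /* NULL marks a free slot */
    struct in_addr addr;
};

struct binding_sketch {
    struct server_sketch server[MAXSERVER_SKETCH];
};

/* Portable replacement for strdup(). */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Record a broadcast reply; returns the slot used, or -1 if the
   table is full or memory runs out. */
static int add_broadcast_reply(struct binding_sketch *b, const char *host,
                               struct in_addr addr)
{
    int active;

    for (active = 0; active < MAXSERVER_SKETCH; ++active) {
        if (b->server[active].host == NULL)
            break;                          /* first free slot */
        if (b->server[active].addr.s_addr == addr.s_addr)
            return active;                  /* already listed */
    }
    if (active == MAXSERVER_SKETCH)
        return -1;                          /* the patch logs LOG_ERR here */

    b->server[active].host = copy_string(host);
    if (b->server[active].host == NULL)
        return -1;
    b->server[active].addr = addr;
    return active;
}
```

With the old behavior, a reply from linux7 after one from linux6 would replace linux6 in slot 0; here linux7 lands in slot 1 and linux6 is kept.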
----- Additional Comments From mikosh.com 2006-05-18 13:57 EDT ------- Patch works great with and without broadcast
Changed: Owner changed from jagana.com to dmosby.com. ------- Additional Comments From jagana.com 2006-05-18 14:58 EDT ------- Reassigning this bug to Dale to mirror the request to Red Hat and pass it on to the Univ. of Illinois for the RPM build.
Created attachment 129885: ypbind_log_msg.patch
Changed: Attachment #16990 marked as obsolete. ------- Additional Comments From samudrala.com (prefers email via sri.com) 2006-05-23 13:21 EDT ------- Updated patch to fix ypbind error/warning messages. This is an updated patch to ypbind that addresses the comments from CSFB. With this patch, we display:
- the warning message 'NIS server <hostname> not responding for domain <domainname>' when we lose the connection to the last bound server;
- the error message 'No response for domain <domainname> from any NIS server' if we get no response from any server after trying all the configured servers, and broadcast if enabled.
I have already done some testing on Ross's setup and it looks fine. Ross, could you do some additional testing and validate that it doesn't break anything?
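The two messages described above amount to a small decision made once a binding pass has finished. A minimal sketch, assuming the caller already knows which server it was last bound to, whether that server answered, and whether any configured or broadcast server answered; the function name and flag constants are hypothetical, and messages go to stderr instead of syslog.

```c
#include <stdio.h>
#include <syslog.h>

/* Hypothetical flags recording which messages were emitted. */
#define MSG_LOST_BOUND_SERVER 0x1u   /* logged at LOG_WARNING */
#define MSG_NO_SERVER_AT_ALL  0x2u   /* logged at LOG_ERR */

/* last_bound is NULL if there was no binding before this pass.
   any_answered covers the configured servers and, if enabled,
   the broadcast. */
static unsigned report_binding(const char *domain, const char *last_bound,
                               int last_bound_answered, int any_answered)
{
    unsigned emitted = 0;

    if (last_bound != NULL && !last_bound_answered) {
        fprintf(stderr, "<%d> NIS server %s not responding for domain %s\n",
                LOG_WARNING, last_bound, domain);
        emitted |= MSG_LOST_BOUND_SERVER;
    }
    if (!any_answered) {
        fprintf(stderr, "<%d> No response for domain %s from any NIS server\n",
                LOG_ERR, domain);
        emitted |= MSG_NO_SERVER_AT_ALL;
    }
    return emitted;
}
```

While the bound server keeps answering, neither message is emitted.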
----- Additional Comments From mikosh.com 2006-05-24 14:00 EDT ------- From my testing, it appears that the new patch addresses the customer's refined request.
While the patches in both Comment #8 and Comment #12 appear to be fairly sane, I am concerned about the increased verbosity they will cause. Sure, these types of error messages are good for IBM, but it's not clear that other customers will need or care to see them. The last thing we want to do is fill up /var/log/messages with (what could be seen as) useless error messages. So I would suggest we introduce a -l flag (for logging connect messages) or even a -v flag (for increasing verbosity) that would turn these types of messages on.
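The proposed flag could gate these messages with a check along the lines of the sketch below. Both the option parsing and the policy are assumptions for illustration, not existing ypbind behavior: LOG_ERR and more severe levels are always logged, while lower-severity binding messages appear only when -v is given. Syslog levels are numbered so that a smaller value means a more severe condition.

```c
#include <string.h>
#include <syslog.h>

/* Hypothetical: returns 1 if "-v" appears among the arguments. */
static int parse_verbose(int argc, char *argv[])
{
    int i;

    for (i = 1; i < argc; ++i)
        if (strcmp(argv[i], "-v") == 0)
            return 1;
    return 0;
}

/* Smaller syslog value = more severe (LOG_ERR < LOG_WARNING < LOG_INFO). */
static int should_log(int level, int verbose)
{
    return level <= LOG_ERR || verbose;
}
```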
----- Additional Comments From jagana.com 2006-06-05 12:54 EDT ------- It doesn't increase verbosity, since the message is displayed *only* when ypbind is unable to connect to an active server or no server is responding; in fact, this message should help customers respond to the problem faster. BTW, earlier comments might mislead you, but please look at comment #16 (copied below), which is what has been implemented in the patch:
- warning message 'NIS server <hostname> not responding for domain <domainname>' when we lose the connection to the last bound server;
- error message 'No response for domain <domainname> from any NIS server' if we get no response from any server after trying all the configured servers, and broadcast if enabled.
Changed: Status changed from ASSIGNED to FIXEDAWAITINGTEST; Resolution set to FIX_BY_IBM. ------- Additional Comments From dmosby.com (prefers email at k7fw.com) 2006-08-14 23:43 EDT ------- PMR is closed. Patch created.
IBM, we're now adding this functionality to RHEL4, and the engineer assigned to that task would like some clarification on the following portion of your patch. Specifically, they would like to know how this section of code relates to logging server binding activities:

+   /* Find an empty slot or an entry that matches the server */
+   for (active = 0; active < _MAXSERVER; ++active) {
+       if (in_use->server[active].host == NULL)
+           break;
+       if (in_use->server[active].addr.s_addr == addr->sin_addr.s_addr)
+           break;
+   }
+
+   if (active == _MAXSERVER) {
+       log_msg(LOG_ERR, "eachresult: exceeded the _MAXSERVER limit\n");
+       return 0;
+   }
+
+   /* Add the server to the list only if it is a new one */
+   if (in_use->server[active].host == NULL) {
+       in_use->server[active].host = strdup(host->h_name);
+       in_use->server[active].addr.s_addr = addr->sin_addr.s_addr;
+       in_use->server[active].port = addr->sin_port;
+       in_use->server[active].family = host->h_addrtype;
+       log_msg(LOG_DEBUG,
+               "Adding hostname:%s, addr:%s, port:%d active_idx:%d\n",
+               in_use->server[active].host,
+               inet_ntoa(in_use->server[active].addr),
+               in_use->server[active].port, active);
+   }

Thanks!
----- Additional Comments From samudrala.com (prefers email at sri.com) 2006-12-07 03:01 EDT ------- The code you pointed out fixes another bug that I forgot to mention in the patch description. The existing code simply overwrites the first entry in the bound-server array with the new address. The patch fixes this by scanning for the first empty slot (or an existing entry with the same address) and inserting the new server only when the slot found is empty.
----- Additional Comments From mranweil.com (prefers email at mjr.com) 2007-06-28 18:22 EDT ------- This patch is included in ypbind-1.12-5.21.10.src.rpm, which is part of RHEL3.9. However, it is not included in ypbind-1.17.2-13.src.rpm, which is part of RHEL4.5, nor in ypbind-1.19-7.el5.src.rpm, which is in RHEL5. There appear to be some other changes in those versions, and I don't know whether this is still a problem on those releases. Since this was reported against RHEL3 and is fixed in RHEL3, I think we can close it. Any objections? Does anyone know whether it's still a problem on RHEL4 or RHEL5?
------- Comment From chavez.com 2007-08-09 16:46 EDT------- Unless there are objections, this bug will be closed Aug 13.
Created attachment 185031: ypbind_log_msg.patch
This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission-critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/ If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.