Description of problem:
When nbd-client and nbd-server run on separate RHEL4U3 servers (kernel 2.6.9-34.ELsmp), the nbd-client connection to the nbd-server fails to negotiate. There is an active discussion of this nbd-server issue on the nbd-general mailing list. The relevant threads can be found here:

http://sourceforge.net/mailarchive/forum.php?thread_id=26829697&forum_id=40388
http://sourceforge.net/mailarchive/forum.php?thread_id=28939130&forum_id=40388

The executive summary of those threads: the nbd-server's accept() hangs, which leaves the nbd-client's negotiation hanging. The concern is that the kernel is causing this failure seen in nbd's user-space socket code. When a stock kernel.org kernel >= 2.6.15 is used on the nbd-server system, the hanging nbd-client negotiation has not been reproduced.

Version-Release number of selected component (if applicable):
2.6.9-34.ELsmp

How reproducible:
The nbd-server accept() hang happens frequently in production on numerous servers, but a reliable synthetic reproducer has not been identified.

Steps to Reproduce:
1. Start nbd-server (any version from 2.7.3 through 2.8.5) on a server running 2.6.9-34.ELsmp.
2. Make an nbd-client connection to the nbd-server while the server is busy with moderate IO.
3. Reboot the nbd-client system (or restart nbd-client); this eventually causes the nbd-client's negotiation to hang.

Actual results:
nbd-client eventually hangs waiting for the nbd-server's accept(), with the following output:

Negotiation: ..

Expected results:
nbd-client should complete the connection to the nbd-server with something like:

Negotiation: ..size = 102400KB bs=1024, sz=102400

Additional info:
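Since no reliable synthetic reproducer exists yet, a small probe that detects the hang with a timeout may help when repeating step 3 in a loop. The following is only a minimal sketch: the function name `probe_negotiation`, the status strings, and the timeout value are hypothetical, and the 152-byte greeting layout (8-byte "NBDMAGIC", 8-byte magic, 8-byte export size, 128 further bytes) is an assumption based on the old-style NBD handshake, not something confirmed in the threads above.

```python
import socket
import struct

# Assumed old-style NBD greeting: "NBDMAGIC", 64-bit magic,
# 64-bit export size, then 128 bytes of flags/padding.
NBD_PASSWD = b"NBDMAGIC"
OLDSTYLE_MAGIC = 0x00420281861253
GREETING_LEN = 8 + 8 + 8 + 128


def probe_negotiation(host, port, timeout=10.0):
    """Connect and wait for the server greeting (hypothetical helper).

    Returns a (status, detail) tuple:
      ("ok", export_size_in_bytes)
      ("hang", bytes_received_before_timeout)
      ("closed", bytes_received_before_eof)
      ("bad-greeting", bytes_received)
      ("error", message)
    """
    buf = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            while len(buf) < GREETING_LEN:
                chunk = sock.recv(GREETING_LEN - len(buf))
                if not chunk:
                    # Server closed the connection mid-greeting.
                    return ("closed", len(buf))
                buf += chunk
    except socket.timeout:
        # TCP connect may succeed via the listen backlog even if the
        # server never returns from accept(); the read then times out.
        return ("hang", len(buf))
    except OSError as exc:
        return ("error", str(exc))

    passwd, magic, size = struct.unpack(">8sQQ", buf[:24])
    if passwd != NBD_PASSWD or magic != OLDSTYLE_MAGIC:
        return ("bad-greeting", len(buf))
    return ("ok", size)
```

Calling `probe_negotiation(server_host, server_port)` repeatedly while the server is under IO load, and logging every result other than "ok", would give a rough count of how often the negotiation stalls. The probe disconnects without sending any requests, so it should be pointed at a test export rather than a production one.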
The second thread that you provided:

http://sourceforge.net/mailarchive/forum.php?thread_id=28939130&forum_id=40388

seems to give the answer you need already. This isn't a bug in the kernel's TCP accept code, but rather an application bug. You need to use version 2.8.6 or later of the nbd server code. Upgrade your nbd software and try again.