Bug 201049

Summary: nbd-server on RHEL4U3 kernel fails to accept nbd-client connection
Product: Red Hat Enterprise Linux 4
Reporter: Mike Snitzer <msnitzer>
Component: kernel
Assignee: Neil Horman <nhorman>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 4.0
CC: jbaron, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-08-16 11:19:03 UTC

Description Mike Snitzer 2006-08-02 15:21:44 UTC
Description of problem:

When running nbd-client and nbd-server on separate RHEL4U3 servers (kernel
2.6.9-34.ELsmp), the nbd-client connection to the nbd-server fails to negotiate.

There is an active thread on the nbd-general mailing list that speaks to this
issue with the nbd-server.  The relevant threads can be found here:
http://sourceforge.net/mailarchive/forum.php?thread_id=26829697&forum_id=40388
http://sourceforge.net/mailarchive/forum.php?thread_id=28939130&forum_id=40388

The executive summary of those threads is: the nbd-server's accept() is hanging,
leaving the nbd-client's negotiation hanging.  The concern is that the kernel
is causing this failure in nbd's user-space socket code.

When a stock kernel.org kernel >= 2.6.15 is used on the nbd-server system, the
hanging nbd-client negotiation with the nbd-server has not been reproduced.

Version-Release number of selected component (if applicable):
2.6.9-34.ELsmp

How reproducible:
This nbd-server accept() hang happens frequently in production on numerous
servers, but a reliable synthetic reproducer has not been identified.

Steps to Reproduce:
1. start nbd-server (any version from 2.7.3 through 2.8.5) on a server running
2.6.9-34.ELsmp
2. make an nbd-client connection to nbd-server while the server is busy with
moderate IO
3. reboot the nbd-client system (or restart nbd-client); eventually the
nbd-client's negotiation hangs
  
Actual results:
nbd-client will eventually hang waiting for nbd-server's accept() with the
following output: 
Negotiation: ..

Expected results:

The nbd-client should complete the connection to the nbd-server with something like:

Negotiation: ..size = 102400KB
bs=1024, sz=102400
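
To tell the hung case apart from the expected one without tying up a real
nbd-client, a small probe can connect to the server and wait for the handshake
under a timeout.  This is only a rough sketch and not a reliable reproducer: it
assumes the old-style NBD handshake layout (8-byte "NBDMAGIC", 8-byte magic,
8-byte export size, 128 reserved bytes), which is not stated in this report,
and the names parse_handshake and probe are hypothetical.

```python
import socket
import struct

# Assumed old-style NBD handshake layout (an assumption, not from this report):
# 8-byte "NBDMAGIC", 8-byte magic, 8-byte export size, 128 reserved bytes.
INIT_PASSWD = b"NBDMAGIC"
OLDSTYLE_MAGIC = 0x00420281861253
HANDSHAKE_LEN = 8 + 8 + 8 + 128


def parse_handshake(data):
    """Return the export size in bytes, or raise ValueError."""
    if len(data) < 24:
        raise ValueError("short handshake: %d bytes" % len(data))
    passwd, magic, size = struct.unpack(">8sQQ", data[:24])
    if passwd != INIT_PASSWD or magic != OLDSTYLE_MAGIC:
        raise ValueError("unexpected handshake header")
    return size


def probe(host, port, timeout=10.0):
    """Connect and read the handshake; a timeout suggests a stalled server."""
    buf = b""
    # The timeout applies to the connect and to every recv() that follows.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            while len(buf) < HANDSHAKE_LEN:
                chunk = sock.recv(HANDSHAKE_LEN - len(buf))
                if not chunk:
                    break  # server closed the connection early
                buf += chunk
        except socket.timeout:
            # Report how many handshake bytes arrived before the stall.
            return ("stalled", len(buf))
    return ("ok", parse_handshake(buf))
```

Calling probe() with the server's host and port in a loop while the server is
under IO load would return ("stalled", n) when the handshake does not complete
within the timeout, and ("ok", size_in_bytes) otherwise.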

Additional info:

Comment 1 Neil Horman 2006-08-16 11:19:03 UTC
The thread that you provided:
http://sourceforge.net/mailarchive/forum.php?thread_id=28939130&forum_id=40388
seems to give the answer that you need already.  This isn't a bug in the kernel's
TCP accept code, but rather an application bug.  You need to use version 2.8.6
or later of the nbd server code.  Upgrade your nbd software and try again.