Bug 162078 - ccsd performance problems
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: ccs
Version: 4
Hardware: All Linux
Priority: medium, Severity: medium
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Reported: 2005-06-29 14:44 EDT by Lon Hohberger
Modified: 2009-04-16 16:17 EDT (History)
Fixed In Version: RHEL4 U2
Doc Type: Bug Fix
Last Closed: 2005-10-04 13:33:48 EDT

Attachments:
ccsd local socket patch (11.83 KB, patch)
2005-06-29 19:00 EDT, Lon Hohberger

Description Lon Hohberger 2005-06-29 14:44:37 EDT
Description of problem:

ccsd uses reserved ports to authenticate that the local user is, in fact, root.
 This is good for security purposes.

A client handshake / set of gets operates like this:

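        int foo = ccs_connect();        /* opens a connection to ccsd */
        char *response;

        while (ccs_get(foo, "query", &response) == 0) {
                handle_response(response);
                free(response);         /* ccs_get() allocates the result */
        }
        ccs_disconnect(foo);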

For large numbers of queries, however, connect() sometimes blocks for a long
time, sometimes several seconds.  My guess is that this happens because each
ccs_connect(), ccs_get() and ccs_disconnect() call binds to a reserved port and
then connect()s to ccsd.  My simple cluster configuration makes 531 connect()
calls on reserved ports when starting up, and it pauses every few seconds.
During startup, the setup_socket_ipv6() call hangs several times, for around
3 seconds each time.
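
The bind-then-connect pattern described above can be sketched roughly as
follows.  This is not the libccs code: bind_port_in_range() is a hypothetical
helper, and the port range is left to the caller because the report does not
state which range libccs walks.  It only illustrates why each call has to
find a free port before it can connect.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not the libccs implementation): walk a port range
 * from hi down to lo and return a socket bound to the first port that
 * bind() accepts on the wildcard address, or -1 if none could be bound. */
int bind_port_in_range(int family, int hi, int lo)
{
        int fd, port;

        if (family != AF_INET && family != AF_INET6)
                return -1;
        fd = socket(family, SOCK_STREAM, 0);
        if (fd < 0)
                return -1;

        for (port = hi; port >= lo; port--) {
                struct sockaddr_storage ss;
                socklen_t len;

                /* All-zero address bytes mean "any local address". */
                memset(&ss, 0, sizeof(ss));
                if (family == AF_INET6) {
                        struct sockaddr_in6 *a6 = (struct sockaddr_in6 *)&ss;
                        a6->sin6_family = AF_INET6;
                        a6->sin6_port = htons((unsigned short)port);
                        len = sizeof(*a6);
                } else {
                        struct sockaddr_in *a4 = (struct sockaddr_in *)&ss;
                        a4->sin_family = AF_INET;
                        a4->sin_port = htons((unsigned short)port);
                        len = sizeof(*a4);
                }
                if (bind(fd, (struct sockaddr *)&ss, len) == 0)
                        return fd;      /* caller connect()s, then close()s */
                /* Port busy or not permitted: try the next one down. */
        }
        close(fd);
        return -1;
}
```

Calling this with a range below 1024 requires root on a default Linux
configuration.  With hundreds of calls in quick succession, every call
repeats the search, and the pool of usable ports is small.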

Version-Release number of selected component (if applicable): RHEL4 GA


How reproducible: Sometimes.

Steps to Reproduce:
1. Create a cluster with lots of services.
2. Start rgmanager with "clurgmgrd -fd".  Sometimes, it can take whole minutes
to "build resource trees".  In this instance, it's simply querying ccsd for
information in a systematic fashion.
  
Actual results:
rgmanager (and probably other apps) take a long time to read the configuration
information from ccsd.

Expected results:
Fast response time from ccsd.


Known workarounds:

* This does not happen with "ccsd -4".  Rgmanager starts up *very* quickly with
the -4 option.


Additional info:

* There's no consistent pattern to how frequently the connect code hangs.
Sometimes it's after 20 connections, sometimes it's after 300.  I suspect it's
related to running out of reserved ports.
* This might be a case of the socket getting SO_REUSEADDR in libccs for IPv4,
but not for IPv6.
Comment 1 Lon Hohberger 2005-06-29 14:46:41 EDT
Correction: SO_REUSEADDR is set, but the way we do port selection might not be
appropriate.
Comment 3 Lon Hohberger 2005-06-29 19:00:29 EDT
Created attachment 116155
ccsd local socket patch

This patch allows libccs/ccsd to use local (UNIX domain) sockets for
communication, which avoids the TIME_WAIT state and the limited number of
available ports that come with IP protocols.  The socket is created with its
permissions masked by ~077 (no access for group or other), so only root should
be allowed to communicate over that socket.

This patch is compatible with existing installations:

* All applications built statically against the older libccs.a (which only uses
IP for communications) are forward-compatible with the new ccsd, and
* All apps built against the new libccs (with UNIX domain socket support) will
fall back to IPv6/IPv4 if local socket communication with ccsd is unavailable.
* Administrators may disable ccsd's use of UNIX domain sockets by running it
with the new -I option.
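
The fallback order described above could look roughly like this on the
client side.  This is a sketch under assumptions, not the patched libccs:
the function and enum names are hypothetical, the socket path and port are
supplied by the caller, and the reserved-port bind that the IP paths need
is omitted for brevity.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

enum transport { T_NONE, T_LOCAL, T_IPV6, T_IPV4 };

/* Open a stream socket of the given family and connect it, or return -1. */
static int try_connect(int family, const struct sockaddr *sa, socklen_t len)
{
        int fd = socket(family, SOCK_STREAM, 0);

        if (fd < 0)
                return -1;
        if (connect(fd, sa, len) < 0) {
                close(fd);
                return -1;
        }
        return fd;
}

/* Hypothetical helper: try the local socket first, then IPv6 loopback,
 * then IPv4 loopback.  Reports which transport succeeded via *used. */
int connect_with_fallback(const char *path, unsigned short port,
                          enum transport *used)
{
        struct sockaddr_un un;
        struct sockaddr_in6 a6;
        struct sockaddr_in a4;
        int fd;

        *used = T_NONE;

        if (path && strlen(path) < sizeof(un.sun_path)) {
                memset(&un, 0, sizeof(un));
                un.sun_family = AF_UNIX;
                strcpy(un.sun_path, path);
                fd = try_connect(AF_UNIX, (struct sockaddr *)&un, sizeof(un));
                if (fd >= 0) {
                        *used = T_LOCAL;
                        return fd;
                }
        }

        memset(&a6, 0, sizeof(a6));
        a6.sin6_family = AF_INET6;
        a6.sin6_addr = in6addr_loopback;
        a6.sin6_port = htons(port);
        fd = try_connect(AF_INET6, (struct sockaddr *)&a6, sizeof(a6));
        if (fd >= 0) {
                *used = T_IPV6;
                return fd;
        }

        memset(&a4, 0, sizeof(a4));
        a4.sin_family = AF_INET;
        a4.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        a4.sin_port = htons(port);
        fd = try_connect(AF_INET, (struct sockaddr *)&a4, sizeof(a4));
        if (fd >= 0) {
                *used = T_IPV4;
                return fd;
        }
        return -1;
}
```

If the local socket is missing, for example because the daemon was started
with UNIX domain sockets disabled, the first attempt fails quickly and the
client falls back to the IP transports.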
Comment 4 Lon Hohberger 2005-06-29 19:03:13 EDT
Note: Existing users of linux-cluster will only benefit from this patch after a
rebuild of each affected application, as most are (currently) statically built
against libccs.
Comment 6 Jonathan Earl Brassow 2005-10-04 13:33:48 EDT
Fixed in RHEL4 U2.
