Bug 749139

Summary: Ricci frequently times out
Product: Red Hat Enterprise Linux 6
Reporter: Gonzalo Servat <gservat>
Component: ricci
Assignee: Chris Feist <cfeist>
Status: CLOSED NOTABUG
QA Contact: Cluster QE <mspqa-list>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.1
CC: an.euroford, cluster-maint, jwest
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-04-30 23:11:31 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Gonzalo Servat 2011-10-26 10:00:33 UTC
Description of problem:

I'm running luci in a dedicated VM (RHEL 6.1) and ricci on a physical host (RHEL 6.1). When I click around in luci, some requests load fine while others intermittently hang. The hang is clearly on the ricci side, because restarting ricci on the client makes luci return an error. Clicking the same link a few more times usually loads the page eventually.

Luci responded normally until I configured my first cluster (a single node) with a running service (MySQL).

Version-Release number of selected component (if applicable):

luci-0.23.0-13.el6.x86_64
ricci-0.16.2-35.el6.x86_64

How reproducible:

Always.

Steps to Reproduce:
1. Open luci (https://<ip>:8084)
2. Click on "Manage Clusters"
3. Even clicking "Manage Clusters" often hangs; otherwise, any other link hangs intermittently.
  
Actual results:

Browser shows "Waiting for <luci host>..." forever (never loads).

Expected results:

The requested page loads quickly.

Additional info:

Starting up ricci in debug mode shows the following for a successful request:

client added
ClientInstance.cpp:144: exception: SSL_read() error: SSL_ERROR_SYSCALL: Success
request completed in 299 milliseconds
client removed

Whenever a hanging request takes place, I see the following:

client added
ClientInstance.cpp:144: exception: Receive timeout
request completed in 121544 milliseconds
client removed

Luci simply shows "20:19:50,243 ERROR [luci.lib.ricci_communicator] An empty XML response was received from <ricci host>:11111"

Comment 1 Gonzalo Servat 2011-10-26 10:01:45 UTC
FWIW, starting up ricci shows:

# ricci -f -d -u ricci
failed to load authorized CAs
failed to load authorized CAs

Is this concerning?

Comment 3 Chris Feist 2011-10-26 14:34:26 UTC
Which version of modcluster do you have installed on both nodes?

Comment 4 Chris Feist 2011-10-26 14:37:20 UTC
Also, can you temporarily disable SELinux, reboot, and check whether you still get the same "failed to load authorized CAs" error?

Comment 6 Gonzalo Servat 2011-10-26 20:48:10 UTC
SELinux is already disabled, Chris.

As for modcluster, it is only installed on the ricci (client) side:

modcluster-0.16.2-10.el6.x86_64

Comment 7 Gonzalo Servat 2011-10-26 20:49:42 UTC
Interesting... yesterday I deleted the cluster and started again, and now I can't reproduce the problem! Starting ricci also no longer shows the "failed to load authorized CAs" message!?

Comment 10 Chris Feist 2011-10-27 16:27:03 UTC
I'm closing this bug now since we don't have a reproducer, but if it happens again, please re-open this bug and send the contents of your '/var/lib/ricci' directory. This is where ricci tries to load its certificates; if there are errors in there, we should be able to detect them.

Thanks!

Comment 11 Gonzalo Servat 2012-01-06 03:45:33 UTC
Hi Chris,

I can reproduce this issue again; however, this time luci is running on RHEL 6.1 and the ricci/modcluster client is running RHEL 5.7.

When I try to create the cluster, it just hangs. Starting ricci in debug mode shows:

# ricci -d -f -u 102
failed to load authorized CAs
failed to load authorized CAs
client added
client added
exception: SSL_read() error: SSL_ERROR_SYSCALL: Success
request completed in 119 milliseconds
client added
exception: SSL_read() error: SSL_ERROR_SYSCALL: Success
request completed in 77 milliseconds
exception: SSL_read() error: SSL_ERROR_SYSCALL: Success
request completed in 141 milliseconds
client removed
client removed
client removed

The files in /var/lib/ricci/certs are:

# find .
.
./clients
./clients/client_cert_sJw8uX
./cacert.config
./privkey.pem
./cacert.pem

Any ideas?

Comment 13 euroford 2012-03-19 07:45:41 UTC
In ricci_defines.h
#define CLIENT_AUTH_CAs_PATH    "/var/lib/ricci/certs/auth_CAs.pem"

ricci reads the authorized client CAs from this file.
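To make the point above concrete, here is a minimal C sketch that checks the file named by that define. It assumes only what the thread shows: the path is copied from the quoted `CLIENT_AUTH_CAs_PATH`, and the helper `has_pem_certificate()` is hypothetical, not ricci's actual loading code. It only reports whether the file can be opened and contains a PEM certificate header; it does not parse or validate the certificates the way ricci's real SSL setup would.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Path copied from the #define in ricci_defines.h quoted in comment 13. */
#define CLIENT_AUTH_CAs_PATH "/var/lib/ricci/certs/auth_CAs.pem"

/* Hypothetical helper, not part of ricci: returns 1 if `path` can be
 * opened for reading and contains a PEM certificate header line,
 * 0 if the file is missing, unreadable, or has no such line. */
int has_pem_certificate(const char *path)
{
    static const char pem_header[] = "-----BEGIN CERTIFICATE-----";
    char line[256];
    int found = 0;
    FILE *fp;

    assert(path != NULL);

    fp = fopen(path, "r");
    if (fp == NULL)
        return 0; /* e.g. the file does not exist */

    /* Scan line by line; fgets reads any line longer than the buffer in chunks. */
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strncmp(line, pem_header, sizeof pem_header - 1) == 0) {
            found = 1;
            break;
        }
    }

    fclose(fp);
    return found;
}
```

For example, the /var/lib/ricci/certs listing in comment 11 contains no auth_CAs.pem, so `has_pem_certificate(CLIENT_AUTH_CAs_PATH)` would return 0 on that host, which would be consistent with the "failed to load authorized CAs" message ricci prints at startup.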

Comment 14 Chris Feist 2012-04-30 23:11:31 UTC
I don't believe running luci on 6.1 with the clients on 5.7 is supported. If you're configuring 5.7 nodes, you'll want to use Conga on 5.7.