Bug 212613

Summary: luci - if node is already authenticated, user-supplied node password is ignored
Product: Red Hat Enterprise Linux 5
Reporter: Len DiMaggio <ldimaggi>
Component: conga
Assignee: Ryan McCabe <rmccabe>
Status: CLOSED CURRENTRELEASE
QA Contact: Corey Marthaler <cmarthal>
Severity: low
Priority: low
Version: 5.0
CC: bstevens, cluster-maint, djansa, jlaska, kupcevic, rmccabe
Target Milestone: ---
Target Release: ---
Keywords: Reopened
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-01-23 16:43:03 UTC

Description Len DiMaggio 2006-10-27 18:44:05 UTC
Description of problem:
If, when creating a new cluster, a node's password is entered incorrectly, that
node is still rebooted and the cluster creation is attempted - but it fails.

Version-Release number of selected component (if applicable):
luci-0.8-21.el5 and ricci-0.8-21.el5

How reproducible:
100%

Steps to Reproduce:
1. Create a new cluster
2. Mis-type the password on one of the nodes
3. When the user tries to access the cluster or view the cluster list, luci
displays this error:

An error occurred when trying to contact any of the nodes in the <name> cluster.
  
Actual results:
luci displays this error:

An error occurred when trying to contact any of the nodes in the <name> cluster.

Expected results:
The bad password should be caught before the cluster creation/node rebooting is
performed.

Additional info:

This is the /var/lib/ricci/queue entry that is created on the node with the bad
password:

-----------------------------------------------
<?xml version="1.0"?>
<batch batch_id="2022661279" status="4">
        <module name="rpm" status="0">
                <response API_version="1.0" sequence="">
                        <function_response function_name="install">
                                <var mutable="false" name="success" type="boolean" value="true"/>
                        </function_response>
                </response>
        </module>
        <module name="reboot" status="0">
                <response API_version="1.0" sequence="">
                        <function_response function_name="reboot_now">
                                <var mutable="false" name="success" type="boolean" value="true"/>
                        </function_response>
                </response>
        </module>
        <module name="cluster" status="4">
                <response API_version="1.0" sequence="">
                        <function_response function_name="set_cluster.conf">
                                <var mutable="false" name="success" type="boolean" value="false"/>
                                <var mutable="false" name="error_code" type="int" value="-1"/>
                                <var mutable="false" name="error_description" type="string" value="failed to create /etc/cluster/"/>
                        </function_response>
                </response>
        </module>
        <module name="cluster" status="5">
                <request API_version="1.0">
                        <function_call name="start_node">
                                <var mutable="false" name="cluster_startup" type="boolean" value="true"/>
                        </function_call>
                </request>
        </module>
</batch>
-----------------------------------------------
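A batch result like the one above can be scanned for the failing step by
walking the per-module status attributes. The sketch below is illustrative
only, not luci or ricci code; it assumes, based on this example, that
status="0" marks a module that completed successfully and any other value
marks one that failed or never ran:

-----------------------------------------------
import xml.etree.ElementTree as ET

def first_failed_module(batch_xml):
    # Return (module_name, error_description) for the first module whose
    # status is non-zero, or None if every module completed successfully.
    batch = ET.fromstring(batch_xml)
    for module in batch.findall("module"):
        if module.get("status") != "0":
            err = module.find(".//var[@name='error_description']")
            desc = err.get("value") if err is not None else "(no description)"
            return module.get("name"), desc
    return None

# Applied to the queue entry above, this reports the "cluster" module
# failing with "failed to create /etc/cluster/".
-----------------------------------------------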

Here's what shows up in the debug log on the luci server - 
The cluster name is "badpassword"
-----------------------------------------------
Oct 27 08:27:47 tng3-1 luci[26626]: received from tng3-2.lab.msp.redhat.com XML
"<?xml version="1.0" ?>
<ricci authenticated="true" success="0" version="1.0">
  <batch batch_id="820899843" status="0">
    <module name="cluster" status="0">
      <response API_version="1.0" sequence="">
        <function_response function_name="status">
          <var mutable="false" name="status" type="xml">
            <cluster alias="nodes_2_3" cluster_version="5" minQuorum="1" name="nodes_2_3" quorate="true" votes="2">
              <node clustered="true" name="tng3-2.lab.msp.redhat.com" online="true" uptime="3746" votes="1"/>
              <node clustered="true" name="tng3-3.lab.msp.redhat.com" online="true" uptime="3746" votes="1"/>
            </cluster>
          </var>
          <var mutable="false" name="success" type="boolean" value="true"/>
        </function_response>
      </response>
    </module>
  </batch>
</ricci>"
Oct 27 08:27:48 tng3-1 luci[26626]: Connected to tng3-4.lab.msp.redhat.com:11111
Oct 27 08:27:48 tng3-1 luci[26626]: Received XML "<?xml version="1.0"?> <ricci authenticated="true" hostname="tng3-4.lab.msp.redhat.com" os="Red Hat Enterprise Linux Server release 4.91 (Tikanga)" version="1.0" xen_host="false"/> " from host tng3-4.lab.msp.redhat.com
Oct 27 08:27:48 tng3-1 luci[26626]: Received header from tng3-4.lab.msp.redhat.com: "<?xml version="1.0" ?><ricci authenticated="true" hostname="tng3-4.lab.msp.redhat.com" os="Red Hat Enterprise Linux Server release 4.91 (Tikanga)" version="1.0" xen_host="false"/>"
Oct 27 08:27:48 tng3-1 luci[26626]: [auth 1] reported cluster_info = (,) for tng3-4.lab.msp.redhat.com
Oct 27 08:27:48 tng3-1 luci[26626]: tng3-4.lab.msp.redhat.com reports it's in cluster :; we expect badpassword
Oct 27 08:27:48 tng3-1 luci[26626]: no ricci agent could be found for cluster badpassword
-----------------------------------------------
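The last three log lines show the failure mode. A sketch of the lookup they
imply is below (hypothetical function and data structures, not luci's actual
implementation): each candidate node is asked which cluster it belongs to, and
the half-deployed node reports an empty name, so nothing ever matches the
cluster luci just tried to create.

-----------------------------------------------
def find_ricci_agent(nodes, expected_cluster, log):
    # 'nodes' is a list of dicts like
    # {"hostname": ..., "cluster_name": ...} - a hypothetical structure.
    for node in nodes:
        reported = node.get("cluster_name", "")  # empty on the broken node
        log("%s reports it's in cluster %s; we expect %s"
            % (node["hostname"], reported, expected_cluster))
        if reported == expected_cluster:
            return node["hostname"]
    log("no ricci agent could be found for cluster %s" % expected_cluster)
    return None
-----------------------------------------------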

Comment 1 Ryan McCabe 2006-10-31 17:41:30 UTC
What I believe happened here is that the cluster node has an associated storage
system. When you add a cluster, luci creates storage system nodes as a
convenience so that storage can be probed in the storage area. When you remove
a cluster, luci will unauthenticate to the ricci agent on a host only if that
host is neither a member of a managed cluster nor registered as a storage
system. If luci does not unauthenticate, the password provided in the dialog is
ignored, as authentication is not needed.

If there is neither a cluster node nor a storage system entry for the host
where you saw this, could you please change the state of the bug back to NEW?
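As a sketch, the unauthentication rule above amounts to something like the
following (hypothetical helper and data structures, not the actual luci code):

-----------------------------------------------
def should_unauthenticate(host, managed_clusters, storage_systems):
    # Keep the ricci session open if the host is still referenced
    # anywhere; a kept session is also why a password typed for an
    # already-authenticated host is never checked.
    in_cluster = any(host in cluster.nodes for cluster in managed_clusters)
    is_storage = host in storage_systems
    return not in_cluster and not is_storage
-----------------------------------------------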

Comment 2 Len DiMaggio 2006-10-31 18:12:06 UTC
The nodes in question had been defined as storage nodes - via the homebase 'add
a system' selection first. I was adding them as cluster nodes after that and
mistyped a password. 



Comment 3 Ryan McCabe 2006-10-31 21:12:04 UTC
OK, this is the expected behavior then. Passwords are checked only if the nodes
are not already authenticated.

Comment 4 Len DiMaggio 2006-11-01 02:29:21 UTC
I don't think we can close out this bug - the net result for the user is that a
bad/mistyped password results in a node being rebooted and an unusable cluster.
If a bad password is entered, the node in question should not be rebooted.

Comment 5 Ryan McCabe 2006-11-01 04:10:38 UTC
I think you're seeing two different issues here: one relating to
authentication, and a second that causes a cluster node to misbehave during
cluster deployment. I think the correlation is incidental.

Currently, we can't check whether a password is correct without unauthenticating
first, and we've decided we don't want to do that if the node has been entered
as a storage system.

from this excerpt of the queue file:

<module name="cluster" status="4">
  <response API_version="1.0" sequence="">
    <function_response function_name="set_cluster.conf">
      <var mutable="false" name="success" type="boolean" value="false"/>
      <var mutable="false" name="error_code" type="int" value="-1"/>
      <var mutable="false" name="error_description" type="string" value="failed to create /etc/cluster/"/>
    </function_response>
  </response>
</module>

It looks like what's causing the cluster creation to fail is the problem
described in bz# 212582 (SELinux enforcing prevents creation of
/etc/cluster/cluster.conf).

Are you able to reproduce this with the latest SELinux policy and the latest
conga build?
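For reference, a quick way to check the suspected cause on a node before
retesting (a sketch; getenforce is the standard SELinux utility, and shelling
out to it from Python is just an illustration):

-----------------------------------------------
import subprocess

# Report whether SELinux is enforcing, which bz# 212582 suggests would
# block ricci from creating /etc/cluster/ on this node.
mode = subprocess.check_output(["getenforce"]).strip()
print("SELinux mode: %s" % mode.decode())
-----------------------------------------------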

Comment 6 Len DiMaggio 2006-11-01 14:07:21 UTC
Good point! I think that you're correct on the second point - the failure to
create the cluster - and I'll retest that with the new build (22.el5) today.

On the other point, maybe it's a GUI/usability issue. If a user first defines a
node as a storage system, and that node is successfully authenticated, then
maybe we need a different approach for having users select those storage nodes
for inclusion in a cluster. Right now, the user enters the node name and
password, and the password is not checked. I'm thinking that discarding any
user input may cause confusion.

Maybe we should list the authenticated nodes in a pull-down list and only
require the user to enter the names of nodes that are not already authenticated? 


Comment 7 Len DiMaggio 2006-11-02 15:14:50 UTC
Changed the summary to reflect the actual problem - "luci and ricci - if node is
already authenticated, user-supplied node password is ignored"

I haven't been able to find any real functional impact on the user.

   - If the node has already been authenticated - by being added to the system
list before it is added to a cluster - then whatever the user enters as a
password is ignored and the node is added to the cluster.

   - If the node is not already authenticated when it is added to a cluster,
then the password is verified before the node is rebooted as part of the
cluster creation process.

So - I guess that it's not an issue for beta2, but we should look at it for GA.
Whenever we require a user to fill in a field, we should do something with the
data.



Comment 8 Ryan McCabe 2006-11-02 16:31:47 UTC
In order to reliably verify that the user is authenticated to hosts, we'd have
to make one extra connection to each host to be added. If you're attempting to
add a large cluster, this overhead may be unacceptable. I have code in -HEAD
that, when this case arises, should print a warning message informing the user
that the password they supplied was ignored because they're already
authenticated.

Do you think that's a sufficient fix for the problem?
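Roughly, the behavior that warning code would give us looks like this (a
sketch with hypothetical names - ricci.is_authenticated, warn - not the actual
-HEAD code):

-----------------------------------------------
def add_node_to_cluster(host, password, ricci, warn):
    # An existing trusted session means no extra connection is needed,
    # but the user should be told their input was ignored rather than
    # having it discarded silently.
    if ricci.is_authenticated(host):
        if password:
            warn("%s is already authenticated; the password you supplied "
                 "was ignored." % host)
    else:
        # Not yet authenticated: the password is verified here, before
        # any reboot or cluster configuration takes place.
        ricci.authenticate(host, password)
-----------------------------------------------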

Comment 9 Len DiMaggio 2007-01-24 16:29:52 UTC
With:
ricci-0.8-30.el5
luci-0.8-30.el5

No error is displayed to the user, and the cluster creation is successful.
Marking the bz as verified.