Bug 1502874

Summary: OSP11 -> OSP12 upgrade: running upgrade-non-controller.sh script for a split stack compute node prompts for heat-admin user password
Product: Red Hat OpenStack Reporter: Marius Cornea <mcornea>
Component: openstack-tripleo-common Assignee: Marios Andreou <mandreou>
Status: CLOSED ERRATA QA Contact: Marius Cornea <mcornea>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 12.0 (Pike) CC: dbecker, jjoyce, jschluet, jslagle, mandreou, mbracho, mbultel, mburns, morazi, rhel-osp-director-maint, sclewis, slinaber
Target Milestone: rc Keywords: Triaged
Target Release: 12.0 (Pike)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-common-7.6.3-3.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1503247 (view as bug list) Environment:
Last Closed: 2017-12-13 22:15:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1399762, 1503247    

Description Marius Cornea 2017-10-16 21:49:15 UTC
Description of problem:
OSP11 -> OSP12 upgrade: running upgrade-non-controller.sh script for a split stack compute node prompts for a heat-admin user password:

(undercloud) [stack@undercloud-0 ~]$ upgrade-non-controller.sh --upgrade 192.168.0.60
Mon Oct 16 17:41:11 EDT 2017 upgrade-non-controller.sh Logging to upgrade-non-controller.sh-192.168.0.60
No server with a name or ID of '192.168.0.60' exists.
Mon Oct 16 17:41:12 EDT 2017 upgrade-non-controller.sh 192.168.0.60 not known to nova. Trying it as an IP address
PING 192.168.0.60 (192.168.0.60) 56(84) bytes of data.
64 bytes from 192.168.0.60: icmp_seq=1 ttl=64 time=0.367 ms

--- 192.168.0.60 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.367/0.367/0.367/0.000 ms
Warning: Permanently added '192.168.0.60' (ECDSA) to the list of known hosts.
heat-admin@192.168.0.60's password: 


Note that, as described in the docs for pre-provisioned nodes, the user is required to create the 'stack' user on the nodes:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/11/html/director_installation_and_usage/chap-configuring_basic_overcloud_requirements_on_pre_provisioned_nodes


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.2-0.20171007062244.el7ost.noarch
openstack-tripleo-common-containers-7.6.2-0.20171007061449.el7ost.noarch
openstack-tripleo-common-7.6.2-0.20171007061449.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy split stack OSP11 deployment with 1 controller + 1 compute
2. Upgrade to OSP12

Actual results:
When running the upgrade-non-controller.sh script to upgrade the compute node, a prompt asking for the heat-admin user password shows up.

Expected results:
The script should be able to reach the node without a password via the stack user.

Additional info:

Comment 1 Marios Andreou 2017-10-17 05:15:18 UTC
o/ Marius there is actually an override available in upgrade-non-controller.sh: https://github.com/openstack/tripleo-common/blob/e9a82e329ee03ba02ccc6776db651a7fa1e20037/scripts/upgrade-non-controller.sh#L31

Can you check to see if that solves the problem:

    export UPGRADE_NODE_USER="stack"
    upgrade-non-controller.sh --upgrade <foo>

We can consider adding a dedicated option (--admin-user) if that is necessary/desired.

Comment 2 Marius Cornea 2017-10-17 09:32:14 UTC
(In reply to Marios Andreou from comment #1)
> o/ Marius there is actually an override available in the
> upgrade_non_controller.sh
> https://github.com/openstack/tripleo-common/blob/
> e9a82e329ee03ba02ccc6776db651a7fa1e20037/scripts/upgrade-non-controller.
> sh#L31 
> 
> Can you check to see if that solves the problem:
> 
>     export UPGRADE_NODE_USER="stack"
>     upgrade-non-controller.sh --upgrade <foo>
> 
> We can consider adding a dedicated option (--admin-user) if that is
> necessary/desired.

OK, that works (it then failed on something else). I think adding an --admin-user option would be more user-friendly, so if we can do it that would be great.


This is the new failure:

/usr/bin/tripleo-ansible-inventory uses heat-admin as the ansible_ssh_user by default, but it has the --ansible_ssh_user option, so I guess we need to either use that or pass the user via -u to the ansible-playbook command.

(undercloud) [stack@undercloud-0 ~]$ export UPGRADE_NODE_USER="stack"
(undercloud) [stack@undercloud-0 ~]$ upgrade-non-controller.sh --upgrade 192.168.0.60
Tue Oct 17 05:19:41 EDT 2017 upgrade-non-controller.sh Logging to upgrade-non-controller.sh-192.168.0.60
No server with a name or ID of '192.168.0.60' exists.
Tue Oct 17 05:19:42 EDT 2017 upgrade-non-controller.sh 192.168.0.60 not known to nova. Trying it as an IP address
PING 192.168.0.60 (192.168.0.60) 56(84) bytes of data.
64 bytes from 192.168.0.60: icmp_seq=1 ttl=64 time=0.210 ms

--- 192.168.0.60 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.210/0.210/0.210/0.000 ms
Warning: Permanently added '192.168.0.60' (ECDSA) to the list of known hosts.
Tue Oct 17 05:19:43 EDT 2017 upgrade-non-controller.sh node compute-0 found with address 192.168.0.60 
Tue Oct 17 05:19:43 EDT 2017 upgrade-non-controller.sh Executing /root/tripleo_upgrade_node.sh on 192.168.0.60
Warning: Permanently added '192.168.0.60' (ECDSA) to the list of known hosts.
 "nova_compute",
Tue Oct 17 09:19:43 UTC 2017 68c09d94-ea89-4939-88fb-b2529c58632b tripleo-upgrade compute-0 /root/tripleo_upgrade_node.sh has completed - moving onto ansible playbooks
Tue Oct 17 05:19:43 EDT 2017 upgrade-non-controller.sh Clearing any existing dir 192.168.0.60 and downloading config
The TripleO configuration has been successfully generated into: 192.168.0.60/tripleo-brhMzi-config
Tue Oct 17 05:19:51 EDT 2017 upgrade-non-controller.sh Starting the upgrade steps playbook run for compute-0 from 192.168.0.60/tripleo-brhMzi-config/

PLAY [overcloud] ******************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ************************************************************************************************************************************************************************************************************************
fatal: [192.168.0.60]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Warning: Permanently added '192.168.0.60' (ECDSA) to the list of known hosts.\r\nPermission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\r\n", "unreachable": true}
	to retry, use: --limit @/home/stack/192.168.0.60/tripleo-brhMzi-config/upgrade_steps_playbook.retry

PLAY RECAP ************************************************************************************************************************************************************************************************************************************
192.168.0.60               : ok=0    changed=0    unreachable=1    failed=0

Comment 3 Marios Andreou 2017-10-17 15:44:59 UTC
o/ mcornea 

So there are two username-related issues here. We should keep them separate, and may need to file a new BZ to track the second one. This BZ, as currently titled, is about the username that upgrade-non-controller.sh uses when ssh-ing to the overcloud nodes. As per comment #1 and comment #2, you can override this with the existing UPGRADE_NODE_USER environment variable before invoking upgrade-non-controller.sh. For a better user experience we can also consider the patch at https://review.openstack.org/#/c/512638/, which wires that up via an --overcloud-user CLI option to upgrade-non-controller.sh.
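
The precedence implied here (built-in default, then the UPGRADE_NODE_USER environment variable, then a command-line option) could be sketched roughly as follows. This is only an illustration: the function name resolve_node_user is hypothetical, the heat-admin default is inferred from the password prompt in the description, and the option handling in the patch under review may well differ.

```shell
#!/bin/bash
# Hypothetical helper, not taken from upgrade-non-controller.sh.
# Assumed precedence: --overcloud-user option > UPGRADE_NODE_USER > heat-admin.
resolve_node_user() {
    # Start from the environment override, falling back to the assumed default.
    local user="${UPGRADE_NODE_USER:-heat-admin}"
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --overcloud-user)
                # The option needs a value after it.
                if [ "$#" -lt 2 ]; then
                    echo "--overcloud-user requires a value" >&2
                    return 1
                fi
                user="$2"
                shift 2
                ;;
            *)
                # Skip arguments this sketch does not handle.
                shift
                ;;
        esac
    done
    echo "$user"
}
```

For example, resolve_node_user --upgrade 192.168.0.60 --overcloud-user stack would select the stack user regardless of what UPGRADE_NODE_USER is set to.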

The other issue, as per comment #2, is the username expected by the tripleo ansible inventory. I just spent some time digging: the ansible_ssh_user we spoke about earlier was added in https://review.openstack.org/#/c/493046/. While trying to work out how to override it I spotted https://github.com/openstack/tripleo-validations/blob/27193e01a42aa00c2058d49c6ae7cf9cdd659aa7/scripts/tripleo-ansible-inventory#L55-L57, so I think we should be able to use a config file specifying the desired ansible_ssh_user. I still need to follow up with someone from DFG:DF to confirm this. We might also consider filing a new BZ for this one, but it isn't yet clear whether we will need any code changes, so I think we can wait for now.

thanks

Comment 4 James Slagle 2017-10-17 16:00:14 UTC
you need to pass --ansible_ssh_user to tripleo-ansible-inventory. if that can't be done via -i on the ansible-playbook command line, then generate the inventory to a static file first with --static-inventory <file-path>
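
A rough sketch of the two-step approach described here: write a static inventory with the desired SSH user, then point ansible-playbook at that file. The --ansible_ssh_user and --static-inventory options are the ones named in this bug, and -i is the standard ansible-playbook inventory option. The inventory path, the playbook name (guessed from the .retry file in comment #2), and the DRY_RUN wrapper are assumptions for illustration only. By default the sketch only prints the commands it would run.

```shell
#!/bin/bash
# Assumed values for illustration; adjust them to the actual environment.
SSH_USER="${UPGRADE_NODE_USER:-stack}"
INVENTORY="${INVENTORY:-$HOME/static-inventory}"
PLAYBOOK="${PLAYBOOK:-upgrade_steps_playbook.yaml}"   # assumed name
DRY_RUN="${DRY_RUN:-1}"                               # 1 = print only

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: generate a static inventory that uses the desired SSH user.
generate_inventory() {
    run tripleo-ansible-inventory --ansible_ssh_user "$SSH_USER" \
        --static-inventory "$INVENTORY"
}

# Step 2: run the playbook against that static inventory file.
run_playbook() {
    run ansible-playbook -i "$INVENTORY" "$PLAYBOOK"
}

generate_inventory && run_playbook
```

Setting DRY_RUN=0 would execute the commands instead of printing them. Whether the generated file carries the SSH user through to the playbook run should be checked against the installed tripleo-ansible-inventory version.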

Comment 6 mathieu bultel 2017-11-15 16:03:38 UTC
openstack-tripleo-common-7.6.3-3.el7ost

Comment 10 errata-xmlrpc 2017-12-13 22:15:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462