OSP11 -> OSP12 upgrade: running upgrade-non-controller.sh script for a split stack compute node fails because tripleo-ansible-inventory is using the heat-admin user instead of 'stack'
Description of problem:
This bug was filed to keep track of the changes needed to invoke tripleo-ansible-inventory with the stack user as ansible_ssh_user instead of the default heat-admin.
(undercloud) [stack@undercloud-0 ~]$ upgrade-non-controller.sh --upgrade 192.168.0.60
Mon Oct 16 17:41:11 EDT 2017 upgrade-non-controller.sh Logging to upgrade-non-controller.sh-192.168.0.60
No server with a name or ID of '192.168.0.60' exists.
Mon Oct 16 17:41:12 EDT 2017 upgrade-non-controller.sh 192.168.0.60 not known to nova. Trying it as an IP address
PING 192.168.0.60 (192.168.0.60) 56(84) bytes of data.
64 bytes from 192.168.0.60: icmp_seq=1 ttl=64 time=0.367 ms
--- 192.168.0.60 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.367/0.367/0.367/0.000 ms
Warning: Permanently added '192.168.0.60' (ECDSA) to the list of known hosts.
heat-admin@192.168.0.60's password:
Note that, as described in the docs, the user is required to create the 'stack' user on pre-provisioned nodes:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/11/html/director_installation_and_usage/chap-configuring_basic_overcloud_requirements_on_pre_provisioned_nodes
Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.2-0.20171007062244.el7ost.noarch
openstack-tripleo-common-containers-7.6.2-0.20171007061449.el7ost.noarch
openstack-tripleo-common-7.6.2-0.20171007061449.el7ost.noarch
How reproducible:
100%
Steps to Reproduce:
1. Deploy split stack OSP11 deployment with 1 controller + 1 compute
2. Upgrade to OSP12
Actual results:
When running the upgrade-non-controller.sh script to upgrade the compute node, a prompt asking for the heat-admin user's password shows up.
Expected results:
The script should be able to reach the node without a password via the stack user.
Additional info:
--- Additional comment from Marios Andreou on 2017-10-17 01:15:18 EDT ---
o/ Marius there is actually an override available in the upgrade_non_controller.sh https://github.com/openstack/tripleo-common/blob/e9a82e329ee03ba02ccc6776db651a7fa1e20037/scripts/upgrade-non-controller.sh#L31
Can you check to see if that solves the problem:
export UPGRADE_NODE_USER="stack"
upgrade-non-controller.sh --upgrade <foo>
We can consider adding a dedicated option (--admin-user) if that is necessary/desired.
--- Additional comment from Marius Cornea on 2017-10-17 05:32:14 EDT ---
(In reply to Marios Andreou from comment #1)
> o/ Marius there is actually an override available in the
> upgrade_non_controller.sh
> https://github.com/openstack/tripleo-common/blob/
> e9a82e329ee03ba02ccc6776db651a7fa1e20037/scripts/upgrade-non-controller.
> sh#L31
>
> Can you check to see if that solves the problem:
>
> export UPGRADE_NODE_USER="stack"
> upgrade-non-controller.sh --upgrade <foo>
>
> We can consider adding a dedicated option (--admin-user) if that is
> necessary/desired.
OK, that works (it failed on something else). I think adding an --admin-user option would be more user friendly, so if we can do it that would be great.
This is the new failure:
/usr/bin/tripleo-ansible-inventory uses heat-admin as ansible_ssh_user by default, but it has the --ansible_ssh_user option, so I guess we need to either use that or pass the user via -u to the ansible-playbook command.
(undercloud) [stack@undercloud-0 ~]$ export UPGRADE_NODE_USER="stack"
(undercloud) [stack@undercloud-0 ~]$ upgrade-non-controller.sh --upgrade 192.168.0.60
Tue Oct 17 05:19:41 EDT 2017 upgrade-non-controller.sh Logging to upgrade-non-controller.sh-192.168.0.60
No server with a name or ID of '192.168.0.60' exists.
Tue Oct 17 05:19:42 EDT 2017 upgrade-non-controller.sh 192.168.0.60 not known to nova. Trying it as an IP address
PING 192.168.0.60 (192.168.0.60) 56(84) bytes of data.
64 bytes from 192.168.0.60: icmp_seq=1 ttl=64 time=0.210 ms
--- 192.168.0.60 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.210/0.210/0.210/0.000 ms
Warning: Permanently added '192.168.0.60' (ECDSA) to the list of known hosts.
Tue Oct 17 05:19:43 EDT 2017 upgrade-non-controller.sh node compute-0 found with address 192.168.0.60
Tue Oct 17 05:19:43 EDT 2017 upgrade-non-controller.sh Executing /root/tripleo_upgrade_node.sh on 192.168.0.60
Warning: Permanently added '192.168.0.60' (ECDSA) to the list of known hosts.
"nova_compute",
Tue Oct 17 09:19:43 UTC 2017 68c09d94-ea89-4939-88fb-b2529c58632b tripleo-upgrade compute-0 /root/tripleo_upgrade_node.sh has completed - moving onto ansible playbooks
Tue Oct 17 05:19:43 EDT 2017 upgrade-non-controller.sh Clearing any existing dir 192.168.0.60 and downloading config
The TripleO configuration has been successfully generated into: 192.168.0.60/tripleo-brhMzi-config
Tue Oct 17 05:19:51 EDT 2017 upgrade-non-controller.sh Starting the upgrade steps playbook run for compute-0 from 192.168.0.60/tripleo-brhMzi-config/
PLAY [overcloud] ******************************************************************************************************************************************************************************************************************************
TASK [Gathering Facts] ************************************************************************************************************************************************************************************************************************
fatal: [192.168.0.60]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Warning: Permanently added '192.168.0.60' (ECDSA) to the list of known hosts.\r\nPermission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\r\n", "unreachable": true}
to retry, use: --limit @/home/stack/192.168.0.60/tripleo-brhMzi-config/upgrade_steps_playbook.retry
PLAY RECAP ************************************************************************************************************************************************************************************************************************************
192.168.0.60 : ok=0 changed=0 unreachable=1 failed=0
--- Additional comment from Marios Andreou on 2017-10-17 11:44:59 EDT ---
o/ mcornea
so there are two username-related things here; we should be clear about them and may need to file a new BZ for tracking. This BZ as currently titled is about the username that upgrade-non-controller.sh tries to use when ssh-ing to the overcloud nodes. As per comment #1 and #2, you can override this with the existing UPGRADE_NODE_USER environment variable before invoking upgrade-non-controller.sh. For a better user experience we can also consider the patch at https://review.openstack.org/#/c/512638/ which wires that up via an --overcloud-user cli option to upgrade-non-controller.sh.
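The override order described here (an --overcloud-user option taking priority over UPGRADE_NODE_USER, which in turn takes priority over the heat-admin default seen in the log above) could be wired up roughly as follows. This is a minimal sketch and not the code from the linked review; the function name resolve_overcloud_user is hypothetical.

```shell
#!/bin/bash
# Sketch only: resolve_overcloud_user is a hypothetical helper, not taken
# from upgrade-non-controller.sh or from the linked review.
resolve_overcloud_user() {
    # Start from the environment override, falling back to heat-admin.
    local user="${UPGRADE_NODE_USER:-heat-admin}"
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --overcloud-user)
                # Reject a missing value instead of silently keeping the default.
                if [ "$#" -lt 2 ]; then
                    echo "--overcloud-user requires a value" >&2
                    return 1
                fi
                user="$2"
                shift 2
                ;;
            *)
                # Skip arguments this sketch does not handle.
                shift
                ;;
        esac
    done
    echo "$user"
}
```

With this shape, `resolve_overcloud_user --overcloud-user stack --upgrade 192.168.0.60` prints the user given on the command line regardless of what UPGRADE_NODE_USER is set to.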
The other issue, as per comment #2, is the username expected by the tripleo ansible inventory. I just spent some time digging: the ansible_ssh_user we spoke about earlier was added here https://review.openstack.org/#/c/493046/. I spotted this while trying to work out how to override it https://github.com/openstack/tripleo-validations/blob/27193e01a42aa00c2058d49c6ae7cf9cdd659aa7/scripts/tripleo-ansible-inventory#L55-L57 - so I think we should be able to use a config file specifying the desired ansible_ssh_user? I still need to follow up with someone from DFG:DF to confirm this. We might also consider filing a new BZ for this one, but it isn't clear yet whether we will need any code changes here, so I think we can wait for now.
thanks
--- Additional comment from James Slagle on 2017-10-17 12:00:14 EDT ---
You need to pass --ansible_ssh_user to tripleo-ansible-inventory. If that can't be done via -i on the ansible-playbook command line, then generate the inventory to a static file first with --static-inventory <file-path>.
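The static-inventory fallback suggested here might look like the sketch below. The functions only print the commands so they can be reviewed before being run on the undercloud. The function names, the inventory file path, and the playbook file name are assumptions made for illustration; --ansible_ssh_user and --static-inventory are the options named in this bug.

```shell
#!/bin/bash
# Sketch only: function names and default paths are hypothetical.

# Print the command that writes a static inventory for the given SSH user.
inventory_cmd() {
    local ssh_user="${1:-stack}"
    local inventory_file="${2:-static-inventory}"
    printf 'tripleo-ansible-inventory --ansible_ssh_user %s --static-inventory %s\n' \
        "$ssh_user" "$inventory_file"
}

# Print the ansible-playbook command that consumes that static inventory.
playbook_cmd() {
    local inventory_file="${1:-static-inventory}"
    local playbook="${2:-upgrade_steps_playbook.yaml}"  # assumed file name
    printf 'ansible-playbook -i %s %s\n' "$inventory_file" "$playbook"
}

inventory_cmd stack static-inventory
playbook_cmd static-inventory upgrade_steps_playbook.yaml
```

Generating the file first keeps the SSH user choice in one place, so the ansible-playbook invocation does not need to pass any options through to the inventory script.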
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2017:3462