Description of problem:
OSP11 -> OSP12 upgrade: major-upgrade-composable-steps-docker on split stack deployments with ceph nodes - the ceph-ansible workflow fails with Permission denied.

I suspect this is caused by the split stack nodes not being reachable via the heat-admin user but via a user called 'stack'.

(undercloud) [stack@undercloud-0 ~]$ openstack stack failures list overcloud --long
overcloud.AllNodesDeploySteps.AllNodesPostUpgradeSteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: 95f4b38d-34f2-4d13-b419-53e1d672c2e1
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR

[root@undercloud-0 stack]# cat /var/log/mistral/ceph-install-workflow.log
2017-10-18 05:15:01,832 p=21843 u=mistral | [DEPRECATION WARNING]: docker is kept for backwards compatibility but usage is discouraged. The module documentation details page may explain more about this rationale.. This feature will be removed in a future release. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
2017-10-18 05:15:03,404 p=21843 u=mistral | PLAY [confirm whether user really meant to switch from non-containerized to containerized ceph daemons] ***
2017-10-18 05:15:03,431 p=21843 u=mistral | TASK [exit playbook, if user did not mean to switch from non-containerized to containerized daemons?] ***
2017-10-18 05:15:03,448 p=21843 u=mistral | skipping: [localhost]
2017-10-18 05:15:03,471 p=21843 u=mistral | PLAY [make sure docker is present and started] *********************************
2017-10-18 05:15:03,478 p=21843 u=mistral | TASK [Gathering Facts] *********************************************************
2017-10-18 05:15:03,980 p=21843 u=mistral | fatal: [192.168.0.51]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Could not create directory '/home/mistral/.ssh'.\r\nWarning: Permanently added '192.168.0.51' (ECDSA) to the list of known hosts.\r\nPermission denied (publickey,gssapi-keyex,gssapi-with-mic).\r\n", "unreachable": true}
2017-10-18 05:15:03,981 p=21843 u=mistral | fatal: [192.168.0.64]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Could not create directory '/home/mistral/.ssh'.\r\nWarning: Permanently added '192.168.0.64' (ECDSA) to the list of known hosts.\r\nPermission denied (publickey,gssapi-keyex,gssapi-with-mic).\r\n", "unreachable": true}
2017-10-18 05:15:03,984 p=21843 u=mistral | fatal: [192.168.0.65]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Could not create directory '/home/mistral/.ssh'.\r\nWarning: Permanently added '192.168.0.65' (ECDSA) to the list of known hosts.\r\nPermission denied (publickey,gssapi-keyex,gssapi-with-mic).\r\n", "unreachable": true}
2017-10-18 05:15:03,986 p=21843 u=mistral | fatal: [192.168.0.50]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Could not create directory '/home/mistral/.ssh'.\r\nWarning: Permanently added '192.168.0.50' (ECDSA) to the list of known hosts.\r\nPermission denied (publickey,gssapi-keyex,gssapi-with-mic).\r\n", "unreachable": true}
2017-10-18 05:15:03,987 p=21843 u=mistral | fatal: [192.168.0.63]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Could not create directory '/home/mistral/.ssh'.\r\nWarning: Permanently added '192.168.0.63' (ECDSA) to the list of known hosts.\r\nPermission denied (publickey,gssapi-keyex,gssapi-with-mic).\r\n", "unreachable": true}
2017-10-18 05:15:03,990 p=21843 u=mistral | fatal: [192.168.0.52]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Could not create directory '/home/mistral/.ssh'.\r\nWarning: Permanently added '192.168.0.52' (ECDSA) to the list of known hosts.\r\nPermission denied (publickey,gssapi-keyex,gssapi-with-mic).\r\n", "unreachable": true}
2017-10-18 05:15:03,991 p=21843 u=mistral | PLAY RECAP *********************************************************************
2017-10-18 05:15:03,991 p=21843 u=mistral | 192.168.0.50 : ok=0 changed=0 unreachable=1 failed=0
2017-10-18 05:15:03,991 p=21843 u=mistral | 192.168.0.51 : ok=0 changed=0 unreachable=1 failed=0
2017-10-18 05:15:03,992 p=21843 u=mistral | 192.168.0.52 : ok=0 changed=0 unreachable=1 failed=0
2017-10-18 05:15:03,992 p=21843 u=mistral | 192.168.0.63 : ok=0 changed=0 unreachable=1 failed=0
2017-10-18 05:15:03,992 p=21843 u=mistral | 192.168.0.64 : ok=0 changed=0 unreachable=1 failed=0
2017-10-18 05:15:03,992 p=21843 u=mistral | 192.168.0.65 : ok=0 changed=0 unreachable=1 failed=0
2017-10-18 05:15:03,992 p=21843 u=mistral | localhost : ok=0 changed=0 unreachable=0 failed=0

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.3-0.20171014102841.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11 split stack deployment with Ceph nodes
2. Upgrade to OSP12

Actual results:
Upgrade fails while running the ceph-ansible playbook because the split stack nodes are unreachable.

Expected results:
With split stack nodes, ansible uses the 'stack' user as per the documentation.

Additional info:
The ansible command appears to be the following, so it is trying to use the tripleo-admin user to reach the nodes:

Command: ansible-playbook /usr/share/ceph-ansible/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml --user tripleo-admin --become --become-user root --extra-vars {"monitor_secret": "***", "ceph_conf_overrides": {"global": {"rgw_s3_auth_use_keystone": "true", "rgw_keystone_admin_password": "***", "osd_pool_default_pgp_num": 128, "rgw_keystone_url": "http://10.0.0.16:5000", "rgw_keystone_admin_project": "service", "rgw_keystone_accepted_roles": "Member, _member_, admin", "osd_pool_default_size": 3, "osd_pool_default_pg_num": 128, "rgw_keystone_api_version": 3, "rgw_keystone_admin_user": "swift", "rgw_keystone_admin_domain": "default"}}, "fetch_directory": "/tmp/file-mistral-actionWim7Dr", "user_config": true, "ceph_docker_image_tag": "latest", "ceph_release": "jewel", "containerized_deployment": true, "public_network": "10.0.0.128/25", "generate_fsid": false, "monitor_address_block": "10.0.0.128/25", "admin_secret": "***", "keys": [{"mon_cap": "allow r", "osd_cap": "allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics", "name": "client.openstack", "key": "AQAg6+ZZAAAAABAAlyZ5Uw/EYSLGW1gZ0YxXiQ==", "mode": "0644"}, {"mon_cap": "allow r, allow command \\\\\\\"auth del\\\\\\\", allow command \\\\\\\"auth caps\\\\\\\", allow command \\\\\\\"auth get\\\\\\\", allow command \\\\\\\"auth get-or-create\\\\\\\"", "mds_cap": "allow *", "name": "client.manila", "mode": "0644", "key": "AQAg6+ZZAAAAABAA3lxxib+rri81tZGrM3pDog==", "osd_cap": "allow rw"}, {"mon_cap": "allow rw", "osd_cap": "allow rwx", "name": "client.radosgw", "key": "AQAg6+ZZAAAAABAA4VsKb08ebtlPTwFqHqcHZQ==", "mode": "0644"}], "openstack_keys": [{"mon_cap": "allow r", "osd_cap": "allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics", "name": "client.openstack", "key": "AQAg6+ZZAAAAABAAlyZ5Uw/EYSLGW1gZ0YxXiQ==", "mode": "0644"}, {"mon_cap": "allow r, allow command \\\\\\\"auth del\\\\\\\", allow command \\\\\\\"auth caps\\\\\\\", allow command \\\\\\\"auth get\\\\\\\", allow command \\\\\\\"auth get-or-create\\\\\\\"", "mds_cap": "allow *", "name": "client.manila", "mode": "0644", "key": "AQAg6+ZZAAAAABAA3lxxib+rri81tZGrM3pDog==", "osd_cap": "allow rw"}, {"mon_cap": "allow rw", "osd_cap": "allow rwx", "name": "client.radosgw", "key": "AQAg6+ZZAAAAABAA4VsKb08ebtlPTwFqHqcHZQ==", "mode": "0644"}], "osd_objectstore": "filestore", "pools": [], "ntp_service_enabled": false, "ceph_docker_image": "ceph/rhceph-2-rhel7", "cluster_network": "10.0.1.0/25", "fsid": "efcfa5d6-b3c7-11e7-979d-525400fe98cd", "openstack_config": true, "ceph_docker_registry": "192.168.0.1:8787", "ceph_stable": true, "devices": ["/dev/vdb", "/dev/vdc"], "ceph_origin": "distro", "openstack_pools": [{"rule_name": "", "pg_num": 128, "name": "volumes"}, {"rule_name": "", "pg_num": 128, "name": "backups"}, {"rule_name": "", "pg_num": 128, "name": "vms"}, {"rule_name": "", "pg_num": 128, "name": "images"}, {"rule_name": "", "pg_num": 128, "name": "metrics"}], "ip_version": "ipv4", "ireallymeanit": "yes", "docker": true} --forks 8 --ssh-common-args "-o StrictHostKeyChecking=no" --ssh-extra-args "-o UserKnownHostsFile=/dev/null" --inventory-file /tmp/ansible-mistral-actionSQbLV7/inventory.yaml --private-key /tmp/ansible-mistral-actionSQbLV7/ssh_private_key --skip-tags package-install,with_pkg
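The same UNREACHABLE result can be reproduced outside of Mistral with an ad-hoc Ansible ping against one of the nodes, using the user the workflow picked. This is only a sketch; the /tmp/ansible-mistral-action* inventory and key are temporary and recreated on every run, so they are omitted here.

(undercloud) [stack@undercloud-0 ~]$ ansible all -i '192.168.0.51,' -m ping -u tripleo-admin --ssh-common-args '-o StrictHostKeyChecking=no'
# => 192.168.0.51 | UNREACHABLE! ... Permission denied, matching ceph-install-workflow.log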
It looks like the workflow which creates the tripleo-admin user tries to go through the Nova servers, but for split stack there are no Nova servers.

In /var/log/mistral/engine.log:

2017-10-18 05:14:57.477 12036 INFO workflow_trace [req-9a3fe81b-734f-4370-aca6-f5c452a5ca69 1f7068fda10c46ebb15865f3160fe5b5 08da50fc73114b118f112d645e8631dd - default default] Starting workflow [name=tripleo.access.v1.create_admin_via_nova, input={tasks: [{u'name': u'create user tripleo-admin', u'user': {u'name': u'tripleo-admin'}}, {u'copy': {u'dest': u'/etc/sudoers.d/tripleo-admin', u'content': u'tripleo-admin ALL=(ALL) NOPASSWD:ALL\n', u'mode': 288}, u'name': u'grant admin rights to user tripleo-admin'}, {u'name': u'ensure .ssh dir exists for user tripleo-admin', u'file': {u'owner': u'tripleo-admin', u'path': u'/home/tripleo-admin/.ssh', u'state': u'directory', u'group': u'tripleo-admin', u'mode': 448}}, {u'name': u'ensure authorized_keys file exists for user tripleo-admin', u'file': {u'owner': u'tripleo-admin', u'path': u'/home/tripleo-admin/.ssh/authorized_keys', u'state': u'touch', u'group': u'tripleo-admin', u'mode': 448}}, {u'lineinfile': {u'path': u'/home/tripleo-admin/.ssh/authorized_keys', u'line': u'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxSo8pgiWjuwaE2nOS4gC1TKUILxyKVwTPFc1mlsqRN/3D7BQxRAen2Uy5r4ksGc975QbD2hg12cWTdVs/QrTdZU+408fjRyTA6jxPTOdALAgUvoxQriPC+fITZTIsfBPpM7/qO+jMrdnVTKQtTqG8wg+ZwIxZlOLpT+Q2FuMmtt3HGt5Co33RZTmRuZRUQe9A6hcxbEx3UySIkfCt5X1nNEy/vRRvS8Crm0au9OKQpeWFCwAz5ReczwPYFv7Q+rRryUgdWNUuUVUKJBpLekXV2fFOIx5s627QkmglL7kDnOUCOXvN6Ie30CFn4k48YCldhQTb44uE2O1JlC8JUaxD Generated by TripleO', u'regexp': u'Generated by TripleO'}, u'name': u'authorize TripleO Mistral key for user tripleo-admin'}]...]
2017-10-18 05:14:57.488 12036 INFO workflow_trace [req-9a3fe81b-734f-4370-aca6-f5c452a5ca69 1f7068fda10c46ebb15865f3160fe5b5 08da50fc73114b118f112d645e8631dd - default default] Workflow 'tripleo.access.v1.create_admin_via_nova' [IDLE -> RUNNING, msg=None] (execution_id=7d0b714c-29cb-4a05-9448-a69315150849)
2017-10-18 05:14:57.881 12036 INFO mistral.engine.engine_server [req-9a3fe81b-734f-4370-aca6-f5c452a5ca69 1f7068fda10c46ebb15865f3160fe5b5 08da50fc73114b118f112d645e8631dd - default default] Received RPC request 'on_action_complete'[action_ex_id=dfe75f79-b7cf-4c68-a8f9-e4493dbf3b81, result=Result [data=[], error=None, cancel=False]]
2017-10-18 05:14:57.898 12036 INFO workflow_trace [req-9a3fe81b-734f-4370-aca6-f5c452a5ca69 1f7068fda10c46ebb15865f3160fe5b5 08da50fc73114b118f112d645e8631dd - default default] Action 'nova.servers_list' (dfe75f79-b7cf-4c68-a8f9-e4493dbf3b81)(task=get_servers) [RUNNING -> SUCCESS, result = []]
2017-10-18 05:14:57.971 12036 INFO workflow_trace [req-9a3fe81b-734f-4370-aca6-f5c452a5ca69 1f7068fda10c46ebb15865f3160fe5b5 08da50fc73114b118f112d645e8631dd - default default] Task 'get_servers' (714174f5-4eb9-472a-a756-3adf50c542d6) [RUNNING -> SUCCESS, msg=None] (execution_id=7d0b714c-29cb-4a05-9448-a69315150849)
2017-10-18 05:14:58.069 12036 INFO workflow_trace [req-9a3fe81b-734f-4370-aca6-f5c452a5ca69 1f7068fda10c46ebb15865f3160fe5b5 08da50fc73114b118f112d645e8631dd - default default] Task 'create_admin' (181dba13-9170-4c42-a483-cd36cb4313e5) [RUNNING -> SUCCESS, msg=None] (execution_id=7d0b714c-29cb-4a05-9448-a69315150849)
2017-10-18 05:14:58.901 12036 INFO workflow_trace [req-9a3fe81b-734f-4370-aca6-f5c452a5ca69 1f7068fda10c46ebb15865f3160fe5b5 08da50fc73114b118f112d645e8631dd - default default] Workflow 'tripleo.access.v1.create_admin_via_nova' [RUNNING -> SUCCESS, msg=None] (execution_id=7d0b714c-29cb-4a05-9448-a69315150849)
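The "result = []" from the 'nova.servers_list' action matches the split stack case: the overcloud nodes are pre-provisioned and not managed by the undercloud's Nova, so create_admin_via_nova has nothing to configure. A simple way to confirm this on the undercloud (assuming the usual stackrc credentials):

(undercloud) [stack@undercloud-0 ~]$ source ~/stackrc
(undercloud) [stack@undercloud-0 ~]$ openstack server list
# empty output on a split stack deployment - there are no Nova servers for the workflow to enumerate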
One idea is to check if the user exists already (similarly to https://review.openstack.org/#/c/509001/) and skip creation if it does. In that scenario the operator would have to create the user manually on the nodes.
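For reference, a minimal sketch of what that manual creation could look like on each split stack node, mirroring the tasks of the create_admin_via_nova workflow shown in the engine.log excerpt above. Run as root (e.g. over SSH as the 'stack' user with sudo); the decimal modes 288/448 in the workflow correspond to octal 0440/0700, and the authorized key must be the full "Generated by TripleO" Mistral public key from that excerpt.

useradd -m tripleo-admin
echo 'tripleo-admin ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/tripleo-admin
chmod 0440 /etc/sudoers.d/tripleo-admin
install -d -m 0700 -o tripleo-admin -g tripleo-admin /home/tripleo-admin/.ssh
# paste the full 'ssh-rsa AAAAB3Nza... Generated by TripleO' line from the engine.log excerpt above
echo 'ssh-rsa <Mistral public key> Generated by TripleO' >> /home/tripleo-admin/.ssh/authorized_keys
chmod 0700 /home/tripleo-admin/.ssh/authorized_keys
chown tripleo-admin:tripleo-admin /home/tripleo-admin/.ssh/authorized_keys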
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462