Description of problem:
-----------------------
An attempt to prepare the upgrade playbooks failed:

openstack overcloud upgrade prepare --templates --stack overcloud \
 --container-registry-file /home/stack/composable_roles/docker-images.yaml \
 -e /home/stack/composable_roles/roles/nodes.yaml \
 -e /home/stack/composable_roles/internal.yaml \
 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
 -e /home/stack/composable_roles/network/network-environment.yaml \
 -e /home/stack/composable_roles/enable-tls.yaml \
 -e /home/stack/composable_roles/inject-trust-anchor.yaml \
 -e /home/stack/composable_roles/public_vip.yaml \
 -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
 -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
 -e /home/stack/composable_roles/hostnames.yaml \
 -e /home/stack/composable_roles/debug.yaml \
 -e /home/stack/composable_roles/config_heat.yaml \
 -e /home/stack/composable_roles/docker-images.yaml \
 --roles-file /home/stack/composable_roles/roles/roles_data.yaml 2>&1

...
2018-04-23 13:02:06Z [overcloud-AllNodesDeploySteps-v64f6bcjezzm.WorkflowTasks_Step2_Execution]: UPDATE_IN_PROGRESS state changed
2018-04-23 13:02:07Z [overcloud-AllNodesDeploySteps-v64f6bcjezzm.WorkflowTasks_Step2_Execution]: UPDATE_COMPLETE The Resource WorkflowTasks_Step2_Execution requires replacement.
2018-04-23 13:02:07Z [overcloud-AllNodesDeploySteps-v64f6bcjezzm.WorkflowTasks_Step2_Execution]: CREATE_IN_PROGRESS state changed
2018-04-23 13:07:05Z [overcloud-AllNodesDeploySteps-v64f6bcjezzm.WorkflowTasks_Step2_Execution]: CREATE_FAILED resources.WorkflowTasks_Step2_Execution: ERROR
2018-04-23 13:07:05Z [overcloud-AllNodesDeploySteps-v64f6bcjezzm]: UPDATE_FAILED Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2018-04-23 13:07:05Z [AllNodesDeploySteps]: UPDATE_FAILED resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2018-04-23 13:07:05Z [overcloud]: UPDATE_FAILED Resource UPDATE failed: resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR

Heat Stack update failed.

Stack overcloud UPDATE_FAILED

overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: 5fc8f8ee-2698-4ba0-89a7-48aad9c3c22d
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR

From mistral/ceph-install-workflow.log
--------------------------------------
2018-04-23 09:06:58,277 p=7669 u=mistral | TASK [ceph-mon : assign rbd application to pool(s)] ****************************
2018-04-23 09:06:58,278 p=7669 u=mistral | task path: /usr/share/ceph-ansible/roles/ceph-mon/tasks/openstack_config.yml:17
2018-04-23 09:06:58,278 p=7669 u=mistral | Monday 23 April 2018 09:06:58 -0400 (0:00:03.191) 0:03:33.862 **********
2018-04-23 09:06:58,886 p=7669 u=mistral | failed: [192.168.24.24] (item={u'rule_name': u'', u'pg_num': 32, u'name': u'images'}) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-2", "ceph", "--cluster", "ceph", "osd", "pool", "application", "enable", "images", "rbd"], "delta": "0:00:00.318723", "end": "2018-04-23 13:06:58.855011", "item": {"name": "images", "pg_num": 32, "rule_name": ""}, "msg": "non-zero return code", "rc": 22, "start": "2018-04-23 13:06:58.536288", "stderr": "no valid command found; 10 closest matches:\nosd pool stats {<name>}\nosd pool ls {detail}\nosd pool rmsnap <poolname> <snap>\nosd pool delete <poolname> {<poolname>} {--yes-i-really-really-mean-it}\nosd pool create <poolname> <int[0-]> {<int[0-]>} {replicated|erasure} {<erasure_code_profile>} {<ruleset>} {<int>}\nosd pool rename <poolname> <poolname>\nosd pool rm <poolname> {<poolname>} {--yes-i-really-really-mean-it}\nosd pool set <poolname> size|min_size|crash_replay_interval|pg_num|pgp_num|crush_ruleset|hashpspool|nodelete|nopgchange|nosizechange|write_fadvise_dontneed|noscrub|nodeep-scrub|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|use_gmt_hitset|debug_fake_ec_pool|target_max_bytes|target_max_objects|cache_target_dirty_ratio|cache_target_dirty_high_ratio|cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age|auid|min_read_recency_for_promote|min_write_recency_for_promote|fast_read|hit_set_grade_decay_rate|hit_set_search_last_n|scrub_min_interval|scrub_max_interval|deep_scrub_interval|recovery_priority|recovery_op_priority|scrub_priority <val> {--yes-i-really-mean-it}\nosd pool get <poolname> size|min_size|crash_replay_interval|pg_num|pgp_num|crush_ruleset|hashpspool|nodelete|nopgchange|nosizechange|write_fadvise_dontneed|noscrub|nodeep-scrub|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|auid|target_max_objects|target_max_bytes|cache_target_dirty_ratio|cache_target_dirty_high_ratio|cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age|erasure_code_profile|min_read_recency_for_promote|all|min_write_recency_for_promote|fast_read|hit_set_grade_decay_rate|hit_set_search_last_n|scrub_min_interval|scrub_max_interval|deep_scrub_interval|recovery_priority|recovery_op_priority|scrub_priority\nosd pool get-quota <poolname>\nError EINVAL: invalid command", "stdout": "", "stdout_lines": []}

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
puppet-ceph-2.5.1-0.20180305100232.928fb38.el7ost.noarch
ceph-ansible-3.1.0-0.1.beta7.el7cp.noarch
openstack-tripleo-heat-templates-8.0.2-0.20180414062830.5f869f2.el7ost.noarch

Steps to Reproduce:
-------------------
1. Upgrade the undercloud to RHOS-13 (2018-04-19.2).
2. If a custom roles-data file is used, adjust it to remove the FluentdClient and ManilaBackendGeneric services.
3. Prepare the RHOS-13 images.
4. Run 'openstack overcloud upgrade prepare' and pass all the environment files used during the initial deployment plus a file with the latest docker images.

Actual results:
---------------
The upgrade prepare attempt failed.

Expected results:
-----------------
Upgrade prepare succeeds.

Additional info:
----------------
Virtual setup: 3 controllers + 3 database + 3 messaging + 3 ceph + 2 computes + 2 networkers
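Step 2 of the reproduction can be sketched as follows. This is a hypothetical excerpt of a custom roles_data.yaml (the surrounding service names are illustrative, not taken from this setup), showing the two services that have to be dropped:

```yaml
# Hypothetical excerpt of /home/stack/composable_roles/roles/roles_data.yaml.
- name: Controller
  ServicesDefault:
    - OS::TripleO::Services::CephMon
    - OS::TripleO::Services::Keystone
    # Removed in RHOS-13: delete these entries before 'upgrade prepare'
    # - OS::TripleO::Services::FluentdClient
    # - OS::TripleO::Services::ManilaBackendGeneric
```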
This might be an issue in Director: we hardcode the ceph_release parameter to 'luminous' [1], and that may break the upgrade playbooks. We are testing removal of the ceph_release parameter, given that ceph-ansible can gather it at runtime; should that turn out to be the cause, we can move this bug to the OpenStack product.

1. https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/ceph-ansible/ceph-base.yaml#L217
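For context, the variable in question is passed to ceph-ansible as an extra var by the template linked in [1]; a simplified sketch (not the exact template contents) looks like this:

```yaml
# Simplified sketch of the extra vars ceph-base.yaml hands to ceph-ansible.
# Pinning the release tells ceph-ansible the cluster is already Luminous even
# while the mons are still on Jewel mid-upgrade; removing the line lets
# ceph-ansible detect the running release at runtime instead.
ceph_release: luminous
```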
It looks like the controller-2 node tried to use a Luminous feature, 'pool application enable', while it was still running Jewel: http://paste.openstack.org/show/719748/

This is likely because we enforced the Ceph version in THT, which is fine for new containerized Ceph deployments but not for upgrades. We don't need to enforce this variable, as ceph-ansible determines it by itself. The following patch removes that enforcement: https://review.openstack.org/#/c/563632

The next step is to test it and see how it affects the deployment.
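For illustration, the mismatch can be reproduced by hand on the mon container (a console sketch reusing the container and pool names from the log above; the version string and error text are abridged):

```console
$ docker exec ceph-mon-controller-2 ceph --version
ceph version 10.2.x (...)        # still Jewel

$ docker exec ceph-mon-controller-2 ceph --cluster ceph \
      osd pool application enable images rbd
no valid command found; 10 closest matches: ...
Error EINVAL: invalid command    # rc 22: this subcommand only exists in Luminous (12.x)
```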
Verified with openstack-tripleo-heat-templates-8.0.2-22.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086