Bug 1570830

Summary: [UPGRADES] no valid command found; 10 closest matches:\nosd pool stats {<name>}
Product: Red Hat OpenStack
Reporter: Yurii Prokulevych <yprokule>
Component: openstack-tripleo-heat-templates
Assignee: Giulio Fidente <gfidente>
Status: CLOSED ERRATA
QA Contact: Yurii Prokulevych <yprokule>
Severity: urgent
Priority: urgent
Version: 13.0 (Queens)
CC: augol, gfidente, johfulto, mandreou, mbultel, mburns, mcornea, morazi, sclewis
Target Milestone: beta
Keywords: Triaged
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-8.0.2-3.el7ost
Last Closed: 2018-06-27 13:52:04 UTC
Type: Bug

Description Yurii Prokulevych 2018-04-23 13:29:56 UTC
Description of problem:
-----------------------
Attempt to prepare upgrade playbooks failed.

openstack overcloud upgrade prepare --templates --stack overcloud \
            --container-registry-file /home/stack/composable_roles/docker-images.yaml \
            -e /home/stack/composable_roles/roles/nodes.yaml \
            -e /home/stack/composable_roles/internal.yaml \
            -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
            -e /home/stack/composable_roles/network/network-environment.yaml \
            -e /home/stack/composable_roles/enable-tls.yaml \
            -e /home/stack/composable_roles/inject-trust-anchor.yaml \
            -e /home/stack/composable_roles/public_vip.yaml \
            -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
            -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
            -e /home/stack/composable_roles/hostnames.yaml \
            -e /home/stack/composable_roles/debug.yaml \
            -e /home/stack/composable_roles/config_heat.yaml \
            -e /home/stack/composable_roles/docker-images.yaml \
           --roles-file /home/stack/composable_roles/roles/roles_data.yaml 2>&1
...
2018-04-23 13:02:06Z [overcloud-AllNodesDeploySteps-v64f6bcjezzm.WorkflowTasks_Step2_Execution]: UPDATE_IN_PROGRESS  state changed
2018-04-23 13:02:07Z [overcloud-AllNodesDeploySteps-v64f6bcjezzm.WorkflowTasks_Step2_Execution]: UPDATE_COMPLETE  The Resource WorkflowTasks_Step2_Execution requires replacement.
2018-04-23 13:02:07Z [overcloud-AllNodesDeploySteps-v64f6bcjezzm.WorkflowTasks_Step2_Execution]: CREATE_IN_PROGRESS  state changed
2018-04-23 13:07:05Z [overcloud-AllNodesDeploySteps-v64f6bcjezzm.WorkflowTasks_Step2_Execution]: CREATE_FAILED  resources.WorkflowTasks_Step2_Execution: ERROR
Heat Stack update failed.
Heat Stack update failed.
2018-04-23 13:07:05Z [overcloud-AllNodesDeploySteps-v64f6bcjezzm]: UPDATE_FAILED  Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2018-04-23 13:07:05Z [AllNodesDeploySteps]: UPDATE_FAILED  resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2018-04-23 13:07:05Z [overcloud]: UPDATE_FAILED  Resource UPDATE failed: resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR

 Stack overcloud UPDATE_FAILED 

overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: 5fc8f8ee-2698-4ba0-89a7-48aad9c3c22d
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR
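
The event lines above share one layout (timestamp, resource in brackets, status, reason), so the failing resources can be picked out mechanically. The following is only a minimal sketch: the regular expression is inferred from the lines pasted in this report, and `failed_events` is a hypothetical helper, not part of any OpenStack client.

```python
import re

# Layout inferred from the output in this report (an assumption):
# "<timestamp> [<resource>]: <STATUS>  <reason>"
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}Z) "
    r"\[(?P<resource>[^\]]+)\]: "
    r"(?P<status>[A-Z_]+)\s+(?P<reason>.*)$"
)


def failed_events(lines):
    """Return (timestamp, resource, status) for every *_FAILED event line."""
    failures = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match and match.group("status").endswith("_FAILED"):
            failures.append(
                (match.group("ts"), match.group("resource"), match.group("status"))
            )
    return failures


# Three lines copied from the output above.
sample = [
    "2018-04-23 13:02:07Z [overcloud-AllNodesDeploySteps-v64f6bcjezzm.WorkflowTasks_Step2_Execution]: CREATE_IN_PROGRESS  state changed",
    "2018-04-23 13:07:05Z [AllNodesDeploySteps]: UPDATE_FAILED  resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR",
    "2018-04-23 13:07:05Z [overcloud]: UPDATE_FAILED  Resource UPDATE failed: resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR",
]

for ts, resource, status in failed_events(sample):
    print(ts, resource, status)
```

Every failure reason in the chain points back to WorkflowTasks_Step2_Execution, which is why the Mistral workflow log is the next place to look.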

From mistral/ceph-install-workflow.log
--------------------------------------
2018-04-23 09:06:58,277 p=7669 u=mistral |  TASK [ceph-mon : assign rbd application to pool(s)] ****************************
2018-04-23 09:06:58,278 p=7669 u=mistral |  task path: /usr/share/ceph-ansible/roles/ceph-mon/tasks/openstack_config.yml:17
2018-04-23 09:06:58,278 p=7669 u=mistral |  Monday 23 April 2018  09:06:58 -0400 (0:00:03.191)       0:03:33.862 ********** 
2018-04-23 09:06:58,886 p=7669 u=mistral |  failed: [192.168.24.24] (item={u'rule_name': u'', u'pg_num': 32, u'name': u'images'}) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-2", "ceph", 
"--cluster", "ceph", "osd", "pool", "application", "enable", "images", "rbd"], "delta": "0:00:00.318723", "end": "2018-04-23 13:06:58.855011", "item": {"name": "images", "pg_num": 32, "rule_name": ""}, "msg": "n
on-zero return code", "rc": 22, "start": "2018-04-23 13:06:58.536288", "stderr": "no valid command found; 10 closest matches:\nosd pool stats {<name>}\nosd pool ls {detail}\nosd pool rmsnap <poolname> <snap>\nos
d pool delete <poolname> {<poolname>} {--yes-i-really-really-mean-it}\nosd pool create <poolname> <int[0-]> {<int[0-]>} {replicated|erasure} {<erasure_code_profile>} {<ruleset>} {<int>}\nosd pool rename <poolnam
e> <poolname>\nosd pool rm <poolname> {<poolname>} {--yes-i-really-really-mean-it}\nosd pool set <poolname> size|min_size|crash_replay_interval|pg_num|pgp_num|crush_ruleset|hashpspool|nodelete|nopgchange|nosizec
hange|write_fadvise_dontneed|noscrub|nodeep-scrub|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|use_gmt_hitset|debug_fake_ec_pool|target_max_bytes|target_max_objects|cache_target_dirty_ratio|cache_target
_dirty_high_ratio|cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age|auid|min_read_recency_for_promote|min_write_recency_for_promote|fast_read|hit_set_grade_decay_rate|hit_set_search_last_n|scrub_mi
n_interval|scrub_max_interval|deep_scrub_interval|recovery_priority|recovery_op_priority|scrub_priority <val> {--yes-i-really-mean-it}\nosd pool get <poolname> size|min_size|crash_replay_interval|pg_num|pgp_num|
crush_ruleset|hashpspool|nodelete|nopgchange|nosizechange|write_fadvise_dontneed|noscrub|nodeep-scrub|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|auid|target_max_objects|target_max_bytes|cache_target_d
irty_ratio|cache_target_dirty_high_ratio|cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age|erasure_code_profile|min_read_recency_for_promote|all|min_write_recency_for_promote|fast_read|hit_set_grad
e_decay_rate|hit_set_search_last_n|scrub_min_interval|scrub_max_interval|deep_scrub_interval|recovery_priority|recovery_op_priority|scrub_priority\nosd pool get-quota <poolname>\nError EINVAL: invalid command", 
"stderr_lines": ["no valid command found; 10 closest matches:", "osd pool stats {<name>}", "osd pool ls {detail}", "osd pool rmsnap <poolname> <snap>", "osd pool delete <poolname> {<poolname>} {--yes-i-really-re
ally-mean-it}", "osd pool create <poolname> <int[0-]> {<int[0-]>} {replicated|erasure} {<erasure_code_profile>} {<ruleset>} {<int>}", "osd pool rename <poolname> <poolname>", "osd pool rm <poolname> {<poolname>}
 {--yes-i-really-really-mean-it}", "osd pool set <poolname> size|min_size|crash_replay_interval|pg_num|pgp_num|crush_ruleset|hashpspool|nodelete|nopgchange|nosizechange|write_fadvise_dontneed|noscrub|nodeep-scru
b|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|use_gmt_hitset|debug_fake_ec_pool|target_max_bytes|target_max_objects|cache_target_dirty_ratio|cache_target_dirty_high_ratio|cache_target_full_ratio|cache_
min_flush_age|cache_min_evict_age|auid|min_read_recency_for_promote|min_write_recency_for_promote|fast_read|hit_set_grade_decay_rate|hit_set_search_last_n|scrub_min_interval|scrub_max_interval|deep_scrub_interva
l|recovery_priority|recovery_op_priority|scrub_priority <val> {--yes-i-really-mean-it}", "osd pool get <poolname> size|min_size|crash_replay_interval|pg_num|pgp_num|crush_ruleset|hashpspool|nodelete|nopgchange|n
osizechange|write_fadvise_dontneed|noscrub|nodeep-scrub|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|auid|target_max_objects|target_max_bytes|cache_target_dirty_ratio|cache_target_dirty_high_ratio|cache
_target_full_ratio|cache_min_flush_age|cache_min_evict_age|erasure_code_profile|min_read_recency_for_promote|all|min_write_recency_for_promote|fast_read|hit_set_grade_decay_rate|hit_set_search_last_n|scrub_min_i
nterval|scrub_max_interval|deep_scrub_interval|recovery_priority|recovery_op_priority|scrub_priority", "osd pool get-quota <poolname>", "Error EINVAL: invalid command"], "stdout": "", "stdout_lines": []}
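
The task result above is long, but only three fields matter for triage: the ceph subcommand that ran, the return code, and the first line of stderr. The sketch below condenses an abbreviated copy of that result. `summarize_failure` is a hypothetical helper, and it assumes the command list always has the form `... ceph --cluster <name> <subcommand...>` as it does here.

```python
import errno
import json

# Abbreviated copy of the failed task result above; the long help text in
# "stderr_lines" is trimmed to its first and last entries.
result_json = """
{
  "changed": false,
  "cmd": ["docker", "exec", "ceph-mon-controller-2", "ceph", "--cluster", "ceph",
          "osd", "pool", "application", "enable", "images", "rbd"],
  "rc": 22,
  "stderr_lines": ["no valid command found; 10 closest matches:",
                   "Error EINVAL: invalid command"]
}
"""


def summarize_failure(result):
    """Hypothetical helper: reduce a failed command result to the key facts."""
    cmd = result["cmd"]
    # Skip the "ceph" binary plus its "--cluster <name>" option (assumed layout).
    ceph_args = cmd[cmd.index("ceph") + 3:]
    return {
        "subcommand": " ".join(ceph_args),
        "rc": result["rc"],
        # Map the numeric return code to its symbolic errno name, if any.
        "errno_name": errno.errorcode.get(result["rc"], "unknown"),
        # True when the monitor did not recognize the command at all.
        "unknown_command": any(
            line.startswith("no valid command found")
            for line in result["stderr_lines"]
        ),
    }


summary = summarize_failure(json.loads(result_json))
print(summary)
```

The return code 22 lines up with the "Error EINVAL" text in stderr: the monitor rejected `osd pool application enable` as an unknown command, and the list of closest matches it printed contains no `application` subcommand.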


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
puppet-ceph-2.5.1-0.20180305100232.928fb38.el7ost.noarch
ceph-ansible-3.1.0-0.1.beta7.el7cp.noarch
openstack-tripleo-heat-templates-8.0.2-0.20180414062830.5f869f2.el7ost.noarch

Steps to Reproduce:
-------------------
1. Upgrade UC to RHOS-13 (2018-04-19.2)
2. If a custom roles-data file is used, adjust it to remove the FluentdClient and ManilaBackendGeneric services
3. Prepare RHOS-13 images (
4. Run 'openstack overcloud upgrade prepare' and pass all the env files used during initial deployment and file with latest docker images.

Actual results:
---------------
Upgrade prepare attempt failed


Expected results:
-----------------
Upgrade prepare succeeds


Additional info:
----------------
Virtual setup: 3 controllers + 3 database + 3 messaging + 3 ceph + 2 computes + 2 networkers

Comment 2 Giulio Fidente 2018-04-23 13:59:49 UTC
This might be an issue in Director: we hardcode the ceph_release parameter to 'luminous' [1], which might break the upgrade playbooks.

We're testing removal of the ceph_release parameter, given that ceph-ansible can gather it at runtime. Should that be the cause, we can move this bug to the OpenStack product.

1. https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/ceph-ansible/ceph-base.yaml#L217

Comment 4 John Fulton 2018-04-23 14:24:28 UTC
Looks like the controller-2 node tried to use a luminous feature, 'pool application enable',  while it was still running Jewel: 

 http://paste.openstack.org/show/719748/

This might be because we enforced the ceph version in THT, which is fine for new containerized ceph deployments but not for upgrades. We don't need to enforce this variable as ceph-ansible determines it for itself. The following patch removes that enforcement: 

 https://review.openstack.org/#/c/563632

Next step is to test and see how it affects the deployment.
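
To illustrate the mismatch described in this comment: pool application tagging arrived with Luminous (12.x), so a Jewel (10.x) monitor rejects it. The following is a minimal sketch of a version gate, not ceph-ansible's actual detection logic. The function names are hypothetical, and the sample version strings (including the zeroed hashes) are placeholders rather than values taken from this environment.

```python
import re

# Major version -> release name for the two releases discussed in this bug.
CEPH_RELEASES = {10: "jewel", 12: "luminous"}


def ceph_major_version(version_output):
    """Extract the major version from `ceph --version` style output."""
    match = re.search(r"ceph version (\d+)\.", version_output)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version_output)
    return int(match.group(1))


def supports_pool_application(version_output):
    """'osd pool application enable' is only available from Luminous (12.x)."""
    return ceph_major_version(version_output) >= 12


# Placeholder samples; the hashes are not real build identifiers.
jewel = "ceph version 10.2.0 (0000000000000000000000000000000000000000)"
luminous = (
    "ceph version 12.2.0 (0000000000000000000000000000000000000000) "
    "luminous (stable)"
)

for output in (jewel, luminous):
    major = ceph_major_version(output)
    print(CEPH_RELEASES.get(major, "unknown"), supports_pool_application(output))
```

Deciding from the version the running daemon reports, rather than from a value fixed in the templates, is the behaviour the patch above relies on when it stops enforcing ceph_release.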

Comment 16 Yurii Prokulevych 2018-05-23 08:46:37 UTC
Verified with openstack-tripleo-heat-templates-8.0.2-22.el7ost.noarch

Comment 18 errata-xmlrpc 2018-06-27 13:52:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086