Bug 2101677 - osds_per_device does not work when passed using --osd-spec option during ceph deploy
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 17.0 (Wallaby)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ga
Target Release: 17.0
Assignee: Francesco Pantano
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-06-28 06:18 UTC by Ketan Mehta
Modified: 2022-09-21 12:23 UTC (History)

Fixed In Version: tripleo-ansible-3.3.1-0.20220705170917.c570322.el9ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-21 12:23:05 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 848158 | 0 | None | master: MERGED | tripleo-ansible: Make osds spec extra keys more flexible (Ic12dd0a099a2a13fa058677dc3c327b808173893) | 2022-07-05 18:31:24 UTC
OpenStack gerrit | 848288 | 0 | None | stable/wallaby: MERGED | tripleo-ansible: Make osds spec extra keys more flexible (Ic12dd0a099a2a13fa058677dc3c327b808173893) | 2022-07-05 18:31:30 UTC
Red Hat Issue Tracker | OSP-16075 | 0 | None | None | None | 2022-06-28 06:24:27 UTC
Red Hat Product Errata | RHEA-2022:6543 | 0 | None | None | None | 2022-09-21 12:23:41 UTC

Description Ketan Mehta 2022-06-28 06:18:49 UTC
Description of problem:

RHOSP17.0 + RHCS5.x

While deploying Ceph before overcloud provisioning with 'osds_per_device' set in the OSD spec, the deployment fails at the initial stage because ceph_spec_bootstrap.py does not appear to support that key.

The deployment fails with the following traceback:

+++
Using module file /usr/share/ansible/plugins/modules/ceph_spec_bootstrap.py                                                                                                                    
Pipelining is enabled.                                                                                                                                                                         
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: stack                                                                                                                                         
<localhost> EXEC /bin/sh -c '/usr/bin/python3.9 && sleep 0'                                                                                                                                    
The full traceback is:                                                                                                                                                                         
Traceback (most recent call last):                                                                                                                                                             
  File "<stdin>", line 107, in <module>                                                                                                                                                        
  File "<stdin>", line 99, in _ansiballz_main                                                                                                                                                  
  File "<stdin>", line 47, in invoke_module                                                                                                                                                    
  File "/usr/lib64/python3.9/runpy.py", line 210, in run_module                                                                                                                                
    return _run_module_code(code, init_globals, run_name, mod_spec)                                                                                                                            
  File "/usr/lib64/python3.9/runpy.py", line 97, in _run_module_code                                                                                                                           
    _run_code(code, mod_globals, init_globals,                                                                                                                                                 
  File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code                                                                                                                                  
    exec(code, run_globals)                                                                                                                                                                    
  File "/tmp/ansible_ceph_spec_bootstrap_payload_i7y1vbj4/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py", line 502, in <module>                               
  File "/tmp/ansible_ceph_spec_bootstrap_payload_i7y1vbj4/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py", line 489, in main                                   
  File "/tmp/ansible_ceph_spec_bootstrap_payload_i7y1vbj4/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py", line 363, in get_specs                              
  File "/tmp/ansible_ceph_spec_bootstrap_payload_i7y1vbj4/ansible_ceph_spec_bootstrap_payload.zip/ansible/module_utils/ceph_spec.py", line 230, in make_daemon_spec                            
Exception: Fatal: the spec should be composed by only allowed keywords                                                                                                                         
2022-06-27 10:05:50.647505 | 000af799-5956-c9fb-316e-00000000001e |      FATAL | Create Ceph spec based on baremetal_deployed_path and tripleo_roles | undercloud | error={
    "changed": false,
    "module_stderr": "Traceback (most recent call last):\n  File \"<stdin>\", line 107, in <module>\n  File \"<stdin>\", line 99, in _ansiballz_main\n  File \"<stdin>\", line 47, in invoke_module\n  File \"/usr/lib64/python3.9/runpy.py\", line 210, in run_module\n    return _run_module_code(code, init_globals, run_name, mod_spec)\n  File \"/usr/lib64/python3.9/runpy.py\", line 97, in _run_module_code\n    _run_code(code, mod_globals, init_globals,\n  File \"/usr/lib64/python3.9/runpy.py\", line 87, in _run_code\n    exec(code, run_globals)\n  File \"/tmp/ansible_ceph_spec_bootstrap_payload_i7y1vbj4/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 502, in <module>\n  File \"/tmp/ansible_ceph_spec_bootstrap_payload_i7y1vbj4/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 489, in main\n  File \"/tmp/ansible_ceph_spec_bootstrap_payload_i7y1vbj4/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 363, in get_specs\n  File \"/tmp/ansible_ceph_spec_bootstrap_payload_i7y1vbj4/ansible_ceph_spec_bootstrap_payload.zip/ansible/module_utils/ceph_spec.py\", line 230, in make_daemon_spec\nException: Fatal: the spec should be composed by only allowed keywords\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 1
}
+++

Here is the deployment command in use:

openstack overcloud ceph deploy templates-output/overcloud-node-deployed.yaml --osd-spec osd-spec.yml --network-data templates/custom_network_data.yaml --roles-data templates/roles_data_nfs.yaml --container-image-prepare container-prepare-parameter.yaml  -o template-output/deployed-ceph.yaml -vv

And the OSD spec file:

# cat osd-spec.yml
data_devices:
  paths:
    - /dev/nvme0n1
    - /dev/nvme1n1
    - /dev/nvme2n1
osds_per_device: 2

Version-Release number of selected component (if applicable):

tripleo-ansible-3.3.1-0.20220307013244.130185a.el9ost.noarch

How reproducible:

Deploy ceph with osds_per_device specified in the spec file and the deployment fails with the mentioned error.

Steps to Reproduce:
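1. Create an osd-spec.yml that sets osds_per_device (as shown above).
2. Run 'openstack overcloud ceph deploy' with '--osd-spec osd-spec.yml' (full command shown above).
3. Observe the failure in the task "Create Ceph spec based on baremetal_deployed_path and tripleo_roles".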

Actual results:

Deployment fails.

Expected results:

Deployment should complete.

Additional info:

Deployment completes without 'osds_per_device' being used in the spec file.

Comment 1 John Fulton 2022-06-29 16:33:08 UTC
We found the following snippet from a larger ceph spec [1] was valid when applied with 'ceph orch apply -i ':

(undercloud) [stack@undercloud ~]$ cat osd-spec.yml 
data_devices:
  model: 'SAMSUNG'
osds_per_device: 2
(undercloud) [stack@undercloud ~]$ 

However, when we pass the same file via 'openstack overcloud ceph deploy --osd-spec osd-spec.yml ' [2], the deployment fails [3] in /usr/share/ansible/plugins/modules/ceph_spec_bootstrap.py.

Perhaps ALLOWED_EXTRA_KEYS is too limited:

  https://github.com/openstack/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/ansible_plugins/module_utils/ceph_spec.py#L33-L40
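
To illustrate the suspected failure mode, here is a minimal Python sketch of a hard-coded allow-list check. It is not the code from ceph_spec.py: the function name validate_osd_spec and the contents of ALLOWED_OSD_KEYS are assumptions for illustration (only 'encrypted' is confirmed by this report as an allowed key); the error message is the one from the traceback.

```python
# Hypothetical allow-list, NOT copied from ceph_spec.py. Only 'encrypted'
# is confirmed by this bug report; the other entries are assumed.
ALLOWED_OSD_KEYS = ["data_devices", "db_devices", "wal_devices", "encrypted"]


def validate_osd_spec(spec, allowed=ALLOWED_OSD_KEYS):
    """Reject any top-level key that is not in the allow-list."""
    unknown = [key for key in spec if key not in allowed]
    if unknown:
        # Same message as in the traceback above.
        raise Exception(
            "Fatal: the spec should be composed by only allowed keywords")
    return spec


# The spec from this comment: valid for cephadm, rejected by the allow-list.
osd_spec = {"data_devices": {"model": "SAMSUNG"}, "osds_per_device": 2}
try:
    validate_osd_spec(osd_spec)
except Exception as exc:
    print(exc)
```

Under these assumptions, any key that cephadm accepts but the list omits is rejected before the spec ever reaches Ceph.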


[1] 
"""
service_type: alertmanager
service_name: alertmanager
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
networks:
- 192.17.3.0/24
spec:
  port: 9093
---
service_type: crash
service_name: crash
placement:
  host_pattern: '*'
---
service_type: grafana
service_name: grafana
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
networks:
- 192.17.3.0/24
spec:
  port: 3100
---
service_type: mds
service_id: mds
service_name: mds.mds
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
---
service_type: mgr
service_name: mgr
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
---
service_type: mon
service_name: mon
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
---
service_type: node-exporter
service_name: node-exporter
placement:
  host_pattern: '*'
networks:
- 192.17.3.0/24
---
service_type: osd
service_id: default_drive_group
service_name: osd.default_drive_group
placement:
  hosts:
  - cephstorage-0
  - cephstorage-1
  - cephstorage-3
  - cephstorage-4
spec:
  data_devices:
    paths:
    - /dev/nvme0n1
    - /dev/nvme1n1
    - /dev/nvme2n1
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: default_drive_group_vendor
service_name: osd.default_drive_group_vendor
placement:
  hosts:
  - cephstorage-2
spec:
  data_devices:
    model: SAMSUNG
  filter_logic: AND
  objectstore: bluestore
  osds_per_device: 2
---
service_type: prometheus
service_name: prometheus
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
networks:
- 192.17.3.0/24
spec:
  port: 9092
---
service_type: rgw
service_id: rgw
service_name: rgw.rgw
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
networks:
- 192.17.3.0/24
spec:
  rgw_frontend_port: 8080
  rgw_realm: default
  rgw_zone: default
"""

[2] openstack overcloud ceph deploy templates-output/overcloud-node-deployed.yaml --osd-spec osd-spec.yml --network-data templates/custom_network_data.yaml --roles-data templates/roles_data_nfs.yaml --container-image-prepare container-prepare-parameter.yaml  -o templates-output/deployed-ceph.yaml -vv --skip-hosts-config  --skip-user-create  --skip-container-registry-config --yes

[3]
2022-06-29 12:30:35.378849 | 000af799-5956-0b22-d49e-00000000001e |       TASK | Create Ceph spec based on baremetal_deployed_path and tripleo_roles
Using module file /usr/share/ansible/plugins/modules/ceph_spec_bootstrap.py 
Pipelining is enabled.                                                                                                                                                                        
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: stack 
<localhost> EXEC /bin/sh -c '/usr/bin/python3.9 && sleep 0'                                                                                                                                   
The full traceback is:                                                                         
Traceback (most recent call last):         
  File "<stdin>", line 107, in <module>
  File "<stdin>", line 99, in _ansiballz_main                                                  
  File "<stdin>", line 47, in invoke_module
  File "/usr/lib64/python3.9/runpy.py", line 210, in run_module                      
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/usr/lib64/python3.9/runpy.py", line 97, in _run_module_code        
    _run_code(code, mod_globals, init_globals,                                                 
  File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code                                                                                                                                 
    exec(code, run_globals)       
  File "/tmp/ansible_ceph_spec_bootstrap_payload__x2eulxb/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py", line 502, in <module>
  File "/tmp/ansible_ceph_spec_bootstrap_payload__x2eulxb/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py", line 489, in main
  File "/tmp/ansible_ceph_spec_bootstrap_payload__x2eulxb/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py", line 363, in get_specs
  File "/tmp/ansible_ceph_spec_bootstrap_payload__x2eulxb/ansible_ceph_spec_bootstrap_payload.zip/ansible/module_utils/ceph_spec.py", line 230, in make_daemon_spec
Exception: Fatal: the spec should be composed by only allowed keywords                                                                                                                        
2022-06-29 12:30:35.938975 | 000af799-5956-0b22-d49e-00000000001e |      FATAL | Create Ceph spec based on baremetal_deployed_path and tripleo_roles | undercloud | error={
    "changed": false,
    "module_stderr": "Traceback (most recent call last):\n  File \"<stdin>\", line 107, in <module>\n  File \"<stdin>\", line 99, in _ansiballz_main\n  File \"<stdin>\", line 47, in invoke_module\n  File \"/usr/lib64/python3.9/runpy.py\", line 210, in run_module\n    return _run_module_code(code, init_globals, run_name, mod_spec)\n  File \"/usr/lib64/python3.9/runpy.py\", line 97, in _run_module_code\n    _run_code(code, mod_globals, init_globals,\n  File \"/usr/lib64/python3.9/runpy.py\", line 87, in _run_code\n    exec(code, run_globals)\n  File \"/tmp/ansible_ceph_spec_bootstrap_payload__x2eulxb/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 502, in <module>\n  File \"/tmp/ansible_ceph_spec_bootstrap_payload__x2eulxb/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 489, in main\n  File \"/tmp/ansible_ceph_spec_bootstrap_payload__x2eulxb/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 363, in get_specs\n  File \"/tmp/ansible_ceph_spec_bootstrap_payload__x2eulxb/ansible_ceph_spec_bootstrap_payload.zip/ansible/module_utils/ceph_spec.py\", line 230, in make_daemon_spec\nException: Fatal: the spec should be composed by only allowed keywords\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 1
}

Comment 2 John Fulton 2022-06-29 16:41:58 UTC
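WORKAROUND: on the undercloud, manually add 'osds_per_device' to the list of allowed keys in /usr/share/ansible/plugins/module_utils/ceph_spec.py: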

(undercloud) [stack@undercloud ~]$ sudo vim /usr/share/ansible/plugins/module_utils/ceph_spec.py 
(undercloud) [stack@undercloud ~]$ grep osds_per_device !$
grep osds_per_device /usr/share/ansible/plugins/module_utils/ceph_spec.py
        'osds_per_device'
(undercloud) [stack@undercloud ~]$ grep -B 1 -A 1 osds_per_device /usr/share/ansible/plugins/module_utils/ceph_spec.py
        'encrypted',
        'osds_per_device'
    ]
(undercloud) [stack@undercloud ~]$ 

After making the above change, the same command worked.

However, the proper fix needs to go further than this. We hard-code the allowed parameters, so we will always lag behind as cephadm adds new features to its spec. We need to be more flexible.
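
One way to picture "more flexible" is a builder that passes every user-supplied key through to the OSD service spec and leaves validation to cephadm. The Python sketch below is only an illustration under that assumption; it is not the merged tripleo-ansible change, and build_osd_service_spec and its parameters are hypothetical names. The default values for filter_logic and objectstore mirror the generated spec shown in [1] of comment 1.

```python
def build_osd_service_spec(hosts, user_spec, service_id="default_drive_group"):
    """Build an OSD service spec dict, passing all user keys through.

    Hypothetical helper: no allow-list is consulted, so new cephadm
    options (such as osds_per_device) need no code change here.
    """
    # Defaults taken from the generated spec in [1] of comment 1.
    spec = {"filter_logic": "AND", "objectstore": "bluestore"}
    # User-supplied keys override the defaults and are not filtered.
    spec.update(user_spec)
    return {
        "service_type": "osd",
        "service_id": service_id,
        "service_name": "osd.{}".format(service_id),
        "placement": {"hosts": list(hosts)},
        "spec": spec,
    }


service = build_osd_service_spec(
    ["cephstorage-2"],
    {"data_devices": {"model": "SAMSUNG"}, "osds_per_device": 2},
)
print(service["spec"])
```

The trade-off of this design is that a misspelled key is no longer caught early by the module; it would only surface when cephadm applies the spec.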

Comment 9 John Fulton 2022-07-11 15:16:10 UTC
To test that this fix works, do a deployment with the --osd-spec flag like this:

 openstack overcloud ceph deploy --osd-spec osd-spec.yml

Where osd-spec.yml contains the following:

(undercloud) [stack@undercloud ~]$ cat osd-spec.yml 
data_devices:
  all: true
osds_per_device: 2
(undercloud) [stack@undercloud ~]$ 

The osds_per_device entry above is what triggered the bug; with the fix, the Ceph deployment should not fail.
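
Beyond checking that the deployment completes, a tester may also want to confirm that the key reached the generated Ceph spec. The Python sketch below assumes the spec has already been parsed into a list of dicts (for example with a YAML library, which is not shown here); osd_services_missing_key and the sample data are hypothetical and only mirror the shape of the spec in comment 1.

```python
def osd_services_missing_key(services, key, expected):
    """Return the names of OSD services whose spec lacks key == expected."""
    return [
        svc.get("service_name")
        for svc in services
        if svc.get("service_type") == "osd"
        and svc.get("spec", {}).get(key) != expected
    ]


# Sample data shaped like the multi-document spec in comment 1 (assumed,
# not output from a real deployment).
generated = [
    {"service_type": "mon", "service_name": "mon"},
    {
        "service_type": "osd",
        "service_name": "osd.default_drive_group",
        "spec": {"data_devices": {"all": True}, "osds_per_device": 2},
    },
]

missing = osd_services_missing_key(generated, "osds_per_device", 2)
print(missing)
```

An empty list means every OSD service in the parsed spec carries the expected osds_per_device value.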

Comment 10 Yogev Rabl 2022-08-04 13:41:03 UTC
Verified.

Ceph deployment completed successfully with the parameters set by osd specs

Comment 15 errata-xmlrpc 2022-09-21 12:23:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

