Bug 1945280

Summary: Step to export ceph configuration in a spine/leaf fails if the deployment uses a collapsed network topology
Product: Red Hat OpenStack
Reporter: Darin Sorrentino <dsorrent>
Component: python-tripleoclient
Assignee: John Fulton <johfulto>
Status: CLOSED ERRATA
QA Contact: Alfredo <alfrgarc>
Severity: medium
Docs Contact:
Priority: medium
Version: 16.1 (Train)
CC: bdobreli, hbrock, jdurgin, johfulto, jslagle, lhh, mburns, mhicks, spower
Target Milestone: z7
Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: python-tripleoclient-12.3.2-1.20210505144302.ae58329
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-12-09 20:18:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  TGZ of /var/lib/mistral on undercloud as requested by John Fulton (Flags: none)
  central inventory without storage_ip (Flags: none)

Description Darin Sorrentino 2021-03-31 15:18:09 UTC
Description of problem:

Exporting ceph configuration data for a DCN deployment fails when using a collapsed network topology.

Isolated networks are not required when deploying a spine/leaf topology; it is possible to deploy by collapsing all networks down into the provisioning network. With this deployment topology, the command to export the Ceph data from the central/leaf0 location fails:

Exception occured while running the command
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 32, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/cliff/command.py", line 185, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_export_ceph.py", line 105, in take_action
    config_download_dir))
  File "/usr/lib/python3.6/site-packages/tripleoclient/export.py", line 171, in export_ceph
    mon_ips = export_storage_ips(stack, config_download_dir)
  File "/usr/lib/python3.6/site-packages/tripleoclient/export.py", line 158, in export_storage_ips
    ip = inventory_data[mon_role]['hosts'][hostname]['storage_ip']
KeyError: 'storage_ip'
'storage_ip'
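The failing line reads the storage_ip host variable for each monitor host out of the config-download Ansible inventory, and raises KeyError when a collapsed topology leaves that variable unset. A minimal sketch of a defensive lookup, assuming the inventory layout implied by the traceback; the function shape is illustrative, and falling back to ctlplane_ip is an assumption here, not necessarily the shipped fix:

```python
def export_storage_ips(inventory_data, mon_role='mon'):
    """Collect one IP per Ceph monitor host, preferring storage_ip."""
    mon_ips = []
    for hostname, host_vars in inventory_data[mon_role]['hosts'].items():
        # In a collapsed topology the storage network may not exist as a
        # separate network, so fall back to the ctlplane (provisioning)
        # address instead of raising KeyError.
        ip = host_vars.get('storage_ip') or host_vars.get('ctlplane_ip')
        if ip:
            mon_ips.append(ip)
    return mon_ips

# Illustrative inventory fragment: one host with only a ctlplane address.
inventory = {
    'mon': {'hosts': {
        'central-controller-0': {'ctlplane_ip': '192.168.24.8'},
        'central-controller-1': {'storage_ip': '10.20.0.11'},
    }}
}
print(export_storage_ips(inventory))  # → ['192.168.24.8', '10.20.0.11']
```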


Version-Release number of selected component (if applicable):
16.1

How reproducible:
Every time.

Steps to Reproduce:
1. Deploy DCN/spine-leaf with Ceph at the central and edge sites, without isolated networks.
2. Execute command:

sudo -E openstack overcloud export ceph \
--stack central \
--config-download-dir /var/lib/mistral \
--output-file ~/dcn-common/central_ceph_external.yaml

This is from section 5.3, step 2 here:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/distributed_compute_node_and_storage_deployment/assembly_deploying-storage-at-the-edge#deploying_edge_sites_with_storage


Actual results:
Traceback shown above.

Expected results:
Creation of ~/dcn-common/central_ceph_external.yaml file which contains the Ceph credential information.  Example:

parameter_defaults:
  CephExternalMultiConfig:
  - ceph_conf_overrides:
      client:
        keyring: /etc/ceph/central.client.openstack.keyring
    cluster: central
    dashboard_enabled: false
    external_cluster_mon_ips: 10.20.0.10,10.20.0.11,10.20.0.12
    fsid: 12345678-1234-1234-1234-1234567890ab
    keys:
    - caps:
        mgr: allow *
        mon: profile rbd
        osd: profile rbd pool=vms, profile rbd pool=volumes, profile rbd pool=images
      key: ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ab==
      mode: '0600'
      name: client.openstack


Additional info:

Comment 1 John Fulton 2021-04-05 15:28:57 UTC
(In reply to Darin Sorrentino from comment #0)
Would you please send me a copy of /var/lib/mistral from your undercloud so that I can be sure the export script can deal with this scenario? 

I am unable to reproduce this in my environment. I deployed without network isolation, so my Ceph services listen on the provisioning network (as you describe above), but in my case storage_ip is still set:

$ grep storage_ip inventory.yml 
      storage_ip: 192.168.24.8
      storage_ip: 192.168.24.23
      storage_ip: 192.168.24.12
      storage_ip: 192.168.24.11
$ 

My experience is that the inventory gets built with a storage_ip entry either way, even when that address is on the provisioning network (defaulting to 192.168.24.0/24).
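The check above can be scripted against the inventory data. A hypothetical helper, assuming the Ansible inventory layout shown in the traceback; in practice the data would come from parsing the inventory YAML under /var/lib/mistral rather than an inline dict:

```python
def hosts_missing_storage_ip(inventory, role='mon'):
    """Return hosts under the given role that have no storage_ip var."""
    hosts = inventory.get(role, {}).get('hosts', {})
    return [name for name, host_vars in hosts.items()
            if 'storage_ip' not in host_vars]

# Illustrative fragment: a healthy host and one that would trigger the
# KeyError seen in this bug.
inventory = {
    'mon': {'hosts': {
        'central-controller-0': {'storage_ip': '192.168.24.8'},
        'central-controller-1': {'ctlplane_ip': '192.168.24.23'},
    }}
}
print(hosts_missing_storage_ip(inventory))  # → ['central-controller-1']
```

An empty list means the export step's storage_ip lookup should succeed for that role.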

Comment 2 Darin Sorrentino 2021-04-06 13:32:26 UTC
Created attachment 1769573 [details]
TGZ of /var/lib/mistral on undercloud as requested by John Fulton

Comment 4 John Fulton 2021-04-06 14:15:39 UTC
Created attachment 1769581 [details]
central inventory without storage_ip

Comment 28 errata-xmlrpc 2021-12-09 20:18:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762