Bug 1884540

Summary: DCN: Docs changed for Glance on DCN multibackend storage
Product: Red Hat OpenStack Reporter: Marian Krcmarik <mkrcmari>
Component: documentation Assignee: Roger Heslop <rheslop>
Status: CLOSED CURRENTRELEASE QA Contact: RHOS Documentation Team <rhos-docs>
Severity: high Docs Contact:
Priority: high    
Version: 16.1 (Train) CC: cfields, gcharot, johfulto, mgarciac, owalsh, rheslop
Target Milestone: z2   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-06-02 18:27:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1802774    

Description Marian Krcmarik 2020-10-02 09:00:32 UTC
Description of problem:
We probably need to make significant changes to how we deploy Glance (that is, which templates we use) in a DCN environment with multi-backend storage.
The changes follow the upstream change request:
https://review.opendev.org/#/c/755520/

Version-Release number of selected component (if applicable):
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/distributed_compute_node_and_storage_deployment/index#assembly_deploying-storage-at-the-edge

The changes are essentially all about Glance.
To sum up:
1. Deploy the initial central stack with a glance.yaml template that looks like this:
  parameter_defaults:
    GlanceEnabledImportMethods: web-download,copy-image
    GlanceBackend: rbd
    GlanceStoreDescription: 'central rbd glance store'
    GlanceBackendID: central  # newly added parameter
    CephClusterName: central

Then, when the central stack is updated (after the DCN sites are deployed), glance_update would only have:
  parameter_defaults:
    dcn0:
      GlanceBackend: rbd
      GlanceStoreDescription: 'dcn0 rbd glance store'
      CephClientUserName: 'glance'
      CephClusterName: dcn0
    dcn1:
      GlanceBackend: rbd
      GlanceStoreDescription: 'dcn1 rbd glance store'
      CephClientUserName: 'glance'
      CephClusterName: dcn1
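
The per-site entries above follow one pattern, so they can be generated rather than typed by hand. The following is only a sketch under that assumption: the function name central_dcn_store_entries is hypothetical and not part of TripleO, and the key names and values are copied from the snippet above.

```python
def central_dcn_store_entries(dcn_sites):
    """Build one rbd store entry per DCN site, keyed by the site name.

    Hypothetical helper: it only mirrors the pattern of the dcn0/dcn1
    entries shown in this report.
    """
    entries = {}
    for site in dcn_sites:
        entries[site] = {
            "GlanceBackend": "rbd",
            "GlanceStoreDescription": f"{site} rbd glance store",
            "CephClientUserName": "glance",
            "CephClusterName": site,
        }
    return entries


if __name__ == "__main__":
    # Print the entries in a YAML-like layout for visual comparison.
    for site, entry in central_dcn_store_entries(["dcn0", "dcn1"]).items():
        print(f"{site}:")
        for key, value in entry.items():
            print(f"  {key}: {value}")
```

The output is meant for comparison with the hand-written template, not as a replacement for reviewing it.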

2. Add the parameter GlanceBackendID: $STACKNAME to glance.yaml for each DCN site, for example:
parameter_defaults:
  GlanceEnabledImportMethods: web-download,copy-image
  GlanceBackend: rbd
  GlanceStoreDescription: 'dcn0 rbd glance store'
  GlanceBackendID: dcn0
  GlanceMultistoreConfig:
    central:
      GlanceBackend: rbd
      GlanceStoreDescription: 'central rbd glance store'
      CephClientUserName: 'external'
      CephClusterName: central
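
Because GlanceBackendID is simply the stack name, the per-site glance.yaml content can be derived from that name. The following is a minimal sketch, assuming every DCN site follows the dcn0 example above. The function name dcn_glance_parameters and its central_name argument are hypothetical.

```python
import json


def dcn_glance_parameters(stack_name, central_name="central"):
    """Return glance.yaml content for one DCN stack as a nested dict.

    Hypothetical helper: keys and values mirror the dcn0 example in this
    report, with the stack name substituted where dcn0 appears.
    """
    return {
        "parameter_defaults": {
            "GlanceEnabledImportMethods": "web-download,copy-image",
            "GlanceBackend": "rbd",
            "GlanceStoreDescription": f"{stack_name} rbd glance store",
            # The newly added parameter: one unique ID per stack.
            "GlanceBackendID": stack_name,
            "GlanceMultistoreConfig": {
                central_name: {
                    "GlanceBackend": "rbd",
                    "GlanceStoreDescription": f"{central_name} rbd glance store",
                    "CephClientUserName": "external",
                    "CephClusterName": central_name,
                }
            },
        }
    }


if __name__ == "__main__":
    # Dump the structure for inspection before writing a real template.
    print(json.dumps(dcn_glance_parameters("dcn1"), indent=2))
```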

As far as I can tell from the downstream documentation, these changes should be enough, but it would be better to go through the upstream change and check all the pieces.

Comment 1 Marian Krcmarik 2020-10-02 12:53:23 UTC
If a 16.1.2 deployment is created as currently described in the documentation, or is updated from an older 16.1 release that was deployed that way, then spawning any instance at an edge DCN site fails with this error:
nova.exception.MaxRetriesExceeded: Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance 8f3efb5f-e328-4841-bd8b-acb8dcaa3af6. Last exception: Image 647b67b1-55c1-4cc8-a7ec-2bc7304a94e4 could not be found

The logs contain the following backtrace:
ERROR nova.scheduler.utils [req-4aa71cca-da66-4557-a6df-e2e90c76316f c5170dfeb38d434d9f16f71633047546 a20e41cc197341cabcaaa01eaf83ec9d - default default] [instance: 732463de-b04e-470a-b83b-6804b9dadcb8] Error from last host: dcn2-computehci2-0.redhat.local (node dcn2-computehci2-0.redhat.local): ['Traceback (most recent call last):
', '  File "/usr/lib/python3.6/site-packages/nova/image/glance.py", line 256, in show
    image = self._client.call(context, 2, \'get\', args=(image_id,))
', '  File "/usr/lib/python3.6/site-packages/nova/image/glance.py", line 193, in call
    result = getattr(controller, method)(*args, **kwargs)
', '  File "/usr/lib/python3.6/site-packages/glanceclient/v2/images.py", line 198, in get
    return self._get(image_id)
', '  File "/usr/lib/python3.6/site-packages/glanceclient/common/utils.py", line 598, in inner
    return RequestIdProxy(wrapped(*args, **kwargs))
', '  File "/usr/lib/python3.6/site-packages/glanceclient/v2/images.py", line 191, in _get
    resp, body = self.http_client.get(url, headers=header)
', '  File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 386, in get
    return self.request(url, \'GET\', **kwargs)
', '  File "/usr/lib/python3.6/site-packages/glanceclient/common/http.py", line 387, in request
    return self._handle_response(resp)
', '  File "/usr/lib/python3.6/site-packages/glanceclient/common/http.py", line 126, in _handle_response
    raise exc.from_response(resp, resp.content)
', 'glanceclient.exc.HTTPNotFound: HTTP 404 Not Found: No image found with ID 2dff0ecb-cfa0-4e8a-b78b-28413cd05c31
', '
During handling of the above exception, another exception occurred:

', 'Traceback (most recent call last):
', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance
    block_device_info=block_device_info)
', '  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3636, in spawn
    block_device_info=block_device_info)
', '  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 4022, in _create_image
    injection_info, fallback_from_host)
', '  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 4130, in _create_and_inject_local_root
    instance, size, fallback_from_host)
', '  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 9450, in _try_fetch_image_cache
    trusted_certs=instance.trusted_certs)
', '  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/imagebackend.py", line 275, in cache
    *args, **kwargs)
', '  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/imagebackend.py", line 940, in create_image
    prepare_template(target=base, *args, **kwargs)
', '  File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner
    return f(*args, **kwargs)
', '  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/imagebackend.py", line 271, in fetch_func_sync
    fetch_func(target=target, *args, **kwargs)
', '  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 4122, in clone_fallback_to_fetch
    backend.clone(context, disk_images[\'image_id\'])
', '  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/imagebackend.py", line 963, in clone
    include_locations=True)
', '  File "/usr/lib/python3.6/site-packages/nova/image/api.py", line 105, in get
    show_deleted=show_deleted)
', '  File "/usr/lib/python3.6/site-packages/nova/image/glance.py", line 258, in show
    _reraise_translated_image_exception(image_id)
', '  File "/usr/lib/python3.6/site-packages/nova/image/glance.py", line 922, in _reraise_translated_image_exception
    six.reraise(type(new_exc), new_exc, exc_trace)
', '  File "/usr/lib/python3.6/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
', '  File "/usr/lib/python3.6/site-packages/nova/image/glance.py", line 256, in show
    image = self._client.call(context, 2, \'get\', args=(image_id,))
', '  File "/usr/lib/python3.6/site-packages/nova/image/glance.py", line 193, in call
    result = getattr(controller, method)(*args, **kwargs)
', '  File "/usr/lib/python3.6/site-packages/glanceclient/v2/images.py", line 198, in get
    return self._get(image_id)
', '  File "/usr/lib/python3.6/site-packages/glanceclient/common/utils.py", line 598, in inner
    return RequestIdProxy(wrapped(*args, **kwargs))
', '  File "/usr/lib/python3.6/site-packages/glanceclient/v2/images.py", line 191, in _get
    resp, body = self.http_client.get(url, headers=header)
', '  File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 386, in get
    return self.request(url, \'GET\', **kwargs)
', '  File "/usr/lib/python3.6/site-packages/glanceclient/common/http.py", line 387, in request
    return self._handle_response(resp)
', '  File "/usr/lib/python3.6/site-packages/glanceclient/common/http.py", line 126, in _handle_response
    raise exc.from_response(resp, resp.content)
', 'nova.exception.ImageNotFound: Image 2dff0ecb-cfa0-4e8a-b78b-28413cd05c31 could not be found.
', '
During handling of the above exception, another exception occurred:

', 'Traceback (most recent call last):
', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2161, in _do_build_and_run_instance
    filter_properties, request_spec)
', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2537, in _build_and_run_instance
    instance_uuid=instance.uuid, reason=six.text_type(e))
', 'nova.exception.RescheduledException: Build of instance 732463de-b04e-470a-b83b-6804b9dadcb8 was re-scheduled: Image 2dff0ecb-cfa0-4e8a-b78b-28413cd05c31 could not be found.
']

Comment 2 John Fulton 2020-10-02 13:17:11 UTC
It would have been better to have this setting from the start, so that each Glance backend has a unique identifier (the default is default_backend), settable via GlanceBackendID. However, testing did not show its absence to be a problem until we hit a side effect of the following change, which was not in the initial 16.1 release:

https://review.opendev.org/#/c/741086/1/glance/common/store_utils.py@199

The QE testing on the above new code uncovered this issue.
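
A pre-deployment sanity check for the uniqueness requirement could look roughly like the sketch below. It is not how Glance itself resolves backends. It only assumes, as stated above, that a stack without GlanceBackendID falls back to default_backend. The function name find_backend_id_collisions and the input layout (stack name mapped to its parameter_defaults) are hypothetical.

```python
from collections import defaultdict

# Fallback identifier named in the comment above.
DEFAULT_BACKEND_ID = "default_backend"


def find_backend_id_collisions(stack_params):
    """Map each backend ID used by more than one stack to those stacks.

    stack_params: dict of stack name -> that stack's parameter_defaults.
    """
    stacks_by_id = defaultdict(list)
    for stack, params in stack_params.items():
        backend_id = params.get("GlanceBackendID", DEFAULT_BACKEND_ID)
        stacks_by_id[backend_id].append(stack)
    return {
        backend_id: stacks
        for backend_id, stacks in stacks_by_id.items()
        if len(stacks) > 1
    }


if __name__ == "__main__":
    # Templates written without GlanceBackendID all share the default.
    print(find_backend_id_collisions({"central": {}, "dcn0": {}, "dcn1": {}}))
```

An empty result means every stack has its own identifier.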

Comment 17 Roger Heslop 2021-06-02 18:27:24 UTC
Closing this bug because the GlanceBackendID parameter has been added to the documentation in the current release.

Thanks John,