Bug 1847770

Summary: [dcn] instance boot fails on DistributedComputeHCIScaleOut node: Error contacting glance server 'http://172.25.2.182:9292' for 'get', done trying.: glanceclient.exc.HTTPServiceUnavailable: HTTP 503 Service Unavailable
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Target Milestone: rc
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Reporter: John Fulton <johfulto>
Assignee: John Fulton <johfulto>
QA Contact: bkopilov <bkopilov>
CC: abishop, apevec, bdobreli, gcharot, gfidente, lhh, mburns, pgrist
Keywords: Triaged
Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200616081526.396affd.el8ost
Doc Type: No Doc Update
Type: Bug
Last Closed: 2020-07-29 07:53:29 UTC
Bug Blocks: 1802772

Description John Fulton 2020-06-17 01:12:35 UTC
When deploying HCI nodes on the edge with the following node count per DCN site:

  DistributedComputeHCICount: 3
  DistributedComputeHCIScaleOutCount: 1

Instances booted on DistributedComputeHCI nodes like this work fine:

openstack server create instance2 --image 42af42c6-c663-49b1-a800-feb6f2547371 --flavor m1.nano --nic net-id=afb1c6dd-b425-4cec-b0d3-c70cc837df01 --availability-zone az-dcn1 --hypervisor-hostname dcn1-computehci1-0.redhat.local

However, instances fail to boot on DistributedComputeHCIScaleOut nodes when running the following:

openstack server create instance3 --image 42af42c6-c663-49b1-a800-feb6f2547371 --flavor m1.nano --nic net-id=afb1c6dd-b425-4cec-b0d3-c70cc837df01 --availability-zone az-dcn1 --hypervisor-hostname dcn1-computehciscaleout1-0.redhat.local 

The following is seen in the nova-compute log:

/var/log/containers/nova/nova-compute.log:60:2020-06-16 11:05:26.822 8 ERROR nova.image.glance [req-05906d7b-ca3e-4977-bbb3-3d400c85f9ea efb7deed85904588a504616e95035276 699ce3c09f914440ae9408375f5af143 - default default] Error contacting glance server 'http://172.25.2.182:9292' for 'get', done trying.: glanceclient.exc.HTTPServiceUnavailable: HTTP 503 Service Unavailable: No server is available to handle this request.
/var/log/containers/nova/nova-compute.log:61:2020-06-16 11:05:26.822 8 ERROR nova.image.glance Traceback (most recent call last):
/var/log/containers/nova/nova-compute.log:62:2020-06-16 11:05:26.822 8 ERROR nova.image.glance   File "/usr/lib/python3.6/site-packages/nova/image/glance.py", line 193, in call
/var/log/containers/nova/nova-compute.log:63:2020-06-16 11:05:26.822 8 ERROR nova.image.glance     result = getattr(controller, method)(*args, **kwargs)
/var/log/containers/nova/nova-compute.log:64:2020-06-16 11:05:26.822 8 ERROR nova.image.glance   File "/usr/lib/python3.6/site-packages/glanceclient/v2/images.py", line 198, in get
/var/log/containers/nova/nova-compute.log:65:2020-06-16 11:05:26.822 8 ERROR nova.image.glance     return self._get(image_id)
/var/log/containers/nova/nova-compute.log:66:2020-06-16 11:05:26.822 8 ERROR nova.image.glance   File "/usr/lib/python3.6/site-packages/glanceclient/common/utils.py", line 598, in inner
/var/log/containers/nova/nova-compute.log:67:2020-06-16 11:05:26.822 8 ERROR nova.image.glance     return RequestIdProxy(wrapped(*args, **kwargs))
/var/log/containers/nova/nova-compute.log:68:2020-06-16 11:05:26.822 8 ERROR nova.image.glance   File "/usr/lib/python3.6/site-packages/glanceclient/v2/images.py", line 191, in _get
/var/log/containers/nova/nova-compute.log:69:2020-06-16 11:05:26.822 8 ERROR nova.image.glance     resp, body = self.http_client.get(url, headers=header)
/var/log/containers/nova/nova-compute.log:70:2020-06-16 11:05:26.822 8 ERROR nova.image.glance   File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 386, in get
/var/log/containers/nova/nova-compute.log:71:2020-06-16 11:05:26.822 8 ERROR nova.image.glance     return self.request(url, 'GET', **kwargs)
/var/log/containers/nova/nova-compute.log:72:2020-06-16 11:05:26.822 8 ERROR nova.image.glance   File "/usr/lib/python3.6/site-packages/glanceclient/common/http.py", line 387, in request
/var/log/containers/nova/nova-compute.log:73:2020-06-16 11:05:26.822 8 ERROR nova.image.glance     return self._handle_response(resp)
/var/log/containers/nova/nova-compute.log:74:2020-06-16 11:05:26.822 8 ERROR nova.image.glance   File "/usr/lib/python3.6/site-packages/glanceclient/common/http.py", line 126, in _handle_response
/var/log/containers/nova/nova-compute.log:75:2020-06-16 11:05:26.822 8 ERROR nova.image.glance     raise exc.from_response(resp, resp.content)
/var/log/containers/nova/nova-compute.log:76:2020-06-16 11:05:26.822 8 ERROR nova.image.glance glanceclient.exc.HTTPServiceUnavailable: HTTP 503 Service Unavailable: No server is available to handle this

Comment 1 John Fulton 2020-06-17 01:20:04 UTC
WORKAROUND:

Open port 9292 on the DistributedComputeHCI nodes
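
To confirm whether port 9292 is reachable before and after applying the workaround, a quick TCP probe run from the DistributedComputeHCIScaleOut node can help. This is only a minimal sketch and not part of the original report: the function name port_is_open and the 3-second default timeout are assumptions chosen for illustration, and the right host or IP to probe depends on which network Glance listens on in your deployment.

```python
import socket


def port_is_open(host: str, port: int = 9292, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; 9292 is the Glance API port
    named in this bug.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

For example, port_is_open("dcn1-computehci1-0.redhat.local") would be expected to return False while the port is blocked and True once it is open, assuming that name resolves to the address Glance is bound to.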

Comment 2 John Fulton 2020-06-17 01:20:48 UTC
Suspected root cause

The GlanceAPI service opens port 9292 because glance-api-container-puppet.yaml says so [1].
When the GlanceApiEdge service was created [2], it didn't include a directive to open port 9292.
The glance-api-edge-container-puppet.yaml [3] probably needs something similar to [1].

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/glance/glance-api-container-puppet.yaml#L365
[2] https://github.com/openstack/tripleo-heat-templates/commit/30ca49bf611d0c3b443df6e7a636628dee281303
[3] https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/glance/glance-api-edge-container-puppet.yaml
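
One rough way to check this suspicion against a local checkout of tripleo-heat-templates is to see whether each template mentions port 9292 at all. The sketch below is an assumption-laden illustration, not the actual fix: lists_port is a hypothetical helper, it does a crude text scan rather than parsing the YAML, and it only tells you that the number appears on a non-comment line, not that it is part of a firewall rule.

```python
import re


def lists_port(template_text: str, port: int = 9292) -> bool:
    """Return True if port appears as a standalone number on a non-comment line.

    Hypothetical helper: a crude text scan, not a YAML parse.
    """
    # Match the port only when it is not part of a longer number (e.g. 19292).
    pattern = re.compile(rf"(?<!\d){port}(?!\d)")
    for line in template_text.splitlines():
        # Drop everything after '#' so commented-out mentions are ignored.
        code = line.split("#", 1)[0]
        if pattern.search(code):
            return True
    return False
```

Reading the contents of glance-api-container-puppet.yaml and glance-api-edge-container-puppet.yaml from the checkout and passing each to lists_port would show whether only the former mentions the port, which is what the suspected root cause predicts.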

Comment 20 errata-xmlrpc 2020-07-29 07:53:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148