Bug 1266201 - Overcloud HA deploy hangs on baremetal - passes on rerun
Summary: Overcloud HA deploy hangs on baremetal - passes on rerun
Keywords:
Status: CLOSED DUPLICATE of bug 1228862
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 7.0 (Kilo)
Assignee: James Slagle
QA Contact: yeylon@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-09-24 18:35 UTC by Ronelle Landy
Modified: 2016-04-18 06:54 UTC (History)
7 users

Fixed In Version: instack-undercloud-2.1.2-28.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-09-24 20:49:35 UTC
Target Upstream Version:
Embargoed:


Attachments
Debug info at the point the deployment hangs (165.64 KB, text/plain)
2015-09-24 18:35 UTC, Ronelle Landy

Description Ronelle Landy 2015-09-24 18:35:47 UTC
Created attachment 1076657 [details]
Debug info at the point the deployment hangs

Description of problem:

Deploying an overcloud with 3 controllers and 1 compute node on baremetal hangs. After four hours the deployment fails with:

 "HTTPUnauthorized: ERROR: Authentication failed. Please try again with option --include-password or export HEAT_INCLUDE_PASSWORD=1
Authentication required" 

However, this error is misleading: it is thrown only because the deploy hangs for four hours and hits the auth token timeout.

The real issue is that one controller node completes successfully while the other two hang:

Comparing the os-collect-config log output ($ sudo journalctl -u os-collect-config) on the two systems:

<bnemec> The working one looks up its instance-id through os-apply-config, finds it, and continues to report back.
<bnemec> The stuck one looks up its instance-id, doesn't find it, and doesn't report back.
<bnemec> ec2.json is missing from /run/os-collect-config/os_config_files.json

Compare:

[heat-admin@overcloud-controller-1 ~]$ sudo cat  /var/run/os-collect-config/os_config_files.json
["/var/lib/os-collect-config/heat_local.json", "/var/lib/os-collect-config/cfn.json", "/var/lib/os-collect-config/overcloud-Controller-glvonlfw4cyr-1-fwjzxtscidpo-ControllerConfig-eadn3ts46kbd.json", "/var/lib/os-collect-config/overcloud-Controller-glvonlfw4cyr-1-fwjzxtscidpo-NetworkConfig-cewcrmrguc72-OsNetConfigImpl-dikf7dmjxwei.json"]


[heat-admin@overcloud-controller-2 ~]$ sudo cat  /var/run/os-collect-config/os_config_files.json
["/var/lib/os-collect-config/heat_local.json", "/var/lib/os-collect-config/ec2.json", "/var/lib/os-collect-config/cfn.json", "/var/lib/os-collect-config/overcloud-Controller-glvonlfw4cyr-2-zpqyf3pxsvzh-ControllerConfig-rlywsxrszpgb.json", "/var/lib/os-collect-config/overcloud-Controller-glvonlfw4cyr-2-zpqyf3pxsvzh-NetworkConfig-6kanbsanfcl6-OsNetConfigImpl-hyqst4v27pq5.json"]
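A quick way to tell the two cases apart is to test the manifest for ec2.json. This is a minimal sketch: the file path is the one shown above, and the helper name is hypothetical.

```shell
# check_ec2_json FILE - report whether the os-collect-config manifest FILE
# lists ec2.json, the metadata source missing on the stuck controllers.
check_ec2_json() {
    if grep -q 'ec2.json' "$1" 2>/dev/null; then
        echo "ec2.json present: node can find its instance-id and report back"
    else
        echo "ec2.json missing: node matches the stuck-controller symptom"
    fi
}

# On a controller this would be run against the real manifest, e.g.:
#   sudo cat /var/run/os-collect-config/os_config_files.json > /tmp/manifest.json
#   check_ec2_json /tmp/manifest.json
```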


If the stalled stack is deleted and the exact same deploy command is rerun, the stack completes successfully.
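The delete-and-rerun workaround can be sketched as follows. This is a sketch only: DEPLOY_CMD stands in for whatever deploy command and flags were used originally (the flags shown are a hypothetical example), and the polling interval is arbitrary.

```shell
# DEPLOY_CMD is a placeholder for the exact original deploy command.
DEPLOY_CMD="openstack overcloud deploy --templates --control-scale 3 --compute-scale 1"

redeploy_overcloud() {
    # delete the stalled stack
    heat stack-delete overcloud
    # poll until the stack is actually gone before redeploying
    while heat stack-show overcloud >/dev/null 2>&1; do
        sleep 30
    done
    # rerun the exact same deploy command
    $DEPLOY_CMD
}
```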

Version-Release number of selected component (if applicable):

[stack@host15 ~]$ rpm -qa | grep openstack
openstack-nova-common-2015.1.1-1.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-5.el7ost.noarch
openstack-heat-api-cfn-2015.1.1-4.el7ost.noarch
openstack-ceilometer-api-2015.1.1-1.el7ost.noarch
openstack-ironic-api-2015.1.1-4.el7ost.noarch
openstack-swift-plugin-swift3-1.7-3.el7ost.noarch
openstack-tuskar-ui-extras-0.0.4-1.el7ost.noarch
openstack-ironic-conductor-2015.1.1-4.el7ost.noarch
openstack-tempest-kilo-20150708.2.el7ost.noarch
openstack-neutron-ml2-2015.1.1-5.el7ost.noarch
openstack-nova-scheduler-2015.1.1-1.el7ost.noarch
openstack-puppet-modules-2015.1.8-17.el7ost.noarch
openstack-nova-cert-2015.1.1-1.el7ost.noarch
python-django-openstack-auth-1.2.0-5.el7ost.noarch
openstack-dashboard-2015.1.1-2.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-64.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
python-openstackclient-1.0.3-2.el7ost.noarch
openstack-utils-2014.2-1.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.1-4.el7ost.noarch
openstack-ceilometer-notification-2015.1.1-1.el7ost.noarch
openstack-ceilometer-collector-2015.1.1-1.el7ost.noarch
openstack-tuskar-0.4.18-4.el7ost.noarch
openstack-neutron-openvswitch-2015.1.1-5.el7ost.noarch
openstack-swift-container-2.3.0-1.el7ost.noarch
openstack-nova-api-2015.1.1-1.el7ost.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch
openstack-tripleo-common-0.0.1.dev6-2.git49b57eb.el7ost.noarch
openstack-tripleo-image-elements-0.9.6-10.el7ost.noarch
openstack-nova-console-2015.1.1-1.el7ost.noarch
openstack-neutron-2015.1.1-5.el7ost.noarch
openstack-heat-engine-2015.1.1-4.el7ost.noarch
openstack-ceilometer-common-2015.1.1-1.el7ost.noarch
openstack-ironic-discoverd-1.1.0-6.el7ost.noarch
openstack-selinux-0.6.37-1.el7ost.noarch
openstack-nova-compute-2015.1.1-1.el7ost.noarch
openstack-nova-conductor-2015.1.1-1.el7ost.noarch
openstack-swift-account-2.3.0-1.el7ost.noarch
openstack-swift-proxy-2.3.0-1.el7ost.noarch
openstack-tuskar-ui-0.4.0-3.el7ost.noarch
openstack-neutron-common-2015.1.1-5.el7ost.noarch
openstack-heat-common-2015.1.1-4.el7ost.noarch
openstack-ironic-common-2015.1.1-4.el7ost.noarch
openstack-ceilometer-alarm-2015.1.1-1.el7ost.noarch
openstack-keystone-2015.1.1-1.el7ost.noarch
openstack-dashboard-theme-2015.1.1-2.el7ost.noarch
openstack-glance-2015.1.1-1.el7ost.noarch
openstack-tripleo-0.0.7-0.1.1664e566.el7ost.noarch
openstack-heat-api-2015.1.1-4.el7ost.noarch
openstack-ceilometer-central-2015.1.1-1.el7ost.noarch
openstack-swift-2.3.0-1.el7ost.noarch
openstack-nova-novncproxy-2015.1.1-1.el7ost.noarch
openstack-swift-object-2.3.0-1.el7ost.noarch

[stack@host15 ~]$ rpm -qa | grep oscplugin
python-rdomanager-oscplugin-0.0.10-5.el7ost.noarch


How reproducible:
Deploy always fails the first time on baremetal and passes on rerun

Steps to Reproduce:
1. Install osp-director from the latest puddle on baremetal
2. Deploy an HA overcloud (3 controllers 1 compute)
3. Watch the deployment stall

Actual results:

The deployment errors out after four hours

Expected results:

The deployment should reach COMPLETE

Additional info:

Comment 4 James Slagle 2015-09-24 20:48:28 UTC
This was introduced because of the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1228862.

There is an issue with the patch for that, which I've fixed in:
https://code.engineering.redhat.com/gerrit/#/c/58392/

It's built into instack-undercloud-2.1.2-28.el7ost.

Comment 5 James Slagle 2015-09-24 20:49:35 UTC
Going to close this one as a duplicate of the other bug so we don't have to chase a bunch of acks on it.

*** This bug has been marked as a duplicate of bug 1228862 ***

Comment 6 James Slagle 2015-09-24 20:55:20 UTC
The issue here was that keystone users are now getting recreated in 98-undercloud-setup, and nova-api must be holding onto an old token. This causes the following traceback in the nova-api log when the overcloud nodes query for metadata:

    _handle_fault_response /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:176
2015-09-24 16:47:50.925 2968 ERROR nova.api.openstack [req-ef1347fd-554d-4ff7-9bcf-9b47501bf52f 71514baf617747dd816985fdebe313e8 f13b7755370e4875ae107532d2815a51 - - -] Caught error: 404 Not Found

The resource could not be found.

   
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack Traceback (most recent call last):
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/nova/api/openstack/__init__.py", line 125, in __call__
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     return req.get_response(self.application)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/webob/request.py", line 1296, in send
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     application, catch_exc_info=False)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/webob/request.py", line 1260, in call_application
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     app_iter = application(self.environ, start_response)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/webob/dec.py", line 144, in __call__
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     return resp(environ, start_response)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 634, in __call__
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     return self._call_app(env, start_response)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 554, in _call_app
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     return self._app(env, _fake_start_response)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/webob/dec.py", line 144, in __call__
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     return resp(environ, start_response)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/webob/dec.py", line 144, in __call__
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     return resp(environ, start_response)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/routes/middleware.py", line 131, in __call__
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     response = self.app(environ, start_response)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/webob/dec.py", line 144, in __call__
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     return resp(environ, start_response)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/webob/dec.py", line 130, in __call__
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     resp = self.call_func(req, *args, **self.kwargs)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/webob/dec.py", line 195, in call_func
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     return self.func(req, *args, **kwargs)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py", line 756, in __call__
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     content_type, body, accept)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py", line 821, in _process_stack
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     action_result = self.dispatch(meth, request, action_args)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py", line 911, in dispatch
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     return method(req=request, **action_args)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/nova/api/openstack/compute/contrib/floating_ips.py", line 103, in index
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     floating_ips = self.network_api.get_floating_ips_by_project(context)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 1252, in get_floating_ips_by_project
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     fips = client.list_floatingips(tenant_id=project_id)['floatingips']
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 102, in with_params
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     ret = self.function(instance, *args, **kwargs)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 691, in list_floatingips
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     **_params)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 307, in list
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     for r in self._pagination(collection, path, **params):
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 320, in _pagination
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     res = self.get(path, params=params)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 293, in get
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     headers=headers, params=params)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 270, in retry_request
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     headers=headers, params=params)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 211, in do_request
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     self._handle_fault_response(status_code, replybody)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 185, in _handle_fault_response
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     exception_handler_v20(status_code, des_error_body)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 83, in exception_handler_v20
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack     message=message)
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack NeutronClientException: 404 Not Found
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack 
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack The resource could not be found.
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack 
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack    
2015-09-24 16:47:50.925 2968 TRACE nova.api.openstack 



the corresponding error from neutron/server.log:

2015-09-24 16:47:53.688 29876 DEBUG keystoneclient.session [-] Request returned failure status: 404 request /usr/lib/python2.7/site-packages/keystoneclient/session.py:396
2015-09-24 16:47:53.688 29876 WARNING keystonemiddleware.auth_token [-] Authorization failed for token
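The stale-token symptom above can be checked for with a small sketch. Assumptions: the signature string matches the nova-api log entries quoted in this comment, and the helper name is hypothetical.

```shell
# stale_token_suspected FILE - true if FILE contains the 404 signature
# quoted above from the nova-api log.
stale_token_suspected() {
    grep -q 'Caught error: 404 Not Found' "$1"
}

# On the undercloud, a positive check would be followed by restarting
# nova-api so it re-authenticates with the recreated keystone user:
#   stale_token_suspected /var/log/nova/nova-api.log && \
#       sudo systemctl restart openstack-nova-api
```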

