Bug 1303154

Summary: Deploy times are longer in 7.2 vs 7.1.
Product: Red Hat OpenStack
Reporter: Jeremy <jmelvin>
Component: rhosp-director
Assignee: Hugh Brock <hbrock>
Status: CLOSED CANTFIX
QA Contact: Shai Revivo <srevivo>
Severity: high
Docs Contact:
Priority: high
Version: 7.0 (Kilo)
CC: dsneddon, hbrock, jcoufal, mburns, rhel-osp-director-maint
Target Milestone: ---
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-05 19:37:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jeremy 2016-01-29 17:27:04 UTC
Description of problem:

A deploy with no network isolation (using only the --templates parameter and no -e parameters) used to take 25-35 minutes; it now takes 50-71 minutes.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Deploy with 7.1.
2. Deploy with 7.2.
3. Notice the time difference.

Actual results:
Deploy times are nearly 3x longer in some cases.

Expected results:
Similar deploy times.

Additional info:
I will attach an sosreport from the undercloud, taken after a failed 7.2 deploy.
The customer thinks the deploy fails because of a timeout, which may also be related to the longer deploy times.

Comment 3 Jeremy 2016-02-01 08:59:57 UTC
This is pretty much what I have found in the undercloud sosreport so far. Any suggestions on what else to ask for or look at? Thanks.


/glance/api.log
Multiple occurrences of:
2016-01-28 16:00:49.155 57522 ERROR glance.registry.client.v1.client [req-4e3ea720-2393-4bcf-80e8-9c7ed3c55f5b 87ee5216df014b1bbb17ceb4057c080f 9df580d8d6904762a0edd0d49d3f9092 - - -] Registry client request GET /images/bm-deploy-ramdisk raised NotFound

/ironic/api.log
Multiple occurrences of:
2016-01-28 18:12:49.245 60446 ERROR wsme.api [-] Server-side error: "Invalid control character '\n' at: line 1 column 137 (char 136)". Detail:

/ironic/ironic-conductor.log
2016-01-28 18:44:30.290 60467 ERROR oslo_messaging._drivers.common [-] Returning exception Node 77cb10b9-9e21-461b-9297-eb23316bdd73 is associated with instance 9c3811a9-100f-4fc6-96b7-2c466f58e6c2. to caller
2016-01-28 18:44:30.290 60467 ERROR oslo_messaging._drivers.common [-] ['Traceback (most recent call last):\n', '  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply\n    executor_callback))\n', '  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch\n    executor_callback)\n', '  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch\n    result = func(ctxt, **new_args)\n', '  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner\n    return func(*args, **kwargs)\n', '  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 405, in update_node\n    node_obj.save()\n', '  File "/usr/lib/python2.7/site-packages/ironic/objects/base.py", line 143, in wrapper\n    return fn(self, ctxt, *args, **kwargs)\n', '  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 265, in save\n    self.dbapi.update_node(self.uuid, updates)\n', '  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 338, in update_node\n    return self._do_update_node(node_id, values)\n', '  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 364, in _do_update_node\n    instance=ref.instance_uuid)\n', 'NodeAssociated: Node 77cb10b9-9e21-461b-9297-eb23316bdd73 is associated with instance 9c3811a9-100f-4fc6-96b7-2c466f58e6c2.\n']
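To see which services log the most errors, lines like the ones above can be tallied per logger. The following is a minimal sketch, not a supported tool: it assumes every line follows the layout seen in these excerpts (timestamp, PID, level, logger name, bracketed request context, message), and the names LINE_RE and count_errors_by_logger are hypothetical helpers introduced only for illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpts in this comment:
# "<date> <time> <pid> <LEVEL> <logger> [<context>] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) "
    r"\[(?P<context>[^\]]*)\] (?P<message>.*)$"
)


def count_errors_by_logger(lines):
    """Count ERROR-level lines per logger; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") == "ERROR":
            counts[match.group("logger")] += 1
    return counts


# Two lines copied from the excerpts above, plus one line that should be ignored.
sample_lines = [
    "2016-01-28 16:00:49.155 57522 ERROR glance.registry.client.v1.client "
    "[req-4e3ea720-2393-4bcf-80e8-9c7ed3c55f5b 87ee5216df014b1bbb17ceb4057c080f "
    "9df580d8d6904762a0edd0d49d3f9092 - - -] Registry client request GET "
    "/images/bm-deploy-ramdisk raised NotFound",
    "2016-01-28 18:44:30.290 60467 ERROR oslo_messaging._drivers.common [-] "
    "Returning exception Node 77cb10b9-9e21-461b-9297-eb23316bdd73 is associated "
    "with instance 9c3811a9-100f-4fc6-96b7-2c466f58e6c2. to caller",
    "this line does not follow the assumed layout",
]

error_counts = count_errors_by_logger(sample_lines)
for logger, count in error_counts.most_common():
    print(f"{count:5d}  {logger}")
```

To run it against a real log, pass an open file object instead of sample_lines, since the function only needs an iterable of lines.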

Comment 4 Dan Sneddon 2016-02-03 17:35:40 UTC
(In reply to Jeremy from comment #0)

I don't think the deployment at the customer site is timing out because the deploy takes too long. I think it is hanging because their network configuration is somehow invalid.

We have increased the base deployment time due to additional verifications and steps to ensure proper upgrade functionality.

I don't think the right way to look at those increases is "3x longer than before". I think the proper way to express them is "30 minutes longer than before".

So, if a deployment with full network isolation used to take 60 minutes, we would now expect it to take ~90 minutes, not 180. We have added to the deployment time, but it's a fixed additive increase, not a multiplicative one.
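The difference between the two readings can be made concrete with a short sketch. It is illustrative only: the 30-minute overhead and the 3x factor are the figures quoted in this thread, the baselines are example values rather than measurements, and the function names are hypothetical.

```python
# Figures quoted in this thread; treat them as rough estimates, not measurements.
FIXED_OVERHEAD_MIN = 30   # "30 minutes longer than before"
REPORTED_FACTOR = 3       # "3x longer than before"


def additive_estimate(baseline_min, overhead_min=FIXED_OVERHEAD_MIN):
    """Expected deploy time if the new steps add a fixed number of minutes."""
    return baseline_min + overhead_min


def multiplicative_estimate(baseline_min, factor=REPORTED_FACTOR):
    """Expected deploy time if the slowdown scaled with the baseline."""
    return baseline_min * factor


# Example baselines in minutes: a short deploy and the 60-minute case above.
for baseline in (30, 60):
    print(
        f"baseline {baseline} min: "
        f"additive {additive_estimate(baseline)} min, "
        f"multiplicative {multiplicative_estimate(baseline)} min"
    )
```

With a fixed overhead, the ratio to the old time is largest for short deploys and shrinks as the baseline grows.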

We aren't getting widespread timeouts due to increases in deployment time across the board, so I think there is a misconfiguration in this case.

Comment 6 Mike Burns 2016-04-07 21:07:13 UTC
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.

Comment 8 Jaromir Coufal 2016-10-05 19:37:27 UTC
Obsolete, since we already have 7.3 and newer releases. Please re-open if still valid.