Bug 1308419

Summary: Unable to scale up a deployed overcloud by adding a new compute node
Product: Red Hat OpenStack
Reporter: Raoul Scarazzini <rscarazz>
Component: python-tripleoclient
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED ERRATA
QA Contact: Omri Hochman <ohochman>
Severity: high
Docs Contact:
Priority: unspecified
Version: 8.0 (Liberty)
CC: abeekhof, augol, dbecker, fdinitto, hbrock, hjensas, jcoufal, jschluet, jslagle, mburns, mcornea, mgandolf, michele, morazi, oblaut, ohochman, racedoro, rhel-osp-director-maint, rscarazz, rybrown
Target Milestone: rc
Keywords: TestOnly, Triaged
Target Release: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: python-tripleoclient-5.3.0-1.el7ost
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-12-14 15:22:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Raoul Scarazzini 2016-02-15 07:18:39 UTC
Description of problem:

I'm not able to add a new compute node to a deployed overcloud environment by following the steps described in the official documentation: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html-single/Director_Installation_and_Usage/index.html#sect-Advanced-Registering_Nodes_for_the_Advanced_Overcloud

The update operation ends with this error:

2016-02-12 12:15:46 overcloud UPDATE_IN_PROGRESS Stack UPDATE started
...
...
2016-02-12 12:35:34 Compute UPDATE_FAILED resources.Compute: resources[0]: ClientException: resources.TenantPort: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'keystoneclient.exceptions.RequestTimeout'> (HTTP 500) (Req
2016-02-12 12:39:46 Controller UPDATE_FAILED UPDATE aborted
2016-02-12 12:39:47 overcloud UPDATE_FAILED resources.Compute: resources[0]: ClientException: resources.TenantPort: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'keystoneclient.exceptions.RequestTimeout'> (HTTP 500) (Req

Version-Release number of selected component (if applicable):

Red Hat Enterprise Linux Server release 7.2 (Maipo)
openstack-tripleo-heat-templates-0.8.7-2.el7ost.noarch
instack-undercloud-2.2.1-2.el7ost.noarch

How reproducible:

Deploy an overcloud with a command line similar to this one:

openstack overcloud deploy --templates --libvirt-type=kvm --ntp-server 10.5.26.10 \
  --control-scale 3 --compute-scale 1 --ceph-storage-scale 0 --block-storage-scale 0 --swift-storage-scale 0 \
  --control-flavor baremetal --compute-flavor baremetal --ceph-storage-flavor baremetal \
  --block-storage-flavor baremetal --swift-storage-flavor baremetal \
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e /home/stack/network-environment.yaml \
  --neutron-bridge-mappings datacentre:br-floating

After the deployment succeeds, try to add the new compute node:

$ openstack baremetal import --json newnodes.json
$ ironic node-list
$ ironic node-set-maintenance [NODE UUID] true
$ openstack baremetal introspection start [NODE UUID]
$ ironic node-set-maintenance [NODE UUID] false
$ ironic node-update [NODE UUID] add properties/capabilities='profile:baremetal,boot_option:local'
$ ironic node-update [NODE UUID] add driver_info/deploy_kernel='09b40e3d-0382-4925-a356-3a4b4f36b514'
$ ironic node-update [NODE UUID] add driver_info/deploy_ramdisk='765a46af-4417-4592-91e5-a300ead3faf6'
$ openstack overcloud deploy --templates --compute-scale 3 -e [...]
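The deploy kernel and ramdisk UUIDs hardcoded in the steps above are specific to the reporter's undercloud. A minimal sketch of a more portable variant, assuming `~/stackrc` has been sourced and that the deploy images were uploaded under the usual director names `bm-deploy-kernel` and `bm-deploy-ramdisk`; the helper name `set_deploy_images` is hypothetical:

```shell
# Hypothetical helper: resolve the deploy kernel/ramdisk IDs by image name instead of
# hardcoding UUIDs, then attach them to a single Ironic node.
# Assumes ~/stackrc is sourced and the images are named bm-deploy-kernel / bm-deploy-ramdisk.
set_deploy_images() {
  local node_uuid="$1" kernel_id ramdisk_id
  kernel_id=$(openstack image show bm-deploy-kernel -f value -c id) || return 1
  ramdisk_id=$(openstack image show bm-deploy-ramdisk -f value -c id) || return 1
  # Same two updates as the manual steps, one field per call.
  ironic node-update "$node_uuid" add driver_info/deploy_kernel="$kernel_id" || return 1
  ironic node-update "$node_uuid" add driver_info/deploy_ramdisk="$ramdisk_id"
}
```

Usage: `source ~/stackrc` and then `set_deploy_images [NODE UUID]` in place of the two `driver_info` updates. On director versions that provide it, `openstack baremetal configure boot` should set the same fields on all registered nodes at once.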

Actual results:

Failure (see error above).

Expected results:

New compute nodes should become available.

Comment 2 Ryan Brown 2016-03-01 17:17:59 UTC
Can you please attach the heat-engine logs and all the nova logs covering the time period in which you attempted the scale-up? This error message only says that an error occurred; it doesn't describe the actual error.
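To narrow the requested logs down to the failure window, a filter on the leading timestamp can help. A sketch, assuming log lines begin with `YYYY-MM-DD HH:MM:SS` (as the heat output quoted in the description does, and as oslo-style heat-engine and nova logs normally do); the function name `log_window` is hypothetical, and lines without a leading timestamp, such as wrapped traceback text, are dropped:

```shell
# Hypothetical helper: print lines whose leading "YYYY-MM-DD HH:MM:SS" timestamp lies in
# [start, end], inclusive. Only the first 19 characters are compared, so fractional
# seconds are ignored; lines without a leading timestamp are dropped.
log_window() {
  local start="$1" end="$2"
  shift 2
  awk -v s="$start" -v e="$end" '
    { ts = substr($1 " " $2, 1, 19) }  # e.g. "2016-02-12 12:15:46"
    ts >= s && ts <= e                 # string comparison works for this fixed-width format
  ' "$@"
}
```

For example, `log_window '2016-02-12 12:15:00' '2016-02-12 12:40:00' /var/log/heat/heat-engine.log` on the undercloud, and the same call against the nova logs on the controllers (the default `/var/log/...` locations are an assumption about the setup).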

Comment 3 Raoul Scarazzini 2016-03-16 09:45:00 UTC
I finally managed to collect an sosreport [1] while reproducing the problem on an OSP7 setup with 3 controllers and 1 compute node, scaling up by one additional compute node.

From what I have read, the problem seems to occur when there is only one compute node, but I need to confirm this with the new tests I'm running. I will update the bug once I have news.

[1] http://file.rdu.redhat.com/~rscarazz/BZ1308419/sosreport-macb8ca3a66f440.example.com.1308419-20160316093050.tar.xz

Comment 4 Mike Burns 2016-04-07 21:11:06 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 5 Harald Jensås 2016-05-09 10:48:25 UTC
I cannot access the sosreport [1]. I also tried to log in to file.rdu.redhat.com via SSH to fetch it, but either the file is not there or its permissions are not right.

Raoul, can you make the sosreport available?


[1] http://file.rdu.redhat.com/~rscarazz/BZ1308419/sosreport-macb8ca3a66f440.example.com.1308419-20160316093050.tar.xz

Comment 6 Raoul Scarazzini 2016-05-09 10:50:17 UTC
Done.

Comment 8 James Slagle 2016-10-14 15:21:44 UTC
This should be working in OSP10. Please confirm whether that's the case. Are you able to reproduce the issue?

Comment 10 Jaromir Coufal 2016-10-27 08:36:31 UTC
Fabio: The reason is that it was tagged in the Whiteboard as HighAvailability.

Our QE is testing scale up in OSP10, will confirm.

Omri, could you please point to right QE to verify if the scaling up is working?

Comment 11 Fabio Massimo Di Nitto 2016-10-27 08:43:17 UTC
(In reply to Jaromir Coufal from comment #10)
> Fabio: The reason is that it was tagged in the Whiteboard as
> HighAvailability.

ack, removed. The tag predates the introduction of DFGs; we used it to query Bugzilla for all bugs that we filed or that were potentially relevant to HA. I'll get somebody to drop it since it's obsolete now.

> 
> Our QE is testing scale up in OSP10, will confirm.
> 

perfect.

> Omri, could you please point to right QE to verify if the scaling up is
> working?

Comment 14 errata-xmlrpc 2016-12-14 15:22:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html
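Before opening a new report, it may be worth confirming that the undercloud actually carries the package listed under Fixed In Version. A sketch that compares the installed `python-tripleoclient` version-release against `5.3.0-1.el7ost`, assuming GNU `sort -V` is an acceptable approximation of RPM version ordering (it is not identical to `rpmdev-vercmp` in every edge case); both function names are hypothetical:

```shell
# Hypothetical helpers; GNU sort -V approximates (but is not identical to) RPM version ordering.

# Succeeds if version $1 is greater than or equal to version $2.
version_at_least() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed python-tripleoclient against the Fixed In Version of this bug.
check_tripleoclient_fix() {
  local fixed="5.3.0-1.el7ost" installed
  installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' python-tripleoclient) || return 2
  if version_at_least "$installed" "$fixed"; then
    echo "python-tripleoclient $installed includes the fix (>= $fixed)"
  else
    echo "python-tripleoclient $installed predates the fix ($fixed)"
    return 1
  fi
}
```

Run `check_tripleoclient_fix` on the undercloud; a non-zero exit status means the installed package is older than the fixed build (1) or is not installed at all (2).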

Comment 15 Amit Ugol 2018-05-02 10:53:42 UTC
closed, no need for needinfo.