Bug 1308419 - Unable to scale up a deployed overcloud by adding a new compute node [NEEDINFO]
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: 8.0 (Liberty)
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 10.0 (Newton)
Assigned To: RHOS Maint
QA Contact: Omri Hochman
Keywords: TestOnly, Triaged
Depends On:
Blocks:
Reported: 2016-02-15 02:18 EST by Raoul Scarazzini
Modified: 2016-12-14 10:22 EST

See Also:
Fixed In Version: python-tripleoclient-5.3.0-1.el7ost
Doc Type: Bug Fix
Last Closed: 2016-12-14 10:22:54 EST
Type: Bug
jcoufal: needinfo? (ohochman)


Attachments: None
Description Raoul Scarazzini 2016-02-15 02:18:39 EST
Description of problem:

I'm not able to add a new compute node to a deployed overcloud environment by following the steps described in the official documentation: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html-single/Director_Installation_and_Usage/index.html#sect-Advanced-Registering_Nodes_for_the_Advanced_Overcloud

The update operation ends with this error:

2016-02-12 12:15:46 overcloud UPDATE_IN_PROGRESS Stack UPDATE started
...
...
2016-02-12 12:35:34 Compute UPDATE_FAILED resources.Compute: resources[0]: ClientException: resources.TenantPort: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'keystoneclient.exceptions.RequestTimeout'> (HTTP 500) (Req
2016-02-12 12:39:46 Controller UPDATE_FAILED UPDATE aborted
2016-02-12 12:39:47 overcloud UPDATE_FAILED resources.Compute: resources[0]: ClientException: resources.TenantPort: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'keystoneclient.exceptions.RequestTimeout'> (HTTP 500) (Req

Version-Release number of selected component (if applicable):

Red Hat Enterprise Linux Server release 7.2 (Maipo)
openstack-tripleo-heat-templates-0.8.7-2.el7ost.noarch
instack-undercloud-2.2.1-2.el7ost.noarch

How reproducible:

Deploy an overcloud with a command line similar to this one:

openstack overcloud deploy --templates \
  --libvirt-type=kvm \
  --ntp-server 10.5.26.10 \
  --control-scale 3 --compute-scale 1 --ceph-storage-scale 0 \
  --block-storage-scale 0 --swift-storage-scale 0 \
  --control-flavor baremetal --compute-flavor baremetal \
  --ceph-storage-flavor baremetal --block-storage-flavor baremetal \
  --swift-storage-flavor baremetal \
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e /home/stack/network-environment.yaml \
  --neutron-bridge-mappings datacentre:br-floating

And after the successful deployment, try to add the new compute node:

$ openstack baremetal import --json newnodes.json
$ ironic node-list
$ ironic node-set-maintenance [NODE UUID] true
$ openstack baremetal introspection start [NODE UUID]
$ ironic node-set-maintenance [NODE UUID] false
$ ironic node-update [NODE UUID] add properties/capabilities='profile:baremetal,boot_option:local'
$ ironic node-update [NODE UUID] add driver_info/deploy_kernel='09b40e3d-0382-4925-a356-3a4b4f36b514'
$ ironic node-update [NODE UUID] add driver_info/deploy_ramdisk='765a46af-4417-4592-91e5-a300ead3faf6'
$ openstack overcloud deploy --templates --compute-scale 3 -e [...]
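
The steps above can be written down as data before anything is run, which makes the exact sequence easier to review. The following is a minimal Python sketch and not part of the original report: the function name scale_up_commands and the placeholder values are hypothetical, and the environment files that the report elides as "-e [...]" are deliberately not reconstructed. It only prints the commands; it does not execute them.

```python
import shlex


def scale_up_commands(node_uuid, kernel_id, ramdisk_id, compute_scale):
    """Build the scale-up steps from the report as a list of argv lists."""
    return [
        ["openstack", "baremetal", "import", "--json", "newnodes.json"],
        ["ironic", "node-list"],
        ["ironic", "node-set-maintenance", node_uuid, "true"],
        ["openstack", "baremetal", "introspection", "start", node_uuid],
        ["ironic", "node-set-maintenance", node_uuid, "false"],
        ["ironic", "node-update", node_uuid, "add",
         "properties/capabilities=profile:baremetal,boot_option:local"],
        ["ironic", "node-update", node_uuid, "add",
         f"driver_info/deploy_kernel={kernel_id}"],
        ["ironic", "node-update", node_uuid, "add",
         f"driver_info/deploy_ramdisk={ramdisk_id}"],
        # The report elides the environment files ("-e [...]"); they are
        # not guessed here and must be appended by the caller.
        ["openstack", "overcloud", "deploy", "--templates",
         "--compute-scale", str(compute_scale)],
    ]


# Dry run with placeholder values (hypothetical, for illustration only).
for cmd in scale_up_commands("NODE-UUID", "KERNEL-ID", "RAMDISK-ID", 3):
    print("$", shlex.join(cmd))
```

To run the sequence for real, each argv list could be passed to subprocess.run(cmd, check=True) once the elided -e arguments have been appended to the final command.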

Actual results:

Failure (see error above).
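
When the event listing is long, the failing resources can be pulled out with a small filter. This is a minimal Python sketch that assumes the events were saved as plain text in the layout quoted in the description (date, time, resource name, status, reason); the function name failed_events is hypothetical. Continuation lines such as the truncated "<class ...>" line do not match the pattern and are skipped.

```python
import re

# Matches lines such as:
# 2016-02-12 12:39:46 Controller UPDATE_FAILED UPDATE aborted
EVENT_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<resource>\S+) (?P<status>[A-Z_]+) (?P<reason>.*)$"
)


def failed_events(text):
    """Return (timestamp, resource, reason) for every *_FAILED event line."""
    failures = []
    for line in text.splitlines():
        match = EVENT_RE.match(line)
        if match and match.group("status").endswith("_FAILED"):
            timestamp = f"{match.group('date')} {match.group('time')}"
            failures.append(
                (timestamp, match.group("resource"), match.group("reason"))
            )
    return failures


# Two lines copied from the event listing in the description.
sample = """\
2016-02-12 12:15:46 overcloud UPDATE_IN_PROGRESS Stack UPDATE started
2016-02-12 12:39:46 Controller UPDATE_FAILED UPDATE aborted
"""

for timestamp, resource, reason in failed_events(sample):
    print(timestamp, resource, reason)
```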

Expected results:

New compute nodes should become available.
Comment 2 Ryan Brown 2016-03-01 12:17:59 EST
Can you please attach the logs for heat-engine and all the nova logs for the time period you attempted the scaleup? This error message doesn't have info about the actual error, just that one occurred.
Comment 3 Raoul Scarazzini 2016-03-16 05:45:00 EDT
I finally managed to capture an sosreport [1] while reproducing the problem on an OSP7 setup with 3 controllers and 1 compute node, scaling up by one additional compute node.

From what I have read, the problem seems to happen when there is only 1 compute node, but I need to prove it with the new tests that I am running. I will update the bug once I have news.

[1] http://file.rdu.redhat.com/~rscarazz/BZ1308419/sosreport-macb8ca3a66f440.example.com.1308419-20160316093050.tar.xz
Comment 4 Mike Burns 2016-04-07 17:11:06 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.
Comment 5 Harald Jensås 2016-05-09 06:48:25 EDT
I cannot access the sosreport [1]. I also tried to log in to file.rdu.redhat.com via SSH to fetch it, but either the file is not there or the permissions are wrong.

Raoul, can you make the sosreport available?


[1] http://file.rdu.redhat.com/~rscarazz/BZ1308419/sosreport-macb8ca3a66f440.example.com.1308419-20160316093050.tar.xz
Comment 6 Raoul Scarazzini 2016-05-09 06:50:17 EDT
Done.
Comment 8 James Slagle 2016-10-14 11:21:44 EDT
This should be working in OSP10. Please confirm whether that's the case. Are you able to reproduce it?
Comment 10 Jaromir Coufal 2016-10-27 04:36:31 EDT
Fabio: The reason is that it was tagged in the Whiteboard as HighAvailability.

Our QE is testing scale up in OSP10, will confirm.

Omri, could you please point to the right QE to verify whether scaling up is working?
Comment 11 Fabio Massimo Di Nitto 2016-10-27 04:43:17 EDT
(In reply to Jaromir Coufal from comment #10)
> Fabio: The reason is that it was tagged in the Whiteboard as
> HighAvailability.

Ack, removed. The tag predates the introduction of DFGs; we used it to query Bugzilla for all bugs that we filed or that were potentially relevant to HA. I'll get somebody to drop it since it's obsolete now.

> 
> Our QE is testing scale up in OSP10, will confirm.
> 

perfect.

> Omri, could you please point to right QE to verify if the scaling up is
> working?
Comment 14 errata-xmlrpc 2016-12-14 10:22:54 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html
