Bug 1326883 - After compute node scaling in a mixed UC8-OC7 environment, nova-compute service cannot start
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 7.0 (Kilo)
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Eoghan Glynn
QA Contact: Prasanth Anbalagan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-13 16:02 UTC by Dan Yasny
Modified: 2019-09-09 16:34 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-15 13:32:36 UTC
Target Upstream Version:


Description Dan Yasny 2016-04-13 16:02:28 UTC
Description of problem:
I am testing support for managing a 7.3 overcloud from a version 8 undercloud.
The flow:
1. deploy a standard setup - 3 controllers, 1 compute and 1 ceph, network isolation and SSL using 7.3 GA
2. populate the overcloud with instances, tenants, objects, volumes, etc
3. upgrade undercloud to 8
4. fix the known issues from 1325702 and 1326644
5. bring up vlan10, to restore connectivity
6. verify the populated objects are alive and still exist in the overcloud
7. update tripleo-overcloud-passwords with "OVERCLOUD_RABBITMQ_PASSWORD=guest" (BZ1320333)
8. rerun the deploy command, pointing to a local tht dir that contains the kilo templates, and changing the compute count from 1 to 2
9. wait for the deployment to complete. It actually failed, with UPDATE_FAILED; stack_status_reason : Engine went down during stack UPDATE
10. check ironic and nova on the UC - looks like the second compute got added fine.
11. tried to stop the instances and restart them, so they would spread across the two computes, and found I can't stop the VMs.
12. checked the compute node, and saw that it has a cycling message in the nova-compute log:
2016-04-13 15:11:30.687 30346 ERROR oslo_messaging._drivers.impl_rabbit [req-377bb6b2-4ed0-47d8-aa4e-a4c85a8dc8d1 - - - - -] AMQP server 192.168.100.13:5672 closed the connection. Check login credentials: Socket closed
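
To judge how often the message in step 12 repeats, the log can be scanned for the same pattern. The following is a minimal sketch, not part of the original report: the function name count_amqp_rejections is hypothetical, and the regular expression is modelled only on the single log line quoted above, so other oslo.messaging versions may word the message differently.

```python
import re
from collections import Counter

# Pattern modelled on the one log line quoted in this report (an assumption:
# other releases may format the message differently).
AMQP_CLOSED = re.compile(
    r"ERROR oslo_messaging\._drivers\.impl_rabbit .*"
    r"AMQP server (?P<host>[\d.]+):(?P<port>\d+) closed the connection"
)


def count_amqp_rejections(log_text):
    """Count 'closed the connection' errors per (host, port) in log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = AMQP_CLOSED.search(line)
        if match:
            counts[(match.group("host"), int(match.group("port")))] += 1
    return counts
```

Passing the contents of the nova-compute log to count_amqp_rejections returns a Counter keyed by broker address. A steadily growing count for one broker points to a reconnect loop of the kind described here, rather than a one-off disconnect.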

Version-Release number of selected component (if applicable):
7.3GA on the overcloud

Undercloud on 8 puddle:
python-django-openstack-auth-2.0.1-1.2.el7ost.noarch
openstack-dashboard-8.0.1-2.el7ost.noarch
openstack-heat-engine-5.0.1-5.el7ost.noarch
openstack-nova-scheduler-12.0.2-5.el7ost.noarch
openstack-neutron-ml2-7.0.1-15.el7ost.noarch
openstack-ironic-api-4.2.2-4.el7ost.noarch
openstack-ceilometer-collector-5.0.2-2.el7ost.noarch
openstack-ironic-inspector-2.2.5-2.el7ost.noarch
openstack-selinux-0.6.58-1.el7ost.noarch
openstack-tuskar-0.4.18-5.el7ost.noarch
openstack-tripleo-image-elements-0.9.9-1.el7ost.noarch
openstack-swift-2.5.0-2.el7ost.noarch
openstack-ceilometer-notification-5.0.2-2.el7ost.noarch
openstack-neutron-common-7.0.1-15.el7ost.noarch
python-openstackclient-1.7.2-1.el7ost.noarch
openstack-dashboard-theme-8.0.1-2.el7ost.noarch
openstack-heat-api-cloudwatch-5.0.1-5.el7ost.noarch
openstack-tempest-liberty-20160317.1.el7ost.noarch
openstack-nova-console-12.0.2-5.el7ost.noarch
openstack-nova-novncproxy-12.0.2-5.el7ost.noarch
openstack-ironic-conductor-4.2.2-4.el7ost.noarch
openstack-glance-11.0.1-4.el7ost.noarch
openstack-keystone-8.0.1-1.el7ost.noarch
openstack-puppet-modules-7.0.17-1.el7ost.noarch
openstack-tripleo-0.0.7-1.el7ost.noarch
openstack-nova-cert-12.0.2-5.el7ost.noarch
openstack-neutron-openvswitch-7.0.1-15.el7ost.noarch
openstack-ceilometer-alarm-5.0.2-2.el7ost.noarch
openstack-swift-object-2.5.0-2.el7ost.noarch
openstack-heat-templates-0-0.8.20150605git.el7ost.noarch
openstack-tuskar-ui-0.4.0-5.el7ost.noarch
openstack-utils-2014.2-1.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-7.el7ost.noarch
openstack-ceilometer-common-5.0.2-2.el7ost.noarch
openstack-ironic-common-4.2.2-4.el7ost.noarch
openstack-heat-common-5.0.1-5.el7ost.noarch
openstack-heat-api-cfn-5.0.1-5.el7ost.noarch
openstack-nova-conductor-12.0.2-5.el7ost.noarch
openstack-ceilometer-central-5.0.2-2.el7ost.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch
openstack-tripleo-heat-templates-0.8.14-7.el7ost.noarch
openstack-ceilometer-polling-5.0.2-2.el7ost.noarch
openstack-tripleo-common-0.3.1-1.el7ost.noarch
openstack-heat-api-5.0.1-5.el7ost.noarch
openstack-nova-api-12.0.2-5.el7ost.noarch
openstack-swift-proxy-2.5.0-2.el7ost.noarch
openstack-swift-container-2.5.0-2.el7ost.noarch
openstack-nova-common-12.0.2-5.el7ost.noarch
openstack-nova-compute-12.0.2-5.el7ost.noarch
openstack-neutron-7.0.1-15.el7ost.noarch
openstack-ceilometer-api-5.0.2-2.el7ost.noarch
openstack-swift-account-2.5.0-2.el7ost.noarch
openstack-tuskar-ui-extras-0.0.4-2.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.5-1.el7ost.noarch
openstack-swift-plugin-swift3-1.9-1.el7ost.noarch


How reproducible:
once so far

Steps to Reproduce:
1. see above

Actual results:
On the old compute node, the nova-compute service is stuck in the starting state.

Expected results:
heat stack scale should work and nova should work

Additional info:

setup is available for investigation

Comment 2 Brad P. Crochet 2016-04-15 13:32:36 UTC
It appears that on the system in question, the overcloud deploy was run from a different directory than the initial run. As a result, the tripleo-overcloud-passwords file was regenerated (as were all of the passwords), which put the stack into an indeterminate state. If this can be reproduced when using the same password file, then please reopen. Otherwise, closing this.
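
One way to confirm this diagnosis is to compare the password file from the original deploy directory with the one in the directory used for the scale-out. The sketch below is illustrative only. It assumes the file holds plain KEY=VALUE lines, as the OVERCLOUD_RABBITMQ_PASSWORD=guest entry in the description suggests, and the function names parse_password_file and changed_keys are hypothetical. It reports only key names, so no secret values are printed.

```python
def parse_password_file(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and '#' comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries


def changed_keys(old_text, new_text):
    """Return sorted key names that differ or exist in only one file."""
    old = parse_password_file(old_text)
    new = parse_password_file(new_text)
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

If changed_keys returns an empty list for the two files, the passwords were not regenerated and the failure would need another explanation. A long list of differing keys is consistent with the file having been recreated in the new directory.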