Bug 1322509 - Upgrade from OSP Director 7.3 -> OSP Director 8.0 poodle fails with bad rabbitmq credentials
Summary: Upgrade from OSP Director 7.3 -> OSP Director 8.0 poodle fails with bad rabbitmq credentials
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assignee: Marios Andreou
QA Contact: Omri Hochman
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-03-30 15:33 UTC by Andreas Karis
Modified: 2023-09-14 03:20 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-28 13:51:44 UTC
Target Upstream Version:
Embargoed:



Description Andreas Karis 2016-03-30 15:33:34 UTC
Description of problem:
Upgrading OSP Director from 7.3 to 8.0 with
`rhos-release -P 8-director -d`
Fails with bad rabbitmq credentials


Version-Release number of selected component (if applicable):
7.3 -> 8.0 poodle

How reproducible:
During the final overcloud upgrade step:
openstack overcloud deploy --templates --stack overcloud \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e ${template_base_dir}/network-environment.yaml \
  -e ${template_base_dir}/enable-tls.yaml \
  -e ${template_base_dir}/cloudname.yaml \
  -e ${template_base_dir}/inject-trust-anchor.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-converge.yaml \
  -e ${template_base_dir}/rhos-release-8.yaml

All services (nova, neutron, etc.) report that they cannot connect to rabbitmq, and the stack update fails because of this:
Mar 30 14:35:00 overcloud-compute-0.localdomain neutron-openvswitch-agent[26833]: 2016-03-30 14:35:00.142 26833 ERROR oslo.messaging._drivers.impl_rabbit [req-2dd79883-5e6e-4972-bfad-f1c5471197e7 - - - - -] AMQP server 172.16.2.7:5672 closed the connection. Check login credentials: Socket closed
Mar 30 14:35:04 overcloud-compute-0.localdomain neutron-openvswitch-agent[26833]: 2016-03-30 14:35:04.085 26833 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 172.16.2.7:5672 closed the connection. Check login credentials: Socket closed
Mar 30 14:35:04 overcloud-compute-0.localdomain neutron-openvswitch-agent[26833]: 2016-03-30 14:35:04.159 26833 ERROR oslo.messaging._drivers.impl_rabbit [req-2dd79883-5e6e-4972-bfad-f1c5471197e7 - - - - -] AMQP server 172.16.2.7:5672 closed the connection. Check login credentials: Socket closed
Mar 30 14:35:09 overcloud-compute-0.localdomain neutron-openvswitch-agent[26833]: 2016-03-30 14:35:09.102 26833 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 172.16.2.7:5672 closed the connection. Check login credentials: Socket closed
Mar 30 14:35:09 overcloud-compute-0.localdomain neutron-openvswitch-agent[26833]: 2016-03-30 14:35:09.178 26833 ERROR oslo.messaging._drivers.impl_rabbit [req-2dd79883-5e6e-4972-bfad-f1c5471197e7 - - - - -] AMQP server 172.16.2.7:5672 closed the connection. Check login credentials: Socket closed
Mar 30 14:35:16 overcloud-compute-0.localdomain neutron-openvswitch-agent[26833]: 2016-03-30 14:35:16.124 26833 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 172.16.2.7:5672 closed the connection. Check login credentials: Socket closed
Mar 30 14:35:16 overcloud-compute-0.localdomain neutron-openvswitch-agent[26833]: 2016-03-30 14:35:16.199 26833 ERROR oslo.messaging._drivers.impl_rabbit [req-2dd79883-5e6e-4972-bfad-f1c5471197e7 - - - - -] AMQP server 172.16.2.7:5672 closed the connection. Check login credentials: Socket closed
Mar 30 14:35:24 overcloud-compute-0.localdomain nova-compute[15275]: 2016-03-30 14:35:24.056 15275 ERROR oslo.messaging._drivers.impl_rabbit [req-951ad31b-bf1b-4e3c-9d75-f0b274e0d30c - - - - -] AMQP server 172.16.2.7:5672 closed the connection. Check login credentials: Socket closed
Mar 30 14:35:25 overcloud-compute-0.localdomain neutron-openvswitch-agent[26833]: 2016-03-30 14:35:25.147 26833 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 172.16.2.7:5672 closed the connection. Check login credentials: Socket closed
Mar 30 14:35:25 overcloud-compute-0.localdomain neutron-openvswitch-agent[26833]: 2016-03-30 14:35:25.216 26833 ERROR oslo.messaging._drivers.impl_rabbit [req-2dd79883-5e6e-4972-bfad-f1c5471197e7 - - - - -] AMQP server 172.16.2.7:5672 closed the connection. Check login credentials: Socket closed

The rabbitmq configuration /etc/rabbitmq/rabbitmq.config was updated with the new credentials, but the broker itself is still using guest/guest. Verification: changing the password back to guest/guest in the configuration files makes the errors go away.
 
In order to make it work, we needed to run the following manually:
[root@overcloud-controller-0 rabbitmq]# rabbitmqctl change_password guest 2DvVMQpV9trHBdyFZvepT9kJW
Changing password for user "guest" ...
...done.

After this, the final step of the deployment succeeded.
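For anyone hitting the same mismatch, the manual fix above can be generalized: read the credentials Puppet wrote to disk and push them into the live broker. The sketch below demonstrates only the extraction step on an illustrative config snippet; the default_user/default_pass key names and the Erlang layout are assumptions about /etc/rabbitmq/rabbitmq.config, and the password is the one from this report.

```shell
# Illustrative snippet in the Erlang syntax of /etc/rabbitmq/rabbitmq.config
# (key names are an assumption; the password is from this bug report):
CONF_SNIPPET='[{rabbit, [{default_user, <<"guest">>}, {default_pass, <<"2DvVMQpV9trHBdyFZvepT9kJW">>}]}].'

# Extract the configured user and password from the Erlang binary-string syntax.
RABBIT_USER=$(printf '%s' "$CONF_SNIPPET" | sed -n 's/.*{default_user, *<<"\([^"]*\)">>.*/\1/p')
RABBIT_PASS=$(printf '%s' "$CONF_SNIPPET" | sed -n 's/.*{default_pass, *<<"\([^"]*\)">>.*/\1/p')
echo "$RABBIT_USER $RABBIT_PASS"

# On a real controller you would read /etc/rabbitmq/rabbitmq.config instead,
# then push the on-disk password into the live broker:
#   rabbitmqctl change_password "$RABBIT_USER" "$RABBIT_PASS"
```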

Comment 1 Mike Burns 2016-03-30 17:46:17 UTC

*** This bug has been marked as a duplicate of bug 1321132 ***

Comment 2 Andreas Karis 2016-03-30 18:51:28 UTC
I don't think that this is a duplicate of 1321132

We fixed 1321132 in our deployment, got past that point, and then hit the above at a later stage. The errors appear for nova, neutron, and all other services. This bug is about rabbitmq itself: the password is updated in the configuration, but it is _not_ updated in rabbitmq.

What's missing is a command similar to this:

[root@overcloud-controller-0 rabbitmq]# rabbitmqctl change_password guest 2DvVMQpV9trHBdyFZvepT9kJW

Comment 4 Mike Burns 2016-03-31 00:24:57 UTC
(In reply to Andreas Karis from comment #2)
> I don't think that this is a duplicate of 1321132
> 
> We fixed 1321132 in our deployment, but then got past it and hit the above
> at a later stage. The above is valid for nova, neutron, and all other
> services. The above bug is about rabbitmq: the password is updated in the
> configuration, but it is _not_ updated in rabbitmq
> 
> What's missing is a command similar to this:
> 
> [root@overcloud-controller-0 rabbitmq]# rabbitmqctl change_password guest
> 2DvVMQpV9trHBdyFZvepT9kJW

The neutron issue in the bug that I cloned this for was due to the rabbit password issue.  I still think it's a duplicate, but I'll let Marios make that determination.

Comment 5 Marios Andreou 2016-03-31 07:14:10 UTC
(In reply to Mike Burns from comment #4)
> (In reply to Andreas Karis from comment #2)
> > I don't think that this is a duplicate of 1321132
> > 
> > We fixed 1321132 in our deployment, but then got past it and hit the above
> > at a later stage. The above is valid for nova, neutron, and all other
> > services. The above bug is about rabbitmq: the password is updated in the
> > configuration, but it is _not_ updated in rabbitmq
> > 
> > What's missing is a command similar to this:
> > 
> > [root@overcloud-controller-0 rabbitmq]# rabbitmqctl change_password guest
> > 2DvVMQpV9trHBdyFZvepT9kJW
> 
> The neutron issue in the bug that I cloned this for was due to the rabbit
> password issue.  I still think it's a duplicate, but I'll let Marios make
> that determination.

Hi Andreas, Mike

I was about to say yes, this is a duplicate of 1321132, but then noticed that the output in the description above is from the *compute* node. We need to be clear about what _exactly_ the upgrade fails on here - there would have been a 'Stack update failed because foo' at some point. I ask because the issue shown above - neutron-openvswitch-agent on the compute node not being able to talk to rabbit - wouldn't have caused the stack update to fail. Note that the compute-node neutron-openvswitch-agent issue is tracked upstream at https://bugs.launchpad.net/tripleo/+bug/1563437 and we have a fix for it at https://review.openstack.org/#/c/298946/ (you should be carrying this if you are following the upgrades kbase - ping me on irc if you don't have the details).

You mention "all services" can't connect to rabbitmq, so you may have seen on the controller something like:

Mar 24 17:40:58 overcloud-controller-1.localdomain cinder-scheduler[24723]: 2016-03-24 17:40:58.586 24723 ERROR oslo.messaging._drivers.impl_rabbit [req-cd90f482-3f61-4c8d-9a72-d68927bb7c84 - - - - -] AMQP server 172.16.2.7:5672 closed the connection. Check login credentials: Socket closed
Mar 24 17:40:58 overcloud-controller-1.localdomain keystone-all[22108]: 2016-03-24 17:40:58.861 22121 ERROR oslo.messaging._drivers.impl_rabbit [req-b734d341-fe4d-4a1f-b143-f43bfb436408 - - - - -] AMQP server 172.16.2.7:5672 closed the connection. Check login credentials: Socket closed

If you have the fix for 1321132 applied here, then this ^^^ also won't cause the stack to fail - though you will see it in the logs until we restart rabbit and all-the-things so they pick up the new credentials (this happens towards the end of the converge step).

Can we have more info about what exactly the stack fails on here? It could be something else entirely, and the error messages (about all-the-things on the controller and the openvswitch-agent on compute not talking to rabbitmq because of credentials) may just be misleading,

thanks, marios
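To answer the question above about what the stack actually failed on, one way is to walk the nested stacks for FAILED resources. The heat CLI invocation in the comment is an assumption of the kilo/liberty-era python-heatclient syntax, and the sample output and resource names below are hypothetical; only the filtering step is executed here.

```shell
# On the undercloud you would run (kilo/liberty-era heatclient syntax, an
# assumption here):
#   heat resource-list --nested-depth 5 overcloud | grep -i failed
# Below, the same filter is applied to hypothetical sample output.
SAMPLE_OUTPUT='| Controller | OS::TripleO::Controller | UPDATE_FAILED   | 2016-03-30T14:35:00Z |
| Compute    | OS::TripleO::Compute    | UPDATE_COMPLETE | 2016-03-30T14:30:00Z |'

# Keep only the resources whose status contains FAILED.
FAILED=$(printf '%s\n' "$SAMPLE_OUTPUT" | grep -i failed)
echo "$FAILED"
```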

Comment 6 Marios Andreou 2016-03-31 07:25:14 UTC
(In reply to Andreas Karis from comment #2)
> I don't think that this is a duplicate of 1321132
> 
> We fixed 1321132 in our deployment, but then got past it and hit the above
> at a later stage. The above is valid for nova, neutron, and all other
> services. The above bug is about rabbitmq: the password is updated in the
> configuration, but it is _not_ updated in rabbitmq
> 
> What's missing is a command similar to this:
> 
> [root@overcloud-controller-0 rabbitmq]# rabbitmqctl change_password guest
> 2DvVMQpV9trHBdyFZvepT9kJW

Hi Andreas, you are right: we need to make sure the services (rabbit, but also everything that talks to it) restart after the new password is set during the upgrade. We make that happen by setting the UpdateIdentifier at https://review.openstack.org/#/c/298685/. Related to that, we *don't* want puppet to restart services after the config changes (and before we do the controlled/ordered restart with pacemaker), so we have https://review.openstack.org/#/c/298695/ to prevent it (this is a bug in puppet-neutron, so we carry the fix for now).

As you reported in the description above, there is also a problem with neutron-openvswitch-agent on the compute node, and we have https://review.openstack.org/#/c/299303/ which causes the service to be notified of the config change and restarted.

Hope that helps. Please see my previous comment about what exactly the stack update fails on; it could be something else,

thanks, marios
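Once those fixes are in place, one quick way to confirm the ordered restart resolved things is to check that the credential errors stop appearing in the service logs. A minimal sketch using a hypothetical log excerpt (on a real node you would grep the actual service log, e.g. under /var/log/neutron/):

```shell
# Hypothetical excerpt: one stale credential error, then a reconnect after
# the converge-step restart (the INFO line's wording is an assumption).
LOG_EXCERPT='2016-03-30 14:35:25.216 26833 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 172.16.2.7:5672 closed the connection. Check login credentials: Socket closed
2016-03-30 14:40:01.003 26833 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 172.16.2.7:5672'

# Count remaining credential failures; after a successful restart this
# count should stop growing.
ERRORS=$(printf '%s\n' "$LOG_EXCERPT" | grep -c 'Check login credentials')
echo "credential errors: $ERRORS"
```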

Comment 7 Mike Burns 2016-04-07 21:36:02 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 12 Omri Hochman 2016-04-21 00:17:00 UTC
Unable to reproduce the issue when upgrading from the latest 7.3 to 8.0 GA.

instack-0.0.8-2.el7ost.noarch
instack-undercloud-2.2.7-4.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-9.el7ost.noarch
python-heatclient-1.0.0-1.el7ost.noarch
openstack-heat-api-cfn-5.0.1-5.el7ost.noarch
openstack-heat-templates-0-0.8.20150605git.el7ost.noarch
openstack-heat-engine-5.0.1-5.el7ost.noarch

Comment 13 Red Hat Bugzilla 2023-09-14 03:20:24 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

