Bug 1322509
Summary: | Upgrade from OSP Director 7.3 -> OSP Director 8.0 poodle fails with bad rabbitmq credentials | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Andreas Karis <akaris> |
Component: | openstack-tripleo-heat-templates | Assignee: | Marios Andreou <mandreou> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Omri Hochman <ohochman> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.0 (Kilo) | CC: | akaris, mburns, rhel-osp-director-maint |
Target Milestone: | ga | Keywords: | Reopened, TestOnly |
Target Release: | 8.0 (Liberty) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2016-04-28 13:51:44 UTC | Type: | Bug |
Description
Andreas Karis
2016-03-30 15:33:34 UTC
*** This bug has been marked as a duplicate of bug 1321132 ***

I don't think that this is a duplicate of 1321132

We fixed 1321132 in our deployment, but then got past it and hit the above at a later stage. The above is valid for nova, neutron, and all other services. The above bug is about rabbitmq: the password is updated in the configuration, but it is _not_ updated in rabbitmq

What's missing is a command similar to this:

[root@overcloud-controller-0 rabbitmq]# rabbitmqctl change_password guest 2DvVMQpV9trHBdyFZvepT9kJW

(In reply to Andreas Karis from comment #2)
> I don't think that this is a duplicate of 1321132
>
> We fixed 1321132 in our deployment, but then got past it and hit the above
> at a later stage. The above is valid for nova, neutron, and all other
> services. The above bug is about rabbitmq: the password is updated in the
> configuration, but it is _not_ updated in rabbitmq
>
> What's missing is a command similar to this:
>
> [root@overcloud-controller-0 rabbitmq]# rabbitmqctl change_password guest
> 2DvVMQpV9trHBdyFZvepT9kJW

The neutron issue in the bug that I cloned this for was due to the rabbit password issue. I still think it's a duplicate, but I'll let Marios make that determination.

(In reply to Mike Burns from comment #4)
> (In reply to Andreas Karis from comment #2)
> > I don't think that this is a duplicate of 1321132
> >
> > We fixed 1321132 in our deployment, but then got past it and hit the above
> > at a later stage. The above is valid for nova, neutron, and all other
> > services. The above bug is about rabbitmq: the password is updated in the
> > configuration, but it is _not_ updated in rabbitmq
> >
> > What's missing is a command similar to this:
> >
> > [root@overcloud-controller-0 rabbitmq]# rabbitmqctl change_password guest
> > 2DvVMQpV9trHBdyFZvepT9kJW
>
> The neutron issue in the bug that I cloned this for was due to the rabbit
> password issue. I still think it's a duplicate, but I'll let Marios make
> that determination.
Hi Andreas, Mike,

I was about to say yes, this is a duplicate of 1321132, but then noticed that the output in the description above is from the *compute* node. We need to be clear about what _exactly_ the upgrade fails on here. I mean, there would have been a 'Stack update failed because foo' at some point. I ask because the issue shown above, namely that neutron-openvswitch-agent on the compute node can't talk to rabbit, wouldn't have caused the stack update to fail.

Note that the neutron-openvswitch-agent issue on the compute node is tracked upstream at https://bugs.launchpad.net/tripleo/+bug/1563437 and we have a fix for it at https://review.openstack.org/#/c/298946/ (you should be carrying this if you are following the upgrades kbase; ping me on irc if you don't have the details).

You mention "all services" can't connect to rabbitmq, so you may have seen something like this on the controller:

Mar 24 17:40:58 overcloud-controller-1.localdomain cinder-scheduler[24723]: 2016-03-24 17:40:58.586 24723 ERROR oslo.messaging._drivers.impl_rabbit [req-cd90f482-3f61-4c8d-9a72-d68927bb7c84 - - - - -] AMQP server 172.16.2.7:5672 closed the connection. Check login credentials: Socket closed
Mar 24 17:40:58 overcloud-controller-1.localdomain keystone-all[22108]: 2016-03-24 17:40:58.861 22121 ERROR oslo.messaging._drivers.impl_rabbit [req-b734d341-fe4d-4a1f-b143-f43bfb436408 - - - - -] AMQP server 172.16.2.7:5672 closed the connection. Check login credentials: Socket closed

If you have the fix for 1321132 applied here then this ^^^ also won't cause the stack to fail, though you will see this in the logs until we restart rabbit and all-the-things so they pick up the new credentials (happens towards the end of the converge step).
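To see which services are hitting the credentials error quoted above, the log lines can be grouped by host and service. The following is only a rough sketch: the regular expression is written against the two sample lines in this comment, and the names `AMQP_AUTH_ERROR` and `count_amqp_auth_errors` are made up for illustration, not part of any OpenStack tool.

```python
import re
from collections import Counter

# Assumption: lines look like the syslog-style samples quoted in this thread:
#   "<date> <host> <service>[<pid>]: ... AMQP server <addr> closed the
#    connection. Check login credentials ..."
AMQP_AUTH_ERROR = re.compile(
    r"(?P<host>\S+) (?P<service>[\w.-]+)\[\d+\]: .*"
    r"AMQP server (?P<server>\S+) closed the connection\. "
    r"Check login credentials"
)


def count_amqp_auth_errors(lines):
    """Count credential-related AMQP disconnects per (host, service)."""
    counts = Counter()
    for line in lines:
        match = AMQP_AUTH_ERROR.search(line)
        if match:
            counts[(match.group("host"), match.group("service"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: journalctl --no-pager | python this_script.py
    for (host, service), n in sorted(count_amqp_auth_errors(sys.stdin).items()):
        print(host, service, n)
```

Errors of this kind on the controller are expected until the services are restarted at the end of the converge step, so a non-empty result does not by itself explain a failed stack update.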
Can we have more info about what the stack fails on here? It could be something else entirely and the error messages here (about all-the-things on the controller and openvswitch-agent on the compute node not talking to rabbitmq because of credentials) are just misleading.

thanks, marios

(In reply to Andreas Karis from comment #2)
> I don't think that this is a duplicate of 1321132
>
> We fixed 1321132 in our deployment, but then got past it and hit the above
> at a later stage. The above is valid for nova, neutron, and all other
> services. The above bug is about rabbitmq: the password is updated in the
> configuration, but it is _not_ updated in rabbitmq
>
> What's missing is a command similar to this:
>
> [root@overcloud-controller-0 rabbitmq]# rabbitmqctl change_password guest
> 2DvVMQpV9trHBdyFZvepT9kJW

Hi Andreas,

you are right, we need to make sure the services (rabbit, but also all the things that talk to it) restart after the new password is set during upgrade. We make that happen by setting the UpdateIdentifier at https://review.openstack.org/#/c/298685/.

Related to that, we *don't* want puppet to restart services after the config changes (and before we do the controlled/ordered restart with pacemaker), so we have https://review.openstack.org/#/c/298695/ to prevent it (bug in puppet-neutron, so we carry the fix for now).

As you reported in the description above, there is also a problem with neutron-openvswitch-agent on the compute node, and we have https://review.openstack.org/#/c/299303/, which causes the service to be notified of the config change and be restarted.

Hope that helps. Please see my previous comment above about what exactly the stack update fails on; it could be something else.

thanks, marios

This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.

Unable to reproduce the issue when upgraded from 7.3 latest to 8.0GA.
instack-0.0.8-2.el7ost.noarch
instack-undercloud-2.2.7-4.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-9.el7ost.noarch
python-heatclient-1.0.0-1.el7ost.noarch
openstack-heat-api-cfn-5.0.1-5.el7ost.noarch
openstack-heat-templates-0-0.8.20150605git.el7ost.noarch
openstack-heat-engine-5.0.1-5.el7ost.noarch

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
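For reference, the manual workaround mentioned earlier in the thread (the password is updated in the service configuration but not in rabbitmq) could be scripted roughly as below. This is a sketch under assumptions, not the fix that was shipped: it assumes the credentials live in the `[oslo_messaging_rabbit]` section as `rabbit_userid` and `rabbit_password`, and the helper names are hypothetical. It only builds the `rabbitmqctl change_password` command line; it does not run it.

```python
import configparser
import shlex


def rabbit_credentials(conf_text, section="oslo_messaging_rabbit"):
    """Read (user, password) from the text of a service config file.

    Assumption: the options are named rabbit_userid / rabbit_password and
    live in the given section; "guest" is used when no user is configured.
    """
    # interpolation=None keeps '%' in passwords literal; strict=False
    # tolerates duplicate options, which generated config files may contain.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    user = parser.get(section, "rabbit_userid", fallback="guest")
    password = parser.get(section, "rabbit_password")
    return user, password


def change_password_command(conf_text):
    """Build the rabbitmqctl command that would sync rabbitmq to the config."""
    user, password = rabbit_credentials(conf_text)
    return "rabbitmqctl change_password {} {}".format(
        shlex.quote(user), shlex.quote(password)
    )


if __name__ == "__main__":
    # Hypothetical sample config; the password is a placeholder.
    sample = (
        "[DEFAULT]\n"
        "debug = False\n"
        "[oslo_messaging_rabbit]\n"
        "rabbit_userid = guest\n"
        "rabbit_password = examplepassword\n"
    )
    print(change_password_command(sample))
```

Printing the command instead of executing it leaves the decision to the operator, which matters here because the upgrade is expected to restart rabbit and its clients in a controlled order via pacemaker.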