Bug 1326823
Summary: | scaling up after upgrade from 7.3 to 8.0 brings down cinder | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Marius Cornea <mcornea> |
Component: | rhosp-director | Assignee: | Jiri Stransky <jstransk> |
Status: | CLOSED ERRATA | QA Contact: | Marius Cornea <mcornea> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 8.0 (Liberty) | CC: | augol, dbecker, eharney, emacchi, jcoufal, mburns, morazi, rhel-osp-director-maint |
Target Milestone: | async | ||
Target Release: | 8.0 (Liberty) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-tripleo-heat-templates-0.8.14-8.el7ost | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-04-20 13:04:51 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Marius Cornea
2016-04-13 13:23:13 UTC
Adding some details here: I tried the same scenario on a fresh 8 install with 1 compute, then scaled out with an additional compute, and live migration completed fine.

The working environment with the fresh 8 install, was it backed by Ceph too?

I don't have a root cause pinned down yet, but here is more debugging info. The full stack trace shows that the error was triggered within the check_can_live_migrate_source method in nova, specifically when executing initialize_connection in cinderclient: http://fpaste.org/355055/62068146/raw/ (The errors mentioning check_can_live_migrate_source can be found on both compute-0 and compute-1.) Inspecting the cinder-api logs, it seems that haproxy returned the 504 code before cinder-api got a chance to respond, but the response from cinder-api would have been an error anyway: http://fpaste.org/355057/60562306/raw/

I found something weird when doing the live migration: http://paste.openstack.org/show/Ya7G5BVmMsiZhSq6Wbc8/ This is related to this change: https://github.com/openstack/tripleo-heat-templates/commit/fd0b25b010db428c450b99b50ff3a0d60d263005 I think this commit is not backward compatible with the cinder volumes we created before. cinder service-list shows 2 services, while it should show only one; I think we need to migrate volumes from the old one to the new one, with a MySQL operation (or maybe using the Cinder API?). That is, I think, the root issue.

(In reply to Jiri Stransky from comment #3)
> The working environment with fresh 8 install, was it backed by Ceph too?

Yes, it is backed by Ceph. I think Emilien is right about the root cause.
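The naming mismatch described above can be illustrated with a short sketch (this is not TripleO or cinder code — `effective_volume_host` and the sample config texts are hypothetical): cinder-volume registers each backend as `<host>@<backend>`, so when the configured host changes across an upgrade, the service name (and the host recorded on pre-existing volumes) no longer matches.

```python
import configparser

def effective_volume_host(conf_text, backend):
    """Compute the name a cinder-volume backend registers as: <host>@<backend>.

    backend_host in the backend section wins; otherwise DEFAULT/host is used.
    """
    cp = configparser.ConfigParser()
    cp.read_string(conf_text)
    host = cp.get(backend, "backend_host",
                  fallback=cp.get("DEFAULT", "host", fallback="localhost"))
    return f"{host}@{backend}"

# Roughly what the pre-upgrade deployment produced: a per-backend host.
OLD_CONF = """
[DEFAULT]
[tripleo_ceph]
backend_host = rbd:volumes
"""

# Roughly what the puppet-cinder change produces: host forced to "hostgroup".
NEW_CONF = """
[DEFAULT]
host = hostgroup
[tripleo_ceph]
"""

old = effective_volume_host(OLD_CONF, "tripleo_ceph")
new = effective_volume_host(NEW_CONF, "tripleo_ceph")
print(old, new, old == new)
```

Volumes created under `rbd:volumes@tripleo_ceph` now point at a service name no longer present, which is why requests against them time out.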
On the fresh environment I can only see:

+------------------+--------------------------+------+---------+-------+----------------------------+-----------------+
| Binary           | Host                     | Zone | Status  | State | Updated_at                 | Disabled Reason |
+------------------+--------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-scheduler | hostgroup                | nova | enabled | up    | 2016-04-13T18:08:22.000000 | -               |
| cinder-volume    | rbd:volumes@tripleo_ceph | nova | enabled | up    | 2016-04-13T18:08:24.000000 | -               |
+------------------+--------------------------+------+---------+-------+----------------------------+-----------------+

Given that, I believe the issue is related to neither IPv6 nor SSL and will show up in all Ceph-backed environments.

(In reply to Emilien Macchi from comment #5)
> cinder service-list is showing 2 services, while it should show only one, I
> think we need to migrate volumes from the old one to the new one, with a
> MySQL operation (or maybe using Cinder API?).

Yes, this seems to come from the fact that cinder.conf specifies "host=hostgroup", but "hostgroup" isn't an actual host. (Looking at overcloud-controller-0.)

Thanks Emilien, Marius and Eric for the debugging. I've traced the issue you mention to backwards-incompatible changes in puppet-cinder. First, a change that unconditionally sets the host for cinder backends to a computed, non-overridable value: https://review.openstack.org/#/c/209412/ And then a change that migrates from `host` to `backend_host` and makes the value configurable, but keeps the old (wrong, backwards-incompatible) behavior as the default value of the property.
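One quick way to spot the symptom is to parse the `cinder service-list` table and flag binaries registered under more than one host, or rows stuck in the down state. This parser is a hypothetical debugging helper, not part of cinder:

```python
from collections import defaultdict

def parse_service_list(table_text):
    """Parse the ASCII table printed by `cinder service-list` into row dicts."""
    rows, header = [], None
    for line in table_text.splitlines():
        if not line.startswith("|"):
            continue  # skip the +----+ border lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows

def suspicious_services(rows):
    """Flag binaries registered under several hosts, and rows that are down."""
    hosts = defaultdict(set)
    for r in rows:
        hosts[r["Binary"]].add(r["Host"])
    multi = {b: sorted(h) for b, h in hosts.items() if len(h) > 1}
    down = [(r["Binary"], r["Host"]) for r in rows if r["State"] == "down"]
    return multi, down

# Trimmed-down sample resembling the post-upgrade output in this bug.
SAMPLE = """\
+------------------+------------------------+-------+
| Binary           | Host                   | State |
+------------------+------------------------+-------+
| cinder-scheduler | hostgroup              | down  |
| cinder-scheduler | hostgroup              | up    |
| cinder-volume    | hostgroup@tripleo_ceph | up    |
+------------------+------------------------+-------+
"""

multi, down = suspicious_services(parse_service_list(SAMPLE))
print(multi, down)  # {} [('cinder-scheduler', 'hostgroup')]
```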
https://review.openstack.org/#/c/231068/

I think both of these should be reverted, but since they already made it into stable/liberty and stable/mitaka, it's probably easiest to just work around this in t-h-t :-/

After upgrade:

stack@instack:~>>> cinder service-list
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:251: SecurityWarning: Certificate has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SecurityWarning
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:251: SecurityWarning: Certificate has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SecurityWarning
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+
| Binary           | Host                   | Zone | Status  | State | Updated_at                 | Disabled Reason |
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-scheduler | hostgroup              | nova | enabled | down  | 2016-04-18T12:23:20.000000 | -               |
| cinder-scheduler | hostgroup              | nova | enabled | up    | 2016-04-18T16:41:20.000000 | -               |
| cinder-volume    | hostgroup@tripleo_ceph | nova | enabled | up    | 2016-04-18T16:41:19.000000 | -               |
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0653.html
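The volume migration suggested in comment #5 amounts to rewriting the `host` column of the pre-existing volume rows from the old service name to the new one. A minimal sketch of that database operation, using sqlite3 as a stand-in for cinder's MySQL database (the table layout and host strings here are simplified and hypothetical; the actual fix shipped in openstack-tripleo-heat-templates-0.8.14-8.el7ost):

```python
import sqlite3

OLD_HOST = "rbd:volumes@tripleo_ceph"   # service name before the upgrade
NEW_HOST = "hostgroup@tripleo_ceph"     # service name after the upgrade

def migrate_volume_hosts(conn, old_host, new_host):
    """Point volumes created under the old cinder-volume service at the new one."""
    cur = conn.execute(
        "UPDATE volumes SET host = ? WHERE host = ?", (new_host, old_host))
    conn.commit()
    return cur.rowcount  # number of volumes moved

# Simplified stand-in for cinder's volumes table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE volumes (id TEXT PRIMARY KEY, host TEXT)")
conn.executemany("INSERT INTO volumes VALUES (?, ?)",
                 [("vol-1", OLD_HOST), ("vol-2", OLD_HOST), ("vol-3", NEW_HOST)])

moved = migrate_volume_hosts(conn, OLD_HOST, NEW_HOST)
print(moved)  # 2
```

In practice this kind of rename would be done through cinder's own tooling rather than raw SQL; cinder-manage ships a `volume update_host` subcommand intended for moving volumes between service host names.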