Bug 1475404
Summary: | OSP11 -> OSP12 upgrade: redis and gnocchi haproxy backend are down after major-upgrade-composable-steps-docker.yaml | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Marius Cornea <mcornea> | ||||
Component: | puppet-tripleo | Assignee: | RHOS Maint <rhos-maint> | ||||
Status: | CLOSED ERRATA | QA Contact: | Marius Cornea <mcornea> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 12.0 (Pike) | CC: | abeekhof, aherr, bperkins, chjones, dbecker, dciabrin, fdinitto, jjoyce, jschluet, mandreou, mburns, michele, morazi, pkilambi, rhel-osp-director-maint, rscarazz, sasha, slinaber, tvignaud, ushkalim | ||||
Target Milestone: | rc | Keywords: | Triaged | ||||
Target Release: | 12.0 (Pike) | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | puppet-tripleo-7.4.3-0.20171025110205.93a9217.el7ost | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2017-12-13 21:45:43 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | 1493915, 1497602, 1503064 | ||||||
Bug Blocks: | |||||||
Attachments: |
Description
Marius Cornea
2017-07-26 14:59:56 UTC
* Cause

The reason for this is that in OSP 11 (redis on baremetal) there is a cluster property that the redis resource agent uses:

  redis_REPL_INFO: controller-0

After the migration to containers, this property prevents the resource from becoming master, because we look for the last known master (wait_last_known_master). But in a bundle the name is not controller-0 but redis-bundle-0, so the property would need to be something like this:

  redis_REPL_INFO: redis-bundle-0

As soon as we remove that global cluster property, the resource starts up in master mode correctly.

* Fixes

We are exploring the best way to fix this. It will likely need some additional pacemaker env variables that give us the host vs. bundle distinction, and some changes to the resource agents for rabbitmq and redis (which are the agents that store hostnames in properties).

We're still looking into this. Likely the changes will involve both pacemaker and the resource-agents for a proper clean fix.

Hi Damien,

Could you please link the patches to this BZ once they can be pulled for testing? This issue is currently blocking upgrades and the only way to bypass it is to disable telemetry services.

Thanks!

Just a drive-by update. We've been working tirelessly on this topic and I think we're slowly settling on an approach that is backwards compatible enough. Basically we will need three pieces:

1. Pacemaker changes (currently the minimum version needed is 1.1.16-12.12 and can be found here http://people.redhat.com/mbaldess/rpms/container-repo/pacemaker-bundle.repo)
2. Resource agents. A first non-final draft of the needed patch is here http://acksyn.org/files/tripleo/all-bundles-hostattribute-fixes.patch and a temporary build is in resource-agents-3.9.5-105.pidone.2.el7.centos.x86_64
3. Two additional reviews are needed, namely:
   https://review.openstack.org/497766
   https://review.openstack.org/495491

The reason for all this work is basically that pacemaker and resource-agents are in the common channels, so we need to make sure that every change is fully backwards-compatible and works both on BM and in containers. I'll post more updates once this stuff is fully baked.

For anyone coming in late, this is a race condition caused by our sensible job(s) sending a resource delete and cleanup at about the same time.

*** Bug 1498540 has been marked as a duplicate of this bug. ***

Verified on: puppet-tripleo-7.4.3-6.el7ost.noarch

Upgrade passed and the redis back-end is reachable.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462
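
For readers trying to follow the Cause section above, the hostname mismatch can be sketched roughly as below. This is only an illustration under stated assumptions, not the redis resource agent's actual logic: the function `may_promote` and the node names other than controller-0 and redis-bundle-0 are hypothetical, and the real agent is a shell script with more conditions than a simple name comparison.

```python
# Illustrative model only: NOT the real redis resource agent code.
# may_promote() is a hypothetical stand-in for the "wait for the last
# known master" check described in the Cause section.

def may_promote(node_name, last_known_master):
    """Return True if this node would be allowed to become master."""
    if last_known_master is None:
        # Property removed: nothing to wait for, promotion can proceed.
        return True
    # Otherwise only the node recorded in redis_REPL_INFO may go first.
    return node_name == last_known_master


# Node names as seen by the cluster before and after the upgrade.
# Only controller-0 and redis-bundle-0 come from the report; the rest
# are assumed for the example.
baremetal_nodes = ["controller-0", "controller-1", "controller-2"]
bundle_nodes = ["redis-bundle-0", "redis-bundle-1", "redis-bundle-2"]

# Value left behind by OSP 11 in the redis_REPL_INFO cluster property.
stale_repl_info = "controller-0"

promotable_before = [n for n in baremetal_nodes if may_promote(n, stale_repl_info)]
promotable_after = [n for n in bundle_nodes if may_promote(n, stale_repl_info)]
promotable_cleared = [n for n in bundle_nodes if may_promote(n, None)]

print("baremetal, stale property:", promotable_before)
print("bundle, stale property:   ", promotable_after)
print("bundle, property removed: ", promotable_cleared)
```

In this model no bundle node ever matches the stale controller-0 value, so nothing is promoted until the property is removed, which is consistent with the behaviour described in the report.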