Bug 1366392
| Summary: | AODH migration fails because puppet-aodh module cannot be found by Puppet | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Alexander Chuzhoy <sasha> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Jiri Stransky <jstransk> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Arik Chernetsky <achernet> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 9.0 (Mitaka) | CC: | athomas, dbecker, emacchi, jstransk, mburns, morazi, ohochman, rhel-osp-director-maint, sasha, sclewis, srevivo, tvignaud |
| Target Milestone: | async | | |
| Target Release: | 9.0 (Mitaka) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-09-19 15:01:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1337794 | | |
This BZ blocks the upgrade path 7.3 -> 8.0 -> 9.0. It appeared after applying the workaround for Bz#1364583, which suggests disabling NetworkManager on the overcloud nodes in order to upgrade successfully from 7.3 to 8.0. See: https://bugzilla.redhat.com/show_bug.cgi?id=1364583#c14

The error to take into account is: Could not find declared class ::aodh
Note: don't worry about "Error: NetworkManager is not running"; this is not critical.

So if TripleO can't find ::aodh, this is because puppet-aodh is not installed. You need to make sure OPM was upgraded during the process and make sure the aodh module is present.

[heat-admin@overcloud-controller-0 ~]$ rpm -q puppet
puppet-3.6.2-2.el7.noarch
[heat-admin@overcloud-controller-0 ~]$ logout
Connection to 192.168.0.11 closed.
[stack@undercloud72 ~]$ rpm -qa|grep puppet
openstack-tripleo-puppet-elements-2.0.0-4.el7ost.noarch
puppet-3.6.2-4.el7sat.noarch
openstack-puppet-modules-8.1.8-1.el7ost.noarch

The setup was successfully upgraded from 7 to 8. When I came to upgrade it to 9, the reported issue occurred during the aodh migration step.

I'm pretty sure OPM is not updated on the overcloud at the stage of the Aodh migration. Again: the error is that OPM is not updated. Please ignore the NetworkManager thing.

Debugged the deployment. Moving the BZ to the openstack-puppet-modules component and listing a workaround below.

Emilien is right that the root cause is the missing aodh module. It's not in fact completely missing: it's present in /usr/share/openstack-puppet/modules, but it's not symlinked from /etc/puppet/modules, where Puppet looks for modules.

The symlinking is currently (probably incorrectly?) part of the overcloud image building process rather than the openstack-puppet-modules RPM itself. The effect is that whenever we add a new module during an RPM update of openstack-puppet-modules, it cannot be found by Puppet, because the symlink in /etc/puppet/modules is not created.

We should probably move the symlinking onto the RPM level.
Re-running the DIB elements on the overcloud to create the symlinks is probably not a realistic solution.

The workaround could be to run this on every overcloud node:

    ln -f -s /usr/share/openstack-puppet/modules/* /etc/puppet/modules/

prior to triggering the AODH migration. (The script line is taken from what the DIB element does when building an image [1].)

[1] https://github.com/openstack/tripleo-puppet-elements/blob/627b949430f9124181d4470abd908e25a9bfa760/elements/puppet-modules/install.d/puppet-modules-package-install/75-puppet-modules-package#L7

(In reply to Jiri Stransky from comment #10)
> Debugged the deployment. Moving the BZ to openstack-puppet-modules component
> and listing a workaround below.
>
> Emilien is right that the root cause is missing aodh module. It's not in
> fact completely missing, it's present in
> /usr/share/openstack-puppet/modules, but it's not symlinked from
> /etc/puppet/modules, where Puppet looks for modules.
>
> The symlinking is currently (probably incorrectly?) part of the overcloud
> image building process rather than the openstack-puppet-module RPM itself.
> The effect is that whenever we add a new module during RPM update of
> openstack-puppet-modules, it cannot be found by Puppet, because the symlink
> in /etc/puppet/modules is not created.
>
> We should probably move the symlinking onto the RPM level. Re-running the
> DIB elements on overcloud to create the symlinks is probably not a realistic
> solution.
>
>
> The workaround could to run this on every overcloud node:
>
> ln -f -s /usr/share/openstack-puppet/modules/* /etc/puppet/modules/
>
> prior to triggering the AODH migration. (The script line is taken from what
> the DIB element does when building an image [1].)
>
>
> [1]
> https://github.com/openstack/tripleo-puppet-elements/blob/627b949430f9124181d4470abd908e25a9bfa760/elements/puppet-modules/install.d/puppet-modules-package-install/75-puppet-modules-package#L7

Given the move that DIB makes, I'm not sure that it makes sense to have the RPM do this automatically. I think I'd rather see the upgrade do this symlink. In theory, we're only going to add new modules between releases. (There is a rare case, possibly, where we add one within a release, in which case I'd say do this on update and upgrade.)

Doing this in the RPM could have significant issues for people who use OPM outside of director or packstack. They might install the modules on a Foreman server and import them for other hosts, but not want them in /etc/puppet on that machine.

We also have to consider the packaging changes around OPM (separate RPMs per module).

(In reply to Mike Burns from comment #11)
> Doing this in the rpm could have significant issues for people who use OPM
> outside of director or packstack. The might install them on a foreman
> server and import them for other hosts, but not want them in /etc/puppet on
> that machine.

Ok, that's fair. The solution on the upgrade side will probably be a bit hacky because we'll need to put it into the AODH migration (as there's nothing prior to that running on the cloud during upgrade) to fix up existing OSP 8 deployments that are already in the wrong state.

The above would be just a Mitaka-specific fix. We'd deal with this properly upstream in a slightly different way: we should probably trigger the symlinking both on minor updates and major upgrades to keep a consistent state at all times, though the chance of breakage during a minor update is low.
Moving back to t-h-t then :)

Submitted a Mitaka-only patch to fix up existing Liberty deployments before doing the AODH migration: https://review.openstack.org/#/c/355446

And also a patch that should prevent getting the deployment into a bad state in the future: https://review.openstack.org/#/c/356028

I'd still like to investigate if we can move away from the symlinks altogether, but such a solution has some minor conflict potential (e.g. custom roles in TripleO, and probably an unclean backport to Mitaka), so putting the fix into the updates/upgrades first could still be a way to go.

Submitted another patch for TripleO to not depend on the /etc/puppet/modules symlinks. https://review.openstack.org/#/c/356457/

The Mitaka-only fixup already got merged and solves our immediate problem. The other two are more forward-looking to prevent such issues in the future, but we don't necessarily need them for the OSP 7->8->9 upgrade.

I suggest backporting only the Mitaka-only fixup for now, as it should fix the problem during the AODH migration. I'll also change the BZ title accordingly.

A short-term fix went into Mitaka, and a long-term solution to prevent similar issues from happening has been merged to Newton. Closing this BZ as it is predominantly about the fixed Mitaka issue. Adding the Newton patch too into the external trackers.
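The symlink workaround discussed in the comments can be sketched as a small helper that is safe to re-run. This is only an illustration, not the patch that was merged: `link_missing_modules` is a hypothetical name, and unlike the one-line `ln -f -s` command it links only modules that have no usable entry in the target directory yet and prints what it linked. The two paths in the commented example call are the ones named in this bug.

```shell
#!/bin/sh
# Sketch only: link every module directory found under SRC into DEST,
# mirroring the `ln -f -s` workaround from this bug.
# link_missing_modules is a hypothetical helper name, not part of any package.
link_missing_modules() {
    src=$1
    dest=$2
    for module in "$src"/*; do
        # Skip when the glob matched nothing or the entry is not a directory.
        [ -d "$module" ] || continue
        name=$(basename "$module")
        # -e follows symlinks, so a dangling link is treated as missing
        # and gets replaced by the forced link below.
        if [ ! -e "$dest/$name" ]; then
            ln -f -s "$module" "$dest/$name"
            echo "linked $name"
        fi
    done
}

# Example call with the paths named in this bug (as root, on each overcloud node):
# link_missing_modules /usr/share/openstack-puppet/modules /etc/puppet/modules
```

Running it a second time prints nothing, which makes it easy to confirm that a node is already in a consistent state before triggering the AODH migration.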
rhel-osp-director: 8->9 upgrade of overcloud with network manager being disabled on OC nodes fails

Environment:
openstack-tripleo-heat-templates-2.0.0-30.el7ost.noarch
openstack-puppet-modules-8.1.8-1.el7ost.noarch
instack-undercloud-4.0.0-11.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-30.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-16.el7ost.noarch

Steps to reproduce:
1. Have an overcloud with Network Manager disabled on nodes.
2. Attempt to upgrade to 9

Result:
2016-06-29 13:29:21 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-06-29 13:29:31 [1]: SIGNAL_IN_PROGRESS Signal: deployment failed (1)
2016-06-29 13:29:31 [1]: CREATE_FAILED Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2016-06-29 13:29:32 [overcloud-UpdateWorkflow-ehdq5lr3lzcp-AodhUpgradeConfigDeployment-c4nltoaweugv]: CREATE_FAILED Resource CREATE failed: Error: resources[2]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2016-06-29 13:29:33 [AodhUpgradeConfigDeployment]: CREATE_FAILED Error: resources.AodhUpgradeConfigDeployment.resources[2]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 1
2016-06-29 13:29:34 [overcloud-UpdateWorkflow-ehdq5lr3lzcp]: UPDATE_FAILED Error: resources.AodhUpgradeConfigDeployment.resources[2]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 1
2016-06-29 13:29:35 [UpdateWorkflow]: UPDATE_FAILED resources.UpdateWorkflow: Error: resources.AodhUpgradeConfigDeployment.resources[2]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 1
2016-06-29 13:29:35 [1]: SIGNAL_IN_PROGRESS Signal: deployment succeeded
2016-06-29 13:29:36 [1]: UPDATE_COMPLETE state changed
2016-06-29 13:29:37 [overcloud-ControllerAllNodesValidationDeployment-szghjo6u4hqw]: UPDATE_COMPLETE Stack UPDATE completed successfully
2016-06-29 13:29:38 [ControllerAllNodesValidationDeployment]: UPDATE_COMPLETE state changed
2016-06-29 13:29:38 [overcloud]: UPDATE_FAILED resources.UpdateWorkflow: Error: resources.AodhUpgradeConfigDeployment.resources[2]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 1
2016-06-29 13:29:40 [1]: SIGNAL_COMPLETE Unknown
2016-06-29 13:29:41 [1]: SIGNAL_COMPLETE Unknown
2016-06-29 13:29:42 [1]: SIGNAL_COMPLETE Unknown
2016-06-29 13:29:43 [1]: SIGNAL_COMPLETE Unknown
2016-06-29 13:29:44 [1]: SIGNAL_COMPLETE Unknown
2016-06-29 13:29:44 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-06-29 13:29:45 [1]: SIGNAL_COMPLETE Unknown
2016-06-29 13:29:46 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
Stack overcloud UPDATE_FAILED
Deployment failed: Heat Stack update failed.

"deploy_stderr": "Error: NetworkManager is not running.\nCould not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\nCould not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\n\u001b[1;31mError: Puppet::Parser::AST::Resource failed with error ArgumentError: Could not find declared class ::aodh at /var/lib/heat-config/heat-config-puppet/d9c22157-4dac-4144-9be0-1e4e3606866f.pp:30 on node overcloud-controller-1.localdomain\nWrapped exception:\nCould not find declared class ::aodh\u001b[0m\n\u001b[1;31mError: Puppet::Parser::AST::Resource failed with error ArgumentError: Could not find declared class ::aodh at /var/lib/heat-config/heat-config-puppet/d9c22157-4dac-4144-9be0-1e4e3606866f.pp:30 on node overcloud-controller-1.localdomain\u001b[0m\n",
"deploy_status_code": 1
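
When triaging a failure like the one above, the class that Puppet could not find can be pulled out of the captured `deploy_stderr` text, which helps separate it from the non-critical NetworkManager message. A minimal sketch, assuming the text is available on standard input; `missing_puppet_classes` is a hypothetical helper name, and `grep -o` is a widely available extension (GNU and BSD grep) rather than strict POSIX.

```shell
#!/bin/sh
# Sketch only: list the Puppet classes that a deploy_stderr blob reports as missing.
# missing_puppet_classes is a hypothetical helper name.
missing_puppet_classes() {
    # Read text on stdin, keep each "Could not find declared class X" match,
    # strip the fixed prefix, and de-duplicate the class names.
    grep -o 'Could not find declared class [A-Za-z0-9_:]*' \
        | sed 's/^Could not find declared class //' \
        | sort -u
}
```

Feeding it the saved deployment output prints one line per distinct class name, so repeated copies of the same error collapse into a single entry.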