Bug 1464456
Summary: | Upgrade 8 to 9 failed, customer skipped Updating the Configuration Agent step. | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Eduard Barrera <ebarrera> |
Component: | rhosp-director | Assignee: | Sofer Athlan-Guyot <sathlang> |
Status: | CLOSED NOTABUG | QA Contact: | Amit Ugol <augol> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 9.0 (Mitaka) | CC: | dbecker, ebarrera, mburns, mcornea, morazi, mschuppe, rhel-osp-director-maint, sathlang |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-06-30 09:51:57 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Eduard Barrera
2017-06-23 13:09:40 UTC
Hi,

If I get this properly, you've managed to upgrade to osp9 but you have skipped the step in the 2.4 documentation.

Basically this step makes sure that all previous runs of the heat agent are remembered after a reboot of the overcloud nodes.

So can you check the status of /var/run/heat-config and /var/lib/heat-config.

Basically you want /var/lib/heat-config to be populated. You can review the script that makes the copy:
https://github.com/openstack/heat-templates/blob/master/hot/software-config/elements/heat-config/bin/heat-config-rebuild-deployed

Then you can check that the heat agent is using the right directory, /var/lib/..., in this script on the overcloud: /usr/libexec/os-refresh-config/configure.d/55-heat-config

Tell me if that's enough information to get you going.

(In reply to Sofer Athlan-Guyot from comment #1)
> Hi,
>
> If I get this properly, you've managed to upgrade to osp9 but you have
> skipped the step in the 2.4 documentation.
>
> Basically this step makes sure that all previous runs of the heat agent are
> remembered after a reboot of the overcloud nodes.
>
> So can you check the status of /var/run/heat-config and /var/lib/heat-config.

Still using /var/run instead of /var/lib:

[mschuppe@collab-shell var]$ ll run/heat-config/
total 688
drwxrwxrwx+ 2 mschuppe mschuppe 8192 Jun 22 16:10 deployed
-rwxrwxrwx+ 1 mschuppe mschuppe 678141 Jun 22 16:09 heat-config
drwxrwxrwx+ 2 mschuppe mschuppe 4096 Jun 22 08:49 heat-config-script
[mschuppe@collab-shell var]$ ll lib/heat-config/
total 24
drwxrwxrwx+ 2 mschuppe mschuppe 4096 Jun 22 16:10 heat-config-puppet
drwxrwxrwx+ 3 mschuppe mschuppe 4096 Jun 22 15:22 heat-config-script
drwxrwxrwx+ 2 mschuppe mschuppe 44 Jun 3 2016 hooks

> Basically you want /var/lib/heat-config to be populated. You can review
> the script that makes the copy:
> https://github.com/openstack/heat-templates/blob/master/hot/software-config/elements/heat-config/bin/heat-config-rebuild-deployed
>
> Then you can check that the heat agent is using the right directory,
> /var/lib/..., in this script on the overcloud:
> /usr/libexec/os-refresh-config/configure.d/55-heat-config

/usr/libexec/os-refresh-config/configure.d/55-heat-config uses the old /var/run:

HOOKS_DIR = os.environ.get('HEAT_CONFIG_HOOKS', '/var/lib/heat-config/hooks')
CONF_FILE = os.environ.get('HEAT_SHELL_CONFIG', '/var/run/heat-config/heat-config')
DEPLOYED_DIR = os.environ.get('HEAT_CONFIG_DEPLOYED', '/var/run/heat-config/deployed')
HEAT_CONFIG_NOTIFY = os.environ.get('HEAT_CONFIG_NOTIFY', 'heat-config-notify')

The remaining question is whether it is OK to run the above from an undercloud already upgraded to OSP9 instead of an OSP8 undercloud (the overcloud is still OSP8):

1) from the OSP9 undercloud, copy /usr/share/openstack-heat-templates/software-config/elements/heat-config/os-refresh-config/configure.d/55-heat-config to the overcloud nodes
2) on the overcloud nodes, create /var/lib/heat-config/deployed
3) copy heat-config-rebuild-deployed from the OSP9 undercloud to the overcloud nodes
4) run heat-config-rebuild-deployed (or manually move /var/run/heat-config/deployed to /var/lib/heat-config/deployed)

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/8/html/upgrading_red_hat_openstack_platform/sect-updating_the_environment#sect-Updating_the_Configuration_Agent

Hi,

as seen on irc, you can directly apply the steps from the documentation. We have cross-checked that 55-heat-config from osp8 and osp9 are the same, so everything applies.
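The two checks discussed in this exchange (which defaults 55-heat-config uses, and whether the deployed records exist under /var/lib) can be sketched in Python. This is only an illustration under stated assumptions: it is not the heat-config-rebuild-deployed script, the function names `legacy_defaults` and `copy_deployed_records` are hypothetical, Python 3 is assumed, and the directory defaults are simply the paths quoted in this report.

```python
import os
import re
import shutil

# Paths as quoted in this report; everything else here is illustrative.
OLD_DEPLOYED_DIR = '/var/run/heat-config/deployed'
NEW_DEPLOYED_DIR = '/var/lib/heat-config/deployed'

# Matches lines such as: NAME = os.environ.get('ENV_VAR', '/some/default')
ENV_DEFAULT_RE = re.compile(
    r"^(\w+)\s*=\s*os\.environ\.get\(\s*'[^']*'\s*,\s*'([^']*)'\s*\)",
    re.MULTILINE)


def legacy_defaults(script_text, legacy_prefix='/var/run/heat-config'):
    """Return {variable: default} for defaults still under legacy_prefix."""
    return {name: default
            for name, default in ENV_DEFAULT_RE.findall(script_text)
            if default.startswith(legacy_prefix)}


def copy_deployed_records(old_dir=OLD_DEPLOYED_DIR, new_dir=NEW_DEPLOYED_DIR):
    """Copy files from old_dir into new_dir and return the names copied.

    Files that already exist in new_dir are left untouched, so the
    function can be run more than once.
    """
    os.makedirs(new_dir, exist_ok=True)
    copied = []
    if not os.path.isdir(old_dir):
        return copied
    for name in sorted(os.listdir(old_dir)):
        src = os.path.join(old_dir, name)
        dst = os.path.join(new_dir, name)
        if os.path.isfile(src) and not os.path.exists(dst):
            shutil.copy2(src, dst)  # keeps timestamps and permissions
            copied.append(name)
    return copied
```

Passing the text of 55-heat-config to `legacy_defaults` lists any variables that still default to /var/run; `copy_deployed_records()` corresponds to the manual alternative in step 4 and would need to run as root on each overcloud node.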
During the upgrade process from 8 to 9, at step 3.4.3 Installing Aodh:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/9/html-single/upgrading_red_hat_openstack_platform/#sect-Major-Upgrading_the_Overcloud-Aodh

The deployment finishes as follows:

2017-06-28 07:41:26 [overcloud-test-AllNodesExtraConfig-7maxstusxiwk-NetworkMidonetDeploymentComputes-x5tposkcnizj]: UPDATE_COMPLETE Stack UPDATE completed successfully
2017-06-28 07:41:26 [overcloud-test-AllNodesExtraConfig-7maxstusxiwk-NetworkMidonetDeploymentControllers-ptyl3bafm72q]: UPDATE_COMPLETE Stack UPDATE completed successfully
2017-06-28 07:41:27 [NetworkMidonetDeploymentControllers]: UPDATE_COMPLETE state changed
2017-06-28 07:41:27 [NetworkMidonetDeploymentComputes]: UPDATE_COMPLETE state changed
2017-06-28 07:41:29 [overcloud-test-AllNodesExtraConfig-7maxstusxiwk]: UPDATE_COMPLETE Stack UPDATE completed successfully
2017-06-28 07:41:30 [AllNodesExtraConfig]: UPDATE_COMPLETE state changed
Stack overcloud-test UPDATE_COMPLETE
/home/stack/.ssh/known_hosts updated.
Original contents retained as /home/stack/.ssh/known_hosts.old
Authorization Failed: Unable to establish connection to https://api-test.heicloud.uni-heidelberg.de:13000/v2.0/tokens

So I am not sure what the error is here: is the update finishing correctly and some extra step failing afterwards? In any case, this step is supposed to remove ceilometer and install aodh, but the alarm evaluator is still present.
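To separate the stack result from whatever is printed after it in output like the above, the event lines can be checked mechanically. The following is a minimal sketch, assuming each event has the form `DATE TIME [resource]: STATUS message` on its own line, as in the pasted output; the function name `summarize_events` is hypothetical.

```python
import re

# One stack event per line: "YYYY-MM-DD HH:MM:SS [resource]: STATUS ..."
EVENT_RE = re.compile(
    r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[([^\]]+)\]: (\w+)')


def summarize_events(output):
    """Split pasted deploy output into failed resources and other lines.

    Returns (failed, other): `failed` lists resources whose status ends
    in _FAILED, `other` lists every non-empty line that is not an event.
    """
    failed = []
    other = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        match = EVENT_RE.match(line)
        if match:
            resource, status = match.groups()
            if status.endswith('_FAILED'):
                failed.append(resource)
        else:
            other.append(line)
    return failed, other
```

An empty `failed` list together with an "Authorization Failed" entry in `other` would indicate that the stack events themselves report no failure and that the error message is printed after them.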
I also don't understand why heat is not started:

[root@overcloud-test-controller-0 heat-admin]# pcs status |grep -i stopped -B 1
Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier]
Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
Clone Set: openstack-heat-api-clone [openstack-heat-api]
Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
Clone Set: openstack-ceilometer-alarm-evaluator-clone [openstack-ceilometer-alarm-evaluator]
Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
Failed Actions:
* openstack-ceilometer-alarm-evaluator_start_0 on overcloud-test-controller-0 'not installed' (5): call=242, status=Not installed, exitreason='none', last-rc-change='Mon Jun 26 15:55:18 2017', queued=0ms, exec=123ms
* openstack-ceilometer-alarm-evaluator_start_0 on overcloud-test-controller-1 'not installed' (5): call=239, status=Not installed, exitreason='none', last-rc-change='Mon Jun 26 15:55:18 2017', queued=0ms, exec=124ms
* openstack-ceilometer-alarm-evaluator_start_0 on overcloud-test-controller-2 'not installed' (5): call=234, status=Not installed, exitreason='none', last-rc-change='Mon Jun 26 15:55:18 2017', queued=0ms, exec=134ms

I checked that stonith is off, and apparently there is no constraint preventing heat from starting when ceilometer alarm is not started.

These are the logs from corosync.log:

Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-0 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: get_failcount_full: openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-0 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: get_failcount_full: openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-0 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: get_failcount_full: openstack-ceilometer-alarm-evaluator:0 has failed INFINITY times on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-1 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: get_failcount_full: openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-1 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: get_failcount_full: openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-1 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: get_failcount_full: openstack-ceilometer-alarm-evaluator:0 has failed INFINITY times on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-2 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: get_failcount_full: openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-2 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: get_failcount_full: openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away f

Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-engine:1 on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-engine:2 on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api:0 on overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api:1 on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api:2 on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api-cloudwatch:0 on overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api-cloudwatch:1 on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api-cloudwatch:2 on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api-cfn:0 on overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api-cfn:1 on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api-cfn:2 on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-ceilometer-notification:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-ceilometer-notification:1 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-ceilometer-notification:2 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-ceilometer-notification:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-ceilometer-notification:1 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-ceilometer-notification:2 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: native_deallocate: Deallocating openstack-heat-api:0 from overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:1 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: native_deallocate: Deallocating openstack-heat-api:1 from overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:2 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: native_deallocate: Deallocating openstack-heat-api:2 from overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:1 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:2 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:1 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:2 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:1 from being active

Hi,

closing this one. The new issue is tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1465939

Thanks,
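The `pcs status |grep -i stopped -B 1` output quoted in this report pairs each "Clone Set:" line with a "Stopped:" line and is followed by "Failed Actions" entries. As a rough aid for reading such output, the sketch below collects which clone resources are stopped on which nodes and which start actions failed. It assumes exactly the line formats shown in the report; the function names `stopped_clones` and `failed_starts` are hypothetical.

```python
import re

# "Clone Set: <clone-name> [<resource>]"
CLONE_RE = re.compile(r'Clone Set: (\S+) \[(\S+)\]')
# "Stopped: [ node-a node-b ]"
STOPPED_RE = re.compile(r'Stopped: \[\s*(.*?)\s*\]')
# "* <resource>_start_0 on <node> '<reason>' ..."
FAILED_RE = re.compile(r"\* (\S+)_start_0 on (\S+) '([^']+)'")


def stopped_clones(status_text):
    """Map each clone's resource name to the nodes where it is stopped."""
    stopped = {}
    current = None
    for line in status_text.splitlines():
        clone = CLONE_RE.search(line)
        if clone:
            current = clone.group(2)
            continue
        match = STOPPED_RE.search(line)
        if match and current:
            stopped[current] = match.group(1).split()
            current = None
    return stopped


def failed_starts(status_text):
    """Return (resource, node, reason) for each failed start action."""
    return [m.groups() for m in FAILED_RE.finditer(status_text)]
```

Comparing the two results shows which stopped resources have a failed start action of their own and which are stopped without one.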