Description of problem:

An overcloud update failed with the following error, as in [1]:

Error: (<unknown>): mapping values are not allowed in this context at line 334 column 42 at /var/lib/heat-config/heat-config-puppet/2c674b79-51d1-44f9-81d9-691f6227ac81.pp:16 on node overcloud-test-controller-1.localdomain
Wrapped exception:
(<unknown>): mapping values are not allowed in this context at line 334 column 42
Error: (<unknown>): mapping values are not allowed in this context at line 334 column 42 at /var/lib/heat-config/heat-config-puppet/2c674b79-51d1-44f9-81d9-691f6227ac81.pp:16 on node overcloud-test-controller-1.localdomain

We traced the error and concluded that it is caused by line 334 of the file /etc/puppet/hieradata/controller.yaml:

Psych::SyntaxError: (controller.yaml): mapping values are not allowed in this context at line 334 column 42
    from /usr/share/ruby/psych.rb:205:in `parse'
    from /usr/share/ruby/psych.rb:205:in `parse_stream'
    from /usr/share/ruby/psych.rb:153:in `parse'
    from /usr/share/ruby/psych.rb:129:in `load'
    from /usr/share/ruby/psych.rb:299:in `block in load_file'
    from /usr/share/ruby/psych.rb:299:in `open'
    from /usr/share/ruby/psych.rb:299:in `load_file'
    from (irb):7:in `block in irb_binding'
    from (irb):5:in `foreach'
    from (irb):5
    from /bin/irb:12:in `<main>'

Line 334 contains:

ceilometer::dispatcher::gnocchi::url: ://:

The error is probably caused by the step "Updating the Configuration Agent" [2] not having been done. Director is now at version 9, so it is no longer possible to do that step as documented; if we do it now, it will correspond to the step for updating from 9 to 10.

What steps should be done now to continue with the upgrade to 9?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1443638
[2] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/8/html/upgrading_red_hat_openstack_platform/sect-updating_the_environment#sect-Updating_the_Configuration_Agent (2.3. Updating the Configuration Agent)
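To locate lines like the one reported at line 334, a rough scan of a hieradata file can help. The following is only a heuristic sketch in Python, not a YAML parser: it flags "key: value" lines whose unquoted value ends with ":" or contains ": ", which is the pattern seen in `ceilometer::dispatcher::gnocchi::url: ://:`. The function name `suspicious_lines` is made up for this illustration.

```python
def suspicious_lines(text):
    """Return (line_number, line) pairs whose plain value looks like a nested mapping.

    Heuristic only: a real check should load the file with a YAML parser.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # Split at the first ": ", which ends the key in a simple mapping line.
        key, sep, value = stripped.partition(": ")
        if not sep:
            continue
        # Drop a trailing comment, then skip empty, quoted, flow or block values.
        value = value.split(" #", 1)[0].strip()
        if not value or value[0] in "'\"[{|>":
            continue
        # A plain value that ends with ":" or contains ": " reads as another mapping.
        if value.endswith(":") or ": " in value:
            hits.append((lineno, line))
    return hits


if __name__ == "__main__":
    sample = "ceilometer::dispatcher::gnocchi::url: ://:\n"
    for lineno, line in suspicious_lines(sample):
        print(lineno, line)
```

A value such as `http://host:8041` is not flagged, because its colons are neither followed by a space nor at the end of the line.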
Hi,

If I understand this correctly, you have managed to upgrade to OSP 9 but you have skipped the step in the 2.4 documentation.

This step makes sure that all previous runs of the heat agent are remembered after a reboot of the overcloud nodes.

So can you check the status of /var/run/heat-config and /var/lib/heat-config? You want /var/lib/heat-config to be populated. You can review the script that makes the copy:
https://github.com/openstack/heat-templates/blob/master/hot/software-config/elements/heat-config/bin/heat-config-rebuild-deployed

Then you can check that the heat agent is using the right directory (/var/lib/...) in the script /usr/libexec/os-refresh-config/configure.d/55-heat-config on the overcloud.

Tell me if that's enough information to get you going.
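
The directory check on 55-heat-config can be sketched in Python as below. This is an illustration under assumptions: it expects the script to define its paths as `NAME = os.environ.get('ENV_VAR', 'default')` assignments (the form quoted later in this report), and the helper name `stale_defaults` is hypothetical.

```python
import re

# Matches: NAME = os.environ.get('ENV_VAR', 'default'), possibly wrapped over two lines.
ENV_DEFAULT = re.compile(
    r"""^\s*(\w+)\s*=\s*os\.environ\.get\(\s*['"](\w+)['"]\s*,\s*['"]([^'"]*)['"]\s*\)""",
    re.MULTILINE,
)


def stale_defaults(script_text, old_prefix="/var/run/heat-config"):
    """Map variable name -> default path for defaults still under old_prefix."""
    return {
        var: default
        for var, _env, default in ENV_DEFAULT.findall(script_text)
        if default.startswith(old_prefix)
    }


if __name__ == "__main__":
    path = "/usr/libexec/os-refresh-config/configure.d/55-heat-config"
    with open(path) as handle:
        for var, default in sorted(stale_defaults(handle.read()).items()):
            print("%s still defaults to %s" % (var, default))
```

An empty result would suggest the script no longer defaults to /var/run/heat-config; environment variables set at run time can still override the defaults, which this sketch does not inspect.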
(In reply to Sofer Athlan-Guyot from comment #1)
> Hi,
>
> If I get this properly you've achieved to upgrade to osp9 but you have
> skipped the step in the 2.4 documentation.
>
> Basically this step make sure that all previous run of the heat agent are
> remembered after a reboot of the overcloud nodes.
>
> So can you check the status of /var/run/heat-config /var/lib/heat-config.

still using /var/run instead of /var/lib:

[mschuppe@collab-shell var]$ ll run/heat-config/
total 688
drwxrwxrwx+ 2 mschuppe mschuppe   8192 Jun 22 16:10 deployed
-rwxrwxrwx+ 1 mschuppe mschuppe 678141 Jun 22 16:09 heat-config
drwxrwxrwx+ 2 mschuppe mschuppe   4096 Jun 22 08:49 heat-config-script

[mschuppe@collab-shell var]$ ll lib/heat-config/
total 24
drwxrwxrwx+ 2 mschuppe mschuppe 4096 Jun 22 16:10 heat-config-puppet
drwxrwxrwx+ 3 mschuppe mschuppe 4096 Jun 22 15:22 heat-config-script
drwxrwxrwx+ 2 mschuppe mschuppe   44 Jun  3  2016 hooks

> Basically you want the /var/lib/heat-config to be populated. You can review
> the script that makes the copy:
> https://github.com/openstack/heat-templates/blob/master/hot/software-config/
> elements/heat-config/bin/heat-config-rebuild-deployed
>
> Then you can check that the heat agent is using the right directory
> /var/lib/...
> in this script /usr/libexec/os-refresh-config/configure.d/55-heat-config on
> the overcloud.

/usr/libexec/os-refresh-config/configure.d/55-heat-config uses the old /var/run:

HOOKS_DIR = os.environ.get('HEAT_CONFIG_HOOKS', '/var/lib/heat-config/hooks')
CONF_FILE = os.environ.get('HEAT_SHELL_CONFIG', '/var/run/heat-config/heat-config')
DEPLOYED_DIR = os.environ.get('HEAT_CONFIG_DEPLOYED', '/var/run/heat-config/deployed')
HEAT_CONFIG_NOTIFY = os.environ.get('HEAT_CONFIG_NOTIFY', 'heat-config-notify')

The remaining question is whether it is OK to run the following steps [1] from an undercloud that is already upgraded to OSP9, instead of from an OSP8 undercloud (the overcloud is still OSP8):

1) From the OSP9 undercloud, copy /usr/share/openstack-heat-templates/software-config/elements/heat-config/os-refresh-config/configure.d/55-heat-config to the overcloud nodes.
2) On the overcloud nodes, create /var/lib/heat-config/deployed.
3) Copy heat-config-rebuild-deployed from the OSP9 undercloud to the overcloud nodes.
4) Run heat-config-rebuild-deployed (or manually move /var/run/heat-config/deployed to /var/lib/heat-config/deployed).

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/8/html/upgrading_red_hat_openstack_platform/sect-updating_the_environment#sect-Updating_the_Configuration_Agent
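
The manual alternative in step 4 can be sketched in Python as below. This is not what heat-config-rebuild-deployed does; it is only an illustration of copying the existing deployment records from the old directory to the new one without overwriting anything. The function name `copy_deployed` is hypothetical, and `exist_ok` requires Python 3, which may not be the default interpreter on the overcloud nodes.

```python
import os
import shutil


def copy_deployed(old_dir, new_dir):
    """Copy deployment records from old_dir into new_dir without overwriting."""
    os.makedirs(new_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(old_dir)):
        src = os.path.join(old_dir, name)
        dst = os.path.join(new_dir, name)
        # Only regular files are copied; records already present are left alone.
        if os.path.isfile(src) and not os.path.exists(dst):
            shutil.copy2(src, dst)
            copied.append(name)
    return copied


if __name__ == "__main__":
    # Paths taken from the steps above.
    done = copy_deployed("/var/run/heat-config/deployed",
                         "/var/lib/heat-config/deployed")
    print("copied %d record(s)" % len(done))
```

Copying rather than moving keeps the old directory intact, so the operation can be repeated or rolled back by hand.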
Hi,

As discussed on IRC, you can directly apply the steps from the documentation. We have cross-checked that 55-heat-config is the same in OSP8 and OSP9, so everything applies.
During the upgrade process from 8 to 9, at step 3.4.3 Installing Aodh:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/9/html-single/upgrading_red_hat_openstack_platform/#sect-Major-Upgrading_the_Overcloud-Aodh

The deployment finishes as follows:

2017-06-28 07:41:26 [overcloud-test-AllNodesExtraConfig-7maxstusxiwk-NetworkMidonetDeploymentComputes-x5tposkcnizj]: UPDATE_COMPLETE Stack UPDATE completed successfully
2017-06-28 07:41:26 [overcloud-test-AllNodesExtraConfig-7maxstusxiwk-NetworkMidonetDeploymentControllers-ptyl3bafm72q]: UPDATE_COMPLETE Stack UPDATE completed successfully
2017-06-28 07:41:27 [NetworkMidonetDeploymentControllers]: UPDATE_COMPLETE state changed
2017-06-28 07:41:27 [NetworkMidonetDeploymentComputes]: UPDATE_COMPLETE state changed
2017-06-28 07:41:29 [overcloud-test-AllNodesExtraConfig-7maxstusxiwk]: UPDATE_COMPLETE Stack UPDATE completed successfully
2017-06-28 07:41:30 [AllNodesExtraConfig]: UPDATE_COMPLETE state changed
Stack overcloud-test UPDATE_COMPLETE
/home/stack/.ssh/known_hosts updated.
Original contents retained as /home/stack/.ssh/known_hosts.old
Authorization Failed: Unable to establish connection to https://api-test.heicloud.uni-heidelberg.de:13000/v2.0/tokens

I am not sure what the error is here: is the update finishing correctly while some extra step is failing? In any case, this step is supposed to remove ceilometer and install aodh, but the alarm evaluator is still present.

I also do not understand why heat is not started:

[root@overcloud-test-controller-0 heat-admin]# pcs status |grep -i stopped -B 1
 Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier]
     Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
 Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
     Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
 Clone Set: openstack-heat-api-clone [openstack-heat-api]
     Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
     Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
 Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
     Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
 Clone Set: openstack-ceilometer-alarm-evaluator-clone [openstack-ceilometer-alarm-evaluator]
     Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
     Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
Failed Actions:
* openstack-ceilometer-alarm-evaluator_start_0 on overcloud-test-controller-0 'not installed' (5): call=242, status=Not installed, exitreason='none', last-rc-change='Mon Jun 26 15:55:18 2017', queued=0ms, exec=123ms
* openstack-ceilometer-alarm-evaluator_start_0 on overcloud-test-controller-1 'not installed' (5): call=239, status=Not installed, exitreason='none', last-rc-change='Mon Jun 26 15:55:18 2017', queued=0ms, exec=124ms
* openstack-ceilometer-alarm-evaluator_start_0 on overcloud-test-controller-2 'not installed' (5): call=234, status=Not installed, exitreason='none', last-rc-change='Mon Jun 26 15:55:18 2017', queued=0ms, exec=134ms

I checked that stonith is off, and apparently there is no constraint that prevents heat from starting when the ceilometer alarm services are not started.

These are the logs from corosync.log:

Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-0 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: get_failcount_full: openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-0 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: get_failcount_full: openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-0 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: get_failcount_full: openstack-ceilometer-alarm-evaluator:0 has failed INFINITY times on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-1 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: get_failcount_full: openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-1 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: get_failcount_full: openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-1 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: get_failcount_full: openstack-ceilometer-alarm-evaluator:0 has failed INFINITY times on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-2 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: get_failcount_full: openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-2 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: get_failcount_full: openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away f

Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-engine:1 on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-engine:2 on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api:0 on overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api:1 on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api:2 on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api-cloudwatch:0 on overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api-cloudwatch:1 on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api-cloudwatch:2 on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api-cfn:0 on overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api-cfn:1 on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: RecurringOp: Start recurring monitor (60s) for openstack-heat-api-cfn:2 on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-ceilometer-notification:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-ceilometer-notification:1 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-ceilometer-notification:2 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-ceilometer-notification:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-ceilometer-notification:1 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-ceilometer-notification:2 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: native_deallocate: Deallocating openstack-heat-api:0 from overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:1 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: native_deallocate: Deallocating openstack-heat-api:1 from overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:2 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: native_deallocate: Deallocating openstack-heat-api:2 from overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:1 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:2 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:1 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:2 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain pengine: info: clone_update_actions_interleave: Inhibiting openstack-heat-api:1 from being active
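
Because the pengine output is very repetitive, it can help to reduce it to the distinct resources involved. The Python sketch below assumes only the two message shapes visible in the log above (`get_failcount_full: ... has failed ... times on ...` and `clone_update_actions_interleave: Inhibiting ... from being active`); the function name `summarize` is made up for this illustration, and the result says nothing about the root cause.

```python
import re
from collections import defaultdict

FAILCOUNT = re.compile(
    r"get_failcount_full:\s+(\S+) has failed (\S+) times on (\S+)")
INHIBIT = re.compile(
    r"clone_update_actions_interleave:\s+Inhibiting (\S+) from being active")


def summarize(log_text):
    """Return ({node: {failed resources}}, {inhibited clone instances})."""
    failed = defaultdict(set)   # node -> resources with a recorded failcount
    inhibited = set()           # clone instances pengine will not activate
    for line in log_text.splitlines():
        match = FAILCOUNT.search(line)
        if match:
            resource, _count, node = match.groups()
            failed[node].add(resource)
            continue
        match = INHIBIT.search(line)
        if match:
            inhibited.add(match.group(1))
    return dict(failed), inhibited


if __name__ == "__main__":
    # Log location is an assumption; adjust to where corosync.log lives.
    with open("/var/log/cluster/corosync.log") as handle:
        failed, inhibited = summarize(handle.read())
    for node in sorted(failed):
        print(node, "failed:", ", ".join(sorted(failed[node])))
    print("inhibited:", ", ".join(sorted(inhibited)))
```

Comparing the inhibited instances with the output of `pcs constraint` may then show whether an ordering constraint ties them to the failed alarm-evaluator clone.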
Hi,

Closing this one. The new issue is tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1465939

Thanks,