Bug 1464456 - Upgrade 8 to 9 failed, customer skipped Updating the Configuration Agent step.
Summary: Upgrade 8 to 9 failed, customer skipped Updating the Configuration Agent step.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Sofer Athlan-Guyot
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-06-23 13:09 UTC by Eduard Barrera
Modified: 2020-08-13 09:30 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-30 09:51:57 UTC
Target Upstream Version:
Embargoed:



Description Eduard Barrera 2017-06-23 13:09:40 UTC
Description of problem:

An overcloud update failed with the following error, the same one described in [1]:
https://bugzilla.redhat.com/show_bug.cgi?id=1443638

Error: (<unknown>): mapping values are not allowed in this context at line 334 column 42 at /var/lib/heat-config/heat-config-puppet/2c674b79-51d1-44f9-81d9-691f6227ac81.pp:16 on node overcloud-test-controller-1.localdomain
Wrapped exception:
(<unknown>): mapping values are not allowed in this context at line 334 column 42
Error: (<unknown>): mapping values are not allowed in this context at line 334 column 42 at /var/lib/heat-config/heat-config-puppet/2c674b79-51d1-44f9-81d9-691f6227ac81.pp:16 on node overcloud-test-controller-1.localdomain
We traced the error and concluded that it is caused by line 334 of the file /etc/puppet/hieradata/controller.yaml:
Psych::SyntaxError: (controller.yaml): mapping values are not allowed in this context at line 334 column 42
	from /usr/share/ruby/psych.rb:205:in `parse'
	from /usr/share/ruby/psych.rb:205:in `parse_stream'
	from /usr/share/ruby/psych.rb:153:in `parse'
	from /usr/share/ruby/psych.rb:129:in `load'
	from /usr/share/ruby/psych.rb:299:in `block in load_file'
	from /usr/share/ruby/psych.rb:299:in `open'
	from /usr/share/ruby/psych.rb:299:in `load_file'
	from (irb):7:in `block in irb_binding'
	from (irb):5:in `foreach'
	from (irb):5
	from /bin/irb:12:in `<main>'

Line 334 contains: ceilometer::dispatcher::gnocchi::url: ://:
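
A rough way to look for other lines with the same problem as line 334 is sketched below. It is a heuristic, not a YAML parser: it assumes simple one-line `key: value` entries and flags unquoted values that end with `:` or contain `: `, which a YAML parser would read as a second mapping indicator. The function name `suspicious_hiera_lines` is made up for this illustration.

```python
def suspicious_hiera_lines(text):
    """Return (line_number, line) pairs whose unquoted value looks like
    it contains a second YAML mapping indicator."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        # Hiera keys contain '::', so split on the first ': ' only.
        key, sep, value = stripped.partition(": ")
        if not sep:
            continue
        value = value.strip()
        # Quoted scalars, flow collections and block scalars are left alone.
        if value[:1] in ("'", '"', "[", "{", "|", ">"):
            continue
        if value.endswith(":") or ": " in value:
            hits.append((number, line))
    return hits
```

Feeding it the contents of controller.yaml should report line 334 among its hits. A real YAML loader remains the authoritative check.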

The error is probably caused by skipping the step Updating the Configuration Agent [2].

The director is now at version 9, so it is no longer possible to do that step as documented for version 8. If we do it now, it will correspond to the step for updating from 9 to 10.


What steps should be done now to continue with the upgrade to 9?



[1]https://bugzilla.redhat.com/show_bug.cgi?id=1443638

[2]https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/8/html/upgrading_red_hat_openstack_platform/sect-updating_the_environment#sect-Updating_the_Configuration_Agent


2.3. Updating the Configuration Agent

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Sofer Athlan-Guyot 2017-06-23 14:06:19 UTC
Hi,

If I understand correctly, you've managed to upgrade to OSP9 but skipped the step in the 2.4 documentation.

Basically, this step makes sure that all previous runs of the heat agent are remembered after a reboot of the overcloud nodes.

So can you check the status of /var/run/heat-config and /var/lib/heat-config?

Basically you want the /var/lib/heat-config to be populated.  You can review the script that makes the copy: https://github.com/openstack/heat-templates/blob/master/hot/software-config/elements/heat-config/bin/heat-config-rebuild-deployed 

Then you can check that the heat agent is using the right directory /var/lib/...
in this script /usr/libexec/os-refresh-config/configure.d/55-heat-config on the overcloud.
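
The directory check above could be scripted on each overcloud node along these lines. This is only a sketch: the helper name `heat_config_state` and the shape of its return value are assumptions for illustration, and it only reports whether each directory exists and how many entries its `deployed` subdirectory holds.

```python
import os


def heat_config_state(run_dir="/var/run/heat-config",
                      lib_dir="/var/lib/heat-config"):
    """Summarise both heat-config locations.

    For each location, report whether the directory exists and how many
    entries its 'deployed' subdirectory contains (None if it is missing).
    """
    state = {}
    for label, base in (("run", run_dir), ("lib", lib_dir)):
        deployed = os.path.join(base, "deployed")
        if os.path.isdir(deployed):
            entries = len(os.listdir(deployed))
        else:
            entries = None
        state[label] = {
            "exists": os.path.isdir(base),
            "deployed_entries": entries,
        }
    return state
```

A node where the step was skipped would be expected to show entries under `run` and `None` under `lib`.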

Comment 2 Sofer Athlan-Guyot 2017-06-23 14:06:55 UTC
Tell me if that's enough information to get you going.

Comment 3 Martin Schuppert 2017-06-26 09:35:35 UTC
(In reply to Sofer Athlan-Guyot from comment #1)
> Hi,
> 
> If I get this properly you've achieved to upgrade to osp9 but you have
> skipped the step in the 2.4 documentation.
> 
> Basically this step make sure that all previous run of the heat agent are
> remembered after a reboot of the overcloud nodes.
> 
> So can you check the status of /var/run/heat-config /var/lib/heat-config.

still using /var/run instead of /var/lib:

[mschuppe@collab-shell var]$ ll run/heat-config/
total 688
drwxrwxrwx+ 2 mschuppe mschuppe   8192 Jun 22 16:10 deployed
-rwxrwxrwx+ 1 mschuppe mschuppe 678141 Jun 22 16:09 heat-config
drwxrwxrwx+ 2 mschuppe mschuppe   4096 Jun 22 08:49 heat-config-script

[mschuppe@collab-shell var]$ ll lib/heat-config/
total 24
drwxrwxrwx+ 2 mschuppe mschuppe 4096 Jun 22 16:10 heat-config-puppet
drwxrwxrwx+ 3 mschuppe mschuppe 4096 Jun 22 15:22 heat-config-script
drwxrwxrwx+ 2 mschuppe mschuppe   44 Jun  3  2016 hooks

> 
> Basically you want the /var/lib/heat-config to be populated.  You can review
> the script that makes the copy:
> https://github.com/openstack/heat-templates/blob/master/hot/software-config/
> elements/heat-config/bin/heat-config-rebuild-deployed 
> 
> Then you can check that the heat agent is using the right directory
> /var/lib/...
> in this script /usr/libexec/os-refresh-config/configure.d/55-heat-config on
> the overcloud.

/usr/libexec/os-refresh-config/configure.d/55-heat-config still uses the old /var/run:

HOOKS_DIR = os.environ.get('HEAT_CONFIG_HOOKS',
                           '/var/lib/heat-config/hooks')
CONF_FILE = os.environ.get('HEAT_SHELL_CONFIG',
                           '/var/run/heat-config/heat-config')
DEPLOYED_DIR = os.environ.get('HEAT_CONFIG_DEPLOYED',
                              '/var/run/heat-config/deployed')
HEAT_CONFIG_NOTIFY = os.environ.get('HEAT_CONFIG_NOTIFY',
                                    'heat-config-notify')
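
To run the same check across several nodes, the defaults can be pulled out of the script text with a regular expression. The sketch below assumes the script keeps the `NAME = os.environ.get('VAR', 'default')` form quoted above; the helper names are hypothetical.

```python
import re

# Matches: NAME = os.environ.get('ENV_VAR', 'default'), possibly split
# across two lines as in 55-heat-config.
DEFAULT_RE = re.compile(
    r"(\w+)\s*=\s*os\.environ\.get\(\s*'[^']*'\s*,\s*'([^']*)'\s*\)"
)


def heat_config_defaults(script_text):
    """Map each variable name to the default passed to os.environ.get."""
    return dict(DEFAULT_RE.findall(script_text))


def uses_old_run_dir(script_text):
    """True if the DEPLOYED_DIR default still lives under /var/run."""
    default = heat_config_defaults(script_text).get("DEPLOYED_DIR", "")
    return default.startswith("/var/run/")
```

For the excerpt above, `uses_old_run_dir` returns True because the `DEPLOYED_DIR` default is `/var/run/heat-config/deployed`.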

Basically, the remaining question is whether it is OK to run the documented steps [1] from an undercloud that is already upgraded to OSP9 instead of from an OSP8 undercloud (the overcloud is still OSP8):
 
1) from OSP9 undercloud copy the /usr/share/openstack-heat-templates/software-config/elements/heat-config/os-refresh-config/configure.d/55-heat-config to the overcloud nodes
2) on the overcloud nodes create /var/lib/heat-config/deployed
3) copy heat-config-rebuild-deployed from OSP9 undercloud to the overcloud nodes
4) run heat-config-rebuild-deployed (or manually move /var/run/heat-config/deployed to /var/lib/heat-config/deployed )
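
For the manual alternative in step 4, a cautious variant is to copy the records rather than move them, so that /var/run/heat-config/deployed stays available as a fallback. The sketch below is not the heat-config-rebuild-deployed script and does not reproduce its logic. It assumes the deployment records are regular files, and the function name `copy_deployed` is hypothetical.

```python
import os
import shutil


def copy_deployed(old_dir="/var/run/heat-config/deployed",
                  new_dir="/var/lib/heat-config/deployed"):
    """Copy deployment records into new_dir and return the copied names.

    Files that already exist in new_dir are left untouched.
    """
    os.makedirs(new_dir, exist_ok=True)
    copied = []
    if not os.path.isdir(old_dir):
        return copied
    for name in sorted(os.listdir(old_dir)):
        source = os.path.join(old_dir, name)
        target = os.path.join(new_dir, name)
        # Only regular files are copied; existing targets are never overwritten.
        if os.path.isfile(source) and not os.path.exists(target):
            shutil.copy2(source, target)
            copied.append(name)
    return copied
```

Running it a second time copies nothing, which makes it safe to repeat on a node that was interrupted.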

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/8/html/upgrading_red_hat_openstack_platform/sect-updating_the_environment#sect-Updating_the_Configuration_Agent

Comment 4 Sofer Athlan-Guyot 2017-06-26 10:08:20 UTC
Hi,

As discussed on IRC, you can directly apply the steps from the documentation.

We have cross-checked that 55-heat-config is the same in OSP8 and OSP9, so everything applies.

Comment 5 Eduard Barrera 2017-06-28 13:18:20 UTC
During the upgrade process from 8 to 9, at step 3.4.3 Installing Aodh:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/9/html-single/upgrading_red_hat_openstack_platform/#sect-Major-Upgrading_the_Overcloud-Aodh

The deployment finishes as follows:


2017-06-28 07:41:26 [overcloud-test-AllNodesExtraConfig-7maxstusxiwk-NetworkMidonetDeploymentComputes-x5tposkcnizj]: UPDATE_COMPLETE Stack UPDATE completed successfully
2017-06-28 07:41:26 [overcloud-test-AllNodesExtraConfig-7maxstusxiwk-NetworkMidonetDeploymentControllers-ptyl3bafm72q]: UPDATE_COMPLETE Stack UPDATE completed successfully
2017-06-28 07:41:27 [NetworkMidonetDeploymentControllers]: UPDATE_COMPLETE state changed
2017-06-28 07:41:27 [NetworkMidonetDeploymentComputes]: UPDATE_COMPLETE state changed
2017-06-28 07:41:29 [overcloud-test-AllNodesExtraConfig-7maxstusxiwk]: UPDATE_COMPLETE Stack UPDATE completed successfully
2017-06-28 07:41:30 [AllNodesExtraConfig]: UPDATE_COMPLETE state changed
Stack overcloud-test UPDATE_COMPLETE
/home/stack/.ssh/known_hosts updated.
Original contents retained as /home/stack/.ssh/known_hosts.old
Authorization Failed: Unable to establish connection to https://api-test.heicloud.uni-heidelberg.de:13000/v2.0/tokens

I am not sure what the error is here. Is the update finishing correctly, with some extra step failing afterwards?

In any case, this step is supposed to remove the ceilometer alarm services and install aodh, but the alarm evaluator is still present. I also don't understand why heat is not started:




[root@overcloud-test-controller-0 heat-admin]# pcs status |grep -i stopped -B 1
 Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier]
     Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
 Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
     Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
 Clone Set: openstack-heat-api-clone [openstack-heat-api]
     Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
     Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
 Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
     Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
 Clone Set: openstack-ceilometer-alarm-evaluator-clone [openstack-ceilometer-alarm-evaluator]
     Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
     Stopped: [ overcloud-test-controller-0 overcloud-test-controller-1 overcloud-test-controller-2 ]

Failed Actions:
* openstack-ceilometer-alarm-evaluator_start_0 on overcloud-test-controller-0 'not installed' (5): call=242, status=Not installed, exitreason='none', last-rc-change='Mon Jun 26 15:55:18 2017', queued=0ms, exec=123ms
* openstack-ceilometer-alarm-evaluator_start_0 on overcloud-test-controller-1 'not installed' (5): call=239, status=Not installed, exitreason='none', last-rc-change='Mon Jun 26 15:55:18 2017', queued=0ms, exec=124ms
* openstack-ceilometer-alarm-evaluator_start_0 on overcloud-test-controller-2 'not installed' (5): call=234, status=Not installed, exitreason='none', last-rc-change='Mon Jun 26 15:55:18 2017', queued=0ms, exec=134ms
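
The stopped clone sets in the `pcs status` output can be collected programmatically with a sketch like the one below. It assumes the two-line `Clone Set:` / `Stopped:` layout shown in that output, and the function name `stopped_clone_sets` is made up for illustration.

```python
import re


def stopped_clone_sets(pcs_output):
    """Return {clone_set_name: [nodes]} for every clone set that is
    followed by a 'Stopped: [ ... ]' line."""
    stopped = {}
    current = None
    for line in pcs_output.splitlines():
        clone = re.match(r"\s*Clone Set: (\S+)", line)
        if clone:
            current = clone.group(1)
            continue
        nodes = re.match(r"\s*Stopped: \[ (.*?) \]", line)
        if nodes and current:
            stopped[current] = nodes.group(1).split()
    return stopped
```

Applied to the output above, the result would contain the ceilometer alarm and notification clones alongside the four heat clones, each stopped on all three controllers.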


I checked that stonith is off, and apparently there is no constraint preventing heat from starting when the ceilometer alarm services are not started.

These are the logs from corosync.log:


Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:  warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-0 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: get_failcount_full:        openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:  warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-0 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: get_failcount_full:        openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:  warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-0 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: get_failcount_full:        openstack-ceilometer-alarm-evaluator:0 has failed INFINITY times on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:  warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-1 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: get_failcount_full:        openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:  warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-1 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: get_failcount_full:        openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:  warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-1 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: get_failcount_full:        openstack-ceilometer-alarm-evaluator:0 has failed INFINITY times on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:  warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-2 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: get_failcount_full:        openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:  warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away from overcloud-test-controller-2 after 1000000 failures (max=1000000)
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: get_failcount_full:        openstack-ceilometer-alarm-evaluator-clone has failed INFINITY times on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:  warning: check_migration_threshold: Forcing openstack-ceilometer-alarm-evaluator-clone away f


Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: RecurringOp:        Start recurring monitor (60s) for openstack-heat-engine:1 on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: RecurringOp:        Start recurring monitor (60s) for openstack-heat-engine:2 on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: RecurringOp:        Start recurring monitor (60s) for openstack-heat-api:0 on overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: RecurringOp:        Start recurring monitor (60s) for openstack-heat-api:1 on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: RecurringOp:        Start recurring monitor (60s) for openstack-heat-api:2 on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: RecurringOp:        Start recurring monitor (60s) for openstack-heat-api-cloudwatch:0 on overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: RecurringOp:        Start recurring monitor (60s) for openstack-heat-api-cloudwatch:1 on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: RecurringOp:        Start recurring monitor (60s) for openstack-heat-api-cloudwatch:2 on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: RecurringOp:        Start recurring monitor (60s) for openstack-heat-api-cfn:0 on overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: RecurringOp:        Start recurring monitor (60s) for openstack-heat-api-cfn:1 on overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: RecurringOp:        Start recurring monitor (60s) for openstack-heat-api-cfn:2 on overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-ceilometer-notification:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-ceilometer-notification:1 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-ceilometer-notification:2 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-ceilometer-notification:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-ceilometer-notification:1 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-ceilometer-notification:2 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-heat-api:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: native_deallocate: Deallocating openstack-heat-api:0 from overcloud-test-controller-0
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-heat-api:1 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: native_deallocate: Deallocating openstack-heat-api:1 from overcloud-test-controller-1
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-heat-api:2 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: native_deallocate: Deallocating openstack-heat-api:2 from overcloud-test-controller-2
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-heat-api:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-heat-api:1 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-heat-api:2 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-heat-api:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-heat-api:1 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-heat-api:2 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-heat-api:0 from being active
Jun 28 10:42:33 [3644] overcloud-test-controller-2.localdomain    pengine:     info: clone_update_actions_interleave:   Inhibiting openstack-heat-api:1 from being active
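
The repetitive pengine lines above can be condensed into two facts: which resources have failed INFINITY times on which nodes, and which resources are being inhibited. The sketch below only matches the two message formats visible in this excerpt and draws no conclusion about why the resources are inhibited. The function name `summarise_pengine_log` is hypothetical.

```python
import re
from collections import defaultdict

FAILED_RE = re.compile(
    r"get_failcount_full:\s+(\S+) has failed INFINITY times on (\S+)"
)
INHIBIT_RE = re.compile(r"Inhibiting (\S+) from being active")


def summarise_pengine_log(log_text):
    """Return (failed, inhibited).

    failed maps a resource name to the set of nodes where its fail count
    is INFINITY; inhibited is the set of resources kept from being active.
    Clone instance suffixes such as ':0' are dropped.
    """
    failed = defaultdict(set)
    inhibited = set()
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            resource = match.group(1).split(":")[0]
            failed[resource].add(match.group(2))
        match = INHIBIT_RE.search(line)
        if match:
            inhibited.add(match.group(1).split(":")[0])
    return dict(failed), inhibited
```

On this excerpt the inhibited set would include openstack-heat-api and openstack-ceilometer-notification, while the failures all belong to the ceilometer alarm evaluator.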

Comment 6 Sofer Athlan-Guyot 2017-06-30 09:51:57 UTC
Hi,

Closing this one. The new issue is tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1465939

thanks,

