Description of problem:
Following https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/upgrading-red-hat-openstack-platform/#sect-Major-Upgrading_the_Overcloud-Aodh, I ran the first portion of the deploy with the /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-aodh.yaml environment file. It failed the first time because ceilometer-alarm-evaluator was not stopped. I stopped ceilometer-alarm-evaluator in pcs and re-ran the deploy. This time it errored out saying ceilometer-alarm-evaluator could not be found. The update script had already deleted that resource on the first run, but it does not seem resilient enough to handle the resource being gone when the deploy is re-run. All logs and commands are posted below.

Version-Release number of selected component (if applicable):
Director OSP 8, updated to OSP 9 before the overcloud deploy

How reproducible:
Unknown

Steps to Reproduce:
1. Steps above

Additional info:
[root@overcloud-controller-0 heat-admin]# pcs status |grep -A3 ceilometer
 Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier] (unmanaged)
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-heat-engine-clone [openstack-heat-engine] (unmanaged)
     openstack-heat-engine (systemd:openstack-heat-engine): Started overcloud-controller-1 (unmanaged)
--
 Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api] (unmanaged)
     openstack-ceilometer-api (systemd:openstack-ceilometer-api): Started overcloud-controller-1 (unmanaged)
     openstack-ceilometer-api (systemd:openstack-ceilometer-api): Started overcloud-controller-0 (unmanaged)
     openstack-ceilometer-api (systemd:openstack-ceilometer-api): Started overcloud-controller-2 (unmanaged)
 Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent] (unmanaged)
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup] (unmanaged)
--
 Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector] (unmanaged)
     openstack-ceilometer-collector (systemd:openstack-ceilometer-collector): Started overcloud-controller-1 (unmanaged)
     openstack-ceilometer-collector (systemd:openstack-ceilometer-collector): Started overcloud-controller-0 (unmanaged)
     openstack-ceilometer-collector (systemd:openstack-ceilometer-collector): Started overcloud-controller-2 (unmanaged)
 Clone Set: openstack-keystone-clone [openstack-keystone] (unmanaged)
     openstack-keystone (systemd:openstack-keystone): Started overcloud-controller-1 (unmanaged)
     openstack-keystone (systemd:openstack-keystone): Started overcloud-controller-0 (unmanaged)
--
 Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification] (unmanaged)
     openstack-ceilometer-notification (systemd:openstack-ceilometer-notification): Started overcloud-controller-1 (unmanaged)
     openstack-ceilometer-notification (systemd:openstack-ceilometer-notification): Started overcloud-controller-0 (unmanaged)
     openstack-ceilometer-notification (systemd:openstack-ceilometer-notification): Started overcloud-controller-2 (unmanaged)
 Clone Set: openstack-cinder-api-clone [openstack-cinder-api] (unmanaged)
     openstack-cinder-api (systemd:openstack-cinder-api): Started overcloud-controller-1 (unmanaged)
     openstack-cinder-api (systemd:openstack-cinder-api): Started overcloud-controller-0 (unmanaged)
--
 Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central] (unmanaged)
     openstack-ceilometer-central (systemd:openstack-ceilometer-central): Started overcloud-controller-1 (unmanaged)
     openstack-ceilometer-central (systemd:openstack-ceilometer-central): Started overcloud-controller-0 (unmanaged)
     openstack-ceilometer-central (systemd:openstack-ceilometer-central): Started overcloud-controller-2 (unmanaged)
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn] (unmanaged)
     openstack-heat-api-cfn (systemd:openstack-heat-api-cfn): Started overcloud-controller-1 (unmanaged)
     openstack-heat-api-cfn (systemd:openstack-heat-api-cfn): Started overcloud-controller-0 (unmanaged)
[root@overcloud-controller-0 heat-admin]#

###controller 1 o-collect-config
(heat-config) [WARNING] Skipping config d98d6438-6b4d-46fd-8a51-7bb4aa7e3bf9, already deployed
45:13,731] (heat-config) [WARNING] To force-deploy, rm /var/run/heat-config/deployed/d98d6438-6b4d-46fd-8a51-7bb4aa7e3bf9.json
45:13,732] (heat-config) [DEBUG] Running /var/lib/heat-config/hooks/script < /var/run/heat-config/deployed/e55975c3-8ad7-430e-bf52-34a46b23474a.json
45:15,044] (heat-config) [INFO] {"deploy_stdout": " Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier] (unmanaged)\n", "deploy_stderr": "Error
45:15,045] (heat-config) [DEBUG] [2016-10-05 19:45:13,767] (heat-config) [INFO] deploy_server_id=e393527a-042b-4cac-ae3f-660595551211
45:13,767] (heat-config) [INFO] deploy_action=CREATE
45:13,767] (heat-config) [INFO] deploy_stack_id=overcloud-UpdateWorkflow-scacp7uesqfa-AodhPreUpgradeDeployment-se43m2criqnr/a5a8f420-86e9-4e81-8c11-4226917d5c77
45:13,767] (heat-config) [INFO] deploy_resource_name=0
45:13,767] (heat-config) [INFO] deploy_signal_transport=CFN_SIGNAL
45:13,768] (heat-config) [INFO] deploy_signal_id=http://192.0.3.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Ae67e4f8a71b7460e8c83d4bce82571e7%3Astacks%2Fovercloud-UpdateWorkflow-scac
45:13,768] (heat-config) [INFO] deploy_signal_verb=POST
45:13,768] (heat-config) [DEBUG] Running /var/lib/heat-config/heat-config-script/e55975c3-8ad7-430e-bf52-34a46b23474a
45:15,040] (heat-config) [INFO] Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier] (unmanaged)
45:15,041] (heat-config) [DEBUG] Error: unable to find a resource/clone/master/group: openstack-ceilometer-alarm-evaluator
45:15,041] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-script/e55975c3-8ad7-430e-bf52-34a46b23474a. [1]
45:15,045] (heat-config) [INFO] Completed /var/lib/heat-config/hooks/script

[stack@cctg-aci-cloud-host1 ~]$ heat resource-list a5a8f420-86e9-4e81-8c11-4226917d5c77
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
+---------------+--------------------------------------+------------------------------+-----------------+---------------------+
| resource_name | physical_resource_id                 | resource_type                | resource_status | updated_time        |
+---------------+--------------------------------------+------------------------------+-----------------+---------------------+
| 1             | e2b66dc0-ddd4-4401-96d1-e6fe22a563f9 | OS::Heat::SoftwareDeployment | CREATE_COMPLETE | 2016-10-05T19:14:59 |
| 2             | dfc72353-aa10-4f13-b7a0-e9531d6fae59 | OS::Heat::SoftwareDeployment | CREATE_COMPLETE | 2016-10-05T19:14:59 |
| 0             | fa29ee25-457c-44ff-89ba-29c9b82c17fa | OS::Heat::SoftwareDeployment | CREATE_FAILED   | 2016-10-05T20:09:48 |
+---------------+--------------------------------------+------------------------------+-----------------+---------------------+

[stack@cctg-aci-cloud-host1 ~]$ heat deployment-show fa29ee25-457c-44ff-89ba-29c9b82c17fa
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "FAILED",
  "server_id": "e393527a-042b-4cac-ae3f-660595551211",
  "config_id": "9d949012-129a-4372-964c-3a54e18fd3ed",
  "output_values": {
    "deploy_stdout": " Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier] (unmanaged)\n",
    "deploy_stderr": "Error: unable to find a resource/clone/master/group: openstack-ceilometer-alarm-evaluator\n",
    "deploy_status_code": 1
  },
  "creation_time": "2016-10-05T20:09:48",
  "updated_time": "2016-10-05T20:10:41",
  "input_values": {},
  "action": "CREATE",
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1",
  "id": "fa29ee25-457c-44ff-89ba-29c9b82c17fa"

Notes: I hacked the update script in /var/lib/heat-config/heat-config-script/ to comment out the alarm-evaluator lines, which allowed the script to run locally on the controller. Because that change is probably not persistent and was made on only one controller, I made the same change on the undercloud in /usr/share/openstack-tripleo-heat-templates/extraconfig/tasks/major_upgrade_pacemaker_migrations.sh. The update for that portion then completed successfully.
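A minimal sketch of a more re-run-tolerant alarm-removal step, assuming it runs inside major_upgrade_pacemaker_migrations.sh where `pcs` and the templates' `check_resource` helper are available. Each resource is checked individually before it is disabled and deleted, so one already removed by an earlier, partially failed run is skipped instead of aborting the deployment. The function names are hypothetical, and matching the bracketed clone member name (`[openstack-ceilometer-alarm-evaluator]`) in `pcs status` output is an assumption based on the output above.

```shell
#!/bin/bash
# Hypothetical re-run-tolerant version of the ceilometer-alarm removal step.
# Assumes `pcs` is available and `check_resource` (the helper the templates'
# migration script already calls) is defined by the caller.

# Succeed if `pcs status` lists a clone member with exactly this name,
# e.g. "Clone Set: foo-clone [foo]" (layout assumed from the output above).
resource_present() {
  pcs status | grep -qF "[$1]"
}

# Disable, wait for, and delete one resource; skip it if it is already gone.
remove_if_present() {
  local res="$1"
  if ! resource_present "$res"; then
    echo "skipping $res: not found (already removed by an earlier run?)"
    return 0
  fi
  pcs resource disable "$res" &&
    check_resource "$res" stopped 600 &&
    pcs resource delete "$res"
}

# Handle each alarm resource independently instead of assuming both exist.
migrate_ceilometer_alarms() {
  local res
  for res in openstack-ceilometer-alarm-evaluator \
             openstack-ceilometer-alarm-notifier; do
    remove_if_present "$res" || return 1
  done
}
```

With the cluster state shown above (only the notifier left), `migrate_ceilometer_alarms` would skip the evaluator and remove only the notifier, rather than failing with "unable to find a resource".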
Seeing similar behavior for the keystone portion of the upgrade. The script tries to stop keystone through pcs, but it never fully stops. The same thing happened with ceilometer, which led to the problem above.

[root@overcloud-controller-0 heat-admin]# pcs status |grep -A3 keystone
 Clone Set: openstack-keystone-clone [openstack-keystone] (unmanaged)
     openstack-keystone (systemd:openstack-keystone): (target-role:Stopped) Started overcloud-controller-1 (unmanaged)
     openstack-keystone (systemd:openstack-keystone): (target-role:Stopped) Started overcloud-controller-0 (unmanaged)
     openstack-keystone (systemd:openstack-keystone): (target-role:Stopped) Started overcloud-controller-2 (unmanaged)
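The symptom in this output is a resource whose target-role is Stopped but which pacemaker still reports as Started. A small, hypothetical helper like the one below could flag such resources from `pcs status` output before an upgrade step waits on them; the function name is illustrative and the line layout it parses is assumed from the output above.

```shell
#!/bin/bash
# Hypothetical helper: read `pcs status` output on stdin and print
# "<resource> <node>" for every member whose target-role is Stopped
# but which is still reported as Started (line layout assumed from above).
stuck_resources() {
  awk '/\(target-role:Stopped\)/ && / Started / {
    for (i = 1; i <= NF; i++) {
      if ($i == "Started") { print $1, $(i + 1); break }
    }
  }'
}
```

On a controller this would be used as `pcs status | stuck_resources`; for the keystone output above it would print one line per controller.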
I am seeing the same error in our training environment. The heat templates put the whole cluster into maintenance mode (the unmanaged state):

./extraconfig/tasks/pacemaker_maintenance_mode.sh:    pcs property set maintenance-mode=true

but later try to disable a couple of ceilometer resources in /usr/share/openstack-tripleo-heat-templates/extraconfig/tasks/major_upgrade_pacemaker_migrations.sh:

if pcs status | grep openstack-ceilometer-alarm; then
    # Disable pacemaker resources for ceilometer-alarms
    pcs resource disable openstack-ceilometer-alarm-evaluator
    check_resource openstack-ceilometer-alarm-evaluator stopped 600
    pcs resource delete openstack-ceilometer-alarm-evaluator
    pcs resource disable openstack-ceilometer-alarm-notifier
    check_resource openstack-ceilometer-alarm-notifier stopped 600
    pcs resource delete openstack-ceilometer-alarm-notifier

My workaround was to disable those ceilometer resources manually before running the upgrade, so that this check passed:

[stack@director ~]$ ssh heat-admin@control0 "sudo pcs resource disable openstack-ceilometer-alarm-evaluator"
[stack@director ~]$ ssh heat-admin@control0 "sudo pcs resource disable openstack-ceilometer-alarm-notifier"

This resulted in a successful upgrade. IMO this is working around an issue in the upgrade logic, since you can't effectively disable a resource while it is unmanaged: its target-role is set to Stopped, but it keeps running.
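Because the cluster is in maintenance mode at this point, the disable commands cannot take effect until the resources are managed again. A hypothetical pre-flight check, sketched below, would list which of the resources about to be disabled `pcs status` reports as `(unmanaged)`, so the step could fail fast with a clear message instead of waiting out the `check_resource` timeout. Function names are illustrative; detection relies only on the `(unmanaged)` marker visible in the outputs in this bug.

```shell
#!/bin/bash
# Hypothetical pre-flight check: read `pcs status` output on stdin and print
# each named resource that appears on a line marked "(unmanaged)".
# For an unmanaged resource, `pcs resource disable` only sets
# target-role=Stopped; pacemaker will not actually stop it.
unmanaged_among() {
  local pcs_out res
  pcs_out=$(cat)
  for res in "$@"; do
    if printf '%s\n' "$pcs_out" | grep -F -- "$res" | grep -qF '(unmanaged)'; then
      echo "$res"
    fi
  done
}

# Example guard (not invoked here): refuse to proceed while the
# alarm resources are unmanaged.
preflight_alarm_migration() {
  local blocked
  blocked=$(pcs status | unmanaged_among \
    openstack-ceilometer-alarm-evaluator openstack-ceilometer-alarm-notifier)
  if [ -n "$blocked" ]; then
    echo "cannot stop unmanaged resources (maintenance-mode?): $blocked" >&2
    return 1
  fi
}
```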
This bugzilla has been removed from the release and needs to be reviewed and triaged for another Target Release.
Hi, I was able to upgrade Aodh successfully, so I am closing this one. If you still have the issue, feel free to re-open it. Regards,