osp-director-10: PCS cluster down before the Controller upgraded from Osp9 to OSP10 (Error:"cannot start with some cluster nodes being offline".).

Environment:
------------
instack-undercloud-5.0.0-0.20160818065636.41ef775.el7ost.noarch
instack-5.0.0-0.20160802165724.5aabf5c.el7ost.noarch
openstack-heat-api-cfn-7.0.0-0.20160823082523.1106458.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-33.el7ost.noarch
openstack-heat-templates-0.0.1-0.20160822094546.1ac2823.el7ost.noarch
python-heat-tests-7.0.0-0.20160823082523.1106458.el7ost.noarch
openstack-heat-engine-7.0.0-0.20160823082523.1106458.el7ost.noarch
puppet-heat-9.1.0-0.20160815142726.d364553.el7ost.noarch
python-heatclient-1.3.0-0.20160802194627.44dfe53.el7ost.noarch
openstack-heat-common-7.0.0-0.20160823082523.1106458.el7ost.noarch
openstack-heat-api-7.0.0-0.20160823082523.1106458.el7ost.noarch
heat-cfntools-1.3.0-2.el7ost.noarch
openstack-tripleo-heat-templates-5.0.0-0.20160823140311.072404b.el7ost.noarch

Steps
------
(1) Attempt to follow the guide to upgrade from osp9 to osp10
https://gitlab.cee.redhat.com/sathlang/ospd-9-to-10-upgrade#controller-and-block-storage-upgrade
(2) After successful run of the init command run
openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server 10.5.26.10 --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker.yaml

Results:
---------
Upgrade controller fail due to PCS cluster down

Upgrade View:
--------------
2016-08-24 14:02:44 [UpgradeInitConfig]: DELETE_COMPLETE state changed
2016-08-24 14:02:44 [0]: UPDATE_IN_PROGRESS state changed
2016-08-24 14:02:45 [1]: UPDATE_IN_PROGRESS state changed
2016-08-24 14:02:45 [BlockStorageAllNodesDeployment]: UPDATE_COMPLETE state changed
2016-08-24 14:02:45 [CephStorageAllNodesDeployment]: UPDATE_COMPLETE state changed
2016-08-24 14:02:45 [ObjectStorageAllNodesDeployment]: UPDATE_COMPLETE state changed
2016-08-24 14:02:46 [ObjectStorageAllNodesValidationDeployment]: UPDATE_IN_PROGRESS state changed
2016-08-24 14:02:47 [BlockStorageAllNodesValidationDeployment]: UPDATE_IN_PROGRESS state changed
2016-08-24 14:02:48 [CephStorageAllNodesValidationDeployment]: UPDATE_IN_PROGRESS state changed
2016-08-24 14:02:51 [BlockStorageAllNodesValidationDeployment]: UPDATE_COMPLETE state changed
2016-08-24 14:02:51 [ObjectStorageAllNodesValidationDeployment]: UPDATE_COMPLETE state changed
2016-08-24 14:02:52 [CephStorageAllNodesValidationDeployment]: UPDATE_COMPLETE state changed
2016-08-24 14:03:17 [0]: SIGNAL_IN_PROGRESS Signal: deployment babdeb4c-b008-497d-a8aa-537061741749 succeeded
2016-08-24 14:03:17 [0]: UPDATE_COMPLETE state changed
2016-08-24 14:03:17 [overcloud-ComputeAllNodesDeployment-ula6xjjbqnuc]: UPDATE_COMPLETE Stack UPDATE completed successfully
2016-08-24 14:03:18 [ComputeAllNodesDeployment]: UPDATE_COMPLETE state changed
2016-08-24 14:03:19 [ComputeAllNodesValidationDeployment]: UPDATE_IN_PROGRESS state changed
2016-08-24 14:03:19 [overcloud-ComputeAllNodesValidationDeployment-mdmfmjmag7gg]: UPDATE_IN_PROGRESS Stack UPDATE started
2016-08-24 14:03:19 [overcloud-ComputeAllNodesValidationDeployment-mdmfmjmag7gg]: UPDATE_COMPLETE Stack UPDATE completed successfully
2016-08-24 14:03:20 [ComputeAllNodesValidationDeployment]: UPDATE_COMPLETE state changed
2016-08-24 14:03:43 [0]: SIGNAL_IN_PROGRESS Signal: deployment c7145966-5644-4fe5-9727-d0a5685d0597 failed (1)
2016-08-24 14:03:43 [0]: CREATE_FAILED Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2016-08-24 14:03:46 [2]: SIGNAL_IN_PROGRESS Signal: deployment b374b565-d6cb-409a-8f28-d3d060ea2c31 failed (1)
2016-08-24 14:03:47 [2]: CREATE_FAILED Error: resources[2]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2016-08-24 14:03:51 [2]: SIGNAL_IN_PROGRESS Signal: deployment 6b404c48-1450-4625-a9c5-a90a8e879ebe succeeded
2016-08-24 14:03:51 [2]: UPDATE_COMPLETE state changed
2016-08-24 14:04:04 [1]: SIGNAL_IN_PROGRESS Signal: deployment bd80f208-f0d4-4f34-8d65-57f4210f516d failed (1)
2016-08-24 14:04:05 [1]: CREATE_FAILED Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2016-08-24 14:04:05 [overcloud-UpdateWorkflow-57rzzvytb7mc-ControllerPacemakerUpgradeDeployment_Step1-gugu2kls6k5m]: CREATE_FAILED Resource CREATE failed: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2016-08-24 14:04:05 [ControllerPacemakerUpgradeDeployment_Step1]: CREATE_FAILED Error: resources.ControllerPacemakerUpgradeDeployment_Step1.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 1
2016-08-24 14:04:05 [overcloud-UpdateWorkflow-57rzzvytb7mc]: UPDATE_FAILED Error: resources.ControllerPacemakerUpgradeDeployment_Step1.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 1
2016-08-24 14:04:06 [UpdateWorkflow]: UPDATE_FAILED resources.UpdateWorkflow: Error: resources.ControllerPacemakerUpgradeDeployment_Step1.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 1
2016-08-24 14:04:06 [ControllerAllNodesDeployment]: UPDATE_FAILED UPDATE aborted
2016-08-24 14:04:06 [overcloud]: UPDATE_FAILED resources.UpdateWorkflow: Error: resources.ControllerPacemakerUpgradeDeployment_Step1.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 1
2016-08-24 14:04:07 [0]: UPDATE_FAILED UPDATE aborted
2016-08-24 14:04:07 [1]: UPDATE_FAILED UPDATE aborted
2016-08-24 14:04:08 [overcloud-ControllerAllNodesDeployment-lqutwxlcrcoi]: UPDATE_FAILED Operation cancelled
2016-08-24 14:04:11 [1]: SIGNAL_FAILED Signal: deployment c737033f-20f4-490e-b679-1ce9b2837bf7 succeeded
Stack overcloud UPDATE_FAILED
Heat Stack update failed.

(reverse-i-search)`re': openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server 10.5.26.10 --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml -e /usr/sha^C/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker.yaml

[stack@undercloud72 ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+---------------+---------------------+---------------------+
| id                                   | stack_name | stack_status  | creation_time       | updated_time        |
+--------------------------------------+------------+---------------+---------------------+---------------------+
| 59ba3729-b247-4600-83b7-df119ce96542 | overcloud  | UPDATE_FAILED | 2016-08-23T17:34:16 | 2016-08-24T13:58:36 |
+--------------------------------------+------------+---------------+---------------------+---------------------+

[stack@undercloud72 ~]$ heat resource-list overcloud -n5 | grep -v COMPLETE
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
+--------------------------------------------+--------------------------------------+------------------------------------+-----------------+---------------------+-----------------------------------------------------------------------------------------------+
| resource_name                              | physical_resource_id                 | resource_type                      | resource_status | updated_time        | stack_name                                                                                    |
+--------------------------------------------+--------------------------------------+------------------------------------+-----------------+---------------------+-----------------------------------------------------------------------------------------------+
| UpdateWorkflow                             | 6c447028-cb03-4044-acfd-b7b6f3ccc6f4 | OS::TripleO::Tasks::UpdateWorkflow | UPDATE_FAILED   | 2016-08-24T14:02:24 | overcloud                                                                                     |
| 0                                          | c7145966-5644-4fe5-9727-d0a5685d0597 | OS::Heat::SoftwareDeployment       | CREATE_FAILED   | 2016-08-24T14:02:35 | overcloud-UpdateWorkflow-57rzzvytb7mc-ControllerPacemakerUpgradeDeployment_Step1-gugu2kls6k5m |
| 1                                          | bd80f208-f0d4-4f34-8d65-57f4210f516d | OS::Heat::SoftwareDeployment       | CREATE_FAILED   | 2016-08-24T14:02:35 | overcloud-UpdateWorkflow-57rzzvytb7mc-ControllerPacemakerUpgradeDeployment_Step1-gugu2kls6k5m |
| ControllerPacemakerUpgradeDeployment_Step1 | 5ee12238-fb71-4616-9219-ca7e5271171e | OS::Heat::SoftwareDeploymentGroup  | CREATE_FAILED   | 2016-08-24T14:02:35 | overcloud-UpdateWorkflow-57rzzvytb7mc                                                         |
| 2                                          | b374b565-d6cb-409a-8f28-d3d060ea2c31 | OS::Heat::SoftwareDeployment       | CREATE_FAILED   | 2016-08-24T14:02:36 | overcloud-UpdateWorkflow-57rzzvytb7mc-ControllerPacemakerUpgradeDeployment_Step1-gugu2kls6k5m |
| ControllerAllNodesDeployment               | 7ec5ffd8-906d-43fe-adce-451a8490327e | OS::Heat::StructuredDeployments    | UPDATE_FAILED   | 2016-08-24T14:02:39 | overcloud                                                                                     |
| 0                                          | 799c1464-b61b-4737-85ef-0803ac07fb39 | OS::Heat::StructuredDeployment     | UPDATE_FAILED   | 2016-08-24T14:02:43 | overcloud-ControllerAllNodesDeployment-lqutwxlcrcoi                                           |
| 1                                          | c737033f-20f4-490e-b679-1ce9b2837bf7 | OS::Heat::StructuredDeployment     | UPDATE_FAILED   | 2016-08-24T14:02:44 | overcloud-ControllerAllNodesDeployment-lqutwxlcrcoi                                           |
+--------------------------------------------+--------------------------------------+------------------------------------+-----------------+---------------------+-----------------------------------------------------------------------------------------------+

[stack@undercloud72 ~]$ heat deployment-show b374b565-d6cb-409a-8f28-d3d060ea2c31
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "FAILED",
  "server_id": "7996e49b-f08f-4ef6-8969-39354cbb5c40",
  "config_id": "68a54dc4-a401-424c-aae5-aaf0e2268edc",
  "output_values": {
    "deploy_stdout": "Error: cluster is not currently running on this node\nERROR: upgrade cannot start with some cluster nodes being offline\n",
    "deploy_stderr": "+ cluster_sync_timeout=1800\n+ check_cluster\n+ pcs status\n+ grep -E '(cluster is not currently running)|(OFFLINE:)'\n+ echo_error 'ERROR: upgrade cannot start with some cluster nodes being offline'\n+ echo 'ERROR: upgrade cannot start with some cluster nodes being offline'\n+ tee /dev/fd2\n+ exit 1\n",
    "deploy_status_code": 1
  },
  "creation_time": "2016-08-24T14:02:44",
  "updated_time": "2016-08-24T14:03:46",
  "input_values": {
    "update_identifier": "",
    "deploy_identifier": "1472047107"
  },
  "action": "CREATE",
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1",
  "id": "b374b565-d6cb-409a-8f28-d3d060ea2c31"
}

[stack@undercloud72 ~]$ nova list
+--------------------------------------+------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                   | Status | Task State | Power State | Networks              |
+--------------------------------------+------------------------+--------+------------+-------------+-----------------------+
| fe7570d7-91ad-431a-bfcb-8786ae7ead4e | overcloud-compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.0.7  |
| 1c1f6c46-1836-4e31-bf35-871e3589f6f0 | overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.9  |
| 1e958f1d-7697-4433-945e-82c1f4cc18e2 | overcloud-controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.0.8  |
| 7996e49b-f08f-4ef6-8969-39354cbb5c40 | overcloud-controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.0.10 |
+--------------------------------------+------------------------+--------+------------+-------------+-----------------------+

[stack@undercloud72 ~]$ ssh heat-admin@192.168.0.9
The authenticity of host '192.168.0.9 (192.168.0.9)' can't be established.
ECDSA key fingerprint is 57:64:0f:07:33:c4:d9:ae:3d:7c:1b:45:5b:68:39:55.
Are you sure you want to continue connecting (yes/no)? yes
[heat-admin@overcloud-controller-0 ~]$ sudo su -
Last login: Fri Sep 2 14:06:52 UTC 2016 on pts/0
[root@overcloud-controller-0 ~]# pcs status
Error: cluster is not currently running on this node
After running "pcs cluster start" on all controllers:

2016-08-24 15:30:55 [overcloud]: UPDATE_COMPLETE Stack UPDATE completed successfully
Stack overcloud UPDATE_COMPLETE
Overcloud Endpoint: http://10.19.184.210:5000/v2.0
Overcloud Deployed
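For anyone who wants to verify the cluster state on a controller before launching the upgrade, here is a minimal sketch of a pre-flight check. It reuses the grep pattern visible in deploy_stderr in the description. The function name cluster_looks_healthy is hypothetical and is not part of tripleo-heat-templates. It reads "pcs status" output on stdin so it can be tried without a live cluster.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper; NOT part of tripleo-heat-templates.
# Reads `pcs status` output on stdin and returns 1 when the cluster is not
# running on this node or any node is listed as OFFLINE, 0 otherwise.
# The pattern is the one shown in deploy_stderr in this report.
cluster_looks_healthy() {
    if grep -qE '(cluster is not currently running)|(OFFLINE:)'; then
        return 1
    fi
    return 0
}

# Assumed usage on each controller, as root (2>&1 is there in case pcs
# prints its error on stderr):
#   pcs status 2>&1 | cluster_looks_healthy || echo "cluster down: run 'pcs cluster start' first"
```

This only mirrors the check that failed in Step1. It does not start the cluster or explain why it was down.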
I went past this stage today. Are you sure that the problem was not local to your deployment?
(In reply to Sofer Athlan-Guyot from comment #3)
> I went past this stage today. Are you sure that the problem was not local
> to your deployment?

Hi Sofer, it reproduces consistently on my bare-metal (BM) setup. I've opened a follow-up BZ for failed resources of the gnocchi service, which occurs right after I start the pcs cluster manually: https://bugzilla.redhat.com/show_bug.cgi?id=1374531
This could have happened because I passed the file /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml in my deployment command, while my setup didn't include Ceph nodes. When I used Ceph nodes in my deployment, the issue didn't occur. The same applies to https://bugzilla.redhat.com/show_bug.cgi?id=1374531
OK, given comment #4, I think we can close this as not a bug for now (environmental, incorrect environment files specified)? Please reopen if you disagree.
(In reply to marios from comment #6)
> Ok given comment #4 I think we can close this as not a bug for now
> (environmental, incorrect environment files specified)? Please reopen if you
> disagree.

Sorry, I meant comment #5, where Omri explains the inclusion of the storage-environment file.
closed, no need for needinfo.