| Summary: | Pacemaker constantly fails at documented step 10.4 upgrading the overcloud | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Andreas Karis <akaris> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Jiri Stransky <jstransk> |
| Status: | CLOSED NOTABUG | QA Contact: | Arik Chernetsky <achernet> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | ||
| Version: | 8.0 (Liberty) | CC: | aschultz, mburns, mcornea, michele, rhel-osp-director-maint, slinaber |
| Target Milestone: | --- | Keywords: | Triaged, ZStream |
| Target Release: | 8.0 (Liberty) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-10-03 15:31:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Andreas Karis
2016-04-19 21:17:22 UTC
Running into this in a virtual lab as well: https://bugzilla.redhat.com/show_bug.cgi?id=1328621

10.4. Upgrading the Overcloud - Red Hat Customer Portal

Important: If the Overcloud stack failed during this step, log into one of your Controller nodes, run sudo pcs cluster start, then rerun openstack overcloud deploy on the director.

Still in a constant loop: update kills cluster -> manual restart of cluster -> update kills cluster -> ...

Contrary to what's documented, restarting pcsd on one node only doesn't help:

    [stack@undercloud ~]$ heat resource-list -n5 overcloud | grep -i failed
    | UpdateWorkflow | 5105adcb-562d-4702-ae28-f6cb25f021c5 | OS::TripleO::Tasks::UpdateWorkflow | UPDATE_FAILED | 2016-04-20T22:03:08 | overcloud |
    | ControllerPacemakerUpgradeDeployment_Step1 | b87ec833-35a1-412f-8c66-385d46cb67ae | OS::Heat::SoftwareDeploymentGroup | UPDATE_FAILED | 2016-04-20T22:03:13 | overcloud-UpdateWorkflow-bfb6swfof4jd |
    | 0 | d767a806-c96a-4ede-b18f-0f60c7c60099 | OS::Heat::SoftwareDeployment | CREATE_FAILED | 2016-04-20T22:03:15 | overcloud-UpdateWorkflow-bfb6swfof4jd-ControllerPacemakerUpgradeDeployment_Step1-7mmjddgrjqvc |
    | 2 | 821b5f05-ca81-4a4e-9524-a486dd53f0ad | OS::Heat::SoftwareDeployment | CREATE_FAILED | 2016-04-20T22:03:17 | overcloud-UpdateWorkflow-bfb6swfof4jd-ControllerPacemakerUpgradeDeployment_Step1-7mmjddgrjqvc |
    | 1 | 6195bdaa-3414-429c-9f44-92a1b97c94b3 | OS::Heat::SoftwareDeployment | CREATE_FAILED | 2016-04-20T22:03:24 | overcloud-UpdateWorkflow-bfb6swfof4jd-ControllerPacemakerUpgradeDeployment_Step1-7mmjddgrjqvc |
    [stack@undercloud ~]$ heat deployment-output-show 6195bdaa-3414-429c-9f44-92a1b97c94b3 --all
    { "deploy_stdout": "Error: cluster is not currently running on this node\nERROR: upgrade cannot start with some cluster nodes being offline\n", "deploy_stderr": "", "deploy_status_code": 1 }
    [stack@undercloud ~]$ heat deployment-output-show 821b5f05-ca81-4a4e-9524-a486dd53f0ad --all
    { "deploy_stdout": "Error: cluster is not currently running on this node\nERROR: upgrade cannot start with some cluster nodes being offline\n", "deploy_stderr": "", "deploy_status_code": 1 }
    [stack@undercloud ~]$ heat deployment-output-show d767a806-c96a-4ede-b18f-0f60c7c60099 --all
    { "deploy_stdout": "OFFLINE: [ overcloud-controller-1 overcloud-controller-2 ]\nERROR: upgrade cannot start with some cluster nodes being offline\n", "deploy_stderr": "", "deploy_status_code": 1 }

So I restarted it on all 3 controllers, with the following result:

    [stack@undercloud ~]$ heat deployment-output-show 7018206d-af70-4469-ab76-83ccfab56c33 --all
    { "deploy_stdout": "httpd has stopped\n Clone Set: openstack-keystone-clone [openstack-keystone]\nopenstack-keystone has stopped\nredis has stopped\nmongod has stopped\nrabbitmq has stopped\nmemcached has stopped\ngalera has stopped\novercloud-controller-2: Stopping Cluster (pacemaker)...\novercloud-controller-1: Stopping Cluster (pacemaker)...\novercloud-controller-0: Stopping Cluster (pacemaker)...\novercloud-controller-1: Stopping Cluster (corosync)...\novercloud-controller-0: Stopping Cluster (corosync)...\novercloud-controller-2: Stopping Cluster (corosync)...\ninactive\nLoaded plugins: product-id, search-disabled-repos, subscription-manager\nThis system is registered to Red Hat Subscription Management, but is not receiving updates. You can use subscription-manager to assign subscriptions.\n", "deploy_stderr": "There are no enabled repos.\n Run \"yum repolist all\" to see the repos you have.\n You can enable repos with yum-config-manager --enable <repo>\n", "deploy_status_code": 1 }
    [stack@undercloud ~]$ heat deployment-output-show aa435d18-1d75-44a5-8a0d-2bfbe8b69ef4 --all
    { "deploy_stdout": "active\nactive\nactive\nactive\nactive\nactive\ninactive\nLoaded plugins: product-id, search-disabled-repos, subscription-manager\nThis system is registered to Red Hat Subscription Management, but is not receiving updates. You can use subscription-manager to assign subscriptions.\n", "deploy_stderr": "There are no enabled repos.\n Run \"yum repolist all\" to see the repos you have.\n You can enable repos with yum-config-manager --enable <repo>\n", "deploy_status_code": 1 }
    [stack@undercloud ~]$ heat deployment-output-show 2358656c-c3fc-4633-b947-61375e4f177d --all
    { "deploy_stdout": "active\nactive\nactive\ninactive\nLoaded plugins: product-id, search-disabled-repos, subscription-manager\nThis system is registered to Red Hat Subscription Management, but is not receiving updates. You can use subscription-manager to assign subscriptions.\n", "deploy_stderr": "There are no enabled repos.\n Run \"yum repolist all\" to see the repos you have.\n You can enable repos with yum-config-manager --enable <repo>\n", "deploy_status_code": 1 }

Fixed like this:

    [root@overcloud-controller-0 ~]# subscription-manager repos --enable=rhel-7-server-openstack-8-rpms --enable=rhel-7-server-openstack-8-director-rpms
    Error: rhel-7-server-openstack-8-rpms is not a valid repository ID. Use --list option to see valid repositories.
    Error: rhel-7-server-openstack-8-director-rpms is not a valid repository ID. Use --list option to see valid repositories.
    [root@overcloud-controller-0 ~]# subscription-manager attach --pool=8a85f9814d368e7d014d44509b5e4eef
    Successfully attached a subscription for: Employee SKU
    [root@overcloud-controller-0 ~]# subscription-manager repos --enable=rhel-7-server-openstack-8-rpms --enable=rhel-7-server-openstack-8-director-rpms
    Repository 'rhel-7-server-openstack-8-director-rpms' is enabled for this system.
    Repository 'rhel-7-server-openstack-8-rpms' is enabled for this system.
    [root@overcloud-controller-0 ~]#

which then leads again to:

    [stack@undercloud ~]$ heat deployment-output-show 2eb51934-ea2c-425b-9e7f-8902ec10ea2d --all
    { "deploy_stdout": "Error: cluster is not currently running on this node\nERROR: upgrade cannot start with some cluster nodes being offline\n", "deploy_stderr": "", "deploy_status_code": 1 }
    [stack@undercloud ~]$ heat deployment-output-show 5c78bf87-575d-4dac-888a-8f92e0c3f0a5 --all
    { "deploy_stdout": "Error: cluster is not currently running on this node\nERROR: upgrade cannot start with some cluster nodes being offline\n", "deploy_stderr": "", "deploy_status_code": 1 }
    [stack@undercloud ~]$ heat deployment-output-show aec131cd-6cee-4e02-b5f3-13e9c0785808 --all
    { "deploy_stdout": "Error: cluster is not currently running on this node\nERROR: upgrade cannot start with some cluster nodes being offline\n", "deploy_stderr": "", "deploy_status_code": 1 }
    [stack@undercloud ~]$

===> On controller0, ran pcs cluster start as suggested in the doc, then started a new upgrade deployment:

    [stack@undercloud ~]$ heat deployment-output-show fb5b093c-0b80-4e40-8315-8f9bcf959187 --all
    { "deploy_stdout": "Error: cluster is not currently running on this node\nERROR: upgrade cannot start with some cluster nodes being offline\n", "deploy_stderr": "", "deploy_status_code": 1 }
    [stack@undercloud ~]$ heat deployment-output-show 45a711f1-fd27-45fd-9284-e03b1d6f5f36 --all
    { "deploy_stdout": "OFFLINE: [ overcloud-controller-1 overcloud-controller-2 ]\nERROR: upgrade cannot start with some cluster nodes being offline\n", "deploy_stderr": "", "deploy_status_code": 1 }
    [stack@undercloud ~]$ heat deployment-output-show d134deac-35d8-416f-969c-add72812a876 --all
    { "deploy_stdout": "Error: cluster is not currently running on this node\nERROR: upgrade cannot start with some cluster nodes being offline\n", "deploy_stderr": "", "deploy_status_code": 1 }

Next update run:

    [stack@undercloud ~]$ heat deployment-output-show e2ddc8c5-af63-425e-840b-e760bd413681 --all
    { "deploy_stdout": "Error: cluster is not currently running on this node\nERROR: upgrade cannot start with some cluster nodes being offline\n", "deploy_stderr": "", "deploy_status_code": 1 }
    [stack@undercloud ~]$ heat deployment-output-show 71512745-2e0b-4e39-9ede-fd49535449fe --all
    null
    [stack@undercloud ~]$ heat deployment-output-show 9181865e-9542-43b7-9065-e7e5e90f6760 --all
    null
    [stack@undercloud ~]$

And yet again pcs is not running any more on any of the nodes.

This is a 3 controller, 2 compute node scenario.

Hi Andreas, apologies for the late answer here. Can you still reproduce this with OSP8 updated? The link at c#4 is a 404. Do I presume correctly that we are talking about: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/8/html-single/upgrading_red_hat_openstack_platform/#sect-Major-Upgrading_the_Overcloud-Controller (3.4.4)

So you ran: the deploy command with the major-upgrade-pacemaker-init.yaml environment file, then upgrade-non-controller.sh on each Object Storage node, and then the deploy with the major-upgrade-pacemaker.yaml environment file fails.

Do you have an env where this happens that we can take a look at? Note that it seems that you have not configured any repos on the overcloud?

Hi, "2016-04-21" ... I think this happened somewhere in a lab, no case attached. I am closing this as not a bug :-) - Andreas |