Upgrade from OSP8 -> OSP9 fails during the converge step. All pcs resources in the cluster become unmanaged.

Environment:
openstack-tripleo-heat-templates-kilo-2.0.0-15.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-15.el7ost.noarch
python-heat-tests-6.0.0-7.el7ost.noarch
openstack-heat-common-6.0.0-7.el7ost.noarch
openstack-heat-engine-6.0.0-7.el7ost.noarch
python-heatclient-1.2.0-1.el7ost.noarch
openstack-heat-api-cfn-6.0.0-7.el7ost.noarch
heat-cfntools-1.3.0-2.el7ost.noarch
openstack-heat-templates-0-0.3.96a0b0bgit.el7ost.noarch
openstack-tripleo-heat-templates-2.0.0-15.el7ost.noarch
openstack-heat-api-6.0.0-7.el7ost.noarch
python-keystonemiddleware-4.4.1-1.el7ost.noarch
python-keystone-tests-9.0.2-1.el7ost.noarch
python-keystoneauth1-2.4.1-1.el7ost.noarch
openstack-keystone-9.0.2-1.el7ost.noarch
python-keystoneclient-2.3.1-2.el7ost.noarch
python-keystone-9.0.2-1.el7ost.noarch
instack-undercloud-4.0.0-7.el7ost.noarch
instack-0.0.8-3.el7ost.noarch

Description:
Upgrade from OSP8 -> OSP9 fails during the converge step. All resources in the cluster become unmanaged.

1. Deploy with:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --swift-storage-scale 1 --block-storage-scale 1 --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server clock.redhat.com --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml --ceph-storage-scale 1
Note: with standalone cinder and swift ^^^

2. Upgrade the undercloud.

3. Successfully complete all steps from the upgrade document up to the last step.

4. Attempt the CONVERGE step (-e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-converge.yaml):
2016-07-15 15:01:30 [0]: SIGNAL_IN_PROGRESS Signal: deployment failed (1)
2016-07-15 15:01:30 [0]: CREATE_FAILED Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2016-07-15 15:01:34 [0]: SIGNAL_COMPLETE Unknown
2016-07-15 15:01:35 [0]: SIGNAL_COMPLETE Unknown
2016-07-15 15:01:36 [0]: SIGNAL_COMPLETE Unknown
2016-07-15 15:01:38 [0]: SIGNAL_COMPLETE Unknown
2016-07-15 15:01:39 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-07-15 15:01:39 [2]: SIGNAL_IN_PROGRESS Signal: deployment succeeded
2016-07-15 15:01:40 [2]: CREATE_COMPLETE state changed
2016-07-15 15:01:43 [2]: SIGNAL_COMPLETE Unknown
Stack overcloud UPDATE_FAILED
Deployment failed: Heat Stack update failed.

5. Checked the pcs resources and found them all unmanaged:
[heat-admin@overcloud-controller-0 ~]$ sudo pcs status | grep -i stopped -B2
[heat-admin@overcloud-controller-0 ~]$ sudo pcs status | grep -i unman -B2
Full list of resources:

 ip-192.0.2.6 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0 (unmanaged)
 Clone Set: haproxy-clone [haproxy] (unmanaged)
     haproxy (systemd:haproxy): Started overcloud-controller-2 (unmanaged)
     haproxy (systemd:haproxy): Started overcloud-controller-0 (unmanaged)
     haproxy (systemd:haproxy): Started overcloud-controller-1 (unmanaged)
 ip-192.168.200.180 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1 (unmanaged)
 ip-192.168.100.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2 (unmanaged)
 ip-192.168.110.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0 (unmanaged)
 ip-192.168.100.11 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1 (unmanaged)
 ip-192.168.120.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2 (unmanaged)
 Master/Slave Set: redis-master [redis] (unmanaged)
     redis (ocf::heartbeat:redis): Master overcloud-controller-2 (unmanaged)
     redis (ocf::heartbeat:redis): Started overcloud-controller-0 (unmanaged)
     redis (ocf::heartbeat:redis): Started overcloud-controller-1 (unmanaged)
 Master/Slave Set: galera-master [galera] (unmanaged)
     galera (ocf::heartbeat:galera): Master overcloud-controller-2 (unmanaged)
     galera (ocf::heartbeat:galera): Master overcloud-controller-0 (unmanaged)
     galera (ocf::heartbeat:galera): Master overcloud-controller-1 (unmanaged)
 Clone Set: mongod-clone [mongod] (unmanaged)
     mongod (systemd:mongod): Started overcloud-controller-2 (unmanaged)
     mongod (systemd:mongod): Started overcloud-controller-0 (unmanaged)
     mongod (systemd:mongod): Started overcloud-controller-1 (unmanaged)
 Clone Set: rabbitmq-clone [rabbitmq] (unmanaged)
     rabbitmq (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-2 (unmanaged)
     rabbitmq (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-0 (unmanaged)
     rabbitmq (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-1 (unmanaged)
 Clone Set: memcached-clone [memcached] (unmanaged)
     memcached (systemd:memcached): Started overcloud-controller-2 (unmanaged)
     memcached (systemd:memcached): Started overcloud-controller-0 (unmanaged)
     memcached (systemd:memcached): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler] (unmanaged)
     openstack-nova-scheduler (systemd:openstack-nova-scheduler): Started overcloud-controller-2 (unmanaged)
     openstack-nova-scheduler (systemd:openstack-nova-scheduler): Started overcloud-controller-0 (unmanaged)
     openstack-nova-scheduler (systemd:openstack-nova-scheduler): Started overcloud-controller-1 (unmanaged)
 Clone Set: neutron-l3-agent-clone [neutron-l3-agent] (unmanaged)
     neutron-l3-agent (systemd:neutron-l3-agent): Started overcloud-controller-2 (unmanaged)
     neutron-l3-agent (systemd:neutron-l3-agent): Started overcloud-controller-0 (unmanaged)
     neutron-l3-agent (systemd:neutron-l3-agent): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-heat-engine-clone [openstack-heat-engine] (unmanaged)
     openstack-heat-engine (systemd:openstack-heat-engine): Started overcloud-controller-2 (unmanaged)
     openstack-heat-engine (systemd:openstack-heat-engine): Started overcloud-controller-0 (unmanaged)
     openstack-heat-engine (systemd:openstack-heat-engine): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api] (unmanaged)
     openstack-ceilometer-api (systemd:openstack-ceilometer-api): Started overcloud-controller-2 (unmanaged)
     openstack-ceilometer-api (systemd:openstack-ceilometer-api): Started overcloud-controller-0 (unmanaged)
     openstack-ceilometer-api (systemd:openstack-ceilometer-api): Started overcloud-controller-1 (unmanaged)
 Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent] (unmanaged)
     neutron-metadata-agent (systemd:neutron-metadata-agent): Started overcloud-controller-2 (unmanaged)
     neutron-metadata-agent (systemd:neutron-metadata-agent): Started overcloud-controller-0 (unmanaged)
     neutron-metadata-agent (systemd:neutron-metadata-agent): Started overcloud-controller-1 (unmanaged)
 Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup] (unmanaged)
     neutron-ovs-cleanup (ocf::neutron:OVSCleanup): Started overcloud-controller-2 (unmanaged)
     neutron-ovs-cleanup (ocf::neutron:OVSCleanup): Started overcloud-controller-0 (unmanaged)
     neutron-ovs-cleanup (ocf::neutron:OVSCleanup): Started overcloud-controller-1 (unmanaged)
 Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup] (unmanaged)
     neutron-netns-cleanup (ocf::neutron:NetnsCleanup): Started overcloud-controller-2 (unmanaged)
     neutron-netns-cleanup (ocf::neutron:NetnsCleanup): Started overcloud-controller-0 (unmanaged)
     neutron-netns-cleanup (ocf::neutron:NetnsCleanup): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-heat-api-clone [openstack-heat-api] (unmanaged)
     openstack-heat-api (systemd:openstack-heat-api): Started overcloud-controller-2 (unmanaged)
     openstack-heat-api (systemd:openstack-heat-api): Started overcloud-controller-0 (unmanaged)
     openstack-heat-api (systemd:openstack-heat-api): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler] (unmanaged)
     openstack-cinder-scheduler (systemd:openstack-cinder-scheduler): Started overcloud-controller-2 (unmanaged)
     openstack-cinder-scheduler (systemd:openstack-cinder-scheduler): Started overcloud-controller-0 (unmanaged)
     openstack-cinder-scheduler (systemd:openstack-cinder-scheduler): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-nova-api-clone [openstack-nova-api] (unmanaged)
     openstack-nova-api (systemd:openstack-nova-api): Started overcloud-controller-2 (unmanaged)
     openstack-nova-api (systemd:openstack-nova-api): Started overcloud-controller-0 (unmanaged)
     openstack-nova-api (systemd:openstack-nova-api): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch] (unmanaged)
     openstack-heat-api-cloudwatch (systemd:openstack-heat-api-cloudwatch): Started overcloud-controller-2 (unmanaged)
     openstack-heat-api-cloudwatch (systemd:openstack-heat-api-cloudwatch): Started overcloud-controller-0 (unmanaged)
     openstack-heat-api-cloudwatch (systemd:openstack-heat-api-cloudwatch): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector] (unmanaged)
     openstack-ceilometer-collector (systemd:openstack-ceilometer-collector): Started overcloud-controller-2 (unmanaged)
     openstack-ceilometer-collector (systemd:openstack-ceilometer-collector): Started overcloud-controller-0 (unmanaged)
     openstack-ceilometer-collector (systemd:openstack-ceilometer-collector): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth] (unmanaged)
     openstack-nova-consoleauth (systemd:openstack-nova-consoleauth): Started overcloud-controller-2 (unmanaged)
     openstack-nova-consoleauth (systemd:openstack-nova-consoleauth): Started overcloud-controller-0 (unmanaged)
     openstack-nova-consoleauth (systemd:openstack-nova-consoleauth): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-glance-registry-clone [openstack-glance-registry] (unmanaged)
     openstack-glance-registry (systemd:openstack-glance-registry): Started overcloud-controller-2 (unmanaged)
     openstack-glance-registry (systemd:openstack-glance-registry): Started overcloud-controller-0 (unmanaged)
     openstack-glance-registry (systemd:openstack-glance-registry): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification] (unmanaged)
     openstack-ceilometer-notification (systemd:openstack-ceilometer-notification): Started overcloud-controller-2 (unmanaged)
     openstack-ceilometer-notification (systemd:openstack-ceilometer-notification): Started overcloud-controller-0 (unmanaged)
     openstack-ceilometer-notification (systemd:openstack-ceilometer-notification): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-cinder-api-clone [openstack-cinder-api] (unmanaged)
     openstack-cinder-api (systemd:openstack-cinder-api): Started overcloud-controller-2 (unmanaged)
     openstack-cinder-api (systemd:openstack-cinder-api): Started overcloud-controller-0 (unmanaged)
     openstack-cinder-api (systemd:openstack-cinder-api): Started overcloud-controller-1 (unmanaged)
 Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent] (unmanaged)
     neutron-dhcp-agent (systemd:neutron-dhcp-agent): Started overcloud-controller-2 (unmanaged)
     neutron-dhcp-agent (systemd:neutron-dhcp-agent): Started overcloud-controller-0 (unmanaged)
     neutron-dhcp-agent (systemd:neutron-dhcp-agent): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-glance-api-clone [openstack-glance-api] (unmanaged)
     openstack-glance-api (systemd:openstack-glance-api): Started overcloud-controller-2 (unmanaged)
     openstack-glance-api (systemd:openstack-glance-api): Started overcloud-controller-0 (unmanaged)
     openstack-glance-api (systemd:openstack-glance-api): Started overcloud-controller-1 (unmanaged)
 Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent] (unmanaged)
     neutron-openvswitch-agent (systemd:neutron-openvswitch-agent): Started overcloud-controller-2 (unmanaged)
     neutron-openvswitch-agent (systemd:neutron-openvswitch-agent): Started overcloud-controller-0 (unmanaged)
     neutron-openvswitch-agent (systemd:neutron-openvswitch-agent): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy] (unmanaged)
     openstack-nova-novncproxy (systemd:openstack-nova-novncproxy): Started overcloud-controller-2 (unmanaged)
     openstack-nova-novncproxy (systemd:openstack-nova-novncproxy): Started overcloud-controller-0 (unmanaged)
     openstack-nova-novncproxy (systemd:openstack-nova-novncproxy): Started overcloud-controller-1 (unmanaged)
 Clone Set: delay-clone [delay] (unmanaged)
     delay (ocf::heartbeat:Delay): Started overcloud-controller-2 (unmanaged)
     delay (ocf::heartbeat:Delay): Started overcloud-controller-0 (unmanaged)
     delay (ocf::heartbeat:Delay): Started overcloud-controller-1 (unmanaged)
 Clone Set: neutron-server-clone [neutron-server] (unmanaged)
     neutron-server (systemd:neutron-server): Started overcloud-controller-2 (unmanaged)
     neutron-server (systemd:neutron-server): Started overcloud-controller-0 (unmanaged)
     neutron-server (systemd:neutron-server): Started overcloud-controller-1 (unmanaged)
 Clone Set: httpd-clone [httpd] (unmanaged)
     httpd (systemd:httpd): Started overcloud-controller-2 (unmanaged)
     httpd (systemd:httpd): Started overcloud-controller-0 (unmanaged)
     httpd (systemd:httpd): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central] (unmanaged)
     openstack-ceilometer-central (systemd:openstack-ceilometer-central): Started overcloud-controller-2 (unmanaged)
     openstack-ceilometer-central (systemd:openstack-ceilometer-central): Started overcloud-controller-0 (unmanaged)
     openstack-ceilometer-central (systemd:openstack-ceilometer-central): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn] (unmanaged)
     openstack-heat-api-cfn (systemd:openstack-heat-api-cfn): Started overcloud-controller-2 (unmanaged)
     openstack-heat-api-cfn (systemd:openstack-heat-api-cfn): Started overcloud-controller-0 (unmanaged)
     openstack-heat-api-cfn (systemd:openstack-heat-api-cfn): Started overcloud-controller-1 (unmanaged)
 openstack-cinder-volume (systemd:openstack-cinder-volume): Started overcloud-controller-0 (unmanaged)
 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor] (unmanaged)
     openstack-nova-conductor (systemd:openstack-nova-conductor): Started overcloud-controller-2 (unmanaged)
     openstack-nova-conductor (systemd:openstack-nova-conductor): Started overcloud-controller-0 (unmanaged)
     openstack-nova-conductor (systemd:openstack-nova-conductor): Started overcloud-controller-1 (unmanaged)
 my-stonith-xvm-controller0 (stonith:fence_xvm): Started overcloud-controller-1 (unmanaged)
 my-stonith-xvm-controller1 (stonith:fence_xvm): Started overcloud-controller-1 (unmanaged)
 my-stonith-xvm-controller2 (stonith:fence_xvm): Started overcloud-controller-0 (unmanaged)
 Clone Set: openstack-aodh-listener-clone [openstack-aodh-listener] (unmanaged)
     openstack-aodh-listener (systemd:openstack-aodh-listener): Started overcloud-controller-2 (unmanaged)
     openstack-aodh-listener (systemd:openstack-aodh-listener): Started overcloud-controller-0 (unmanaged)
     openstack-aodh-listener (systemd:openstack-aodh-listener): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-aodh-notifier-clone [openstack-aodh-notifier] (unmanaged)
     openstack-aodh-notifier (systemd:openstack-aodh-notifier): Started overcloud-controller-2 (unmanaged)
     openstack-aodh-notifier (systemd:openstack-aodh-notifier): Started overcloud-controller-0 (unmanaged)
     openstack-aodh-notifier (systemd:openstack-aodh-notifier): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-aodh-evaluator-clone [openstack-aodh-evaluator] (unmanaged)
     openstack-aodh-evaluator (systemd:openstack-aodh-evaluator): Started overcloud-controller-2 (unmanaged)
     openstack-aodh-evaluator (systemd:openstack-aodh-evaluator): Started overcloud-controller-0 (unmanaged)
     openstack-aodh-evaluator (systemd:openstack-aodh-evaluator): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-core-clone [openstack-core] (unmanaged)
     openstack-core (ocf::heartbeat:Dummy): Started overcloud-controller-2 (unmanaged)
     openstack-core (ocf::heartbeat:Dummy): Started overcloud-controller-0 (unmanaged)
     openstack-core (ocf::heartbeat:Dummy): Started overcloud-controller-1 (unmanaged)
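A listing this long is hard to scan by eye, so a small filter can summarize it. The sketch below is only an illustration: `count_unmanaged` is a hypothetical helper name, and it assumes the `pcs status` layout shown in this report, where each resource instance line ends in "(unmanaged)" and set headers start with "Clone Set:" or "Master/Slave Set:".

```shell
# count_unmanaged (hypothetical helper): read `pcs status` text on stdin and
# print how many resource instance lines are flagged "(unmanaged)".
# Set header lines ("Clone Set:", "Master/Slave Set:") are not counted.
count_unmanaged() {
    grep -v -E '^[[:space:]]*(Clone Set|Master/Slave Set):' \
        | grep -c -E '\(unmanaged\)[[:space:]]*$' || true
}

# Possible usage on a controller node:
#   sudo pcs status | count_unmanaged
```

A result of 0 after a later successful run would suggest the resources are managed again; any other number gives a quick measure of how much of the cluster is affected.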
I wasn't able to reproduce this in my last upgrade run to completion; the converge step succeeded for me. I only saw stopped gnocchi services after the converge (possibly bug 1338954). Otherwise everything seemed OK. Can you please post the list of failed resources in Heat (via `heat resource-list overcloud -n5 | grep -vi complete`) and the os-collect-config log from the node(s) where the failure happened?
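When collecting the os-collect-config log, the Puppet errors tend to be buried in colorized output. The following is a minimal sketch for pulling them out; `extract_puppet_errors` is a hypothetical helper name, and reading the log with `journalctl -u os-collect-config` is an assumption about how the service logs on these nodes.

```shell
# extract_puppet_errors (hypothetical helper): read log text on stdin,
# strip ANSI color sequences such as ESC[1;31m and ESC[0m, and print
# only the lines that contain "Error: ".
extract_puppet_errors() {
    _esc=$(printf '\033')
    sed "s/${_esc}\[[0-9;]*m//g" | grep 'Error: ' || true
}

# Possible usage on the failing node (assumed log source):
#   sudo journalctl -u os-collect-config | extract_puppet_errors
```

Lines matching "Warning:" are deliberately left out so that only the failures remain.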
We looked into the issue with Mike. The failure comes from puppet:

Error: Puppet::Parser::AST::Resource failed with error ArgumentError: Could not find declared class ::nova::db::mysql_api

The cause is that the packages were not updated to their Mitaka versions:

[root@overcloud-controller-0 ~]# rpm -q openstack-puppet-modules
openstack-puppet-modules-7.0.17-1.el7ost.noarch

There was probably a workflow issue in the upgrade init step, or converge was run too early, as no OSP 9 repos were present:

[root@overcloud-controller-0 ~]# ls /etc/yum.repos.d
redhat.repo rhos-release-8-director.repo rhos-release-8.repo rhos-release.repo rhos-release-rhel-7.2.repo
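The two checks above could be run on every controller before attempting converge. This is a rough sketch only: the helper names are hypothetical, and the `rhos-release-9*.repo` file naming is taken from the listings in this report, so other environments will likely need a different pattern.

```shell
# has_osp9_repos DIR (hypothetical helper): succeed if DIR holds a repo
# file named like rhos-release-9.repo or rhos-release-9-director.repo.
has_osp9_repos() {
    ls "$1" 2>/dev/null | grep -q -E '^rhos-release-9[.-]'
}

# puppet_modules_major (hypothetical helper): read the output of
# `rpm -q openstack-puppet-modules` on stdin and print the major version.
puppet_modules_major() {
    sed -n 's/^openstack-puppet-modules-\([0-9][0-9]*\)\..*/\1/p'
}

# Possible usage on a controller node:
#   has_osp9_repos /etc/yum.repos.d || echo "no OSP 9 repos found"
#   rpm -q openstack-puppet-modules | puppet_modules_major
```

In this report the node that failed was still on openstack-puppet-modules 7.0.17, so a major version of 7 is the value to watch for.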
I have hit this same issue again on the 9.0 GA candidate (2016-08-18.1), so I am re-opening it.

The upgrade failed with the error below, and the cluster is again in the "unmanaged" state:

"Error: Puppet::Parser::AST::Resource failed with error ArgumentError: Could not find declared class ::nova::db::mysql_api at /var/lib/heat-config/heat-config-puppet"

[root@overcloud-controller-0 ~]# rpm -qa | grep openstack-puppet-modules
openstack-puppet-modules-7.0.17-1.el7ost.noarch

[root@overcloud-controller-0 ~]# ls /etc/yum.repos.d/
redhat.repo rhos-release-8-director.repo rhos-release-8.repo rhos-release-9-director.repo rhos-release-9.repo rhos-release.repo rhos-release-rhel-7.2.repo

Please see the attached sosreport.

[stack@instack ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+---------------+---------------------+---------------------+
| id                                   | stack_name | stack_status  | creation_time       | updated_time        |
+--------------------------------------+------------+---------------+---------------------+---------------------+
| 0ce536ea-2b17-494e-bee2-bebae9fec808 | overcloud  | UPDATE_FAILED | 2016-08-19T16:10:52 | 2016-08-19T19:47:12 |
+--------------------------------------+------------+---------------+---------------------+---------------------+

[stack@instack ~]$ heat resource-list overcloud -n5 | grep -v COMPLETE
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
+----------------------------------------------+-----------------------------------------------+---------------------------------------------------------------------------------+-----------------+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| resource_name | physical_resource_id | resource_type | resource_status | updated_time | stack_name |
+----------------------------------------------+-----------------------------------------------+---------------------------------------------------------------------------------+-----------------+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ | ControllerNodesPostDeployment | 5642b3e7-3b8d-4c2c-b914-62512613cdd2 | OS::TripleO::ControllerPostDeployment | CREATE_FAILED | 2016-08-19T19:53:12 | overcloud | | ControllerServicesBaseDeployment_Step2 | 5ff114f9-874c-4d49-974d-da7cb353d5d1 | OS::Heat::StructuredDeployments | CREATE_FAILED | 2016-08-19T19:53:14 | overcloud-ControllerNodesPostDeployment-7h2fdufb275r | | 0 | a104456c-e032-452d-b23a-96f31dcc5dc9 | OS::Heat::StructuredDeployment | CREATE_FAILED | 2016-08-19T19:56:42 | overcloud-ControllerNodesPostDeployment-7h2fdufb275r-ControllerServicesBaseDeployment_Step2-odzw46576x2k | | 1 | 8942d662-cae2-499c-b5db-5bfb35bfc799 | OS::Heat::StructuredDeployment | CREATE_FAILED | 2016-08-19T19:56:42 | overcloud-ControllerNodesPostDeployment-7h2fdufb275r-ControllerServicesBaseDeployment_Step2-odzw46576x2k | | 2 | 1477254e-3456-45ff-8c89-4e34a87d90db | OS::Heat::StructuredDeployment | CREATE_FAILED | 2016-08-19T19:56:42 | overcloud-ControllerNodesPostDeployment-7h2fdufb275r-ControllerServicesBaseDeployment_Step2-odzw46576x2k | +----------------------------------------------+-----------------------------------------------+---------------------------------------------------------------------------------+-----------------+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ [stack@instack ~]$ heat deployment-list | grep FAILED WARNING (shell) "heat deployment-list" is deprecated, please use "openstack software deployment list" instead | 8942d662-cae2-499c-b5db-5bfb35bfc799 | 
ef915feb-9ef5-4741-a7d5-78e3c99cd73b | 1cd0fb36-1ba4-4b23-afb4-73c6f5735754 | CREATE | FAILED | 2016-08-19T19:56:44 | deploy_status_code : Deployment exited with non-zero status code: 6 | | a104456c-e032-452d-b23a-96f31dcc5dc9 | 86a087d7-f257-4367-8367-85d64fb7fa2b | b96ccfbd-eed3-4cde-9a6d-cf9ee9f9d458 | CREATE | FAILED | 2016-08-19T19:56:46 | deploy_status_code : Deployment exited with non-zero status code: 1 | | 1477254e-3456-45ff-8c89-4e34a87d90db | 37a0045f-4ddc-43cc-a19e-3601ef5eb9b5 | 9175c1b4-3832-4427-adfb-c7177997e893 | CREATE | FAILED | 2016-08-19T19:56:48 | deploy_status_code : Deployment exited with non-zero status code: 6 | [stack@instack ~]$ heat deployment-show 37a0045f-4ddc-43cc-a19e-3601ef5eb9b5 WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead Deployment not found: 37a0045f-4ddc-43cc-a19e-3601ef5eb9b5 [stack@instack ~]$ heat deployment-show 86a087d7-f257-4367-8367-85d64fb7fa2b WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead Deployment not found: 86a087d7-f257-4367-8367-85d64fb7fa2b [stack@instack ~]$ heat deployment-show ef915feb-9ef5-4741-a7d5-78e3c99cd73b WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead Deployment not found: ef915feb-9ef5-4741-a7d5-78e3c99cd73b [stack@instack ~]$ heat deployment-show 1477254e-3456-45ff-8c89-4e34a87d90db WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead { "status": "FAILED", "server_id": "9175c1b4-3832-4427-adfb-c7177997e893", "config_id": "37a0045f-4ddc-43cc-a19e-3601ef5eb9b5", "output_values": { "deploy_stdout": "\u001b[mNotice: Compiled catalog for overcloud-controller-2.localdomain in environment production in 8.99 seconds\u001b[0m\n\u001b[mNotice: /Stage[main]/Main/Package_manifest[/var/lib/tripleo/installed-packages/overcloud_controller_pacemaker2]/ensure: 
created\u001b[0m\n\u001b[mNotice: /Stage[main]/Main/Exec[create-root-sysconfig-clustercheck]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: + ceph-authtool /etc/ceph/ceph.client.openstack.keyring --name client.openstack --add-key AQAfoaRXSAy/HxAAShHIViinopC2xtPW+RceQA== --cap mon 'allow r' --cap osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics'\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: added entity client.openstack auth auth(auid = 18446744073709551615 key=AQAfoaRXSAy/HxAAShHIViinopC2xtPW+RceQA== with 0 caps)\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-injectkey-client.openstack]/returns: + ceph --name mon. 
--keyring /var/lib/ceph/mon/ceph-overcloud-controller-2/keyring auth add client.openstack --in-file=/etc/ceph/ceph.client.openstack.keyring\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-injectkey-client.openstack]/returns: Error EINVAL: entity client.openstack exists but key does not match\u001b[0m\n\u001b[mNotice: /Stage[main]/Pacemaker::Corosync/Exec[enable-not-start-tripleo_cluster]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Pacemaker::Corosync/Exec[Set password for hacluster user on tripleo_cluster]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Pacemaker::Corosync/Exec[auth-successful-across-all-nodes]/returns: executed successfully\u001b[0m\n\u001b[mNotice: Pacemaker has reported quorum achieved\u001b[0m\n\u001b[mNotice: /Stage[main]/Pacemaker::Corosync/Notify[pacemaker settled]/message: defined 'message' as 'Pacemaker has reported quorum achieved'\u001b[0m\n\u001b[mNotice: Finished catalog run in 10.17 seconds\u001b[0m\n", "deploy_stderr": "Could not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\nCould not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\n\u001b[1;31mWarning: Scope(Class[Mongodb::Server]): Replset specified, but no replset_members or replset_config provided.\u001b[0m\n\u001b[1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mError: /bin/true # comment to satisfy puppet syntax requirements\nset -ex\nceph --name 'mon.' 
--keyring '/var/lib/ceph/mon/ceph-overcloud-controller-2/keyring' auth add client.openstack --in-file=/etc/ceph/ceph.client.openstack.keyring returned 22 instead of one of [0]\u001b[0m\n\u001b[1;31mError: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-injectkey-client.openstack]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements\nset -ex\nceph --name 'mon.' --keyring '/var/lib/ceph/mon/ceph-overcloud-controller-2/keyring' auth add client.openstack --in-file=/etc/ceph/ceph.client.openstack.keyring returned 22 instead of one of [0]\u001b[0m\n", "deploy_status_code": 6 }, "creation_time": "2016-08-19T19:56:48", "updated_time": "2016-08-19T19:58:10", "input_values": { "step": 2, "update_identifier": { "deployment_identifier": 1471636026, "controller_config": { "1": "os-apply-config deployment 53288ed1-3332-4074-b3da-623eb709e727 completed,Root CA cert injection not enabled.,TLS not enabled.,None,", "0": "os-apply-config deployment 15c5ad8b-499f-4fe6-ba6c-1c394b1e3f11 completed,Root CA cert injection not enabled.,TLS not enabled.,None,", "2": "os-apply-config deployment 702d69cf-9226-4921-a07f-7d5dad818405 completed,Root CA cert injection not enabled.,TLS not enabled.,None," }, "allnodes_extra": "none" } }, "action": "CREATE", "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 6", "id": "1477254e-3456-45ff-8c89-4e34a87d90db" } [stack@instack ~]$ heat deployment-show 8942d662-cae2-499c-b5db-5bfb35bfc799 WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead { "status": "FAILED", "server_id": "1cd0fb36-1ba4-4b23-afb4-73c6f5735754", "config_id": "ef915feb-9ef5-4741-a7d5-78e3c99cd73b", "output_values": { "deploy_stdout": "\u001b[mNotice: Compiled catalog for overcloud-controller-1.localdomain in environment production in 9.75 seconds\u001b[0m\n\u001b[mNotice: 
/Stage[main]/Main/Package_manifest[/var/lib/tripleo/installed-packages/overcloud_controller_pacemaker2]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Main/Exec[create-root-sysconfig-clustercheck]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: + ceph-authtool /etc/ceph/ceph.client.openstack.keyring --name client.openstack --add-key AQAfoaRXSAy/HxAAShHIViinopC2xtPW+RceQA== --cap mon 'allow r' --cap osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics'\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: added entity client.openstack auth auth(auid = 18446744073709551615 key=AQAfoaRXSAy/HxAAShHIViinopC2xtPW+RceQA== with 0 caps)\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-injectkey-client.openstack]/returns: + ceph --name mon. 
--keyring /var/lib/ceph/mon/ceph-overcloud-controller-1/keyring auth add client.openstack --in-file=/etc/ceph/ceph.client.openstack.keyring\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-injectkey-client.openstack]/returns: Error EINVAL: entity client.openstack exists but key does not match\u001b[0m\n\u001b[mNotice: /Stage[main]/Pacemaker::Corosync/Exec[enable-not-start-tripleo_cluster]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Pacemaker::Corosync/Exec[Set password for hacluster user on tripleo_cluster]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Pacemaker::Corosync/Exec[auth-successful-across-all-nodes]/returns: executed successfully\u001b[0m\n\u001b[mNotice: Pacemaker has reported quorum achieved\u001b[0m\n\u001b[mNotice: /Stage[main]/Pacemaker::Corosync/Notify[pacemaker settled]/message: defined 'message' as 'Pacemaker has reported quorum achieved'\u001b[0m\n\u001b[mNotice: Finished catalog run in 9.78 seconds\u001b[0m\n", "deploy_stderr": "Could not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\nCould not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\n\u001b[1;31mWarning: Scope(Class[Mongodb::Server]): Replset specified, but no replset_members or replset_config provided.\u001b[0m\n\u001b[1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mError: /bin/true # comment to satisfy puppet syntax requirements\nset -ex\nceph --name 'mon.' 
--keyring '/var/lib/ceph/mon/ceph-overcloud-controller-1/keyring' auth add client.openstack --in-file=/etc/ceph/ceph.client.openstack.keyring returned 22 instead of one of [0]\u001b[0m\n\u001b[1;31mError: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-injectkey-client.openstack]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements\nset -ex\nceph --name 'mon.' --keyring '/var/lib/ceph/mon/ceph-overcloud-controller-1/keyring' auth add client.openstack --in-file=/etc/ceph/ceph.client.openstack.keyring returned 22 instead of one of [0]\u001b[0m\n", "deploy_status_code": 6 }, "creation_time": "2016-08-19T19:56:44", "updated_time": "2016-08-19T19:58:07", "input_values": { "step": 2, "update_identifier": { "deployment_identifier": 1471636026, "controller_config": { "1": "os-apply-config deployment 53288ed1-3332-4074-b3da-623eb709e727 completed,Root CA cert injection not enabled.,TLS not enabled.,None,", "0": "os-apply-config deployment 15c5ad8b-499f-4fe6-ba6c-1c394b1e3f11 completed,Root CA cert injection not enabled.,TLS not enabled.,None,", "2": "os-apply-config deployment 702d69cf-9226-4921-a07f-7d5dad818405 completed,Root CA cert injection not enabled.,TLS not enabled.,None," }, "allnodes_extra": "none" } }, "action": "CREATE", "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 6", "id": "8942d662-cae2-499c-b5db-5bfb35bfc799" } [stack@instack ~]$ [stack@instack ~]$ heat deployment-show a104456c-e032-452d-b23a-96f31dcc5dc9 WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead { "status": "FAILED", "server_id": "b96ccfbd-eed3-4cde-9a6d-cf9ee9f9d458", "config_id": "86a087d7-f257-4367-8367-85d64fb7fa2b", "output_values": { "deploy_stdout": "", "deploy_stderr": "Could not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\nCould not retrieve fact='apache_version', 
resolution='<anonymous>': undefined method `[]' for nil:NilClass\n\u001b[1;31mWarning: Scope(Class[Mongodb::Server]): Replset specified, but no replset_members or replset_config provided.\u001b[0m\n\u001b[1;31mError: Puppet::Parser::AST::Resource failed with error ArgumentError: Could not find declared class ::nova::db::mysql_api at /var/lib/heat-config/heat-config-puppet/86a087d7-f257-4367-8367-85d64fb7fa2b.pp:524 on node overcloud-controller-0.localdomain\nWrapped exception:\nCould not find declared class ::nova::db::mysql_api\u001b[0m\n\u001b[1;31mError: Puppet::Parser::AST::Resource failed with error ArgumentError: Could not find declared class ::nova::db::mysql_api at /var/lib/heat-config/heat-config-puppet/86a087d7-f257-4367-8367-85d64fb7fa2b.pp:524 on node overcloud-controller-0.localdomain\u001b[0m\n", "deploy_status_code": 1 }, "creation_time": "2016-08-19T19:56:46", "updated_time": "2016-08-19T19:58:02", "input_values": { "step": 2, "update_identifier": { "deployment_identifier": 1471636026, "controller_config": { "1": "os-apply-config deployment 53288ed1-3332-4074-b3da-623eb709e727 completed,Root CA cert injection not enabled.,TLS not enabled.,None,", "0": "os-apply-config deployment 15c5ad8b-499f-4fe6-ba6c-1c394b1e3f11 completed,Root CA cert injection not enabled.,TLS not enabled.,None,", "2": "os-apply-config deployment 702d69cf-9226-4921-a07f-7d5dad818405 completed,Root CA cert injection not enabled.,TLS not enabled.,None," }, "allnodes_extra": "none" } }, "action": "CREATE", "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1", "id": "a104456c-e032-452d-b23a-96f31dcc5dc9" } [stack@instack ~]$ heat deployment-show 5ff114f9-874c-4d49-974d-da7cb353d5d1 WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead Deployment not found: 5ff114f9-874c-4d49-974d-da7cb353d5d1 [stack@instack ~]$ [stack@instack ~]$ heat deployment-show 5642b3e7-3b8d-4c2c-b914-62512613cdd2 WARNING 
(shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead Deployment not found: 5642b3e7-3b8d-4c2c-b914-62512613cdd2
sosreport can be found here http://rhos-release.virt.bos.redhat.com/log/bz1357112
The journal has been already rotated for os-collect-config when i logged into the env, but i tried to reconstruct what happened and we may have an issue indeed, probably in the mariadb upgrade code. The converge failed because we ran it on an unupgraded cloud, and the upgrade failed probably because the mariadb upgrade logic tried to trigger itself, and we don't have /root/.my.cnf present. The solution may be that on an update from mariadb 5.5.47 to 5.5.50 we maybe don't want the mariadb dump/restore logic triggered at all? (needinfo'd bandini and dciabrin for confirmation) ---- Debugging info follows: Contents of /var/run/heat-config/deployed/e6c9fc41-a964-40ea-91f5-a2321ba979ef.notify.json: { "deploy_stdout": "mysql upgrade required: 1\n", "deploy_stderr": "Could not open required defaults file: /root/.my.cnf\nFatal error in defaults handling. Program aborted\n", "deploy_status_code": 1 } I pulled the definition of is_mysql_upgrade_needed function from that script and executed it, it echoes 1 to signify that upgrade is needed: [root@overcloud-controller-0 deployed]# set -x ++ printf '\033]0;%s@%s:%s\007' root overcloud-controller-0 /var/run/heat-config/deployed [root@overcloud-controller-0 deployed]# is_mysql_upgrade_needed + is_mysql_upgrade_needed + local name=mariadb + local output + local ret + set +e ++ yum -q check-update mariadb + output=' mariadb.x86_64 1:5.5.50-1.el7_2 rhelosp-rhel-7.2-z' + ret=100 + set -e + '[' 100 -ne 100 ']' ++ rpm -q --qf '%{epoch}' mariadb + local currentepoch=1 ++ rpm -q --qf '%{version}' mariadb + local currentversion=5.5.47 ++ rpm -q --qf '%{release}' mariadb + local currentrelease=1.el7_2 ++ repoquery -a --pkgnarrow=updates --qf '%{epoch} %{version} %{release}\n' mariadb + local 'newoutput=1 5.5.50 1.el7_2' ++ awk '{ print $1 }' ++ echo '1 5.5.50 1.el7_2' + local newepoch=1 ++ echo '1 5.5.50 1.el7_2' ++ awk '{ print $2 }' + local newversion=5.5.50 ++ echo '1 5.5.50 1.el7_2' ++ awk '{ print $3 }' + local newrelease=1.el7_2 ++ 
python -c 'import rpm; rc = rpm.labelCompare(("1", "5.5.47", None), ("1", "5.5.50", None)); print rc' + output=-1 + '[' -1 '!=' -1 ']' + echo 1 1 ++ printf '\033]0;%s@%s:%s\007' root overcloud-controller-0 /var/run/heat-config/deployed Michele/Damien, should we perhaps only look at the first two components of the version string when testing if mariadb upgrade is needed? (Just "5.5" instead of full "5.5.47".) [root@overcloud-controller-0 deployed]# python -c 'import rpm; rc = rpm.labelCompare(("1", "5.5.47", None), ("1", "5.5.50", None)); print rc' -1 [root@overcloud-controller-0 deployed]# python -c 'import rpm; rc = rpm.labelCompare(("1", "5.5", None), ("1", "5.5", None)); print rc' 0
(In reply to Jiri Stransky from comment #7) > The journal has been already rotated for os-collect-config when i logged > into the env, but i tried to reconstruct what happened and we may have an > issue indeed, probably in the mariadb upgrade code. > > The converge failed because we ran it on an unupgraded cloud, and the > upgrade failed probably because the mariadb upgrade logic tried to trigger > itself, and we don't have /root/.my.cnf present. The solution may be that on > an update from mariadb 5.5.47 to 5.5.50 we maybe don't want the mariadb > dump/restore logic triggered at all? (needinfo'd bandini and dciabrin for > confirmation) > > ---- > > Debugging info follows: > > Contents of > /var/run/heat-config/deployed/e6c9fc41-a964-40ea-91f5-a2321ba979ef.notify. > json: > > { > "deploy_stdout": "mysql upgrade required: 1\n", > "deploy_stderr": "Could not open required defaults file: > /root/.my.cnf\nFatal error in defaults handling. Program aborted\n", > "deploy_status_code": 1 > } > > I pulled the definition of is_mysql_upgrade_needed function from that script > and executed it, it echoes 1 to signify that upgrade is needed: > > [root@overcloud-controller-0 deployed]# set -x > ++ printf '\033]0;%s@%s:%s\007' root overcloud-controller-0 > /var/run/heat-config/deployed > [root@overcloud-controller-0 deployed]# is_mysql_upgrade_needed > + is_mysql_upgrade_needed > + local name=mariadb > + local output > + local ret > + set +e > ++ yum -q check-update mariadb > + output=' > mariadb.x86_64 1:5.5.50-1.el7_2 > rhelosp-rhel-7.2-z' > + ret=100 > + set -e > + '[' 100 -ne 100 ']' > ++ rpm -q --qf '%{epoch}' mariadb > + local currentepoch=1 > ++ rpm -q --qf '%{version}' mariadb > + local currentversion=5.5.47 > ++ rpm -q --qf '%{release}' mariadb > + local currentrelease=1.el7_2 > ++ repoquery -a --pkgnarrow=updates --qf '%{epoch} %{version} %{release}\n' > mariadb > + local 'newoutput=1 5.5.50 1.el7_2' > ++ awk '{ print $1 }' > ++ echo '1 5.5.50 1.el7_2' > + local 
newepoch=1 > ++ echo '1 5.5.50 1.el7_2' > ++ awk '{ print $2 }' > + local newversion=5.5.50 > ++ echo '1 5.5.50 1.el7_2' > ++ awk '{ print $3 }' > + local newrelease=1.el7_2 > ++ python -c 'import rpm; rc = rpm.labelCompare(("1", "5.5.47", None), ("1", > "5.5.50", None)); print rc' > + output=-1 > + '[' -1 '!=' -1 ']' > + echo 1 > 1 > ++ printf '\033]0;%s@%s:%s\007' root overcloud-controller-0 > /var/run/heat-config/deployed > > > Michele/Damien, should we perhaps only look at the first two components of > the version string when testing if mariadb upgrade is needed? (Just "5.5" > instead of full "5.5.47".) So the reason we went for comparing the full version instead of X.Y only is two-fold: 1) Given that we have little to no guarantees from upstream as to how the numbering scheme will work (10.1 vs 5.5.10 vs 10) we did not want to add a lot of boilerplate code that then might have become fragile 2) We did not really expect a minor upgrade only of mariadb (we expected major upgrades or only minor ones where only the release field changed) I guess since point 2) was clearly a wrong assumption on our part, we will need to add code to deal with the parsing of the version string. Note that as a workaround the operator can disable the automatic detection of the upgrade path via the MySqlMajorUpgrade set to 'no'. Now, having said this. Why is /root/.my.conf not present anyway? That means the starting cloud is not fully uptodate, no? (IIRC the fixes for the galera root password missing went out for RHOS8 already) > [root@overcloud-controller-0 deployed]# python -c 'import rpm; rc = > rpm.labelCompare(("1", "5.5.47", None), ("1", "5.5.50", None)); print rc' > -1 > > [root@overcloud-controller-0 deployed]# python -c 'import rpm; rc = > rpm.labelCompare(("1", "5.5", None), ("1", "5.5", None)); print rc' > 0
(In reply to Michele Baldessari from comment #8) > So the reason we went for comparing the full version instead of X.Y only is > two-fold: > 1) Given that we have little to no guarantees from upstream as to how the > numbering scheme will work (10.1 vs 5.5.10 vs 10) > we did not want to add a lot of boilerplate code that then might have > become fragile > 2) We did not really expect a minor upgrade only of mariadb (we expected > major upgrades or only minor ones where only the release field changed) > > I guess since point 2) was clearly a wrong assumption on our part, we will > need to add code to deal with the parsing of the version string. Ack, i'll propose a patch. > > Note that as a workaround the operator can disable the automatic detection > of the upgrade path via the MySqlMajorUpgrade set to 'no'. Thanks! > > Now, having said this. Why is /root/.my.conf not present anyway? That means > the starting cloud is not fully uptodate, no? (IIRC the fixes for > the galera root password missing went out for RHOS8 already) Yea this is key i think, and it may be the reason why the environment in question hit it, but i didn't hit it in my testing. (I didn't know /root/.my.conf got created only during OSP 8 lifecycle so it didn't occur to me the cause may be unupdated environment.) E.g. i can see: [root@overcloud-controller-0 ~]# rpm -q openstack-puppet-modules openstack-puppet-modules-7.0.17-1.el7ost.noarch Looking into brew, that package has been built in March, so the environment is not starting the upgrade with latest OSP 8 indeed. We should probably be either deploying the latest (not GA) OSP 8, or deploying OSP 8 and doing a minor update first before going forward with the major upgrade.
This review adds a check for the presence of /root/.my.cnf and would avoid leaving the cluster in an unknown state if the operator has not updated the overcloud; see the previous comment.
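The idea behind such a guard can be sketched in Python. This is only an illustration under stated assumptions, not the code from the review: the function names `mysql_defaults_present` and `preflight_check` are hypothetical, and the default path is the one named in the "Could not open required defaults file" error quoted earlier in this bug.

```python
import os


def mysql_defaults_present(path="/root/.my.cnf"):
    """Return True if the MySQL client defaults file exists as a regular file."""
    return os.path.isfile(path)


def preflight_check(path="/root/.my.cnf"):
    """Stop early, before any cluster change, when the defaults file is missing.

    Hypothetical helper: the real check lives in the upgrade shell scripts.
    """
    if not mysql_defaults_present(path):
        raise RuntimeError(
            "%s not found; update the overcloud before the major upgrade" % path
        )
    return True
```

Failing before any pacemaker resource is touched is the point of the guard: the operator gets a clear message instead of a half-upgraded cluster.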
Thanks Sofer, however it would be a bit better not to trigger the mariadb dump/restore logic at all, given that we update just from 5.5.47 to 5.5.50. (The dump/restore can take some time if the database is large.) Checking only the first two significant parts of the version string could hopefully be achieved with a small patch. I have yet to test it during an upgrade, though; so far I've only checked the one-liners on their own: https://review.openstack.org/#/c/358755/
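A minimal sketch of the "first two components only" comparison, in plain Python. It is not the patch under review: the helper names `major_minor` and `needs_dump_restore` are hypothetical, and it assumes the first two dot-separated components are purely numeric, which upstream does not guarantee.

```python
def major_minor(version):
    """Return the first two dot-separated components as a tuple of ints."""
    parts = version.split(".")[:2]
    # Pad single-component versions such as "10" so they compare as (10, 0).
    while len(parts) < 2:
        parts.append("0")
    return tuple(int(p) for p in parts)


def needs_dump_restore(current_version, new_version):
    """True only when the major.minor series differs between the two versions."""
    return major_minor(current_version) != major_minor(new_version)


print(needs_dump_restore("5.5.47", "5.5.50"))   # False: same 5.5 series
print(needs_dump_restore("5.5.47", "10.1.20"))  # True: 5.5 -> 10.1
```

With this rule the 5.5.47 to 5.5.50 update seen in this environment would be treated as an ordinary package update, while a jump to a different series would still trigger the dump/restore path.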
It could also fail if there's not enough free space....
(In reply to Thierry Vignaud from comment #13) > It could also fail if there's not enough free space.... There is a check for that in the scripts
Was able to verify https://review.openstack.org/#/c/358755/

Yum log contains:

Aug 23 11:31:43 Updated: 1:mariadb-libs-5.5.50-1.el7_2.x86_64
Aug 23 11:31:46 Updated: 1:mariadb-5.5.50-1.el7_2.x86_64

And the software deployment output contains:

"deploy_stdout": "mysql upgrade required: 0

(snipped away the rest)
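For anyone repeating this verification, the flag can be pulled out of a deployment's .notify.json programmatically. The sketch below assumes only the JSON layout shown earlier in this bug (a `deploy_stdout` string containing "mysql upgrade required: N"); the function name `mysql_upgrade_flag` and the sample document are illustrative.

```python
import json
import re


def mysql_upgrade_flag(notify_json_text):
    """Return the integer after 'mysql upgrade required:' or None if absent."""
    data = json.loads(notify_json_text)
    stdout = data.get("deploy_stdout", "")
    match = re.search(r"mysql upgrade required: (\d+)", stdout)
    return int(match.group(1)) if match else None


# Sample mirroring the structure of the notify.json quoted in this bug.
sample = json.dumps({
    "deploy_stdout": "mysql upgrade required: 1\n",
    "deploy_stderr": "Could not open required defaults file: /root/.my.cnf\n",
    "deploy_status_code": 1,
})
print(mysql_upgrade_flag(sample))  # 1
```

A result of 0 means the dump/restore logic was skipped; 1 means it was triggered.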
Mitaka backport: https://review.openstack.org/#/c/359218/
Had the same issue during upgrade from 8puddle to 9puddle on Aug 19th
So just reformatting the info from comment #8: the workaround that we could try would be an environment file with:

parameter_defaults:
  MySqlMajorUpgrade: 'no'

passed as the last environment file during controller upgrade (the step where we pass major-upgrade-pacemaker.yaml).
I haven't had success with the workaround from comments #8 / #18. Upon investigating why, I noticed that the script checks for 0: https://github.com/openstack/tripleo-heat-templates/blob/6919263857284d505d3734217dc054f24b000f9d/extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh#L53 but I don't think we can actually pass MySqlMajorUpgrade: 0, because Heat parameter validation on that parameter only allows the values yes/no/auto: https://github.com/openstack/tripleo-heat-templates/blob/072404b5693439b728d49d26c2c11ed69172a40d/extraconfig/tasks/major_upgrade_pacemaker.yaml#L23-L28 I'm not sure if this can be made to work; we probably need the proper fix (and another one, less urgent, for the manual control check).
Yea i tried with: MySqlMajorUpgrade: 0 and "no" without quotes in case a boolean would get converted to 0 later: MySqlMajorUpgrade: no but neither passes Heat parameter validation.
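The mismatch can be modelled in a few lines of Python. This is a simplified model built only from what the comments above report, not a transcription of the template or the shell script: the allowed values come from the Heat validation mentioned above, and the assumption that the script skips the dump/restore only for the literal "0" is taken from the observation that it "checks for 0". All names are hypothetical.

```python
# Values accepted by Heat for MySqlMajorUpgrade, per the comment above.
ALLOWED_BY_HEAT = ("yes", "no", "auto")

# Assumption: the upgrade script skips the dump/restore only when it sees "0".
SKIP_VALUE_IN_SCRIPT = "0"


def passes_heat_validation(value):
    """True if Heat would accept this parameter value."""
    return value in ALLOWED_BY_HEAT


def script_skips_dump_restore(value):
    """True if the (modelled) script would treat the value as 'do not upgrade'."""
    return value == SKIP_VALUE_IN_SCRIPT


def workaround_effective(value):
    """The workaround needs a value that satisfies both layers at once."""
    return passes_heat_validation(value) and script_skips_dump_restore(value)


for candidate in ("no", "0", "auto", "yes"):
    print(candidate,
          passes_heat_validation(candidate),
          script_skips_dump_restore(candidate),
          workaround_effective(candidate))
```

Under this model no single value is both accepted by Heat and recognised by the script, which matches the behaviour seen when trying the workaround.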
I copied /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker.yaml to /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-bz1357112.yaml and used the copy in my deployment command for the controller upgrade step.

[stack@instack ~]$ cat /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-bz1357112.yaml
parameter_defaults:
  UpgradeLevelNovaCompute: liberty
  MySqlMajorUpgrade: 'no'

resource_registry:
  OS::TripleO::Tasks::UpdateWorkflow: ../extraconfig/tasks/major_upgrade_pacemaker.yaml
  OS::TripleO::ControllerPostDeployment: OS::Heat::None
  OS::TripleO::ComputePostDeployment: OS::Heat::None
  OS::TripleO::ObjectStoragePostDeployment: OS::Heat::None
  OS::TripleO::BlockStoragePostDeployment: OS::Heat::None
  OS::TripleO::CephStoragePostDeployment: OS::Heat::None

The step deployed successfully. There were no unmanaged, failed, or stopped services. I was able to complete the final steps of the upgrade and successfully launch an instance as well.
(In reply to mlammon from comment #21)
> I copied this file
> /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-
> pacemaker.yaml to
> /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-
> pacemaker-bz1357112.yaml and used it with my deployment step for the #
> controller step
>
> [stack@instack ~]$ cat
> /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-
> pacemaker-bz1357112.yaml
> parameter_defaults:
> UpgradeLevelNovaCompute: liberty
> MySqlMajorUpgrade: 'no'
>
> resource_registry:
> OS::TripleO::Tasks::UpdateWorkflow:
> ../extraconfig/tasks/major_upgrade_pacemaker.yaml
> OS::TripleO::ControllerPostDeployment: OS::Heat::None
> OS::TripleO::ComputePostDeployment: OS::Heat::None
> OS::TripleO::ObjectStoragePostDeployment: OS::Heat::None
> OS::TripleO::BlockStoragePostDeployment: OS::Heat::None
> OS::TripleO::CephStoragePostDeployment: OS::Heat::None
>
> The step deployed successfully. There were no issues with unmanaged
> service, failed, or stopped. I was able to complete the final steps of
> upgrade and successfully launch an instance as well.

I inspected the environment, and while the upgrade worked, the workaround didn't: it seems the mariadb dump/restore logic still got triggered. I'm attaching a .notify.json file for the software deployment of controller upgrade step 1; the stderr part at the end makes it apparent that the mariadb-related logic got triggered anyway.

We probably shouldn't recommend the workaround, as it doesn't seem to do anything. I had a similar experience on my environment previously: the workaround didn't work, but I didn't see anything obviously wrong with the environment after the controller upgrade. The impact of this may vary, though, based on the properties of individual environments (e.g. the size of the data stored in mariadb).
Created attachment 1193556 [details] controller-step1.notify.json
Merged to stable/mitaka, downstream backport submitted: https://code.engineering.redhat.com/gerrit/82472
Deployed 8 and upgraded to the latest 9 without failure. There was no sign of the database backup/restore, so it looks like we can move this to verified now.

[root@overcloud-controller-0 ~]# ls -l /var/tmp/mysql_upgrade_osp/openstack_database.sql
ls: cannot access /var/tmp/mysql_upgrade_osp/openstack_database.sql: No such file or directory
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1918.html