Bug 1357112 - Upgrade from OSP8 -> OSP9 mariadb dump/restore triggered even when unnecessary
Summary: Upgrade from OSP8 -> OSP9 mariadb dump/restore triggered even when unnecessary
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 9.0 (Mitaka)
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: async
Target Release: 9.0 (Mitaka)
Assignee: Jiri Stransky
QA Contact: mlammon
URL: http://rhos-release.virt.bos.redhat.c...
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-07-15 19:08 UTC by mlammon
Modified: 2019-12-16 06:07 UTC (History)
CC List: 14 users

Fixed In Version: openstack-tripleo-heat-templates-2.0.0-34.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-21 16:07:22 UTC
Target Upstream Version:
Embargoed:


Attachments
controller-step1.notify.json (58.42 KB, text/plain)
2016-08-24 09:39 UTC, Jiri Stransky


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1615721 0 None None None 2016-08-22 16:22:18 UTC
OpenStack gerrit 357750 0 None None None 2016-08-22 16:36:31 UTC
OpenStack gerrit 358755 0 None None None 2016-08-23 08:54:25 UTC
OpenStack gerrit 359218 0 None None None 2016-08-23 14:06:29 UTC
Red Hat Product Errata RHBA-2016:1918 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 9 director Bug Fix Advisory 2016-09-21 20:06:11 UTC

Description mlammon 2016-07-15 19:08:47 UTC
Upgrade from OSP8 -> OSP9 fails during the convergence step. All pcs resources in the cluster become unmanaged.


Environment:
openstack-tripleo-heat-templates-kilo-2.0.0-15.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-15.el7ost.noarch
python-heat-tests-6.0.0-7.el7ost.noarch
openstack-heat-common-6.0.0-7.el7ost.noarch
openstack-heat-engine-6.0.0-7.el7ost.noarch
python-heatclient-1.2.0-1.el7ost.noarch
openstack-heat-api-cfn-6.0.0-7.el7ost.noarch
heat-cfntools-1.3.0-2.el7ost.noarch
openstack-heat-templates-0-0.3.96a0b0bgit.el7ost.noarch
openstack-tripleo-heat-templates-2.0.0-15.el7ost.noarch
openstack-heat-api-6.0.0-7.el7ost.noarch
python-keystonemiddleware-4.4.1-1.el7ost.noarch
python-keystone-tests-9.0.2-1.el7ost.noarch
python-keystoneauth1-2.4.1-1.el7ost.noarch
openstack-keystone-9.0.2-1.el7ost.noarch
python-keystoneclient-2.3.1-2.el7ost.noarch
python-keystone-9.0.2-1.el7ost.noarch
instack-undercloud-4.0.0-7.el7ost.noarch
instack-0.0.8-3.el7ost.noarch

Description:
Upgrade from OSP8 -> OSP9 fails during the convergence step. All services in the cluster become unmanaged.

1. Deploy with: 
openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --swift-storage-scale 1 --block-storage-scale 1 --neutron-network-type vxlan --neutron-tunnel-types vxlan  --ntp-server clock.redhat.com --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml --ceph-storage-scale 1

Note: with standalone cinder and swift ^^^

2. Upgrade undercloud
3. Successfully get through all steps from upgrade document until last step
4. Attempt to do the CONVERGE step
 -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-converge.yaml


2016-07-15 15:01:30 [0]: SIGNAL_IN_PROGRESS Signal: deployment failed (1)
2016-07-15 15:01:30 [0]: CREATE_FAILED Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2016-07-15 15:01:34 [0]: SIGNAL_COMPLETE Unknown
2016-07-15 15:01:35 [0]: SIGNAL_COMPLETE Unknown
2016-07-15 15:01:36 [0]: SIGNAL_COMPLETE Unknown
2016-07-15 15:01:38 [0]: SIGNAL_COMPLETE Unknown
2016-07-15 15:01:39 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-07-15 15:01:39 [2]: SIGNAL_IN_PROGRESS Signal: deployment succeeded
2016-07-15 15:01:40 [2]: CREATE_COMPLETE state changed
2016-07-15 15:01:43 [2]: SIGNAL_COMPLETE Unknown
Stack overcloud UPDATE_FAILED
Deployment failed:  Heat Stack update failed.

5. Checked the pcs resources and found them all unmanaged:

[heat-admin@overcloud-controller-0 ~]$ sudo pcs status | grep -i stopped -B2
[heat-admin@overcloud-controller-0 ~]$ sudo pcs status | grep -i unman -B2
Full list of resources:

 ip-192.0.2.6   (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0 (unmanaged)
 Clone Set: haproxy-clone [haproxy] (unmanaged)
     haproxy    (systemd:haproxy):      Started overcloud-controller-2 (unmanaged)
     haproxy    (systemd:haproxy):      Started overcloud-controller-0 (unmanaged)
     haproxy    (systemd:haproxy):      Started overcloud-controller-1 (unmanaged)
 ip-192.168.200.180     (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1 (unmanaged)
 ip-192.168.100.10      (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2 (unmanaged)
 ip-192.168.110.10      (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0 (unmanaged)
 ip-192.168.100.11      (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1 (unmanaged)
 ip-192.168.120.10      (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2 (unmanaged)
 Master/Slave Set: redis-master [redis] (unmanaged)
     redis      (ocf::heartbeat:redis): Master overcloud-controller-2 (unmanaged)
     redis      (ocf::heartbeat:redis): Started overcloud-controller-0 (unmanaged)
     redis      (ocf::heartbeat:redis): Started overcloud-controller-1 (unmanaged)
 Master/Slave Set: galera-master [galera] (unmanaged)
     galera     (ocf::heartbeat:galera):        Master overcloud-controller-2 (unmanaged)
     galera     (ocf::heartbeat:galera):        Master overcloud-controller-0 (unmanaged)
     galera     (ocf::heartbeat:galera):        Master overcloud-controller-1 (unmanaged)
 Clone Set: mongod-clone [mongod] (unmanaged)
     mongod     (systemd:mongod):       Started overcloud-controller-2 (unmanaged)
     mongod     (systemd:mongod):       Started overcloud-controller-0 (unmanaged)
     mongod     (systemd:mongod):       Started overcloud-controller-1 (unmanaged)
 Clone Set: rabbitmq-clone [rabbitmq] (unmanaged)
     rabbitmq   (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-controller-2 (unmanaged)
     rabbitmq   (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-controller-0 (unmanaged)
     rabbitmq   (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-controller-1 (unmanaged)
 Clone Set: memcached-clone [memcached] (unmanaged)
     memcached  (systemd:memcached):    Started overcloud-controller-2 (unmanaged)
     memcached  (systemd:memcached):    Started overcloud-controller-0 (unmanaged)
     memcached  (systemd:memcached):    Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler] (unmanaged)
     openstack-nova-scheduler   (systemd:openstack-nova-scheduler):     Started overcloud-controller-2 (unmanaged)
     openstack-nova-scheduler   (systemd:openstack-nova-scheduler):     Started overcloud-controller-0 (unmanaged)
     openstack-nova-scheduler   (systemd:openstack-nova-scheduler):     Started overcloud-controller-1 (unmanaged)
 Clone Set: neutron-l3-agent-clone [neutron-l3-agent] (unmanaged)
     neutron-l3-agent   (systemd:neutron-l3-agent):     Started overcloud-controller-2 (unmanaged)
     neutron-l3-agent   (systemd:neutron-l3-agent):     Started overcloud-controller-0 (unmanaged)
     neutron-l3-agent   (systemd:neutron-l3-agent):     Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-heat-engine-clone [openstack-heat-engine] (unmanaged)
     openstack-heat-engine      (systemd:openstack-heat-engine):        Started overcloud-controller-2 (unmanaged)
     openstack-heat-engine      (systemd:openstack-heat-engine):        Started overcloud-controller-0 (unmanaged)
     openstack-heat-engine      (systemd:openstack-heat-engine):        Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api] (unmanaged)
     openstack-ceilometer-api   (systemd:openstack-ceilometer-api):     Started overcloud-controller-2 (unmanaged)
     openstack-ceilometer-api   (systemd:openstack-ceilometer-api):     Started overcloud-controller-0 (unmanaged)
     openstack-ceilometer-api   (systemd:openstack-ceilometer-api):     Started overcloud-controller-1 (unmanaged)
 Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent] (unmanaged)
     neutron-metadata-agent     (systemd:neutron-metadata-agent):       Started overcloud-controller-2 (unmanaged)
     neutron-metadata-agent     (systemd:neutron-metadata-agent):       Started overcloud-controller-0 (unmanaged)
     neutron-metadata-agent     (systemd:neutron-metadata-agent):       Started overcloud-controller-1 (unmanaged)
 Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup] (unmanaged)
     neutron-ovs-cleanup        (ocf::neutron:OVSCleanup):      Started overcloud-controller-2 (unmanaged)
     neutron-ovs-cleanup        (ocf::neutron:OVSCleanup):      Started overcloud-controller-0 (unmanaged)
     neutron-ovs-cleanup        (ocf::neutron:OVSCleanup):      Started overcloud-controller-1 (unmanaged)
 Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup] (unmanaged)
     neutron-netns-cleanup      (ocf::neutron:NetnsCleanup):    Started overcloud-controller-2 (unmanaged)
     neutron-netns-cleanup      (ocf::neutron:NetnsCleanup):    Started overcloud-controller-0 (unmanaged)
     neutron-netns-cleanup      (ocf::neutron:NetnsCleanup):    Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-heat-api-clone [openstack-heat-api] (unmanaged)
     openstack-heat-api (systemd:openstack-heat-api):   Started overcloud-controller-2 (unmanaged)
     openstack-heat-api (systemd:openstack-heat-api):   Started overcloud-controller-0 (unmanaged)
     openstack-heat-api (systemd:openstack-heat-api):   Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler] (unmanaged)
     openstack-cinder-scheduler (systemd:openstack-cinder-scheduler):   Started overcloud-controller-2 (unmanaged)
     openstack-cinder-scheduler (systemd:openstack-cinder-scheduler):   Started overcloud-controller-0 (unmanaged)
     openstack-cinder-scheduler (systemd:openstack-cinder-scheduler):   Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-nova-api-clone [openstack-nova-api] (unmanaged)
     openstack-nova-api (systemd:openstack-nova-api):   Started overcloud-controller-2 (unmanaged)
     openstack-nova-api (systemd:openstack-nova-api):   Started overcloud-controller-0 (unmanaged)
     openstack-nova-api (systemd:openstack-nova-api):   Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch] (unmanaged)
     openstack-heat-api-cloudwatch      (systemd:openstack-heat-api-cloudwatch):        Started overcloud-controller-2 (unmanaged)
     openstack-heat-api-cloudwatch      (systemd:openstack-heat-api-cloudwatch):        Started overcloud-controller-0 (unmanaged)
     openstack-heat-api-cloudwatch      (systemd:openstack-heat-api-cloudwatch):        Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector] (unmanaged)
     openstack-ceilometer-collector     (systemd:openstack-ceilometer-collector):       Started overcloud-controller-2 (unmanaged)
     openstack-ceilometer-collector     (systemd:openstack-ceilometer-collector):       Started overcloud-controller-0 (unmanaged)
     openstack-ceilometer-collector     (systemd:openstack-ceilometer-collector):       Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth] (unmanaged)
     openstack-nova-consoleauth (systemd:openstack-nova-consoleauth):   Started overcloud-controller-2 (unmanaged)
     openstack-nova-consoleauth (systemd:openstack-nova-consoleauth):   Started overcloud-controller-0 (unmanaged)
     openstack-nova-consoleauth (systemd:openstack-nova-consoleauth):   Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-glance-registry-clone [openstack-glance-registry] (unmanaged)
     openstack-glance-registry  (systemd:openstack-glance-registry):    Started overcloud-controller-2 (unmanaged)
     openstack-glance-registry  (systemd:openstack-glance-registry):    Started overcloud-controller-0 (unmanaged)
     openstack-glance-registry  (systemd:openstack-glance-registry):    Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification] (unmanaged)
     openstack-ceilometer-notification  (systemd:openstack-ceilometer-notification):    Started overcloud-controller-2 (unmanaged)
     openstack-ceilometer-notification  (systemd:openstack-ceilometer-notification):    Started overcloud-controller-0 (unmanaged)
     openstack-ceilometer-notification  (systemd:openstack-ceilometer-notification):    Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-cinder-api-clone [openstack-cinder-api] (unmanaged)
     openstack-cinder-api       (systemd:openstack-cinder-api): Started overcloud-controller-2 (unmanaged)
     openstack-cinder-api       (systemd:openstack-cinder-api): Started overcloud-controller-0 (unmanaged)
     openstack-cinder-api       (systemd:openstack-cinder-api): Started overcloud-controller-1 (unmanaged)
 Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent] (unmanaged)
     neutron-dhcp-agent (systemd:neutron-dhcp-agent):   Started overcloud-controller-2 (unmanaged)
     neutron-dhcp-agent (systemd:neutron-dhcp-agent):   Started overcloud-controller-0 (unmanaged)
     neutron-dhcp-agent (systemd:neutron-dhcp-agent):   Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-glance-api-clone [openstack-glance-api] (unmanaged)
     openstack-glance-api       (systemd:openstack-glance-api): Started overcloud-controller-2 (unmanaged)
     openstack-glance-api       (systemd:openstack-glance-api): Started overcloud-controller-0 (unmanaged)
     openstack-glance-api       (systemd:openstack-glance-api): Started overcloud-controller-1 (unmanaged)
 Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent] (unmanaged)
     neutron-openvswitch-agent  (systemd:neutron-openvswitch-agent):    Started overcloud-controller-2 (unmanaged)
     neutron-openvswitch-agent  (systemd:neutron-openvswitch-agent):    Started overcloud-controller-0 (unmanaged)
     neutron-openvswitch-agent  (systemd:neutron-openvswitch-agent):    Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy] (unmanaged)
     openstack-nova-novncproxy  (systemd:openstack-nova-novncproxy):    Started overcloud-controller-2 (unmanaged)
     openstack-nova-novncproxy  (systemd:openstack-nova-novncproxy):    Started overcloud-controller-0 (unmanaged)
     openstack-nova-novncproxy  (systemd:openstack-nova-novncproxy):    Started overcloud-controller-1 (unmanaged)
 Clone Set: delay-clone [delay] (unmanaged)
     delay      (ocf::heartbeat:Delay): Started overcloud-controller-2 (unmanaged)
     delay      (ocf::heartbeat:Delay): Started overcloud-controller-0 (unmanaged)
     delay      (ocf::heartbeat:Delay): Started overcloud-controller-1 (unmanaged)
 Clone Set: neutron-server-clone [neutron-server] (unmanaged)
     neutron-server     (systemd:neutron-server):       Started overcloud-controller-2 (unmanaged)
     neutron-server     (systemd:neutron-server):       Started overcloud-controller-0 (unmanaged)
     neutron-server     (systemd:neutron-server):       Started overcloud-controller-1 (unmanaged)
 Clone Set: httpd-clone [httpd] (unmanaged)
     httpd      (systemd:httpd):        Started overcloud-controller-2 (unmanaged)
     httpd      (systemd:httpd):        Started overcloud-controller-0 (unmanaged)
     httpd      (systemd:httpd):        Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central] (unmanaged)
     openstack-ceilometer-central       (systemd:openstack-ceilometer-central): Started overcloud-controller-2 (unmanaged)
     openstack-ceilometer-central       (systemd:openstack-ceilometer-central): Started overcloud-controller-0 (unmanaged)
     openstack-ceilometer-central       (systemd:openstack-ceilometer-central): Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn] (unmanaged)
     openstack-heat-api-cfn     (systemd:openstack-heat-api-cfn):       Started overcloud-controller-2 (unmanaged)
     openstack-heat-api-cfn     (systemd:openstack-heat-api-cfn):       Started overcloud-controller-0 (unmanaged)
     openstack-heat-api-cfn     (systemd:openstack-heat-api-cfn):       Started overcloud-controller-1 (unmanaged)
 openstack-cinder-volume        (systemd:openstack-cinder-volume):      Started overcloud-controller-0 (unmanaged)
 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor] (unmanaged)
     openstack-nova-conductor   (systemd:openstack-nova-conductor):     Started overcloud-controller-2 (unmanaged)
     openstack-nova-conductor   (systemd:openstack-nova-conductor):     Started overcloud-controller-0 (unmanaged)
     openstack-nova-conductor   (systemd:openstack-nova-conductor):     Started overcloud-controller-1 (unmanaged)
 my-stonith-xvm-controller0     (stonith:fence_xvm):    Started overcloud-controller-1 (unmanaged)
 my-stonith-xvm-controller1     (stonith:fence_xvm):    Started overcloud-controller-1 (unmanaged)
 my-stonith-xvm-controller2     (stonith:fence_xvm):    Started overcloud-controller-0 (unmanaged)
 Clone Set: openstack-aodh-listener-clone [openstack-aodh-listener] (unmanaged)
     openstack-aodh-listener    (systemd:openstack-aodh-listener):      Started overcloud-controller-2 (unmanaged)
     openstack-aodh-listener    (systemd:openstack-aodh-listener):      Started overcloud-controller-0 (unmanaged)
     openstack-aodh-listener    (systemd:openstack-aodh-listener):      Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-aodh-notifier-clone [openstack-aodh-notifier] (unmanaged)
     openstack-aodh-notifier    (systemd:openstack-aodh-notifier):      Started overcloud-controller-2 (unmanaged)
     openstack-aodh-notifier    (systemd:openstack-aodh-notifier):      Started overcloud-controller-0 (unmanaged)
     openstack-aodh-notifier    (systemd:openstack-aodh-notifier):      Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-aodh-evaluator-clone [openstack-aodh-evaluator] (unmanaged)
     openstack-aodh-evaluator   (systemd:openstack-aodh-evaluator):     Started overcloud-controller-2 (unmanaged)
     openstack-aodh-evaluator   (systemd:openstack-aodh-evaluator):     Started overcloud-controller-0 (unmanaged)
     openstack-aodh-evaluator   (systemd:openstack-aodh-evaluator):     Started overcloud-controller-1 (unmanaged)
 Clone Set: openstack-core-clone [openstack-core] (unmanaged)
     openstack-core     (ocf::heartbeat:Dummy): Started overcloud-controller-2 (unmanaged)
     openstack-core     (ocf::heartbeat:Dummy): Started overcloud-controller-0 (unmanaged)
     openstack-core     (ocf::heartbeat:Dummy): Started overcloud-controller-1 (unmanaged)
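A listing this long is easier to check with a script than by eye. The following is a minimal sketch, not part of the original report, that parses `pcs status` text and reports whether every resource instance carries the `(unmanaged)` flag. The function names and the regular expression are illustrative assumptions based only on the line layout shown above; other pcs versions may format their output differently.

```python
import re

# Matches a resource instance line such as
#   "haproxy    (systemd:haproxy):      Started overcloud-controller-2 (unmanaged)"
# Group headers ("Clone Set: ...", "Master/Slave Set: ...") do not match.
RESOURCE_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+"
    r"(?P<role>\S+)\s+(?P<node>\S+)(?P<flags>.*)$"
)


def parse_resources(pcs_status_text):
    """Return one dict per resource instance found in `pcs status` text."""
    resources = []
    for line in pcs_status_text.splitlines():
        match = RESOURCE_LINE.match(line)
        if match is None:
            continue  # skip headers and blank lines
        entry = match.groupdict()
        entry["unmanaged"] = "(unmanaged)" in entry.pop("flags")
        resources.append(entry)
    return resources


def all_unmanaged(resources):
    """True only if there is at least one resource and all are unmanaged."""
    return bool(resources) and all(r["unmanaged"] for r in resources)


# A few lines copied from the output above.
SAMPLE = """\
 ip-192.0.2.6   (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0 (unmanaged)
 Clone Set: haproxy-clone [haproxy] (unmanaged)
     haproxy    (systemd:haproxy):      Started overcloud-controller-2 (unmanaged)
 Master/Slave Set: galera-master [galera] (unmanaged)
     galera     (ocf::heartbeat:galera):        Master overcloud-controller-2 (unmanaged)
"""

if __name__ == "__main__":
    found = parse_resources(SAMPLE)
    print(len(found), "resource instances; all unmanaged:", all_unmanaged(found))
```

On a controller, the input could come from the captured output of `sudo pcs status` instead of `SAMPLE`.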

Comment 2 Jiri Stransky 2016-07-22 14:43:34 UTC
I wasn't able to reproduce this in my last run of the upgrade to completion; the converge succeeded for me. I only saw stopped gnocchi services after the converge (which might have been bug 1338954). Otherwise all seemed OK.

Can you please post the list of failed resources in Heat (via `heat resource-list -n5 | grep -vi complete`) and os-collect-config log from the node(s) where the failure happened?

Comment 3 Jiri Stransky 2016-07-25 14:55:00 UTC
We looked into the issue with Mike.

The failure reason is in puppet:

Error: Puppet::Parser::AST::Resource failed with error ArgumentError: Could not find declared class ::nova::db::mysql_api

The cause is that packages didn't get updated to Mitaka versions:

[root@overcloud-controller-0 ~]# rpm -q openstack-puppet-modules
openstack-puppet-modules-7.0.17-1.el7ost.noarch

There was probably a workflow issue in the upgrade init step or converge was run too early, as there were no OSP 9 repos present:

[root@overcloud-controller-0 ~]# ls /etc/yum.repos.d
redhat.repo  rhos-release-8-director.repo  rhos-release-8.repo  rhos-release.repo  rhos-release-rhel-7.2.repo
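The repo check above can be automated before running converge. This is a minimal sketch under stated assumptions: the `rhos-release-<N>.repo` and `rhos-release-<N>-director.repo` naming is taken from the directory listing in this comment, and the helper names are hypothetical. Finding the repo files is a necessary precondition only; it does not prove the packages were actually updated.

```python
import re

# File name pattern assumed from the listing above, for example
# "rhos-release-8.repo" or "rhos-release-8-director.repo".
OSP_REPO = re.compile(r"rhos-release-(\d+)(?:-director)?\.repo$")


def osp_repo_versions(repo_files):
    """Return the set of OSP major versions named by the repo files."""
    versions = set()
    for name in repo_files:
        match = OSP_REPO.match(name)
        if match:
            versions.add(int(match.group(1)))
    return versions


def has_target_repos(repo_files, target=9):
    """True if at least one repo file for the target OSP version exists."""
    return target in osp_repo_versions(repo_files)


# The listing from this comment: only OSP 8 repos are present.
REPOS_FROM_COMMENT = [
    "redhat.repo",
    "rhos-release-8-director.repo",
    "rhos-release-8.repo",
    "rhos-release.repo",
    "rhos-release-rhel-7.2.repo",
]

if __name__ == "__main__":
    print("OSP versions found:", sorted(osp_repo_versions(REPOS_FROM_COMMENT)))
    print("OSP 9 repos present:", has_target_repos(REPOS_FROM_COMMENT))
```

On a real node the list could be obtained with `os.listdir("/etc/yum.repos.d")`.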

Comment 5 mlammon 2016-08-19 21:17:12 UTC
I have hit this same issue again on the 9.0 GA candidate (2016-08-18.1), so I am re-opening it.

Upgrade failed with the error below, and the cluster is again in an "unmanaged" state:
"Error: Puppet::Parser::AST::Resource failed with error ArgumentError: Could not find declared class ::nova::db::mysql_api at /var/lib/heat-config/heat-config-puppet"

[root@overcloud-controller-0 ~]# rpm -qa | grep openstack-puppet-modules
openstack-puppet-modules-7.0.17-1.el7ost.noarch
[root@overcloud-controller-0 ~]# ls /etc/yum.repos.d/
redhat.repo  rhos-release-8-director.repo  rhos-release-8.repo  rhos-release-9-director.repo  rhos-release-9.repo  rhos-release.repo  rhos-release-rhel-7.2.repo

Please see the attached sosreport.


[stack@instack ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+---------------+---------------------+---------------------+
| id                                   | stack_name | stack_status  | creation_time       | updated_time        |
+--------------------------------------+------------+---------------+---------------------+---------------------+
| 0ce536ea-2b17-494e-bee2-bebae9fec808 | overcloud  | UPDATE_FAILED | 2016-08-19T16:10:52 | 2016-08-19T19:47:12 |
+--------------------------------------+------------+---------------+---------------------+---------------------+
[stack@instack ~]$ heat resource-list overcloud -n5 | grep -v COMPLETE
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
+----------------------------------------------+-----------------------------------------------+---------------------------------------------------------------------------------+-----------------+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| resource_name                                | physical_resource_id                          | resource_type                                                                   | resource_status | updated_time        | stack_name                                                                                                                                      |
+----------------------------------------------+-----------------------------------------------+---------------------------------------------------------------------------------+-----------------+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| ControllerNodesPostDeployment                | 5642b3e7-3b8d-4c2c-b914-62512613cdd2          | OS::TripleO::ControllerPostDeployment                                           | CREATE_FAILED   | 2016-08-19T19:53:12 | overcloud                                                                                                                                       |
| ControllerServicesBaseDeployment_Step2       | 5ff114f9-874c-4d49-974d-da7cb353d5d1          | OS::Heat::StructuredDeployments                                                 | CREATE_FAILED   | 2016-08-19T19:53:14 | overcloud-ControllerNodesPostDeployment-7h2fdufb275r                                                                                            |
| 0                                            | a104456c-e032-452d-b23a-96f31dcc5dc9          | OS::Heat::StructuredDeployment                                                  | CREATE_FAILED   | 2016-08-19T19:56:42 | overcloud-ControllerNodesPostDeployment-7h2fdufb275r-ControllerServicesBaseDeployment_Step2-odzw46576x2k                                        |
| 1                                            | 8942d662-cae2-499c-b5db-5bfb35bfc799          | OS::Heat::StructuredDeployment                                                  | CREATE_FAILED   | 2016-08-19T19:56:42 | overcloud-ControllerNodesPostDeployment-7h2fdufb275r-ControllerServicesBaseDeployment_Step2-odzw46576x2k                                        |
| 2                                            | 1477254e-3456-45ff-8c89-4e34a87d90db          | OS::Heat::StructuredDeployment                                                  | CREATE_FAILED   | 2016-08-19T19:56:42 | overcloud-ControllerNodesPostDeployment-7h2fdufb275r-ControllerServicesBaseDeployment_Step2-odzw46576x2k                                        |
+----------------------------------------------+-----------------------------------------------+---------------------------------------------------------------------------------+-----------------+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
[stack@instack ~]$ heat deployment-list | grep FAILED
WARNING (shell) "heat deployment-list" is deprecated, please use "openstack software deployment list" instead
| 8942d662-cae2-499c-b5db-5bfb35bfc799 | ef915feb-9ef5-4741-a7d5-78e3c99cd73b | 1cd0fb36-1ba4-4b23-afb4-73c6f5735754 | CREATE | FAILED   | 2016-08-19T19:56:44 | deploy_status_code : Deployment exited with non-zero status code: 6 |
| a104456c-e032-452d-b23a-96f31dcc5dc9 | 86a087d7-f257-4367-8367-85d64fb7fa2b | b96ccfbd-eed3-4cde-9a6d-cf9ee9f9d458 | CREATE | FAILED   | 2016-08-19T19:56:46 | deploy_status_code : Deployment exited with non-zero status code: 1 |
| 1477254e-3456-45ff-8c89-4e34a87d90db | 37a0045f-4ddc-43cc-a19e-3601ef5eb9b5 | 9175c1b4-3832-4427-adfb-c7177997e893 | CREATE | FAILED   | 2016-08-19T19:56:48 | deploy_status_code : Deployment exited with non-zero status code: 6 |
[stack@instack ~]$ heat deployment-show 37a0045f-4ddc-43cc-a19e-3601ef5eb9b5
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
Deployment not found: 37a0045f-4ddc-43cc-a19e-3601ef5eb9b5
[stack@instack ~]$ heat deployment-show 86a087d7-f257-4367-8367-85d64fb7fa2b
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
Deployment not found: 86a087d7-f257-4367-8367-85d64fb7fa2b
[stack@instack ~]$ heat deployment-show ef915feb-9ef5-4741-a7d5-78e3c99cd73b
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
Deployment not found: ef915feb-9ef5-4741-a7d5-78e3c99cd73b
[stack@instack ~]$ heat deployment-show 1477254e-3456-45ff-8c89-4e34a87d90db
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "FAILED",
  "server_id": "9175c1b4-3832-4427-adfb-c7177997e893",
  "config_id": "37a0045f-4ddc-43cc-a19e-3601ef5eb9b5",
  "output_values": {
    "deploy_stdout": "\u001b[mNotice: Compiled catalog for overcloud-controller-2.localdomain in environment production in 8.99 seconds\u001b[0m\n\u001b[mNotice: /Stage[main]/Main/Package_manifest[/var/lib/tripleo/installed-packages/overcloud_controller_pacemaker2]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Main/Exec[create-root-sysconfig-clustercheck]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: + ceph-authtool /etc/ceph/ceph.client.openstack.keyring --name client.openstack --add-key AQAfoaRXSAy/HxAAShHIViinopC2xtPW+RceQA== --cap mon 'allow r' --cap osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics'\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: added entity client.openstack auth auth(auid = 18446744073709551615 key=AQAfoaRXSAy/HxAAShHIViinopC2xtPW+RceQA== with 0 caps)\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-injectkey-client.openstack]/returns: + ceph --name mon. 
--keyring /var/lib/ceph/mon/ceph-overcloud-controller-2/keyring auth add client.openstack --in-file=/etc/ceph/ceph.client.openstack.keyring\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-injectkey-client.openstack]/returns: Error EINVAL: entity client.openstack exists but key does not match\u001b[0m\n\u001b[mNotice: /Stage[main]/Pacemaker::Corosync/Exec[enable-not-start-tripleo_cluster]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Pacemaker::Corosync/Exec[Set password for hacluster user on tripleo_cluster]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Pacemaker::Corosync/Exec[auth-successful-across-all-nodes]/returns: executed successfully\u001b[0m\n\u001b[mNotice: Pacemaker has reported quorum achieved\u001b[0m\n\u001b[mNotice: /Stage[main]/Pacemaker::Corosync/Notify[pacemaker settled]/message: defined 'message' as 'Pacemaker has reported quorum achieved'\u001b[0m\n\u001b[mNotice: Finished catalog run in 10.17 seconds\u001b[0m\n",
    "deploy_stderr": "Could not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\nCould not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\n\u001b[1;31mWarning: Scope(Class[Mongodb::Server]): Replset specified, but no replset_members or replset_config provided.\u001b[0m\n\u001b[1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mError: /bin/true # comment to satisfy puppet syntax requirements\nset -ex\nceph   --name 'mon.'   --keyring '/var/lib/ceph/mon/ceph-overcloud-controller-2/keyring'  auth add client.openstack --in-file=/etc/ceph/ceph.client.openstack.keyring returned 22 instead of one of [0]\u001b[0m\n\u001b[1;31mError: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-injectkey-client.openstack]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements\nset -ex\nceph   --name 'mon.'   --keyring '/var/lib/ceph/mon/ceph-overcloud-controller-2/keyring'  auth add client.openstack --in-file=/etc/ceph/ceph.client.openstack.keyring returned 22 instead of one of [0]\u001b[0m\n",
    "deploy_status_code": 6
  },
  "creation_time": "2016-08-19T19:56:48",
  "updated_time": "2016-08-19T19:58:10",
  "input_values": {
    "step": 2,
    "update_identifier": {
      "deployment_identifier": 1471636026,
      "controller_config": {
        "1": "os-apply-config deployment 53288ed1-3332-4074-b3da-623eb709e727 completed,Root CA cert injection not enabled.,TLS not enabled.,None,",
        "0": "os-apply-config deployment 15c5ad8b-499f-4fe6-ba6c-1c394b1e3f11 completed,Root CA cert injection not enabled.,TLS not enabled.,None,",
        "2": "os-apply-config deployment 702d69cf-9226-4921-a07f-7d5dad818405 completed,Root CA cert injection not enabled.,TLS not enabled.,None,"
      },
      "allnodes_extra": "none"
    }
  },
  "action": "CREATE",
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 6",
  "id": "1477254e-3456-45ff-8c89-4e34a87d90db"
}
[stack@instack ~]$ heat deployment-show 8942d662-cae2-499c-b5db-5bfb35bfc799
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "FAILED",
  "server_id": "1cd0fb36-1ba4-4b23-afb4-73c6f5735754",
  "config_id": "ef915feb-9ef5-4741-a7d5-78e3c99cd73b",
  "output_values": {
    "deploy_stdout": "\u001b[mNotice: Compiled catalog for overcloud-controller-1.localdomain in environment production in 9.75 seconds\u001b[0m\n\u001b[mNotice: /Stage[main]/Main/Package_manifest[/var/lib/tripleo/installed-packages/overcloud_controller_pacemaker2]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Main/Exec[create-root-sysconfig-clustercheck]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: + ceph-authtool /etc/ceph/ceph.client.openstack.keyring --name client.openstack --add-key AQAfoaRXSAy/HxAAShHIViinopC2xtPW+RceQA== --cap mon 'allow r' --cap osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics'\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: added entity client.openstack auth auth(auid = 18446744073709551615 key=AQAfoaRXSAy/HxAAShHIViinopC2xtPW+RceQA== with 0 caps)\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-injectkey-client.openstack]/returns: + ceph --name mon. 
--keyring /var/lib/ceph/mon/ceph-overcloud-controller-1/keyring auth add client.openstack --in-file=/etc/ceph/ceph.client.openstack.keyring\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-injectkey-client.openstack]/returns: Error EINVAL: entity client.openstack exists but key does not match\u001b[0m\n\u001b[mNotice: /Stage[main]/Pacemaker::Corosync/Exec[enable-not-start-tripleo_cluster]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Pacemaker::Corosync/Exec[Set password for hacluster user on tripleo_cluster]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Pacemaker::Corosync/Exec[auth-successful-across-all-nodes]/returns: executed successfully\u001b[0m\n\u001b[mNotice: Pacemaker has reported quorum achieved\u001b[0m\n\u001b[mNotice: /Stage[main]/Pacemaker::Corosync/Notify[pacemaker settled]/message: defined 'message' as 'Pacemaker has reported quorum achieved'\u001b[0m\n\u001b[mNotice: Finished catalog run in 9.78 seconds\u001b[0m\n",
    "deploy_stderr": "Could not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\nCould not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\n\u001b[1;31mWarning: Scope(Class[Mongodb::Server]): Replset specified, but no replset_members or replset_config provided.\u001b[0m\n\u001b[1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mError: /bin/true # comment to satisfy puppet syntax requirements\nset -ex\nceph   --name 'mon.'   --keyring '/var/lib/ceph/mon/ceph-overcloud-controller-1/keyring'  auth add client.openstack --in-file=/etc/ceph/ceph.client.openstack.keyring returned 22 instead of one of [0]\u001b[0m\n\u001b[1;31mError: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-injectkey-client.openstack]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements\nset -ex\nceph   --name 'mon.'   --keyring '/var/lib/ceph/mon/ceph-overcloud-controller-1/keyring'  auth add client.openstack --in-file=/etc/ceph/ceph.client.openstack.keyring returned 22 instead of one of [0]\u001b[0m\n",
    "deploy_status_code": 6
  },
  "creation_time": "2016-08-19T19:56:44",
  "updated_time": "2016-08-19T19:58:07",
  "input_values": {
    "step": 2,
    "update_identifier": {
      "deployment_identifier": 1471636026,
      "controller_config": {
        "1": "os-apply-config deployment 53288ed1-3332-4074-b3da-623eb709e727 completed,Root CA cert injection not enabled.,TLS not enabled.,None,",
        "0": "os-apply-config deployment 15c5ad8b-499f-4fe6-ba6c-1c394b1e3f11 completed,Root CA cert injection not enabled.,TLS not enabled.,None,",
        "2": "os-apply-config deployment 702d69cf-9226-4921-a07f-7d5dad818405 completed,Root CA cert injection not enabled.,TLS not enabled.,None,"
      },
      "allnodes_extra": "none"
    }
  },
  "action": "CREATE",
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 6",
  "id": "8942d662-cae2-499c-b5db-5bfb35bfc799"
}
[stack@instack ~]$
[stack@instack ~]$ heat deployment-show a104456c-e032-452d-b23a-96f31dcc5dc9
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "FAILED",
  "server_id": "b96ccfbd-eed3-4cde-9a6d-cf9ee9f9d458",
  "config_id": "86a087d7-f257-4367-8367-85d64fb7fa2b",
  "output_values": {
    "deploy_stdout": "",
    "deploy_stderr": "Could not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\nCould not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\n\u001b[1;31mWarning: Scope(Class[Mongodb::Server]): Replset specified, but no replset_members or replset_config provided.\u001b[0m\n\u001b[1;31mError: Puppet::Parser::AST::Resource failed with error ArgumentError: Could not find declared class ::nova::db::mysql_api at /var/lib/heat-config/heat-config-puppet/86a087d7-f257-4367-8367-85d64fb7fa2b.pp:524 on node overcloud-controller-0.localdomain\nWrapped exception:\nCould not find declared class ::nova::db::mysql_api\u001b[0m\n\u001b[1;31mError: Puppet::Parser::AST::Resource failed with error ArgumentError: Could not find declared class ::nova::db::mysql_api at /var/lib/heat-config/heat-config-puppet/86a087d7-f257-4367-8367-85d64fb7fa2b.pp:524 on node overcloud-controller-0.localdomain\u001b[0m\n",
    "deploy_status_code": 1
  },
  "creation_time": "2016-08-19T19:56:46",
  "updated_time": "2016-08-19T19:58:02",
  "input_values": {
    "step": 2,
    "update_identifier": {
      "deployment_identifier": 1471636026,
      "controller_config": {
        "1": "os-apply-config deployment 53288ed1-3332-4074-b3da-623eb709e727 completed,Root CA cert injection not enabled.,TLS not enabled.,None,",
        "0": "os-apply-config deployment 15c5ad8b-499f-4fe6-ba6c-1c394b1e3f11 completed,Root CA cert injection not enabled.,TLS not enabled.,None,",
        "2": "os-apply-config deployment 702d69cf-9226-4921-a07f-7d5dad818405 completed,Root CA cert injection not enabled.,TLS not enabled.,None,"
      },
      "allnodes_extra": "none"
    }
  },
  "action": "CREATE",
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1",
  "id": "a104456c-e032-452d-b23a-96f31dcc5dc9"
}
[stack@instack ~]$ heat deployment-show 5ff114f9-874c-4d49-974d-da7cb353d5d1
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
Deployment not found: 5ff114f9-874c-4d49-974d-da7cb353d5d1
[stack@instack ~]$
[stack@instack ~]$ heat deployment-show 5642b3e7-3b8d-4c2c-b914-62512613cdd2
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
Deployment not found: 5642b3e7-3b8d-4c2c-b914-62512613cdd2
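
The `deploy_stderr` fields above are hard to scan because Puppet wraps each message in ANSI colour escapes. Below is a minimal Python sketch for pulling only the error lines out of such a string. The helper name `error_lines` is hypothetical, and the sample string is abbreviated from the controller-0 output above.

```python
import re

# Matches ANSI SGR colour sequences such as "\x1b[1;31m" and "\x1b[0m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def error_lines(deploy_stderr):
    """Return the lines of a deploy_stderr string that start with 'Error:'."""
    plain = ANSI_SGR.sub("", deploy_stderr)
    return [line for line in plain.splitlines() if line.startswith("Error:")]


# Abbreviated sample; the real field is much longer.
sample = (
    "\u001b[1;31mWarning: Scope(Class[Mongodb::Server]): Replset specified, "
    "but no replset_members or replset_config provided.\u001b[0m\n"
    "\u001b[1;31mError: Puppet::Parser::AST::Resource failed with error "
    "ArgumentError: Could not find declared class ::nova::db::mysql_api\u001b[0m\n"
)

for line in error_lines(sample):
    print(line)
```

Multi-line Puppet errors (like the ceph one above) continue on lines that do not start with "Error:", so this sketch only surfaces the first line of each.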

Comment 6 mlammon 2016-08-19 21:18:37 UTC
sosreport can be found here
http://rhos-release.virt.bos.redhat.com/log/bz1357112

Comment 7 Jiri Stransky 2016-08-22 13:56:08 UTC
The os-collect-config journal had already been rotated when I logged into the environment, but I tried to reconstruct what happened, and we may indeed have an issue, probably in the mariadb upgrade code.

The converge failed because we ran it on an unupgraded cloud, and the upgrade probably failed because the mariadb upgrade logic was triggered while /root/.my.cnf was not present. The solution may be to not trigger the mariadb dump/restore logic at all on an update from mariadb 5.5.47 to 5.5.50. (needinfo'd bandini and dciabrin for confirmation)

----

Debugging info follows:

Contents of /var/run/heat-config/deployed/e6c9fc41-a964-40ea-91f5-a2321ba979ef.notify.json:

{
  "deploy_stdout": "mysql upgrade required: 1\n", 
  "deploy_stderr": "Could not open required defaults file: /root/.my.cnf\nFatal error in defaults handling. Program aborted\n", 
  "deploy_status_code": 1
}
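
For anyone inspecting several of these files, here is a small Python sketch that summarizes a .notify.json document like the one above. The function name `summarize_notify` is hypothetical; the JSON is copied from the file contents shown above and passed in as a string so the example does not depend on the file being present.

```python
import json
import re


def summarize_notify(text):
    """Summarize a heat-config .notify.json document given as a string."""
    data = json.loads(text)
    # The upgrade script prints "mysql upgrade required: N" to stdout.
    match = re.search(r"mysql upgrade required: (\d+)", data.get("deploy_stdout", ""))
    return {
        "status_code": data.get("deploy_status_code"),
        "mysql_upgrade_required": int(match.group(1)) if match else None,
        "stderr_lines": data.get("deploy_stderr", "").splitlines(),
    }


notify = """{
  "deploy_stdout": "mysql upgrade required: 1\\n",
  "deploy_stderr": "Could not open required defaults file: /root/.my.cnf\\nFatal error in defaults handling. Program aborted\\n",
  "deploy_status_code": 1
}"""

summary = summarize_notify(notify)
print(summary["status_code"], summary["mysql_upgrade_required"])
for line in summary["stderr_lines"]:
    print(line)
```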

I pulled the definition of the is_mysql_upgrade_needed function from that script and executed it. It echoes 1 to signify that an upgrade is needed:

[root@overcloud-controller-0 deployed]# set -x
++ printf '\033]0;%s@%s:%s\007' root overcloud-controller-0 /var/run/heat-config/deployed
[root@overcloud-controller-0 deployed]# is_mysql_upgrade_needed 
+ is_mysql_upgrade_needed
+ local name=mariadb
+ local output
+ local ret
+ set +e
++ yum -q check-update mariadb
+ output='
mariadb.x86_64                1:5.5.50-1.el7_2                rhelosp-rhel-7.2-z'
+ ret=100
+ set -e
+ '[' 100 -ne 100 ']'
++ rpm -q --qf '%{epoch}' mariadb
+ local currentepoch=1
++ rpm -q --qf '%{version}' mariadb
+ local currentversion=5.5.47
++ rpm -q --qf '%{release}' mariadb
+ local currentrelease=1.el7_2
++ repoquery -a --pkgnarrow=updates --qf '%{epoch} %{version} %{release}\n' mariadb
+ local 'newoutput=1 5.5.50 1.el7_2'
++ awk '{ print $1 }'
++ echo '1 5.5.50 1.el7_2'
+ local newepoch=1
++ echo '1 5.5.50 1.el7_2'
++ awk '{ print $2 }'
+ local newversion=5.5.50
++ echo '1 5.5.50 1.el7_2'
++ awk '{ print $3 }'
+ local newrelease=1.el7_2
++ python -c 'import rpm; rc = rpm.labelCompare(("1", "5.5.47", None), ("1", "5.5.50", None)); print rc'
+ output=-1
+ '[' -1 '!=' -1 ']'
+ echo 1
1
++ printf '\033]0;%s@%s:%s\007' root overcloud-controller-0 /var/run/heat-config/deployed


Michele/Damien, should we perhaps only look at the first two components of the version string when testing if mariadb upgrade is needed? (Just "5.5" instead of full "5.5.47".)


[root@overcloud-controller-0 deployed]# python -c 'import rpm; rc = rpm.labelCompare(("1", "5.5.47", None), ("1", "5.5.50", None)); print rc'
-1

[root@overcloud-controller-0 deployed]# python -c 'import rpm; rc = rpm.labelCompare(("1", "5.5", None), ("1", "5.5", None)); print rc'
0
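
To illustrate the idea without the rpm module, here is a minimal Python sketch that compares only the first two components of the version string. The function names are hypothetical and this is not the patch itself; it also assumes dot-separated versions, and a single-component version such as "10" is simply compared as is.

```python
def major_minor(version):
    """Return the first two dot-separated components, e.g. '5.5.47' -> '5.5'."""
    return ".".join(version.split(".")[:2])


def needs_dump_restore(current, new):
    """True when the major.minor part of the version changes."""
    return major_minor(current) != major_minor(new)


# 5.5.47 -> 5.5.50 is the update from this bug; 10.1.14 is a made-up
# target version used only to show the other branch.
print(needs_dump_restore("5.5.47", "5.5.50"))
print(needs_dump_restore("5.5.47", "10.1.14"))
```

A plain string comparison of the truncated versions only answers "did major.minor change?", not "which one is newer?", which is all the dump/restore decision needs under this assumption.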

Comment 8 Michele Baldessari 2016-08-22 15:16:06 UTC
(In reply to Jiri Stransky from comment #7)
> The journal has been already rotated for os-collect-config when i logged
> into the env, but i tried to reconstruct what happened and we may have an
> issue indeed, probably in the mariadb upgrade code.
> 
> The converge failed because we ran it on an unupgraded cloud, and the
> upgrade failed probably because the mariadb upgrade logic tried to trigger
> itself, and we don't have /root/.my.cnf present. The solution may be that on
> an update from mariadb 5.5.47 to 5.5.50 we maybe don't want the mariadb
> dump/restore logic triggered at all? (needinfo'd bandini and dciabrin for
> confirmation)
> 
> Michele/Damien, should we perhaps only look at the first two components of
> the version string when testing if mariadb upgrade is needed? (Just "5.5"
> instead of full "5.5.47".)

So the reason we went for comparing the full version instead of X.Y only is two-fold:
1) Given that we have little to no guarantees from upstream as to how the numbering scheme will work (10.1 vs 5.5.10 vs 10)
   we did not want to add a lot of boilerplate code that then might have become fragile
2) We did not really expect a minor upgrade only of mariadb (we expected major upgrades or only minor ones where only the release field changed)

I guess since point 2) was clearly a wrong assumption on our part, we will need to add code to deal with the parsing of the version string.

Note that as a workaround the operator can disable the automatic detection of the upgrade path by setting MySqlMajorUpgrade to 'no'.

Now, having said this: why is /root/.my.cnf not present anyway? That means the starting cloud is not fully up to date, no? (IIRC the fixes for the missing galera root password already went out for RHOS8.)



Comment 9 Jiri Stransky 2016-08-22 15:39:22 UTC
(In reply to Michele Baldessari from comment #8)
> So the reason we went for comparing the full version instead of X.Y only is
> two-fold:
> 1) Given that we have little to no guarantees from upstream as to how the
> numbering scheme will work (10.1 vs 5.5.10 vs 10)
>    we did not want to add a lot of boilerplate code that then might have
> become fragile
> 2) We did not really expect a minor upgrade only of mariadb (we expected
> major upgrades or only minor ones where only the release field changed)
> 
> I guess since point 2) was clearly a wrong assumption on our part, we will
> need to add code to deal with the parsing of the version string.

Ack, I'll propose a patch.

> 
> Note that as a workaround the operator can disable the automatic detection
> of the upgrade path via the MySqlMajorUpgrade set to 'no'.

Thanks!

> 
> Now, having said this. Why is /root/.my.cnf not present anyway? That means
> the starting cloud is not fully uptodate, no? (IIRC the fixes for
> the galera root password missing went out for RHOS8 already)

Yes, I think this is key. It may be the reason why the environment in question hit the problem while I did not hit it in my testing. (I didn't know /root/.my.cnf only started being created during the OSP 8 lifecycle, so it didn't occur to me that the cause might be an un-updated environment.)

E.g. I can see:

[root@overcloud-controller-0 ~]# rpm -q openstack-puppet-modules
openstack-puppet-modules-7.0.17-1.el7ost.noarch

Looking into brew, that package was built in March, so the environment is indeed not starting the upgrade from the latest OSP 8. We should probably either deploy the latest (not GA) OSP 8, or deploy OSP 8 and do a minor update first before going forward with the major upgrade.

Comment 10 Sofer Athlan-Guyot 2016-08-22 16:36:32 UTC
This review adds a check for the presence of /root/.my.cnf and would avoid leaving the cluster in an unknown state if the operator has not updated the overcloud; see the previous comment.

Comment 12 Jiri Stransky 2016-08-23 08:54:25 UTC
Thanks Sofer. However, it would be better not to trigger the mariadb dump/restore logic at all, given that we update just from 5.5.47 to 5.5.50. (The dump/restore can take some time if the database is large.)

Checking only the first two significant parts of the version string can hopefully be achieved with a small patch. I have yet to test it during an upgrade; so far I've only checked the one-liners on their own:

https://review.openstack.org/#/c/358755/

Comment 13 Thierry Vignaud 2016-08-23 09:04:09 UTC
It could also fail if there's not enough free space....

Comment 14 Michele Baldessari 2016-08-23 09:19:18 UTC
(In reply to Thierry Vignaud from comment #13)
> It could also fail if there's not enough free space....

There is a check for that in the scripts

Comment 15 Jiri Stransky 2016-08-23 11:56:28 UTC
Was able to verify https://review.openstack.org/#/c/358755/

Yum log contains:

Aug 23 11:31:43 Updated: 1:mariadb-libs-5.5.50-1.el7_2.x86_64
Aug 23 11:31:46 Updated: 1:mariadb-5.5.50-1.el7_2.x86_64

And the software deployment output contains:

  "deploy_stdout": "mysql upgrade required: 0 (snipped away the rest)

Comment 16 Jiri Stransky 2016-08-23 14:06:30 UTC
Mitaka backport: https://review.openstack.org/#/c/359218/

Comment 17 Dan Yasny 2016-08-23 14:50:05 UTC
Had the same issue during upgrade from 8puddle to 9puddle on Aug 19th

Comment 18 Jiri Stransky 2016-08-23 15:38:57 UTC
So just reformatting the info from comment #8 -- the workaround that we could try would be an environment file with:

parameter_defaults:
  MySqlMajorUpgrade: 'no'

passed as the last environment file during controller upgrade (the step where we pass major-upgrade-pacemaker.yaml).

Comment 19 Jiri Stransky 2016-08-23 17:01:52 UTC
I haven't had success with the workaround from comments #8 / #18.

Upon investigating why, I noticed that the script checks for 0:

https://github.com/openstack/tripleo-heat-templates/blob/6919263857284d505d3734217dc054f24b000f9d/extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh#L53

but I don't think we can actually pass

MySqlMajorUpgrade: 0

because there's Heat parameter validation on that parameter, which only allows the values yes/no/auto:

https://github.com/openstack/tripleo-heat-templates/blob/072404b5693439b728d49d26c2c11ed69172a40d/extraconfig/tasks/major_upgrade_pacemaker.yaml#L23-L28

I'm not sure if this can be made to work. We probably need the proper fix (and another, less urgent one for the manual control check).

Comment 20 Jiri Stransky 2016-08-23 17:17:01 UTC
Yes, I tried with:

MySqlMajorUpgrade: 0

and "no" without quotes in case a boolean would get converted to 0 later:

MySqlMajorUpgrade: no

but neither passes Heat parameter validation.
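
One plausible reading of these failures, sketched in Python: an allowed-values check that accepts only the exact strings yes/no/auto rejects both the integer 0 and the boolean that an unquoted `no` becomes under YAML 1.1 parsing. The `is_allowed` helper is a hypothetical stand-in written for illustration, not Heat's validation code.

```python
# Hypothetical stand-in for an allowed-values constraint; not Heat's code.
ALLOWED = ("yes", "no", "auto")


def is_allowed(value):
    """Accept only the exact strings listed in ALLOWED."""
    return isinstance(value, str) and value in ALLOWED


# An unquoted `no` is loaded as boolean False by YAML 1.1 parsers,
# while a quoted 'no' stays a string.
for candidate in (0, False, "no"):
    print(repr(candidate), is_allowed(candidate))
```

Under that assumption, only the quoted 'no' gets past validation, and the script-side check for 0 then never matches it.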

Comment 21 mlammon 2016-08-23 23:01:02 UTC
I copied the file /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker.yaml to /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-bz1357112.yaml and used it in my deployment command for the controller step.

[stack@instack ~]$ cat /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-bz1357112.yaml
parameter_defaults:
  UpgradeLevelNovaCompute: liberty
  MySqlMajorUpgrade: 'no'

resource_registry:
  OS::TripleO::Tasks::UpdateWorkflow: ../extraconfig/tasks/major_upgrade_pacemaker.yaml
  OS::TripleO::ControllerPostDeployment: OS::Heat::None
  OS::TripleO::ComputePostDeployment: OS::Heat::None
  OS::TripleO::ObjectStoragePostDeployment: OS::Heat::None
  OS::TripleO::BlockStoragePostDeployment: OS::Heat::None
  OS::TripleO::CephStoragePostDeployment: OS::Heat::None

The step deployed successfully. There were no unmanaged, failed, or stopped services. I was also able to complete the final steps of the upgrade and successfully launch an instance.

Comment 22 Jiri Stransky 2016-08-24 09:38:04 UTC
(In reply to mlammon from comment #21)
> I copied this file
> /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-
> pacemaker.yaml to 
> /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-
> pacemaker-bz1357112.yaml and used it with my deployment step for the #
> controller step
> 
> [stack@instack ~]$ cat
> /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-
> pacemaker-bz1357112.yaml
> parameter_defaults:
>   UpgradeLevelNovaCompute: liberty
>   MySqlMajorUpgrade: 'no'
> 
> resource_registry:
>   OS::TripleO::Tasks::UpdateWorkflow:
> ../extraconfig/tasks/major_upgrade_pacemaker.yaml
>   OS::TripleO::ControllerPostDeployment: OS::Heat::None
>   OS::TripleO::ComputePostDeployment: OS::Heat::None
>   OS::TripleO::ObjectStoragePostDeployment: OS::Heat::None
>   OS::TripleO::BlockStoragePostDeployment: OS::Heat::None
>   OS::TripleO::CephStoragePostDeployment: OS::Heat::None
> 
> The step deployed successfully.  There were no issues with unmanaged
> service, failed, or stopped.  I was able to complete the final steps of
> upgrade and successfully launch an instance as well.

I inspected the environment, and while the upgrade worked, the workaround didn't: it seems the mariadb dump/restore logic still got triggered.

I'm attaching a .notify.json file for the software deployment of controller upgrade step 1. The stderr part at the end in particular makes it apparent that the mariadb-related logic got triggered anyway. We probably shouldn't recommend the workaround, as it doesn't seem to do anything.

I had a similar experience in my environment previously: the workaround didn't work, but I didn't see anything obviously wrong with the environment after the controller upgrade. The impact may vary, though, depending on the properties of individual environments (e.g. the size of the data stored in mariadb).

Comment 23 Jiri Stransky 2016-08-24 09:39:46 UTC
Created attachment 1193556 [details]
controller-step1.notify.json

Comment 24 Jiri Stransky 2016-08-24 11:17:06 UTC
Merged to stable/mitaka, downstream backport submitted:

https://code.engineering.redhat.com/gerrit/82472

Comment 29 mlammon 2016-09-20 14:44:13 UTC
Deployed 8 and upgraded to the latest 9 without failure. There was no sign of the database backup/restore, so it looks like we can move this to verified now.

[root@overcloud-controller-0 ~]# ls -l /var/tmp/mysql_upgrade_osp/openstack_database.sql
ls: cannot access /var/tmp/mysql_upgrade_osp/openstack_database.sql: No such file or directory
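
The same check can be scripted, for example to run it on every controller. A minimal Python sketch follows; the function name is hypothetical, and it assumes, as the check above does, that a dump/restore run leaves the dump file behind at this path.

```python
import os

# Path taken from the ls command above.
DUMP_PATH = "/var/tmp/mysql_upgrade_osp/openstack_database.sql"


def dump_file_present(path=DUMP_PATH):
    """Return True if the mariadb dump file exists at the given path."""
    return os.path.exists(path)


print("dump file present:", dump_file_present())
```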

Comment 31 errata-xmlrpc 2016-09-21 16:07:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1918.html

