Description of problem:
An RHOS 10 HA environment with an LVM Cinder backend was configured, and volumes were created.
The upgrade was then performed on the system according to the upgrade team's document:
https://gitlab.cee.redhat.com/mcornea/osp10-13-ffu/blob/master/README.md
While running the last step, the upgrade failed on the following command:
openstack overcloud ffwd-upgrade converge + all needed parameters
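For reference, the converge step takes the same --templates and -e environment file arguments as the original deploy command; a rough sketch, with hypothetical file names standing in for the deployment's real list:

openstack overcloud ffwd-upgrade converge \
  --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/templates/custom-params.yaml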
The command failed with the following error:
2018-07-16 11:27:19Z [overcloud-BlockStorageServiceChain-x5ja2twato4m]: UPDATE_FAILED (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-07-16 11:27:23Z [BlockStorageServiceChain]: UPDATE_FAILED resources.BlockStorageServiceChain: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-07-16 11:27:24Z [overcloud]: UPDATE_FAILED Resource UPDATE failed: resources.BlockStorageServiceChain: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-07-16 11:27:29Z [overcloud-ControllerServiceChain-w5sgmn47auuu-ServiceChain-dqdvn5fblzjt]: UPDATE_FAILED (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-07-16 11:27:30Z [overcloud-ControllerServiceChain-w5sgmn47auuu-ServiceChain-dqdvn5fblzjt.46]: UPDATE_IN_PROGRESS state changed
2018-07-16 11:27:41Z [overcloud-ObjectStorageServiceChain-d23v7ultu7ii]: UPDATE_COMPLETE Stack UPDATE completed successfully
2018-07-16 11:27:49Z [overcloud-ControllerServiceChain-w5sgmn47auuu-ServiceChain-dqdvn5fblzjt.55]: UPDATE_IN_PROGRESS state changed
2018-07-16 11:27:49Z [overcloud-ControllerServiceChain-w5sgmn47auuu-ServiceChain-dqdvn5fblzjt.97]: UPDATE_IN_PROGRESS state changed
Stack overcloud UPDATE_FAILED
overcloud.ControllerServiceChain.ServiceChain:
  resource_type: OS::Heat::ResourceChain
  physical_resource_id: aea59dde-1ca0-48d9-be49-73f7f509712b
  status: UPDATE_FAILED
  status_reason: |
    resources.ServiceChain: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
overcloud.BlockStorageServiceChain:
  resource_type: OS::TripleO::Services
  physical_resource_id: 6e19c9c0-13d0-451d-b0f2-d55b8c2860c4
  status: UPDATE_FAILED
  status_reason: |
    resources.BlockStorageServiceChain: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
Heat Stack update failed.
Version-Release number of selected component (if applicable):
openstack-neutron-common-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-nova-common-17.0.3-0.20180420001141.el7ost.noarch
openstack-keystone-13.0.1-0.20180420194847.7bd6454.el7ost.noarch
openstack-mistral-executor-6.0.2-1.el7ost.noarch
openstack-neutron-openvswitch-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-heat-agents-1.5.4-0.20180308153305.ecf43c7.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-2.el7ost.noarch
openstack-nova-placement-api-17.0.3-0.20180420001141.el7ost.noarch
openstack-ironic-api-10.1.2-4.el7ost.noarch
openstack-swift-proxy-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-swift-object-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-ironic-common-10.1.2-4.el7ost.noarch
openstack-nova-conductor-17.0.3-0.20180420001141.el7ost.noarch
openstack-neutron-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-swift-container-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-tripleo-ui-8.3.1-3.el7ost.noarch
openstack-heat-common-10.0.1-0.20180411125640.el7ost.noarch
python2-openstacksdk-0.11.3-1.el7ost.noarch
openstack-nova-scheduler-17.0.3-0.20180420001141.el7ost.noarch
openstack-ironic-staging-drivers-0.9.0-4.el7ost.noarch
openstack-tripleo-common-containers-8.6.1-23.el7ost.noarch
python-openstackclient-lang-3.14.1-1.el7ost.noarch
openstack-nova-api-17.0.3-0.20180420001141.el7ost.noarch
openstack-heat-api-10.0.1-0.20180411125640.el7ost.noarch
openstack-tripleo-0.0.8-0.3.4de13b3git.el7ost.noarch
openstack-neutron-ml2-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-tempest-18.0.0-2.el7ost.noarch
openstack-selinux-0.8.14-13.el7ost.noarch
openstack-puppet-modules-11.0.0-1.el7ost.noarch
openstack-zaqar-6.0.1-1.el7ost.noarch
openstack-nova-compute-17.0.3-0.20180420001141.el7ost.noarch
openstack-ironic-conductor-10.1.2-4.el7ost.noarch
openstack-ironic-inspector-7.2.1-0.20180409163360.el7ost.noarch
openstack-swift-account-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
puppet-openstacklib-12.4.0-0.20180329042555.4b30e6f.el7ost.noarch
openstack-tripleo-common-8.6.1-23.el7ost.noarch
openstack-cinder-12.0.1-0.20180418194614.c476898.el7ost.noarch
openstack-mistral-engine-6.0.2-1.el7ost.noarch
openstack-heat-engine-10.0.1-0.20180411125640.el7ost.noarch
openstack-mistral-common-6.0.2-1.el7ost.noarch
openstack-glance-16.0.1-2.el7ost.noarch
openstack-mistral-api-6.0.2-1.el7ost.noarch
openstack-heat-api-cfn-10.0.1-0.20180411125640.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-43.el7ost.noarch
openstack-tripleo-validations-8.4.1-5.el7ost.noarch
openstack-tripleo-image-elements-8.0.1-1.el7ost.noarch
puppet-openstack_extras-12.4.1-0.20180413042250.2634296.el7ost.noarch
python2-openstackclient-3.14.1-1.el7ost.noarch
How reproducible:
Steps to Reproduce:
1. Install an RHOS 10 HA setup with an LVM Cinder backend.
2. Create volumes, backups, snapshots, and other Cinder resources (see the sketch after this list).
3. Run the upgrade according to the document above.
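For step 2, a minimal sketch that exercises the relevant Cinder paths (the resource names are arbitrary):

# create a volume, then a snapshot and a backup of it
openstack volume create --size 1 vol1
openstack volume snapshot create --volume vol1 snap1
openstack volume backup create --name backup1 vol1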
Actual results:
Upgrade failed
Expected results:
Successful upgrade
Additional info:
I did some poking around and the upgrade failure is unrelated to Cinder on the overcloud nodes. As Marius noted in an email thread, the issue is likely something (haproxy?) affecting SQL connectivity on the undercloud.
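If haproxy is in play, one quick check is whether the undercloud's haproxy config sets short timeouts on the mysql listener (assuming the default config path):

# dump the mysql section of the haproxy config, including its timeout settings
grep -A 10 'listen mysql' /etc/haproxy/haproxy.cfg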
We had a conversation with Mike Bayer that pointed me to https://bugzilla.redhat.com/show_bug.cgi?id=1464146#c42
To see whether that's the problem, run "show global status;" in mysql and look at the "Aborted_connects" counter.
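For example, on the undercloud (assuming root access to the local mysql instance):

# connection attempts that failed before completing the handshake
mysql -e "SHOW GLOBAL STATUS LIKE 'Aborted_connects';"

If the counter grows while the converge step runs, connections are timing out during setup rather than being dropped mid-query.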
It sounds like a timeout when opening a connection to mysql. That timeout (connect_timeout) defaults to 10 seconds; we should probably increase it on the undercloud.
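A minimal sketch of that change, assuming the undercloud's mariadb reads drop-in files from /etc/my.cnf.d/ (the file name is arbitrary), followed by a restart of the mariadb service:

# /etc/my.cnf.d/server-timeouts.cnf
[mysqld]
# default is 10 seconds; give clients more time to complete the connection handshake
connect_timeout = 60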
I'll look at the suggestion to decrease the number of greenlets, but I don't think we have that many of them.