Description of problem:
A RHOS 10 HA environment with an LVM Cinder backend was configured and volumes were created. An upgrade was then performed following the upgrade team's document:
https://gitlab.cee.redhat.com/mcornea/osp10-13-ffu/blob/master/README.md

The last step failed while running the following command:

  openstack overcloud ffwd-upgrade converge + all needed parameters

with the following error:

2018-07-16 11:27:19Z [overcloud-BlockStorageServiceChain-x5ja2twato4m]: UPDATE_FAILED (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-07-16 11:27:23Z [BlockStorageServiceChain]: UPDATE_FAILED resources.BlockStorageServiceChain: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-07-16 11:27:24Z [overcloud]: UPDATE_FAILED Resource UPDATE failed: resources.BlockStorageServiceChain: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-07-16 11:27:29Z [overcloud-ControllerServiceChain-w5sgmn47auuu-ServiceChain-dqdvn5fblzjt]: UPDATE_FAILED (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-07-16 11:27:30Z [overcloud-ControllerServiceChain-w5sgmn47auuu-ServiceChain-dqdvn5fblzjt.46]: UPDATE_IN_PROGRESS state changed
2018-07-16 11:27:41Z [overcloud-ObjectStorageServiceChain-d23v7ultu7ii]: UPDATE_COMPLETE Stack UPDATE completed successfully
2018-07-16 11:27:49Z [overcloud-ControllerServiceChain-w5sgmn47auuu-ServiceChain-dqdvn5fblzjt.55]: UPDATE_IN_PROGRESS state changed
2018-07-16 11:27:49Z [overcloud-ControllerServiceChain-w5sgmn47auuu-ServiceChain-dqdvn5fblzjt.97]: UPDATE_IN_PROGRESS state changed

Stack overcloud UPDATE_FAILED

overcloud.ControllerServiceChain.ServiceChain:
  resource_type: OS::Heat::ResourceChain
  physical_resource_id: aea59dde-1ca0-48d9-be49-73f7f509712b
  status: UPDATE_FAILED
  status_reason: |
    resources.ServiceChain: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
overcloud.BlockStorageServiceChain:
  resource_type: OS::TripleO::Services
  physical_resource_id: 6e19c9c0-13d0-451d-b0f2-d55b8c2860c4
  status: UPDATE_FAILED
  status_reason: |
    resources.BlockStorageServiceChain: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
Heat Stack update failed.
Heat Stack update failed.
Version-Release number of selected component (if applicable):
openstack-neutron-common-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-nova-common-17.0.3-0.20180420001141.el7ost.noarch
openstack-keystone-13.0.1-0.20180420194847.7bd6454.el7ost.noarch
openstack-mistral-executor-6.0.2-1.el7ost.noarch
openstack-neutron-openvswitch-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-heat-agents-1.5.4-0.20180308153305.ecf43c7.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-2.el7ost.noarch
openstack-nova-placement-api-17.0.3-0.20180420001141.el7ost.noarch
openstack-ironic-api-10.1.2-4.el7ost.noarch
openstack-swift-proxy-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-swift-object-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-ironic-common-10.1.2-4.el7ost.noarch
openstack-nova-conductor-17.0.3-0.20180420001141.el7ost.noarch
openstack-neutron-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-swift-container-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-tripleo-ui-8.3.1-3.el7ost.noarch
openstack-heat-common-10.0.1-0.20180411125640.el7ost.noarch
python2-openstacksdk-0.11.3-1.el7ost.noarch
openstack-nova-scheduler-17.0.3-0.20180420001141.el7ost.noarch
openstack-ironic-staging-drivers-0.9.0-4.el7ost.noarch
openstack-tripleo-common-containers-8.6.1-23.el7ost.noarch
python-openstackclient-lang-3.14.1-1.el7ost.noarch
openstack-nova-api-17.0.3-0.20180420001141.el7ost.noarch
openstack-heat-api-10.0.1-0.20180411125640.el7ost.noarch
openstack-tripleo-0.0.8-0.3.4de13b3git.el7ost.noarch
openstack-neutron-ml2-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-tempest-18.0.0-2.el7ost.noarch
openstack-selinux-0.8.14-13.el7ost.noarch
openstack-puppet-modules-11.0.0-1.el7ost.noarch
openstack-zaqar-6.0.1-1.el7ost.noarch
openstack-nova-compute-17.0.3-0.20180420001141.el7ost.noarch
openstack-ironic-conductor-10.1.2-4.el7ost.noarch
openstack-ironic-inspector-7.2.1-0.20180409163360.el7ost.noarch
openstack-swift-account-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
puppet-openstacklib-12.4.0-0.20180329042555.4b30e6f.el7ost.noarch
openstack-tripleo-common-8.6.1-23.el7ost.noarch
openstack-cinder-12.0.1-0.20180418194614.c476898.el7ost.noarch
openstack-mistral-engine-6.0.2-1.el7ost.noarch
openstack-heat-engine-10.0.1-0.20180411125640.el7ost.noarch
openstack-mistral-common-6.0.2-1.el7ost.noarch
openstack-glance-16.0.1-2.el7ost.noarch
openstack-mistral-api-6.0.2-1.el7ost.noarch
openstack-heat-api-cfn-10.0.1-0.20180411125640.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-43.el7ost.noarch
openstack-tripleo-validations-8.4.1-5.el7ost.noarch
openstack-tripleo-image-elements-8.0.1-1.el7ost.noarch
puppet-openstack_extras-12.4.1-0.20180413042250.2634296.el7ost.noarch
python2-openstackclient-3.14.1-1.el7ost.noarch

How reproducible:

Steps to Reproduce:
1. Install RHOS 10 HA with an LVM backend.
2. Create volumes, backups, snapshots, and other Cinder resources.
3. Run the upgrade according to the document above.

Actual results:
Upgrade failed.

Expected results:
Successful upgrade.

Additional info:
I did some poking around and the upgrade failure is unrelated to Cinder on the overcloud nodes. As Marius noted in an email thread, the issue is likely something (haproxy?) affecting SQL connectivity on the undercloud.
We had a conversation with Mike Bayer that pointed me to https://bugzilla.redhat.com/show_bug.cgi?id=1464146#c42. To see if that's the problem, run "show global status;" in mysql and look at "Aborted_connects". It sounds like a timeout when opening a connection to MySQL. That timeout (connect_timeout) is 10 seconds by default; we should probably increase it on the undercloud. I'll look at the suggestion to decrease the number of greenlets, but I don't think we have that many of them.
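For reference, a quick manual check on the undercloud would look something like this (the plain "mysql -e" invocation assumes root access with the default client credentials; adjust for this environment):

  # Connections mysqld aborted at connect time; a rising count points at the connect timeout
  mysql -e "SHOW GLOBAL STATUS LIKE 'Aborted_connects';"
  # Current connect timeout (10 seconds by default)
  mysql -e "SHOW GLOBAL VARIABLES LIKE 'connect_timeout';"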
@Thomas, is there a way to change things manually now to test increased timeouts?
You can set connect_timeout to 60 in the [mysqld] section of the server configuration.
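For a quick manual test, something along these lines should work on the undercloud (the drop-in file name and the mariadb service name are assumptions about this setup; connect_timeout is also a dynamic variable, so it can be raised on the fly first):

  # Apply immediately without a restart (reverts on the next mysqld restart)
  mysql -e "SET GLOBAL connect_timeout = 60;"

  # Persist it in the server configuration and restart
  printf '[mysqld]\nconnect_timeout = 60\n' > /etc/my.cnf.d/zz-connect-timeout.cnf
  systemctl restart mariadb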