Description of problem:
An RHOS 10 HA environment with an LVM Cinder backend was configured, and volumes were created.
The upgrade was then performed on the system according to the upgrade team's document:
https://gitlab.cee.redhat.com/mcornea/osp10-13-ffu/blob/master/README.md
While running the last step, the upgrade failed on the following command:
openstack overcloud ffwd-upgrade converge + all needed parameters
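For reference, the converge step takes the same --templates and -e environment file arguments as the original deploy command; a rough sketch, with hypothetical file names standing in for the deployment's real list:

openstack overcloud ffwd-upgrade converge \
  --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/templates/custom-params.yaml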
The command failed with the following error:
2018-07-16 11:27:19Z [overcloud-BlockStorageServiceChain-x5ja2twato4m]: UPDATE_FAILED (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-07-16 11:27:23Z [BlockStorageServiceChain]: UPDATE_FAILED resources.BlockStorageServiceChain: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-07-16 11:27:24Z [overcloud]: UPDATE_FAILED Resource UPDATE failed: resources.BlockStorageServiceChain: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-07-16 11:27:29Z [overcloud-ControllerServiceChain-w5sgmn47auuu-ServiceChain-dqdvn5fblzjt]: UPDATE_FAILED (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-07-16 11:27:30Z [overcloud-ControllerServiceChain-w5sgmn47auuu-ServiceChain-dqdvn5fblzjt.46]: UPDATE_IN_PROGRESS state changed
2018-07-16 11:27:41Z [overcloud-ObjectStorageServiceChain-d23v7ultu7ii]: UPDATE_COMPLETE Stack UPDATE completed successfully
2018-07-16 11:27:49Z [overcloud-ControllerServiceChain-w5sgmn47auuu-ServiceChain-dqdvn5fblzjt.55]: UPDATE_IN_PROGRESS state changed
2018-07-16 11:27:49Z [overcloud-ControllerServiceChain-w5sgmn47auuu-ServiceChain-dqdvn5fblzjt.97]: UPDATE_IN_PROGRESS state changed
Stack overcloud UPDATE_FAILED
overcloud.ControllerServiceChain.ServiceChain:
  resource_type: OS::Heat::ResourceChain
  physical_resource_id: aea59dde-1ca0-48d9-be49-73f7f509712b
  status: UPDATE_FAILED
  status_reason: |
    resources.ServiceChain: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
overcloud.BlockStorageServiceChain:
  resource_type: OS::TripleO::Services
  physical_resource_id: 6e19c9c0-13d0-451d-b0f2-d55b8c2860c4
  status: UPDATE_FAILED
  status_reason: |
    resources.BlockStorageServiceChain: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
Heat Stack update failed.
Version-Release number of selected component (if applicable):
openstack-neutron-common-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-nova-common-17.0.3-0.20180420001141.el7ost.noarch
openstack-keystone-13.0.1-0.20180420194847.7bd6454.el7ost.noarch
openstack-mistral-executor-6.0.2-1.el7ost.noarch
openstack-neutron-openvswitch-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-heat-agents-1.5.4-0.20180308153305.ecf43c7.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-2.el7ost.noarch
openstack-nova-placement-api-17.0.3-0.20180420001141.el7ost.noarch
openstack-ironic-api-10.1.2-4.el7ost.noarch
openstack-swift-proxy-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-swift-object-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-ironic-common-10.1.2-4.el7ost.noarch
openstack-nova-conductor-17.0.3-0.20180420001141.el7ost.noarch
openstack-neutron-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-swift-container-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-tripleo-ui-8.3.1-3.el7ost.noarch
openstack-heat-common-10.0.1-0.20180411125640.el7ost.noarch
python2-openstacksdk-0.11.3-1.el7ost.noarch
openstack-nova-scheduler-17.0.3-0.20180420001141.el7ost.noarch
openstack-ironic-staging-drivers-0.9.0-4.el7ost.noarch
openstack-tripleo-common-containers-8.6.1-23.el7ost.noarch
python-openstackclient-lang-3.14.1-1.el7ost.noarch
openstack-nova-api-17.0.3-0.20180420001141.el7ost.noarch
openstack-heat-api-10.0.1-0.20180411125640.el7ost.noarch
openstack-tripleo-0.0.8-0.3.4de13b3git.el7ost.noarch
openstack-neutron-ml2-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-tempest-18.0.0-2.el7ost.noarch
openstack-selinux-0.8.14-13.el7ost.noarch
openstack-puppet-modules-11.0.0-1.el7ost.noarch
openstack-zaqar-6.0.1-1.el7ost.noarch
openstack-nova-compute-17.0.3-0.20180420001141.el7ost.noarch
openstack-ironic-conductor-10.1.2-4.el7ost.noarch
openstack-ironic-inspector-7.2.1-0.20180409163360.el7ost.noarch
openstack-swift-account-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
puppet-openstacklib-12.4.0-0.20180329042555.4b30e6f.el7ost.noarch
openstack-tripleo-common-8.6.1-23.el7ost.noarch
openstack-cinder-12.0.1-0.20180418194614.c476898.el7ost.noarch
openstack-mistral-engine-6.0.2-1.el7ost.noarch
openstack-heat-engine-10.0.1-0.20180411125640.el7ost.noarch
openstack-mistral-common-6.0.2-1.el7ost.noarch
openstack-glance-16.0.1-2.el7ost.noarch
openstack-mistral-api-6.0.2-1.el7ost.noarch
openstack-heat-api-cfn-10.0.1-0.20180411125640.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-43.el7ost.noarch
openstack-tripleo-validations-8.4.1-5.el7ost.noarch
openstack-tripleo-image-elements-8.0.1-1.el7ost.noarch
puppet-openstack_extras-12.4.1-0.20180413042250.2634296.el7ost.noarch
python2-openstackclient-3.14.1-1.el7ost.noarch
How reproducible:
Steps to Reproduce:
1. Install an RHOS 10 HA setup with an LVM Cinder backend.
2. Create volumes, backups, snapshots, and other Cinder resources (see the sketch after this list).
3. Run the upgrade according to the document above.
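For step 2, a minimal sketch that exercises the relevant Cinder paths (the resource names are arbitrary):

# create a volume, then a snapshot and a backup of it
openstack volume create --size 1 vol1
openstack volume snapshot create --volume vol1 snap1
openstack volume backup create --name backup1 vol1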
Actual results:
Upgrade failed
Expected results:
Successful upgrade
Additional info:
I did some poking around and the upgrade failure is unrelated to Cinder on the overcloud nodes. As Marius noted in an email thread, the issue is likely something (haproxy?) affecting SQL connectivity on the undercloud.
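If haproxy is in play, one quick check is whether the undercloud's haproxy config sets short timeouts on the mysql listener (assuming the default config path):

# dump the mysql section of the haproxy config, including its timeout settings
grep -A 10 'listen mysql' /etc/haproxy/haproxy.cfg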
We had a conversation with Mike Bayer that pointed me to https://bugzilla.redhat.com/show_bug.cgi?id=1464146#c42
To see whether that's the problem, run "show global status;" in mysql and look at the "Aborted_connects" counter.
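For example, on the undercloud (assuming root access to the local mysql instance):

# connection attempts that failed before completing the handshake
mysql -e "SHOW GLOBAL STATUS LIKE 'Aborted_connects';"

If the counter grows while the converge step runs, connections are timing out during setup rather than being dropped mid-query.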
It sounds like a timeout when opening a connection to mysql. That timeout (connect_timeout) defaults to 10 seconds; we should probably increase it on the undercloud.
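A minimal sketch of that change, assuming the undercloud's mariadb reads drop-in files from /etc/my.cnf.d/ (the file name is arbitrary), followed by a restart of the mariadb service:

# /etc/my.cnf.d/server-timeouts.cnf
[mysqld]
# default is 10 seconds; give clients more time to complete the connection handshake
connect_timeout = 60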
I'll look at the suggestion to decrease the number of greenlets, but I don't think we have that many of them.