Description of problem:
A RHOS 10 HA environment with an LVM Cinder backend was configured and volumes were created. An upgrade was then performed following the upgrade team's document:
https://gitlab.cee.redhat.com/mcornea/osp10-13-ffu/blob/master/README.md

The last step failed while running the following command:

  openstack overcloud ffwd-upgrade converge + all needed parameters

with the following error:

2018-07-16 11:27:19Z [overcloud-BlockStorageServiceChain-x5ja2twato4m]: UPDATE_FAILED (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-07-16 11:27:23Z [BlockStorageServiceChain]: UPDATE_FAILED resources.BlockStorageServiceChain: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-07-16 11:27:24Z [overcloud]: UPDATE_FAILED Resource UPDATE failed: resources.BlockStorageServiceChain: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-07-16 11:27:29Z [overcloud-ControllerServiceChain-w5sgmn47auuu-ServiceChain-dqdvn5fblzjt]: UPDATE_FAILED (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-07-16 11:27:30Z [overcloud-ControllerServiceChain-w5sgmn47auuu-ServiceChain-dqdvn5fblzjt.46]: UPDATE_IN_PROGRESS state changed
2018-07-16 11:27:41Z [overcloud-ObjectStorageServiceChain-d23v7ultu7ii]: UPDATE_COMPLETE Stack UPDATE completed successfully
2018-07-16 11:27:49Z [overcloud-ControllerServiceChain-w5sgmn47auuu-ServiceChain-dqdvn5fblzjt.55]: UPDATE_IN_PROGRESS state changed
2018-07-16 11:27:49Z [overcloud-ControllerServiceChain-w5sgmn47auuu-ServiceChain-dqdvn5fblzjt.97]: UPDATE_IN_PROGRESS state changed

Stack overcloud UPDATE_FAILED

overcloud.ControllerServiceChain.ServiceChain:
  resource_type: OS::Heat::ResourceChain
  physical_resource_id: aea59dde-1ca0-48d9-be49-73f7f509712b
  status: UPDATE_FAILED
  status_reason: |
    resources.ServiceChain: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
overcloud.BlockStorageServiceChain:
  resource_type: OS::TripleO::Services
  physical_resource_id: 6e19c9c0-13d0-451d-b0f2-d55b8c2860c4
  status: UPDATE_FAILED
  status_reason: |
    resources.BlockStorageServiceChain: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
Heat Stack update failed.
Heat Stack update failed.
Version-Release number of selected component (if applicable):
openstack-neutron-common-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-nova-common-17.0.3-0.20180420001141.el7ost.noarch
openstack-keystone-13.0.1-0.20180420194847.7bd6454.el7ost.noarch
openstack-mistral-executor-6.0.2-1.el7ost.noarch
openstack-neutron-openvswitch-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-heat-agents-1.5.4-0.20180308153305.ecf43c7.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-2.el7ost.noarch
openstack-nova-placement-api-17.0.3-0.20180420001141.el7ost.noarch
openstack-ironic-api-10.1.2-4.el7ost.noarch
openstack-swift-proxy-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-swift-object-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-ironic-common-10.1.2-4.el7ost.noarch
openstack-nova-conductor-17.0.3-0.20180420001141.el7ost.noarch
openstack-neutron-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-swift-container-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-tripleo-ui-8.3.1-3.el7ost.noarch
openstack-heat-common-10.0.1-0.20180411125640.el7ost.noarch
python2-openstacksdk-0.11.3-1.el7ost.noarch
openstack-nova-scheduler-17.0.3-0.20180420001141.el7ost.noarch
openstack-ironic-staging-drivers-0.9.0-4.el7ost.noarch
openstack-tripleo-common-containers-8.6.1-23.el7ost.noarch
python-openstackclient-lang-3.14.1-1.el7ost.noarch
openstack-nova-api-17.0.3-0.20180420001141.el7ost.noarch
openstack-heat-api-10.0.1-0.20180411125640.el7ost.noarch
openstack-tripleo-0.0.8-0.3.4de13b3git.el7ost.noarch
openstack-neutron-ml2-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-tempest-18.0.0-2.el7ost.noarch
openstack-selinux-0.8.14-13.el7ost.noarch
openstack-puppet-modules-11.0.0-1.el7ost.noarch
openstack-zaqar-6.0.1-1.el7ost.noarch
openstack-nova-compute-17.0.3-0.20180420001141.el7ost.noarch
openstack-ironic-conductor-10.1.2-4.el7ost.noarch
openstack-ironic-inspector-7.2.1-0.20180409163360.el7ost.noarch
openstack-swift-account-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
puppet-openstacklib-12.4.0-0.20180329042555.4b30e6f.el7ost.noarch
openstack-tripleo-common-8.6.1-23.el7ost.noarch
openstack-cinder-12.0.1-0.20180418194614.c476898.el7ost.noarch
openstack-mistral-engine-6.0.2-1.el7ost.noarch
openstack-heat-engine-10.0.1-0.20180411125640.el7ost.noarch
openstack-mistral-common-6.0.2-1.el7ost.noarch
openstack-glance-16.0.1-2.el7ost.noarch
openstack-mistral-api-6.0.2-1.el7ost.noarch
openstack-heat-api-cfn-10.0.1-0.20180411125640.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-43.el7ost.noarch
openstack-tripleo-validations-8.4.1-5.el7ost.noarch
openstack-tripleo-image-elements-8.0.1-1.el7ost.noarch
puppet-openstack_extras-12.4.1-0.20180413042250.2634296.el7ost.noarch
python2-openstackclient-3.14.1-1.el7ost.noarch

How reproducible:

Steps to Reproduce:
1. Install RHOS 10 HA with an LVM backend.
2. Create volumes, backups, snapshots, and other Cinder resources.
3. Run the upgrade according to the document above.

Actual results:
Upgrade failed.

Expected results:
Successful upgrade.

Additional info:
I did some poking around and the upgrade failure is unrelated to Cinder on the overcloud nodes. As Marius noted in an email thread, the issue is likely something (haproxy?) affecting SQL connectivity on the undercloud.
We had a conversation with Mike Bayer that pointed me to https://bugzilla.redhat.com/show_bug.cgi?id=1464146#c42. To see if that's the problem, run "show global status;" in mysql and look at "Aborted_connects". It sounds like a timeout when opening a connection to MySQL. That timeout (connect_timeout) is 10 seconds by default; we should probably increase it on the undercloud. I'll look at the suggestion to decrease the number of greenlets, but I don't think we have that many of them.
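For reference, a quick manual check on the undercloud would look something like this (the plain "mysql -e" invocation assumes root access with the default client credentials; adjust for this environment):

  # Connections mysqld aborted at connect time; a rising count points at the connect timeout
  mysql -e "SHOW GLOBAL STATUS LIKE 'Aborted_connects';"
  # Current connect timeout (10 seconds by default)
  mysql -e "SHOW GLOBAL VARIABLES LIKE 'connect_timeout';"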
@Thomas, is there a way to change things manually now to test increased timeouts?
You can set connect_timeout to 60 in the [mysqld] section of the server configuration.
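For a quick manual test, something along these lines should work on the undercloud (the drop-in file name and the mariadb service name are assumptions about this setup; connect_timeout is also a dynamic variable, so it can be raised on the fly first):

  # Apply immediately without a restart (reverts on the next mysqld restart)
  mysql -e "SET GLOBAL connect_timeout = 60;"

  # Persist it in the server configuration and restart
  printf '[mysqld]\nconnect_timeout = 60\n' > /etc/my.cnf.d/zz-connect-timeout.cnf
  systemctl restart mariadb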