
Bug 1128285

Summary: Rubygem-Staypuft: HA-neutron deployment fails - puppet agent run fails on keystone which reports OperationalError: (OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") None None
Product: Red Hat OpenStack
Reporter: Alexander Chuzhoy <sasha>
Component: rubygem-staypuft
Assignee: Scott Seago <sseago>
Status: CLOSED ERRATA
QA Contact: Leonid Natapov <lnatapov>
Severity: high
Priority: high
Version: 5.0 (RHEL 7)
CC: aberezin, cwolfe, mburns, morazi, rhos-maint, yeylon
Target Milestone: ga
Keywords: TestOnly
Target Release: Installer
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-08-21 18:08:34 UTC
Attachments:
mariadb.log where the error happens. (flags: none)
keystone.log where the error happens. (flags: none)
the messages log from the host where the error happens. (flags: none)
mariadb.log where the error doesn't happen. (flags: none)

Description Alexander Chuzhoy 2014-08-08 19:34:17 UTC
Created attachment 925288 [details]
mariadb.log where the error happens.

Rubygem-Staypuft:  HA-neutron deployment fails - puppet agent run fails on keystone which reports OperationalError: (OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") None None


Environment:
rhel-osp-installer-0.1.8-1.el6ost.noarch
openstack-foreman-installer-2.0.18-1.el6ost.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el6ost.noarch
openstack-puppet-modules-2014.1-19.9.el6ost.noarch

Steps to reproduce:
1. Install rhel-osp-installer
2. Configure/run an HA neutron deployment (3 controllers +2 compute)

Result:
The deployment gets paused with an error (after being stuck for several hours) at 60%, while installing the controllers.

Running puppet agent on one controller, I get the following:
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_role[_member_]: Could not evaluate: Execution of '/usr/bin/keystone --os-endpoint http://10.8.29.214:35357/v2.0/ role-list' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500)                                                                                                                                                             
Error: /Stage[main]/Neutron::Keystone::Auth/Keystone_service[neutron]: Could not evaluate: Execution of '/usr/bin/keystone --os-endpoint http://10.8.29.214:35357/v2.0/ service-list' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500)                                                                                                                                                       
Error: /Stage[main]/Ceilometer::Keystone::Auth/Keystone_service[ceilometer]: Could not evaluate: Execution of '/usr/bin/keystone --os-endpoint http://10.8.29.214:35357/v2.0/ service-list' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500)                                                                                                                                                 
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_tenant[admin]: Could not evaluate: Execution of '/usr/bin/keystone --os-endpoint http://10.8.29.214:35357/v2.0/ tenant-list' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500)
Error: /Stage[main]/Ceilometer::Keystone::Auth/Keystone_role[ResellerAdmin]: Could not evaluate: Execution of '/usr/bin/keystone --os-endpoint http://10.8.29.214:35357/v2.0/ role-list' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500)
Error: /Stage[main]/Nova::Keystone::Auth/Keystone_service[nova_ec2]: Could not evaluate: Execution of '/usr/bin/keystone --os-endpoint http://10.8.29.214:35357/v2.0/ service-list' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500)
Error: /Stage[main]/Cinder::Keystone::Auth/Keystone_service[cinderv2]: Could not evaluate: Execution of '/usr/bin/keystone --os-endpoint http://10.8.29.214:35357/v2.0/ service-list' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500)
Error: /Stage[main]/Heat::Keystone::Auth/Keystone_service[heat]: Could not evaluate: Execution of '/usr/bin/keystone --os-endpoint http://10.8.29.214:35357/v2.0/ service-list' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500)
Error: Could not prefetch keystone_endpoint provider 'keystone': Execution of '/usr/bin/keystone --os-endpoint http://10.8.29.214:35357/v2.0/ endpoint-list' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500)
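For what it's worth, the same 500 can be reproduced by hand, outside of puppet, by running the keystone client against the admin endpoint with the admin token from keystone.conf (default config path assumed; <admin_token> is a placeholder, not a value from this deployment):

grep '^admin_token' /etc/keystone/keystone.conf
keystone --os-token <admin_token> --os-endpoint http://10.8.29.214:35357/v2.0/ role-list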



Checking the keystone log file:
2014-08-08 19:17:35.525 6649 TRACE keystone.common.wsgi OperationalError: (OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") None None
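That trace means keystone never got past the MySQL handshake. A quick way to confirm it from the same controller (the keystone config path is the default one; <db_host> is a placeholder for whatever host the connection string points at):

# connection string keystone is actually using
grep '^connection' /etc/keystone/keystone.conf
# try that host directly with the mysql client
mysql -h <db_host> -P 3306 -u keystone -p -e 'SELECT 1;'
# check whether the local galera node considers itself part of a Primary component
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_status';"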

Comment 1 Alexander Chuzhoy 2014-08-08 19:35:42 UTC
Created attachment 925289 [details]
keystone.log where the error happens.

Comment 2 Alexander Chuzhoy 2014-08-08 19:38:16 UTC
Created attachment 925290 [details]
the messages log from the host where the error happens.

Comment 3 Alexander Chuzhoy 2014-08-08 19:47:39 UTC
Created attachment 925292 [details]
mariadb.log where the error doesn't happen.

Comment 4 Crag Wolfe 2014-08-08 21:17:52 UTC
This looks suspicious in /var/log/messages:

Aug  8 14:56:29 maca25400702876 mysqld_safe: 140808 14:56:29 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.29hlr6' --pid-file='/var/lib/mysql/maca25400702876.example.com-recover.pid'
Aug  8 14:56:32 maca25400702876 mysqld_safe: 140808 14:56:32 mysqld_safe WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
Aug  8 14:56:35 maca25400702876 rsyncd[16020]: rsyncd version 3.0.9 starting, listening on port 4444
Aug  8 14:56:35 maca25400702876 rsyncd[16034]: name lookup failed for 192.168.100.138: Name or service not known
Aug  8 14:56:35 maca25400702876 rsyncd[16034]: connect from UNKNOWN (192.168.100.138)
Aug  8 14:56:35 maca25400702876 rsyncd[16034]: rsync to rsync_sst/ from UNKNOWN (192.168.100.138)
Aug  8 14:56:35 maca25400702876 rsyncd[16034]: receiving file list
Aug  8 14:56:36 maca25400702876 rsyncd[16048]: name lookup failed for 192.168.100.138: Name or service not known
Aug  8 14:56:36 maca25400702876 rsyncd[16048]: connect from UNKNOWN (192.168.100.138)
Aug  8 14:56:36 maca25400702876 rsyncd[16034]: sent 72 bytes  received 18877038 bytes  total size 18874368
Aug  8 14:56:36 maca25400702876 rsyncd[16048]: rsync to rsync_sst-log_dir/ from UNKNOWN (192.168.100.138)
Aug  8 14:56:36 maca25400702876 rsyncd[16048]: receiving file list
Aug  8 14:56:37 maca25400702876 rsyncd[16048]: sent 73 bytes  received 10487256 bytes  total size 10485760
Aug  8 14:56:37 maca25400702876 rsyncd[16050]: name lookup failed for 192.168.100.138: Name or service not known
Aug  8 14:56:37 maca25400702876 rsyncd[16050]: connect from UNKNOWN (192.168.100.138)
Aug  8 14:56:37 maca25400702876 rsyncd[16050]: rsync to rsync_sst/./mysql from UNKNOWN (192.168.100.138)
Aug  8 14:56:37 maca25400702876 rsyncd[16050]: receiving file list

In my working setup, similar rsync messages (except that they succeed) reference the cluster_control_ip, unlike the above.  So the question is where 192.168.100.138 is coming from: either cluster_control_ip is set wrong, or galera or puppet is doing some extra (bad) inference.  Based on the attached messages, the cluster_control_ip *should* be one of 192.168.0.9, 192.168.0.10, or 192.168.0.11.
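One way to chase that down is to dump the wsrep addresses that were rendered on each controller (the config path below is the usual one for these packages, but may differ):

grep -E 'wsrep_(cluster|node)_address' /etc/my.cnf.d/galera.cnf
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_incoming_addresses';"

If 192.168.100.138 shows up in either, it was written into the galera configuration rather than inferred at runtime.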

Comment 5 Alexander Chuzhoy 2014-08-08 21:22:18 UTC
Reproduced with rhel-osp-installer-0.1.9-1.el6ost.noarch

Comment 6 Alexander Chuzhoy 2014-08-08 21:26:47 UTC
Correction to the previous comment: reproduced with rhel-osp-installer-0.1.9-1.el6ost.noarch with an HA-Nova deployment.

Comment 7 Crag Wolfe 2014-08-08 21:44:40 UTC
Further investigation with sseago and sasha on IRC revealed that private_ip was not set in the same subnet as the pacemaker cluster members / cluster_control_ip, which is most likely the underlying cause of this bug.
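For reference, a quick way to spot that kind of mismatch on a deployed controller (illustrative commands only; <cluster_member> is a placeholder for one of the node names shown by pcs):

# subnets actually configured on this node
ip -4 addr show
# names of the pacemaker cluster members
pcs status
# address a given member name resolves to
getent hosts <cluster_member>

The private_ip value staypuft passes to the node should fall in the same subnet as those member addresses.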

Comment 8 Mike Burns 2014-08-11 12:28:40 UTC
Moving back to staypuft based on comment 7.

Comment 9 Mike Burns 2014-08-12 20:03:07 UTC
Related to the bug where the IP address changes randomly, which is fixed in staypuft 0.2.5; please retest with that version.

Comment 11 Leonid Natapov 2014-08-18 10:01:58 UTC
Did not reproduce on rhel-osp-installer-0.1.10-2.el6ost.noarch.

Comment 12 errata-xmlrpc 2014-08-21 18:08:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1090.html